Posted to java-user@lucene.apache.org by Ramprakash Ramamoorthy <yo...@gmail.com> on 2013/03/21 11:30:43 UTC

WildCardTermEnum in Lucene 4.1

Team,

       We are in the process of migrating our codebase from Lucene
2.3 (yes, it's that old) to Lucene 4.1. We had previously used
WildcardTermEnum
<http://lucene.apache.org/core/old_versioned_docs/versions/3_0_3/api/all/org/apache/lucene/search/WildcardTermEnum.html>
in our code base.

       I can't find this in 4.1 and did some googling, but in vain. Maybe
someone can help with the equivalent of WildcardTermEnum in 4.1?
Thanks in advance.

-- 
With Thanks and Regards,
Ramprakash Ramamoorthy,
India,
+91 9626975420

Re: WildCardTermEnum in Lucene 4.1

Posted by Ramprakash Ramamoorthy <yo...@gmail.com>.

On Thu, Mar 21, 2013 at 4:17 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

> You can do the following:
>
> // Transform the wildcard syntax (? and *) into a state machine
> Automaton automaton = WildcardQuery.toAutomaton(wildcardTerm);
> // Compile the state machine
> CompiledAutomaton compiled = new CompiledAutomaton(automaton);
> // "terms" can be retrieved from AtomicReader
> TermsEnum termsEnum = compiled.getTermsEnum(terms);
>
> The old WildcardTermEnum no longer exists, because you can create a
> finite state machine from any wildcard or regexp. The automaton support
> is part of Lucene's term dictionary, so the above code uses the index
> reader to get the filtered terms.
>
> The above code was copied from WildcardQuery and its superclass
> AutomatonQuery.
>
> Uwe
>

Thank you Uwe. It worked!


Re: Multi-value fields in Lucene 4.1

Posted by Michael McCandless <lu...@mikemccandless.com>.

You might be able to get close if you use PostingsHighlighter: it
tells you the offset of each matched Passage, and you can correlate
that to which field value (assuming you stored the multi-valued
fields).

You must index offsets into your postings.

But there are caveats ... if you use positional queries,
PostingsHighlighter will find highlights that didn't necessarily match
the query ... and if you use MultiTermQueries (author:Be*) you have to
pre-rewrite them, otherwise PostingsHighlighter won't highlight the
terms ...
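
A rough plain-Java sketch of the correlation step described above, with no
Lucene calls. It assumes the highlighter reports a start offset into the
stored values joined with a fixed-length separator; the class name, method
name, and separator length of 1 are all hypothetical, so check how your
highlighter version actually concatenates multi-valued fields.

```java
import java.util.Arrays;
import java.util.List;

class ValueLocator {
    /**
     * Maps a character offset in the concatenated field content back to the
     * index of the stored value that contains it. Returns -1 when the offset
     * falls on a separator or outside the content.
     * Assumption: values were joined with a separator of separatorLength chars.
     */
    static int valueIndexForOffset(List<String> values, int offset, int separatorLength) {
        int start = 0;
        for (int i = 0; i < values.size(); i++) {
            int end = start + values.get(i).length(); // exclusive end of value i
            if (offset >= start && offset < end) {
                return i;
            }
            start = end + separatorLength; // skip the separator before the next value
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> authors = Arrays.asList("Bob Thingummy", "Belinda Bootstrap");
        // "Bob Thingummy" is 13 chars, so with a 1-char separator the second
        // value starts at offset 14.
        int index = valueIndexForOffset(authors, 14, 1);
        System.out.println(authors.get(index)); // prints "Belinda Bootstrap"
    }
}
```

If several passages match, running the lookup for each passage offset gives
the set of values that matched.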

Mike McCandless

http://blog.mikemccandless.com

On Fri, Mar 22, 2013 at 5:57 AM, Chris Bamford
<ch...@talktalk.net> wrote:
> Hi,
>
> If I index several similar values in a multivalued field (e.g. many authors to one book), is there any way to know which of these matched during a query?
> e.g.
>
>   Book "The art of Stuff", with authors "Bob Thingummy" and "Belinda Bootstrap"
>
> If we queried for +(author:Be*) and matched this document, is there a way of drilling down and identifying the specific sub-field that actually triggered the match ("Belinda Bootstrap") ?  I was wondering what the lowest granularity of matching actually is - document / field / sub-field ...
>
> I am happy to index with term vectors and positions if it helps.
>
> Thanks,
>
> - Chris

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Multi-value fields in Lucene 4.1

Posted by Jack Krupansky <ja...@basetechnology.com>.

I don't think there is a way of identifying which of the values of a 
multivalued field matched. But... I haven't checked the code to be 
absolutely certain that there isn't some expert way.

Also, realize that multiple values could match, such as if you queried for 
"B*".

-- Jack Krupansky


Multi-value fields in Lucene 4.1

Posted by Chris Bamford <ch...@talktalk.net>.

Hi,

If I index several similar values in a multivalued field (e.g. many authors to one book), is there any way to know which of these matched during a query?
e.g.

  Book "The art of Stuff", with authors "Bob Thingummy" and "Belinda Bootstrap"

If we queried for +(author:Be*) and matched this document, is there a way of drilling down and identifying the specific sub-field that actually triggered the match ("Belinda Bootstrap") ?  I was wondering what the lowest granularity of matching actually is - document / field / sub-field ...

I am happy to index with term vectors and positions if it helps.

Thanks,

- Chris

RE: WildCardTermEnum in Lucene 4.1

Posted by Uwe Schindler <uw...@thetaphi.de>.

You can do the following:

// Transform the wildcard syntax (? and *) into a state machine
Automaton automaton = WildcardQuery.toAutomaton(wildcardTerm);
// Compile the state machine
CompiledAutomaton compiled = new CompiledAutomaton(automaton);
// "terms" can be retrieved from AtomicReader
TermsEnum termsEnum = compiled.getTermsEnum(terms);

The old WildcardTermEnum no longer exists, because you can create a finite state machine from any wildcard or regexp. The automaton support is part of Lucene's term dictionary, so the above code uses the index reader to get the filtered terms.

The above code was copied from WildcardQuery and its superclass AutomatonQuery.
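
To illustrate the idea of turning a wildcard into a matcher and running it
over a term list, here is a small plain-Java sketch that needs no Lucene on
the classpath. It swaps in java.util.regex for Lucene's automaton classes, so
it is not how Lucene does it internally; the class and method names are
hypothetical, and backslash escapes in the wildcard are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class WildcardTermFilter {
    /** Translates wildcard syntax (? = exactly one char, * = any run of chars) into a regex. */
    static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                // Quote everything else so regex metacharacters match literally
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Returns the terms that match the whole wildcard expression, in input order. */
    static List<String> matchingTerms(List<String> terms, String wildcard) {
        Pattern pattern = toPattern(wildcard);
        List<String> matches = new ArrayList<String>();
        for (String term : terms) {
            if (pattern.matcher(term).matches()) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Stand-in for an index's sorted term dictionary
        List<String> terms = Arrays.asList("belinda", "bob", "bootstrap", "thingummy");
        System.out.println(matchingTerms(terms, "b*"));  // prints [belinda, bob, bootstrap]
        System.out.println(matchingTerms(terms, "bo?")); // prints [bob]
    }
}
```

Unlike this linear scan, the compiled automaton lets the term dictionary skip
ranges of terms that cannot match, which is why the real code goes through the
index reader.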

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
