You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by java8964 java8964 <ja...@hotmail.com> on 2010/02/02 21:56:26 UTC
confused by the lucene boolean query with wildcard result
Hi, I have the following test case point to the index generated in our application. The result is confusing me and I don't know the reason.
Lucene version: 2.9.0
JDK 1.6.0_18
public class IndexTest1 {
public static void main(String[] args) {
try {
FSDirectory directory = FSDirectory.open(new File("/path_to_index_files"));
IndexSearcher searcher = new IndexSearcher(directory, true);
PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
wrapper.addAnalyzer("f1string_sif", new KeywordAnalyzer());
wrapper.addAnalyzer("f2string_ti", new StandardAnalyzer(Version.LUCENE_CURRENT));
Query query = new QueryParser("f1string_sif", new StandardAnalyzer(Version.LUCENE_CURRENT)).parse("f2string_ti:subbank*");
System.out.println("query = " + query);
System.out.println("hits = " + searcher.search(query, 100).totalHits);
searcher.close();
} catch (Exception e) {
System.out.println(e);
}
}
}
Output:
query = f2string_ti:subbank*
hits = 6
If I change the line to the following:
Query query = new QueryParser("f1string_sif", new StandardAnalyzer(Version.LUCENE_CURRENT)).parse("f2string_ti:rdmap*");
Output:
query = f2string_ti:rdmap*
hits = 4
The above result are both correct based on my data.
Now if I change the line to:
Query query = new QueryParser("f1string_sif", new StandardAnalyzer(Version.LUCENE_CURRENT)).parse("f2string_ti:subbank* OR f2string_ti:rdmap*");
Output:
query = f2string_ti:subbank* f2string_ti:rdmap*
hits = 2
I assume the count in the last result should be larger than max(6,4), but it is 2. Any reason for that?
Thanks
_________________________________________________________________
Hotmail: Trusted email with powerful SPAM protection.
http://clk.atdmt.com/GBL/go/201469227/direct/01/
RE: confused by the lucene boolean query with wildcard result
Posted by java8964 java8964 <ja...@hotmail.com>.
Thanks for you help.
I upgrade the lucene to 2.9.1, the problem is gone. It looks like a boolean query bug in the lucene 2.9.0 and fixed in the 2.9.1
Thanks
> From: ian.lea@gmail.com
> Date: Wed, 3 Feb 2010 10:02:27 +0000
> Subject: Re: confused by the lucene boolean query with wildcard result
> To: java-user@lucene.apache.org
>
> You should probably be using your PerFieldAnalyzerWrapper in your
> calls to QueryParser but apart from that I can't see any obvious
> reason. General advice: use Luke to check what has been indexed and
> read http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
>
> If none of these help, post again but showing what you are indexing as
> well as how you are searching - the smallest possible test case or
> self-contained program that shows the problem.
>
> Or maybe someone else will spot the problem.
>
>
> --
> Ian.
>
>
>
> On Tue, Feb 2, 2010 at 8:56 PM, java8964 java8964 <ja...@hotmail.com> wrote:
> >
> > Hi, I have the following test case point to the index generated in our application. The result is confusing me and I don't know the reason.
> >
> > Lucene version: 2.9.0
> > JDK 1.6.0_18
> >
> > public class IndexTest1 {
> > public static void main(String[] args) {
> > try {
> > FSDirectory directory = FSDirectory.open(new File("/path_to_index_files"));
> > IndexSearcher searcher = new IndexSearcher(directory, true);
> > PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
> > wrapper.addAnalyzer("f1string_sif", new KeywordAnalyzer());
> > wrapper.addAnalyzer("f2string_ti", new StandardAnalyzer(Version.LUCENE_CURRENT));
> > Query query = new QueryParser("f1string_sif", new StandardAnalyzer(Version.LUCENE_CURRENT)).parse("f2string_ti:subbank*");
> > System.out.println("query = " + query);
> > System.out.println("hits = " + searcher.search(query, 100).totalHits);
> > searcher.close();
> > } catch (Exception e) {
> > System.out.println(e);
> > }
> > }
> > }
> >
> > Output:
> > query = f2string_ti:subbank*
> > hits = 6
> >
> > If I change the line to the following:
> >
> > Query query = new QueryParser("f1string_sif", new StandardAnalyzer(Version.LUCENE_CURRENT)).parse("f2string_ti:rdmap*");
> >
> > Output:
> > query = f2string_ti:rdmap*
> > hits = 4
> >
> > The above result are both correct based on my data.
> >
> > Now if I change the line to:
> >
> > Query query = new QueryParser("f1string_sif", new StandardAnalyzer(Version.LUCENE_CURRENT)).parse("f2string_ti:subbank* OR f2string_ti:rdmap*");
> >
> > Output:
> > query = f2string_ti:subbank* f2string_ti:rdmap*
> > hits = 2
> >
> >
> > I assume the count in the last result should be larger than max(6,4), but it is 2. Any reason for that?
> >
> > Thanks
> >
> >
> > _________________________________________________________________
> > Hotmail: Trusted email with powerful SPAM protection.
> > http://clk.atdmt.com/GBL/go/201469227/direct/01/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
_________________________________________________________________
Hotmail: Powerful Free email with security by Microsoft.
http://clk.atdmt.com/GBL/go/201469230/direct/01/
Re: confused by the lucene boolean query with wildcard result
Posted by Ian Lea <ia...@gmail.com>.
You should probably be using your PerFieldAnalyzerWrapper in your
calls to QueryParser but apart from that I can't see any obvious
reason. General advice: use Luke to check what has been indexed and
read http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
If none of these help, post again but showing what you are indexing as
well as how you are searching - the smallest possible test case or
self-contained program that shows the problem.
Or maybe someone else will spot the problem.
--
Ian.
On Tue, Feb 2, 2010 at 8:56 PM, java8964 java8964 <ja...@hotmail.com> wrote:
>
> Hi, I have the following test case point to the index generated in our application. The result is confusing me and I don't know the reason.
>
> Lucene version: 2.9.0
> JDK 1.6.0_18
>
> public class IndexTest1 {
> public static void main(String[] args) {
> try {
> FSDirectory directory = FSDirectory.open(new File("/path_to_index_files"));
> IndexSearcher searcher = new IndexSearcher(directory, true);
> PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
> wrapper.addAnalyzer("f1string_sif", new KeywordAnalyzer());
> wrapper.addAnalyzer("f2string_ti", new StandardAnalyzer(Version.LUCENE_CURRENT));
> Query query = new QueryParser("f1string_sif", new StandardAnalyzer(Version.LUCENE_CURRENT)).parse("f2string_ti:subbank*");
> System.out.println("query = " + query);
> System.out.println("hits = " + searcher.search(query, 100).totalHits);
> searcher.close();
> } catch (Exception e) {
> System.out.println(e);
> }
> }
> }
>
> Output:
> query = f2string_ti:subbank*
> hits = 6
>
> If I change the line to the following:
>
> Query query = new QueryParser("f1string_sif", new StandardAnalyzer(Version.LUCENE_CURRENT)).parse("f2string_ti:rdmap*");
>
> Output:
> query = f2string_ti:rdmap*
> hits = 4
>
> The above result are both correct based on my data.
>
> Now if I change the line to:
>
> Query query = new QueryParser("f1string_sif", new StandardAnalyzer(Version.LUCENE_CURRENT)).parse("f2string_ti:subbank* OR f2string_ti:rdmap*");
>
> Output:
> query = f2string_ti:subbank* f2string_ti:rdmap*
> hits = 2
>
>
> I assume the count in the last result should be larger than max(6,4), but it is 2. Any reason for that?
>
> Thanks
>
>
> _________________________________________________________________
> Hotmail: Trusted email with powerful SPAM protection.
> http://clk.atdmt.com/GBL/go/201469227/direct/01/
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org