Posted to java-user@lucene.apache.org by java8964 java8964 <ja...@hotmail.com> on 2010/02/02 21:56:26 UTC

confused by the lucene boolean query with wildcard result

Hi, I have the following test case, which points to the index generated by our application. The result confuses me and I don't know the reason.

Lucene version: 2.9.0
JDK 1.6.0_18

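import java.io.File;

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class IndexTest1 {
    public static void main(String[] args) {
        try {
            FSDirectory directory = FSDirectory.open(new File("/path_to_index_files"));
            // Open the searcher read-only.
            IndexSearcher searcher = new IndexSearcher(directory, true);
            // Note: this wrapper is built but never passed to the QueryParser below.
            PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
            wrapper.addAnalyzer("f1string_sif", new KeywordAnalyzer());
            wrapper.addAnalyzer("f2string_ti", new StandardAnalyzer(Version.LUCENE_CURRENT));
            Query query = new QueryParser("f1string_sif", new StandardAnalyzer(Version.LUCENE_CURRENT)).parse("f2string_ti:subbank*");
            System.out.println("query = " + query);
            System.out.println("hits = " + searcher.search(query, 100).totalHits);
            searcher.close();
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}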

Output:
query = f2string_ti:subbank*
hits = 6

If I change the line to the following:

Query query = new QueryParser("f1string_sif", new StandardAnalyzer(Version.LUCENE_CURRENT)).parse("f2string_ti:rdmap*");

Output:
query = f2string_ti:rdmap*
hits = 4

Both of the above results are correct based on my data.

Now if I change the line to:

Query query = new QueryParser("f1string_sif", new StandardAnalyzer(Version.LUCENE_CURRENT)).parse("f2string_ti:subbank* OR f2string_ti:rdmap*");

Output:
query = f2string_ti:subbank* f2string_ti:rdmap*
hits = 2


I would expect the count in the last result to be at least max(6,4), but it is 2. Is there any reason for that?
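
The expectation can be sanity-checked without Lucene. The sketch below uses plain Java sets to model an OR of two clauses as the union of their matching documents. The class name OrQueryBounds and the doc ids are hypothetical and chosen only for illustration; they are not taken from the real index.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class OrQueryBounds {
    // "A OR B" matches every document that matches A, B, or both,
    // i.e. the union of the two result sets.
    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> union = new TreeSet<>(a);
        union.addAll(b);
        return union;
    }

    public static void main(String[] args) {
        // Hypothetical doc ids: 6 hits for subbank*, 4 hits for rdmap*.
        Set<Integer> subbank = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5, 6));
        Set<Integer> rdmap = new TreeSet<>(Arrays.asList(4, 5, 6, 7));

        Set<Integer> either = or(subbank, rdmap);
        int lower = Math.max(subbank.size(), rdmap.size()); // 6
        int upper = subbank.size() + rdmap.size();          // 10

        System.out.println("hits = " + either.size());      // hits = 7
        System.out.println("bounds = [" + lower + ", " + upper + "]");
    }
}
```

Whatever the overlap between the two sets is, the OR count has to fall between max(6,4) and 6+4, so a count of 2 cannot come from a correct union.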

Thanks

 		 	   		  

RE: confused by the lucene boolean query with wildcard result

Posted by java8964 java8964 <ja...@hotmail.com>.
Thanks for your help.

I upgraded Lucene to 2.9.1 and the problem is gone. It looks like a boolean query bug in Lucene 2.9.0 that was fixed in 2.9.1.

Thanks

> From: ian.lea@gmail.com
> Date: Wed, 3 Feb 2010 10:02:27 +0000
> Subject: Re: confused by the lucene boolean query with wildcard result
> To: java-user@lucene.apache.org
> 
> You should probably be using your PerFieldAnalyzerWrapper in your
> calls to QueryParser but apart from that I can't see any obvious
> reason.  General advice: use Luke to check what has been indexed and
> read http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
> 
> If none of these help, post again but showing what you are indexing as
> well as how you are searching - the smallest possible test case or
> self-contained program that shows the problem.
> 
> Or maybe someone else will spot the problem.
> 
> 
> --
> Ian.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
 		 	   		  

Re: confused by the lucene boolean query with wildcard result

Posted by Ian Lea <ia...@gmail.com>.
You should probably be using your PerFieldAnalyzerWrapper in your
calls to QueryParser but apart from that I can't see any obvious
reason.  General advice: use Luke to check what has been indexed and
read http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
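
To show why the choice of analyzer can matter, here is a toy model in plain Java of per-field analyzer dispatch: pick an analyzer by field name and fall back to a default otherwise. It is not Lucene code. ToyAnalyzer, PerFieldAnalysisSketch, KEYWORD and STANDARD_LIKE are hypothetical names, and STANDARD_LIKE is only a rough stand-in (lowercase and split on non-alphanumerics) for what StandardAnalyzer does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class PerFieldAnalysisSketch {
    // Hypothetical stand-in for an analyzer: text in, tokens out.
    interface ToyAnalyzer {
        List<String> tokens(String text);
    }

    // Keeps the whole input as one token, unchanged.
    static final ToyAnalyzer KEYWORD = text -> Collections.singletonList(text);

    // Simplified: lowercases and splits on anything that is not a-z or 0-9.
    static final ToyAnalyzer STANDARD_LIKE = text -> {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    };

    private final Map<String, ToyAnalyzer> perField = new HashMap<>();
    private final ToyAnalyzer fallback;

    PerFieldAnalysisSketch(ToyAnalyzer fallback) {
        this.fallback = fallback;
    }

    void addAnalyzer(String field, ToyAnalyzer analyzer) {
        perField.put(field, analyzer);
    }

    // Use the analyzer registered for the field, else the default one.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, fallback).tokens(text);
    }

    public static void main(String[] args) {
        PerFieldAnalysisSketch wrapper = new PerFieldAnalysisSketch(STANDARD_LIKE);
        wrapper.addAnalyzer("f1string_sif", KEYWORD);

        System.out.println(wrapper.analyze("f1string_sif", "SubBank A-1")); // [SubBank A-1]
        System.out.println(wrapper.analyze("f2string_ti", "SubBank A-1"));  // [subbank, a, 1]
    }
}
```

The same text yields different tokens depending on the field, which is why query text analyzed with a single analyzer may not line up with terms that were indexed per field.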

If none of these help, post again but showing what you are indexing as
well as how you are searching - the smallest possible test case or
self-contained program that shows the problem.

Or maybe someone else will spot the problem.


--
Ian.



