Posted to user@lucenenet.apache.org by Rob Cecil <ro...@gmail.com> on 2012/06/26 20:48:07 UTC

Re: SPAM-HIGH: Disparity between API usage and Luke

Same results, apparently, when I use Luke 1.0.1.

When I search for "Id:BAUER*" I get 15 hits in Luke, but in my custom app,
zero.

On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse <rv...@dotnetrdf.org> wrote:

> You appear to be using Luke 3.5 which per the information on the Luke
> homepage (http://code.google.com/p/luke/) uses Lucene 3.5
>
> Since Lucene.Net is currently on 2.9.4 I wouldn't be surprised to see
> different behavior between the API and executing in Luke.
>
> If you use a version of Luke which more closely aligns with the version of
> Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1 which should be close enough
> since the 2.9.x releases were previews of the 3.0.x releases as I
> understood it) what behavior do you see?
>
> Hope this helps,
>
> Rob
>
> On 6/26/12 10:50 AM, "Rob Cecil" <ro...@gmail.com> wrote:
>
> >If I run a query against my index using QueryParser to query a field:
> >
> >                var query = _parser.Parse("Id:BAUER*");
> >                var topDocs = searcher.Search(query, 10);
> >                Assert.AreEqual(count, topDocs.TotalHits);
> >
> >I get 0 for my TotalHits, yet in Luke, the same query phrase yields 15
> >results, what am I doing wrong? I use the StandardAnalyzer both to
> >create the index and to query.
> >
> >The field is defined as:
> >
> >new Field("Id", myObject.Id, Field.Store.YES, Field.Index.NOT_ANALYZED)
> >
> >and is a string field. The result set back from Luke looks like
> >(screencap):
> >
> >http://screencast.com/t/NooMK2Rf
> >
> >Thanks!
>
>
>
>
>

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Itamar Syn-Hershko <it...@code972.com>.

It is, if you have any uppercase letters for example. If StandardAnalyzer
is passed to QueryParser and you search on a non-analyzed field, you won't
be able to find it.
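
One way around this for exact (non-wildcard) terms is the per-field analyzer approach suggested elsewhere in this thread. The following is only a minimal sketch, assuming Lucene.Net 2.9.x; the class and method names (PerFieldSketch, ParseExactId) are made up for illustration and are not part of anyone's application code.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;

public static class PerFieldSketch
{
    // Hypothetical helper: parses an exact Id term with a per-field analyzer
    // and returns the string form of the resulting query.
    public static string ParseExactId()
    {
        // StandardAnalyzer stays the default for every other field.
        var wrapper = new PerFieldAnalyzerWrapper(
            new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));

        // KeywordAnalyzer emits the whole input as a single unmodified token,
        // which matches how a NOT_ANALYZED field is stored in the index.
        wrapper.AddAnalyzer("Id", new KeywordAnalyzer());

        var parser = new QueryParser(
            Lucene.Net.Util.Version.LUCENE_29, "content", wrapper);

        // The upper-case term should survive parsing for the Id field.
        return parser.Parse("Id:BAUERREVENUE").ToString();
    }
}
```

Note that this only covers terms that actually pass through the analyzer. Prefix and wildcard terms such as "Id:BAUER*" are lowercased by the parser itself, which is a separate setting discussed further down the thread.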

On Wed, Jun 27, 2012 at 12:15 AM, Rob Cecil <ro...@gmail.com> wrote:

> Well the field is "Id" - which contains unique, non-recurring terms. So I
> don't think mapping it to Field.Index.ANALYZED makes sense, does it? If it
> is mapped as NOT_ANALYZED, it is still indexed, so should I be able to
> issue a query like "Id:BAUER*" ?

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Rob Cecil <ro...@gmail.com>.

Well the field is "Id" - which contains unique, non-recurring terms. So I
don't think mapping it to Field.Index.ANALYZED makes sense, does it? If it
is mapped as NOT_ANALYZED, it is still indexed, so should I be able to
issue a query like "Id:BAUER*" ?

On Tue, Jun 26, 2012 at 3:06 PM, Lingam, ChandraMohan J <
chandramohan.j.lingam@intel.com> wrote:

> QueryParser has no knowledge of how data was indexed.  For your scenario,
> I don't believe you would be able to use Query Parser with standard
> analyzer when data was originally indexed with Field.Index.NOT_ANALYZED
> option.
>
> Interesting question is why is luke working/finding the match?  I would
> have expected Luke to not find any matches.
>
>
> -----Original Message-----
> From: Rob Cecil [mailto:rob.cecil@gmail.com]
> Sent: Tuesday, June 26, 2012 12:54 PM
> To: lucene-net-user@lucene.apache.org
> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>
> I can definitely try that. I just expected QueryParser would respect the
> case of the source string. I was hoping to avoid using the Query API
> per se, and just let the parser do the work for me.
>
> On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J <
> chandramohan.j.lingam@intel.com> wrote:
>
> > >> var query = _parser.Parse("Id:BAUER*");
> >
> > In your code, most likely, the value got converted to lower case (i.e.
> > bauer*) by the parse statement.
> > Whereas indexed value is in upper case as it is not analyzed (from
> > screen shot).
> >
> > Can you explicitly try using prefix query?

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Rob Cecil <ro...@gmail.com>.

Yeah sorry, I should have created 7 documents in the testindex - in my rush to get a standalone test done and emailed out I botched that. Thanks for the insight into the case issue with the KeywordAnalyzer. I'm starting to think how I might structure my application to possibly use the Query API in conjunction with the QueryParser. But, QueryParser is very compelling. 

Sent from my iPhone

On Jun 26, 2012, at 9:28 PM, "Lingam, ChandraMohan J" <ch...@intel.com> wrote:

> Interestingly, the query generated from var query = queryParser.Parse("Id:BAUER*") is converted to lower case ("bauer*") even though you are using KeywordAnalyzer.  I am not sure if this is the intended behavior of the keyword analyzer.
> 
> So, the best option to make this example work is to index in lowercase:
>            document.Add(new Field("Id", "bauerrevenue", Field.Store.YES, Field.Index.NOT_ANALYZED));
> 
> Also, the assert will always fail: even when the query matches, the hit count will be 1, since there is only one document with several values associated with the field.  You would need to iterate through the fields.  If you want to match six documents, you have to add six separate documents instead of one document with all the values.

RE: SPAM-HIGH: Disparity between API usage and Luke

Posted by "Lingam, ChandraMohan J" <ch...@intel.com>.

Interestingly, the query generated from var query = queryParser.Parse("Id:BAUER*") is converted to lower case ("bauer*") even though you are using KeywordAnalyzer.  I am not sure if this is the intended behavior of the keyword analyzer.

So, the best option to make this example work is to index in lowercase:
            document.Add(new Field("Id", "bauerrevenue", Field.Store.YES, Field.Index.NOT_ANALYZED));

Also, the assert will always fail: even when the query matches, the hit count will be 1, since there is only one document with several values associated with the field.  You would need to iterate through the fields.  If you want to match six documents, you have to add six separate documents instead of one document with all the values.
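
To illustrate the point about separate documents, here is a rough sketch, assuming Lucene.Net 2.9.x. It swaps in an in-memory RAMDirectory for the FSDirectory used in the test below and builds a PrefixQuery directly instead of going through QueryParser, so no lowercasing is involved. The class and method names are hypothetical.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

public static class SeparateDocumentsSketch
{
    // Hypothetical helper: indexes one document per Id value and counts
    // the documents whose Id starts with "BAUER".
    public static int CountBauerDocuments()
    {
        var directory = new RAMDirectory(); // in-memory index, for the sketch only
        var writer = new IndexWriter(directory, new KeywordAnalyzer(), true,
                                     IndexWriter.MaxFieldLength.LIMITED);

        string[] ids =
        {
            "BAUERREVENUE", "BAUERLOCATION", "BAUERPRODUCT",
            "BAUERPRODUCTLINE", "BAUERSTATE", "BAUERTOTAL", "NOTBAUER"
        };

        // One Document per Id, so every match counts as its own hit.
        foreach (var id in ids)
        {
            var document = new Document();
            document.Add(new Field("Id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.AddDocument(document);
        }
        writer.Close();

        var searcher = new IndexSearcher(directory, true);
        // The term text is used as-is: no analyzer, no lowercasing, no "*" needed.
        var topDocs = searcher.Search(new PrefixQuery(new Term("Id", "BAUER")), 10);
        searcher.Close();

        // TotalHits as used in the original post.
        return topDocs.TotalHits;
    }
}
```

With the seven values above, six start with "BAUER", so the count should come out as 6 rather than 1.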




-----Original Message-----
From: Rob Cecil [mailto:rob.cecil@gmail.com] 
Sent: Tuesday, June 26, 2012 6:55 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: SPAM-HIGH: Disparity between API usage and Luke

Sure, this is self-contained:

[Test]
        public void QueryNonAnalyzedField()
        {
            var indexPath = Path.Combine(Environment.CurrentDirectory, "testindex");
            var directory = FSDirectory.Open(new DirectoryInfo(indexPath));
            var analyzer = new KeywordAnalyzer();
            var writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
            var document = new Document();
            document.Add(new Field("Id", "BAUERREVENUE", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERLOCATION", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERPRODUCT", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERPRODUCTLINE", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERSTATE", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERTOTAL", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "NOTBAUER", Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.AddDocument(document);
            writer.Optimize();
            writer.Close();

            IndexReader reader = IndexReader.Open(directory, true);
            var queryParser = new QueryParser(Version.LUCENE_29, "content", analyzer);
            var query = queryParser.Parse("Id:BAUER*");
            var indexSearch = new IndexSearcher(reader);
            var hits = indexSearch.Search(query);
            Assert.AreEqual(6, hits.Length());
        }


On Tue, Jun 26, 2012 at 6:35 PM, Lingam, ChandraMohan J < chandramohan.j.lingam@intel.com> wrote:

> Just did a simple test and KeywordAnalyzer does indeed work like a
> prefix query if you put a star at the end. Agree with Simon.  Most
> likely Luke was using keyword analyzer and somehow the UI was not reflecting it?
>
> Please post a small snippet of your index code and query code...
>
> -----Original Message-----
> From: Rob Cecil [mailto:rob.cecil@gmail.com]
> Sent: Tuesday, June 26, 2012 5:25 PM
> To: lucene-net-user@lucene.apache.org
> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>
> Thanks, and there is no equivalent QueryParser syntax for that?
>
> On Tue, Jun 26, 2012 at 6:21 PM, Lingam, ChandraMohan J < 
> chandramohan.j.lingam@intel.com> wrote:
>
> > Actually, that makes sense. Keyword analyzer would try for an exact
> > match. Since you are looking for prefix-based search, your best option is
> > to simply use PrefixQuery, and there is no need to put a "*" for PrefixQuery.
> >
> > -----Original Message-----
> > From: Rob Cecil [mailto:rob.cecil@gmail.com]
> > Sent: Tuesday, June 26, 2012 4:57 PM
> > To: lucene-net-user@lucene.apache.org
> > Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> >
> > That is correct. I've verified in Luke 1.0.1 that both analyzers 
> > produce the same results.
> >
> > To make it interesting, back in my code, I switched over to using 
> > the KeywordAnalyzer, and I'm still not getting any results against 
> > that NOT_ANALYZED field.
> >
> > ?
> >
> > On Tue, Jun 26, 2012 at 5:52 PM, Lingam, ChandraMohan J < 
> > chandramohan.j.lingam@intel.com> wrote:
> >
> > > Luke using keyword analyzer as default makes sense. However, in 
> > > the original post, there was a link to luke output screenshot 
> > > which showed that standard analyzer was in use for query parsing.
> > >
> > > -----Original Message-----
> > > From: Simon Svensson [mailto:sisve@devhost.se]
> > > Sent: Tuesday, June 26, 2012 2:56 PM
> > > To: lucene-net-user@lucene.apache.org
> > > Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> > >
> > > Luke defaults to KeywordAnalyzer, which won't change your term in
> > > any way.
> > > The QueryParser will still break up your query, so "Name:Jack Bauer"
> > > would become (Name:Jack DefaultField:Bauer). I believe you can 
> > > have per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer 
> > > for everything else) using a PerFieldAnalyzerWrapper.
> > >
> > > On 2012-06-26 23:06, Lingam, ChandraMohan J wrote:
> > > > QueryParser has no knowledge of how data was indexed.  For your
> > > > scenario, I don't believe you would be able to use Query Parser
> > > > with standard analyzer when data was originally indexed with
> > > > Field.Index.NOT_ANALYZED option.
> > > >
> > > > Interesting question is why is luke working/finding the match?
> > > > I would have expected Luke to not find any matches.
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Rob Cecil [mailto:rob.cecil@gmail.com]
> > > > Sent: Tuesday, June 26, 2012 12:54 PM
> > > > To: lucene-net-user@lucene.apache.org
> > > > Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> > > >
> > > > I can definitely try that. I just expected QueryParser would
> > > > respect the case of the source string. I was hoping to avoid using
> > > > the Query API per se, and just let the parser do the work for me.
> > > >
> > > > On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J <
> > > chandramohan.j.lingam@intel.com> wrote:
> > > >
> > > >>>> var query = _parser.Parse("Id:BAUER*");
> > > >> In your code, most likely, the value got converted to lower
> > > >> case (i.e. bauer*) by the parse statement.
> > > >> Whereas indexed value is in upper case as it is not analyzed 
> > > >> (from screen shot).
> > > >>
> > > >> Can you explicitly try using prefix query?

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Rob Cecil <ro...@gmail.com>.

And the prize goes to Simon for figuring out the quandary about why Luke
behaved differently. Indeed Luke seems to default its QP to have
SetLowercaseExpandedTerms set to false also. Check out this screenshot:

http://screencast.com/t/zb2jNT3wAM

Notice the checkbox "Lowercase expanded terms..." is unchecked.
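
For anyone finding this later, here is a minimal sketch of the same setting applied in code, assuming Lucene.Net 2.9.x, where it is the setter method Simon named. The class and method names are made up for illustration.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

public static class ExpandedTermsSketch
{
    // Hypothetical helper: parses "Id:BAUER*" with lowercasing of expanded
    // terms switched off and returns the query's string form for inspection.
    public static string ParseKeepingCase()
    {
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
        var parser = new QueryParser(
            Lucene.Net.Util.Version.LUCENE_29, "content", analyzer);

        // The default is true: the parser itself lowercases prefix, wildcard,
        // fuzzy and range terms, without passing them through the analyzer.
        parser.SetLowercaseExpandedTerms(false);

        Query query = parser.Parse("Id:BAUER*");
        return query.ToString();
    }
}
```

This is why the choice of analyzer made no difference for the prefix query: the lowercasing happens in the parser, so the upper-case prefix only reaches the NOT_ANALYZED field once the flag is turned off.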

On Wed, Jun 27, 2012 at 10:05 AM, Rob Cecil <ro...@gmail.com> wrote:

> Thanks Simon that works - even with StandardAnalyzer! :)
>
>
> On Tue, Jun 26, 2012 at 11:44 PM, Simon Svensson <si...@devhost.se> wrote:
>
>> Set queryParser.SetLowercaseExpandedTerms(false);
>>
>>
>> On 2012-06-27 03:55, Rob Cecil wrote:
>>
>>> Sure, this is self-contained:
>>>
>>> [Test]
>>>         public void QueryNonAnalyzedField()
>>>         {
>>>             var indexPath = Path.Combine(Environment.**CurrentDirectory,
>>> "testindex");
>>>             var directory = FSDirectory.Open(new
>>> DirectoryInfo(indexPath));
>>>             var analyzer = new KeywordAnalyzer();
>>>             var writer = new IndexWriter(directory, analyzer, true,
>>> IndexWriter.MaxFieldLength.**LIMITED);
>>>             var document = new Document();
>>>             document.Add(new Field("Id", "BAUERREVENUE",
>>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>>             document.Add(new Field("Id", "BAUERLOCATION",
>>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>>             document.Add(new Field("Id", "BAUERPRODUCT",
>>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>>             document.Add(new Field("Id", "BAUERPRODUCTLINE",
>>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>>             document.Add(new Field("Id", "BAUERSTATE",
>>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>>             document.Add(new Field("Id", "BAUERTOTAL",
>>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>>             document.Add(new Field("Id", "NOTBAUER", Field.Store.YES,
>>> Field.Index.NOT_ANALYZED));
>>>             writer.AddDocument(document);
>>>             writer.Optimize();
>>>             writer.Close();
>>>
>>>             IndexReader reader = IndexReader.Open(directory, true);
>>>             var queryParser = new QueryParser(Version.LUCENE_29,
>>> "content", analyzer);
>>>             var query = queryParser.Parse("Id:BAUER*")**;
>>>             var indexSearch = new IndexSearcher(reader);
>>>             var hits = indexSearch.Search(query);
>>>             Assert.AreEqual(6, hits.Length());
>>>         }
>>>
>>>
>>> On Tue, Jun 26, 2012 at 6:35 PM, Lingam, ChandraMohan J <
>>> chandramohan.j.lingam@intel.**com <ch...@intel.com>>
>>> wrote:
>>>
>>>  Just did a simple test and Keywordanalyzer does indeed work like a
>>>> prefix
>>>> query if you put a star at the end. Agree with Simon.  Most likely luke
>>>> was
>>>> using keyword analyzer and somehow UI was not reflecting it?
>>>>
>>>> Please post a small snippet of your index code and query code...
>>>>
>>>> -----Original Message-----
>>>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>>>> Sent: Tuesday, June 26, 2012 5:25 PM
>>>> To: lucene-net-user@lucene.apache.**org<lu...@lucene.apache.org>
>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>
>>>> Thanks, and there is no equivalent QueryParser syntax for that?
>>>>
>>>> On Tue, Jun 26, 2012 at 6:21 PM, Lingam, ChandraMohan J <
>>>> chandramohan.j.lingam@intel.**com <ch...@intel.com>>
>>>> wrote:
>>>>
>>>>  actually, that makes sense. Keyword analyzer would try for an exact
>>>>>
>>>> match.
>>>>
>>>>>  Since you are looking for prefix based search, your best option is to
>>>>> simply use PrefixQuery and there is no need to put a "*" for
>>>>> prefixquery.
>>>>>
>>>>> -----Original Message-----
>>>>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>>>>> Sent: Tuesday, June 26, 2012 4:57 PM
>>>>> To: lucene-net-user@lucene.apache.**org<lu...@lucene.apache.org>
>>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>>
>>>>> That is correct. I've verified in Luke 1.0.1 that both analyzers
>>>>> produce the same results.
>>>>>
>>>>> To make it interesting, back in my code, I switched over to using the
>>>>> KeywordAnalyzer, and I'm still not getting any results against that
>>>>> NOT_ANALYZED field.
>>>>>
>>>>> ?
>>>>>
>>>>> On Tue, Jun 26, 2012 at 5:52 PM, Lingam, ChandraMohan J <
>>>>> chandramohan.j.lingam@intel.**com <ch...@intel.com>>
>>>>> wrote:
>>>>>
>>>>>  Luke using keyword analyzer as default makes sense. However, in the
>>>>>> original post, there was a link to luke output screenshot which
>>>>>> showed that standard analyzer was in use for query parsing.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Simon Svensson [mailto:sisve@devhost.se]
>>>>>> Sent: Tuesday, June 26, 2012 2:56 PM
>>>>>> To: lucene-net-user@lucene.apache.**org<lu...@lucene.apache.org>
>>>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>>>
>>>>>> Luke defaults to KeywordAnalyzer which wont change your term in any way.
>>>>>> The QueryParser will still break up your query, so "Name:Jack Bauer"
>>>>>> would become (Name:Jack DefaultField:Bauer). I believe you can have
>>>>>> per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer for
>>>>>> everything else) using a PerFieldAnalyzerWrapper.
>>>>>>
>>>>>> On 2012-06-26 23:06, Lingam, ChandraMohan J wrote:
>>>>>>
>>>>>>> QueryParser has no knowledge of how data was indexed.  For your
>>>>>>> scenario, I don't believe you would be able to use Query Parser with
>>>>>>> standard analyzer when data was originally indexed with
>>>>>>> Field.Index.NOT_ANALYZED option.
>>>>>>>
>>>>>>> Interesting question is why is luke working/finding the match?  I
>>>>>>> would have expected Luke to not find any matches.
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>>>>>>> Sent: Tuesday, June 26, 2012 12:54 PM
>>>>>>> To: lucene-net-user@lucene.apache.org
>>>>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>>>>
>>>>>>> I can definitely try that. I just expected QueryParser would
>>>>>>> respect the case of the source string. I was hoping to avoid using
>>>>>>> the Query API per-se, and just let the parser to the work for me.
>>>>>>>
>>>>>>> On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J <
>>>>>>> chandramohan.j.lingam@intel.com> wrote:
>>>>>>>
>>>>>>>>>> var query = _parser.Parse("Id:BAUER*");
>>>>>>>> In your code, most likely, the value got converted to lower case
>>>>>>>> (i.e. bauer*) by the parse statement.
>>>>>>>> Whereas indexed value is in upper case as it is not analyzed
>>>>>>>> (from screen shot).
>>>>>>>>
>>>>>>>> Can you explicitly try using prefix query?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>  Same results, apparently, when I use Luke 1.0.1.
>>>>>>>>>
>>>>>>>>> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my
>>>>>>>>> custom app, zero.
>>>>>>>>>
>>>>>>>>> On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse
>>>>>>>>> <rv...@dotnetrdf.org>
>>>>>>>>>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>> You appear to be using Luke 3.5 which per the information on
>>>>>>>>>> the Luke homepage (http://code.google.com/p/luke/) uses Lucene
>>>>>>>>>> 3.5
>>>>>>>>>>
>>>>>>>>>> Since Lucene.Net is currently on 2.9.4 I wouldn't be surprised
>>>>>>>>>> to see different behavior between the API and executing in Luke.
>>>>>>>>>>
>>>>>>>>>> If you use a version of Luke which more closely aligns with the
>>>>>>>>>> version of Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1 which should be
>>>>>>>>>> close enough since the 2.9.x releases were previews of the 3.0.x
>>>>>>>>>> releases as I understood it) what behavior do you see?
>>>>>>>>>>
>>>>>>>>>> Hope this helps,
>>>>>>>>>>
>>>>>>>>>> Rob
>>>>>>>>>>
>>>>>>>>>> On 6/26/12 10:50 AM, "Rob Cecil" <ro...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> If I run a query against my index using QueryParser to query a
>>>>>>>>>>> field:
>>>>>>>>>>>
>>>>>>>>>>>                 var query = _parser.Parse("Id:BAUER*");
>>>>>>>>>>>                 var topDocs = searcher.Search(query, 10);
>>>>>>>>>>>                 Assert.AreEqual(count, topDocs.TotalHits);
>>>>>>>>>>>
>>>>>>>>>>> I get 0 for my TotalHits, yet in Luke, the same query phrase
>>>>>>>>>>> yields
>>>>>>>>>>> 15 results, what am I doing wrong? I use the StandardAnalyzer
>>>>>>>>>>> both to create the index and to query.
>>>>>>>>>>>
>>>>>>>>>>> The field is defined as:
>>>>>>>>>>>
>>>>>>>>>>> new Field("Id", myObject.Id, Field.Store.YES,
>>>>>>>>>>> Field.Index.NOT_ANALYZED)
>>>>>>>>>>>
>>>>>>>>>>> and is a string field. The result set back from Luke looks
>>>>>>>>>>> like
>>>>>>>>>>> (screencap):
>>>>>>>>>>>
>>>>>>>>>>> http://screencast.com/t/NooMK2Rf
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>
>>>>>>
>>
>>
>

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Rob Cecil <ro...@gmail.com>.

Thanks Simon that works - even with StandardAnalyzer! :)

On Tue, Jun 26, 2012 at 11:44 PM, Simon Svensson <si...@devhost.se> wrote:

> Set queryParser.SetLowercaseExpandedTerms(false);
>
>
> On 2012-06-27 03:55, Rob Cecil wrote:
>
>> Sure, this is self-contained:
>>
>> [Test]
>>         public void QueryNonAnalyzedField()
>>         {
>>             var indexPath = Path.Combine(Environment.CurrentDirectory,
>> "testindex");
>>             var directory = FSDirectory.Open(new
>> DirectoryInfo(indexPath));
>>             var analyzer = new KeywordAnalyzer();
>>             var writer = new IndexWriter(directory, analyzer, true,
>> IndexWriter.MaxFieldLength.LIMITED);
>>             var document = new Document();
>>             document.Add(new Field("Id", "BAUERREVENUE",
>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>             document.Add(new Field("Id", "BAUERLOCATION",
>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>             document.Add(new Field("Id", "BAUERPRODUCT",
>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>             document.Add(new Field("Id", "BAUERPRODUCTLINE",
>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>             document.Add(new Field("Id", "BAUERSTATE",
>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>             document.Add(new Field("Id", "BAUERTOTAL",
>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>             document.Add(new Field("Id", "NOTBAUER", Field.Store.YES,
>> Field.Index.NOT_ANALYZED));
>>             writer.AddDocument(document);
>>             writer.Optimize();
>>             writer.Close();
>>
>>             IndexReader reader = IndexReader.Open(directory, true);
>>             var queryParser = new QueryParser(Version.LUCENE_29,
>> "content", analyzer);
>>             var query = queryParser.Parse("Id:BAUER*");
>>             var indexSearch = new IndexSearcher(reader);
>>             var hits = indexSearch.Search(query);
>>             Assert.AreEqual(6, hits.Length());
>>         }
>>
>>
>> On Tue, Jun 26, 2012 at 6:35 PM, Lingam, ChandraMohan J <
>> chandramohan.j.lingam@intel.com> wrote:
>>
>>> Just did a simple test and Keywordanalyzer does indeed work like a prefix
>>> query if you put a star at the end. Agree with Simon.  Most likely luke
>>> was using keyword analyzer and somehow UI was not reflecting it?
>>>
>>> Please post a small snippet of your index code and query code...
>>>
>>> -----Original Message-----
>>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>>> Sent: Tuesday, June 26, 2012 5:25 PM
>>> To: lucene-net-user@lucene.apache.org
>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>
>>> Thanks, and there is no equivalent QueryParser syntax for that?
>>>
>>> On Tue, Jun 26, 2012 at 6:21 PM, Lingam, ChandraMohan J <
>>> chandramohan.j.lingam@intel.com>
>>> wrote:
>>>
>>>> actually, that makes sense. Keyword analyzer would try for an exact
>>>> match. Since you are looking for prefix based search, your best option
>>>> is to simply use PrefixQuery and there is no need to put a "*" for
>>>> prefixquery.
>>>>
>>>> -----Original Message-----
>>>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>>>> Sent: Tuesday, June 26, 2012 4:57 PM
>>>> To: lucene-net-user@lucene.apache.org
>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>
>>>> That is correct. I've verified in Luke 1.0.1 that both analyzers
>>>> produce the same results.
>>>>
>>>> To make it interesting, back in my code, I switched over to using the
>>>> KeywordAnalyzer, and I'm still not getting any results against that
>>>> NOT_ANALYZED field.
>>>>
>>>> ?
>>>>
>>>> On Tue, Jun 26, 2012 at 5:52 PM, Lingam, ChandraMohan J <
>>>> chandramohan.j.lingam@intel.com>
>>>> wrote:
>>>>
>>>>> Luke using keyword analyzer as default makes sense. However, in the
>>>>> original post, there was a link to luke output screenshot which
>>>>> showed that standard analyzer was in use for query parsing.
>>>>>
>>>>> -----Original Message-----
>>>>> From: Simon Svensson [mailto:sisve@devhost.se]
>>>>> Sent: Tuesday, June 26, 2012 2:56 PM
>>>>> To: lucene-net-user@lucene.apache.org
>>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>>
>>>>> Luke defaults to KeywordAnalyzer which wont change your term in any way.
>>>>> The QueryParser will still break up your query, so "Name:Jack Bauer"
>>>>> would become (Name:Jack DefaultField:Bauer). I believe you can have
>>>>> per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer for
>>>>> everything else) using a PerFieldAnalyzerWrapper.
>>>>>
>>>>> On 2012-06-26 23:06, Lingam, ChandraMohan J wrote:
>>>>>
>>>>>> QueryParser has no knowledge of how data was indexed.  For your
>>>>>> scenario, I don't believe you would be able to use Query Parser with
>>>>>> standard analyzer when data was originally indexed with
>>>>>> Field.Index.NOT_ANALYZED option.
>>>>>>
>>>>>> Interesting question is why is luke working/finding the match?  I
>>>>>> would have expected Luke to not find any matches.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>>>>>> Sent: Tuesday, June 26, 2012 12:54 PM
>>>>>> To: lucene-net-user@lucene.apache.org
>>>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>>>
>>>>>> I can definitely try that. I just expected QueryParser would
>>>>>> respect the case of the source string. I was hoping to avoid using
>>>>>> the Query API per-se, and just let the parser to the work for me.
>>>>>>
>>>>>> On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J <
>>>>>> chandramohan.j.lingam@intel.com> wrote:
>>>>>>
>>>>>>>>> var query = _parser.Parse("Id:BAUER*");
>>>>>>> In your code, most likely, the value got converted to lower case
>>>>>>> (i.e. bauer*) by the parse statement.
>>>>>>> Whereas indexed value is in upper case as it is not analyzed
>>>>>>> (from screen shot).
>>>>>>>
>>>>>>> Can you explicitly try using prefix query?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  Same results, apparently, when I use Luke 1.0.1.
>>>>>>>>
>>>>>>>> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my
>>>>>>>> custom app, zero.
>>>>>>>>
>>>>>>>> On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse
>>>>>>>> <rv...@dotnetrdf.org>
>>>>>>>>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>> You appear to be using Luke 3.5 which per the information on
>>>>>>>>> the Luke homepage (http://code.google.com/p/luke/) uses Lucene
>>>>>>>>> 3.5
>>>>>>>>>
>>>>>>>>> Since Lucene.Net is currently on 2.9.4 I wouldn't be surprised
>>>>>>>>> to see different behavior between the API and executing in Luke.
>>>>>>>>>
>>>>>>>>> If you use a version of Luke which more closely aligns with the
>>>>>>>>> version of Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1 which should be
>>>>>>>>> close enough since the 2.9.x releases were previews of the 3.0.x
>>>>>>>>> releases as I understood it) what behavior do you see?
>>>>>>>>>
>>>>>>>>> Hope this helps,
>>>>>>>>>
>>>>>>>>> Rob
>>>>>>>>>
>>>>>>>>> On 6/26/12 10:50 AM, "Rob Cecil" <ro...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> If I run a query against my index using QueryParser to query a
>>>>>>>>>> field:
>>>>>>>>>>
>>>>>>>>>>                 var query = _parser.Parse("Id:BAUER*");
>>>>>>>>>>                 var topDocs = searcher.Search(query, 10);
>>>>>>>>>>                 Assert.AreEqual(count, topDocs.TotalHits);
>>>>>>>>>>
>>>>>>>>>> I get 0 for my TotalHits, yet in Luke, the same query phrase
>>>>>>>>>> yields
>>>>>>>>>> 15 results, what am I doing wrong? I use the StandardAnalyzer
>>>>>>>>>> both to create the index and to query.
>>>>>>>>>>
>>>>>>>>>> The field is defined as:
>>>>>>>>>>
>>>>>>>>>> new Field("Id", myObject.Id, Field.Store.YES,
>>>>>>>>>> Field.Index.NOT_ANALYZED)
>>>>>>>>>>
>>>>>>>>>> and is a string field. The result set back from Luke looks
>>>>>>>>>> like
>>>>>>>>>> (screencap):
>>>>>>>>>>
>>>>>>>>>> http://screencast.com/t/NooMK2Rf
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>>
>
>

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Simon Svensson <si...@devhost.se>.

Set queryParser.SetLowercaseExpandedTerms(false);
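
For anyone following along, here is a minimal sketch of where that call might go, assuming the Lucene.Net 2.9.x method-style API used elsewhere in this thread. The names IdQueries and ParseCaseSensitive are made up for illustration; the default field "content" and the KeywordAnalyzer are borrowed from the test quoted below.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Version = Lucene.Net.Util.Version;

// Hypothetical helper, not part of Lucene.Net.
public static class IdQueries
{
    // Parses query text such as "Id:BAUER*" without lowercasing the prefix.
    public static Query ParseCaseSensitive(string queryText)
    {
        var parser = new QueryParser(Version.LUCENE_29, "content", new KeywordAnalyzer());

        // By default the parser lowercases the terms of prefix, wildcard,
        // fuzzy and range queries, so "BAUER*" would be searched as "bauer*".
        parser.SetLowercaseExpandedTerms(false);

        return parser.Parse(queryText);
    }
}
```

Prefix terms are not run through the analyzer, only through this lowercasing step, which would explain why turning the flag off also helps when a StandardAnalyzer is passed to the parser.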

On 2012-06-27 03:55, Rob Cecil wrote:
> Sure, this is self-contained:
>
> [Test]
>          public void QueryNonAnalyzedField()
>          {
>              var indexPath = Path.Combine(Environment.CurrentDirectory,
> "testindex");
>              var directory = FSDirectory.Open(new DirectoryInfo(indexPath));
>              var analyzer = new KeywordAnalyzer();
>              var writer = new IndexWriter(directory, analyzer, true,
> IndexWriter.MaxFieldLength.LIMITED);
>              var document = new Document();
>              document.Add(new Field("Id", "BAUERREVENUE",
> Field.Store.YES, Field.Index.NOT_ANALYZED));
>              document.Add(new Field("Id", "BAUERLOCATION",
> Field.Store.YES, Field.Index.NOT_ANALYZED));
>              document.Add(new Field("Id", "BAUERPRODUCT",
> Field.Store.YES, Field.Index.NOT_ANALYZED));
>              document.Add(new Field("Id", "BAUERPRODUCTLINE",
> Field.Store.YES, Field.Index.NOT_ANALYZED));
>              document.Add(new Field("Id", "BAUERSTATE",
> Field.Store.YES, Field.Index.NOT_ANALYZED));
>              document.Add(new Field("Id", "BAUERTOTAL",
> Field.Store.YES, Field.Index.NOT_ANALYZED));
>              document.Add(new Field("Id", "NOTBAUER", Field.Store.YES,
> Field.Index.NOT_ANALYZED));
>              writer.AddDocument(document);
>              writer.Optimize();
>              writer.Close();
>
>              IndexReader reader = IndexReader.Open(directory, true);
>              var queryParser = new QueryParser(Version.LUCENE_29,
> "content", analyzer);
>              var query = queryParser.Parse("Id:BAUER*");
>              var indexSearch = new IndexSearcher(reader);
>              var hits = indexSearch.Search(query);
>              Assert.AreEqual(6, hits.Length());
>          }
>
>
> On Tue, Jun 26, 2012 at 6:35 PM, Lingam, ChandraMohan J <
> chandramohan.j.lingam@intel.com> wrote:
>
>> Just did a simple test and Keywordanalyzer does indeed work like a prefix
>> query if you put a star at the end. Agree with Simon.  Most likely luke was
>> using keyword analyzer and somehow UI was not reflecting it?
>>
>> Please post a small snippet of your index code and query code...
>>
>> -----Original Message-----
>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>> Sent: Tuesday, June 26, 2012 5:25 PM
>> To: lucene-net-user@lucene.apache.org
>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>
>> Thanks, and there is no equivalent QueryParser syntax for that?
>>
>> On Tue, Jun 26, 2012 at 6:21 PM, Lingam, ChandraMohan J <
>> chandramohan.j.lingam@intel.com> wrote:
>>
>>> actually, that makes sense. Keyword analyzer would try for an exact
>> match.
>>>   Since you are looking for prefix based search, your best option is to
>>> simply use PrefixQuery and there is no need to put a "*" for prefixquery.
>>>
>>> -----Original Message-----
>>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>>> Sent: Tuesday, June 26, 2012 4:57 PM
>>> To: lucene-net-user@lucene.apache.org
>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>
>>> That is correct. I've verified in Luke 1.0.1 that both analyzers
>>> produce the same results.
>>>
>>> To make it interesting, back in my code, I switched over to using the
>>> KeywordAnalyzer, and I'm still not getting any results against that
>>> NOT_ANALYZED field.
>>>
>>> ?
>>>
>>> On Tue, Jun 26, 2012 at 5:52 PM, Lingam, ChandraMohan J <
>>> chandramohan.j.lingam@intel.com> wrote:
>>>
>>>> Luke using keyword analyzer as default makes sense. However, in the
>>>> original post, there was a link to luke output screenshot which
>>>> showed that standard analyzer was in use for query parsing.
>>>>
>>>> -----Original Message-----
>>>> From: Simon Svensson [mailto:sisve@devhost.se]
>>>> Sent: Tuesday, June 26, 2012 2:56 PM
>>>> To: lucene-net-user@lucene.apache.org
>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>
>>>> Luke defaults to KeywordAnalyzer which wont change your term in any
>> way.
>>>> The QueryParser will still break up your query, so "Name:Jack Bauer"
>>>> would become (Name:Jack DefaultField:Bauer). I believe you can have
>>>> per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer for
>>>> everything else) using a PerFieldAnalyzerWrapper.
>>>>
>>>> On 2012-06-26 23:06, Lingam, ChandraMohan J wrote:
>>>>> QueryParser has no knowledge of how data was indexed.  For your
>>>> scenario, I don't believe you would be able to use Query Parser with
>>>> standard analyzer when data was originally indexed with
>>>> Field.Index.NOT_ANALYZED option.
>>>>> Interesting question is why is luke working/finding the match?  I
>>>>> would
>>>> have expected Luke to not find any matches.
>>>>>
>>>>> -----Original Message-----
>>>>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>>>>> Sent: Tuesday, June 26, 2012 12:54 PM
>>>>> To: lucene-net-user@lucene.apache.org
>>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>>
>>>>> I can definitely try that. I just expected QueryParser would
>>>>> respect the
>>>> case of the source string. I was hoping to avoid using the Query API
>>>> per-se, and just let the parser to the work for me.
>>>>> On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J <
>>>> chandramohan.j.lingam@intel.com> wrote:
>>>>>>>> var query = _parser.Parse("Id:BAUER*");
>>>>>> In your code, most likely, the value got converted to lower case
>> (i.e.
>>>>>> bauer*) by the parse statement.
>>>>>> Whereas indexed value is in upper case as it is not analyzed
>>>>>> (from screen shot).
>>>>>>
>>>>>> Can you explicitly try using prefix query?
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Same results, apparently, when I use Luke 1.0.1.
>>>>>>>
>>>>>>> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my
>>>>>>> custom app, zero.
>>>>>>>
>>>>>>> On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse
>>>>>>> <rv...@dotnetrdf.org>
>>>>>> wrote:
>>>>>>>> You appear to be using Luke 3.5 which per the information on
>>>>>>>> the Luke homepage (http://code.google.com/p/luke/) uses Lucene
>>>>>>>> 3.5
>>>>>>>>
>>>>>>>> Since Lucene.Net is currently on 2.9.4 I wouldn't be surprised
>>>>>>>> to see different behavior between the API and executing in Luke.
>>>>>>>>
>>>>>>>> If you use a version of Luke which more closely aligns with the
>>>>>>>> version
>>>>>>> of
>>>>>>>> Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1 which should be close
>>>>>>>> enough since the 2.9.x releases were previews of the 3.0.x
>>>>>>>> releases as I understood it) what behavior do you see?
>>>>>>>>
>>>>>>>> Hope this helps,
>>>>>>>>
>>>>>>>> Rob
>>>>>>>>
>>>>>>>> On 6/26/12 10:50 AM, "Rob Cecil" <ro...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> If I run a query against my index using QueryParser to query a
>>> field:
>>>>>>>>>                  var query = _parser.Parse("Id:BAUER*");
>>>>>>>>>                  var topDocs = searcher.Search(query, 10);
>>>>>>>>>                  Assert.AreEqual(count, topDocs.TotalHits);
>>>>>>>>>
>>>>>>>>> I get 0 for my TotalHits, yet in Luke, the same query phrase
>>>>>>>>> yields
>>>>>>>>> 15 results, what am I doing wrong? I use the StandardAnalyzer
>>>>>>>>> both to create the index and to query.
>>>>>>>>>
>>>>>>>>> The field is defined as:
>>>>>>>>>
>>>>>>>>> new Field("Id", myObject.Id, Field.Store.YES,
>>>>>>>>> Field.Index.NOT_ANALYZED)
>>>>>>>>>
>>>>>>>>> and is a string field. The result set back from Luke looks
>>>>>>>>> like
>>>>>>>>> (screencap):
>>>>>>>>>
>>>>>>>>> http://screencast.com/t/NooMK2Rf
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>
>>>>

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Rob Cecil <ro...@gmail.com>.

Sure, this is self-contained:

[Test]
        public void QueryNonAnalyzedField()
        {
            var indexPath = Path.Combine(Environment.CurrentDirectory,
"testindex");
            var directory = FSDirectory.Open(new DirectoryInfo(indexPath));
            var analyzer = new KeywordAnalyzer();
            var writer = new IndexWriter(directory, analyzer, true,
IndexWriter.MaxFieldLength.LIMITED);
            var document = new Document();
            document.Add(new Field("Id", "BAUERREVENUE",
Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERLOCATION",
Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERPRODUCT",
Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERPRODUCTLINE",
Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERSTATE",
Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERTOTAL",
Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "NOTBAUER", Field.Store.YES,
Field.Index.NOT_ANALYZED));
            writer.AddDocument(document);
            writer.Optimize();
            writer.Close();

            IndexReader reader = IndexReader.Open(directory, true);
            var queryParser = new QueryParser(Version.LUCENE_29,
"content", analyzer);
            var query = queryParser.Parse("Id:BAUER*");
            var indexSearch = new IndexSearcher(reader);
            var hits = indexSearch.Search(query);
            Assert.AreEqual(6, hits.Length());
        }


On Tue, Jun 26, 2012 at 6:35 PM, Lingam, ChandraMohan J <
chandramohan.j.lingam@intel.com> wrote:

> Just did a simple test and Keywordanalyzer does indeed work like a prefix
> query if you put a star at the end. Agree with Simon.  Most likely luke was
> using keyword analyzer and somehow UI was not reflecting it?
>
> Please post a small snippet of your index code and query code...
>
> -----Original Message-----
> From: Rob Cecil [mailto:rob.cecil@gmail.com]
> Sent: Tuesday, June 26, 2012 5:25 PM
> To: lucene-net-user@lucene.apache.org
> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>
> Thanks, and there is no equivalent QueryParser syntax for that?
>
> On Tue, Jun 26, 2012 at 6:21 PM, Lingam, ChandraMohan J <
> chandramohan.j.lingam@intel.com> wrote:
>
> > actually, that makes sense. Keyword analyzer would try for an exact
> match.
> >  Since you are looking for prefix based search, your best option is to
> > simply use PrefixQuery and there is no need to put a "*" for prefixquery.
> >
> > -----Original Message-----
> > From: Rob Cecil [mailto:rob.cecil@gmail.com]
> > Sent: Tuesday, June 26, 2012 4:57 PM
> > To: lucene-net-user@lucene.apache.org
> > Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> >
> > That is correct. I've verified in Luke 1.0.1 that both analyzers
> > produce the same results.
> >
> > To make it interesting, back in my code, I switched over to using the
> > KeywordAnalyzer, and I'm still not getting any results against that
> > NOT_ANALYZED field.
> >
> > ?
> >
> > On Tue, Jun 26, 2012 at 5:52 PM, Lingam, ChandraMohan J <
> > chandramohan.j.lingam@intel.com> wrote:
> >
> > > Luke using keyword analyzer as default makes sense. However, in the
> > > original post, there was a link to luke output screenshot which
> > > showed that standard analyzer was in use for query parsing.
> > >
> > > -----Original Message-----
> > > From: Simon Svensson [mailto:sisve@devhost.se]
> > > Sent: Tuesday, June 26, 2012 2:56 PM
> > > To: lucene-net-user@lucene.apache.org
> > > Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> > >
> > > Luke defaults to KeywordAnalyzer which wont change your term in any
> way.
> > > The QueryParser will still break up your query, so "Name:Jack Bauer"
> > > would become (Name:Jack DefaultField:Bauer). I believe you can have
> > > per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer for
> > > everything else) using a PerFieldAnalyzerWrapper.
> > >
> > > On 2012-06-26 23:06, Lingam, ChandraMohan J wrote:
> > > > QueryParser has no knowledge of how data was indexed.  For your
> > > scenario, I don't believe you would be able to use Query Parser with
> > > standard analyzer when data was originally indexed with
> > > Field.Index.NOT_ANALYZED option.
> > > >
> > > > Interesting question is why is luke working/finding the match?  I
> > > > would
> > > have expected Luke to not find any matches.
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Rob Cecil [mailto:rob.cecil@gmail.com]
> > > > Sent: Tuesday, June 26, 2012 12:54 PM
> > > > To: lucene-net-user@lucene.apache.org
> > > > Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> > > >
> > > > I can definitely try that. I just expected QueryParser would
> > > > respect the
> > > case of the source string. I was hoping to avoid using the Query API
> > > per-se, and just let the parser to the work for me.
> > > >
> > > > On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J <
> > > chandramohan.j.lingam@intel.com> wrote:
> > > >
> > > >>>> var query = _parser.Parse("Id:BAUER*");
> > > >> In your code, most likely, the value got converted to lower case
> (i.e.
> > > >> bauer*) by the parse statement.
> > > >> Whereas indexed value is in upper case as it is not analyzed
> > > >> (from screen shot).
> > > >>
> > > >> Can you explicitly try using prefix query?
> > > >>
> > > >>
> > > >>
> > > >>> Same results, apparently, when I use Luke 1.0.1.
> > > >>>
> > > >>> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my
> > > >>> custom app, zero.
> > > >>>
> > > >>> On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse
> > > >>> <rv...@dotnetrdf.org>
> > > >> wrote:
> > > >>>> You appear to be using Luke 3.5 which per the information on
> > > >>>> the Luke homepage (http://code.google.com/p/luke/) uses Lucene
> > > >>>> 3.5
> > > >>>>
> > > >>>> Since Lucene.Net is currently on 2.9.4 I wouldn't be surprised
> > > >>>> to see different behavior between the API and executing in Luke.
> > > >>>>
> > > >>>> If you use a version of Luke which more closely aligns with the
> > > >>>> version
> > > >>> of
> > > >>>> Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1 which should be close
> > > >>>> enough since the 2.9.x releases were previews of the 3.0.x
> > > >>>> releases as I understood it) what behavior do you see?
> > > >>>>
> > > >>>> Hope this helps,
> > > >>>>
> > > >>>> Rob
> > > >>>>
> > > >>>> On 6/26/12 10:50 AM, "Rob Cecil" <ro...@gmail.com> wrote:
> > > >>>>
> > > >>>>> If I run a query against my index using QueryParser to query a
> > field:
> > > >>>>>
> > > >>>>>                 var query = _parser.Parse("Id:BAUER*");
> > > >>>>>                 var topDocs = searcher.Search(query, 10);
> > > >>>>>                 Assert.AreEqual(count, topDocs.TotalHits);
> > > >>>>>
> > > >>>>> I get 0 for my TotalHits, yet in Luke, the same query phrase
> > > >>>>> yields
> > > >>>>> 15 results, what am I doing wrong? I use the StandardAnalyzer
> > > >>>>> both to create the index and to query.
> > > >>>>>
> > > >>>>> The field is defined as:
> > > >>>>>
> > > >>>>> new Field("Id", myObject.Id, Field.Store.YES,
> > > >>>>> Field.Index.NOT_ANALYZED)
> > > >>>>>
> > > >>>>> and is a string field. The result set back from Luke looks
> > > >>>>> like
> > > >>>>> (screencap):
> > > >>>>>
> > > >>>>> http://screencast.com/t/NooMK2Rf
> > > >>>>>
> > > >>>>> Thanks!
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > >
> > >
> > >
> >
>

RE: SPAM-HIGH: Disparity between API usage and Luke

Posted by "Lingam, ChandraMohan J" <ch...@intel.com>.

Just did a simple test and KeywordAnalyzer does indeed work like a prefix query if you put a star at the end. Agree with Simon.  Most likely Luke was using the keyword analyzer and somehow the UI was not reflecting it?

Please post a small snippet of your index code and query code...

-----Original Message-----
From: Rob Cecil [mailto:rob.cecil@gmail.com] 
Sent: Tuesday, June 26, 2012 5:25 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: SPAM-HIGH: Disparity between API usage and Luke

Thanks, and there is no equivalent QueryParser syntax for that?

On Tue, Jun 26, 2012 at 6:21 PM, Lingam, ChandraMohan J < chandramohan.j.lingam@intel.com> wrote:

> actually, that makes sense. Keyword analyzer would try for an exact match.
>  Since you are looking for prefix based search, your best option is to 
> simply use PrefixQuery and there is no need to put a "*" for prefixquery.
>
> -----Original Message-----
> From: Rob Cecil [mailto:rob.cecil@gmail.com]
> Sent: Tuesday, June 26, 2012 4:57 PM
> To: lucene-net-user@lucene.apache.org
> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>
> That is correct. I've verified in Luke 1.0.1 that both analyzers 
> produce the same results.
>
> To make it interesting, back in my code, I switched over to using the 
> KeywordAnalyzer, and I'm still not getting any results against that 
> NOT_ANALYZED field.
>
> ?
>
> On Tue, Jun 26, 2012 at 5:52 PM, Lingam, ChandraMohan J < 
> chandramohan.j.lingam@intel.com> wrote:
>
> > Luke using keyword analyzer as default makes sense. However, in the 
> > original post, there was a link to luke output screenshot which 
> > showed that standard analyzer was in use for query parsing.
> >
> > -----Original Message-----
> > From: Simon Svensson [mailto:sisve@devhost.se]
> > Sent: Tuesday, June 26, 2012 2:56 PM
> > To: lucene-net-user@lucene.apache.org
> > Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> >
> > Luke defaults to KeywordAnalyzer which wont change your term in any way.
> > The QueryParser will still break up your query, so "Name:Jack Bauer"
> > would become (Name:Jack DefaultField:Bauer). I believe you can have 
> > per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer for 
> > everything else) using a PerFieldAnalyzerWrapper.
> >
>

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Rob Cecil <ro...@gmail.com>.

Thanks. Is there no equivalent QueryParser syntax for that?

On Tue, Jun 26, 2012 at 6:21 PM, Lingam, ChandraMohan J <
chandramohan.j.lingam@intel.com> wrote:

> Actually, that makes sense. KeywordAnalyzer would try for an exact match.
> Since you are looking for a prefix-based search, your best option is to
> simply use PrefixQuery; there is no need to append a "*" to the prefix.

RE: SPAM-HIGH: Disparity between API usage and Luke

Posted by "Lingam, ChandraMohan J" <ch...@intel.com>.

Actually, that makes sense. KeywordAnalyzer would try for an exact match. Since you are looking for a prefix-based search, your best option is to simply use PrefixQuery; there is no need to append a "*" to the prefix.
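
A minimal sketch of that suggestion, assuming the Lucene.Net 2.9.x API and an already-open IndexSearcher. The class and method names (IdPrefixSearch, CountByIdPrefix) are made up for illustration and do not come from the original code:

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

static class IdPrefixSearch
{
    // Hypothetical helper: counts documents whose un-analyzed "Id" field
    // starts with the given prefix.
    public static int CountByIdPrefix(IndexSearcher searcher, string prefix)
    {
        // No analyzer or parser is involved here, so the prefix is compared
        // against the indexed terms exactly as given. Pass "BAUER", not
        // "bauer", and do not append "*".
        Query query = new PrefixQuery(new Term("Id", prefix));

        // Same Search overload as in the original post; the 10 only limits
        // how many hits are returned, TotalHits still reports every match.
        TopDocs topDocs = searcher.Search(query, 10);
        return topDocs.TotalHits;
    }
}
```

With the index from the original post, CountByIdPrefix(searcher, "BAUER") would be the programmatic counterpart of the "Id:BAUER*" query string.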

-----Original Message-----
From: Rob Cecil [mailto:rob.cecil@gmail.com] 
Sent: Tuesday, June 26, 2012 4:57 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: SPAM-HIGH: Disparity between API usage and Luke

That is correct. I've verified in Luke 1.0.1 that both analyzers produce the same results.

To make it interesting, back in my code, I switched over to using the KeywordAnalyzer, and I'm still not getting any results against that NOT_ANALYZED field.

?


Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Rob Cecil <ro...@gmail.com>.

That is correct. I've verified in Luke 1.0.1 that both analyzers produce
the same results.

To make it interesting, back in my code, I switched over to using the
KeywordAnalyzer, and I'm still not getting any results against that
NOT_ANALYZED field.

?

On Tue, Jun 26, 2012 at 5:52 PM, Lingam, ChandraMohan J <
chandramohan.j.lingam@intel.com> wrote:

> Luke using KeywordAnalyzer as its default makes sense. However, the
> original post linked to a screenshot of Luke's output, which showed that
> StandardAnalyzer was in use for query parsing.

RE: SPAM-HIGH: Disparity between API usage and Luke

Posted by "Lingam, ChandraMohan J" <ch...@intel.com>.

Luke using KeywordAnalyzer as its default makes sense. However, the original post linked to a screenshot of Luke's output, which showed that StandardAnalyzer was in use for query parsing.

-----Original Message-----
From: Simon Svensson [mailto:sisve@devhost.se] 
Sent: Tuesday, June 26, 2012 2:56 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: SPAM-HIGH: Disparity between API usage and Luke

Luke defaults to KeywordAnalyzer, which won't change your term in any way.
The QueryParser will still break up your query, so "Name:Jack Bauer"
would become (Name:Jack DefaultField:Bauer). I believe you can have
per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer for
everything else) using a PerFieldAnalyzerWrapper.


Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Rob Cecil <ro...@gmail.com>.

So if you want to search a non-analyzed (non-tokenized) field, you should
not use StandardAnalyzer, but something like KeywordAnalyzer?

On Tue, Jun 26, 2012 at 3:56 PM, Simon Svensson <si...@devhost.se> wrote:

> Luke defaults to KeywordAnalyzer, which won't change your term in any way.
> The QueryParser will still break up your query, so "Name:Jack Bauer" would
> become (Name:Jack DefaultField:Bauer). I believe you can have per-field
> analyzers (KeywordAnalyzer for Id, StandardAnalyzer for everything else)
> using a PerFieldAnalyzerWrapper.

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Simon Svensson <si...@devhost.se>.

Luke defaults to KeywordAnalyzer, which won't change your term in any way.
The QueryParser will still break up your query, so "Name:Jack Bauer"
would become (Name:Jack DefaultField:Bauer). I believe you can have
per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer for
everything else) using a PerFieldAnalyzerWrapper.
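
A rough sketch of the per-field setup, assuming the Lucene.Net 2.9.x API. The factory class name and the "Name" default field are illustrative assumptions and are not taken from the original poster's code:

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Version = Lucene.Net.Util.Version;

static class ParserFactory
{
    // Hypothetical factory: builds a QueryParser that analyzes "Id" terms
    // with KeywordAnalyzer and every other field with StandardAnalyzer.
    public static QueryParser Create()
    {
        // StandardAnalyzer is the fallback for any field not listed below.
        var analyzer = new PerFieldAnalyzerWrapper(
            new StandardAnalyzer(Version.LUCENE_29));

        // Terms for the un-analyzed "Id" field pass through unchanged.
        analyzer.AddAnalyzer("Id", new KeywordAnalyzer());

        // "Name" as the default field is an assumption for this sketch.
        return new QueryParser(Version.LUCENE_29, "Name", analyzer);
    }
}
```

This should help with exact term queries on Id. Whether it is enough for prefix queries such as "Id:BAUER*" depends on how the parser treats expanded terms, so it is worth checking the parsed query with ToString().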

On 2012-06-26 23:06, Lingam, ChandraMohan J wrote:
> QueryParser has no knowledge of how the data was indexed. For your scenario, I don't believe you can use QueryParser with StandardAnalyzer when the data was originally indexed with the Field.Index.NOT_ANALYZED option.
>
> The interesting question is why Luke is finding the match. I would have expected Luke not to find any matches.

RE: SPAM-HIGH: Disparity between API usage and Luke

Posted by "Lingam, ChandraMohan J" <ch...@intel.com>.

QueryParser has no knowledge of how the data was indexed. For your scenario, I don't believe you can use QueryParser with StandardAnalyzer when the data was originally indexed with the Field.Index.NOT_ANALYZED option.

The interesting question is why Luke is finding the match. I would have expected Luke not to find any matches.


-----Original Message-----
From: Rob Cecil [mailto:rob.cecil@gmail.com] 
Sent: Tuesday, June 26, 2012 12:54 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: SPAM-HIGH: Disparity between API usage and Luke

I can definitely try that. I just expected QueryParser would respect the case of the source string. I was hoping to avoid using the Query API per se, and just let the parser do the work for me.


Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Rob Cecil <ro...@gmail.com>.

I can definitely try that. I just expected QueryParser would respect the
case of the source string. I was hoping to avoid using the Query API
per se, and just let the parser do the work for me.

On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J <
chandramohan.j.lingam@intel.com> wrote:

> >> var query = _parser.Parse("Id:BAUER*");
>
> In your code, the value most likely got converted to lower case (i.e.
> bauer*) by the Parse call, whereas the indexed value is in upper case
> because the field is not analyzed (per the screenshot).
>
> Can you explicitly try using a PrefixQuery?
>
>
>
> > Same results, apparently, when I use Luke 1.0.1.
> >
> > When I search for "Id:BAUER*" I get 15 hits in Luke, but in my custom
> > app, zero.
> >
> > On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse <rv...@dotnetrdf.org>
> wrote:
> >
> > > You appear to be using Luke 3.5 which per the information on the
> > > Luke homepage (http://code.google.com/p/luke/) uses Lucene 3.5
> > >
> > > Since Lucene.Net is currently on 2.9.4 I wouldn't be surprised to
> > > see different behavior between the API and executing in Luke.
> > >
> > > If you use a version of Luke which more closely aligns with the
> > > version
> > of
> > > Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1 which should be close
> > > enough since the 2.9.x releases were previews of the 3.0.x releases
> > > as I understood it) what behavior do you see?
> > >
> > > Hope this helps,
> > >
> > > Rob
> > >
> > > On 6/26/12 10:50 AM, "Rob Cecil" <ro...@gmail.com> wrote:
> > >
> > > >If I run a query against my index using QueryParser to query a field:
> > > >
> > > >                var query = _parser.Parse("Id:BAUER*");
> > > >                var topDocs = searcher.Search(query, 10);
> > > >                Assert.AreEqual(count, topDocs.TotalHits);
> > > >
> > > >I get 0 for my TotalHits, yet in Luke, the same query phrase yields
> > > >15 results, what am I doing wrong? I use the StandardAnalyzer both
> > > >to create the index and to query.
> > > >
> > > >The field is defined as:
> > > >
> > > > >new Field("Id", myObject.Id, Field.Store.YES, Field.Index.NOT_ANALYZED)
> > > >
> > > >and is a string field. The result set back from Luke looks like
> > > >(screencap):
> > > >
> > > >http://screencast.com/t/NooMK2Rf
> > > >
> > > >Thanks!
> > >
> > >
> > >
> > >
> > >
> >
>

RE: SPAM-HIGH: Disparity between API usage and Luke

Posted by "Lingam, ChandraMohan J" <ch...@intel.com>.

>> var query = _parser.Parse("Id:BAUER*");

In your code, the value was most likely converted to lower case (i.e. bauer*) by the Parse call,
whereas the indexed value is in upper case because the field is not analyzed (judging from the screenshot).

Can you try explicitly using a PrefixQuery?
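
A minimal sketch of what that could look like, assuming Lucene.Net 2.9.x
and the "Id" field from the original post. The class and method names
(IdPrefixSearch, BuildQuery, CountHits) are made up for illustration, and
the searcher is assumed to be an already opened IndexSearcher.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper, not part of Lucene.Net.
public static class IdPrefixSearch
{
    // The term text is used verbatim: no analyzer, no lowercasing.
    public static Query BuildQuery(string prefix)
    {
        return new PrefixQuery(new Term("Id", prefix));
    }

    // Runs the prefix query and returns the total hit count.
    public static int CountHits(IndexSearcher searcher, string prefix)
    {
        TopDocs topDocs = searcher.Search(BuildQuery(prefix), 10);
        return topDocs.TotalHits;
    }
}
```

Calling IdPrefixSearch.CountHits(searcher, "BAUER") bypasses QueryParser
entirely, so if the hit count then matches Luke, the lowercasing in the
parser is the likely cause.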



> Same results, apparently, when I use Luke 1.0.1.
>
> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my custom 
> app, zero.
>
> On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse <rv...@dotnetrdf.org> wrote:
>
> > You appear to be using Luke 3.5 which per the information on the 
> > Luke homepage (http://code.google.com/p/luke/) uses Lucene 3.5
> >
> > Since Lucene.Net is currently on 2.9.4 I wouldn't be surprised to 
> > see different behavior between the API and executing in Luke.
> >
> > If you use a version of Luke which more closely aligns with the
> > version of Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1 which should be
> > close enough since the 2.9.x releases were previews of the 3.0.x
> > releases as I understood it) what behavior do you see?
> >
> > Hope this helps,
> >
> > Rob
> >
> > On 6/26/12 10:50 AM, "Rob Cecil" <ro...@gmail.com> wrote:
> >
> > >If I run a query against my index using QueryParser to query a field:
> > >
> > >                var query = _parser.Parse("Id:BAUER*");
> > >                var topDocs = searcher.Search(query, 10);
> > >                Assert.AreEqual(count, topDocs.TotalHits);
> > >
> > >I get 0 for my TotalHits, yet in Luke, the same query phrase yields 
> > >15 results, what am I doing wrong? I use the StandardAnalyzer both 
> > >to create the index and to query.
> > >
> > >The field is defined as:
> > >
> > >new Field("Id", myObject.Id, Field.Store.YES, Field.Index.NOT_ANALYZED)
> > >
> > >and is a string field. The result set back from Luke looks like
> > >(screencap):
> > >
> > >http://screencast.com/t/NooMK2Rf
> > >
> > >Thanks!

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Itamar Syn-Hershko <it...@code972.com>.

It doesn't matter which analyzer you use at index time if the field is
Field.Index.NOT_ANALYZED; the value is indexed as-is, as a single term.

On Tue, Jun 26, 2012 at 9:48 PM, Rob Cecil <ro...@gmail.com> wrote:

> Same results, apparently, when I use Luke 1.0.1.
>
> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my custom app,
> zero.
>
> On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse <rv...@dotnetrdf.org> wrote:
>
> > You appear to be using Luke 3.5 which per the information on the Luke
> > homepage (http://code.google.com/p/luke/) uses Lucene 3.5
> >
> > Since Lucene.Net is currently on 2.9.4 I wouldn't be surprised to see
> > different behavior between the API and executing in Luke.
> >
> > If you use a version of Luke which more closely aligns with the version
> > of Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1 which should be close enough
> > since the 2.9.x releases were previews of the 3.0.x releases as I
> > understood it) what behavior do you see?
> >
> > Hope this helps,
> >
> > Rob
> >
> > On 6/26/12 10:50 AM, "Rob Cecil" <ro...@gmail.com> wrote:
> >
> > >If I run a query against my index using QueryParser to query a field:
> > >
> > >                var query = _parser.Parse("Id:BAUER*");
> > >                var topDocs = searcher.Search(query, 10);
> > >                Assert.AreEqual(count, topDocs.TotalHits);
> > >
> > >I get 0 for my TotalHits, yet in Luke, the same query phrase yields 15
> > >results, what am I doing wrong? I use the StandardAnalyzer both to
> > >create the index and to query.
> > >
> > >The field is defined as:
> > >
> > >new Field("Id", myObject.Id, Field.Store.YES, Field.Index.NOT_ANALYZED)
> > >
> > >and is a string field. The result set back from Luke looks like
> > >(screencap):
> > >
> > >http://screencast.com/t/NooMK2Rf
> > >
> > >Thanks!