Posted to user@lucenenet.apache.org by Rob Cecil <ro...@gmail.com> on 2012/06/26 19:50:59 UTC

Disparity between API usage and Luke

If I run a query against my index using QueryParser to query a field:

                var query = _parser.Parse("Id:BAUER*");
                var topDocs = searcher.Search(query, 10);
                Assert.AreEqual(count, topDocs.TotalHits);

I get 0 for my TotalHits, yet in Luke the same query string yields 15
results. What am I doing wrong? I use the StandardAnalyzer both to
create the index and to query.

The field is defined as:

new Field("Id", myObject.Id, Field.Store.YES, Field.Index.NOT_ANALYZED)

and is a string field. The result set back from Luke looks like (screencap):

http://screencast.com/t/NooMK2Rf

Thanks!

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Rob Cecil <ro...@gmail.com>.

It certainly isn't intuitive to me. I would have expected the exact
opposite (almost). Why do I get results in Luke but not when I use the API?


RE: SPAM-HIGH: Disparity between API usage and Luke

Posted by Prescott Nasser <ge...@hotmail.com>.

Might also be some minor file format changes (I have not had a chance to closely examine them against your issue):
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html

----------------------------------------
> Date: Tue, 26 Jun 2012 11:31:25 -0700
> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> From: rvesse@dotnetrdf.org
> To: lucene-net-user@lucene.apache.org
>
> You appear to be using Luke 3.5 which per the information on the Luke
> homepage (http://code.google.com/p/luke/) uses Lucene 3.5
>
> Since Lucene.Net is currently on 2.9.4 I wouldn't be surprised to see
> different behavior between the API and executing in Luke.
>
> If you use a version of Luke which more closely aligns with the version of
> Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1 which should be close enough
> since the 2.9.x releases were previews of the 3.0.x releases as I
> understood it) what behavior do you see?
>
> Hope this helps,
>
> Rob

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Itamar Syn-Hershko <it...@code972.com>.

It is, if you have any uppercase letters for example. If StandardAnalyzer
is passed to QueryParser and you search on a non-analyzed field, you won't
be able to find it.
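
One way to sidestep the parser entirely is to build the term-level query by hand. The following is a minimal sketch, assuming the Lucene.Net 2.9.x API discussed in this thread; the class name, the helper method, and the use of an in-memory RAMDirectory are illustrative assumptions, not code from the original application. A PrefixQuery compares the prefix against the indexed terms exactly as given, so no lowercasing takes place.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical helper class, for illustration only.
public static class PrefixQuerySketch
{
    // Indexes each id as its own document in a NOT_ANALYZED "Id" field,
    // then counts the documents whose Id starts with the given prefix.
    public static int CountPrefixHits(string[] ids, string prefix)
    {
        var directory = new RAMDirectory(); // in-memory index (assumption)
        var writer = new IndexWriter(directory, new KeywordAnalyzer(), true,
                                     IndexWriter.MaxFieldLength.LIMITED);
        foreach (var id in ids)
        {
            var document = new Document();
            document.Add(new Field("Id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.AddDocument(document);
        }
        writer.Close();

        var reader = IndexReader.Open(directory, true);
        var searcher = new IndexSearcher(reader);

        // The prefix is matched as-is; no "*" is needed and no analyzer is involved.
        var query = new PrefixQuery(new Term("Id", prefix));
        int totalHits = searcher.Search(query, 10).TotalHits;

        searcher.Close();
        reader.Close();
        return totalHits;
    }
}
```

With uppercase ids in the index, a call with the prefix "BAUER" should count the matching documents, while "bauer" should count none, because the comparison is case-sensitive.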

On Wed, Jun 27, 2012 at 12:15 AM, Rob Cecil <ro...@gmail.com> wrote:

> Well the field is "Id" - which contains unique, non-recurring terms. So I
> don't think mapping it to Field.Index.ANALYZED makes sense, does it? If it
> is mapped as NOT_ANALYZED, it is still indexed, so should I be able to
> issue a query like "Id:BAUER*" ?

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Rob Cecil <ro...@gmail.com>.

Well the field is "Id" - which contains unique, non-recurring terms. So I
don't think mapping it to Field.Index.ANALYZED makes sense, does it? If it
is mapped as NOT_ANALYZED, it is still indexed, so should I be able to
issue a query like "Id:BAUER*" ?

On Tue, Jun 26, 2012 at 3:06 PM, Lingam, ChandraMohan J <
chandramohan.j.lingam@intel.com> wrote:

> QueryParser has no knowledge of how data was indexed.  For your scenario,
> I don't believe you would be able to use Query Parser with standard
> analyzer when data was originally indexed with Field.Index.NOT_ANALYZED
> option.
>
> Interesting question is why is luke working/finding the match?  I would
> have expected Luke to not find any matches.
>
>
> -----Original Message-----
> From: Rob Cecil [mailto:rob.cecil@gmail.com]
> Sent: Tuesday, June 26, 2012 12:54 PM
> To: lucene-net-user@lucene.apache.org
> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>
> I can definitely try that. I just expected QueryParser would respect the
> case of the source string. I was hoping to avoid using the Query API
> per-se, and just let the parser to the work for me.
>
> On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J <
> chandramohan.j.lingam@intel.com> wrote:
>
> > >> var query = _parser.Parse("Id:BAUER*");
> >
> > In your code, most likely, the value got converted to lower case (i.e.
> > bauer*) by the parse statement.
> > Whereas indexed value is in upper case as it is not analyzed (from
> > screen shot).
> >
> > Can you explicitly try using prefix query?
> >
> >
> >
> > > Same results, apparently, when I use Luke 1.0.1.
> > >
> > > When I search for "Id:BAUER*" I get 15 hits in Luke, but in my
> > > custom app, zero.

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Rob Cecil <ro...@gmail.com>.

Yeah, sorry, I should have created 7 documents in the testindex; in my rush to get a standalone test done and emailed out, I botched that. Thanks for the insight into the case issue with the KeywordAnalyzer. I'm starting to think about how I might structure my application to use the Query API in conjunction with the QueryParser. But QueryParser is very compelling.


RE: SPAM-HIGH: Disparity between API usage and Luke

Posted by "Lingam, ChandraMohan J" <ch...@intel.com>.

Interestingly, the query generated from var query = queryParser.Parse("Id:BAUER*") is converted to lowercase ("bauer*") even though you are using KeywordAnalyzer. I am not sure whether this is the intended behavior of the keyword analyzer.

So, best option to make this example work is to index in lowercase:
            document.Add(new Field("Id", "bauerrevenue", Field.Store.YES, Field.Index.NOT_ANALYZED));

Also, the assert will always fail: even when the query matches, the hit count will be 1, since there is only one document with several values associated with the field. You would need to iterate through the fields. If you want to match 6 documents, you have to add six separate documents instead of one document with all the values.
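
To illustrate the difference between one multi-valued document and one document per value, here is a minimal sketch, assuming the Lucene.Net 2.9.x API used elsewhere in this thread. The class, the helper names, and the in-memory RAMDirectory are hypothetical and only for illustration. It uses a PrefixQuery so that the case-folding question does not interfere with the count.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical helper class, for illustration only.
public static class DocumentsPerIdSketch
{
    // Hits are counted per document, not per field value.
    public static int CountHits(string[] ids, string prefix, bool oneDocumentPerId)
    {
        var directory = new RAMDirectory(); // in-memory index (assumption)
        var writer = new IndexWriter(directory, new KeywordAnalyzer(), true,
                                     IndexWriter.MaxFieldLength.LIMITED);
        if (oneDocumentPerId)
        {
            // One document for each id: every matching id is a separate hit.
            foreach (var id in ids)
            {
                var document = new Document();
                document.Add(NewIdField(id));
                writer.AddDocument(document);
            }
        }
        else
        {
            // A single document carrying every id: at most one hit in total.
            var document = new Document();
            foreach (var id in ids)
            {
                document.Add(NewIdField(id));
            }
            writer.AddDocument(document);
        }
        writer.Close();

        var reader = IndexReader.Open(directory, true);
        var searcher = new IndexSearcher(reader);
        int totalHits = searcher.Search(new PrefixQuery(new Term("Id", prefix)), 10).TotalHits;
        searcher.Close();
        reader.Close();
        return totalHits;
    }

    private static Field NewIdField(string id)
    {
        return new Field("Id", id, Field.Store.YES, Field.Index.NOT_ANALYZED);
    }
}
```

Calling CountHits with oneDocumentPerId set to false should report a single hit however many values match, which is why the assert on 6 in the test above cannot pass as written.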




-----Original Message-----
From: Rob Cecil [mailto:rob.cecil@gmail.com] 
Sent: Tuesday, June 26, 2012 6:55 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: SPAM-HIGH: Disparity between API usage and Luke

Sure, this is self-contained:

[Test]
        public void QueryNonAnalyzedField()
        {
            var indexPath = Path.Combine(Environment.CurrentDirectory, "testindex");
            var directory = FSDirectory.Open(new DirectoryInfo(indexPath));
            var analyzer = new KeywordAnalyzer();
            var writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
            var document = new Document();
            document.Add(new Field("Id", "BAUERREVENUE", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERLOCATION", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERPRODUCT", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERPRODUCTLINE", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERSTATE", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERTOTAL", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "NOTBAUER", Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.AddDocument(document);
            writer.Optimize();
            writer.Close();

            IndexReader reader = IndexReader.Open(directory, true);
            var queryParser = new QueryParser(Version.LUCENE_29, "content", analyzer);
            var query = queryParser.Parse("Id:BAUER*");
            var indexSearch = new IndexSearcher(reader);
            var hits = indexSearch.Search(query);
            Assert.AreEqual(6, hits.Length());
        }


On Tue, Jun 26, 2012 at 6:35 PM, Lingam, ChandraMohan J < chandramohan.j.lingam@intel.com> wrote:

> Just did a simple test, and KeywordAnalyzer does indeed work like a
> prefix query if you put a star at the end. Agree with Simon. Most
> likely Luke was using the keyword analyzer and somehow the UI was not reflecting it?
>
> Please post a small snippet of your index code and query code...
>
> -----Original Message-----
> From: Rob Cecil [mailto:rob.cecil@gmail.com]
> Sent: Tuesday, June 26, 2012 5:25 PM
> To: lucene-net-user@lucene.apache.org
> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>
> Thanks, and there is no equivalent QueryParser syntax for that?
>
> On Tue, Jun 26, 2012 at 6:21 PM, Lingam, ChandraMohan J < 
> chandramohan.j.lingam@intel.com> wrote:
>
> > Actually, that makes sense. Keyword analyzer would try for an exact match.
> > Since you are looking for a prefix-based search, your best option is
> > to simply use PrefixQuery; there is no need to put a "*" for PrefixQuery.
> >
> > -----Original Message-----
> > From: Rob Cecil [mailto:rob.cecil@gmail.com]
> > Sent: Tuesday, June 26, 2012 4:57 PM
> > To: lucene-net-user@lucene.apache.org
> > Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> >
> > That is correct. I've verified in Luke 1.0.1 that both analyzers 
> > produce the same results.
> >
> > To make it interesting, back in my code, I switched over to using 
> > the KeywordAnalyzer, and I'm still not getting any results against 
> > that NOT_ANALYZED field.
> >
> > ?
> >
> > On Tue, Jun 26, 2012 at 5:52 PM, Lingam, ChandraMohan J < 
> > chandramohan.j.lingam@intel.com> wrote:
> >
> > > Luke using keyword analyzer as default makes sense. However, in 
> > > the original post, there was a link to luke output screenshot 
> > > which showed that standard analyzer was in use for query parsing.
> > >
> > > -----Original Message-----
> > > From: Simon Svensson [mailto:sisve@devhost.se]
> > > Sent: Tuesday, June 26, 2012 2:56 PM
> > > To: lucene-net-user@lucene.apache.org
> > > Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> > >
> > > > Luke defaults to KeywordAnalyzer, which won't change your term in any way.
> > > > The QueryParser will still break up your query, so "Name:Jack Bauer"
> > > > would become (Name:Jack DefaultField:Bauer). I believe you can
> > > > have per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer
> > > > for everything else) using a PerFieldAnalyzerWrapper.
> > >
> > > On 2012-06-26 23:06, Lingam, ChandraMohan J wrote:
> > > > QueryParser has no knowledge of how data was indexed.  For your
> > > scenario, I don't believe you would be able to use Query Parser 
> > > with standard analyzer when data was originally indexed with 
> > > Field.Index.NOT_ANALYZED option.
> > > >
> > > > Interesting question is why is luke working/finding the match?  
> > > > I would
> > > have expected Luke to not find any matches.
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Rob Cecil [mailto:rob.cecil@gmail.com]
> > > > Sent: Tuesday, June 26, 2012 12:54 PM
> > > > To: lucene-net-user@lucene.apache.org
> > > > Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> > > >
> > > > I can definitely try that. I just expected QueryParser would 
> > > > respect the
> > > case of the source string. I was hoping to avoid using the Query 
> > > API per-se, and just let the parser to the work for me.
> > > >
> > > > On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J <
> > > chandramohan.j.lingam@intel.com> wrote:
> > > >
> > > >>>> var query = _parser.Parse("Id:BAUER*");
> > > >> In your code, most likely, the value got converted to lower 
> > > >> case
> (i.e.
> > > >> bauer*) by the parse statement.
> > > >> Whereas indexed value is in upper case as it is not analyzed 
> > > >> (from screen shot).
> > > >>
> > > >> Can you explicitly try using prefix query?
> > > >>
> > > >>
> > > >>
> > > >>> Same results, apparently, when I use Luke 1.0.1.
> > > >>>
> > > >>> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my 
> > > >>> custom app, zero.
> > > >>>
> > > >>> On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse 
> > > >>> <rv...@dotnetrdf.org>
> > > >> wrote:
> > > >>>> You appear to be using Luke 3.5 which per the information on 
> > > >>>> the Luke homepage (http://code.google.com/p/luke/) uses 
> > > >>>> Lucene
> > > >>>> 3.5
> > > >>>>
> > > >>>> Since Lucene.Net is currently on 2.9.4 I wouldn't be 
> > > >>>> surprised to see different behavior between the API and executing in Luke.
> > > >>>>
> > > >>>> If you use a version of Luke which more closely aligns with 
> > > >>>> the version
> > > >>> of
> > > >>>> Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1 which should be 
> > > >>>> close enough since the 2.9.x releases were previews of the 
> > > >>>> 3.0.x releases as I understood it) what behavior do you see?
> > > >>>>
> > > >>>> Hope this helps,
> > > >>>>
> > > >>>> Rob
> > > >>>>
> > > >>>> On 6/26/12 10:50 AM, "Rob Cecil" <ro...@gmail.com> wrote:
> > > >>>>
> > > >>>>> If I run a query against my index using QueryParser to query 
> > > >>>>> a
> > field:
> > > >>>>>
> > > >>>>>                 var query = _parser.Parse("Id:BAUER*");
> > > >>>>>                 var topDocs = searcher.Search(query, 10);
> > > >>>>>                 Assert.AreEqual(count, topDocs.TotalHits);
> > > >>>>>
> > > >>>>> I get 0 for my TotalHits, yet in Luke, the same query phrase 
> > > >>>>> yields
> > > >>>>> 15 results, what am I doing wrong? I use the 
> > > >>>>> StandardAnalyzer both to create the index and to query.
> > > >>>>>
> > > >>>>> The field is defined as:
> > > >>>>>
> > > >>>>> new Field("Id", myObject.Id, Field.Store.YES,
> > > >>>>> Field.Index.NOT_ANALYZED)
> > > >>>>>
> > > >>>>> and is a string field. The result set back from Luke looks 
> > > >>>>> like
> > > >>>>> (screencap):
> > > >>>>>
> > > >>>>> http://screencast.com/t/NooMK2Rf
> > > >>>>>
> > > >>>>> Thanks!
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > >
> > >
> > >
> >
>

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Rob Cecil <ro...@gmail.com>.

And the prize goes to Simon for figuring out the quandary about why Luke
behaved differently. Indeed Luke seems to default its QP to have
SetLowercaseExpandedTerms set to false also. Check out this screenshot:

http://screencast.com/t/zb2jNT3wAM

Notice the checkbox "Lowercase expanded terms..." is unchecked.
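
For reference, the setting can be sketched in code as follows. This is a minimal illustration, assuming the Lucene.Net 2.9.x method-style API that Simon quoted; the class name, the helper method, and the in-memory RAMDirectory are hypothetical. On later Lucene.Net releases the setter may be exposed as a property instead, so check the version in use.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical helper class, for illustration only.
public static class LowercaseExpandedTermsSketch
{
    // Parses "Id:BAUER*" with or without lowercasing of expanded
    // (prefix/wildcard) terms and returns the number of matching documents.
    public static int CountParsedHits(string[] ids, bool lowercaseExpandedTerms)
    {
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
        var directory = new RAMDirectory(); // in-memory index (assumption)
        var writer = new IndexWriter(directory, analyzer, true,
                                     IndexWriter.MaxFieldLength.LIMITED);
        foreach (var id in ids)
        {
            var document = new Document();
            // NOT_ANALYZED: the value is indexed verbatim, uppercase included.
            document.Add(new Field("Id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.AddDocument(document);
        }
        writer.Close();

        var queryParser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "content", analyzer);
        // The parser lowercases prefix and wildcard terms unless told otherwise.
        queryParser.SetLowercaseExpandedTerms(lowercaseExpandedTerms);
        var query = queryParser.Parse("Id:BAUER*");

        var reader = IndexReader.Open(directory, true);
        var searcher = new IndexSearcher(reader);
        int totalHits = searcher.Search(query, 10).TotalHits;
        searcher.Close();
        reader.Close();
        return totalHits;
    }
}
```

Passing false mirrors the unchecked box in Luke; passing true reproduces the zero-hit result from the custom app when the indexed ids are uppercase.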

On Wed, Jun 27, 2012 at 10:05 AM, Rob Cecil <ro...@gmail.com> wrote:

> Thanks Simon that works - even with StandardAnalyzer! :)
>
>
> On Tue, Jun 26, 2012 at 11:44 PM, Simon Svensson <si...@devhost.se> wrote:
>
>> Set queryParser.SetLowercaseExpandedTerms(false);
>>
>>
>> On 2012-06-27 03:55, Rob Cecil wrote:
>>
>>> Sure, this is self-contained:
>>>
>>> [Test]
>>>         public void QueryNonAnalyzedField()
>>>         {
>>>             var indexPath = Path.Combine(Environment.**CurrentDirectory,
>>> "testindex");
>>>             var directory = FSDirectory.Open(new
>>> DirectoryInfo(indexPath));
>>>             var analyzer = new KeywordAnalyzer();
>>>             var writer = new IndexWriter(directory, analyzer, true,
>>> IndexWriter.MaxFieldLength.**LIMITED);
>>>             var document = new Document();
>>>             document.Add(new Field("Id", "BAUERREVENUE",
>>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>>             document.Add(new Field("Id", "BAUERLOCATION",
>>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>>             document.Add(new Field("Id", "BAUERPRODUCT",
>>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>>             document.Add(new Field("Id", "BAUERPRODUCTLINE",
>>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>>             document.Add(new Field("Id", "BAUERSTATE",
>>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>>             document.Add(new Field("Id", "BAUERTOTAL",
>>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>>             document.Add(new Field("Id", "NOTBAUER", Field.Store.YES,
>>> Field.Index.NOT_ANALYZED));
>>>             writer.AddDocument(document);
>>>             writer.Optimize();
>>>             writer.Close();
>>>
>>>             IndexReader reader = IndexReader.Open(directory, true);
>>>             var queryParser = new QueryParser(Version.LUCENE_29,
>>> "content", analyzer);
>>>             var query = queryParser.Parse("Id:BAUER*");
>>>             var indexSearch = new IndexSearcher(reader);
>>>             var hits = indexSearch.Search(query);
>>>             Assert.AreEqual(6, hits.Length());
>>>         }
>>>
>>>
>>> On Tue, Jun 26, 2012 at 6:35 PM, Lingam, ChandraMohan J <
>>> chandramohan.j.lingam@intel.com>
>>> wrote:
>>>
>>>  Just did a simple test and Keywordanalyzer does indeed work like a
>>>> prefix
>>>> query if you put a star at the end. Agree with Simon.  Most likely luke
>>>> was
>>>> using keyword analyzer and somehow UI was not reflecting it?
>>>>
>>>> Please post a small snippet of your index code and query code...
>>>>
>>>> -----Original Message-----
>>>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>>>> Sent: Tuesday, June 26, 2012 5:25 PM
>>>> To: lucene-net-user@lucene.apache.org
>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>
>>>> Thanks, and there is no equivalent QueryParser syntax for that?
>>>>
>>>> On Tue, Jun 26, 2012 at 6:21 PM, Lingam, ChandraMohan J <
>>>> chandramohan.j.lingam@intel.com>
>>>> wrote:
>>>>
>>>>  actually, that makes sense. Keyword analyzer would try for an exact
>>>>>
>>>> match.
>>>>
>>>>>  Since you are looking for prefix based search, your best option is to
>>>>> simply use PrefixQuery and there is no need to put a "*" for
>>>>> prefixquery.
>>>>>
>>>>> -----Original Message-----
>>>>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>>>>> Sent: Tuesday, June 26, 2012 4:57 PM
>>>>> To: lucene-net-user@lucene.apache.org
>>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>>
>>>>> That is correct. I've verified in Luke 1.0.1 that both analyzers
>>>>> produce the same results.
>>>>>
>>>>> To make it interesting, back in my code, I switched over to using the
>>>>> KeywordAnalyzer, and I'm still not getting any results against that
>>>>> NOT_ANALYZED field.
>>>>>
>>>>> ?
>>>>>
>>>>> On Tue, Jun 26, 2012 at 5:52 PM, Lingam, ChandraMohan J <
>>>>> chandramohan.j.lingam@intel.com>
>>>>> wrote:
>>>>>
>>>>>  Luke using keyword analyzer as default makes sense. However, in the
>>>>>> original post, there was a link to luke output screenshot which
>>>>>> showed that standard analyzer was in use for query parsing.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Simon Svensson [mailto:sisve@devhost.se]
>>>>>> Sent: Tuesday, June 26, 2012 2:56 PM
>>>>>> To: lucene-net-user@lucene.apache.org
>>>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>>>
>>>>>> Luke defaults to KeywordAnalyzer which won't change your term in any
>>>>>>
>>>>> way.
>>>>
>>>>> The QueryParser will still break up your query, so "Name:Jack Bauer"
>>>>>> would become (Name:Jack DefaultField:Bauer). I believe you can have
>>>>>> per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer for
>>>>>> everything else) using a PerFieldAnalyzerWrapper.
>>>>>>
>>>>>> On 2012-06-26 23:06, Lingam, ChandraMohan J wrote:
>>>>>>
>>>>>>> QueryParser has no knowledge of how data was indexed.  For your
>>>>>>>
>>>>>> scenario, I don't believe you would be able to use Query Parser with
>>>>>> standard analyzer when data was originally indexed with
>>>>>> Field.Index.NOT_ANALYZED option.
>>>>>>
>>>>>>> Interesting question is why is luke working/finding the match?  I
>>>>>>> would
>>>>>>>
>>>>>> have expected Luke to not find any matches.
>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>>>>>>> Sent: Tuesday, June 26, 2012 12:54 PM
>>>>>>> To: lucene-net-user@lucene.apache.org
>>>>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>>>>
>>>>>>> I can definitely try that. I just expected QueryParser would
>>>>>>> respect the
>>>>>>>
>>>>>> case of the source string. I was hoping to avoid using the Query API
>>>>>> per-se, and just let the parser do the work for me.
>>>>>>
>>>>>>> On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J <
>>>>>>>
>>>>>> chandramohan.j.lingam@intel.com>
>>>>>> wrote:
>>>>>>
>>>>>>>  var query = _parser.Parse("Id:BAUER*");
>>>>>>>>>>
>>>>>>>>> In your code, most likely, the value got converted to lower case
>>>>>>>>
>>>>>>> (i.e.
>>>>
>>>>>  bauer*) by the parse statement.
>>>>>>>> Whereas indexed value is in upper case as it is not analyzed
>>>>>>>> (from screen shot).
>>>>>>>>
>>>>>>>> Can you explicitly try using prefix query?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>  Same results, apparently, when I use Luke 1.0.1.
>>>>>>>>>
>>>>>>>>> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my
>>>>>>>>> custom app, zero.
>>>>>>>>>
>>>>>>>>> On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse
>>>>>>>>> <rv...@dotnetrdf.org>
>>>>>>>>>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> You appear to be using Luke 3.5 which per the information on
>>>>>>>>>> the Luke homepage (http://code.google.com/p/luke/)
>>>>>>>>>> uses Lucene
>>>>>>>>>> 3.5
>>>>>>>>>>
>>>>>>>>>> Since Lucene.Net is currently on 2.9.4 I wouldn't be surprised
>>>>>>>>>> to see different behavior between the API and executing in Luke.
>>>>>>>>>>
>>>>>>>>>> If you use a version of Luke which more closely aligns with the
>>>>>>>>>> version
>>>>>>>>>>
>>>>>>>>> of
>>>>>>>>>
>>>>>>>>>> Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1 which should be close
>>>>>>>>>> enough since the 2.9.x releases were previews of the 3.0.x
>>>>>>>>>> releases as I understood it) what behavior do you see?
>>>>>>>>>>
>>>>>>>>>> Hope this helps,
>>>>>>>>>>
>>>>>>>>>> Rob
>>>>>>>>>>
>>>>>>>>>> On 6/26/12 10:50 AM, "Rob Cecil" <ro...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>  If I run a query against my index using QueryParser to query a
>>>>>>>>>>>
>>>>>>>>>> field:
>>>>>
>>>>>>                  var query = _parser.Parse("Id:BAUER*");
>>>>>>>>>>>                 var topDocs = searcher.Search(query, 10);
>>>>>>>>>>>                 Assert.AreEqual(count, topDocs.TotalHits);
>>>>>>>>>>>
>>>>>>>>>>> I get 0 for my TotalHits, yet in Luke, the same query phrase
>>>>>>>>>>> yields
>>>>>>>>>>> 15 results, what am I doing wrong? I use the StandardAnalyzer
>>>>>>>>>>> both to create the index and to query.
>>>>>>>>>>>
>>>>>>>>>>> The field is defined as:
>>>>>>>>>>>
>>>>>>>>>>> new Field("Id", myObject.Id, Field.Store.YES,
>>>>>>>>>>> Field.Index.NOT_ANALYZED)
>>>>>>>>>>>
>>>>>>>>>>> and is a string field. The result set back from Luke looks
>>>>>>>>>>> like
>>>>>>>>>>> (screencap):
>>>>>>>>>>>
>>>>>>>>>>> http://screencast.com/t/NooMK2Rf
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>
>>>>>>
>>
>>
>

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Rob Cecil <ro...@gmail.com>.

Thanks Simon that works - even with StandardAnalyzer! :)

On Tue, Jun 26, 2012 at 11:44 PM, Simon Svensson <si...@devhost.se> wrote:

> Set queryParser.SetLowercaseExpandedTerms(false);
>
>
> On 2012-06-27 03:55, Rob Cecil wrote:
>
>> Sure, this is self-contained:
>>
>> [Test]
>>         public void QueryNonAnalyzedField()
>>         {
>>             var indexPath = Path.Combine(Environment.CurrentDirectory,
>> "testindex");
>>             var directory = FSDirectory.Open(new
>> DirectoryInfo(indexPath));
>>             var analyzer = new KeywordAnalyzer();
>>             var writer = new IndexWriter(directory, analyzer, true,
>> IndexWriter.MaxFieldLength.LIMITED);
>>             var document = new Document();
>>             document.Add(new Field("Id", "BAUERREVENUE",
>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>             document.Add(new Field("Id", "BAUERLOCATION",
>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>             document.Add(new Field("Id", "BAUERPRODUCT",
>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>             document.Add(new Field("Id", "BAUERPRODUCTLINE",
>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>             document.Add(new Field("Id", "BAUERSTATE",
>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>             document.Add(new Field("Id", "BAUERTOTAL",
>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>>             document.Add(new Field("Id", "NOTBAUER", Field.Store.YES,
>> Field.Index.NOT_ANALYZED));
>>             writer.AddDocument(document);
>>             writer.Optimize();
>>             writer.Close();
>>
>>             IndexReader reader = IndexReader.Open(directory, true);
>>             var queryParser = new QueryParser(Version.LUCENE_29,
>> "content", analyzer);
>>             var query = queryParser.Parse("Id:BAUER*");
>>             var indexSearch = new IndexSearcher(reader);
>>             var hits = indexSearch.Search(query);
>>             Assert.AreEqual(6, hits.Length());
>>         }
>>
>>
>> On Tue, Jun 26, 2012 at 6:35 PM, Lingam, ChandraMohan J <
>> chandramohan.j.lingam@intel.com>
>> wrote:
>>
>>  Just did a simple test and Keywordanalyzer does indeed work like a prefix
>>> query if you put a star at the end. Agree with Simon.  Most likely luke
>>> was
>>> using keyword analyzer and somehow UI was not reflecting it?
>>>
>>> Please post a small snippet of your index code and query code...
>>>
>>> -----Original Message-----
>>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>>> Sent: Tuesday, June 26, 2012 5:25 PM
>>> To: lucene-net-user@lucene.apache.org
>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>
>>> Thanks, and there is no equivalent QueryParser syntax for that?
>>>
>>> On Tue, Jun 26, 2012 at 6:21 PM, Lingam, ChandraMohan J <
>>> chandramohan.j.lingam@intel.com>
>>> wrote:
>>>
>>>  actually, that makes sense. Keyword analyzer would try for an exact
>>>>
>>> match.
>>>
>>>>  Since you are looking for prefix based search, your best option is to
>>>> simply use PrefixQuery and there is no need to put a "*" for
>>>> prefixquery.
>>>>
>>>> -----Original Message-----
>>>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>>>> Sent: Tuesday, June 26, 2012 4:57 PM
>>>> To: lucene-net-user@lucene.apache.org
>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>
>>>> That is correct. I've verified in Luke 1.0.1 that both analyzers
>>>> produce the same results.
>>>>
>>>> To make it interesting, back in my code, I switched over to using the
>>>> KeywordAnalyzer, and I'm still not getting any results against that
>>>> NOT_ANALYZED field.
>>>>
>>>> ?
>>>>
>>>> On Tue, Jun 26, 2012 at 5:52 PM, Lingam, ChandraMohan J <
>>>> chandramohan.j.lingam@intel.com>
>>>> wrote:
>>>>
>>>>  Luke using keyword analyzer as default makes sense. However, in the
>>>>> original post, there was a link to luke output screenshot which
>>>>> showed that standard analyzer was in use for query parsing.
>>>>>
>>>>> -----Original Message-----
>>>>> From: Simon Svensson [mailto:sisve@devhost.se]
>>>>> Sent: Tuesday, June 26, 2012 2:56 PM
>>>>> To: lucene-net-user@lucene.apache.org
>>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>>
>>>>> Luke defaults to KeywordAnalyzer which won't change your term in any
>>>>>
>>>> way.
>>>
>>>> The QueryParser will still break up your query, so "Name:Jack Bauer"
>>>>> would become (Name:Jack DefaultField:Bauer). I believe you can have
>>>>> per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer for
>>>>> everything else) using a PerFieldAnalyzerWrapper.
>>>>>
>>>>> On 2012-06-26 23:06, Lingam, ChandraMohan J wrote:
>>>>>
>>>>>> QueryParser has no knowledge of how data was indexed.  For your
>>>>>>
>>>>> scenario, I don't believe you would be able to use Query Parser with
>>>>> standard analyzer when data was originally indexed with
>>>>> Field.Index.NOT_ANALYZED option.
>>>>>
>>>>>> Interesting question is why is luke working/finding the match?  I
>>>>>> would
>>>>>>
>>>>> have expected Luke to not find any matches.
>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>>>>>> Sent: Tuesday, June 26, 2012 12:54 PM
>>>>>> To: lucene-net-user@lucene.apache.org
>>>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>>>
>>>>>> I can definitely try that. I just expected QueryParser would
>>>>>> respect the
>>>>>>
>>>>> case of the source string. I was hoping to avoid using the Query API
>>>>> per-se, and just let the parser do the work for me.
>>>>>
>>>>>> On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J <
>>>>>>
>>>>> chandramohan.j.lingam@intel.com>
>>>>> wrote:
>>>>>
>>>>>>  var query = _parser.Parse("Id:BAUER*");
>>>>>>>>>
>>>>>>>> In your code, most likely, the value got converted to lower case
>>>>>>>
>>>>>> (i.e.
>>>
>>>>  bauer*) by the parse statement.
>>>>>>> Whereas indexed value is in upper case as it is not analyzed
>>>>>>> (from screen shot).
>>>>>>>
>>>>>>> Can you explicitly try using prefix query?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  Same results, apparently, when I use Luke 1.0.1.
>>>>>>>>
>>>>>>>> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my
>>>>>>>> custom app, zero.
>>>>>>>>
>>>>>>>> On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse
>>>>>>>> <rv...@dotnetrdf.org>
>>>>>>>>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> You appear to be using Luke 3.5 which per the information on
>>>>>>>>> the Luke homepage (http://code.google.com/p/luke/)
>>>>>>>>> uses Lucene
>>>>>>>>> 3.5
>>>>>>>>>
>>>>>>>>> Since Lucene.Net is currently on 2.9.4 I wouldn't be surprised
>>>>>>>>> to see different behavior between the API and executing in Luke.
>>>>>>>>>
>>>>>>>>> If you use a version of Luke which more closely aligns with the
>>>>>>>>> version
>>>>>>>>>
>>>>>>>> of
>>>>>>>>
>>>>>>>>> Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1 which should be close
>>>>>>>>> enough since the 2.9.x releases were previews of the 3.0.x
>>>>>>>>> releases as I understood it) what behavior do you see?
>>>>>>>>>
>>>>>>>>> Hope this helps,
>>>>>>>>>
>>>>>>>>> Rob
>>>>>>>>>
>>>>>>>>> On 6/26/12 10:50 AM, "Rob Cecil" <ro...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>  If I run a query against my index using QueryParser to query a
>>>>>>>>>>
>>>>>>>>> field:
>>>>
>>>>>                  var query = _parser.Parse("Id:BAUER*");
>>>>>>>>>>                 var topDocs = searcher.Search(query, 10);
>>>>>>>>>>                 Assert.AreEqual(count, topDocs.TotalHits);
>>>>>>>>>>
>>>>>>>>>> I get 0 for my TotalHits, yet in Luke, the same query phrase
>>>>>>>>>> yields
>>>>>>>>>> 15 results, what am I doing wrong? I use the StandardAnalyzer
>>>>>>>>>> both to create the index and to query.
>>>>>>>>>>
>>>>>>>>>> The field is defined as:
>>>>>>>>>>
>>>>>>>>>> new Field("Id", myObject.Id, Field.Store.YES,
>>>>>>>>>> Field.Index.NOT_ANALYZED)
>>>>>>>>>>
>>>>>>>>>> and is a string field. The result set back from Luke looks
>>>>>>>>>> like
>>>>>>>>>> (screencap):
>>>>>>>>>>
>>>>>>>>>> http://screencast.com/t/NooMK2Rf
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>>
>
>

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Simon Svensson <si...@devhost.se>.

Set queryParser.SetLowercaseExpandedTerms(false);
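
In context, that could look roughly like the sketch below. It is only an
illustration under a few assumptions of mine: it uses an in-memory
RAMDirectory instead of the on-disk index from the test, indexes one
document per Id, and follows the original post in reading
topDocs.TotalHits (the exact member spelling may vary between Lucene.Net
releases). The class name is a placeholder.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

public static class LowercaseExpandedTermsSketch
{
    public static void Main()
    {
        var directory = new RAMDirectory();   // assumption: in-memory index for the demo
        var analyzer = new KeywordAnalyzer();

        // Index each Id as its own document, upper case and NOT_ANALYZED.
        var writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
        foreach (var id in new[] { "BAUERREVENUE", "BAUERLOCATION", "NOTBAUER" })
        {
            var document = new Document();
            document.Add(new Field("Id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.AddDocument(document);
        }
        writer.Close();

        var queryParser = new QueryParser(Version.LUCENE_29, "content", analyzer);

        // Keep the prefix term exactly as typed, so "BAUER*" is not
        // rewritten to "bauer*" before it reaches the index.
        queryParser.SetLowercaseExpandedTerms(false);
        var query = queryParser.Parse("Id:BAUER*");

        var searcher = new IndexSearcher(directory, true);
        var topDocs = searcher.Search(query, 10);
        Console.WriteLine("{0} -> {1} hit(s)", query, topDocs.TotalHits);
        searcher.Close();
    }
}
```

The setting only affects expanded terms (prefix, wildcard, fuzzy and range
queries), so ordinary term queries still go through the analyzer as before.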

On 2012-06-27 03:55, Rob Cecil wrote:
> Sure, this is self-contained:
>
> [Test]
>          public void QueryNonAnalyzedField()
>          {
>              var indexPath = Path.Combine(Environment.CurrentDirectory,
> "testindex");
>              var directory = FSDirectory.Open(new DirectoryInfo(indexPath));
>              var analyzer = new KeywordAnalyzer();
>              var writer = new IndexWriter(directory, analyzer, true,
> IndexWriter.MaxFieldLength.LIMITED);
>              var document = new Document();
>              document.Add(new Field("Id", "BAUERREVENUE",
> Field.Store.YES, Field.Index.NOT_ANALYZED));
>              document.Add(new Field("Id", "BAUERLOCATION",
> Field.Store.YES, Field.Index.NOT_ANALYZED));
>              document.Add(new Field("Id", "BAUERPRODUCT",
> Field.Store.YES, Field.Index.NOT_ANALYZED));
>              document.Add(new Field("Id", "BAUERPRODUCTLINE",
> Field.Store.YES, Field.Index.NOT_ANALYZED));
>              document.Add(new Field("Id", "BAUERSTATE",
> Field.Store.YES, Field.Index.NOT_ANALYZED));
>              document.Add(new Field("Id", "BAUERTOTAL",
> Field.Store.YES, Field.Index.NOT_ANALYZED));
>              document.Add(new Field("Id", "NOTBAUER", Field.Store.YES,
> Field.Index.NOT_ANALYZED));
>              writer.AddDocument(document);
>              writer.Optimize();
>              writer.Close();
>
>              IndexReader reader = IndexReader.Open(directory, true);
>              var queryParser = new QueryParser(Version.LUCENE_29,
> "content", analyzer);
>              var query = queryParser.Parse("Id:BAUER*");
>              var indexSearch = new IndexSearcher(reader);
>              var hits = indexSearch.Search(query);
>              Assert.AreEqual(6, hits.Length());
>          }
>
>
> On Tue, Jun 26, 2012 at 6:35 PM, Lingam, ChandraMohan J <
> chandramohan.j.lingam@intel.com> wrote:
>
>> Just did a simple test and Keywordanalyzer does indeed work like a prefix
>> query if you put a star at the end. Agree with Simon.  Most likely luke was
>> using keyword analyzer and somehow UI was not reflecting it?
>>
>> Please post a small snippet of your index code and query code...
>>
>> -----Original Message-----
>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>> Sent: Tuesday, June 26, 2012 5:25 PM
>> To: lucene-net-user@lucene.apache.org
>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>
>> Thanks, and there is no equivalent QueryParser syntax for that?
>>
>> On Tue, Jun 26, 2012 at 6:21 PM, Lingam, ChandraMohan J <
>> chandramohan.j.lingam@intel.com> wrote:
>>
>>> actually, that makes sense. Keyword analyzer would try for an exact
>> match.
>>>   Since you are looking for prefix based search, your best option is to
>>> simply use PrefixQuery and there is no need to put a "*" for prefixquery.
>>>
>>> -----Original Message-----
>>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>>> Sent: Tuesday, June 26, 2012 4:57 PM
>>> To: lucene-net-user@lucene.apache.org
>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>
>>> That is correct. I've verified in Luke 1.0.1 that both analyzers
>>> produce the same results.
>>>
>>> To make it interesting, back in my code, I switched over to using the
>>> KeywordAnalyzer, and I'm still not getting any results against that
>>> NOT_ANALYZED field.
>>>
>>> ?
>>>
>>> On Tue, Jun 26, 2012 at 5:52 PM, Lingam, ChandraMohan J <
>>> chandramohan.j.lingam@intel.com> wrote:
>>>
>>>> Luke using keyword analyzer as default makes sense. However, in the
>>>> original post, there was a link to luke output screenshot which
>>>> showed that standard analyzer was in use for query parsing.
>>>>
>>>> -----Original Message-----
>>>> From: Simon Svensson [mailto:sisve@devhost.se]
>>>> Sent: Tuesday, June 26, 2012 2:56 PM
>>>> To: lucene-net-user@lucene.apache.org
>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>
>>>> Luke defaults to KeywordAnalyzer which won't change your term in any
>> way.
>>>> The QueryParser will still break up your query, so "Name:Jack Bauer"
>>>> would become (Name:Jack DefaultField:Bauer). I believe you can have
>>>> per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer for
>>>> everything else) using a PerFieldAnalyzerWrapper.
>>>>
>>>> On 2012-06-26 23:06, Lingam, ChandraMohan J wrote:
>>>>> QueryParser has no knowledge of how data was indexed.  For your
>>>> scenario, I don't believe you would be able to use Query Parser with
>>>> standard analyzer when data was originally indexed with
>>>> Field.Index.NOT_ANALYZED option.
>>>>> Interesting question is why is luke working/finding the match?  I
>>>>> would
>>>> have expected Luke to not find any matches.
>>>>>
>>>>> -----Original Message-----
>>>>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>>>>> Sent: Tuesday, June 26, 2012 12:54 PM
>>>>> To: lucene-net-user@lucene.apache.org
>>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>>
>>>>> I can definitely try that. I just expected QueryParser would
>>>>> respect the
>>>> case of the source string. I was hoping to avoid using the Query API
>>>> per-se, and just let the parser do the work for me.
>>>>> On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J <
>>>> chandramohan.j.lingam@intel.com> wrote:
>>>>>>>> var query = _parser.Parse("Id:BAUER*");
>>>>>> In your code, most likely, the value got converted to lower case
>> (i.e.
>>>>>> bauer*) by the parse statement.
>>>>>> Whereas indexed value is in upper case as it is not analyzed
>>>>>> (from screen shot).
>>>>>>
>>>>>> Can you explicitly try using prefix query?
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Same results, apparently, when I use Luke 1.0.1.
>>>>>>>
>>>>>>> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my
>>>>>>> custom app, zero.
>>>>>>>
>>>>>>> On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse
>>>>>>> <rv...@dotnetrdf.org>
>>>>>> wrote:
>>>>>>>> You appear to be using Luke 3.5 which per the information on
>>>>>>>> the Luke homepage (http://code.google.com/p/luke/) uses Lucene
>>>>>>>> 3.5
>>>>>>>>
>>>>>>>> Since Lucene.Net is currently on 2.9.4 I wouldn't be surprised
>>>>>>>> to see different behavior between the API and executing in Luke.
>>>>>>>>
>>>>>>>> If you use a version of Luke which more closely aligns with the
>>>>>>>> version
>>>>>>> of
>>>>>>>> Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1 which should be close
>>>>>>>> enough since the 2.9.x releases were previews of the 3.0.x
>>>>>>>> releases as I understood it) what behavior do you see?
>>>>>>>>
>>>>>>>> Hope this helps,
>>>>>>>>
>>>>>>>> Rob
>>>>>>>>
>>>>>>>> On 6/26/12 10:50 AM, "Rob Cecil" <ro...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> If I run a query against my index using QueryParser to query a
>>> field:
>>>>>>>>>                  var query = _parser.Parse("Id:BAUER*");
>>>>>>>>>                  var topDocs = searcher.Search(query, 10);
>>>>>>>>>                  Assert.AreEqual(count, topDocs.TotalHits);
>>>>>>>>>
>>>>>>>>> I get 0 for my TotalHits, yet in Luke, the same query phrase
>>>>>>>>> yields
>>>>>>>>> 15 results, what am I doing wrong? I use the StandardAnalyzer
>>>>>>>>> both to create the index and to query.
>>>>>>>>>
>>>>>>>>> The field is defined as:
>>>>>>>>>
>>>>>>>>> new Field("Id", myObject.Id, Field.Store.YES,
>>>>>>>>> Field.Index.NOT_ANALYZED)
>>>>>>>>>
>>>>>>>>> and is a string field. The result set back from Luke looks
>>>>>>>>> like
>>>>>>>>> (screencap):
>>>>>>>>>
>>>>>>>>> http://screencast.com/t/NooMK2Rf
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>
>>>>

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Rob Cecil <ro...@gmail.com>.

Sure, this is self-contained:

        [Test]
        public void QueryNonAnalyzedField()
        {
            var indexPath = Path.Combine(Environment.CurrentDirectory, "testindex");
            var directory = FSDirectory.Open(new DirectoryInfo(indexPath));
            var analyzer = new KeywordAnalyzer();
            var writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
            var document = new Document();
            document.Add(new Field("Id", "BAUERREVENUE", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERLOCATION", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERPRODUCT", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERPRODUCTLINE", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERSTATE", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERTOTAL", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "NOTBAUER", Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.AddDocument(document);
            writer.Optimize();
            writer.Close();

            IndexReader reader = IndexReader.Open(directory, true);
            var queryParser = new QueryParser(Version.LUCENE_29, "content", analyzer);
            var query = queryParser.Parse("Id:BAUER*");
            var indexSearch = new IndexSearcher(reader);
            var hits = indexSearch.Search(query);
            Assert.AreEqual(6, hits.Length());
        }
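
One caveat on the test as posted: all seven Id fields are added to a
single Document, so whatever the query does it can match at most one
document, and the count of 6 would need one document per Id. Below is a
rough sketch of that variant using the PrefixQuery suggested earlier in
the thread, which bypasses QueryParser (and its lowercasing) entirely.
The RAMDirectory, the class name and the use of Search(query, 10) with
TotalHits (as in the first post) are my own choices for illustration, not
part of the original test.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

public static class PrefixQuerySketch
{
    public static void Main()
    {
        var ids = new[]
        {
            "BAUERREVENUE", "BAUERLOCATION", "BAUERPRODUCT",
            "BAUERPRODUCTLINE", "BAUERSTATE", "BAUERTOTAL", "NOTBAUER"
        };

        var directory = new RAMDirectory();   // assumption: in-memory index for the demo
        var writer = new IndexWriter(directory, new KeywordAnalyzer(), true,
                                     IndexWriter.MaxFieldLength.LIMITED);

        // One Document per Id, so each value can be counted as its own hit.
        foreach (var id in ids)
        {
            var document = new Document();
            document.Add(new Field("Id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.AddDocument(document);
        }
        writer.Close();

        // PrefixQuery takes the bare prefix: no trailing "*" and no
        // analysis or lowercasing is applied to the term.
        var query = new PrefixQuery(new Term("Id", "BAUER"));

        var searcher = new IndexSearcher(directory, true);
        var topDocs = searcher.Search(query, 10);
        Console.WriteLine("{0} -> {1} hit(s)", query, topDocs.TotalHits);
        searcher.Close();
    }
}
```

With the index built this way, the six Ids starting with BAUER are the
ones a case-sensitive prefix match should pick up, and NOTBAUER should
not be among them.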


On Tue, Jun 26, 2012 at 6:35 PM, Lingam, ChandraMohan J <
chandramohan.j.lingam@intel.com> wrote:

> Just did a simple test and Keywordanalyzer does indeed work like a prefix
> query if you put a star at the end. Agree with Simon.  Most likely luke was
> using keyword analyzer and somehow UI was not reflecting it?
>
> Please post a small snippet of your index code and query code...
>
> -----Original Message-----
> From: Rob Cecil [mailto:rob.cecil@gmail.com]
> Sent: Tuesday, June 26, 2012 5:25 PM
> To: lucene-net-user@lucene.apache.org
> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>
> Thanks, and there is no equivalent QueryParser syntax for that?
>
> On Tue, Jun 26, 2012 at 6:21 PM, Lingam, ChandraMohan J <
> chandramohan.j.lingam@intel.com> wrote:
>
> > actually, that makes sense. Keyword analyzer would try for an exact
> match.
> >  Since you are looking for prefix based search, your best option is to
> > simply use PrefixQuery and there is no need to put a "*" for prefixquery.
> >
> > -----Original Message-----
> > From: Rob Cecil [mailto:rob.cecil@gmail.com]
> > Sent: Tuesday, June 26, 2012 4:57 PM
> > To: lucene-net-user@lucene.apache.org
> > Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> >
> > That is correct. I've verified in Luke 1.0.1 that both analyzers
> > produce the same results.
> >
> > To make it interesting, back in my code, I switched over to using the
> > KeywordAnalyzer, and I'm still not getting any results against that
> > NOT_ANALYZED field.
> >
> > ?
> >
> > On Tue, Jun 26, 2012 at 5:52 PM, Lingam, ChandraMohan J <
> > chandramohan.j.lingam@intel.com> wrote:
> >
> > > Luke using keyword analyzer as default makes sense. However, in the
> > > original post, there was a link to luke output screenshot which
> > > showed that standard analyzer was in use for query parsing.
> > >
> > > -----Original Message-----
> > > From: Simon Svensson [mailto:sisve@devhost.se]
> > > Sent: Tuesday, June 26, 2012 2:56 PM
> > > To: lucene-net-user@lucene.apache.org
> > > Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> > >
> > > Luke defaults to KeywordAnalyzer which won't change your term in any
> way.
> > > The QueryParser will still break up your query, so "Name:Jack Bauer"
> > > would become (Name:Jack DefaultField:Bauer). I believe you can have
> > > per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer for
> > > everything else) using a PerFieldAnalyzerWrapper.
> > >
> > > On 2012-06-26 23:06, Lingam, ChandraMohan J wrote:
> > > > QueryParser has no knowledge of how data was indexed.  For your
> > > scenario, I don't believe you would be able to use Query Parser with
> > > standard analyzer when data was originally indexed with
> > > Field.Index.NOT_ANALYZED option.
> > > >
> > > > Interesting question is why is luke working/finding the match?  I
> > > > would
> > > have expected Luke to not find any matches.
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Rob Cecil [mailto:rob.cecil@gmail.com]
> > > > Sent: Tuesday, June 26, 2012 12:54 PM
> > > > To: lucene-net-user@lucene.apache.org
> > > > Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> > > >
> > > > I can definitely try that. I just expected QueryParser would
> > > > respect the
> > > case of the source string. I was hoping to avoid using the Query API
> > > per-se, and just let the parser do the work for me.
> > > >
> > > > On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J <
> > > chandramohan.j.lingam@intel.com> wrote:
> > > >
> > > >>>> var query = _parser.Parse("Id:BAUER*");
> > > >> In your code, most likely, the value got converted to lower case
> (i.e.
> > > >> bauer*) by the parse statement.
> > > >> Whereas indexed value is in upper case as it is not analyzed
> > > >> (from screen shot).
> > > >>
> > > >> Can you explicitly try using prefix query?
> > > >>
> > > >>
> > > >>
> > > >>> Same results, apparently, when I use Luke 1.0.1.
> > > >>>
> > > >>> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my
> > > >>> custom app, zero.
> > > >>>
> > > >>> On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse
> > > >>> <rv...@dotnetrdf.org>
> > > >> wrote:
> > > >>>> You appear to be using Luke 3.5 which per the information on
> > > >>>> the Luke homepage (http://code.google.com/p/luke/) uses Lucene
> > > >>>> 3.5
> > > >>>>
> > > >>>> Since Lucene.Net is currently on 2.9.4 I wouldn't be surprised
> > > >>>> to see different behavior between the API and executing in Luke.
> > > >>>>
> > > >>>> If you use a version of Luke which more closely aligns with the
> > > >>>> version
> > > >>> of
> > > >>>> Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1 which should be close
> > > >>>> enough since the 2.9.x releases were previews of the 3.0.x
> > > >>>> releases as I understood it) what behavior do you see?
> > > >>>>
> > > >>>> Hope this helps,
> > > >>>>
> > > >>>> Rob
> > > >>>>
> > > >>>> On 6/26/12 10:50 AM, "Rob Cecil" <ro...@gmail.com> wrote:
> > > >>>>
> > > >>>>> If I run a query against my index using QueryParser to query a
> > field:
> > > >>>>>
> > > >>>>>                 var query = _parser.Parse("Id:BAUER*");
> > > >>>>>                 var topDocs = searcher.Search(query, 10);
> > > >>>>>                 Assert.AreEqual(count, topDocs.TotalHits);
> > > >>>>>
> > > >>>>> I get 0 for my TotalHits, yet in Luke, the same query phrase
> > > >>>>> yields
> > > >>>>> 15 results, what am I doing wrong? I use the StandardAnalyzer
> > > >>>>> both to create the index and to query.
> > > >>>>>
> > > >>>>> The field is defined as:
> > > >>>>>
> > > >>>>> new Field("Id", myObject.Id, Field.Store.YES,
> > > >>>>> Field.Index.NOT_ANALYZED)
> > > >>>>>
> > > >>>>> and is a string field. The result set back from Luke looks
> > > >>>>> like
> > > >>>>> (screencap):
> > > >>>>>
> > > >>>>> http://screencast.com/t/NooMK2Rf
> > > >>>>>
> > > >>>>> Thanks!
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > >
> > >
> > >
> >
>

RE: SPAM-HIGH: Disparity between API usage and Luke

Posted by "Lingam, ChandraMohan J" <ch...@intel.com>.

Just did a simple test, and KeywordAnalyzer does indeed work like a prefix query if you put a star at the end. Agree with Simon.  Most likely Luke was using the keyword analyzer and somehow the UI was not reflecting it?

Please post a small snippet of your index code and query code...

-----Original Message-----
From: Rob Cecil [mailto:rob.cecil@gmail.com] 
Sent: Tuesday, June 26, 2012 5:25 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: SPAM-HIGH: Disparity between API usage and Luke

Thanks, and there is no equivalent QueryParser syntax for that?

On Tue, Jun 26, 2012 at 6:21 PM, Lingam, ChandraMohan J < chandramohan.j.lingam@intel.com> wrote:

> actually, that makes sense. Keyword analyzer would try for an exact match.
>  Since you are looking for prefix based search, your best option is to 
> simply use PrefixQuery and there is no need to put a "*" for prefixquery.
>
> -----Original Message-----
> From: Rob Cecil [mailto:rob.cecil@gmail.com]
> Sent: Tuesday, June 26, 2012 4:57 PM
> To: lucene-net-user@lucene.apache.org
> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>
> That is correct. I've verified in Luke 1.0.1 that both analyzers 
> produce the same results.
>
> To make it interesting, back in my code, I switched over to using the 
> KeywordAnalyzer, and I'm still not getting any results against that 
> NOT_ANALYZED field.
>
> ?
>
> On Tue, Jun 26, 2012 at 5:52 PM, Lingam, ChandraMohan J < 
> chandramohan.j.lingam@intel.com> wrote:
>
> > Luke using keyword analyzer as default makes sense. However, in the 
> > original post, there was a link to luke output screenshot which 
> > showed that standard analyzer was in use for query parsing.
> >
> > -----Original Message-----
> > From: Simon Svensson [mailto:sisve@devhost.se]
> > Sent: Tuesday, June 26, 2012 2:56 PM
> > To: lucene-net-user@lucene.apache.org
> > Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> >
> > Luke defaults to KeywordAnalyzer which wont change your term in any way.
> > The QueryParser will still break up your query, so "Name:Jack Bauer"
> > would become (Name:Jack DefaultField:Bauer). I believe you can have 
> > per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer for 
> > everything else) using a PerFieldAnalyzerWrapper.
> >
> > On 2012-06-26 23:06, Lingam, ChandraMohan J wrote:
> > > QueryParser has no knowledge of how data was indexed.  For your
> > scenario, I don't believe you would be able to use Query Parser with 
> > standard analyzer when data was originally indexed with 
> > Field.Index.NOT_ANALYZED option.
> > >
> > > Interesting question is why is luke working/finding the match?  I 
> > > would
> > have expected Luke to not find any matches.
> > >
> > >
> > > -----Original Message-----
> > > From: Rob Cecil [mailto:rob.cecil@gmail.com]
> > > Sent: Tuesday, June 26, 2012 12:54 PM
> > > To: lucene-net-user@lucene.apache.org
> > > Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> > >
> > > I can definitely try that. I just expected QueryParser would 
> > > respect the
> > case of the source string. I was hoping to avoid using the Query API 
> > per-se, and just let the parser to the work for me.
> > >
> > > On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J <
> > chandramohan.j.lingam@intel.com> wrote:
> > >
> > >>>> var query = _parser.Parse("Id:BAUER*");
> > >> In your code, most likely, the value got converted to lower case (i.e.
> > >> bauer*) by the parse statement.
> > >> Whereas indexed value is in upper case as it is not analyzed 
> > >> (from screen shot).
> > >>
> > >> Can you explicitly try using prefix query?
> > >>
> > >>
> > >>
> > >>> Same results, apparently, when I use Luke 1.0.1.
> > >>>
> > >>> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my 
> > >>> custom app, zero.
> > >>>
> > >>> On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse 
> > >>> <rv...@dotnetrdf.org>
> > >> wrote:
> > >>>> You appear to be using Luke 3.5 which per the information on 
> > >>>> the Luke homepage (http://code.google.com/p/luke/) uses Lucene 
> > >>>> 3.5
> > >>>>
> > >>>> Since Lucene.Net is currently on 2.9.4 I wouldn't be surprised 
> > >>>> to see different behavior between the API and executing in Luke.
> > >>>>
> > >>>> If you use a version of Luke which more closely aligns with the 
> > >>>> version
> > >>> of
> > >>>> Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1 which should be close 
> > >>>> enough since the 2.9.x releases were previews of the 3.0.x 
> > >>>> releases as I understood it) what behavior do you see?
> > >>>>
> > >>>> Hope this helps,
> > >>>>
> > >>>> Rob
> > >>>>
> > >>>> On 6/26/12 10:50 AM, "Rob Cecil" <ro...@gmail.com> wrote:
> > >>>>
> > >>>>> If I run a query against my index using QueryParser to query a
> field:
> > >>>>>
> > >>>>>                 var query = _parser.Parse("Id:BAUER*");
> > >>>>>                 var topDocs = searcher.Search(query, 10);
> > >>>>>                 Assert.AreEqual(count, topDocs.TotalHits);
> > >>>>>
> > >>>>> I get 0 for my TotalHits, yet in Luke, the same query phrase 
> > >>>>> yields
> > >>>>> 15 results, what am I doing wrong? I use the StandardAnalyzer 
> > >>>>> both to create the index and to query.
> > >>>>>
> > >>>>> The field is defined as:
> > >>>>>
> > >>>>> new Field("Id", myObject.Id, Field.Store.YES,
> > >>>>> Field.Index.NOT_ANALYZED)
> > >>>>>
> > >>>>> and is a string field. The result set back from Luke looks 
> > >>>>> like
> > >>>>> (screencap):
> > >>>>>
> > >>>>> http://screencast.com/t/NooMK2Rf
> > >>>>>
> > >>>>> Thanks!
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> >
> >
> >
>

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Rob Cecil <ro...@gmail.com>.

Thanks, and there is no equivalent QueryParser syntax for that?

On Tue, Jun 26, 2012 at 6:21 PM, Lingam, ChandraMohan J <
chandramohan.j.lingam@intel.com> wrote:

> actually, that makes sense. Keyword analyzer would try for an exact match.
>  Since you are looking for prefix based search, your best option is to
> simply use PrefixQuery and there is no need to put a "*" for prefixquery.
>
> -----Original Message-----
> From: Rob Cecil [mailto:rob.cecil@gmail.com]
> Sent: Tuesday, June 26, 2012 4:57 PM
> To: lucene-net-user@lucene.apache.org
> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>
> That is correct. I've verified in Luke 1.0.1 that both analyzers produce
> the same results.
>
> To make it interesting, back in my code, I switched over to using the
> KeywordAnalyzer, and I'm still not getting any results against that
> NOT_ANALYZED field.
>
> ?
>
> On Tue, Jun 26, 2012 at 5:52 PM, Lingam, ChandraMohan J <
> chandramohan.j.lingam@intel.com> wrote:
>
> > Luke using keyword analyzer as default makes sense. However, in the
> > original post, there was a link to luke output screenshot which showed
> > that standard analyzer was in use for query parsing.
> >
> > -----Original Message-----
> > From: Simon Svensson [mailto:sisve@devhost.se]
> > Sent: Tuesday, June 26, 2012 2:56 PM
> > To: lucene-net-user@lucene.apache.org
> > Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> >
> > Luke defaults to KeywordAnalyzer which wont change your term in any way.
> > The QueryParser will still break up your query, so "Name:Jack Bauer"
> > would become (Name:Jack DefaultField:Bauer). I believe you can have
> > per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer for
> > everything else) using a PerFieldAnalyzerWrapper.
> >
> > On 2012-06-26 23:06, Lingam, ChandraMohan J wrote:
> > > QueryParser has no knowledge of how data was indexed.  For your
> > scenario, I don't believe you would be able to use Query Parser with
> > standard analyzer when data was originally indexed with
> > Field.Index.NOT_ANALYZED option.
> > >
> > > Interesting question is why is luke working/finding the match?  I
> > > would
> > have expected Luke to not find any matches.
> > >
> > >
> > > -----Original Message-----
> > > From: Rob Cecil [mailto:rob.cecil@gmail.com]
> > > Sent: Tuesday, June 26, 2012 12:54 PM
> > > To: lucene-net-user@lucene.apache.org
> > > Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> > >
> > > I can definitely try that. I just expected QueryParser would respect
> > > the
> > case of the source string. I was hoping to avoid using the Query API
> > per-se, and just let the parser to the work for me.
> > >
> > > On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J <
> > chandramohan.j.lingam@intel.com> wrote:
> > >
> > >>>> var query = _parser.Parse("Id:BAUER*");
> > >> In your code, most likely, the value got converted to lower case (i.e.
> > >> bauer*) by the parse statement.
> > >> Whereas indexed value is in upper case as it is not analyzed (from
> > >> screen shot).
> > >>
> > >> Can you explicitly try using prefix query?
> > >>
> > >>
> > >>
> > >>> Same results, apparently, when I use Luke 1.0.1.
> > >>>
> > >>> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my
> > >>> custom app, zero.
> > >>>
> > >>> On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse <rv...@dotnetrdf.org>
> > >> wrote:
> > >>>> You appear to be using Luke 3.5 which per the information on the
> > >>>> Luke homepage (http://code.google.com/p/luke/) uses Lucene 3.5
> > >>>>
> > >>>> Since Lucene.Net is currently on 2.9.4 I wouldn't be surprised to
> > >>>> see different behavior between the API and executing in Luke.
> > >>>>
> > >>>> If you use a version of Luke which more closely aligns with the
> > >>>> version
> > >>> of
> > >>>> Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1 which should be close
> > >>>> enough since the 2.9.x releases were previews of the 3.0.x
> > >>>> releases as I understood it) what behavior do you see?
> > >>>>
> > >>>> Hope this helps,
> > >>>>
> > >>>> Rob
> > >>>>
> > >>>> On 6/26/12 10:50 AM, "Rob Cecil" <ro...@gmail.com> wrote:
> > >>>>
> > >>>>> If I run a query against my index using QueryParser to query a
> field:
> > >>>>>
> > >>>>>                 var query = _parser.Parse("Id:BAUER*");
> > >>>>>                 var topDocs = searcher.Search(query, 10);
> > >>>>>                 Assert.AreEqual(count, topDocs.TotalHits);
> > >>>>>
> > >>>>> I get 0 for my TotalHits, yet in Luke, the same query phrase
> > >>>>> yields
> > >>>>> 15 results, what am I doing wrong? I use the StandardAnalyzer
> > >>>>> both to create the index and to query.
> > >>>>>
> > >>>>> The field is defined as:
> > >>>>>
> > >>>>> new Field("Id", myObject.Id, Field.Store.YES,
> > >>>>> Field.Index.NOT_ANALYZED)
> > >>>>>
> > >>>>> and is a string field. The result set back from Luke looks like
> > >>>>> (screencap):
> > >>>>>
> > >>>>> http://screencast.com/t/NooMK2Rf
> > >>>>>
> > >>>>> Thanks!
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> >
> >
> >
>

RE: SPAM-HIGH: Disparity between API usage and Luke

Posted by "Lingam, ChandraMohan J" <ch...@intel.com>.

Actually, that makes sense. KeywordAnalyzer would try for an exact match. Since you are looking for a prefix-based search, your best option is to simply use PrefixQuery, and there is no need to put a "*" at the end of the term for PrefixQuery.
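
A minimal sketch of the PrefixQuery approach, assuming the "Id" field from the original post. The class and method names (IdQueries, ForIdPrefix) are made up for illustration and are not part of Lucene.Net:

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class IdQueries
{
    // Builds a prefix query directly against the un-analyzed "Id" field.
    // The prefix is used exactly as given: no analyzer runs, no lowercasing
    // happens, and no trailing "*" is needed.
    public static Query ForIdPrefix(string prefix)
    {
        return new PrefixQuery(new Term("Id", prefix));
    }
}
```

It could then be used in place of the parsed query, e.g. searcher.Search(IdQueries.ForIdPrefix("BAUER"), 10). Because the term never passes through QueryParser, the case of "BAUER" is preserved and should match the upper-case values stored by Field.Index.NOT_ANALYZED.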

-----Original Message-----
From: Rob Cecil [mailto:rob.cecil@gmail.com] 
Sent: Tuesday, June 26, 2012 4:57 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: SPAM-HIGH: Disparity between API usage and Luke

That is correct. I've verified in Luke 1.0.1 that both analyzers produce the same results.

To make it interesting, back in my code, I switched over to using the KeywordAnalyzer, and I'm still not getting any results against that NOT_ANALYZED field.

?

On Tue, Jun 26, 2012 at 5:52 PM, Lingam, ChandraMohan J < chandramohan.j.lingam@intel.com> wrote:

> Luke using keyword analyzer as default makes sense. However, in the 
> original post, there was a link to luke output screenshot which showed 
> that standard analyzer was in use for query parsing.
>
> -----Original Message-----
> From: Simon Svensson [mailto:sisve@devhost.se]
> Sent: Tuesday, June 26, 2012 2:56 PM
> To: lucene-net-user@lucene.apache.org
> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>
> Luke defaults to KeywordAnalyzer which wont change your term in any way.
> The QueryParser will still break up your query, so "Name:Jack Bauer"
> would become (Name:Jack DefaultField:Bauer). I believe you can have 
> per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer for 
> everything else) using a PerFieldAnalyzerWrapper.
>
> On 2012-06-26 23:06, Lingam, ChandraMohan J wrote:
> > QueryParser has no knowledge of how data was indexed.  For your
> scenario, I don't believe you would be able to use Query Parser with 
> standard analyzer when data was originally indexed with 
> Field.Index.NOT_ANALYZED option.
> >
> > Interesting question is why is luke working/finding the match?  I 
> > would
> have expected Luke to not find any matches.
> >
> >
> > -----Original Message-----
> > From: Rob Cecil [mailto:rob.cecil@gmail.com]
> > Sent: Tuesday, June 26, 2012 12:54 PM
> > To: lucene-net-user@lucene.apache.org
> > Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> >
> > I can definitely try that. I just expected QueryParser would respect 
> > the
> case of the source string. I was hoping to avoid using the Query API 
> per-se, and just let the parser to the work for me.
> >
> > On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J <
> chandramohan.j.lingam@intel.com> wrote:
> >
> >>>> var query = _parser.Parse("Id:BAUER*");
> >> In your code, most likely, the value got converted to lower case (i.e.
> >> bauer*) by the parse statement.
> >> Whereas indexed value is in upper case as it is not analyzed (from 
> >> screen shot).
> >>
> >> Can you explicitly try using prefix query?
> >>
> >>
> >>
> >>> Same results, apparently, when I use Luke 1.0.1.
> >>>
> >>> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my 
> >>> custom app, zero.
> >>>
> >>> On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse <rv...@dotnetrdf.org>
> >> wrote:
> >>>> You appear to be using Luke 3.5 which per the information on the 
> >>>> Luke homepage (http://code.google.com/p/luke/) uses Lucene 3.5
> >>>>
> >>>> Since Lucene.Net is currently on 2.9.4 I wouldn't be surprised to 
> >>>> see different behavior between the API and executing in Luke.
> >>>>
> >>>> If you use a version of Luke which more closely aligns with the 
> >>>> version
> >>> of
> >>>> Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1 which should be close 
> >>>> enough since the 2.9.x releases were previews of the 3.0.x 
> >>>> releases as I understood it) what behavior do you see?
> >>>>
> >>>> Hope this helps,
> >>>>
> >>>> Rob
> >>>>
> >>>> On 6/26/12 10:50 AM, "Rob Cecil" <ro...@gmail.com> wrote:
> >>>>
> >>>>> If I run a query against my index using QueryParser to query a field:
> >>>>>
> >>>>>                 var query = _parser.Parse("Id:BAUER*");
> >>>>>                 var topDocs = searcher.Search(query, 10);
> >>>>>                 Assert.AreEqual(count, topDocs.TotalHits);
> >>>>>
> >>>>> I get 0 for my TotalHits, yet in Luke, the same query phrase 
> >>>>> yields
> >>>>> 15 results, what am I doing wrong? I use the StandardAnalyzer 
> >>>>> both to create the index and to query.
> >>>>>
> >>>>> The field is defined as:
> >>>>>
> >>>>> new Field("Id", myObject.Id, Field.Store.YES,
> >>>>> Field.Index.NOT_ANALYZED)
> >>>>>
> >>>>> and is a string field. The result set back from Luke looks like
> >>>>> (screencap):
> >>>>>
> >>>>> http://screencast.com/t/NooMK2Rf
> >>>>>
> >>>>> Thanks!
> >>>>
> >>>>
> >>>>
> >>>>
>
>
>

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Rob Cecil <ro...@gmail.com>.

That is correct. I've verified in Luke 1.0.1 that both analyzers produce
the same results.

To make it interesting, back in my code, I switched over to using the
KeywordAnalyzer, and I'm still not getting any results against that
NOT_ANALYZED field.

?

On Tue, Jun 26, 2012 at 5:52 PM, Lingam, ChandraMohan J <
chandramohan.j.lingam@intel.com> wrote:

> Luke using keyword analyzer as default makes sense. However, in the
> original post, there was a link to luke output screenshot which showed that
> standard analyzer was in use for query parsing.
>
> -----Original Message-----
> From: Simon Svensson [mailto:sisve@devhost.se]
> Sent: Tuesday, June 26, 2012 2:56 PM
> To: lucene-net-user@lucene.apache.org
> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>
> Luke defaults to KeywordAnalyzer which wont change your term in any way.
> The QueryParser will still break up your query, so "Name:Jack Bauer"
> would become (Name:Jack DefaultField:Bauer). I believe you can have
> per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer for
> everything else) using a PerFieldAnalyzerWrapper.
>
> On 2012-06-26 23:06, Lingam, ChandraMohan J wrote:
> > QueryParser has no knowledge of how data was indexed.  For your
> scenario, I don't believe you would be able to use Query Parser with
> standard analyzer when data was originally indexed with
> Field.Index.NOT_ANALYZED option.
> >
> > Interesting question is why is luke working/finding the match?  I would
> have expected Luke to not find any matches.
> >
> >
> > -----Original Message-----
> > From: Rob Cecil [mailto:rob.cecil@gmail.com]
> > Sent: Tuesday, June 26, 2012 12:54 PM
> > To: lucene-net-user@lucene.apache.org
> > Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> >
> > I can definitely try that. I just expected QueryParser would respect the
> case of the source string. I was hoping to avoid using the Query API
> per-se, and just let the parser to the work for me.
> >
> > On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J <
> chandramohan.j.lingam@intel.com> wrote:
> >
> >>>> var query = _parser.Parse("Id:BAUER*");
> >> In your code, most likely, the value got converted to lower case (i.e.
> >> bauer*) by the parse statement.
> >> Whereas indexed value is in upper case as it is not analyzed (from
> >> screen shot).
> >>
> >> Can you explicitly try using prefix query?
> >>
> >>
> >>
> >>> Same results, apparently, when I use Luke 1.0.1.
> >>>
> >>> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my
> >>> custom app, zero.
> >>>
> >>> On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse <rv...@dotnetrdf.org>
> >> wrote:
> >>>> You appear to be using Luke 3.5 which per the information on the
> >>>> Luke homepage (http://code.google.com/p/luke/) uses Lucene 3.5
> >>>>
> >>>> Since Lucene.Net is currently on 2.9.4 I wouldn't be surprised to
> >>>> see different behavior between the API and executing in Luke.
> >>>>
> >>>> If you use a version of Luke which more closely aligns with the
> >>>> version
> >>> of
> >>>> Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1 which should be close
> >>>> enough since the 2.9.x releases were previews of the 3.0.x releases
> >>>> as I understood it) what behavior do you see?
> >>>>
> >>>> Hope this helps,
> >>>>
> >>>> Rob
> >>>>
> >>>> On 6/26/12 10:50 AM, "Rob Cecil" <ro...@gmail.com> wrote:
> >>>>
> >>>>> If I run a query against my index using QueryParser to query a field:
> >>>>>
> >>>>>                 var query = _parser.Parse("Id:BAUER*");
> >>>>>                 var topDocs = searcher.Search(query, 10);
> >>>>>                 Assert.AreEqual(count, topDocs.TotalHits);
> >>>>>
> >>>>> I get 0 for my TotalHits, yet in Luke, the same query phrase
> >>>>> yields
> >>>>> 15 results, what am I doing wrong? I use the StandardAnalyzer both
> >>>>> to create the index and to query.
> >>>>>
> >>>>> The field is defined as:
> >>>>>
> >>>>> new Field("Id", myObject.Id, Field.Store.YES,
> >>>>> Field.Index.NOT_ANALYZED)
> >>>>>
> >>>>> and is a string field. The result set back from Luke looks like
> >>>>> (screencap):
> >>>>>
> >>>>> http://screencast.com/t/NooMK2Rf
> >>>>>
> >>>>> Thanks!
> >>>>
> >>>>
> >>>>
> >>>>
>
>
>

RE: SPAM-HIGH: Disparity between API usage and Luke

Posted by "Lingam, ChandraMohan J" <ch...@intel.com>.

Luke using the keyword analyzer as its default makes sense. However, the original post linked to a Luke screenshot which showed that the standard analyzer was in use for query parsing.

-----Original Message-----
From: Simon Svensson [mailto:sisve@devhost.se] 
Sent: Tuesday, June 26, 2012 2:56 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: SPAM-HIGH: Disparity between API usage and Luke

Luke defaults to KeywordAnalyzer which wont change your term in any way. 
The QueryParser will still break up your query, so "Name:Jack Bauer" 
would become (Name:Jack DefaultField:Bauer). I believe you can have per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer for everything else) using a PerFieldAnalyzerWrapper.

On 2012-06-26 23:06, Lingam, ChandraMohan J wrote:
> QueryParser has no knowledge of how data was indexed.  For your scenario, I don't believe you would be able to use Query Parser with standard analyzer when data was originally indexed with Field.Index.NOT_ANALYZED option.
>
> Interesting question is why is luke working/finding the match?  I would have expected Luke to not find any matches.
>
>
> -----Original Message-----
> From: Rob Cecil [mailto:rob.cecil@gmail.com]
> Sent: Tuesday, June 26, 2012 12:54 PM
> To: lucene-net-user@lucene.apache.org
> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>
> I can definitely try that. I just expected QueryParser would respect the case of the source string. I was hoping to avoid using the Query API per-se, and just let the parser to the work for me.
>
> On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J < chandramohan.j.lingam@intel.com> wrote:
>
>>>> var query = _parser.Parse("Id:BAUER*");
>> In your code, most likely, the value got converted to lower case (i.e.
>> bauer*) by the parse statement.
>> Whereas indexed value is in upper case as it is not analyzed (from 
>> screen shot).
>>
>> Can you explicitly try using prefix query?
>>
>>
>>
>>> Same results, apparently, when I use Luke 1.0.1.
>>>
>>> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my 
>>> custom app, zero.
>>>
>>> On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse <rv...@dotnetrdf.org>
>> wrote:
>>>> You appear to be using Luke 3.5 which per the information on the 
>>>> Luke homepage (http://code.google.com/p/luke/) uses Lucene 3.5
>>>>
>>>> Since Lucene.Net is currently on 2.9.4 I wouldn't be surprised to 
>>>> see different behavior between the API and executing in Luke.
>>>>
>>>> If you use a version of Luke which more closely aligns with the 
>>>> version
>>> of
>>>> Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1 which should be close 
>>>> enough since the 2.9.x releases were previews of the 3.0.x releases 
>>>> as I understood it) what behavior do you see?
>>>>
>>>> Hope this helps,
>>>>
>>>> Rob
>>>>
>>>> On 6/26/12 10:50 AM, "Rob Cecil" <ro...@gmail.com> wrote:
>>>>
>>>>> If I run a query against my index using QueryParser to query a field:
>>>>>
>>>>>                 var query = _parser.Parse("Id:BAUER*");
>>>>>                 var topDocs = searcher.Search(query, 10);
>>>>>                 Assert.AreEqual(count, topDocs.TotalHits);
>>>>>
>>>>> I get 0 for my TotalHits, yet in Luke, the same query phrase 
>>>>> yields
>>>>> 15 results, what am I doing wrong? I use the StandardAnalyzer both 
>>>>> to create the index and to query.
>>>>>
>>>>> The field is defined as:
>>>>>
>>>>> new Field("Id", myObject.Id, Field.Store.YES,
>>>>> Field.Index.NOT_ANALYZED)
>>>>>
>>>>> and is a string field. The result set back from Luke looks like
>>>>> (screencap):
>>>>>
>>>>> http://screencast.com/t/NooMK2Rf
>>>>>
>>>>> Thanks!
>>>>
>>>>
>>>>
>>>>

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Rob Cecil <ro...@gmail.com>.

So if you want to search a non-analyzed (non-tokenized) field, you should
not use StandardAnalyzer, but something like KeywordAnalyzer?

On Tue, Jun 26, 2012 at 3:56 PM, Simon Svensson <si...@devhost.se> wrote:

> Luke defaults to KeywordAnalyzer which wont change your term in any way.
> The QueryParser will still break up your query, so "Name:Jack Bauer" would
> become (Name:Jack DefaultField:Bauer). I believe you can have per-field
> analyzers (KeywordAnalyzer for Id, StandardAnalyzer for everything else)
> using a PerFieldAnalyzerWrapper.
>
>
> On 2012-06-26 23:06, Lingam, ChandraMohan J wrote:
>
>> QueryParser has no knowledge of how data was indexed.  For your scenario,
>> I don't believe you would be able to use Query Parser with standard
>> analyzer when data was originally indexed with Field.Index.NOT_ANALYZED
>> option.
>>
>> Interesting question is why is luke working/finding the match?  I would
>> have expected Luke to not find any matches.
>>
>>
>> -----Original Message-----
>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>> Sent: Tuesday, June 26, 2012 12:54 PM
>> To: lucene-net-user@lucene.apache.org
>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>
>> I can definitely try that. I just expected QueryParser would respect the
>> case of the source string. I was hoping to avoid using the Query API
>> per-se, and just let the parser to the work for me.
>>
>> On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J <
>> chandramohan.j.lingam@intel.com>
>> wrote:
>>
>>  var query = _parser.Parse("Id:BAUER*");
>>>>>
>>>> In your code, most likely, the value got converted to lower case (i.e.
>>> bauer*) by the parse statement.
>>> Whereas indexed value is in upper case as it is not analyzed (from
>>> screen shot).
>>>
>>> Can you explicitly try using prefix query?
>>>
>>>
>>>
>>>  Same results, apparently, when I use Luke 1.0.1.
>>>>
>>>> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my
>>>> custom app, zero.
>>>>
>>>> On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse <rv...@dotnetrdf.org>
>>>>
>>> wrote:
>>>
>>>> You appear to be using Luke 3.5 which per the information on the
>>>>> Luke homepage (http://code.google.com/p/luke/)
>>>>> uses Lucene 3.5
>>>>>
>>>>> Since Lucene.Net is currently on 2.9.4 I wouldn't be surprised to
>>>>> see different behavior between the API and executing in Luke.
>>>>>
>>>>> If you use a version of Luke which more closely aligns with the
>>>>> version
>>>>>
>>>> of
>>>>
>>>>> Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1 which should be close
>>>>> enough since the 2.9.x releases were previews of the 3.0.x
>>>>> releases as I understood it) what behavior do you see?
>>>>>
>>>>> Hope this helps,
>>>>>
>>>>> Rob
>>>>>
>>>>> On 6/26/12 10:50 AM, "Rob Cecil" <ro...@gmail.com> wrote:
>>>>>
>>>>>  If I run a query against my index using QueryParser to query a field:
>>>>>>
>>>>>>                var query = _parser.Parse("Id:BAUER*");
>>>>>>                var topDocs = searcher.Search(query, 10);
>>>>>>                Assert.AreEqual(count, topDocs.TotalHits);
>>>>>>
>>>>>> I get 0 for my TotalHits, yet in Luke, the same query phrase
>>>>>> yields
>>>>>> 15 results, what am I doing wrong? I use the StandardAnalyzer
>>>>>> both to create the index and to query.
>>>>>>
>>>>>> The field is defined as:
>>>>>>
>>>>>> new Field("Id", myObject.Id, Field.Store.YES,
>>>>>> Field.Index.NOT_ANALYZED)
>>>>>>
>>>>>> and is a string field. The result set back from Luke looks like
>>>>>> (screencap):
>>>>>>
>>>>>> http://screencast.com/t/NooMK2Rf
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>
>

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Simon Svensson <si...@devhost.se>.

Luke defaults to KeywordAnalyzer, which won't change your term in any way.
The QueryParser will still break up your query, so "Name:Jack Bauer"
would become (Name:Jack DefaultField:Bauer). I believe you can have
per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer for
everything else) using a PerFieldAnalyzerWrapper.
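
A rough sketch of that per-field setup, assuming the Lucene.Net 2.9.x API (Version.LUCENE_29) and the "Id" field from the original post. The class and method names (SearchAnalyzers, Create) are hypothetical:

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Version = Lucene.Net.Util.Version;

public static class SearchAnalyzers
{
    // StandardAnalyzer is the default for every field; the "Id" field is
    // overridden with KeywordAnalyzer so its value is kept as one exact term.
    public static Analyzer Create()
    {
        var wrapper = new PerFieldAnalyzerWrapper(
            new StandardAnalyzer(Version.LUCENE_29));
        wrapper.AddAnalyzer("Id", new KeywordAnalyzer());
        return wrapper;
    }
}
```

The same analyzer instance would be handed to both the IndexWriter and the QueryParser. Note that this only affects terms that actually go through the analyzer. Prefix and wildcard terms such as BAUER* are not analyzed by QueryParser, so this alone may not change the result of the query in the original post.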

On 2012-06-26 23:06, Lingam, ChandraMohan J wrote:
> QueryParser has no knowledge of how data was indexed.  For your scenario, I don't believe you would be able to use Query Parser with standard analyzer when data was originally indexed with Field.Index.NOT_ANALYZED option.
>
> Interesting question is why is luke working/finding the match?  I would have expected Luke to not find any matches.
>
>
> -----Original Message-----
> From: Rob Cecil [mailto:rob.cecil@gmail.com]
> Sent: Tuesday, June 26, 2012 12:54 PM
> To: lucene-net-user@lucene.apache.org
> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>
> I can definitely try that. I just expected QueryParser would respect the case of the source string. I was hoping to avoid using the Query API per-se, and just let the parser to the work for me.
>
> On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J < chandramohan.j.lingam@intel.com> wrote:
>
>>>> var query = _parser.Parse("Id:BAUER*");
>> In your code, most likely, the value got converted to lower case (i.e.
>> bauer*) by the parse statement.
>> Whereas indexed value is in upper case as it is not analyzed (from
>> screen shot).
>>
>> Can you explicitly try using prefix query?
>>
>>
>>
>>> Same results, apparently, when I use Luke 1.0.1.
>>>
>>> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my
>>> custom app, zero.
>>>
>>> On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse <rv...@dotnetrdf.org>
>> wrote:
>>>> You appear to be using Luke 3.5 which per the information on the
>>>> Luke homepage (http://code.google.com/p/luke/) uses Lucene 3.5
>>>>
>>>> Since Lucene.Net is currently on 2.9.4 I wouldn't be surprised to
>>>> see different behavior between the API and executing in Luke.
>>>>
>>>> If you use a version of Luke which more closely aligns with the
>>>> version
>>> of
>>>> Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1 which should be close
>>>> enough since the 2.9.x releases were previews of the 3.0.x
>>>> releases as I understood it) what behavior do you see?
>>>>
>>>> Hope this helps,
>>>>
>>>> Rob
>>>>
>>>> On 6/26/12 10:50 AM, "Rob Cecil" <ro...@gmail.com> wrote:
>>>>
>>>>> If I run a query against my index using QueryParser to query a field:
>>>>>
>>>>>                 var query = _parser.Parse("Id:BAUER*");
>>>>>                 var topDocs = searcher.Search(query, 10);
>>>>>                 Assert.AreEqual(count, topDocs.TotalHits);
>>>>>
>>>>> I get 0 for my TotalHits, yet in Luke, the same query phrase
>>>>> yields
>>>>> 15 results, what am I doing wrong? I use the StandardAnalyzer
>>>>> both to create the index and to query.
>>>>>
>>>>> The field is defined as:
>>>>>
>>>>> new Field("Id", myObject.Id, Field.Store.YES,
>>>>> Field.Index.NOT_ANALYZED)
>>>>>
>>>>> and is a string field. The result set back from Luke looks like
>>>>> (screencap):
>>>>>
>>>>> http://screencast.com/t/NooMK2Rf
>>>>>
>>>>> Thanks!
>>>>
>>>>
>>>>
>>>>

RE: SPAM-HIGH: Disparity between API usage and Luke

Posted by "Lingam, ChandraMohan J" <ch...@intel.com>.

QueryParser has no knowledge of how the data was indexed. For your scenario, I don't believe you would be able to use QueryParser with the standard analyzer when the data was originally indexed with the Field.Index.NOT_ANALYZED option.

The interesting question is why Luke is finding the match. I would have expected Luke to not find any matches.


-----Original Message-----
From: Rob Cecil [mailto:rob.cecil@gmail.com] 
Sent: Tuesday, June 26, 2012 12:54 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: SPAM-HIGH: Disparity between API usage and Luke

I can definitely try that. I just expected QueryParser would respect the case of the source string. I was hoping to avoid using the Query API per-se, and just let the parser to the work for me.

On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J < chandramohan.j.lingam@intel.com> wrote:

> >> var query = _parser.Parse("Id:BAUER*");
>
> In your code, most likely, the value got converted to lower case (i.e.
> bauer*) by the parse statement.
> Whereas indexed value is in upper case as it is not analyzed (from 
> screen shot).
>
> Can you explicitly try using prefix query?

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Rob Cecil <ro...@gmail.com>.

I can definitely try that. I just expected QueryParser would respect the
case of the source string. I was hoping to avoid using the Query API
per se, and just let the parser do the work for me.

On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J <
chandramohan.j.lingam@intel.com> wrote:

> >> var query = _parser.Parse("Id:BAUER*");
>
> In your code, most likely, the value got converted to lower case (i.e.
> bauer*) by the parse statement.
> Whereas indexed value is in upper case as it is not analyzed (from screen
> shot).
>
> Can you explicitly try using prefix query?

RE: SPAM-HIGH: Disparity between API usage and Luke

Posted by "Lingam, ChandraMohan J" <ch...@intel.com>.

>> var query = _parser.Parse("Id:BAUER*");

In your code, most likely, the value got converted to lower case (i.e. bauer*) by the parse statement,
whereas the indexed value is in upper case, as it is not analyzed (judging from the screenshot).

Can you try explicitly using a PrefixQuery?
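
For reference, a minimal sketch of the explicit prefix query suggested above. It bypasses QueryParser, so nothing lower-cases the term. The class and method names (IdPrefixSearch, BuildIdPrefixQuery, FindByIdPrefix) are made up for illustration; PrefixQuery, Term and IndexSearcher.Search(Query, int) are the stock Lucene.Net types, and the searcher is assumed to be already opened on the index in question.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper, not part of Lucene.Net.
public static class IdPrefixSearch
{
    // Builds the query object directly, so the prefix is compared
    // against the indexed terms exactly as typed (case preserved).
    public static PrefixQuery BuildIdPrefixQuery(string prefix)
    {
        return new PrefixQuery(new Term("Id", prefix));
    }

    // Returns the top 10 documents whose Id term starts with the prefix.
    public static TopDocs FindByIdPrefix(IndexSearcher searcher, string prefix)
    {
        return searcher.Search(BuildIdPrefixQuery(prefix), 10);
    }
}
```

With the NOT_ANALYZED Id field from the original post, FindByIdPrefix(searcher, "BAUER") should match the upper-case terms, whereas a lower-cased prefix would not. The lower-casing of wildcard and prefix terms is also a setting on the parser itself (SetLowercaseExpandedTerms in the 2.9 line, if memory serves), so switching that off may be an alternative to building the query by hand.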



> Same results, apparently, when I use Luke 1.0.1.
>
> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my custom 
> app, zero.

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Itamar Syn-Hershko <it...@code972.com>.

It doesn't matter what analyzer you use if you index the field with Field.Index.NOT_ANALYZED

On Tue, Jun 26, 2012 at 9:48 PM, Rob Cecil <ro...@gmail.com> wrote:

> Same results, apparently, when I use Luke 1.0.1.
>
> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my custom app,
> zero.

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Rob Cecil <ro...@gmail.com>.

Same results, apparently, when I use Luke 1.0.1.

When I search for "Id:BAUER*" I get 15 hits in Luke, but in my custom app,
zero.

On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse <rv...@dotnetrdf.org> wrote:

> You appear to be using Luke 3.5 which per the information on the Luke
> homepage (http://code.google.com/p/luke/) uses Lucene 3.5
>
> Since Lucene.Net is currently on 2.9.4 I wouldn't be surprised to see
> different behavior between the API and executing in Luke.
>
> If you use a version of Luke which more closely aligns with the version of
> Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1 which should be close enough
> since the 2.9.x releases were previews of the 3.0.x releases as I
> understood it) what behavior do you see?
>
> Hope this helps,
>
> Rob

RE: Disparity between API usage and Luke

Posted by Moray McConnachie <mm...@oxford-analytica.com>.

I don't have time to write self-contained examples, but here are our
keyword analyzer related classes.

Caveat: we are programming against an older version of Lucene.NET, and I
haven't been keeping up with API changes, so this may not work in newer
versions. However, the principles should be the same. There may also now
be better ways of achieving this: a number of things we "rolled our own"
with earlier Lucene versions have since ended up with better approaches
using fewer custom classes.

	/// <summary>
	/// Trivial case-sensitive string analyzer for simple fields.
	/// </summary>
	public class lucSingleStringAnalyzer : Lucene.Net.Analysis.Analyzer
	{
		/// <summary>
		/// instantiate
		/// </summary>
		public lucSingleStringAnalyzer():base()
		{
		}
		/// <summary>
		/// The worker - simply applies FullTermTokenizer to the textreader
		/// </summary>
		/// <param name="fieldName">Name of the field</param>
		/// <param name="reader">TextReader</param>
		/// <returns>Standard Lucene TokenStream</returns>
		public override Lucene.Net.Analysis.TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
		{
			return new lucFullTermTokenizer(reader);
		}

	}


	/// <summary>
	/// Analyses a field by reading it all as a single string and lower casing it.
	/// </summary>
	public class lucLowerCaseSingleStringAnalyzer:Lucene.Net.Analysis.Analyzer
	{
		/// <summary>
		/// instantiate
		/// </summary>
		public lucLowerCaseSingleStringAnalyzer():base()
		{
		}
		/// <summary>
		/// return a lowercase filter on our custom <see cref="lucFullTermTokenizer">tokenizer</see>,
		/// i.e. the whole field is returned as a single lower case string, just one token.
		/// </summary>
		/// <param name="fieldName">field name of stream</param>
		/// <param name="reader">TextReader</param>
		/// <returns>standard Lucene.NET Tokenstream</returns>
		public override Lucene.Net.Analysis.TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
		{
			return new Lucene.Net.Analysis.LowerCaseFilter(new lucFullTermTokenizer(reader));
		}

	}

	/// <summary>
	/// A class to read a full string and return all of it as a Lucene Token.
	/// </summary>
	/// <remarks> Simple fields where the whole keyword is the relevant search term, not parts of it
	/// (e.g. United States should only be indexed as "United States", not under "States" and "United"),
	/// can be tokenized with this tokenizer.
	/// </remarks>
	public class lucFullTermTokenizer: Lucene.Net.Analysis.Tokenizer
	{
		/// <summary>
		/// Tracks whether I have already read everything there is to read
		/// </summary>
		private bool blRead;
		/// <summary>
		/// instantiate
		/// </summary>
		public lucFullTermTokenizer():base()
		{
			blRead=false;
		}
		/// <summary>
		/// instantiate with text reader
		/// </summary>
		/// <param name="input">The TextReader passed on by Lucene</param>
		public lucFullTermTokenizer(System.IO.TextReader input):base(input)
		{
			blRead=false;
		}
		
		/// <summary>
		/// returns the next Token. This class returns a single Token per field, so Next should always
		/// return the string value of the field the first time, and null thereafter
		/// </summary>
		/// <returns>A new Lucene.Net Token, or null if there is nothing to read</returns>
		public override Lucene.Net.Analysis.Token Next()
		{
			if (! blRead) 
			{
				int i=0;
				int j;
				string str=base.input.ReadToEnd();
				blRead=true;
				j=str.Length;
				// the end offset is one past the last character of the token
				return new Lucene.Net.Analysis.Token(str,i,j);
			} 
			else 
			{
				return null;
			}

		}
	}

// AND HERE'S THE EXAMPLE OF THE PERFIELDANALYZERWRAPPER USING THE ABOVE
/// <summary>
/// Module containing generic helping hands for Lucene-related stuff.
/// </summary>
public static class lucUtils
	{
	public static Lucene.Net.Analysis.Analyzer lucSpecialAnalyzer {
			get 
			{
				// default analyser is standard - in fact we use our own customised Porter stem analyzer here
				Lucene.Net.Analysis.PerFieldAnalyzerWrapper lucAnalyzer=new Lucene.Net.Analysis.PerFieldAnalyzerWrapper(new Lucene.Net.Analysis.Standard.StandardAnalyzer());
				Lucene.Net.Analysis.Analyzer lcKeywordAnalyzer=new lucLowerCaseSingleStringAnalyzer();
				Lucene.Net.Analysis.Analyzer KeywordAnalyzer=new lucSingleStringAnalyzer();
				// field names are case-sensitive: register each field exactly as it is named in the index
				lucAnalyzer.AddAnalyzer("id", lcKeywordAnalyzer);
				lucAnalyzer.AddAnalyzer("product", KeywordAnalyzer);
				lucAnalyzer.AddAnalyzer("country", lcKeywordAnalyzer);
				return lucAnalyzer;
			}
		}
}


Then we can use the query parser with the same analyser.
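
A rough sketch of that last step, for anyone following along: hand QueryParser the same analyzer the IndexWriter used. It assumes a Lucene.Net 2.9/3.0-style QueryParser constructor that takes a Version argument (older releases omit it); ParserSketch and ParseIdQuery are illustrative names, not part of the classes above. Two things to watch: field names are case-sensitive, so a field indexed as "Id" needs its wrapper entry registered as "Id" rather than "id", and the field has to be indexed with Field.Index.ANALYZED for the keyword analyzer to be applied at index time.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using LuceneVersion = Lucene.Net.Util.Version;

// Hypothetical helper, not part of Lucene.Net.
public static class ParserSketch
{
    // Parses user input with the analyzer that was used for indexing,
    // e.g. the PerFieldAnalyzerWrapper returned by lucUtils.lucSpecialAnalyzer.
    public static Query ParseIdQuery(Analyzer indexTimeAnalyzer, string queryText)
    {
        // "Id" is the default field for terms that carry no field prefix.
        var parser = new QueryParser(LuceneVersion.LUCENE_29, "Id", indexTimeAnalyzer);
        return parser.Parse(queryText);
    }
}
```

Prefix terms such as BAUER* are not run through the analyzer; the parser lower-cases them by default, which lines up with an Id field that was lower-cased at index time.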

M.
-----Original Message-----
From: Rob Cecil [mailto:rob.cecil@gmail.com] 
Sent: 27 June 2012 17:07
To: lucene-net-user@lucene.apache.org
Subject: Re: Disparity between API usage and Luke

Moray, thanks, I did catch that and have been thinking about it. I finally
have the LIA book so some of this stuff is starting to make more sense.
Would you be willing to show your Keyword Analyzer class?

thanks

On Wed, Jun 27, 2012 at 1:57 AM, Moray McConnachie <
mmcconna@oxford-analytica.com> wrote:

> Rob, just in case you missed it in the dialogue earlier, let me
> recommend to your attention the PerFieldAnalyzerWrapper mentioned by
> someone else. This allows you to specify different analysers for
> different fields, but presents as a single analyser. So during indexing
> and searching you benefit from both the analyser and the query parser,
> and can index and search all fields with the one analyser, which avoids
> the problems of having fields which are not analysed.
>
> For fields like Id we use our own version of keyword analyser which
> converts to lower case both on index and search but otherwise
> preserves the term entirely.
>
> The only slight problem is it makes it harder to use tools like Luke
> which use the standard analyser by default.

---------------------------------------------------------
Disclaimer 

This message and any attachments are confidential and/or privileged. If this has been sent to you in error, please do not use, retain or disclose them, and contact the sender as soon as possible.

Oxford Analytica Ltd
Registered in England: No. 1196703
5 Alfred Street, Oxford
United Kingdom, OX1 4EH
---------------------------------------------------------

Re: Disparity between API usage and Luke

Posted by Rob Cecil <ro...@gmail.com>.

Moray, thanks, I did catch that and have been thinking about it. I finally have
the LIA book so some of this stuff is starting to make more sense. Would
you be willing to show your Keyword Analyzer class?

thanks

On Wed, Jun 27, 2012 at 1:57 AM, Moray McConnachie <
mmcconna@oxford-analytica.com> wrote:

> Rob, just in case you missed it in the dialogue earlier, let me recommend
> to your attention the PerFieldAnalyzerWrapper mentioned by someone else.
> This allows you to specify different analysers for different fields, but
> presents as a single analyser. So during indexing and searching you benefit
> from both the analyser and the query parser, and can index and search all
> fields with the one analyser, which avoids the problems of having fields
> which are not analysed.
>
> For fields like Id we use our own version of keyword analyser which
> converts to lower case both on index and search but otherwise preserves the
> term entirely.
>
> The only slight problem is it makes it harder to use tools like Luke which
> use the standard analyser by default.

Re: SPAM-HIGH: Disparity between API usage and Luke

Posted by Rob Vesse <rv...@dotnetrdf.org>.

You appear to be using Luke 3.5 which per the information on the Luke
homepage (http://code.google.com/p/luke/) uses Lucene 3.5

Since Lucene.Net is currently on 2.9.4 I wouldn't be surprised to see
different behavior between the API and executing in Luke.

If you use a version of Luke which more closely aligns with the version of
Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1 which should be close enough
since the 2.9.x releases were previews of the 3.0.x releases as I
understood it) what behavior do you see?

Hope this helps,

Rob

On 6/26/12 10:50 AM, "Rob Cecil" <ro...@gmail.com> wrote:

>If I run a query against my index using QueryParser to query a field:
>
>                var query = _parser.Parse("Id:BAUER*");
>                var topDocs = searcher.Search(query, 10);
>                Assert.AreEqual(count, topDocs.TotalHits);
>
>I get 0 for my TotalHits, yet in Luke, the same query phrase yields 15
>results, what am I doing wrong? I use the StandardAnalyzer both to
>create the index and to query.
>
>The field is defined as:
>
>new Field("Id", myObject.Id, Field.Store.YES, Field.Index.NOT_ANALYZED)
>
>and is a string field. The result set back from Luke looks like
>(screencap):
>
>http://screencast.com/t/NooMK2Rf
>
>Thanks!

Re: Disparity between API usage and Luke

Posted by Moray McConnachie <mm...@oxford-analytica.com>.

Rob, just in case you missed it in the dialogue earlier, let me recommend to your attention the PerFieldAnalyzerWrapper mentioned by someone else. This allows you to specify different analysers for different fields, but presents as a single analyser. So during indexing and searching you benefit from both the analyser and the query parser, and can index and search all fields with the one analyser, which avoids the problems of having fields which are not analysed.

For fields like Id we use our own version of keyword analyser which converts to lower case both on index and search but otherwise preserves the term entirely.

The only slight problem is it makes it harder to use tools like Luke which use the standard analyser by default.

Moray
-------------------------------------
Moray McConnachie
Director of IT    +44 1865 261 600
Oxford Analytica  http://www.oxan.com


----- Original Message -----
From: Rob Cecil [mailto:rob.cecil@gmail.com]
Sent: Tuesday, June 26, 2012 06:50 PM
To: lucene-net-user@lucene.apache.org <lu...@lucene.apache.org>
Subject: Disparity between API usage and Luke

If I run a query against my index using QueryParser to query a field:

                var query = _parser.Parse("Id:BAUER*");
                var topDocs = searcher.Search(query, 10);
                Assert.AreEqual(count, topDocs.TotalHits);

I get 0 for my TotalHits, yet in Luke, the same query phrase yields 15
results, what am I doing wrong? I use the StandardAnalyzer both to
create the index and to query.

The field is defined as:

new Field("Id", myObject.Id, Field.Store.YES, Field.Index.NOT_ANALYZED)

and is a string field. The result set back from Luke looks like (screencap):

http://screencast.com/t/NooMK2Rf

Thanks!
