Posted to user@lucenenet.apache.org by Artem Chereisky <a....@gmail.com> on 2010/03/08 05:26:40 UTC

new 2.9 search API

Hi,

I'm migrating my code to 2.9 and I'm having trouble understanding the new
search API.

Here is some sample 2.4 code:

var searcher = new IndexSearcher(someReader);
searcher.SetSimilarity(new MySimilarity());

var hitCollector = new MyHitCollector();
var query = MyBuildQueryMethod();

searcher.Search(query, hitCollector);

The Collect method of MyHitCollector gets called for every matching document
with a docId and a score based on MySimilarity implementation.

In 2.9 HitCollector is replaced with Collector with 4 abstract methods to
implement. One of them is SetScorer(Scorer). This is where I'm getting lost.
Where do I get an instance of scorer from? There are many, one for each
query type. I think I'm missing something fundamental. Please clarify.

Regards,
Art

Re: new 2.9 search API

Posted by Artem Chereisky <a....@gmail.com>.
Of course, thanks for spotting it Michael. What was I thinking?!

Art

On 09/03/2010, at 15:55, Michael Garski <mg...@myspace-inc.com> wrote:


Re: new 2.9 search API

Posted by Michael Garski <mg...@myspace-inc.com>.
Artem,

You're on the right track, except for step 5. The score comes directly
from the Scorer.Score() method and is used as is. docBase is added to
the document id passed into Collect, not to the score, to get the
overall index document number. The number passed into Collect is the
number of the document in the current segment being searched.
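
A minimal sketch of a collector that follows the correction above, assuming the Lucene.Net 2.9 Collector signatures discussed in this thread. The class name ScoredHitCollector and its Hits list are made up for illustration. The fourth abstract method, AcceptsDocsOutOfOrder, is not named anywhere in the thread; it is included here on the assumption that the 2.9 Collector declares it as a method.

```csharp
using System.Collections.Generic;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical collector: records (index-wide doc id, score) for every match.
public class ScoredHitCollector : Collector
{
    private int _docBase;    // offset of the current segment within the whole index
    private Scorer _scorer;  // supplied by the searcher, never created by the collector

    // Hypothetical result list: Key = index-wide document number, Value = score.
    public readonly List<KeyValuePair<int, float>> Hits =
        new List<KeyValuePair<int, float>>();

    public override void SetScorer(Scorer scorer)
    {
        _scorer = scorer;
    }

    public override void SetNextReader(IndexReader reader, int docBase)
    {
        _docBase = docBase;
    }

    public override void Collect(int doc)
    {
        // doc is relative to the current segment; the score is used unchanged.
        float score = _scorer.Score();
        Hits.Add(new KeyValuePair<int, float>(_docBase + doc, score));
    }

    public override bool AcceptsDocsOutOfOrder()
    {
        // This collector does not depend on the order of Collect calls.
        return true;
    }
}
```

An instance would be passed to the searcher the same way the 2.4 HitCollector was, for example searcher.Search(query, new ScoredHitCollector()).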

Michael

On Mar 8, 2010, at 8:44 PM, "Artem Chereisky" <a....@gmail.com> wrote:



Re: new 2.9 search API

Posted by Artem Chereisky <a....@gmail.com>.
Thanks Michael, you pointed me in the right direction.

To answer my own question and to close the thread, I didn't need to worry
about passing an instance of Scorer to MyCollector instance. The SetScorer
method is called by the Searcher, so to enable scoring I needed to do 5
things:

1. have an instance of Scorer in MyCollector class
2. have an int member _docBase

        private int _docBase;
        private Scorer _scorer;

3. implement SetNextReader

        public override void SetNextReader(Lucene.Net.Index.IndexReader reader, int docBase)
        {
            _docBase = docBase;
        }

4. Implement SetScorer

        public override void SetScorer(Scorer scorer)
        {
            _scorer = scorer;
        }


5. and finally call _scorer.Score() in Collect method adding _docBase to the
result of _scorer.Score().

Thanks,
Art


On Tue, Mar 9, 2010 at 3:11 AM, Michael Garski <mg...@myspace-inc.com> wrote:


RE: new 2.9 search API

Posted by Michael Garski <mg...@myspace-inc.com>.
Artem,

The four methods on the Collector abstract class are invoked by the
searcher that is performing the search; it is up to you to implement
them in your collector as necessary.

With the change to segment-by-segment searching in 2.9, SetScorer and
SetNextReader allow the searcher to pass the current Scorer and
IndexReader to the collector.  
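
The call order described above can be illustrated with a small self-contained simulation. Nothing below is Lucene.Net code: MiniScorer, MiniCollector, RecordingCollector and MiniSearcher are stand-ins invented for this sketch, the IndexReader argument is left out, and the real searcher does considerably more. It only shows that, for each segment, the searcher gives the collector the segment's docBase and a scorer before reporting matches by segment-relative id.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for a scorer: holds the score of the document being collected.
public class MiniScorer
{
    public float Current;
    public float Score() { return Current; }
}

// Stand-in with the same shape as the four Collector methods (reader omitted).
public abstract class MiniCollector
{
    public abstract void SetScorer(MiniScorer scorer);
    public abstract void SetNextReader(int docBase);
    public abstract void Collect(int doc);
    public abstract bool AcceptsDocsOutOfOrder();
}

public class RecordingCollector : MiniCollector
{
    private int _docBase;
    private MiniScorer _scorer;

    // Key = index-wide document number, Value = score.
    public readonly List<KeyValuePair<int, float>> Hits =
        new List<KeyValuePair<int, float>>();

    public override void SetScorer(MiniScorer scorer) { _scorer = scorer; }
    public override void SetNextReader(int docBase) { _docBase = docBase; }
    public override void Collect(int doc)
    {
        Hits.Add(new KeyValuePair<int, float>(_docBase + doc, _scorer.Score()));
    }
    public override bool AcceptsDocsOutOfOrder() { return true; }
}

public static class MiniSearcher
{
    // Each segment is an array of scores indexed by segment-relative doc id.
    // In this toy model a score of 0 means the document did not match.
    public static void Search(float[][] segments, MiniCollector collector)
    {
        int docBase = 0;
        foreach (float[] segment in segments)
        {
            collector.SetNextReader(docBase);  // 1. announce the new segment
            var scorer = new MiniScorer();
            collector.SetScorer(scorer);       // 2. hand over that segment's scorer
            for (int doc = 0; doc < segment.Length; doc++)
            {
                if (segment[doc] == 0f) continue;
                scorer.Current = segment[doc];
                collector.Collect(doc);        // 3. report the match, segment-relative
            }
            docBase += segment.Length;         // the next segment starts after this one
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var collector = new RecordingCollector();
        var segments = new[] { new[] { 0f, 1.5f, 0f }, new[] { 2f, 0f } };
        MiniSearcher.Search(segments, collector);
        foreach (var hit in collector.Hits)
            Console.WriteLine("doc " + hit.Key + " score " + hit.Value);
    }
}
```

With the two toy segments in Main, the matches are collected as segment-relative ids 1 and 0, which the collector turns into index-wide ids 1 and 3 by adding docBase.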

The javadocs have the same content as the .NET documentation comments,
and the one for Collector is:
http://lucene.apache.org/java/2_9_2/api/all/org/apache/lucene/search/Collector.html

Additionally, take a look at the Test project file Search\QueryUtils.cs
- there are two Collector implementations in it -
AnonymousClassCollector & AnonymousClassCollector1 that are good
examples of how to implement a concrete Collector.

Michael



