Posted to solr-user@lucene.apache.org by Jérôme Etévé <je...@gmail.com> on 2009/08/20 20:17:07 UTC

Implementing customized Scorer with solr API 1.4

Hi all,

 I've been struggling with a customized lucene.Scorer of mine since I
moved to Solr 1.4.

 Here's the problem:

 I wrote a DocSetQuery which inherits from lucene.Query. This query
is a decorator for another lucene.Query: it filters out the documents
which are not in a given set of predefined documents (a solr.DocSet
which I call docset).

So in my Weight / Scorer, I implemented the nextDoc method like this:

public int nextDoc() throws IOException {
    do {
        if (decoScorer.nextDoc() == NO_MORE_DOCS) {
            return NO_MORE_DOCS;
        }
        // Keep advancing until the doc is in the docset
    } while (!docset.exists(decoScorer.docID()));
    return decoScorer.docID();
}

The decoScorer here is the decorated scorer.

My problem is that docset contains 'absolute' document IDs, but Solr
now uses a number of sub-readers, each with its own offset, so
decoScorer.docID() returns a 'relative' document ID. As a result, I
end up testing relative document IDs against a set of absolute
document IDs.

So my DocSetQuery does not work anymore. I think the solution would be
to get the offset of the reader currently being used, so that I can
call docset.exists(decoScorer.docID() + offset).
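
To make the relative/absolute mismatch concrete, here is a minimal sketch in plain Java. It does not use the Lucene or Solr classes: DocIdIterator, ArrayDocIdIterator and OffsetFilteringIterator are hypothetical stand-ins for the decorated scorer and the decorating scorer, and a java.util.BitSet stands in for the solr.DocSet. The only point is the docBase + doc lookup.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Small driver: collects the segment-relative IDs that survive the filter. */
class OffsetFilterDemo {
    static List<Integer> collect(int[] segmentDocs, int docBase, BitSet allowedAbsolute) {
        DocIdIterator it = new OffsetFilteringIterator(
                new ArrayDocIdIterator(segmentDocs), docBase, allowedAbsolute);
        List<Integer> hits = new ArrayList<>();
        for (int doc = it.nextDoc(); doc != DocIdIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
            hits.add(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        BitSet allowed = new BitSet(); // absolute (top-level) doc IDs
        allowed.set(3);
        allowed.set(12);
        allowed.set(13);
        // Segment starts at absolute ID 10 and matches relative docs 0, 2 and 3.
        System.out.println(collect(new int[] {0, 2, 3}, 10, allowed)); // [2, 3]
        // Ignoring the offset (docBase = 0) reproduces the bug: only doc 3 matches.
        System.out.println(collect(new int[] {0, 2, 3}, 0, allowed));  // [3]
    }
}

/** Hypothetical stand-in for a per-segment scorer: yields relative doc IDs in order. */
interface DocIdIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;

    int nextDoc();

    int docID();
}

/** Test double iterating over a fixed, sorted array of relative doc IDs. */
class ArrayDocIdIterator implements DocIdIterator {
    private final int[] docs;
    private int pos = -1;

    ArrayDocIdIterator(int[] docs) {
        this.docs = docs;
    }

    @Override
    public int nextDoc() {
        if (pos < docs.length) {
            pos++;
        }
        return docID();
    }

    @Override
    public int docID() {
        if (pos < 0) {
            return -1; // not positioned yet
        }
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }
}

/** Decorator that keeps a doc only if (docBase + relative ID) is in the allowed set. */
class OffsetFilteringIterator implements DocIdIterator {
    private final DocIdIterator inner;
    private final int docBase;
    private final BitSet allowedAbsolute;

    OffsetFilteringIterator(DocIdIterator inner, int docBase, BitSet allowedAbsolute) {
        this.inner = inner;
        this.docBase = docBase;
        this.allowedAbsolute = allowedAbsolute;
    }

    @Override
    public int nextDoc() {
        int doc;
        while ((doc = inner.nextDoc()) != NO_MORE_DOCS) {
            if (allowedAbsolute.get(docBase + doc)) {
                return doc; // still report the relative ID to the caller
            }
        }
        return NO_MORE_DOCS;
    }

    @Override
    public int docID() {
        return inner.docID();
    }
}
```

Note that the decorator translates to an absolute ID only for the membership test and keeps returning relative IDs, since that is what the caller of a per-segment scorer expects.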

But how can I get this offset?
The scorer is built with a lucene.IndexReader as a parameter:
public Scorer scorer(IndexReader reader).

Within Solr, this IndexReader happens to be an instance of
SolrIndexReader, so I thought maybe I could downcast reader to a
SolrIndexReader in order to call the offset-related methods on it
(getBase() etc.).

I feel quite uncomfortable with this solution, since my DocSetQuery
inherits from a Lucene class and it would be odd to downcast
something to a Solr class inside it. Besides, I haven't really figured
out how to use those offset-related methods.
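
For reference, the offset in question is simply the number of document slots in all the sub-readers that come before the current one. The sketch below shows that arithmetic with plain arrays; computeBases and the maxDocs input are hypothetical names used for illustration, not part of the Lucene or Solr API.

```java
import java.util.Arrays;

class SegmentBases {
    /**
     * Given the size (maxDoc) of each sub-reader in order, returns the
     * starting absolute doc ID ("base") of each sub-reader.
     */
    static int[] computeBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int running = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = running;     // first absolute ID of sub-reader i
            running += maxDocs[i];  // next sub-reader starts after this one
        }
        return bases;
    }

    public static void main(String[] args) {
        // Three sub-readers holding 100, 40 and 7 document slots.
        System.out.println(Arrays.toString(computeBases(new int[] {100, 40, 7})));
        // prints [0, 100, 140]
    }
}
```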

Thanks for your help!

All the best!

Jerome Eteve.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jerome@eteve.net

Re: Implementing customized Scorer with solr API 1.4

Posted by Mark Miller <ma...@gmail.com>.

You might be interested in this issue:
http://issues.apache.org/jira/browse/LUCENE-1821

-- 
- Mark

http://www.lucidimagination.com




Re: Implementing customized Scorer with solr API 1.4

Posted by Jason Rutherglen <ja...@gmail.com>.

We should probably move to using Lucene's Filters/DocIdSets
instead of DocSets, and merge the two. Then we would not need to
maintain two separate but similar (and confusing) sets of classes.
It would also make it easier to integrate searching with Solr's
Filters/DocSets into Lucene's new per-segment reader searching,
especially for new filter writers such as yourself. Right now we
have what appears to be duplicated code.
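
As a rough illustration of what per-segment filtering implies, a set of top-level doc IDs has to be presented to each segment in that segment's own numbering. The sketch below does this with java.util.BitSet only; sliceForSegment, docBase and maxDoc are hypothetical names, and this is not how Solr's DocSet or Lucene's DocIdSet are actually implemented.

```java
import java.util.BitSet;

class SegmentSlice {
    /**
     * Returns the part of a top-level doc ID set that falls inside one
     * segment, re-based so that the segment's first document is ID 0.
     */
    static BitSet sliceForSegment(BitSet topLevel, int docBase, int maxDoc) {
        // BitSet.get(from, to) copies bits [from, to) and shifts them down to 0.
        return topLevel.get(docBase, docBase + maxDoc);
    }

    public static void main(String[] args) {
        BitSet topLevel = new BitSet();
        topLevel.set(3);
        topLevel.set(12);
        topLevel.set(13);
        topLevel.set(25);
        // Segment covering absolute IDs 10..19.
        System.out.println(sliceForSegment(topLevel, 10, 10)); // {2, 3}
    }
}
```

Copying a slice costs time proportional to the segment size; a wrapper that adds docBase on each lookup would avoid the copy at the price of one addition per test.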

We probably need several different issues to accomplish what
this requires. One place to start is SOLR-1308, though I suspect
that, given the restructuring required, we'll need to break things
up into several separate issues. I'm not really sure what SOLR-1179
was for.


Re: Implementing customized Scorer with solr API 1.4

Posted by Jérôme Etévé <je...@gmail.com>.

Hi,
 Thanks for your help.

So do I have to do:

public Scorer scorer(IndexReader reader) throws IOException {
    SolrIndexReader solrReader = (SolrIndexReader) reader;
    int offset = solrReader.getBase();

Or is it a bit more complex than that?


Jerome.

2009/8/20 Mark Miller <ma...@gmail.com>:
> It may not feel super clean, but it should be fine - Solr always uses a
> SolrIndexSearcher which always wraps all of the IndexReaders in
> SolrIndexReader. I'm fairly sure anyway ;)
>
> By getting the base of the subreader within the top reader, you can add
> it to the doc id to get the top reader doc id.



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jerome@eteve.net

Re: Implementing customized Scorer with solr API 1.4

Posted by Mark Miller <ma...@gmail.com>.

Jérôme Etévé wrote:
> Within Solr, this IndexReader happens to be an instance of
> SolrIndexReader, so I thought maybe I could downcast reader to a
> SolrIndexReader in order to call the offset-related methods on it
> (getBase() etc.).
>   
It may not feel super clean, but it should be fine - Solr always uses a
SolrIndexSearcher which always wraps all of the IndexReaders in
SolrIndexReader. I'm fairly sure anyway ;)

By getting the base of the subreader within the top reader, you can add
it to the doc id to get the top reader doc id.
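
The mapping described here, and its inverse, can be sketched as follows. This is plain Java over an array of sub-reader bases, not the Solr API: toTopLevel, segmentOf and toLocal are hypothetical helper names, and the sketch assumes the bases are strictly increasing (no empty sub-readers) and that the doc ID is within the index.

```java
import java.util.Arrays;

class DocIdMapping {
    /** Sub-reader doc ID -> top reader doc ID: just add the sub-reader's base. */
    static int toTopLevel(int docBase, int localDoc) {
        return docBase + localDoc;
    }

    /** Index of the sub-reader that contains the given top reader doc ID. */
    static int segmentOf(int[] bases, int topDoc) {
        int idx = Arrays.binarySearch(bases, topDoc);
        // Exact hit: topDoc is the first doc of that sub-reader.
        // Otherwise binarySearch returns -(insertionPoint) - 1, and the
        // containing sub-reader is the one just before the insertion point.
        return idx >= 0 ? idx : -idx - 2;
    }

    /** Top reader doc ID -> doc ID relative to its sub-reader. */
    static int toLocal(int[] bases, int topDoc) {
        return topDoc - bases[segmentOf(bases, topDoc)];
    }

    public static void main(String[] args) {
        int[] bases = {0, 100, 140}; // sub-readers of 100, 40 and N docs
        System.out.println(toTopLevel(bases[1], 12)); // 112
        System.out.println(segmentOf(bases, 112));    // 1
        System.out.println(toLocal(bases, 112));      // 12
    }
}
```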
> I feel quite uncomfortable with this solution, since my DocSetQuery
> inherits from a Lucene class and it would be odd to downcast
> something to a Solr class inside it. Besides, I haven't really figured
> out how to use those offset-related methods.


-- 
- Mark

http://www.lucidimagination.com