Posted to solr-user@lucene.apache.org by adfel70 <ad...@gmail.com> on 2015/06/10 18:59:55 UTC

Adding applicative cache to SolrSearcher

I am using RankQuery to implement my applicative scorer, which returns a score
based on the value of a specific field (let's call it 'score_field') that is
stored for every document.
The RankQuery creates a collector, and for every collected docId I retrieve
the value of score_field, calculate the score and add the doc id to a
priority queue:

public class MyScorerRankQuery extends RankQuery {
        ...

        @Override
        public TopDocsCollector getTopDocsCollector(int i,
                        SolrIndexSearcher.QueryCommand cmd, IndexSearcher searcher) {
                ...
                return new MyCollector(...);
        }
}

public class MyCollector extends TopDocsCollector {
        MyScorer scorer;
        SortedDocValues scoreFieldValues;       // initialized in the constructor

        public MyCollector() {
                scorer = new MyScorer();
                // the scorer's API requires start() before every query
                // and close() at the end of the query
                scorer.start();
                AtomicReader r =
                                SlowCompositeReaderWrapper.wrap(searcher.getIndexReader());
                /* THIS CALL IS TIME CONSUMING! */
                scoreFieldValues = DocValues.getSorted(r, "score_field");
        }

        @Override
        public void collect(int id) {
                int docID = docBase + id;
                // 1. get the specific field from the doc using DocValues and
                //    calculate the score using my scorer
                String value = scoreFieldValues.get(docID).utf8ToString();
                scorer.calcScore(value);
                // 2. add docId and score (ScoreDoc object) into the PriorityQueue
        }
}


I used DocValues to get the value of score_field. Currently it is
instantiated in the collector's constructor, which is a performance killer
because it is called for EVERY query, even if the index is static (no
commits). I want to make the DocValues.getSorted() call only when it is
really necessary, but I don't know where to put that code. Is there a place
to plug that code in so that when a new searcher is opened I can add this
applicative cache?



--
View this message in context: http://lucene.472066.n3.nabble.com/Adding-applicative-cache-to-SolrSearcher-tp4211012.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Adding applicative cache to SolrSearcher

Posted by adfel70 <ad...@gmail.com>.
Works great, thanks guys!
Missed the leafReader because I looked at IndexSearcher instead of
SolrIndexSearcher...




Re: Adding applicative cache to SolrSearcher

Posted by Chris Hostetter <ho...@fucit.org>.
: 
: The problem is SlowCompositeReaderWrapper.wrap(searcher.getIndexReader());
: you hardly ever need to do this, if only because Solr already does it.

Specifically, you should just use...

	searcher.getLeafReader().getSortedSetDocValues(your_field_name)

...instead of doing all this wrapping yourself.

If the field is declared with docValues="true", this will all be precomputed
at index time and super fast.  If not, then the UninvertingReader logic will
kick in once per searcher -- if you want to "pre-warm" it, just configure a
request that exercises your code as part of the firstSearcher and
newSearcher event listeners.
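
To illustrate the "once per searcher" behaviour described above, here is a minimal plain-Java sketch. It does not use the Solr or Lucene APIs: FakeSearcher, loadFieldValues and loadCount are hypothetical names standing in for a searcher and for the expensive uninverting step. The only point is that the expensive load happens on the first query against a searcher and is reused until a new searcher is opened.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class PerSearcherCacheSketch {

    /** Counts how many times the expensive load ran (for illustration only). */
    static final AtomicInteger loadCount = new AtomicInteger();

    /** Hypothetical stand-in for a searcher; NOT the Solr class. */
    static final class FakeSearcher {
        private final String[] indexData;          // pretend index content
        private volatile String[] cachedValues;    // built lazily, once per searcher

        FakeSearcher(String... indexData) {
            this.indexData = indexData;
        }

        /** Returns the per-document values, loading them only on first use. */
        String[] fieldValues() {
            String[] values = cachedValues;
            if (values == null) {
                synchronized (this) {
                    values = cachedValues;
                    if (values == null) {
                        values = loadFieldValues(indexData);
                        cachedValues = values;
                    }
                }
            }
            return values;
        }
    }

    /** Stand-in for the expensive step (for example, uninverting a field). */
    static String[] loadFieldValues(String[] indexData) {
        loadCount.incrementAndGet();
        return indexData.clone();
    }

    /** A "query" only reads the already-loaded values. */
    static String query(FakeSearcher searcher, int docId) {
        return searcher.fieldValues()[docId];
    }

    public static void main(String[] args) {
        FakeSearcher first = new FakeSearcher("a", "b", "c");
        query(first, 0);
        query(first, 2);                      // same searcher: no reload
        System.out.println("loads after two queries: " + loadCount.get());

        FakeSearcher afterCommit = new FakeSearcher("a", "b", "c", "d");
        query(afterCommit, 3);                // new searcher: one more load
        System.out.println("loads after new searcher: " + loadCount.get());
    }
}
```

A warming request registered for firstSearcher/newSearcher would play the role of the first query() call here, so that real user queries never pay for the load.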



-Hoss
http://www.lucidworks.com/

Re: Adding applicative cache to SolrSearcher

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Hello,

The problem is SlowCompositeReaderWrapper.wrap(searcher.getIndexReader());
you hardly ever need to do this, if only because Solr already does it.

DocValues need to be accessed per segment, through the leaf (atomic) reader
context that is provided to the collector.
For example, look at DocTermsIndexDocValues.strVal(int) and
DocTermsIndexDocValues.open(LeafReaderContext, String), or at
TermOrdValComparator and TopFieldCollector.
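
The per-segment pattern above can be sketched in plain Java. This is a model, not Lucene code: Segment, PerSegmentCollector, setNextSegment and collect are hypothetical names that mimic how a collector is handed one leaf at a time, with document ids local to that leaf and a docBase offset for turning them into index-wide ids.

```java
import java.util.ArrayList;
import java.util.List;

public class PerSegmentSketch {

    /** Hypothetical stand-in for one index segment (leaf). */
    static final class Segment {
        final int docBase;        // index-wide id of this segment's first doc
        final String[] values;    // per-document field values, indexed by local id

        Segment(int docBase, String... values) {
            this.docBase = docBase;
            this.values = values;
        }
    }

    /** Collector that looks values up in the current segment only. */
    static final class PerSegmentCollector {
        private Segment current;
        final List<String> hits = new ArrayList<>();

        /** Called once per segment, before any collect() call for that segment. */
        void setNextSegment(Segment segment) {
            this.current = segment;
        }

        /** localId is relative to the current segment, not the whole index. */
        void collect(int localId) {
            String value = current.values[localId];   // per-segment lookup
            int globalId = current.docBase + localId; // index-wide id for the result
            hits.add(globalId + "=" + value);
        }
    }

    public static void main(String[] args) {
        Segment s0 = new Segment(0, "a", "b", "c");
        Segment s1 = new Segment(3, "d", "e");

        PerSegmentCollector collector = new PerSegmentCollector();
        collector.setNextSegment(s0);
        collector.collect(1);
        collector.setNextSegment(s1);
        collector.collect(0);
        collector.collect(1);

        System.out.println(collector.hits);
    }
}
```

In this model the value lookup uses the local id against the current segment, and only the reported id is shifted by docBase, so no whole-index view of the field is ever needed.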


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>