Posted to solr-user@lucene.apache.org by adfel70 <ad...@gmail.com> on 2016/01/31 15:13:27 UTC

Executing Collector's Collect method on more than one thread

I am using RankQuery to implement an application-specific scorer that returns
a score based on the value of a specific field (let's call it 'score_field')
that is stored for every document.
The RankQuery creates a collector, and for every collected docId I retrieve
the value of score_field, calculate the score, and add the doc id to a
priority queue:

public class MyScorerrankQuery extends RankQuery {
        ...

        @Override
        public TopDocsCollector getTopDocsCollector(int i,
                        SolrIndexSearcher.QueryCommand cmd, IndexSearcher searcher) {
                ...
                return new MyCollector(...);
        }
}

public class MyCollector extends TopDocsCollector {
        MyScorer scorer;
        SortedDocValues scoreFieldValues;

        @Override
        public void collect(int id) {
                int docID = docBase + id;
                // 1. Get the specific field from the doc using DocValues and
                //    calculate the score using my scorer.
                String value = scoreFieldValues.get(docID).utf8ToString();
                scorer.calcScore(value);
                // 2. Add docId and score (ScoreDoc object) into the PriorityQueue.
        }
}

The problem is that calcScore may take ~20 ms per call, so if a query returns
100,000 docs, which is not unusual, query execution time becomes roughly 33
minutes (100,000 x 20 ms = 2,000 s). Is there a way to parallelize the
collector's logic, so that more than one thread calls calcScore simultaneously?
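
One possible shape for this, as a rough sketch only: keep collect() cheap by buffering each document's field value, then run the expensive scoring on a thread pool once collection has finished, and keep only the best n results. The sketch below uses only the JDK. The names DeferredScoring, Scored and topN are hypothetical and are not Solr or Lucene APIs, the position in the list stands in for the doc id, and calcScore is assumed to be safe to call from several threads at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToDoubleFunction;

// Hypothetical helper for illustration; not part of Solr or Lucene.
class DeferredScoring {

    // A (docId, score) pair; stands in for Lucene's ScoreDoc.
    static final class Scored {
        final int docId;
        final double score;

        Scored(int docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Scores every buffered value on a fixed-size thread pool and returns
    // the n highest-scoring entries, best first. The index of a value in
    // the list is used as its doc id.
    static List<Scored> topN(List<String> values,
                             ToDoubleFunction<String> calcScore,
                             int n,
                             int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one scoring task per buffered value.
            List<Future<Scored>> futures = new ArrayList<>();
            for (int i = 0; i < values.size(); i++) {
                final int docId = i;
                final String value = values.get(i);
                Callable<Scored> task =
                        () -> new Scored(docId, calcScore.applyAsDouble(value));
                futures.add(pool.submit(task));
            }
            // Min-heap on score: the weakest kept entry is evicted first.
            PriorityQueue<Scored> heap =
                    new PriorityQueue<>((a, b) -> Double.compare(a.score, b.score));
            for (Future<Scored> future : futures) {
                heap.add(future.get());
                if (heap.size() > n) {
                    heap.poll();
                }
            }
            List<Scored> best = new ArrayList<>(heap);
            best.sort((a, b) -> Double.compare(b.score, a.score));
            return best;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while scoring", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("calcScore failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With t threads the scoring phase would take roughly 1/t of the single-threaded time at best, at the cost of holding all buffered values in memory until scoring completes.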



--
View this message in context: http://lucene.472066.n3.nabble.com/Executing-Collector-s-Collect-method-on-more-than-one-thread-tp4254269.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Executing Collector's Collect method on more than one thread

Posted by Joel Bernstein <jo...@gmail.com>.
Before thinking about threads at all, you might try speeding things up
within your implementation. In particular, your call to the top-level
docValues is going to be very slow. The way to speed this up is to switch
to the segment-level doc values at each segment switch. That way you avoid
the rather large overhead involved with top-level String docValues. Then I
would change your scorer to work directly with BytesRef rather than
converting to a UTF-8 String.

Joel Bernstein
http://joelsolr.blogspot.com/
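
To illustrate only the last suggestion (scoring from the raw bytes instead of building a String per document), here is a minimal JDK-only sketch. It assumes, purely for illustration, that score_field holds a non-negative decimal integer; the thread does not say what the field contains. BytesScoring and parseDecimal are hypothetical names. The (bytes, offset, length) triple mirrors the public fields of Lucene's BytesRef, so a scorer along these lines could read a BytesRef without calling utf8ToString().

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Solr or Lucene.
class BytesScoring {

    // Parses a non-negative decimal integer from the UTF-8 bytes in
    // bytes[offset .. offset + length) without allocating a String.
    // Overflow is not checked in this sketch.
    static long parseDecimal(byte[] bytes, int offset, int length) {
        if (length == 0) {
            throw new NumberFormatException("empty value");
        }
        long result = 0;
        for (int i = offset; i < offset + length; i++) {
            int digit = bytes[i] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit at index " + i);
            }
            result = result * 10 + digit;
        }
        return result;
    }

    public static void main(String[] args) {
        // The value sits in the middle of a larger buffer, as it can in a
        // shared byte array addressed by offset and length.
        byte[] buffer = "xx12345yy".getBytes(StandardCharsets.UTF_8);
        System.out.println(parseDecimal(buffer, 2, 5)); // prints 12345
    }
}
```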
