Posted to dev@lucene.apache.org by 子落 <ya...@taobao.com> on 2013/11/15 09:17:03 UTC

about Top N search for big doclist Optimization

Is there any way to optimize the following scenario?

My queries always look like this: sex:boy AND isbig:yes

But I only need the top 4 results, and they do not need to be sorted by score.

 

The doc lists stored in Lucene's .frq file might look like this:

 

Sex,boy  ->  1,2,3,4,5,6,8,9,10…..9999999,1000000……100000000,10000001

isbig,yes->   9999999,1000000,1000001, 1000002……100000000,10000001

 

So the top 4 results might be 9999999,1000000,1000001, 1000002.

 

But the doc lists are so big that reading all of them takes a lot of I/O,
even though I only need the top 4.

 

So is there any way to do this without wasting so much I/O?

 

First, the doc lists are ordered.

Second, we can skip over part of a doc list at its head.

Third, once we have collected 4 docs, that is enough; there is no need to
read the rest of the doc lists after those 4.

 

The search could then be fast.


Re: about Top N search for big doclist Optimization

Posted by 子落 <ya...@taobao.com>.
I found that Lucene has a method to skip docs in a doc list.

But how can I use it? With a TopDocsCollector?

 

 

  boolean skipTo(int target) throws IOException;
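
The idea described in this thread (ordered doc lists, skipping ahead, stopping after 4 hits) can be sketched roughly as below. This is only an illustration under stated assumptions, not Lucene code: DocList is a hypothetical in-memory stand-in whose skipTo mirrors the signature quoted above (without the IOException), and firstN is a made-up helper name. A real index would consult on-disk skip data rather than binary-search an array.

```java
import java.util.ArrayList;
import java.util.List;

public class TopNConjunction {

    /** Hypothetical stand-in for one term's sorted doc list; NOT a Lucene class. */
    static final class DocList {
        private final int[] docs; // doc IDs in ascending order
        private int pos = -1;     // positioned before the first entry

        DocList(int... docs) { this.docs = docs; }

        /** Current doc ID; only valid after skipTo returned true. */
        int doc() { return docs[pos]; }

        /** Moves to the first entry beyond the current one whose ID is >= target. */
        boolean skipTo(int target) {
            int lo = pos + 1, hi = docs.length;
            while (lo < hi) {                 // binary search stands in for skip data
                int mid = (lo + hi) >>> 1;
                if (docs[mid] < target) lo = mid + 1; else hi = mid;
            }
            pos = lo;
            return pos < docs.length;
        }
    }

    /** Returns the first n doc IDs present in both lists, in doc ID order. */
    static List<Integer> firstN(DocList a, DocList b, int n) {
        List<Integer> hits = new ArrayList<>();
        if (n <= 0 || !a.skipTo(0) || !b.skipTo(0)) return hits;
        while (true) {
            int da = a.doc(), db = b.doc();
            if (da == db) {
                hits.add(da);
                if (hits.size() == n) return hits;   // enough hits: stop reading
                if (!a.skipTo(da + 1)) return hits;  // b catches up on the next pass
            } else if (da < db) {
                if (!a.skipTo(db)) return hits;      // jump over the head of a
            } else {
                if (!b.skipTo(da)) return hits;      // jump over the head of b
            }
        }
    }

    public static void main(String[] args) {
        DocList sexBoy   = new DocList(1, 2, 3, 4, 5, 6, 8, 9, 10, 20, 21, 22, 23, 24, 30);
        DocList isBigYes = new DocList(20, 21, 22, 23, 24, 30);
        System.out.println(firstN(sexBoy, isBigYes, 4)); // prints [20, 21, 22, 23]
    }
}
```

In this sketch the first list jumps straight from doc 1 to doc 20, and neither list is read past doc 23 once the fourth hit is collected. Whether a given collector can stop a search this early depends on the Lucene version in use.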

 

 
