Posted to solr-user@lucene.apache.org by 郑华斌 <hu...@qq.com> on 2014/04/29 10:23:05 UTC

How to reduce enumerating docs

Hi all,


    My docs have two fields, "length" and "fingerprint", which hold the length and the text of the doc. I have a custom SearchComponent that enumerates all the docs matching a given term in order to search the fingerprint. That can be very slow, because the number of docs is huge and the per-doc work is time-consuming. Since I only care about docs whose length is within a narrow range around the length specified in the query, what is the right way to accelerate this? Thanks


        DocsEnum docsEnum = sub_reader.termDocsEnum(term);
        if (docsEnum == null) {
            continue;    // term does not occur in this segment
        }
        int doc;
        while ((doc = docsEnum.nextDoc()) != DocsEnum.NO_MORE_DOCS) {
            // do something expensive with doc
        }
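
For context, the snippet above sits inside a loop over the index segments, roughly like the sketch below (top_reader here stands for the top-level IndexReader, e.g. obtained from the SolrIndexSearcher via searcher.getIndexReader()):

        for (AtomicReaderContext ctx : top_reader.leaves()) {
            AtomicReader sub_reader = ctx.reader();
            // ... the termDocsEnum loop shown above ...
        }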

Re: How to reduce enumerating docs

Posted by 郑华斌 <hu...@qq.com>.
Will the filter query execute before or after my custom search component?


What I actually care about is, for example: if the following docsEnum would contain 1M docs for term aterm without the filter query, will it contain fewer than 1M docs when the filter query is present?


        DocsEnum docsEnum = sub_reader.termDocsEnum(aterm);
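
If the raw enum is not reduced by the filter, I suppose I would have to do the intersection myself, something like the rough sketch below (assuming Solr 4.x; here searcher is the SolrIndexSearcher, ctx is the current AtomicReaderContext, docsEnum is the enum above, and minLen/maxLen are just placeholder bounds):

        // build a DocSet for the length range once per request
        // (SolrIndexSearcher.getDocSet caches it in the filterCache)
        Query lengthRange = NumericRangeQuery.newIntRange("length", minLen, maxLen, true, true);
        DocSet lengthDocs = searcher.getDocSet(lengthRange);

        // while walking the postings, skip docs whose length is outside the range;
        // DocSet ids are top-level, so add the segment's docBase
        int doc;
        while ((doc = docsEnum.nextDoc()) != DocsEnum.NO_MORE_DOCS) {
            if (!lengthDocs.exists(ctx.docBase + doc)) {
                continue;
            }
            // do something expensive
        }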

------------------ Original ------------------
From:  "Alexandre Rafalovitch";<ar...@gmail.com>;
Send time: Tuesday, Apr 29, 2014 5:13 PM
To: "solr-user"<so...@lucene.apache.org>; 

Subject:  Re: How to reduce enumerating docs



Can't you just specify the length range as a filter query? If your
length field is a tint/tlong type, Solr already has optimized support
for this: trie fields index values at multiple precision levels, so a
numeric range filter can be evaluated efficiently.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Tue, Apr 29, 2014 at 3:23 PM, 郑华斌 <hu...@qq.com> wrote:
> Hi all,
>
>
>     My docs have two fields, "length" and "fingerprint", which hold the length and the text of the doc. I have a custom SearchComponent that enumerates all the docs matching a given term in order to search the fingerprint. That can be very slow, because the number of docs is huge and the per-doc work is time-consuming. Since I only care about docs whose length is within a narrow range around the length specified in the query, what is the right way to accelerate this? Thanks
>
>
>         DocsEnum docsEnum = sub_reader.termDocsEnum(term);
>         if (docsEnum == null) {
>             continue;    // term does not occur in this segment
>         }
>         int doc;
>         while ((doc = docsEnum.nextDoc()) != DocsEnum.NO_MORE_DOCS) {
>             // do something expensive with doc
>         }

Re: How to reduce enumerating docs

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Can't you just specify the length range as a filter query? If your
length field is a tint/tlong type, Solr already has optimized support
for this: trie fields index values at multiple precision levels, so a
numeric range filter can be evaluated efficiently.
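
For example, adding something like this to the request (the bounds here are made up):

    fq=length:[95 TO 105]

with "length" declared as a trie int in schema.xml, along these lines:

    <field name="length" type="tint" indexed="true" stored="true"/>
    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>

The filter is computed separately from the main query, cached in the filterCache, and intersected with the results.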

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Tue, Apr 29, 2014 at 3:23 PM, 郑华斌 <hu...@qq.com> wrote:
> Hi all,
>
>
>     My docs have two fields, "length" and "fingerprint", which hold the length and the text of the doc. I have a custom SearchComponent that enumerates all the docs matching a given term in order to search the fingerprint. That can be very slow, because the number of docs is huge and the per-doc work is time-consuming. Since I only care about docs whose length is within a narrow range around the length specified in the query, what is the right way to accelerate this? Thanks
>
>
>         DocsEnum docsEnum = sub_reader.termDocsEnum(term);
>         if (docsEnum == null) {
>             continue;    // term does not occur in this segment
>         }
>         int doc;
>         while ((doc = docsEnum.nextDoc()) != DocsEnum.NO_MORE_DOCS) {
>             // do something expensive with doc
>         }