Posted to solr-user@lucene.apache.org by Gregg Donovan <gr...@gmail.com> on 2014/02/23 18:57:51 UTC

DistributedSearch: Skipping STAGE_GET_FIELDS?

In most of our Solr use-cases, we fetch only fl=<uniqueKey> or
fl=<uniqueKey>,<another_int_field>. I'd like to be able to do a distributed
search and skip STAGE_GET_FIELDS -- i.e. the stage where each shard is
queried for the documents found for the top ids -- as it seems like we
could be collecting this information earlier in the pipeline.

Is this possible out-of-the-box? If not, how would you recommend
implementing it?

Thanks!

--Gregg
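
[Illustrative note] The question above can be made concrete with a toy model. The sketch below is not Solr code: the in-memory SHARDS data and the two_pass and single_pass functions are hypothetical names invented for illustration. It only shows why returning the requested fields together with the top ids saves the per-shard round trip that STAGE_GET_FIELDS would otherwise make.

```python
from heapq import nlargest

# Hypothetical in-memory "shards": doc id -> (score, stored fields).
SHARDS = [
    {"a": (0.9, {"id": "a", "price": 10}), "b": (0.4, {"id": "b", "price": 20})},
    {"c": (0.7, {"id": "c", "price": 30}), "d": (0.2, {"id": "d", "price": 40})},
]


def top_ids(shard, rows):
    """Return the shard's best (score, id) pairs, highest score first."""
    return nlargest(rows, ((score, doc_id) for doc_id, (score, _) in shard.items()))


def two_pass(shards, rows):
    """Collect ids and scores first, then go back to the shards for fields."""
    requests = 0
    candidates = []
    for idx, shard in enumerate(shards):
        requests += 1  # first round: ids and scores only
        candidates.extend((score, doc_id, idx) for score, doc_id in top_ids(shard, rows))
    merged = nlargest(rows, candidates)
    # Second round: one extra request per shard that owns a winning document.
    requests += len({idx for _, _, idx in merged})
    docs = [shards[idx][doc_id][1] for _, doc_id, idx in merged]
    return docs, requests


def single_pass(shards, rows):
    """Each shard returns the requested fields along with its top ids."""
    requests = 0
    candidates = []
    for shard in shards:
        requests += 1  # the only round
        candidates.extend(
            (score, doc_id, shard[doc_id][1]) for score, doc_id in top_ids(shard, rows)
        )
    # Sort on (score, id) only, so the field dicts are never compared.
    merged = nlargest(rows, candidates, key=lambda c: (c[0], c[1]))
    return [fields for _, _, fields in merged], requests


docs_two, requests_two = two_pass(SHARDS, 2)
docs_one, requests_one = single_pass(SHARDS, 2)
```

Both functions return the same documents. The trade-off in this model is that single_pass ships fields for every candidate, including those that lose the merge, which is cheap when fl is only the unique key plus a small field.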

Re: DistributedSearch: Skipping STAGE_GET_FIELDS?

Posted by Jeff Wartes <jw...@whitepages.com>.

I'll second that thank-you, this is awesome.

I asked about this issue in 2010, but when I didn't hear anything (and
disappointingly didn't find SOLR-1880), we ended up rolling our own
version of this functionality. I've been laboriously migrating it every
time we bump our Solr version ever since. The performance difference is
quite noticeable.
One thing is that our version interferes pretty badly with various other
Components. It's been a while, but my recollection is that other
Components like Debug assumed some stuff happened in STAGE_GET_FIELDS.

I think I'll try to apply SOLR-1880 to 4.6.1 and see what happens.

On 2/24/14, 11:07 AM, "Gregg Donovan" <gr...@gmail.com> wrote:

>Thank you Shalin and Yonik! Both
>SOLR-1880<https://issues.apache.org/jira/browse/SOLR-1880>
> and SOLR-5768 <https://issues.apache.org/jira/browse/SOLR-5768> will be
>very helpful for our distributed search performance.
>
>
>
>On Mon, Feb 24, 2014 at 5:02 AM, Shalin Shekhar Mangar <
>shalinmangar@gmail.com> wrote:
>
>> I opened SOLR-5768
>>
>> https://issues.apache.org/jira/browse/SOLR-5768
>>
>> On Mon, Feb 24, 2014 at 12:56 AM, Shalin Shekhar Mangar
>> <sh...@gmail.com> wrote:
>> > Yes that should be simple. But regardless of the parameter, the
>> > fl=id,score use-case should be optimized by default. I think I'll
>> > commit the patch as-is and open a new issue to add the
>> > distrib.singlePass parameter.
>> >
>> > On Sun, Feb 23, 2014 at 11:49 PM, Yonik Seeley <yo...@heliosearch.com> wrote:
>> >> On Sun, Feb 23, 2014 at 1:08 PM, Shalin Shekhar Mangar
>> >> <sh...@gmail.com> wrote:
>> >>> I should clarify though that this optimization only works with fl=id,score.
>> >>
>> >> Although it seems like it should be relatively simple to make it work
>> >> with other fields as well, by passing down the complete "fl" requested
>> >> if some optional parameter is set (distrib.singlePass?)
>> >>
>> >> -Yonik
>> >> http://heliosearch.org - native off-heap filters and fieldcache for solr
>> >
>> >
>> >
>> > --
>> > Regards,
>> > Shalin Shekhar Mangar.
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>

Re: DistributedSearch: Skipping STAGE_GET_FIELDS?

Posted by Gregg Donovan <gr...@gmail.com>.

Thank you Shalin and Yonik! Both
SOLR-1880 <https://issues.apache.org/jira/browse/SOLR-1880>
and SOLR-5768 <https://issues.apache.org/jira/browse/SOLR-5768> will be
very helpful for our distributed search performance.



On Mon, Feb 24, 2014 at 5:02 AM, Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:

> I opened SOLR-5768
>
> https://issues.apache.org/jira/browse/SOLR-5768
>
> On Mon, Feb 24, 2014 at 12:56 AM, Shalin Shekhar Mangar
> <sh...@gmail.com> wrote:
> > Yes that should be simple. But regardless of the parameter, the
> > fl=id,score use-case should be optimized by default. I think I'll
> > commit the patch as-is and open a new issue to add the
> > distrib.singlePass parameter.
> >
> > On Sun, Feb 23, 2014 at 11:49 PM, Yonik Seeley <yo...@heliosearch.com> wrote:
> >> On Sun, Feb 23, 2014 at 1:08 PM, Shalin Shekhar Mangar
> >> <sh...@gmail.com> wrote:
> >>> I should clarify though that this optimization only works with fl=id,score.
> >>
> >> Although it seems like it should be relatively simple to make it work
> >> with other fields as well, by passing down the complete "fl" requested
> >> if some optional parameter is set (distrib.singlePass?)
> >>
> >> -Yonik
> >> http://heliosearch.org - native off-heap filters and fieldcache for solr
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

Re: DistributedSearch: Skipping STAGE_GET_FIELDS?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

I opened SOLR-5768

https://issues.apache.org/jira/browse/SOLR-5768

On Mon, Feb 24, 2014 at 12:56 AM, Shalin Shekhar Mangar
<sh...@gmail.com> wrote:
> Yes that should be simple. But regardless of the parameter, the
> fl=id,score use-case should be optimized by default. I think I'll
> commit the patch as-is and open a new issue to add the
> distrib.singlePass parameter.
>
> On Sun, Feb 23, 2014 at 11:49 PM, Yonik Seeley <yo...@heliosearch.com> wrote:
>> On Sun, Feb 23, 2014 at 1:08 PM, Shalin Shekhar Mangar
>> <sh...@gmail.com> wrote:
>>> I should clarify though that this optimization only works with fl=id,score.
>>
>> Although it seems like it should be relatively simple to make it work
>> with other fields as well, by passing down the complete "fl" requested
>> if some optional parameter is set (distrib.singlePass?)
>>
>> -Yonik
>> http://heliosearch.org - native off-heap filters and fieldcache for solr
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.



-- 
Regards,
Shalin Shekhar Mangar.

Re: DistributedSearch: Skipping STAGE_GET_FIELDS?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

Yes that should be simple. But regardless of the parameter, the
fl=id,score use-case should be optimized by default. I think I'll
commit the patch as-is and open a new issue to add the
distrib.singlePass parameter.

On Sun, Feb 23, 2014 at 11:49 PM, Yonik Seeley <yo...@heliosearch.com> wrote:
> On Sun, Feb 23, 2014 at 1:08 PM, Shalin Shekhar Mangar
> <sh...@gmail.com> wrote:
>> I should clarify though that this optimization only works with fl=id,score.
>
> Although it seems like it should be relatively simple to make it work
> with other fields as well, by passing down the complete "fl" requested
> if some optional parameter is set (distrib.singlePass?)
>
> -Yonik
> http://heliosearch.org - native off-heap filters and fieldcache for solr



-- 
Regards,
Shalin Shekhar Mangar.

Re: DistributedSearch: Skipping STAGE_GET_FIELDS?

Posted by Yonik Seeley <yo...@heliosearch.com>.

On Sun, Feb 23, 2014 at 1:08 PM, Shalin Shekhar Mangar
<sh...@gmail.com> wrote:
> I should clarify though that this optimization only works with fl=id,score.

Although it seems like it should be relatively simple to make it work
with other fields as well, by passing down the complete "fl" requested
if some optional parameter is set (distrib.singlePass?)

-Yonik
http://heliosearch.org - native off-heap filters and fieldcache for solr
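
[Illustrative note] A minimal sketch of what a request using the suggestion above might look like, assuming the optional parameter ends up with the name floated in this thread (distrib.singlePass). The host, collection name, and the build_query helper are hypothetical; only q and fl are taken as given Solr request parameters.

```python
from urllib.parse import urlencode


def build_query(base_url, q, fields, single_pass=False):
    """Build a select URL, optionally asking for the full fl in one pass."""
    params = [("q", q), ("fl", ",".join(fields))]
    if single_pass:
        # Assumption: parameter name as proposed in this thread.
        params.append(("distrib.singlePass", "true"))
    return base_url + "?" + urlencode(params)


# Hypothetical host and collection, for illustration only.
url = build_query(
    "http://localhost:8983/solr/collection1/select",
    "*:*",
    ["id", "price"],
    single_pass=True,
)
```

Leaving single_pass at its default keeps the request unchanged, so a client could switch the behavior on per query once it is available on the server.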

Re: DistributedSearch: Skipping STAGE_GET_FIELDS?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

I should clarify though that this optimization only works with fl=id,score.

On Sun, Feb 23, 2014 at 11:34 PM, Shalin Shekhar Mangar
<sh...@gmail.com> wrote:
> What a coincidence - I was about to commit a patch which makes it
> possible. It will be released with 4.8
>
> See https://issues.apache.org/jira/browse/SOLR-1880
>
> On Sun, Feb 23, 2014 at 11:27 PM, Gregg Donovan <gr...@gmail.com> wrote:
>> In most of our Solr use-cases, we fetch only fl=<uniqueKey> or
>> fl=<uniqueKey>,<another_int_field>. I'd like to be able to do a distributed
>> search and skip STAGE_GET_FIELDS -- i.e. the stage where each shard is
>> queried for the documents found for the top ids -- as it seems like we
>> could be collecting this information earlier in the pipeline.
>>
>> Is this possible out-of-the-box? If not, how would you recommend
>> implementing it?
>>
>> Thanks!
>>
>> --Gregg
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.



-- 
Regards,
Shalin Shekhar Mangar.

Re: DistributedSearch: Skipping STAGE_GET_FIELDS?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

What a coincidence - I was about to commit a patch which makes it
possible. It will be released with 4.8.

See https://issues.apache.org/jira/browse/SOLR-1880

On Sun, Feb 23, 2014 at 11:27 PM, Gregg Donovan <gr...@gmail.com> wrote:
> In most of our Solr use-cases, we fetch only fl=<uniqueKey> or
> fl=<uniqueKey>,<another_int_field>. I'd like to be able to do a distributed
> search and skip STAGE_GET_FIELDS -- i.e. the stage where each shard is
> queried for the documents found for the top ids -- as it seems like we
> could be collecting this information earlier in the pipeline.
>
> Is this possible out-of-the-box? If not, how would you recommend
> implementing it?
>
> Thanks!
>
> --Gregg



-- 
Regards,
Shalin Shekhar Mangar.