
Posted to solr-user@lucene.apache.org by Gregg Donovan <gr...@gmail.com> on 2014/03/03 17:14:58 UTC

Re: Fetching uniqueKey and other int quickly from documentCache?

Yonik,

That's a very clever idea. Unfortunately, I think that will skip the
distributed query optimization we were hoping to take advantage of in
SOLR-1880 [1], but it should work with the proposed distrib.singlePass
optimization in SOLR-5768 [2]. Does that sound right?

--Gregg

[1] https://issues.apache.org/jira/browse/SOLR-1880
[2] https://issues.apache.org/jira/browse/SOLR-5768


On Wed, Feb 26, 2014 at 8:53 PM, Yonik Seeley <yo...@heliosearch.com> wrote:

> You could try forcing things to go through function queries (via
> pseudo-fields):
>
> fl=field(id), field(myfield)
>
> If you're not requesting any stored fields, that *might* currently
> skip that step.
>
> -Yonik
> http://heliosearch.org - native off-heap filters and fieldcache for solr
>
>
> On Mon, Feb 24, 2014 at 9:58 PM, Gregg Donovan <gr...@gmail.com> wrote:
> > We fetch a large number of documents -- 1000+ -- for each search. Each
> > request fetches only the uniqueKey or the uniqueKey plus one secondary
> > integer key. Despite this, we find that we spend a sizable amount of time
> > in SolrIndexSearcher#doc(int docId, Set<String> fields). Time is spent
> > fetching the two stored fields, LZ4 decoding, etc.
> >
> > I would love to be able to tell Solr to always fetch these two fields from
> > memory. We have them both in the fieldCache so we're already spending the
> > RAM. I've seen this asked previously [1], so it seems like a fairly common
> > need, especially for distributed search. Any ideas?
> >
> > A few possible ideas I had:
> >
> > --Check FieldCache#getCacheEntries() before going to stored fields.
> > --Give the documentCache config a list of fields it should load from the
> > fieldCache
> >
> >
> > Having an in-memory mapping from docId->uniqueKey has come up for us
> > before. We've used a custom SolrCache maintaining that mapping to quickly
> > filter over personalized collections. Maybe the uniqueKey should be more
> > optimized out of the box? Perhaps a custom "uniqueKey" codec that also
> > maintained the docId->uniqueKey mapping in memory?
> >
> > --Gregg
> >
> > [1] http://search-lucene.com/m/oCUKJ1heHUU1
>

Re: Fetching uniqueKey and other int quickly from documentCache?

Posted by Gregg Donovan <gr...@gmail.com>.

Yonik,

Requesting
fl=unique_key:field(unique_key),secondary_key:field(secondary_key),score
instead of fl=unique_key,secondary_key,score was a nice performance win, as
unique_key and secondary_key were both already in the fieldCache. In fact, we
removed our documentCache, as it got very little use.

We do see a code path that fetches stored fields, though, in
BinaryResponseWriter, for the case of *only* pseudo-fields being requested.
I opened a ticket and attached a patch to
https://issues.apache.org/jira/browse/SOLR-5968.
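
The fl rewrite described in this message can be sketched as a small helper.
This is only an illustration: pseudo_field_fl is a hypothetical name, not part
of Solr or any client library, and it assumes every listed field is usable
through field() (for example, already loaded in the fieldCache).

```python
def pseudo_field_fl(fields, include_score=True):
    """Build an fl value that requests each field via field(name).

    Each entry is aliased back to its own name (name:field(name)) so the
    response keys stay the same as with a plain stored-field request.
    """
    parts = [f"{name}:field({name})" for name in fields]
    if include_score:
        parts.append("score")
    return ",".join(parts)


if __name__ == "__main__":
    # Field names taken from the message above.
    print("fl=" + pseudo_field_fl(["unique_key", "secondary_key"]))
```

Aliasing each pseudo-field to the original field name means client code that
reads unique_key and secondary_key from the response should not need to change.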




On Mon, Mar 3, 2014 at 11:30 AM, Yonik Seeley <yo...@heliosearch.com> wrote:

> On Mon, Mar 3, 2014 at 11:14 AM, Gregg Donovan <gr...@gmail.com> wrote:
> > Yonik,
> >
> > That's a very clever idea. Unfortunately, I think that will skip the
> > distributed query optimization we were hoping to take advantage of in
> > SOLR-1880 [1], but it should work with the proposed distrib.singlePass
> > optimization in SOLR-5768 [2]. Does that sound right?
>
>
> Yep, the two together should do the trick.
>
> -Yonik
> http://heliosearch.org - native off-heap filters and fieldcache for solr
>
>
> > --Gregg
> >
> > [1] https://issues.apache.org/jira/browse/SOLR-1880
> > [2] https://issues.apache.org/jira/browse/SOLR-5768
> >
> >
> > On Wed, Feb 26, 2014 at 8:53 PM, Yonik Seeley <yo...@heliosearch.com> wrote:
> >
> >> You could try forcing things to go through function queries (via
> >> pseudo-fields):
> >>
> >> fl=field(id), field(myfield)
> >>
> >> If you're not requesting any stored fields, that *might* currently
> >> skip that step.
> >>
> >> -Yonik
> >> http://heliosearch.org - native off-heap filters and fieldcache for solr
> >>
> >>
> >> On Mon, Feb 24, 2014 at 9:58 PM, Gregg Donovan <gr...@gmail.com> wrote:
> >> > We fetch a large number of documents -- 1000+ -- for each search. Each
> >> > request fetches only the uniqueKey or the uniqueKey plus one secondary
> >> > integer key. Despite this, we find that we spend a sizable amount of time
> >> > in SolrIndexSearcher#doc(int docId, Set<String> fields). Time is spent
> >> > fetching the two stored fields, LZ4 decoding, etc.
> >> >
> >> > I would love to be able to tell Solr to always fetch these two fields from
> >> > memory. We have them both in the fieldCache so we're already spending the
> >> > RAM. I've seen this asked previously [1], so it seems like a fairly common
> >> > need, especially for distributed search. Any ideas?
> >> >
> >> > A few possible ideas I had:
> >> >
> >> > --Check FieldCache#getCacheEntries() before going to stored fields.
> >> > --Give the documentCache config a list of fields it should load from the
> >> > fieldCache
> >> >
> >> >
> >> > Having an in-memory mapping from docId->uniqueKey has come up for us
> >> > before. We've used a custom SolrCache maintaining that mapping to quickly
> >> > filter over personalized collections. Maybe the uniqueKey should be more
> >> > optimized out of the box? Perhaps a custom "uniqueKey" codec that also
> >> > maintained the docId->uniqueKey mapping in memory?
> >> >
> >> > --Gregg
> >> >
> >> > [1] http://search-lucene.com/m/oCUKJ1heHUU1
> >>
>

Re: Fetching uniqueKey and other int quickly from documentCache?

Posted by Yonik Seeley <yo...@heliosearch.com>.

On Mon, Mar 3, 2014 at 11:14 AM, Gregg Donovan <gr...@gmail.com> wrote:
> Yonik,
>
> That's a very clever idea. Unfortunately, I think that will skip the
> distributed query optimization we were hoping to take advantage of in
> SOLR-1880 [1], but it should work with the proposed distrib.singlePass
> optimization in SOLR-5768 [2]. Does that sound right?


Yep, the two together should do the trick.

-Yonik
http://heliosearch.org - native off-heap filters and fieldcache for solr
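
Putting the two pieces together, a request might be assembled roughly as
follows. This is a sketch, not Solr client code: build_select_params is a
hypothetical helper, the field names are borrowed from elsewhere in the thread,
and it assumes the SOLR-5768 optimization is exposed as a request parameter
named distrib.singlePass that accepts the value true.

```python
from urllib.parse import urlencode, parse_qs


def build_select_params(query, fields, rows=1000, single_pass=True):
    """Return a URL-encoded query string for a distributed search request.

    Fields are requested through field() pseudo-fields; the single-pass
    flag is added only when single_pass is True.
    """
    params = {
        "q": query,
        "rows": rows,
        "fl": ",".join(f"{name}:field({name})" for name in fields),
    }
    if single_pass:
        # Assumed parameter name, taken from the SOLR-5768 proposal.
        params["distrib.singlePass"] = "true"
    return urlencode(params)


if __name__ == "__main__":
    encoded = build_select_params("*:*", ["unique_key", "secondary_key"])
    # Decode again to show the parameters that would be sent.
    for key, values in parse_qs(encoded).items():
        print(key, "=", values[0])
```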


> --Gregg
>
> [1] https://issues.apache.org/jira/browse/SOLR-1880
> [2] https://issues.apache.org/jira/browse/SOLR-5768
>
>
> On Wed, Feb 26, 2014 at 8:53 PM, Yonik Seeley <yo...@heliosearch.com> wrote:
>
>> You could try forcing things to go through function queries (via
>> pseudo-fields):
>>
>> fl=field(id), field(myfield)
>>
>> If you're not requesting any stored fields, that *might* currently
>> skip that step.
>>
>> -Yonik
>> http://heliosearch.org - native off-heap filters and fieldcache for solr
>>
>>
>> On Mon, Feb 24, 2014 at 9:58 PM, Gregg Donovan <gr...@gmail.com> wrote:
>> > We fetch a large number of documents -- 1000+ -- for each search. Each
>> > request fetches only the uniqueKey or the uniqueKey plus one secondary
>> > integer key. Despite this, we find that we spend a sizable amount of time
>> > in SolrIndexSearcher#doc(int docId, Set<String> fields). Time is spent
>> > fetching the two stored fields, LZ4 decoding, etc.
>> >
>> > I would love to be able to tell Solr to always fetch these two fields from
>> > memory. We have them both in the fieldCache so we're already spending the
>> > RAM. I've seen this asked previously [1], so it seems like a fairly common
>> > need, especially for distributed search. Any ideas?
>> >
>> > A few possible ideas I had:
>> >
>> > --Check FieldCache#getCacheEntries() before going to stored fields.
>> > --Give the documentCache config a list of fields it should load from the
>> > fieldCache
>> >
>> >
>> > Having an in-memory mapping from docId->uniqueKey has come up for us
>> > before. We've used a custom SolrCache maintaining that mapping to quickly
>> > filter over personalized collections. Maybe the uniqueKey should be more
>> > optimized out of the box? Perhaps a custom "uniqueKey" codec that also
>> > maintained the docId->uniqueKey mapping in memory?
>> >
>> > --Gregg
>> >
>> > [1] http://search-lucene.com/m/oCUKJ1heHUU1
>>