Posted to solr-user@lucene.apache.org by Gregg Donovan <gr...@gmail.com> on 2014/02/25 03:58:30 UTC

Fetching uniqueKey and other int quickly from documentCache?

We fetch a large number of documents -- 1000+ -- for each search. Each
request fetches only the uniqueKey or the uniqueKey plus one secondary
integer key. Despite this, we find that we spend a sizable amount of time
in SolrIndexSearcher#doc(int docId, Set<String> fields). Time is spent
fetching the two stored fields, LZ4 decoding, etc.

I would love to be able to tell Solr to always fetch these two fields from
memory. We have them both in the fieldCache so we're already spending the
RAM. I've seen this asked previously [1], so it seems like a fairly common
need, especially for distributed search. Any ideas?

A few possible ideas I had:

--Check FieldCache#getCacheEntries() before going to stored fields.
--Give the documentCache config a list of fields it should load from the
fieldCache.


Having an in-memory mapping from docId->uniqueKey has come up for us
before. We've used a custom SolrCache maintaining that mapping to quickly
filter over personalized collections. Maybe the uniqueKey should be more
optimized out of the box? Perhaps a custom "uniqueKey" codec that also
maintained the docId->uniqueKey mapping in memory?

--Gregg

[1] http://search-lucene.com/m/oCUKJ1heHUU1
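
The docId->uniqueKey mapping idea above could look roughly like the sketch below. It is only an illustration under stated assumptions: the uniqueKey is assumed to be an integer, docIds are assumed to be dense in 0..maxDoc-1, and the class name DocIdToKey and its methods are hypothetical, not a Solr or Lucene API. Because Lucene docIds change as segments merge, a structure like this would presumably have to be rebuilt for each new searcher.

```python
from array import array


class DocIdToKey:
    """Dense in-memory docId -> integer uniqueKey lookup (hypothetical helper)."""

    MISSING = -1  # assumed sentinel for docIds with no key (e.g. deleted docs)

    def __init__(self, max_doc, pairs):
        # One 64-bit slot per docId, pre-filled with the sentinel.
        self._keys = array("q", [self.MISSING]) * max_doc
        for doc_id, key in pairs:
            self._keys[doc_id] = key

    def get(self, doc_id):
        """Return the uniqueKey for doc_id without touching stored fields."""
        return self._keys[doc_id]

    def filter(self, doc_ids, allowed_keys):
        """Keep only the docIds whose uniqueKey is in allowed_keys."""
        return [d for d in doc_ids if self._keys[d] in allowed_keys]


mapping = DocIdToKey(4, [(0, 101), (2, 303)])
kept = mapping.filter([0, 1, 2], {303})
```

The filter method mirrors the "personalized collections" use case: a set of allowed uniqueKeys is applied to candidate docIds with one array read per document.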

Re: Fetching uniqueKey and other int quickly from documentCache?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

I vaguely remember such a Jira issue but I can't find it now.

Gregg, can you open an issue? A patch would be even better.





-- 
Regards,
Shalin Shekhar Mangar.

Re: Fetching uniqueKey and other int quickly from documentCache?

Posted by Gregg Donovan <gr...@gmail.com>.

Yonik,

Requesting
fl=unique_key:field(unique_key),secondary_key:field(secondary_key),score vs
fl=unique_key,secondary_key,score was a nice performance win, as unique_key
and secondary_key were both already in the fieldCache. We removed our
documentCache, in fact, as it got very little use.

We do see a code path that fetches stored fields, though, in
BinaryResponseWriter, for the case of *only* pseudo-fields being requested.
I opened a ticket and attached a patch to
https://issues.apache.org/jira/browse/SOLR-5968.
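
For reference, the aliased fl value described above can be generated mechanically. The following is a minimal sketch; aliased_fl is a hypothetical helper name, and the field names are the ones used in this message.

```python
def aliased_fl(fields, include_score=True):
    """Build an fl value that reads each field through field(), aliased to its own name."""
    # "name:field(name)" keeps the original key in the response while
    # routing the lookup through a function query instead of stored fields.
    parts = [f"{name}:field({name})" for name in fields]
    if include_score:
        parts.append("score")
    return ",".join(parts)


fl = aliased_fl(["unique_key", "secondary_key"])
print(fl)
```

Aliasing each pseudo-field back to the plain field name means client code that reads unique_key and secondary_key from the response should not need to change.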




On Mon, Mar 3, 2014 at 11:30 AM, Yonik Seeley <yo...@heliosearch.com> wrote:

> On Mon, Mar 3, 2014 at 11:14 AM, Gregg Donovan <gr...@gmail.com> wrote:
> > Yonik,
> >
> > That's a very clever idea. Unfortunately, I think that will skip the
> > distributed query optimization we were hoping to take advantage of in
> > SOLR-1880 [1], but it should work with the proposed distrib.singlePass
> > optimization in SOLR-5768 [2]. Does that sound right?
>
>
> Yep, the two together should do the trick.
>
> -Yonik
> http://heliosearch.org - native off-heap filters and fieldcache for solr
>
>
> > --Gregg
> >
> > [1] https://issues.apache.org/jira/browse/SOLR-1880
> > [2] https://issues.apache.org/jira/browse/SOLR-5768
> >
> >
> > On Wed, Feb 26, 2014 at 8:53 PM, Yonik Seeley <yo...@heliosearch.com> wrote:
> >
> >> You could try forcing things to go through function queries (via
> >> pseudo-fields):
> >>
> >> fl=field(id), field(myfield)
> >>
> >> If you're not requesting any stored fields, that *might* currently
> >> skip that step.
> >>
> >> -Yonik
> >> http://heliosearch.org - native off-heap filters and fieldcache for solr

Re: Fetching uniqueKey and other int quickly from documentCache?

Posted by Yonik Seeley <yo...@heliosearch.com>.

On Mon, Mar 3, 2014 at 11:14 AM, Gregg Donovan <gr...@gmail.com> wrote:
> Yonik,
>
> That's a very clever idea. Unfortunately, I think that will skip the
> distributed query optimization we were hoping to take advantage of in
> SOLR-1880 [1], but it should work with the proposed distrib.singlePass
> optimization in SOLR-5768 [2]. Does that sound right?


Yep, the two together should do the trick.

-Yonik
http://heliosearch.org - native off-heap filters and fieldcache for solr


> --Gregg
>
> [1] https://issues.apache.org/jira/browse/SOLR-1880
> [2] https://issues.apache.org/jira/browse/SOLR-5768
>
>
> On Wed, Feb 26, 2014 at 8:53 PM, Yonik Seeley <yo...@heliosearch.com> wrote:
>
>> You could try forcing things to go through function queries (via
>> pseudo-fields):
>>
>> fl=field(id), field(myfield)
>>
>> If you're not requesting any stored fields, that *might* currently
>> skip that step.
>>
>> -Yonik
>> http://heliosearch.org - native off-heap filters and fieldcache for solr

Re: Fetching uniqueKey and other int quickly from documentCache?

Posted by Gregg Donovan <gr...@gmail.com>.

Yonik,

That's a very clever idea. Unfortunately, I think that will skip the
distributed query optimization we were hoping to take advantage of in
SOLR-1880 [1], but it should work with the proposed distrib.singlePass
optimization in SOLR-5768 [2]. Does that sound right?

--Gregg

[1] https://issues.apache.org/jira/browse/SOLR-1880
[2] https://issues.apache.org/jira/browse/SOLR-5768
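
Combining the two ideas might look like the sketch below. It assumes the SOLR-5768 option is passed as the request parameter distrib.singlePass=true, as the proposal names it; with_single_pass is a hypothetical helper and the q and rows values are placeholders.

```python
def with_single_pass(params):
    """Return a copy of params with the proposed single-pass flag added."""
    merged = dict(params)  # copy so the caller's dict is left untouched
    merged["distrib.singlePass"] = "true"  # assumed spelling, per SOLR-5768
    return merged


base = {"q": "*:*", "rows": 1000, "fl": "field(id),field(myfield)"}
request_params = with_single_pass(base)
```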


On Wed, Feb 26, 2014 at 8:53 PM, Yonik Seeley <yo...@heliosearch.com> wrote:

> You could try forcing things to go through function queries (via
> pseudo-fields):
>
> fl=field(id), field(myfield)
>
> If you're not requesting any stored fields, that *might* currently
> skip that step.
>
> -Yonik
> http://heliosearch.org - native off-heap filters and fieldcache for solr

Re: Fetching uniqueKey and other int quickly from documentCache?

Posted by Yonik Seeley <yo...@heliosearch.com>.

You could try forcing things to go through function queries (via pseudo-fields):

fl=field(id), field(myfield)

If you're not requesting any stored fields, that *might* currently
skip that step.

-Yonik
http://heliosearch.org - native off-heap filters and fieldcache for solr
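
As a rough illustration of the suggestion above, a request that asks only for function pseudo-fields might be built as follows. This is a sketch, not a tested recipe: the host, the core name (collection1), the query, and the rows value are placeholder assumptions, and build_select_url is a hypothetical helper rather than part of any Solr client.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, fields, rows=1000):
    """Build a /select URL whose fl lists only field() pseudo-fields."""
    # No plain (stored) field names appear in fl, only function queries.
    fl = ",".join(f"field({name})" for name in fields)
    params = {"q": query, "rows": rows, "fl": fl}
    return f"{base_url}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983/solr/collection1", "*:*", ["id", "myfield"]
)
print(url)
```

Whether stored-field loading is actually skipped depends on the Solr version and response writer, so timing the request before and after the change is the only reliable check.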

