Posted to solr-user@lucene.apache.org by Michael Ryan <mr...@moreover.com> on 2011/10/19 20:05:15 UTC

How to make UnInvertedField faster?

I was wondering if anyone has any ideas for making UnInvertedField.uninvert()
faster, or other alternatives for generating facets quickly.

The vast majority of the CPU time for our Solr instances is spent generating
UnInvertedFields after each commit. Here's an example of one of our slower fields:

[2011-10-19 17:46:01,055] INFO125974[pool-1-thread-1] - (SolrCore:440) -
UnInverted multi-valued field {field=authorCS,memSize=38063628,tindexSize=422652,
time=15610,phase1=15584,nTerms=1558514,bigTerms=0,termInstances=4510674,uses=0}

That is from an index with approximately 8 million documents. After each commit,
it takes on average about 90 seconds to uninvert all the fields that we facet on.

Any ideas at all would be greatly appreciated.

-Michael
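
As a reading aid for the log line in the post above, here is a minimal Java sketch that parses the {key=value,...} stats block and derives two ratios from it. The class name UninvertLogStats and its methods are hypothetical helpers, not part of Solr, and the sketch assumes the time and phase1 values are in milliseconds.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for reading the stats in an "UnInverted multi-valued field" log line.
final class UninvertLogStats {

    /** Parses the "{k=v,k=v,...}" portion of the log line into an ordered map. */
    static Map<String, String> parse(String logLine) {
        int open = logLine.indexOf('{');
        int close = logLine.lastIndexOf('}');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("no {...} stats block found");
        }
        Map<String, String> stats = new LinkedHashMap<>();
        for (String pair : logLine.substring(open + 1, close).split(",")) {
            String[] kv = pair.trim().split("=", 2);
            if (kv.length == 2) {
                stats.put(kv[0], kv[1]);
            }
        }
        return stats;
    }

    /** Returns stats[numerator] / stats[denominator] as a double. */
    static double ratio(Map<String, String> stats, String numerator, String denominator) {
        return Double.parseDouble(stats.get(numerator))
             / Double.parseDouble(stats.get(denominator));
    }

    public static void main(String[] args) {
        // The stats from the post, joined onto one logical line.
        String line = "UnInverted multi-valued field {field=authorCS,memSize=38063628,"
            + "tindexSize=422652,time=15610,phase1=15584,nTerms=1558514,bigTerms=0,"
            + "termInstances=4510674,uses=0}";
        Map<String, String> s = parse(line);
        System.out.printf("phase1 share of total time: %.3f%n", ratio(s, "phase1", "time"));
        System.out.printf("term instances per unique term: %.2f%n",
            ratio(s, "termInstances", "nTerms"));
    }
}
```

With the posted values, phase1 accounts for nearly all of the reported time (15584 of 15610), and there are roughly 2.9 term instances per unique term.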

Re: How to make UnInvertedField faster?

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Sat, Oct 22, 2011 at 4:10 AM, Simon Willnauer
<si...@googlemail.com> wrote:
> On Fri, Oct 21, 2011 at 4:37 PM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>> Well... the limitation of DocValues is that it cannot handle more than
>> one value per document (which UnInvertedField can).
>
> you can pack this into one byte[] or use more than one field? I don't
> see a real limitation here.

Well... not very easily?

UnInvertedField (DocTermOrds in Lucene) is the same as DocValues'
BYTES_VAR_SORTED.

So for an app to do this "on top" it'd have to handle the term -> ord
resolving itself, save that somewhere, then encode the multiple ords
into a byte[].

I agree for other simple types (no deref/sorting involved) an app
could pack them into its own byte[] that's otherwise opaque to Lucene.

Mike McCandless

http://blog.mikemccandless.com
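
To make the "on top" approach described in the message above more concrete, here is a minimal sketch of what an application would have to do itself: resolve each term to an ord, keep that dictionary somewhere, and encode each document's ords into a byte[]. It uses only plain Java collections, not Lucene APIs. The class OrdPacking and its methods are hypothetical, the delta plus variable-length encoding is just one possible layout, and the dictionary is sorted in String.compareTo order as a simplification.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: application-side term -> ord resolution and packing.
final class OrdPacking {

    /** Sorted, de-duplicated terms; a term's ord is its index in this array. */
    static String[] buildDictionary(List<List<String>> docs) {
        TreeSet<String> unique = new TreeSet<>();
        for (List<String> terms : docs) {
            unique.addAll(terms);
        }
        return unique.toArray(new String[0]);
    }

    /** Encodes one document's terms as sorted, delta-coded, variable-length ords. */
    static byte[] encode(String[] dict, List<String> terms) {
        int[] ords = new int[terms.size()];
        for (int i = 0; i < ords.length; i++) {
            int ord = Arrays.binarySearch(dict, terms.get(i));
            if (ord < 0) {
                throw new IllegalArgumentException("term not in dictionary: " + terms.get(i));
            }
            ords[i] = ord;
        }
        Arrays.sort(ords);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : ords) {
            writeVInt(out, ord - prev); // deltas are non-negative because ords are sorted
            prev = ord;
        }
        return out.toByteArray();
    }

    /** Writes a non-negative int using 7 bits per byte, high bit set on all but the last byte. */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Decodes the packed ords and resolves them back to terms. */
    static List<String> decode(String[] dict, byte[] packed) {
        List<String> terms = new ArrayList<>();
        int pos = 0;
        int ord = 0;
        while (pos < packed.length) {
            int delta = 0;
            int shift = 0;
            int b;
            do {
                b = packed[pos++] & 0xFF;
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            ord += delta;
            terms.add(dict[ord]);
        }
        return terms;
    }
}
```

The sketch also shows why this is "not very easy": the ords are only valid against the exact dictionary they were built from, so adding a new term that sorts before existing ones shifts every later ord and invalidates the packed bytes already written.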

Re: How to make UnInvertedField faster?

Posted by Simon Willnauer <si...@googlemail.com>.
On Fri, Oct 21, 2011 at 4:37 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> Well... the limitation of DocValues is that it cannot handle more than
> one value per document (which UnInvertedField can).

you can pack this into one byte[] or use more than one field? I don't
see a real limitation here.

simon
>
> Hopefully we can fix that at some point :)
>
> Mike McCandless
>
> http://blog.mikemccandless.com

Re: How to make UnInvertedField faster?

Posted by Michael McCandless <lu...@mikemccandless.com>.
Well... the limitation of DocValues is that it cannot handle more than
one value per document (which UnInvertedField can).

Hopefully we can fix that at some point :)

Mike McCandless

http://blog.mikemccandless.com

On Fri, Oct 21, 2011 at 7:50 AM, Simon Willnauer
<si...@googlemail.com> wrote:
> In trunk we have a feature called IndexDocValues which basically
> creates the uninverted structure at index time. You can then simply
> suck that into memory or even access it on disk directly
> (RandomAccess). Even if I can't help you right now this is certainly
> going to help you here. There is no need to uninvert at all anymore in
> lucene 4.0
>
> simon

Re: How to make UnInvertedField faster?

Posted by Jason Rutherglen <ja...@gmail.com>.
Sweet + Very cool!

On Fri, Oct 21, 2011 at 7:50 AM, Simon Willnauer
<simon.willnauer@googlemail.com> wrote:

> In trunk we have a feature called IndexDocValues which basically
> creates the uninverted structure at index time. You can then simply
> suck that into memory or even access it on disk directly
> (RandomAccess). Even if I can't help you right now this is certainly
> going to help you here. There is no need to uninvert at all anymore in
> lucene 4.0
>
> simon

Re: How to make UnInvertedField faster?

Posted by Simon Willnauer <si...@googlemail.com>.
In trunk we have a feature called IndexDocValues which basically
creates the uninverted structure at index time. You can then simply
suck that into memory or even access it on disk directly
(RandomAccess). Even if I can't help you right now this is certainly
going to help you here. There is no need to uninvert at all anymore in
lucene 4.0

simon
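
For readers unfamiliar with the term, the following is a conceptual sketch of what "uninverting" means and why having the structure written at index time avoids the per-commit cost. It is not the IndexDocValues API and not Solr's UnInvertedField implementation; the class UninvertSketch, its methods, and the in-memory maps are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

// Hypothetical sketch: turning term -> docs (inverted) into doc -> term ords (uninverted).
final class UninvertSketch {

    /**
     * Walks every posting once and records, for each document, the ords of its terms.
     * The work is proportional to the total number of term instances, which is why
     * repeating it after every commit is expensive on a large multi-valued field.
     */
    static List<List<Integer>> uninvert(SortedMap<String, int[]> postings, int maxDoc) {
        List<List<Integer>> docToOrds = new ArrayList<>(maxDoc);
        for (int d = 0; d < maxDoc; d++) {
            docToOrds.add(new ArrayList<>());
        }
        int ord = 0; // ord = position of the term in sorted order
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                docToOrds.get(doc).add(ord);
            }
            ord++;
        }
        return docToOrds;
    }

    /** Counts facet values for the matching documents using the doc -> ords structure. */
    static int[] countFacets(List<List<Integer>> docToOrds, int[] matchingDocs, int numTerms) {
        int[] counts = new int[numTerms];
        for (int doc : matchingDocs) {
            for (int ord : docToOrds.get(doc)) {
                counts[ord]++;
            }
        }
        return counts;
    }
}
```

If the doc -> ords structure is written while documents are indexed, as the message above describes, only countFacets-style work remains at search time and the uninvert pass disappears.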
