Posted to solr-user@lucene.apache.org by "Lemke, Michael SZ/HZA-ZSW" <le...@schaeffler.com> on 2013/11/14 18:03:08 UTC

facet method=enum and uninvertedfield limitations

I am running into performance problems with faceted queries.
If I do a 

q=word&facet.field=CONTENT&facet=true&facet.limit=10&facet.mincount=1&facet.method=fc&facet.prefix=a&rows=0

I am getting an exception:
org.apache.solr.common.SolrException: Too many values for UnInvertedField faceting on field CONTENT
        at org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:384)
        at org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:178)
        at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:839)
        ...

I understand it's got something to do with a 24bit limit somewhere
in the code but I don't understand enough of it to be able to construct
a specialized index that can be queried with facet.method=enum.

A stripped-down index still doesn't work.  It has exactly one
field, CONTENT, with 178,000 terms and ~1 million documents.  The
top-ranking terms according to Luke are

1 413950	CONTENT	word1
2 321223	CONTENT	word2
3 299036	CONTENT	word3
4 276757	CONTENT	word4
...

How would we have to strip the index?

Thanks,
Michael

Re: facet method=enum and uninvertedfield limitations

Posted by Dmitry Kan <so...@gmail.com>.

What is the actual target speed you are pursuing? Is this for user
suggestions or something of that sort? Content-based suggestions with
faceting, especially on Solr 1.4, won't be lightning fast.

Have you looked at TermsComponent?
http://wiki.apache.org/solr/TermsComponent
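
For reference, a minimal sketch of what a TermsComponent request for prefix suggestions could look like, written in Python. The host, port and the /terms handler path are assumptions (the handler has to be registered in solrconfig.xml), and terms_url is a hypothetical helper name. Note that TermsComponent reads the term dictionary of the whole index, so unlike the facet query it does not restrict suggestions to documents matching q=word.

```python
from urllib.parse import urlencode

def terms_url(base_url, field, prefix, limit=10):
    """Build a TermsComponent request URL for prefix-based suggestions.

    base_url is an assumed location of a request handler that has the
    TermsComponent enabled; adjust it to the actual deployment.
    """
    params = [
        ("terms", "true"),           # switch the component on
        ("terms.fl", field),         # field whose terms are listed
        ("terms.prefix", prefix),    # only terms starting with this prefix
        ("terms.limit", str(limit)), # maximum number of terms returned
    ]
    return base_url + "?" + urlencode(params)

if __name__ == "__main__":
    # Hypothetical local Solr instance, used for illustration only.
    print(terms_url("http://localhost:8983/solr/terms", "CONTENT", "a"))
```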

By shingles, which in the rest of the world are more commonly called
n-grams, I meant a way of "compressing" the number of entities to iterate
through. For example, you could store only bigrams or trigrams and facet on
those (there are fewer of them).
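
To make the idea concrete, here is a small Python sketch of word-level shingling, assuming simple whitespace tokenization and lowercasing. The function name shingles is hypothetical; in Solr this would normally be done at index time by a shingle filter in the field's analysis chain rather than in client code.

```python
def shingles(text, size=2):
    """Return the word n-grams (shingles) of the given size, in order.

    Tokenization here is a plain lowercase whitespace split, which is an
    assumption made for illustration; a real analyzer does more.
    """
    tokens = text.lower().split()
    # Slide a window of `size` tokens over the token list.
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]

if __name__ == "__main__":
    for shingle in shingles("facet method enum works", size=2):
        print(shingle)
```

Whether faceting on such a field is faster depends on how many distinct shingles the data produces, so this should be measured on the real index.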

Dmitry


On Wed, Nov 20, 2013 at 6:10 PM, Lemke, Michael SZ/HZA-ZSW <
lemkemch@schaeffler.com> wrote:

> On Wednesday, November 20, 2013 7:37 AM, Dmitry Kan wrote:
>
> Thanks for your reply.
>
> >
> >Since you are faceting on a text field (is this correct?) you deal with a
> >lot of unique values in it.
>
> Yes, this is a text field and we experimented with reducing the index.  As
> I said in my original question the stripped down index had 178,000 terms
> and it (fc) still didn't work.  Is number of terms the relevant quantity?
>
> >So your best bet is enum method.
>
> Hm, yes, that works but I have to wait 4 minutes for the answer (with the
> original data).  Not good.
>
> >Also if you
> >are on Solr 4.x try building doc values in the index: this suits faceting
> >well.
>
> We are on Solr 1.4, so, no.
>
> >
> >Otherwise start from your spec once again. Can you use shingles instead?
>
> Possibly but I don't know shingles.  Although I'd prefer to use our
> original
> index we are trying to build a specialized index just for this sort of
> query but still don't know what to look for.
>
> A query like
>
>
>  q=word&facet.field=CONTENT&facet=true&facet.limit=10&facet.mincount=1&facet.method=fc&facet.prefix=a&rows=0
>
> would give me the top ten results containing 'word' and something starting
> with 'a'.  That's what I want.  An empty facet.prefix should also work.
> Eventually, the query will be more complex containing other fields and
> filter queries but the basic function should be exactly like this.  How
> can we achieve this?
>
> Thanks,
> Michael
>
>
> >On 19 Nov 2013 17:44, "Lemke, Michael SZ/HZA-ZSW" <
> lemkemch@schaeffler.com>
> >wrote:
> >
> >> On Friday, November 15, 2013 11:22 AM, Lemke, Michael SZ/HZA-ZSW wrote:
> >>
> >> Judging from numerous replies this seems to be a tough question.
> >> Nevertheless, I'd really appreciate any help as we are stuck.
> >> We'd really like to know what in our index causes the facet.method=fc
> >> query to fail.
> >>
> >> Thanks,
> >> Michael
> >>
> >> >On Thu, November 14, 2013 7:26 PM, Yonik Seeley wrote:
> >> >>On Thu, Nov 14, 2013 at 12:03 PM, Lemke, Michael  SZ/HZA-ZSW
> >> >><le...@schaeffler.com> wrote:
> >> >>> I am running into performance problems with faceted queries.
> >> >>> If I do a
> >> >>>
> >> >>>
> >>
> q=word&facet.field=CONTENT&facet=true&facet.limit=10&facet.mincount=1&facet.method=fc&facet.prefix=a&rows=0
> >> >>>
> >> >>> I am getting an exception:
> >> >>> org.apache.solr.common.SolrException: Too many values for
> >> UnInvertedField faceting on field CONTENT
> >> >>>         at
> >>
> org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:384)
> >> >>>         at
> >>
> org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:178)
> >> >>>         at
> >>
> org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:839)
> >> >>>         ...
> >> >>>
> >> >>> I understand it's got something to do with a 24bit limit somewhere
> >> >>> in the code but I don't understand enough of it to be able to
> construct
> >> >>> a specialized index that can be queried with facet.method=enum.
> >> >>
> >> >>You shouldn't need to do anything differently to try facet.method=enum
> >> >>(just replace facet.method=fc with facet.method=enum)
> >> >
> >> >This is true and facet.method=enum does work indeed.  The problem is
> >> >runtime.  In particular queries with an empty facet.prefix= run many
> >> >seconds if not minutes.  I initially asked about this here:
> >> >
> >>
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201310.mbox/%3C33EC3398272FBE47B64EE3B3E98F69A7614279DE@de011521.schaeffler.com%3E
> >> >
> >> >It was suggested that fc is much faster than enum and I'd like to
> >> >test that.  We are still fairly free to design the index such that
> >> >it performs well.  But to do that we need to understand what is
> >> >killing it.
> >> >
> >> >>
> >> >>You may also want to add the parameter
> >> >>facet.enum.cache.minDf=100000
> >> >>to lower memory usage by only using the filter cache for terms that
> >> >>match more than 100K docs.
> >> >
> >> >That helped a little, cut down my particular test from 10 sec to 5 sec.
> >> >But still too slow.  Mind you this is for an autosuggest feature.
> >> >
> >> >Thanks for your reply.
> >> >
> >> >Michael
> >> >
> >> >
> >>
> >>
>
>


-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan

RE: facet method=enum and uninvertedfield limitations

Posted by "Lemke, Michael SZ/HZA-ZSW" <le...@schaeffler.com>.

On Wednesday, November 20, 2013 7:37 AM, Dmitry Kan wrote:

Thanks for your reply.

>
>Since you are faceting on a text field (is this correct?) you deal with a
>lot of unique values in it.

Yes, this is a text field and we experimented with reducing the index.  As
I said in my original question the stripped down index had 178,000 terms
and it (fc) still didn't work.  Is number of terms the relevant quantity?

>So your best bet is enum method. 

Hm, yes, that works but I have to wait 4 minutes for the answer (with the
original data).  Not good.

>Also if you
>are on Solr 4.x try building doc values in the index: this suits faceting
>well.

We are on Solr 1.4, so, no.

>
>Otherwise start from your spec once again. Can you use shingles instead?

Possibly but I don't know shingles.  Although I'd prefer to use our original
index we are trying to build a specialized index just for this sort of
query but still don't know what to look for.

A query like

 q=word&facet.field=CONTENT&facet=true&facet.limit=10&facet.mincount=1&facet.method=fc&facet.prefix=a&rows=0

would give me the top ten results containing 'word' and something starting
with 'a'.  That's what I want.  An empty facet.prefix should also work.
Eventually, the query will be more complex containing other fields and
filter queries but the basic function should be exactly like this.  How
can we achieve this?

Thanks,
Michael


>On 19 Nov 2013 17:44, "Lemke, Michael SZ/HZA-ZSW" <le...@schaeffler.com>
>wrote:
>
>> On Friday, November 15, 2013 11:22 AM, Lemke, Michael SZ/HZA-ZSW wrote:
>>
>> Judging from numerous replies this seems to be a tough question.
>> Nevertheless, I'd really appreciate any help as we are stuck.
>> We'd really like to know what in our index causes the facet.method=fc
>> query to fail.
>>
>> Thanks,
>> Michael
>>
>> >On Thu, November 14, 2013 7:26 PM, Yonik Seeley wrote:
>> >>On Thu, Nov 14, 2013 at 12:03 PM, Lemke, Michael  SZ/HZA-ZSW
>> >><le...@schaeffler.com> wrote:
>> >>> I am running into performance problems with faceted queries.
>> >>> If I do a
>> >>>
>> >>>
>> q=word&facet.field=CONTENT&facet=true&facet.limit=10&facet.mincount=1&facet.method=fc&facet.prefix=a&rows=0
>> >>>
>> >>> I am getting an exception:
>> >>> org.apache.solr.common.SolrException: Too many values for
>> UnInvertedField faceting on field CONTENT
>> >>>         at
>> org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:384)
>> >>>         at
>> org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:178)
>> >>>         at
>> org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:839)
>> >>>         ...
>> >>>
>> >>> I understand it's got something to do with a 24bit limit somewhere
>> >>> in the code but I don't understand enough of it to be able to construct
>> >>> a specialized index that can be queried with facet.method=enum.
>> >>
>> >>You shouldn't need to do anything differently to try facet.method=enum
>> >>(just replace facet.method=fc with facet.method=enum)
>> >
>> >This is true and facet.method=enum does work indeed.  The problem is
>> >runtime.  In particular queries with an empty facet.prefix= run many
>> >seconds if not minutes.  I initially asked about this here:
>> >
>> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201310.mbox/%3C33EC3398272FBE47B64EE3B3E98F69A7614279DE@de011521.schaeffler.com%3E
>> >
>> >It was suggested that fc is much faster than enum and I'd like to
>> >test that.  We are still fairly free to design the index such that
>> >it performs well.  But to do that we need to understand what is
>> >killing it.
>> >
>> >>
>> >>You may also want to add the parameter
>> >>facet.enum.cache.minDf=100000
>> >>to lower memory usage by only using the filter cache for terms that
>> >>match more than 100K docs.
>> >
>> >That helped a little, cut down my particular test from 10 sec to 5 sec.
>> >But still too slow.  Mind you this is for an autosuggest feature.
>> >
>> >Thanks for your reply.
>> >
>> >Michael
>> >
>> >
>>
>>

RE: facet method=enum and uninvertedfield limitations

Posted by Dmitry Kan <so...@gmail.com>.

Since you are faceting on a text field (is this correct?) you deal with a
lot of unique values in it. So your best bet is the enum method. Also, if you
are on Solr 4.x, try building doc values in the index: this suits faceting
well.

Otherwise start from your spec once again. Can you use shingles instead?

On 19 Nov 2013 17:44, "Lemke, Michael SZ/HZA-ZSW" <le...@schaeffler.com>
wrote:

> On Friday, November 15, 2013 11:22 AM, Lemke, Michael SZ/HZA-ZSW wrote:
>
> Judging from numerous replies this seems to be a tough question.
> Nevertheless, I'd really appreciate any help as we are stuck.
> We'd really like to know what in our index causes the facet.method=fc
> query to fail.
>
> Thanks,
> Michael
>
> >On Thu, November 14, 2013 7:26 PM, Yonik Seeley wrote:
> >>On Thu, Nov 14, 2013 at 12:03 PM, Lemke, Michael  SZ/HZA-ZSW
> >><le...@schaeffler.com> wrote:
> >>> I am running into performance problems with faceted queries.
> >>> If I do a
> >>>
> >>>
> q=word&facet.field=CONTENT&facet=true&facet.limit=10&facet.mincount=1&facet.method=fc&facet.prefix=a&rows=0
> >>>
> >>> I am getting an exception:
> >>> org.apache.solr.common.SolrException: Too many values for
> UnInvertedField faceting on field CONTENT
> >>>         at
> org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:384)
> >>>         at
> org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:178)
> >>>         at
> org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:839)
> >>>         ...
> >>>
> >>> I understand it's got something to do with a 24bit limit somewhere
> >>> in the code but I don't understand enough of it to be able to construct
> >>> a specialized index that can be queried with facet.method=enum.
> >>
> >>You shouldn't need to do anything differently to try facet.method=enum
> >>(just replace facet.method=fc with facet.method=enum)
> >
> >This is true and facet.method=enum does work indeed.  The problem is
> >runtime.  In particular queries with an empty facet.prefix= run many
> >seconds if not minutes.  I initially asked about this here:
> >
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201310.mbox/%3C33EC3398272FBE47B64EE3B3E98F69A7614279DE@de011521.schaeffler.com%3E
> >
> >It was suggested that fc is much faster than enum and I'd like to
> >test that.  We are still fairly free to design the index such that
> >it performs well.  But to do that we need to understand what is
> >killing it.
> >
> >>
> >>You may also want to add the parameter
> >>facet.enum.cache.minDf=100000
> >>to lower memory usage by only using the filter cache for terms that
> >>match more than 100K docs.
> >
> >That helped a little, cut down my particular test from 10 sec to 5 sec.
> >But still too slow.  Mind you this is for an autosuggest feature.
> >
> >Thanks for your reply.
> >
> >Michael
> >
> >
>
>

RE: facet method=enum and uninvertedfield limitations

Posted by "Lemke, Michael SZ/HZA-ZSW" <le...@schaeffler.com>.

On Friday, November 15, 2013 11:22 AM, Lemke, Michael SZ/HZA-ZSW wrote:

Judging from numerous replies this seems to be a tough question.
Nevertheless, I'd really appreciate any help as we are stuck.
We'd really like to know what in our index causes the facet.method=fc
query to fail.

Thanks,
Michael

>On Thu, November 14, 2013 7:26 PM, Yonik Seeley wrote:
>>On Thu, Nov 14, 2013 at 12:03 PM, Lemke, Michael  SZ/HZA-ZSW
>><le...@schaeffler.com> wrote:
>>> I am running into performance problems with faceted queries.
>>> If I do a
>>>
>>> q=word&facet.field=CONTENT&facet=true&facet.limit=10&facet.mincount=1&facet.method=fc&facet.prefix=a&rows=0
>>>
>>> I am getting an exception:
>>> org.apache.solr.common.SolrException: Too many values for UnInvertedField faceting on field CONTENT
>>>         at org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:384)
>>>         at org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:178)
>>>         at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:839)
>>>         ...
>>>
>>> I understand it's got something to do with a 24bit limit somewhere
>>> in the code but I don't understand enough of it to be able to construct
>>> a specialized index that can be queried with facet.method=enum.
>>
>>You shouldn't need to do anything differently to try facet.method=enum
>>(just replace facet.method=fc with facet.method=enum)
>
>This is true and facet.method=enum does work indeed.  The problem is
>runtime.  In particular queries with an empty facet.prefix= run many
>seconds if not minutes.  I initially asked about this here:
>http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201310.mbox/%3C33EC3398272FBE47B64EE3B3E98F69A7614279DE@de011521.schaeffler.com%3E
>
>It was suggested that fc is much faster than enum and I'd like to
>test that.  We are still fairly free to design the index such that
>it performs well.  But to do that we need to understand what is
>killing it.
>
>>
>>You may also want to add the parameter
>>facet.enum.cache.minDf=100000
>>to lower memory usage by only using the filter cache for terms that
>>match more than 100K docs.
>
>That helped a little, cut down my particular test from 10 sec to 5 sec.
>But still too slow.  Mind you this is for an autosuggest feature.
>
>Thanks for your reply.
>
>Michael
>
>

RE: facet method=enum and uninvertedfield limitations

Posted by "Lemke, Michael SZ/HZA-ZSW" <le...@schaeffler.com>.

On Thu, November 14, 2013 7:26 PM, Yonik Seeley wrote:
>On Thu, Nov 14, 2013 at 12:03 PM, Lemke, Michael  SZ/HZA-ZSW
><le...@schaeffler.com> wrote:
>> I am running into performance problems with faceted queries.
>> If I do a
>>
>> q=word&facet.field=CONTENT&facet=true&facet.limit=10&facet.mincount=1&facet.method=fc&facet.prefix=a&rows=0
>>
>> I am getting an exception:
>> org.apache.solr.common.SolrException: Too many values for UnInvertedField faceting on field CONTENT
>>         at org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:384)
>>         at org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:178)
>>         at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:839)
>>         ...
>>
>> I understand it's got something to do with a 24bit limit somewhere
>> in the code but I don't understand enough of it to be able to construct
>> a specialized index that can be queried with facet.method=enum.
>
>You shouldn't need to do anything differently to try facet.method=enum
>(just replace facet.method=fc with facet.method=enum)

This is true and facet.method=enum does work indeed.  The problem is
runtime.  In particular queries with an empty facet.prefix= run many
seconds if not minutes.  I initially asked about this here:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201310.mbox/%3C33EC3398272FBE47B64EE3B3E98F69A7614279DE@de011521.schaeffler.com%3E

It was suggested that fc is much faster than enum and I'd like to
test that.  We are still fairly free to design the index such that
it performs well.  But to do that we need to understand what is
killing it.

>
>You may also want to add the parameter
>facet.enum.cache.minDf=100000
>to lower memory usage by only using the filter cache for terms that
>match more than 100K docs.

That helped a little, cut down my particular test from 10 sec to 5 sec.
But still too slow.  Mind you this is for an autosuggest feature.

Thanks for your reply.

Michael

Re: facet method=enum and uninvertedfield limitations

Posted by Yonik Seeley <yo...@heliosearch.com>.

On Thu, Nov 14, 2013 at 12:03 PM, Lemke, Michael  SZ/HZA-ZSW
<le...@schaeffler.com> wrote:
> I am running into performance problems with faceted queries.
> If I do a
>
> q=word&facet.field=CONTENT&facet=true&facet.limit=10&facet.mincount=1&facet.method=fc&facet.prefix=a&rows=0
>
> I am getting an exception:
> org.apache.solr.common.SolrException: Too many values for UnInvertedField faceting on field CONTENT
>         at org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:384)
>         at org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:178)
>         at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:839)
>         ...
>
> I understand it's got something to do with a 24bit limit somewhere
> in the code but I don't understand enough of it to be able to construct
> a specialized index that can be queried with facet.method=enum.

You shouldn't need to do anything differently to try facet.method=enum
(just replace facet.method=fc with facet.method=enum)

You may also want to add the parameter
facet.enum.cache.minDf=100000
to lower memory usage by only using the filter cache for terms that
match more than 100K docs.
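
As an illustration of the two suggestions above, the following Python sketch assembles the query string from the original question with facet.method=enum and, optionally, facet.enum.cache.minDf. The helper name facet_query and its defaults are hypothetical; only the parameter names and values come from this thread.

```python
from urllib.parse import urlencode

def facet_query(q, field, prefix="", method="enum", min_df=None, limit=10):
    """Build the query string for a prefix-restricted facet request.

    min_df, when given, is sent as facet.enum.cache.minDf so that only
    terms matching at least that many documents use the filter cache.
    """
    params = [
        ("q", q),
        ("rows", "0"),               # only facet counts are needed
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", str(limit)),
        ("facet.mincount", "1"),
        ("facet.method", method),    # "enum" or "fc"
        ("facet.prefix", prefix),    # may be empty
    ]
    if min_df is not None:
        params.append(("facet.enum.cache.minDf", str(min_df)))
    return urlencode(params)

if __name__ == "__main__":
    print(facet_query("word", "CONTENT", prefix="a", min_df=100000))
```

Switching between the two methods for a timing comparison then only requires changing the method argument.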

-Yonik
http://heliosearch.com -- making solr shine