Posted to solr-user@lucene.apache.org by 张月祥 <zh...@calis.edu.cn> on 2014/05/23 18:14:27 UTC

Internals about "Too many values for UnInvertedField faceting on field xxx"

Could anybody explain the internals behind "Too many values for
UnInvertedField faceting on field xxx"?

We have two Solr servers.

Solr A:

128 GB RAM, 60M docs, 2,600 distinct terms in field “code”; every term in
field “code” has a fixed length of 6.

The total token count for field “code” is 9 billion.

The total space used by field “code” is 50 billion.

Solr B:

128 GB RAM, 140M docs, 1,600 distinct terms in field “code”; every term in
field “code” has a fixed length of 6.

The total token count for field “code” is 18 billion.

The total space used by field “code” is 90 billion.

When we run the facet query
“q=*:*&wt=xml&indent=true&facet=true&facet.field=code”,
Solr B is fine, but Solr A throws an exception with the message “Too many values
for UnInvertedField faceting on field code”.

We now suspect that the UnInvertedField limit is related to the number of
distinct terms in a field.

Could anybody explain the internals of this problem? We don't want to use
facet.method=enum because it is too slow.

Thanks!
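
For reference, the request from the question can be built programmatically, with the faceting method chosen per request through the facet.method parameter mentioned above. This is only a sketch: the base URL (host, port, and core name) and the helper name facet_query_url are placeholders, not details from the original post.

```python
from urllib.parse import urlencode

# Hypothetical base URL: host, port and core name are placeholders.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def facet_query_url(field, method=None, base_url=SOLR_SELECT_URL):
    """Build a match-all facet request for one field.

    If `method` is given (for example "enum" or "fc"), it is sent as
    facet.method; otherwise Solr picks its default for the field.
    """
    params = [
        ("q", "*:*"),
        ("wt", "xml"),
        ("indent", "true"),
        ("facet", "true"),
        ("facet.field", field),
    ]
    if method is not None:
        params.append(("facet.method", method))
    return base_url + "?" + urlencode(params)


# The query from the post, then the same query forcing facet.method=enum.
print(facet_query_url("code"))
print(facet_query_url("code", method="enum"))
```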


Re: Reply: Internals about "Too many values for UnInvertedField faceting on field xxx"

Posted by Yonik Seeley <yo...@heliosearch.com>.
On Mon, May 26, 2014 at 9:21 PM, 张月祥 <zh...@calis.edu.cn> wrote:
> Thanks a lot.
>
>> There are only 256 byte arrays to hold all of the ord data, and the
>> pointers into those arrays are only 24 bits long.  That gets you back
>> to 32 bits, or 4GB of ord data max.  It's practically less since you
>> only have to overflow one array before the exception is thrown.
>
> What does the ord data mean? Term ID, term-document relation, or document-term relation?

Every document has a list of term numbers (term ords) associated with it.
The deltas between sorted term numbers are vInt encoded.

-Yonik
http://heliosearch.org - facet functions, subfacets, off-heap filters&fieldcache
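
The encoding described in the reply above (a sorted list of term ords per document, stored as vInt-encoded deltas) can be illustrated roughly as follows. This is a minimal sketch, not the actual UnInvertedField code: the function names are made up for illustration, and the vInt layout assumed here is the common one with 7 data bits per byte, low-order bits first, and the high bit marking that more bytes follow.

```python
def write_vint(value, out):
    """Append a non-negative int to `out` as a variable-length integer."""
    if value < 0:
        raise ValueError("vInt sketch only handles non-negative values")
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)  # 7 data bits, continuation bit set
        value >>= 7
    out.append(value)                      # last byte, continuation bit clear


def encode_term_ords(ords):
    """Sort one document's term ords and store the gaps between them as vInts."""
    out = bytearray()
    previous = 0
    for term_ord in sorted(ords):
        write_vint(term_ord - previous, out)
        previous = term_ord
    return bytes(out)


def decode_term_ords(data):
    """Inverse of encode_term_ords: rebuild the sorted ord list."""
    ords, previous, value, shift = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:          # more bytes belong to this number
            shift += 7
        else:                    # number complete: it is a delta
            previous += value
            ords.append(previous)
            value, shift = 0, 0
    return ords


doc_ords = [3, 130, 131, 2000]           # hypothetical term ords for one document
encoded = encode_term_ords(doc_ords)
print(len(encoded))                      # 5 bytes for 4 ords (deltas 3, 127, 1, 1869)
print(decode_term_ords(encoded))
```

Small gaps cost one byte each, which is why the per-document byte count grows with the number of values per document rather than with the number of unique terms.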

RE: Reply: Internals about "Too many values for UnInvertedField faceting on field xxx"

Posted by 张月祥 <zh...@calis.edu.cn>.
Thanks a lot.

> There are only 256 byte arrays to hold all of the ord data, and the
> pointers into those arrays are only 24 bits long.  That gets you back
> to 32 bits, or 4GB of ord data max.  It's practically less since you
> only have to overflow one array before the exception is thrown.

What does the ord data mean? Term ID, term-document relation, or document-term relation?

Re: Reply: Internals about "Too many values for UnInvertedField faceting on field xxx"

Posted by Yonik Seeley <yo...@heliosearch.com>.
On Sat, May 24, 2014 at 9:50 PM, 张月祥 <zh...@calis.edu.cn> wrote:
> Thanks for your reply. I'll try it.
>
> We're still interested in the actual limit behind "Too many values for
> UnInvertedField faceting on field xxx".
>
> Could anybody tell us some internals about "Too many values for
> UnInvertedField faceting on field xxx" ?

There are only 256 byte arrays to hold all of the ord data, and the
pointers into those arrays are only 24 bits long.  That gets you back
to 32 bits, or 4GB of ord data max.  It's practically less since you
only have to overflow one array before the exception is thrown.

This faceting method is best for high numbers of unique values, but a
relatively low number of unique values per document.
I've been considering making an off-heap version for Heliosearch, and
maybe bump the limits a little at the same time...

-Yonik
http://heliosearch.org - facet functions, subfacets, off-heap filters&fieldcache
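
The arithmetic in the reply above can be spelled out as follows. The two constants come from the message; the helper any_array_overflows and the example sizes are hypothetical and only illustrate why the practical limit is below 4GB: a single full array is enough to trigger the exception, even when the total is far smaller.

```python
NUM_ARRAYS = 256      # byte arrays available for ord data (from the message)
POINTER_BITS = 24     # width of a pointer into one array (from the message)

BYTES_PER_ARRAY = 2 ** POINTER_BITS            # 16 MiB addressable per array
MAX_ORD_BYTES = NUM_ARRAYS * BYTES_PER_ARRAY   # 2**8 * 2**24 = 2**32 bytes


def any_array_overflows(array_sizes):
    """Return True if at least one array needs more bytes than a pointer can address.

    Hypothetical helper: how documents are assigned to arrays is not
    described in the thread, so the sizes are simply taken as input.
    """
    return any(size > BYTES_PER_ARRAY for size in array_sizes)


print(BYTES_PER_ARRAY)   # 16777216
print(MAX_ORD_BYTES)     # 4294967296, i.e. 4 GiB

# One array slightly over its limit, all others empty: the total is tiny
# compared with 4 GiB, yet this layout would already overflow.
skewed = [BYTES_PER_ARRAY + 1] + [0] * (NUM_ARRAYS - 1)
print(any_array_overflows(skewed))
```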

Reply: Internals about "Too many values for UnInvertedField faceting on field xxx"

Posted by 张月祥 <zh...@calis.edu.cn>.
Thanks for your reply. I'll try it.

We're still interested in the actual limit behind "Too many values for
UnInvertedField faceting on field xxx".

Could anybody tell us some internals about "Too many values for
UnInvertedField faceting on field xxx" ?

-----Original Message-----
From: Toke Eskildsen [mailto:te@statsbiblioteket.dk]
Sent: May 24, 2014 0:26
To: solr-user@lucene.apache.org
Subject: RE: Internals about "Too many values for UnInvertedField faceting on
field xxx"

张月祥 [zhangyx@calis.edu.cn] wrote:
> Could anybody tell us some internals about "Too many values for 
> UnInvertedField faceting on field xxx" ?

I must admit I do not fully understand it in detail, but it is a known
problem with Field Cache (facet.method=fc) faceting. The remedy is to use
DocValues, which does not have the same limitation. This should also result
in lower heap usage. You will have to re-index everything though.

We have successfully used DocValues on an index with 400M documents and 300M
unique values on a single facet field.

- Toke Eskildsen



RE: Internals about "Too many values for UnInvertedField faceting on field xxx"

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
张月祥 [zhangyx@calis.edu.cn] wrote:
> Could anybody tell us some internals about "Too many values for
> UnInvertedField faceting on field xxx" ?

I must admit I do not fully understand it in detail, but it is a known problem with Field Cache (facet.method=fc) faceting. The remedy is to use DocValues, which does not have the same limitation. This should also result in lower heap usage. You will have to re-index everything though.

We have successfully used DocValues on an index with 400M documents and 300M unique values on a single facet field.

- Toke Eskildsen