Posted to solr-user@lucene.apache.org by Elodie Sannier <el...@kelkoo.fr> on 2013/04/25 15:22:11 UTC

FieldCache insanity with field used as facet and group

Hello,

I am using the Lucene FieldCache with SolrCloud and I have "insane" instances with messages like:

VALUEMISMATCH: Multiple distinct value objects for SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)+merchantid
'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',class org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#557711353
'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',int,null=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713

All insane instances are for a field "merchantid" of type "int" used as facet and group field.
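
To read such a message, it can help to list which cache entries point at which value object. The sketch below is a hypothetical helper (not part of Solr or Lucene); it assumes each entry ends in `=>ClassName#identity`, as in the message above, and the sample string is an abbreviated copy of that message with the reader name shortened to 'reader'.

```python
import re
from collections import defaultdict

# Abbreviated copy of the VALUEMISMATCH message above (reader name shortened).
SAMPLE = (
    "'reader'=>'merchantid',class org.apache.lucene.index.SortedDocValues,0.5"
    "=>org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#557711353 "
    "'reader'=>'merchantid',int,null"
    "=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713 "
    "'reader'=>'merchantid',int,"
    "org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER"
    "=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713"
)

# Assumption: a cached value is printed as "<class name>#<identity hash>".
VALUE_RE = re.compile(r"=>([\w.$]+#\d+)")


def distinct_values(message):
    """Map each cached value object to the number of entries that reference it."""
    counts = defaultdict(int)
    for value in VALUE_RE.findall(message):
        counts[value] += 1
    return dict(counts)


for value, count in distinct_values(SAMPLE).items():
    print(count, value)
```

On this sample the three entries resolve to two distinct objects: the two "int" entries share one IntsFromArray instance, and the SortedDocValues entry has its own.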

I'm using a custom SearchHandler which makes two sub-queries: a first query with group.field=merchantid and a second query with facet.field=merchantid.

When I use the parameter facet.method=enum, I don't get the insane instances, but I'm not sure it is the right fix.

Can this insanity have a performance impact?
How can I fix it?

Elodie Sannier


Kelkoo SAS
Société par Actions Simplifiée (simplified joint-stock company)
Share capital: € 4.168.964,30
Registered office: 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended solely for their addressees. If you are not the intended recipient of this message, please destroy it and notify the sender.

Re: FieldCache insanity with field used as facet and group

Posted by Elodie Sannier <el...@kelkoo.fr>.
I can reproduce the problem with the 4.2.1 example with 2 shards.

1) started up solr shards, indexed the example data, and confirmed empty fieldCaches
[sanniere@funlevel-dx example]$ java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
[sanniere@funlevel-dx example2]$ java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar

2) used both grouping and faceting on the popularity field, then checked the fieldcache insanity count
[sanniere@funlevel-dx example]$ curl -sS "http://localhost:8983/solr/select?q=*:*&group=true&group.field=popularity" > /dev/null
[sanniere@funlevel-dx example]$ curl -sS "http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=popularity" > /dev/null
[sanniere@funlevel-dx example]$ curl -sS "http://localhost:8983/solr/admin/mbeans?stats=true&key=fieldCache&wt=json&indent=true" | grep -E "entries_count|insanity_count"
"entries_count":10,
"insanity_count":2,

"insanity#0":"VALUEMISMATCH: Multiple distinct value objects for SegmentCoreReader(owner=_g(4.2.1):C1)+popularity\n\t'SegmentCoreReader(owner=_g(4.2.1):C1)'=>'popularity',class org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#12129794\n\t'SegmentCoreReader(owner=_g(4.2.1):C1)'=>'popularity',int,null=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#12298774\n\t'SegmentCoreReader(owner=_g(4.2.1):C1)'=>'popularity',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#12298774\n",
"insanity#1":"VALUEMISMATCH: Multiple distinct value objects for SegmentCoreReader(owner=_f(4.2.1):C9)+popularity\n\t'SegmentCoreReader(owner=_f(4.2.1):C9)'=>'popularity',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#16648315\n\t'SegmentCoreReader(owner=_f(4.2.1):C9)'=>'popularity',int,null=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#16648315\n\t'SegmentCoreReader(owner=_f(4.2.1):C9)'=>'popularity',class org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#1130715\n"}}},
"HIGHLIGHTING",{},
"OTHER",{}]}
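
For anyone who wants to check these counters without grep, here is a minimal Python sketch. It assumes only that the JSON response contains, somewhere, an object with an "insanity_count" key next to "entries_count" and "insanity#N" keys, as in the output above; the function names and the abbreviated sample response are hypothetical, and the exact nesting of the real response may differ.

```python
import json


def find_insanity_stats(node):
    """Return the first dict that has an 'insanity_count' key, or None."""
    if isinstance(node, dict):
        if "insanity_count" in node:
            return node
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_insanity_stats(child)
        if found is not None:
            return found
    return None


def summarize(response_text):
    """Return (entries_count, insanity_count, list of insanity messages)."""
    stats = find_insanity_stats(json.loads(response_text)) or {}
    messages = [v for k, v in stats.items() if k.startswith("insanity#")]
    return stats.get("entries_count"), stats.get("insanity_count"), messages


# Abbreviated, hand-written sample; the nesting here is an assumption.
SAMPLE_RESPONSE = json.dumps({
    "solr-mbeans": [
        "CACHE", {"fieldCache": {"stats": {
            "entries_count": 10,
            "insanity_count": 2,
            "insanity#0": "VALUEMISMATCH: ...",
            "insanity#1": "VALUEMISMATCH: ...",
        }}},
        "HIGHLIGHTING", {},
        "OTHER", {},
    ]
})

entries, insane, messages = summarize(SAMPLE_RESPONSE)
print(entries, insane, len(messages))
```

The same function can be fed the body returned by the /admin/mbeans request shown above; searching recursively avoids depending on where the fieldCache stats sit in the response.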

I've updated https://issues.apache.org/jira/browse/SOLR-4866

Elodie

On 28.05.2013 10:22, Elodie Sannier wrote:
> I've created https://issues.apache.org/jira/browse/SOLR-4866
>
> Elodie

Re: FieldCache insanity with field used as facet and group

Posted by Elodie Sannier <el...@kelkoo.fr>.
I've created https://issues.apache.org/jira/browse/SOLR-4866

Elodie

(In reply to Chris Hostetter's message of 07.05.2013 18:19, reproduced in full below.)

Re: FieldCache insanity with field used as facet and group

Posted by Chris Hostetter <ho...@fucit.org>.
: I am using the Lucene FieldCache with SolrCloud and I have "insane" instances
: with messages like:

FWIW: I'm the one that named the result of these "sanity checks"
"FieldCacheInsanity" and I have regretted it ever since -- a better label
would have been "inconsistency".

: VALUEMISMATCH: Multiple distinct value objects for
: SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)+merchantid
: 'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',class
: org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#557711353
: 'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',int,null=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
: 'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
: 
: All insane instances are for a field "merchantid" of type "int" used as facet
: and group field.

Interesting: it appears that the grouping code and the facet code are not
being consistent in how they are building the field cache, so you are
getting two objects in the cache for each segment.
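
As a rough illustration of that inconsistency, here is a toy Python model. It is not Lucene's real API: the class, its methods, and the idea that the cache is keyed by (reader, field, data type) are simplifying assumptions based on the shape of the message above, and which request path asks for which data type is not established in this thread.

```python
from collections import defaultdict


class ToyFieldCache:
    """Toy stand-in for a per-segment field cache; not Lucene's real API."""

    def __init__(self):
        # (reader, field, data_type) -> cached value object
        self.entries = {}

    def get(self, reader, field, data_type):
        key = (reader, field, data_type)
        if key not in self.entries:
            # Each data type builds its own structure over the same field.
            self.entries[key] = object()
        return self.entries[key]

    def value_mismatches(self):
        """Return (reader, field) pairs backed by more than one value object."""
        by_field = defaultdict(set)
        for (reader, field, _), value in self.entries.items():
            by_field[(reader, field)].add(id(value))
        return sorted(key for key, ids in by_field.items() if len(ids) > 1)


cache = ToyFieldCache()
# One request path asks for sorted terms, another for an int array.
cache.get("_11i", "merchantid", "SortedDocValues")
cache.get("_11i", "merchantid", "int")
print(cache.value_mismatches())
```

In this model, asking for the same field of the same segment under two data types leaves two value objects in the cache, which is the situation the VALUEMISMATCH check reports.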

I haven't checked if this happens much with the example configs, but if
you could: please file a bug with the details of which Solr version you
are using along with the schema fieldType & field declarations for your
merchantid field, along with the mbean stats output showing the field
cache insanity after executing two queries like...

/select?q=*:*&facet=true&facet.field=merchantid
/select?q=*:*&group=true&group.field=merchantid

(that way we can rule out your custom SearchComponent as having a bug in 
it)

: This insanity can have performance impact ?
: How can I fix it ?

The impact is just that more RAM is being used than is probably strictly
necessary.  Unless there is something unusual in your fieldType
declaration, I don't think there is an easy fix you can apply -- we need to
fix the underlying code.

-Hoss