Posted to dev@lucene.apache.org by "Jonathan Gonzalez (JIRA)" <ji...@apache.org> on 2015/08/06 02:01:06 UTC

[jira] [Commented] (SOLR-7867) implicit sharded, facet grouping problem with multivalued string field starting with digits

    [ https://issues.apache.org/jira/browse/SOLR-7867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659224#comment-14659224 ] 

Jonathan Gonzalez commented on SOLR-7867:
-----------------------------------------

The problem lies in the docValues attribute: for some reason the .dvd file becomes corrupted after several incremental feeds. I am able to reproduce the problem, and I can fix it by disabling docValues (docValues=false).

Field definitions:
{code}
<field name="fieldForGrouping" type="int" indexed="true" stored="false" multiValued="false" omitNorms="true" termVectors="false" termPositions="false" docValues="false"/>
<field name="fieldForFacet" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="false" termPositions="false" docValues="true"/>
{code}

Query:
The query uses &group.field=<fieldForGrouping>&group.facet=true together with a simple facet such as:
{code}
&facet.field={!key=FacetKey_12345678%20facet.prefix=12345678}fieldForFacet
{code}
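For reference, the grouped facet request above can be assembled programmatically. This is a minimal sketch using only the JDK (no SolrJ) and assuming Java 10+ for the URLEncoder.encode(String, Charset) overload. The class and method names (GroupedFacetQuery, params, encode) are hypothetical. The parameters q=*:*, group=true and facet=true are assumptions: they do not appear in the snippet above, but grouping and faceting are only active when those switches are on.

{code:java}
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds the grouped facet request as a URL query string.
final class GroupedFacetQuery {

    // Parameters in insertion order; q=*:*, group=true and facet=true are assumed.
    static Map<String, String> params(String groupField, String facetField, String prefix) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");
        p.put("group", "true");
        p.put("group.field", groupField);
        p.put("group.facet", "true");
        p.put("facet", "true");
        // Local params set the output key and facet.prefix, as in the facet.field above.
        p.put("facet.field",
              "{!key=FacetKey_" + prefix + " facet.prefix=" + prefix + "}" + facetField);
        return p;
    }

    // Form-encodes each key and value; URLEncoder turns spaces into '+'.
    static String encode(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(params("fieldForGrouping", "fieldForFacet", "12345678")));
    }
}
{code}

URLEncoder encodes the space inside the local-params block as '+' rather than the '%20' used above; both decode to a space in a form-encoded query string.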

The following image shows Solr reading the .dvd index file (Per-Document Values, .dvd/.dvm: encodes additional scoring factors or other per-document information; see https://lucene.apache.org/core/5_2_0/core/org/apache/lucene/codecs/lucene50/Lucene50DocValuesFormat.html). This file is written when docValues=true is set on the field (https://cwiki.apache.org/confluence/display/solr/DocValues):
!ErrorReadingDocValues.PNG!

Then, while reading the facet.prefix value from this .dvd file, Solr attempts to read beyond the current buffer size, which causes this exception:
!DocValuesException.PNG!
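To make the failure mode concrete, here is a hypothetical, minimal front-coding (shared-prefix) decoder that copies each term into a fixed-size scratch buffer. It is not Lucene50DocValuesProducer's code or on-disk format: the entry layout ([prefix length][suffix length][suffix bytes], one byte per length) and the FrontCodedDecoder name are assumptions made for illustration. It only shows how a length larger than the buffer, for example one read from a damaged block, surfaces as an ArrayIndexOutOfBoundsException rather than as an explicit corruption error.

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical front-coded block: each entry is
// [sharedPrefixLength][suffixLength][suffix bytes...], one byte per length.
// NOT Lucene's format; it only illustrates reading past a fixed buffer.
final class FrontCodedDecoder {
    private final byte[] scratch; // reused buffer holding the current term

    FrontCodedDecoder(int capacity) {
        this.scratch = new byte[capacity];
    }

    List<String> decode(byte[] block) {
        List<String> terms = new ArrayList<>();
        int pos = 0;
        while (pos < block.length) {
            int prefix = block[pos++] & 0xFF; // bytes shared with the previous term
            int suffix = block[pos++] & 0xFF; // new bytes appended after the prefix
            // No bounds check: if prefix + suffix exceeds scratch.length, this
            // loop writes past the end of the array and throws
            // ArrayIndexOutOfBoundsException.
            for (int i = 0; i < suffix; i++) {
                scratch[prefix + i] = block[pos++];
            }
            terms.add(new String(scratch, 0, prefix + suffix, StandardCharsets.UTF_8));
        }
        return terms;
    }
}
{code}

A defensive decoder would check prefix + suffix against the buffer (and the remaining input) before copying, and then either grow the buffer or report the block as corrupt.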

I hope it helps!


> implicit sharded, facet grouping problem with multivalued string field starting with digits
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-7867
>                 URL: https://issues.apache.org/jira/browse/SOLR-7867
>             Project: Solr
>          Issue Type: Bug
>          Components: faceting, SolrCloud
>    Affects Versions: 5.2
>         Environment: 3.13.0-48-generic #80-Ubuntu SMP x86_64 GNU/Linux
> java version "1.7.0_80"
> Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
> Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
>            Reporter: Umut Erogul
>              Labels: docValues, facet, group, sharding
>         Attachments: DocValuesException.PNG, ErrorReadingDocValues.PNG
>
>
> Related parts of schema.xml:
> {code}<field name="keyword_ss" type="string" indexed="true" stored="true" docValues="true" multiValued="true"/>
> <field name="author_s" type="string" indexed="true" stored="true" docValues="true"/>{code}
> Every document has valid author_s and keyword_ss fields.
> We can run grouped facet queries successfully on a single-node, single-collection Solr 4.9.0 server:
> {code}
> q: *:* fq: keyword_ss:3m
> facet=true&facet.field=keyword_ss&group=true&group.field=author_s&group.facet=true
> {code}
> When querying a Solr 5.2.0 server in an implicitly sharded environment with:
> {code}<!-- router.field -->
> <field name="shard_name" type="string" indexed="true" stored="true" required="true"/>{code}
> (example shard names: affinity1, affinity2, affinity3, affinity4),
> the same query over the same documents fails with:
> {code}
> ERROR - 2015-08-04 08:15:15.222; [document affinity3 core_node32 document_affinity3_replica2] org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Exception during facet.field: keyword_ss
>         at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:632)
>         at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:617)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at org.apache.solr.request.SimpleFacets$2.execute(SimpleFacets.java:571)
>         at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:642)
> ...
>         at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>         at org.apache.lucene.codecs.lucene50.Lucene50DocValuesProducer$CompressedBinaryDocValues$CompressedBinaryTermsEnum.readTerm(Lucene50DocValuesProducer.java:1008)
>         at org.apache.lucene.codecs.lucene50.Lucene50DocValuesProducer$CompressedBinaryDocValues$CompressedBinaryTermsEnum.next(Lucene50DocValuesProducer.java:1026)
>         at org.apache.lucene.search.grouping.term.TermGroupFacetCollector$MV$SegmentResult.nextTerm(TermGroupFacetCollector.java:373)
>         at org.apache.lucene.search.grouping.AbstractGroupFacetCollector.mergeSegmentResults(AbstractGroupFacetCollector.java:91)
>         at org.apache.solr.request.SimpleFacets.getGroupedCounts(SimpleFacets.java:541)
>         at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:463)
>         at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:386)
>         at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:626)
>         ... 33 more
> {code}
> All the problematic queries are caused by strings starting with digits ("3m", "8 saniye", "2 broke girls", "1v1y"),
> although the query works for some digit-initial strings ("24", "90+", "45 dakika").
> We do not observe the problem when querying with
> -keyword_ss:(0-9)*
> Updating the problematic documents (a small subset of keyword_ss:(0-9)*) fixes the query,
> but we cannot find an easy way to identify the problematic documents.
> There are around 400M docs, separated into 28 shards;
> -keyword_ss:(0-9)* matches 97% of documents.
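Since updating the affected documents fixes the query, one way to narrow the search is to collect the ids of documents that have at least one keyword starting with an ASCII digit, grouped by the shard_name router field so each shard's candidates can be re-indexed separately. This yields only a superset of the problematic documents (the report notes that values such as "24" and "45 dakika" work). The KeywordDoc and DigitKeywordCandidates types are hypothetical in-memory stand-ins for data exported from the index. Inside Solr, a regular-expression query such as keyword_ss:/[0-9].*/ should select the same superset server-side.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory view of an indexed document.
final class KeywordDoc {
    final String id;
    final String shardName;      // value of the shard_name router field
    final List<String> keywords; // values of keyword_ss

    KeywordDoc(String id, String shardName, List<String> keywords) {
        this.id = id;
        this.shardName = shardName;
        this.keywords = keywords;
    }
}

final class DigitKeywordCandidates {

    // True if the value is non-empty and starts with an ASCII digit 0-9.
    static boolean startsWithDigit(String value) {
        return !value.isEmpty() && value.charAt(0) >= '0' && value.charAt(0) <= '9';
    }

    // Maps shard name -> ids of docs with at least one digit-initial keyword.
    // A TreeMap keeps shard names sorted in the output.
    static Map<String, List<String>> byShard(List<KeywordDoc> docs) {
        Map<String, List<String>> candidates = new TreeMap<>();
        for (KeywordDoc doc : docs) {
            if (doc.keywords.stream().anyMatch(DigitKeywordCandidates::startsWithDigit)) {
                candidates.computeIfAbsent(doc.shardName, k -> new ArrayList<>()).add(doc.id);
            }
        }
        return candidates;
    }
}
{code}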



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
