Posted to dev@lucene.apache.org by "Jay (JIRA)" <ji...@apache.org> on 2018/04/18 07:18:00 UTC
[jira] [Commented] (SOLR-7867) implicit sharded, facet grouping problem with multivalued string field starting with digits

    [ https://issues.apache.org/jira/browse/SOLR-7867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442022#comment-16442022 ] 

Jay commented on SOLR-7867:
---------------------------

I am seeing a similar error in Solr 5.3 and 6.6.3. In my case, the error happens intermittently, and all the values are alphanumeric (they don't start with a digit, unlike the case reported above). Reindexing the document seems to address the issue. I have not been able to verify whether the error recurs after reindexing.

> implicit sharded, facet grouping problem with multivalued string field starting with digits
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-7867
>                 URL: https://issues.apache.org/jira/browse/SOLR-7867
>             Project: Solr
>          Issue Type: Bug
>          Components: faceting, SolrCloud
>    Affects Versions: 5.2
>         Environment: 3.13.0-48-generic #80-Ubuntu SMP x86_64 GNU/Linux
> java version "1.7.0_80"
> Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
> Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
>            Reporter: Umut Erogul
>            Priority: Major
>              Labels: docValues, facet, group, sharding
>         Attachments: DocValuesException.PNG, ErrorReadingDocValues.PNG
>
>
> related parts @ schema.xml:
> {code}<field name="keyword_ss" type="string" indexed="true" stored="true" docValues="true" multiValued="true"/>
> <field name="author_s" type="string" indexed="true" stored="true" docValues="true"/>{code}
> every document has valid author_s and keyword_ss fields;
> we can make successful facet group queries on single node, single collection, solr-4.9.0 server
> {code}
> q: *:* fq: keyword_ss:3m
> facet=true&facet.field=keyword_ss&group=true&group.field=author_s&group.facet=true
> {code}
> when querying on solr-5.2.0 server with implicit sharded environment with:
> {code}<!-- router.field -->
> <field name="shard_name" type="string" indexed="true" stored="true" required="true"/>{code}
> with example shard names; affinity1 affinity2 affinity3 affinity4
> the same query with same documents gets:
> {code}
> ERROR - 2015-08-04 08:15:15.222; [document affinity3 core_node32 document_affinity3_replica2] org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Exception during facet.field: keyword_ss
>         at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:632)
>         at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:617)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at org.apache.solr.request.SimpleFacets$2.execute(SimpleFacets.java:571)
>         at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:642)
> ...
>         at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>         at org.apache.lucene.codecs.lucene50.Lucene50DocValuesProducer$CompressedBinaryDocValues$CompressedBinaryTermsEnum.readTerm(Lucene50DocValuesProducer.java:1008)
>         at org.apache.lucene.codecs.lucene50.Lucene50DocValuesProducer$CompressedBinaryDocValues$CompressedBinaryTermsEnum.next(Lucene50DocValuesProducer.java:1026)
>         at org.apache.lucene.search.grouping.term.TermGroupFacetCollector$MV$SegmentResult.nextTerm(TermGroupFacetCollector.java:373)
>         at org.apache.lucene.search.grouping.AbstractGroupFacetCollector.mergeSegmentResults(AbstractGroupFacetCollector.java:91)
>         at org.apache.solr.request.SimpleFacets.getGroupedCounts(SimpleFacets.java:541)
>         at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:463)
>         at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:386)
>         at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:626)
>         ... 33 more
> {code}
> all the problematic queries are caused by strings starting with digits; ("3m", "8 saniye", "2 broke girls", "1v1y")
> for some strings starting with digits the query does work; ("24", "90+", "45 dakika")
> we do not observe the problem when querying with 
> -keyword_ss:(0-9)*
> updating the problematic documents (a small subset of keyword_ss:(0-9)*) fixes the query, 
> but we cannot find an easy way to identify the problematic documents
> there are around 400m docs, separated across 28 shards; 
> -keyword_ss:(0-9)* matches 97% of documents
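
A minimal sketch, not part of the original report, of how the failing query could be replayed one shard at a time to narrow down where the exception occurs. It only builds the request URLs and sends nothing. Assumptions: the base URL is hypothetical, the collection name "document" is inferred from the log prefix above, the value is quoted as a phrase so that keywords containing spaces stay intact, and the standard Solr `shards` parameter is used to restrict each request to one named shard.

```python
from urllib.parse import urlencode

# Hypothetical base URL; the collection name "document" is only inferred
# from the log prefix "[document affinity3 core_node32 ...]".
BASE_URL = "http://localhost:8983/solr/document/select"

# Example shard names quoted in the report.
SHARDS = ["affinity1", "affinity2", "affinity3", "affinity4"]


def facet_group_params(keyword, shard=None):
    """Return the parameters of the grouped facet query from the report.

    If `shard` is given, a `shards` parameter is added so the request is
    limited to that shard (assumption: shard names are usable as-is).
    """
    # Escape backslashes and double quotes so the value is a valid phrase.
    escaped = keyword.replace("\\", "\\\\").replace('"', '\\"')
    params = [
        ("q", "*:*"),
        ("fq", 'keyword_ss:"{}"'.format(escaped)),
        ("facet", "true"),
        ("facet.field", "keyword_ss"),
        ("group", "true"),
        ("group.field", "author_s"),
        ("group.facet", "true"),
    ]
    if shard is not None:
        params.append(("shards", shard))
    return params


def per_shard_urls(keyword, shards=SHARDS):
    """Map each shard name to a URL that runs the query on that shard only."""
    return {
        shard: BASE_URL + "?" + urlencode(facet_group_params(keyword, shard))
        for shard in shards
    }


if __name__ == "__main__":
    for shard, url in per_shard_urls("3m").items():
        print(shard, url)
```

Requesting each URL in turn and noting which shards return the "Exception during facet.field" error would at least limit the search for problematic documents to those shards.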


