Posted to issues@lucene.apache.org by "Chris M. Hostetter (Jira)" <ji...@apache.org> on 2020/05/15 23:42:00 UTC

[jira] [Updated] (SOLR-14492) many json.facet aggregations can throw ArrayIndexOutOfBoundsException when using DVHASH due to incorrect resize impl

     [ https://issues.apache.org/jira/browse/SOLR-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris M. Hostetter updated SOLR-14492:
--------------------------------------
    Attachment: SOLR-14492.patch
        Status: Open  (was: Open)


The attached patch helps demonstrate this failure -- note that this only fails with such a small set of docs/values because TestJsonFacets includes...

{code}
    FacetFieldProcessorByHashDV.MAXIMUM_STARTING_TABLE_SIZE=2; // stress test resizing
{code}

Failure example...

{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestJsonFacets -Dtests.method=testMultiValuedBucketReHashing -Dtests.seed=ABA0B47A555426CA -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=lb-LU -Dtests.timezone=America/Regina -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
   [junit4] ERROR   0.02s | TestJsonFacets.testMultiValuedBucketReHashing {p0=DV} <<<
   [junit4]    > Throwable #1: java.lang.ArrayIndexOutOfBoundsException: Index 7 out of bounds for length 2
   [junit4]    > 	at __randomizedtesting.SeedInfo.seed([ABA0B47A555426CA:CD61409929DFC726]:0)
   [junit4]    > 	at org.apache.solr.search.facet.SumAgg$SumSortedNumericAcc.collectValues(SumAgg.java:92)
   [junit4]    > 	at org.apache.solr.search.facet.DocValuesAcc.collect(DocValuesAcc.java:50)
   [junit4]    > 	at org.apache.solr.search.facet.FacetFieldProcessor.collectFirstPhase(FacetFieldProcessor.java:286)
   [junit4]    > 	at org.apache.solr.search.facet.FacetFieldProcessorByHashDV.collectValFirstPhase(FacetFieldProcessorByHashDV.java:430)
   [junit4]    > 	at org.apache.solr.search.facet.FacetFieldProcessorByHashDV$5.collect(FacetFieldProcessorByHashDV.java:393)
   [junit4]    > 	at org.apache.solr.search.DocSetUtil.collectSortedDocSet(DocSetUtil.java:278)
   [junit4]    > 	at org.apache.solr.search.facet.FacetFieldProcessorByHashDV.collectDocs(FacetFieldProcessorByHashDV.java:374)
   [junit4]    > 	at org.apache.solr.search.facet.FacetFieldProcessorByHashDV.calcFacets(FacetFieldProcessorByHashDV.java:248)
   [junit4]    > 	at org.apache.solr.search.facet.FacetFieldProcessorByHashDV.process(FacetFieldProcessorByHashDV.java:215)
   [junit4]    > 	at org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:416)
   [junit4]    > 	at org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:475)
   [junit4]    > 	at org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:432)
   [junit4]    > 	at org.apache.solr.search.facet.FacetQueryProcessor.process(FacetQuery.java:64)
   [junit4]    > 	at org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:416)
   [junit4]    > 	at org.apache.solr.search.facet.FacetModule.process(FacetModule.java:147)
{noformat}

----

Quick and dirty list of places where SlotAccs _seem_ to be using the 'Resizer' correctly...

{noformat}
$ grep '=\s*resizer.resize' src/java/org/apache/solr/search/facet/*
src/java/org/apache/solr/search/facet/HLLAgg.java:      sets = resizer.resize(sets, null);
src/java/org/apache/solr/search/facet/MinMaxAgg.java:      exists = resizer.resize(exists);
src/java/org/apache/solr/search/facet/MinMaxAgg.java:      slotOrd = resizer.resize(slotOrd, MISSING);
src/java/org/apache/solr/search/facet/PercentileAgg.java:      digests = resizer.resize(digests, null);
src/java/org/apache/solr/search/facet/PercentileAgg.java:      digests = resizer.resize(digests, null);
src/java/org/apache/solr/search/facet/PercentileAgg.java:      digests = resizer.resize(digests, null);
src/java/org/apache/solr/search/facet/RelatednessAgg.java:      slotvalues = resizer.resize(slotvalues, null);
src/java/org/apache/solr/search/facet/SlotAcc.java:    result = resizer.resize(result, initialValue);
src/java/org/apache/solr/search/facet/SlotAcc.java:    result = resizer.resize(result, initialValue);
src/java/org/apache/solr/search/facet/SlotAcc.java:    result = resizer.resize(result, initialValue);
src/java/org/apache/solr/search/facet/SlotAcc.java:    counts = resizer.resize(counts, 0);
src/java/org/apache/solr/search/facet/SlotAcc.java:    this.counts = resizer.resize(this.counts, 0);
src/java/org/apache/solr/search/facet/SlotAcc.java:    this.sum = resizer.resize(this.sum, 0);
src/java/org/apache/solr/search/facet/SlotAcc.java:    this.counts = resizer.resize(this.counts, 0);
src/java/org/apache/solr/search/facet/SlotAcc.java:    this.result = resizer.resize(this.result, 0);
src/java/org/apache/solr/search/facet/SlotAcc.java:    result = resizer.resize(result, 0);
src/java/org/apache/solr/search/facet/UniqueAgg.java:      sets = resizer.resize(sets, null);
src/java/org/apache/solr/search/facet/UniqueBlockAgg.java:      lastSeenValuesPerSlot = resizer.resize(lastSeenValuesPerSlot, Integer.MIN_VALUE);
src/java/org/apache/solr/search/facet/UniqueSlotAcc.java:    arr = resizer.resize(arr, null);
src/java/org/apache/solr/search/facet/UniqueSlotAcc.java:      counts = resizer.resize(counts, 0);
{noformat}

Quick and dirty list of places where it seems very likely we have broken SlotAcc impls (the return value of resize is discarded)...

{noformat}
$ grep '^\s*resizer.resize' src/java/org/apache/solr/search/facet/*
src/java/org/apache/solr/search/facet/AvgAgg.java:      resizer.resize(counts, 0);
src/java/org/apache/solr/search/facet/AvgAgg.java:      resizer.resize(counts, 0);
src/java/org/apache/solr/search/facet/AvgAgg.java:      resizer.resize(counts, 0);
src/java/org/apache/solr/search/facet/CountValsAgg.java:      resizer.resize(result, 0);
src/java/org/apache/solr/search/facet/DocValuesAcc.java:    resizer.resize(result, initialValue);
src/java/org/apache/solr/search/facet/DocValuesAcc.java:    resizer.resize(result, initialValue);
src/java/org/apache/solr/search/facet/DocValuesAcc.java:    resizer.resize(counts, 0);
src/java/org/apache/solr/search/facet/DocValuesAcc.java:    resizer.resize(sum, 0);
src/java/org/apache/solr/search/facet/DocValuesAcc.java:    resizer.resize(result, initialValue);
src/java/org/apache/solr/search/facet/DocValuesAcc.java:    resizer.resize(result, initialValue);
src/java/org/apache/solr/search/facet/DocValuesAcc.java:    resizer.resize(counts, 0);
src/java/org/apache/solr/search/facet/DocValuesAcc.java:    resizer.resize(sum, 0);
src/java/org/apache/solr/search/facet/MinMaxAgg.java:      resizer.resize(result, MISSING);
src/java/org/apache/solr/search/facet/MinMaxAgg.java:      resizer.resize(slotOrd, MISSING);
src/java/org/apache/solr/search/facet/UnInvertedFieldAcc.java:    resizer.resize(result, initialValue);
src/java/org/apache/solr/search/facet/UnInvertedFieldAcc.java:    resizer.resize(counts, 0);
src/java/org/apache/solr/search/facet/UnInvertedFieldAcc.java:    resizer.resize(sum, 0);
{noformat}
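
To make the difference between the two grep lists concrete, here is a minimal, self-contained sketch. It is not Solr code: {{ResizeSketch}}, {{resizeArray}}, {{BrokenSumAcc}} and {{FixedSumAcc}} are hypothetical names, and the helper only copies old values into a larger array (the real {{Resizer}} can also remap old slots to new slots). The starting length of 2 mirrors the MAXIMUM_STARTING_TABLE_SIZE=2 setting used by the test.

```java
import java.util.Arrays;

// Hypothetical sketch, NOT the Solr implementation.
final class ResizeSketch {

  // Stand-in for a Resizer call: returns a NEW array; the old one is untouched.
  static long[] resizeArray(long[] old, int newSize, long defaultValue) {
    long[] grown = new long[newSize];
    Arrays.fill(grown, defaultValue);
    System.arraycopy(old, 0, grown, 0, Math.min(old.length, newSize));
    return grown;
  }

  // Broken pattern: asks for a resize but throws the result away.
  static class BrokenSumAcc {
    long[] result = new long[2];

    void resize(int newSize) {
      ResizeSketch.resizeArray(result, newSize, 0L); // return value discarded
    }

    void collect(int slot, long value) {
      result[slot] += value;
    }
  }

  // Correct pattern: keeps the array returned by the resize call.
  static class FixedSumAcc {
    long[] result = new long[2];

    void resize(int newSize) {
      result = ResizeSketch.resizeArray(result, newSize, 0L);
    }

    void collect(int slot, long value) {
      result[slot] += value;
    }
  }

  public static void main(String[] args) {
    BrokenSumAcc broken = new BrokenSumAcc();
    broken.resize(8); // has no effect on broken.result
    try {
      broken.collect(7, 42L);
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("broken acc failed: " + e.getMessage());
    }

    FixedSumAcc fixed = new FixedSumAcc();
    fixed.resize(8);
    fixed.collect(7, 42L);
    System.out.println("fixed acc slot 7 = " + fixed.result[7]);
  }
}
```

In this sketch the broken accumulator still holds its original 2-element array after the "resize", so collecting into slot 7 fails in the same way as the stack trace above.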


> many json.facet aggregations can throw ArrayIndexOutOfBoundsException when using DVHASH due to incorrect resize impl
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-14492
>                 URL: https://issues.apache.org/jira/browse/SOLR-14492
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Facet Module
>            Reporter: Chris M. Hostetter
>            Assignee: Chris M. Hostetter
>            Priority: Major
>         Attachments: SOLR-14492.patch
>
>
> It appears we have quite a few SlotAcc impls that don't properly implement resize: they ask the {{Resizer}} to resize their arrays, but throw away the result. (Arrays can't be resized in place; the {{Resizer}} is designed to return a new replacement array, initializing empty values and/or mapping old indices to new indices.)
> For many FacetFieldProcessors, this isn't (normally) a problem because they create their Accs using a "max upper bound" on the possible number of slots in advance -- and only use resize later to "shrink" the number of slots.
> But in the case of {{method:dvhash}} / FacetFieldProcessorByHashDV, the processor starts out using a number of slots based on the size of the base DocSet (rounded up to the next power of 2), capped at 1024, and then _grows_ the SlotAccs if it encounters more values than that.
> This means that if the "base" context of the term facet is significantly smaller than the number of values in the docValues field being faceted on (i.e., multiValued fields), then these problematic SlotAccs won't grow properly and you'll get an ArrayIndexOutOfBoundsException.
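
The starting-size rule described above can be sketched as follows. This is an illustration of the description only, not the actual FacetFieldProcessorByHashDV code: {{StartingSizeSketch}}, {{startingSlots}} and {{MAX_STARTING_SLOTS}} are hypothetical names, and the exact rounding in Solr may differ.

```java
// Hypothetical sketch of "base DocSet size, rounded up to the next power of 2, capped at 1024".
final class StartingSizeSketch {

  // Assumed cap, taken from the issue description.
  static final int MAX_STARTING_SLOTS = 1024;

  static int startingSlots(int baseDocSetSize) {
    if (baseDocSetSize >= MAX_STARTING_SLOTS) {
      return MAX_STARTING_SLOTS;
    }
    int n = Math.max(1, baseDocSetSize);
    int pow2 = Integer.highestOneBit(n); // largest power of 2 <= n
    if (pow2 < n) {
      pow2 <<= 1; // round up when n is not already a power of 2
    }
    return pow2;
  }

  public static void main(String[] args) {
    int baseDocs = 3;        // small base DocSet
    int distinctValues = 10; // e.g. a multiValued field with many values
    int slots = startingSlots(baseDocs);
    System.out.println("starting slots = " + slots);
    System.out.println("must grow = " + (distinctValues > slots));
  }
}
```

With a base DocSet of 3 documents and 10 distinct values, the table starts smaller than the number of values, so the accumulators have to grow, which is exactly the path the broken resize implementations fail on.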



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org