Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2019/06/19 01:52:00 UTC

[jira] [Created] (DRILL-7301) Assertion failure in HashAgg with mem prediction off

Paul Rogers created DRILL-7301:
----------------------------------

             Summary: Assertion failure in HashAgg with mem prediction off
                 Key: DRILL-7301
                 URL: https://issues.apache.org/jira/browse/DRILL-7301
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.16.0
            Reporter: Paul Rogers
            Assignee: Boaz Ben-Zvi


DRILL-6951 revised the mock data source to use the new "EVF". A side effect is that the new version minimizes batch internal fragmentation (which is a good thing). As it turns out, the {{TestHashAggrSpill}} unit tests based their spilling tests on total memory, including wasted internal fragmentation. After the upgrade to the mock data source, some of the {{TestHashAggrSpill}} tests failed because they no longer spilled.

The revised mock limits batch sizes to 10 MB by default. The code ensures that the largest vector, likely the one for {{empid_s17}}, is nearly 100% full.
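For context, internal fragmentation here is the gap between what a vector's buffer allocates and what its rows actually use. A minimal sketch of the idea (a hypothetical helper, not Drill's API; the sizes are illustrative):

{code:java}
// Hypothetical illustration (not Drill code): internal fragmentation is
// allocated-but-unused buffer space. Power-of-two buffer rounding means a
// vector holding 10 MB of data may sit in a 16 MB allocation.
final class FragmentationSketch {
  static double wastedFraction(long allocatedBytes, long usedBytes) {
    return allocatedBytes == 0 ? 0.0
        : (allocatedBytes - usedBytes) / (double) allocatedBytes;
  }

  public static void main(String[] args) {
    long allocated = 16L * 1024 * 1024; // power-of-two allocator rounding
    long used = 10L * 1024 * 1024;      // bytes actually written
    System.out.printf("wasted: %.0f%%%n", 100 * wastedFraction(allocated, used));
  }
}
{code}

A test that budgets memory from total allocation will thus see its numbers shift when fragmentation drops, which is exactly what happened here.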

Experimentation showed that doubling the row count provided sufficient memory usage to cause the operator to spill as requested. But one test now fails with an assertion error:

{code:java}
  /**
   * Test with "needed memory" prediction turned off
   * (i.e., exercise code paths that catch OOMs from the Hash Table and recover)
   */
  @Test
  public void testNoPredictHashAggrSpill() throws Exception {
    testSpill(58_000_000, 16, 2, 2, false, false /* no prediction */, null,
        DEFAULT_ROW_COUNT, 1, 1, 1);
  }
{code}

Partial stack:

{noformat}
	at org.apache.drill.exec.physical.impl.common.HashTableTemplate.outputKeys(HashTableTemplate.java:910) ~[classes/:na]
	at org.apache.drill.exec.test.generated.HashAggregatorGen0.outputCurrentBatch(HashAggTemplate.java:1184) ~[na:na]
	at org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:267) ~[classes/:na]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) ~[classes/:na]
{noformat}

Failure line:

{code:java}
  @Override
  public boolean outputKeys(int batchIdx, VectorContainer outContainer, int numRecords) {
    assert batchIdx < batchHolders.size(); // <-- Fails here
    return batchHolders.get(batchIdx).outputKeys(outContainer, numRecords);
  }
{code}

Perhaps the increase in row count forced the operator into an operating range with insufficient memory. If so, the test should have failed with some kind of OOM rather than an index assertion.
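If low memory were the cause, a more descriptive check would make that clear. A sketch of what {{outputKeys()}} might look like with the bare assert replaced (illustrative only, not a proposed patch; the message text is invented):

{code:java}
  @Override
  public boolean outputKeys(int batchIdx, VectorContainer outContainer, int numRecords) {
    // Illustrative variant: fail with context rather than a bare assert,
    // so a corrupt index is distinguishable from a memory problem.
    if (batchIdx >= batchHolders.size()) {
      throw new IllegalStateException(String.format(
          "Batch index %d out of range: only %d batch holders exist",
          batchIdx, batchHolders.size()));
    }
    return batchHolders.get(batchIdx).outputKeys(outContainer, numRecords);
  }
{code}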

To test the low-memory theory, the memory limit was increased to {{60_000_000}}. Now the code failed at a different point:

{noformat}
	at org.apache.drill.exec.physical.impl.common.HashTableTemplate.put(HashTableTemplate.java:678) ~[classes/:na]
	at org.apache.drill.exec.test.generated.HashAggregatorGen0.checkGroupAndAggrValues(HashAggTemplate.java:1337) ~[na:na]
	at org.apache.drill.exec.test.generated.HashAggregatorGen0.doWork(HashAggTemplate.java:606) ~[na:na]
	at org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:296) ~[classes/:na]
{noformat}

Code line:

{code:java}
  @Override
  public PutStatus put(int incomingRowIdx, IndexPointer htIdxHolder, int hashCode, int targetBatchRowCount) throws SchemaChangeException, RetryAfterSpillException {
    ...
    for ( int currentIndex = startIdx;
         ... {
      // remember the current link, which would be the last when the next link is empty
      lastEntryBatch = batchHolders.get((currentIndex >>> 16) & BATCH_MASK); // <-- Here
{code}
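The shift here implies that a hash-table entry index packs a batch number into its upper 16 bits and a row offset into the lower 16. Assuming {{BATCH_MASK}} is {{0xFFFF}} (an inference from the {{>>> 16}} above, not verified against the source), the decomposition is:

{code:java}
  // Sketch of the index layout implied by the shift above
  // (assumes BATCH_MASK == 0xFFFF, i.e. a 16-bit batch number).
  static final int BATCH_MASK = 0xFFFF;

  static int batchIndex(int entryIndex) {
    return (entryIndex >>> 16) & BATCH_MASK; // upper 16 bits: which BatchHolder
  }

  static int offsetInBatch(int entryIndex) {
    return entryIndex & BATCH_MASK;          // lower 16 bits: row within it
  }
{code}

Under this layout, a stale or corrupt upper half would produce exactly the out-of-range {{batchHolders.get(...)}} failures seen in both stack traces.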

Increasing memory to {{62_000_000}} produced this error:

{noformat}
	at org.apache.drill.exec.physical.impl.common.HashTableTemplate.outputKeys(HashTableTemplate.java:910) ~[classes/:na]
	at org.apache.drill.exec.test.generated.HashAggregatorGen0.outputCurrentBatch(HashAggTemplate.java:1184) ~[na:na]
	at org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:267) ~[classes/:na]
{noformat}

This failure occurs at the same line as the first exception.

Increasing memory to {{64_000_000}} triggered the second error again.

Increasing memory to {{66_000_000}} triggered the first error again.

The errors recurred as the memory limit was increased (later in jumps of 20 MB) up to 140 MB, at which point the test failed because the query ran but did not spill. The query fails with a memory limit of 130 MB.
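The sweep above was done by hand; a sketch of automating it, reusing the {{testSpill(...)}} call from the failing test (assuming only the first argument, the memory limit, needs to vary):

{code:java}
  // Hypothetical sweep (not in the test suite): re-run the failing case at
  // increasing memory limits to map which error appears at which limit.
  @Test
  public void sweepMemoryLimits() throws Exception {
    for (int mem = 58_000_000; mem <= 140_000_000; mem += 2_000_000) {
      try {
        testSpill(mem, 16, 2, 2, false, false /* no prediction */, null,
            DEFAULT_ROW_COUNT, 1, 1, 1);
        System.out.println(mem + ": passed");
      } catch (AssertionError e) {
        System.out.println(mem + ": " + e.getMessage());
      }
    }
  }
{code}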

At {{135_000_000}} the query works, but the returned row count is wrong:

{noformat}
java.lang.AssertionError: expected:<2400000> but was:<2334465>
{noformat}

The failing check:

{code:java}
  private void runAndDump(...
      assertEquals(expectedRows, summary.recordCount());
{code}
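One arithmetic detail worth recording: the shortfall is exactly 2^16 - 1 rows, which lines up suspiciously with the 16-bit batch index encoding in {{put()}} (possibly a coincidence):

{code:java}
// 2_400_000 - 2_334_465 = 65_535 = (1 << 16) - 1: one 16-bit batch's
// worth of entries, less one. Possibly a coincidence, possibly a clue.
int expected = 2_400_000;
int actual = 2_334_465;
assert expected - actual == (1 << 16) - 1; // 65535
{code}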

There does seem to be something wrong with this code path. All other tests run fine with the new mock data source (and adjusted row counts).

I have disabled the offending test until this bug can be fixed.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)