Posted to dev@orc.apache.org by "Nandor Kollar (JIRA)" <ji...@apache.org> on 2019/06/21 14:48:00 UTC

[jira] [Created] (ORC-517) Incorrect statistics written for decimal values

Nandor Kollar created ORC-517:
---------------------------------

             Summary: Incorrect statistics written for decimal values
                 Key: ORC-517
                 URL: https://issues.apache.org/jira/browse/ORC-517
             Project: ORC
          Issue Type: Bug
    Affects Versions: 1.5.0
            Reporter: Nandor Kollar


I came across the following problem with min-max statistics while writing test cases for ORC with Spark (latest master). I created a table stored as ORC with a single decimal field, added a couple of negative numbers to this table, and used ORC tools to print the details of the resulting ORC file. I noticed that although the minimum value was correct, the maximum was 0 (instead of the largest negative number added). To better understand the problem, here is a unit test that demonstrates it:

{code}
  @Test
  public void testDecimalMinMaxStatistics() throws Exception {
    TypeDescription schema = TypeDescription.createDecimal()
      .withScale(2).withPrecision(7);

    Writer writer = OrcFile.createWriter(testFilePath,
      OrcFile.writerOptions(conf).setSchema(schema).stripeSize(100000)
        .bufferSize(10000));
    VectorizedRowBatch batch = new VectorizedRowBatch(1, 1024);

    DecimalColumnVector decimalColumnVector = new DecimalColumnVector(7, 2);
    batch.cols[0] = decimalColumnVector;
    batch.reset();
    batch.size = 2;

    decimalColumnVector.set(0, new HiveDecimalWritable("-99999.99"));
    decimalColumnVector.set(1, new HiveDecimalWritable("-88888.88"));
    writer.addRowBatch(batch);
    writer.close();

    Reader reader = OrcFile.createReader(testFilePath,
      OrcFile.readerOptions(conf).filesystem(fs));
    DecimalColumnStatistics statistics = (DecimalColumnStatistics) reader.getStatistics()[0];
    assertEquals("Incorrect minimum value", new BigDecimal("-99999.99"), statistics.getMinimum().bigDecimalValue());
    assertEquals("Incorrect maximum value", new BigDecimal("-88888.88"), statistics.getMaximum().bigDecimalValue());
  }
{code}

Note that this test fails only on 1.5 and master, and passes on the 1.4 branch. Am I doing something wrong here? If this is indeed a bug, I don't think it causes correctness problems, since a maximum of 0 is still a valid (if loose) upper bound for negative values, but it might be a source of performance regression when min-max stats are used with predicate pushdown.
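
To make the pushdown concern concrete, here is a minimal sketch of how an equality predicate might be checked against min-max statistics. It is not ORC's actual evaluation code: the `Range` class and the `mightContainEquals` method are hypothetical names introduced only for illustration, and the literal -5.00 is an arbitrary value that is not in the file.

```java
import java.math.BigDecimal;

class MinMaxPruningSketch {

  /** Hypothetical stand-in for a column's min/max statistics (not an ORC class). */
  static final class Range {
    final BigDecimal min;
    final BigDecimal max;

    Range(String min, String max) {
      this.min = new BigDecimal(min);
      this.max = new BigDecimal(max);
    }
  }

  /**
   * Returns true if a row with "column = literal" could exist given the
   * statistics, i.e. the stripe or row group has to be read.
   */
  static boolean mightContainEquals(Range stats, BigDecimal literal) {
    return literal.compareTo(stats.min) >= 0 && literal.compareTo(stats.max) <= 0;
  }

  public static void main(String[] args) {
    // Statistics as they should be for the two values written in the test.
    Range expected = new Range("-99999.99", "-88888.88");
    // Statistics as reported in this issue: the maximum is 0.
    Range reported = new Range("-99999.99", "0");

    BigDecimal present = new BigDecimal("-88888.88"); // value in the file
    BigDecimal absent = new BigDecimal("-5.00");      // value not in the file

    // A value that is present is never pruned with either range.
    System.out.println(mightContainEquals(expected, present)); // true
    System.out.println(mightContainEquals(reported, present)); // true

    // A value that is absent is pruned only with the tight range.
    System.out.println(mightContainEquals(expected, absent));  // false
    System.out.println(mightContainEquals(reported, absent));  // true
  }
}
```

With the looser range no matching row can be skipped by mistake, so results stay correct, but data that could have been pruned is read anyway.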


