Posted to issues@arrow.apache.org by "TP Boudreau (JIRA)" <ji...@apache.org> on 2019/06/15 20:23:00 UTC

[jira] [Created] (ARROW-5618) Using deprecated Int96 storage for timestamps triggers integer overflow in some cases

TP Boudreau created ARROW-5618:
----------------------------------

             Summary: Using deprecated Int96 storage for timestamps triggers integer overflow in some cases
                 Key: ARROW-5618
                 URL: https://issues.apache.org/jira/browse/ARROW-5618
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
            Reporter: TP Boudreau


When storing Arrow timestamps in Parquet files using the Int96 storage format, certain combinations of array lengths and validity bitmasks cause an integer overflow error on read.  It's not immediately clear whether the Arrow/Parquet writer is storing zeroes where it should be storing positive values, or whether the reader is computing a nanoseconds value from zeroed inputs that it should have skipped (perhaps because it misses the null bit flag).  It's also not yet clear why only columns of certain lengths seem to be affected.

Probably the quickest way to reproduce this undefined behavior is to alter the existing unit test UseDeprecatedInt96 (in file .../arrow/cpp/src/parquet/arrow/arrow-reader-writer-test.cc) by quadrupling its column lengths (repeating the same values), followed by 'make unittest' using clang-7 with sanitizers enabled.  (Here's a patch applicable to current master that changes the test as described: [1]; I used the following cmake command to build my environment: [2].)  You should get a log something like [3].  If requested, I'll see if I can put together a stand-alone minimal test case that induces the behavior.

The quick hack at [4] will prevent the integer overflows, but it is included only to confirm the proximate cause of the bug: the Julian days field of the Int96 appears to be zero where a strictly positive number is expected.
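
For illustration, here is a minimal sketch of why a zero Julian days field overflows.  It is not the Arrow/Parquet reader code and not the hack at [4]; the names Int96Sketch and CheckedInt96ToNanos are hypothetical.  It assumes the usual Int96 timestamp layout (nanoseconds within the day in the first 8 bytes, Julian day number in the last 4), a little-endian host, and the GCC/Clang overflow builtins.  With a Julian day of 0, the day offset from the Unix epoch (Julian day 2440588) is -2440588, and multiplying that by 86,400,000,000,000 ns/day gives roughly -2.1e20, well outside the int64 range of about +/-9.2e18.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a Parquet Int96 value: three 32-bit words.
// Assumed layout: words 0-1 hold nanoseconds within the day,
// word 2 holds the Julian day number.
struct Int96Sketch {
  uint32_t value[3];
};

constexpr int64_t kJulianDayOfUnixEpoch = 2440588;
constexpr int64_t kNanosPerDay = 86400LL * 1000000000LL;

// Converts an Int96 timestamp to nanoseconds since the Unix epoch.
// Returns false instead of overflowing when the result does not fit in
// int64_t, which is what happens when the Julian day field is zero.
bool CheckedInt96ToNanos(const Int96Sketch& v, int64_t* out) {
  int64_t nanos_of_day;
  // Assumes a little-endian host, so words 0-1 form one 64-bit integer.
  std::memcpy(&nanos_of_day, &v.value[0], sizeof(nanos_of_day));

  const int64_t days =
      static_cast<int64_t>(v.value[2]) - kJulianDayOfUnixEpoch;

  int64_t day_nanos;
  // GCC/Clang builtins: return true if the operation overflowed.
  if (__builtin_mul_overflow(days, kNanosPerDay, &day_nanos)) return false;
  if (__builtin_add_overflow(day_nanos, nanos_of_day, out)) return false;
  return true;
}
```

Calling CheckedInt96ToNanos on a value whose words are all zero returns false, whereas an unchecked `days * kNanosPerDay + nanos_of_day` on the same input is the kind of signed overflow the sanitizer reports.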

I've assigned the issue to myself and I'll start looking into the root cause of this.

[1] https://gist.github.com/tpboudreau/b6610c13cbfede4d6b171da681d1f94e
[2] https://gist.github.com/tpboudreau/59178ca8cb50a935aab7477805aa32b9
[3] https://gist.github.com/tpboudreau/0c2d0a18960c1aa04c838fa5c2ac7d2d
[4] https://gist.github.com/tpboudreau/0993beb5c8c1488028e76fb2ca179b7f



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)