You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@arrow.apache.org by "farkmarnum (via GitHub)" <gi...@apache.org> on 2023/05/20 21:07:52 UTC

[GitHub] [arrow] farkmarnum opened a new issue, #35700: [JS] pre-epoch Timestamps are off by a fixed amount

farkmarnum opened a new issue, #35700:
URL: https://github.com/apache/arrow/issues/35700

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   While constructing a `Vector` of `Timestamp` values, any pre-epoch values end up offset by approximately `2^32 * 10^(3n)`, where `n` depends on the precision. 
   
   <details>
   <summary>
   I've written a script for minimal repro using the latest version of the apache-arrow NPM package (12.0.0).
   </summary>
   
   ```javascript
   // test.mjs
   import {
     Date_,
     makeBuilder,
     Timestamp,
     TimeUnit,
     DateUnit,
   } from "apache-arrow";
   
   /**
    * @param {Date} value
    * @param {TimeUnit} precision
    * @returns {number}
    */
   const testTimestamp = (value, precision) => {
     const columnBuilder = makeBuilder({ type: new Timestamp(precision) });
     columnBuilder.append(value);
     const vec = columnBuilder.finish().toVector();
     const valueInVec = Array.from(vec)[0];
   
     return +value - +valueInVec;
   };
   
   console.log("Testing TIMESTAMP\n");
   
   [
     ["2000-01-01", TimeUnit.NANOSECOND],
     ["2000-01-01", TimeUnit.MICROSECOND],
     ["2000-01-01", TimeUnit.MILLISECOND],
     ["2000-01-01", TimeUnit.SECOND],
     [],
     ["1900-01-01", TimeUnit.NANOSECOND],
     ["1900-01-01", TimeUnit.MICROSECOND],
     ["1900-01-01", TimeUnit.MILLISECOND],
     ["1900-01-01", TimeUnit.SECOND],
     [],
     ["1969-12-31 23:59:59Z", TimeUnit.NANOSECOND],
     ["1969-01-01", TimeUnit.NANOSECOND],
     ["1900-01-01", TimeUnit.NANOSECOND],
     ["1800-01-01", TimeUnit.NANOSECOND],
     ["1700-01-01", TimeUnit.NANOSECOND],
   ].forEach(([dateStr, precision]) => {
     if (!dateStr) {
       console.log();
       return;
     }
     const outcome = testTimestamp(new Date(dateStr), precision);
     const label = `${dateStr} w/ ${TimeUnit[precision]}`;
     console.log(`${label.padEnd(40)} => ${outcome}`);
   });
   
   console.log("\n\nTesting DATE\n");
   
   /**
    * @param {Date} value
    * @returns {Date}
    */
   const testDate = (value, precision) => {
     const columnBuilder = makeBuilder({ type: new Date_(DateUnit.DAY) });
     columnBuilder.append(value);
     const vec = columnBuilder.finish().toVector();
     const valueInVec = Array.from(vec)[0];
   
     return valueInVec;
   };
   
   [
     ["1969-12-15 01:00:00Z", "rounded up"],
     ["1969-12-31 01:00:00Z", "rounded up"],
     ["1970-01-01 00:00:00Z", "rounded down"],
     ["1970-01-01 23:00:00Z", "rounded down"],
     ["1970-01-15 01:00:00Z", "rounded down"],
   ].forEach(([dateStr, label]) => {
     const date = new Date(dateStr);
     const outcome = testDate(date);
     console.log(`${date.toISOString()} => ${outcome.toISOString()} (${label})`);
   });
   ```
   </details>
   
   Here's the output:
   ```
   Testing TIMESTAMP
   
   2000-01-01 w/ NANOSECOND                 => 0
   2000-01-01 w/ MICROSECOND                => 0
   2000-01-01 w/ MILLISECOND                => 0
   2000-01-01 w/ SECOND                     => 0
   
   1900-01-01 w/ NANOSECOND                 => -4294.96728515625
   1900-01-01 w/ MICROSECOND                => -4294967.2958984375
   1900-01-01 w/ MILLISECOND                => -4294967296
   1900-01-01 w/ SECOND                     => -4294967296000
   
   1969-12-31 23:59:59Z w/ NANOSECOND       => -4294.967296
   1969-01-01 w/ NANOSECOND                 => -4294.967296600342
   1900-01-01 w/ NANOSECOND                 => -4294.96728515625
   1800-01-01 w/ NANOSECOND                 => -4294.9677734375
   1700-01-01 w/ NANOSECOND                 => -4294.966796875
   
   
   Testing DATE
   
   1969-12-15T01:00:00.000Z => 1969-12-16T00:00:00.000Z (rounded up)
   1969-12-31T01:00:00.000Z => 1970-01-01T00:00:00.000Z (rounded up)
   1970-01-01T00:00:00.000Z => 1970-01-01T00:00:00.000Z (rounded down)
   1970-01-01T23:00:00.000Z => 1970-01-01T00:00:00.000Z (rounded down)
   1970-01-15T01:00:00.000Z => 1970-01-15T00:00:00.000Z (rounded down)
   ```
   
   As you can see, for the `Timestamp` type, values post-epoch pass through unscathed, but values pre-epoch end up off by a fixed increment of:
   - approximately `2^32 / 1,000,000` for `NANOSECOND` precision
   - approximately `2^32 / 1,000` for `MICROSECOND` precision
   - `2^32` for `MILLISECOND` precision
   - `2^32 * 1,000` for `SECOND` precision
   
   Note: I say "fixed", but the amount actually varies slightly when it's a float (for `NANOSECOND` and `MICROSECOND` precisions), as you can see in the last block of tests for `Timestamp` -- it varies a bit, seemingly varying more as the value gets further from the epoch.
   
   It seems like the Arrow `Timestamp` type is a 64-bit int of {precision} since the epoch, but is represented as two 32-bit ints in JS. Something is getting messed up for pre-epoch timestamps, which have a negative number for their internal representation. I'm guessing it has to do with 32bit arithmetic and/or float precision issues.
   
   Additionally, for the `Date_` type, values post-epoch seem to "round down" to the nearest day properly, but values pre-epoch seem to "round up" to the next day. I'm guessing that this is a related issue.
   
   ### Component(s)
   
   JavaScript


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] farkmarnum commented on issue #35700: [JS] pre-epoch Timestamps are off by a fixed amount

Posted by "farkmarnum (via GitHub)" <gi...@apache.org>.
farkmarnum commented on issue #35700:
URL: https://github.com/apache/arrow/issues/35700#issuecomment-1556015654

   It looks like this was introduced by https://github.com/apache/arrow/commit/cde18a6ba525d090d11ac6f3563370dce60d03a1 to resolve [Read timestamp low bits as Uint32s](https://github.com/apache/arrow/pull/1678).
   
   The problematic code is [here](https://github.com/apache/arrow/blob/65520b361941e9abad386614999dbc250295959e/js/src/visitor/get.ts#L100):
   ```javascript
   const epochMillisecondsLongToMs = (data: Int32Array, index: number) =>
     4294967296 * data[index + 1] + (data[index] >>> 0);
   ```
   and related code is [here](https://github.com/apache/arrow/blob/65520b361941e9abad386614999dbc250295959e/js/src/visitor/set.ts#LL104C8-L107C3):
   ```javascript
   const setEpochMsToMillisecondsLong = (data: Int32Array, index: number, epochMs: number) => {
       data[index] = Math.trunc(epochMs % 4294967296);
       data[index + 1] = Math.trunc(epochMs / 4294967296);
   };
   ```
   
   The code is transforming the 64bit `epochMs` value into two 32bit values, and then transforming them back. We can reduce these two steps to a simple test like this:
   ```javascript
   const toInt32sAndBack = (epochMs) => {
     data = [];
     data[0] = Math.trunc(epochMs % 4294967296);
     data[1] = Math.trunc(epochMs / 4294967296);
     return 4294967296 * data[1] + (data[0] >>> 0)
   };
   
   const test = (epochMs) => toInt32sAndBack(epochMs) - epochMs;
   
   test(2_000_000_000_000)
   // 0
   
   test(-2_000_000_000_000)
   // 4294967296
   ```
   
   When `epochMs` is negative, `data[0]` is negative but `data[0] >>> 0` is positive.
   
   So, while https://github.com/apache/arrow/commit/cde18a6ba525d090d11ac6f3563370dce60d03a1 solved the precision issue, it introduced a new issue for pre-epoch timestamps.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] farkmarnum commented on issue #35700: [JS] pre-epoch Timestamps are off by a fixed amount

Posted by "farkmarnum (via GitHub)" <gi...@apache.org>.
farkmarnum commented on issue #35700:
URL: https://github.com/apache/arrow/issues/35700#issuecomment-1568913645

   Hi @domoritz, I noticed you changed the label from "bug" to "enhancement" — does that mean supporting Timestamps before 1970 is not a feature the Arrow JS library supports at this time?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] domoritz commented on issue #35700: [JS] pre-epoch Timestamps are off by a fixed amount

Posted by "domoritz (via GitHub)" <gi...@apache.org>.
domoritz commented on issue #35700:
URL: https://github.com/apache/arrow/issues/35700#issuecomment-1568941407

   Ehh, no. I must have been on the wrong issue. I'm sorry for causing confusion. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org