Posted to issues@orc.apache.org by "Gang Wu (JIRA)" <ji...@apache.org> on 2017/03/22 23:37:41 UTC

[jira] [Commented] (ORC-37) Represent the in memory timestamps using UTC rather than the local timezone.

    [ https://issues.apache.org/jira/browse/ORC-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937369#comment-15937369 ] 

Gang Wu commented on ORC-37:
----------------------------

It looks like this is already fixed in the following code, which applies the writer timezone's GMT offset to each value as it is read:

  void TimestampColumnReader::next(ColumnVectorBatch& rowBatch,
                                   uint64_t numValues,
                                   char *notNull) {
    ColumnReader::next(rowBatch, numValues, notNull);
    notNull = rowBatch.hasNulls ? rowBatch.notNull.data() : nullptr;
    TimestampVectorBatch& timestampBatch =
      dynamic_cast<TimestampVectorBatch&>(rowBatch);
    int64_t *secsBuffer = timestampBatch.data.data();
    secondsRle->next(secsBuffer, numValues, notNull);
    int64_t *nanoBuffer = timestampBatch.nanoseconds.data();
    nanoRle->next(nanoBuffer, numValues, notNull);

    // Construct the values
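    for(uint64_t i=0; i < numValues; i++) {
      if (notNull == nullptr || notNull[i]) {
        // The low 3 bits are a trailing-zero code; the rest are the digits.
        uint64_t zeros = nanoBuffer[i] & 0x7;
        nanoBuffer[i] >>= 3;
        if (zeros != 0) {
          // A non-zero code z restores z + 1 decimal zeros.
          for(uint64_t j = 0; j <= zeros; ++j) {
            nanoBuffer[i] *= 10;
          }
        }
        // Shift by the epoch offset, then apply the writer's GMT offset.
        int64_t writerTime = secsBuffer[i] + epochOffset;
        secsBuffer[i] = writerTime +
          writerTimezone.getVariant(writerTime).gmtOffset;
        // Negative seconds with a non-zero nanosecond part step back by one.
        if (secsBuffer[i] < 0 && nanoBuffer[i] != 0) {
          secsBuffer[i] -= 1;
        }
      }
    }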
  }
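
To make the per-value arithmetic easier to check, here is a minimal standalone sketch of the two steps inside the loop. The helper names decodeNanos and adjustSeconds are hypothetical and are not part of the ORC code base. The sketch also assumes the GMT offset is passed in directly rather than looked up through writerTimezone.getVariant(), and that the low three bits of the stored nanosecond value are the trailing-zero code, which is what the loop above implies.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an ORC API): mirrors the nanosecond decoding
// done in the loop of TimestampColumnReader::next quoted above.
int64_t decodeNanos(int64_t encoded) {
  // Low 3 bits: trailing-zero code. Remaining bits: significant digits.
  uint64_t zeros = static_cast<uint64_t>(encoded) & 0x7;
  int64_t nanos = encoded >> 3;
  if (zeros != 0) {
    // A non-zero code z restores z + 1 decimal zeros.
    for (uint64_t j = 0; j <= zeros; ++j) {
      nanos *= 10;
    }
  }
  return nanos;
}

// Hypothetical helper (not an ORC API): mirrors the seconds adjustment.
// Assumption: gmtOffset is supplied by the caller instead of coming from
// writerTimezone.getVariant(writerTime).gmtOffset.
int64_t adjustSeconds(int64_t storedSecs, int64_t epochOffset,
                      int64_t gmtOffset, int64_t nanos) {
  int64_t writerTime = storedSecs + epochOffset;
  int64_t secs = writerTime + gmtOffset;
  // Negative seconds with a non-zero nanosecond part step back by one.
  if (secs < 0 && nanos != 0) {
    secs -= 1;
  }
  return secs;
}
```

For example, decodeNanos((5 << 3) | 7) gives 500000000, because code 7 restores eight zeros, and adjustSeconds(-10, 0, 0, 500) gives -11, while the same call with zero nanoseconds leaves the value at -10.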

Correct me if I'm wrong. Thanks!

> Represent the in memory timestamps using UTC rather than the local timezone.
> ----------------------------------------------------------------------------
>
>                 Key: ORC-37
>                 URL: https://issues.apache.org/jira/browse/ORC-37
>             Project: ORC
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>
> Change the representation of TimestampVectorBatch to be in UTC rather than local time.
> The advantages are:
> * More closely matches the SQL semantics of timestamp without timezone.
> * Allows accurate representation of all values including the ones that occur
>   during the local leap forward/back for daylight savings.
> * One less timezone conversion.


