Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/09/17 19:29:00 UTC

[jira] [Commented] (IMPALA-7521) CLONE - Speed up sub-second unix time->TimestampValue conversions

    [ https://issues.apache.org/jira/browse/IMPALA-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618046#comment-16618046 ] 

ASF subversion and git services commented on IMPALA-7521:
---------------------------------------------------------

Commit 2ee8caeb3053dfa2c434c680ffb2ac756627ee38 in impala's branch refs/heads/master from [~csringhofer]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=2ee8cae ]

IMPALA-7521: Speed up sub-second unix time->TimestampValue conversions

Impala used to convert from sub-second unix time to TimestampValue
(which is split to date_ and time_ similarly to
boost::posix_time::ptime) by first splitting the input into seconds
and sub-seconds part, converting the seconds part with
boost::posix_time::from_time_t(), and then adding the sub-seconds
part to this timestamp.

Different tricks are used to speed up different functions:
- UTC functions that expect a single integer as input can
  split it into date_ and time_ directly.
- Non-UTC functions need seconds for timezone conversion,
  because CCTZ expects time points as seconds. These
  were optimized by adding the subsecond part to time_
  instead of adding it to a ptime. This can be done safely
  because the sub-second part is in the range [0, 1 sec), so
  it cannot overflow into a different day or timezone.
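
The two tricks above can be sketched roughly as follows. This is only an
illustration, not the code from the commit: SplitTime, SplitUtcMicros and
AddSubSecond are hypothetical names, and the days/nanoseconds pair merely
stands in for date_ and time_. The timezone conversion itself (CCTZ) is
left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants and types for illustration; not Impala's actual API.
constexpr int64_t kMicrosPerSecond = 1000000;
constexpr int64_t kSecondsPerDay = 86400;
constexpr int64_t kMicrosPerDay = kMicrosPerSecond * kSecondsPerDay;
constexpr int64_t kNanosPerMicro = 1000;
constexpr int64_t kNanosPerSecond = 1000000000;
constexpr int64_t kNanosPerDay = kNanosPerSecond * kSecondsPerDay;

struct SplitTime {
  int64_t days_since_epoch;  // stands in for date_
  int64_t nanos_of_day;      // stands in for time_, expected in [0, 24 hour)
};

// UTC case: split a single integer directly into a day count and a
// time-of-day. C++ integer division truncates toward zero, so the remainder
// is adjusted to get floor semantics for inputs before 1970.
inline SplitTime SplitUtcMicros(int64_t unix_micros) {
  int64_t days = unix_micros / kMicrosPerDay;
  int64_t micros_of_day = unix_micros % kMicrosPerDay;
  if (micros_of_day < 0) {
    micros_of_day += kMicrosPerDay;
    --days;
  }
  return {days, micros_of_day * kNanosPerMicro};
}

// Non-UTC case: the whole seconds are assumed to have been converted to
// local time already (not shown). The sub-second part is then added to the
// time-of-day only. Since it is in [0, 1 sec) and the time-of-day is a whole
// number of seconds, the sum stays within the same day.
inline SplitTime AddSubSecond(SplitTime local_whole_seconds,
                              int64_t subsecond_nanos) {
  assert(subsecond_nanos >= 0 && subsecond_nanos < kNanosPerSecond);
  assert(local_whole_seconds.nanos_of_day % kNanosPerSecond == 0);
  local_whole_seconds.nanos_of_day += subsecond_nanos;
  assert(local_whole_seconds.nanos_of_day < kNanosPerDay);
  return local_whole_seconds;
}
```

For example, SplitUtcMicros(-1) yields day -1 with a time-of-day one
microsecond before midnight, which is why the remainder is adjusted instead
of using the raw result of the % operator.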

Benchmarks show 2x - 6x speedup for the measured functions.

The main motivation is IMPALA-5050: "Add support to read
TIMESTAMP_MILLIS and TIMESTAMP_MICROS to the parquet
scanner" - reading these types will run
micro/milli->TimestampValue conversion for every row.

Other changes:
- TimestampValue::UtcFromUnixTimeMillis was added - currently this
  is only used in tests but it will be useful for IMPALA-5050
- Some functions were moved from .h to .inline.h.
- FromUnixTimeMicros was changed to do the utc->local conversion
  depending on flag use_local_tz_for_unix_timestamp_conversions
  to be consistent with other similar functions. This function was
  only used in tests until now but it will be useful for IMPALA-5050.
- When a result mismatch is detected in
  convert-timestamp-benchmark.cc it now prints non-equal values.
- Benchmarks were added for micro + nano conversions.
  Note that only single threaded benchmarks were added because I
  do not expect any difference in the multi threaded case.
- DCHECKs were added to TimestampValue::Validate to
  ensure that time_ is in the range [0, 24 hour).

Testing:
- timestamp-test.cc was extended to give better coverage
  for sub-second conversions. Edge cases were already
  covered pretty well.

Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Reviewed-on: http://gerrit.cloudera.org:8080/11183
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> CLONE - Speed up sub-second unix time->TimestampValue conversions
> -----------------------------------------------------------------
>
>                 Key: IMPALA-7521
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7521
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Csaba Ringhofer
>            Assignee: Csaba Ringhofer
>            Priority: Major
>              Labels: performance, timestamp
>
> Currently Impala converts from sub-second unix time to TimestampValue (which is split into date_ and time_ similarly to boost::posix_time::ptime) by first splitting the input into seconds and sub-seconds parts, converting the seconds part with boost::posix_time::from_time_t(), and then adding the sub-seconds part to this timestamp. This can be done much faster by splitting the sub-second input into date_ and time_ directly.
> Avoiding boost::posix_time::from_time_t() would also be nice because it can only deal with timestamps from 1677 to 2262, which adds extra complexity to the related code.


