You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Riza Suminto (Code Review)" <ge...@cloudera.org> on 2021/10/28 19:23:20 UTC

[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

Hello Kurt Deschler, Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17980

to look at the new patch set (#3).

Change subject: IMPALA-10984: Improve TimestampValue to String casting
......................................................................

IMPALA-10984: Improve TimestampValue to String casting

TimestampValue::ToString was implemented by concatenating
boost::gregorian::to_iso_extended_string and
boost::posix_time::to_simple_string using stringstream. This involves
multiple string allocations, copying, and might hit lock within
tcmalloc::CentralFreeList. FROM_UNIXTIME and CAST expression that
touches this function can be inefficient if the expression is being
evaluated for millions of rows.

This patch reimplement TimestampValue::ToString by supplying default
DateTimeFormatContext if no pattern was specified. "yyyy-MM-dd HH:mm:ss"
will be picked as the default format if the time_ component does not
have fractional seconds. Otherwise, "yyyy-MM-dd HH:mm:ss.SSSSSSSSS" will
be picked as the default format. The chosen DateTimeFormatContext then
passed to TimestampParser::Format along with date_ and time_ to be
formatted into the string representation. Int to string parsing method
is replaced with FastInt32ToBufferLeft in TimestampParser::Format.

This patch gives > 10X performance improvement for CAST timestamp to
string and FROM_UNIXTIME without a date-time pattern, as shown by the
following benchmark (modified from expr-benchmark), before and after the
patch.

Before the patch:
Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
FromUnixCodegen:           Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
                                                                         (relative) (relative) (relative)
---------------------------------------------------------------------------------------------------------
                            literal               37.4     38.6     39.1         1X         1X         1X
              cast(now() as string)                2.2     2.28     2.31    0.0589X    0.0591X    0.0591X
from_unixtime(0,'yyyy-MM-dd HH:mm:ss')               5.94     6.18     6.23     0.159X      0.16X     0.159X
      from_unixtime(0,'yyyy-MM-dd')               11.5     11.8     11.9     0.308X     0.305X     0.304X
                   from_unixtime(0)               2.26     2.35     2.39    0.0606X     0.061X    0.0612X

After the patch:
Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
FromUnixCodegen:           Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
                                                                         (relative) (relative) (relative)
---------------------------------------------------------------------------------------------------------
                            literal               37.9       38     38.4         1X         1X         1X
              cast(now() as string)               29.3     29.3     29.4     0.773X      0.77X     0.767X
from_unixtime(0,'yyyy-MM-dd HH:mm:ss')               33.6     33.6     33.8     0.888X     0.884X      0.88X
      from_unixtime(0,'yyyy-MM-dd')               49.9     49.9     50.3      1.32X      1.31X      1.31X
                   from_unixtime(0)               33.1     33.2     33.5     0.875X     0.872X     0.873X

The literal expression used as the baseline in this benchmark is
"cast('2012-01-01 09:10:11.123456789' as timestamp)".

This patch also updates numbers in expr-benchmark for
BenchmarkTimestampFunctions and tidy up expr-benchmark a bit to clear
its MemPool in between benchmark iteration so that it does not run out
of memory.

Testing:
- Pass core tests.

Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/datetime-simple-date-format-parser.cc
M be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
13 files changed, 209 insertions(+), 130 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/17980/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>