You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Riza Suminto (Code Review)" <ge...@cloudera.org> on 2021/10/28 19:23:20 UTC
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Hello Kurt Deschler, Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/17980
to look at the new patch set (#3).
Change subject: IMPALA-10984: Improve TimestampValue to String casting
......................................................................
IMPALA-10984: Improve TimestampValue to String casting
TimestampValue::ToString was implemented by concatenating
boost::gregorian::to_iso_extended_string and
boost::posix_time::to_simple_string using stringstream. This involves
multiple string allocations, copying, and might hit lock within
tcmalloc::CentralFreeList. FROM_UNIXTIME and CAST expression that
touches this function can be inefficient if the expression is being
evaluated for millions of rows.
This patch reimplement TimestampValue::ToString by supplying default
DateTimeFormatContext if no pattern was specified. "yyyy-MM-dd HH:mm:ss"
will be picked as the default format if the time_ component does not
have fractional seconds. Otherwise, "yyyy-MM-dd HH:mm:ss.SSSSSSSSS" will
be picked as the default format. The chosen DateTimeFormatContext then
passed to TimestampParser::Format along with date_ and time_ to be
formatted into the string representation. Int to string parsing method
is replaced with FastInt32ToBufferLeft in TimestampParser::Format.
This patch gives > 10X performance improvement for CAST timestamp to
string and FROM_UNIXTIME without a date-time pattern, as shown by the
following benchmark (modified from expr-benchmark), before and after the
patch.
Before the patch:
Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
FromUnixCodegen: Function iters/ms 10%ile 50%ile 90%ile 10%ile 50%ile 90%ile
(relative) (relative) (relative)
---------------------------------------------------------------------------------------------------------
literal 37.4 38.6 39.1 1X 1X 1X
cast(now() as string) 2.2 2.28 2.31 0.0589X 0.0591X 0.0591X
from_unixtime(0,'yyyy-MM-dd HH:mm:ss') 5.94 6.18 6.23 0.159X 0.16X 0.159X
from_unixtime(0,'yyyy-MM-dd') 11.5 11.8 11.9 0.308X 0.305X 0.304X
from_unixtime(0) 2.26 2.35 2.39 0.0606X 0.061X 0.0612X
After the patch:
Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
FromUnixCodegen: Function iters/ms 10%ile 50%ile 90%ile 10%ile 50%ile 90%ile
(relative) (relative) (relative)
---------------------------------------------------------------------------------------------------------
literal 37.9 38 38.4 1X 1X 1X
cast(now() as string) 29.3 29.3 29.4 0.773X 0.77X 0.767X
from_unixtime(0,'yyyy-MM-dd HH:mm:ss') 33.6 33.6 33.8 0.888X 0.884X 0.88X
from_unixtime(0,'yyyy-MM-dd') 49.9 49.9 50.3 1.32X 1.31X 1.31X
from_unixtime(0) 33.1 33.2 33.5 0.875X 0.872X 0.873X
The literal expression used as the baseline in this benchmark is
"cast('2012-01-01 09:10:11.123456789' as timestamp)".
This patch also updates numbers in expr-benchmark for
BenchmarkTimestampFunctions and tidy up expr-benchmark a bit to clear
its MemPool in between benchmark iteration so that it does not run out
of memory.
Testing:
- Pass core tests.
Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/datetime-simple-date-format-parser.cc
M be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
13 files changed, 209 insertions(+), 130 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/17980/3
--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>