Posted to commits@impala.apache.org by st...@apache.org on 2023/03/27 23:01:55 UTC
[impala] 01/17: IMPALA-11355: Add STRING overloads for hour/minute/second/millisecond
This is an automated email from the ASF dual-hosted git repository.
stigahuang pushed a commit to branch branch-4.1.2
in repository https://gitbox.apache.org/repos/asf/impala.git
commit ec7a834b378847251a37f98a1dfcfb18c004f1b4
Author: Csaba Ringhofer <cs...@cloudera.com>
AuthorDate: Mon Jul 11 19:07:55 2022 +0200
IMPALA-11355: Add STRING overloads for hour/minute/second/millisecond
IMPALA-9531 dropped support for "dateless timestamps", so, for example,
cast("12:05:05" as timestamp) now returns NULL.
This broke functions such as minute("12:05:05"): minute() expects a
TIMESTAMP, so Impala adds an implicit cast, and the expression actually
evaluated is minute(cast("12:05:05" as timestamp)), which returns NULL.
This change adds overloads of hour(), minute(), second() and
millisecond() that take a STRING instead of a TIMESTAMP parameter.
Hive and MySQL already accept a STRING parameter for these functions.
The changes in the parser mainly restore code removed in IMPALA-9531.
Note that these functions could potentially be optimized by returning
parts of the parse result directly, without first converting them to a
boost::posix_time::time_duration, but this is not done here to keep the
change minimal.
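A simplified model of the failure and the fix may help. The sketch below is not Impala code: the Timestamp struct, TwoDigits(), CastToTimestamp() and the two Minute* functions are hypothetical, and the modeled cast only accepts the fixed "yyyy-MM-dd HH:mm:ss" layout. It shows how routing minute() through an implicit cast turns a dateless string into NULL (std::nullopt here), while a STRING overload can read a bare time of day and still handle full date-time strings.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Simplified, hypothetical model; not Impala code.
struct Timestamp {
  int minute;  // only the field this sketch needs
};

// Parses two ASCII digits at s[pos], or returns std::nullopt.
std::optional<int> TwoDigits(std::string_view s, std::size_t pos) {
  if (pos + 2 > s.size()) return std::nullopt;
  const char a = s[pos], b = s[pos + 1];
  if (a < '0' || a > '9' || b < '0' || b > '9') return std::nullopt;
  return (a - '0') * 10 + (b - '0');
}

// Models cast(str as timestamp) after IMPALA-9531: a date part is required.
// Only "yyyy-MM-dd HH:mm:ss" is accepted, to keep the sketch short.
std::optional<Timestamp> CastToTimestamp(std::string_view s) {
  if (s.size() != 19 || s[4] != '-' || s[7] != '-' || s[10] != ' ' ||
      s[13] != ':' || s[16] != ':') {
    return std::nullopt;  // NULL, e.g. for "12:05:05"
  }
  const std::optional<int> minute = TwoDigits(s, 14);
  if (!minute) return std::nullopt;
  return Timestamp{*minute};
}

// Before the change: minute('...') resolves to minute(cast('...' as timestamp)).
std::optional<int> MinuteViaImplicitCast(std::string_view s) {
  const std::optional<Timestamp> ts = CastToTimestamp(s);
  if (!ts) return std::nullopt;  // the NULL from the cast propagates
  return ts->minute;
}

// After the change: a STRING overload that also accepts a bare "HH:mm:ss".
std::optional<int> MinuteOfString(std::string_view s) {
  if (s.size() == 8 && s[2] == ':' && s[5] == ':') return TwoDigits(s, 3);
  return MinuteViaImplicitCast(s);  // date-time strings still work
}
```

In the actual change the STRING overloads do not special-case the format themselves; they call TimestampParser::ParseSimpleDateFormat() with accept_time_toks_only=true, so date-time strings and bare time-of-day strings go through the same parser.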
Testing:
- restored related tests in expr-test and added some new ones for
malformed time-of-day strings
- added benchmarks for the new overloads and fixed the ones for the
old functions (they benchmarked cast('09:10:11.000000' as timestamp),
which returns NULL)
Change-Id: I6cc1c851ee71ab4fcc58105c7e9931155a483679
Reviewed-on: http://gerrit.cloudera.org:8080/18718
Reviewed-by: Riza Suminto <ri...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
be/src/benchmarks/expr-benchmark.cc | 154 ++++++++++++---------
be/src/exprs/expr-test.cc | 19 +++
be/src/exprs/timestamp-functions-ir.cc | 41 ++++++
be/src/exprs/timestamp-functions.h | 4 +
be/src/runtime/date-parse-util.cc | 2 +-
.../runtime/datetime-simple-date-format-parser.cc | 34 ++++-
.../runtime/datetime-simple-date-format-parser.h | 10 +-
be/src/runtime/timestamp-parse-util.cc | 6 +-
be/src/runtime/timestamp-parse-util.h | 6 +-
common/function-registry/impala_functions.py | 4 +
10 files changed, 202 insertions(+), 78 deletions(-)
diff --git a/be/src/benchmarks/expr-benchmark.cc b/be/src/benchmarks/expr-benchmark.cc
index d850a5771..52130e292 100644
--- a/be/src/benchmarks/expr-benchmark.cc
+++ b/be/src/benchmarks/expr-benchmark.cc
@@ -770,77 +770,88 @@ Benchmark* BenchmarkMathFunctions(bool codegen) {
return suite;
}
+// Machine Info: Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz
// TimestampFn: Function iters/ms 10%ile 50%ile 90%ile 10%ile 50%ile 90%ile
// (relative) (relative) (relative)
// ---------------------------------------------------------------------------------------------------------
-// literal 33.8 34 34.1 1X 1X 1X
-// to_string 13.3 13.4 13.5 0.395X 0.395X 0.395X
-// add_year 16.5 16.7 16.8 0.488X 0.49X 0.492X
-// sub_month 16.3 16.3 16.4 0.482X 0.48X 0.481X
-// add_weeks 20.4 20.4 20.5 0.603X 0.599X 0.601X
-// sub_days 19.4 19.4 19.5 0.573X 0.569X 0.572X
-// add 20.6 20.6 20.7 0.608X 0.605X 0.608X
-// sub_hours 18.7 18.7 18.9 0.553X 0.55X 0.553X
-// add_minutes 19.4 19.5 19.6 0.573X 0.575X 0.575X
-// sub_seconds 19.1 19.4 19.4 0.564X 0.569X 0.569X
-// add_milli 18.5 18.5 18.7 0.548X 0.545X 0.546X
-// sub_micro 17.9 17.9 18.1 0.529X 0.526X 0.529X
-// add_nano 18.3 18.3 18.5 0.542X 0.54X 0.541X
-// unix_timestamp1 37.9 37.9 38.1 1.12X 1.11X 1.12X
-// unix_timestamp2 51.9 52.2 52.5 1.54X 1.54X 1.54X
-// from_unix1 30.4 30.7 30.9 0.899X 0.902X 0.904X
-// from_unix2 43.3 43.5 44 1.28X 1.28X 1.29X
-// from_unix3 30.7 31.1 31.3 0.91X 0.916X 0.917X
-// year 39.3 39.4 39.7 1.16X 1.16X 1.16X
-// month 39.5 40.1 40.3 1.17X 1.18X 1.18X
-// day of month 38.2 38.4 38.5 1.13X 1.13X 1.13X
-// day of year 35.4 35.6 35.8 1.05X 1.05X 1.05X
-// week of year 34.6 34.6 34.8 1.02X 1.02X 1.02X
-// hour 81.5 81.9 82.5 2.41X 2.41X 2.42X
-// minute 80 80.5 81.5 2.37X 2.37X 2.39X
-// second 81.2 82.2 82.9 2.4X 2.42X 2.43X
-// to date 19.4 19.4 19.5 0.573X 0.569X 0.57X
-// date diff 17.5 17.5 17.6 0.518X 0.515X 0.514X
-// from utc 21.9 21.9 22.2 0.649X 0.646X 0.649X
-// to utc 19.4 19.4 19.4 0.573X 0.569X 0.569X
-// now 286 287 290 8.45X 8.45X 8.5X
-// unix_timestamp 207 208 211 6.13X 6.13X 6.17X
+// literal 10.8 11.6 12 1X 1X 1X
+// to_string 4.34 4.91 5.02 0.402X 0.424X 0.419X
+// add_year 6.17 7.11 7.29 0.572X 0.614X 0.609X
+// sub_month 6.13 6.93 7.12 0.568X 0.598X 0.595X
+// add_weeks 6.97 8.71 9.02 0.646X 0.752X 0.754X
+// sub_days 6.73 8.34 8.57 0.624X 0.721X 0.716X
+// add 7.19 8.84 9.02 0.666X 0.763X 0.754X
+// sub_hours 7.37 8.07 8.24 0.683X 0.697X 0.688X
+// add_minutes 7.28 7.93 8.21 0.674X 0.685X 0.687X
+// sub_seconds 7.63 7.89 8.14 0.707X 0.682X 0.68X
+// add_milli 6.85 7.5 7.73 0.635X 0.648X 0.646X
+// sub_micro 6.97 7.49 7.64 0.646X 0.647X 0.639X
+// add_nano 6.79 7.4 7.62 0.629X 0.639X 0.637X
+// unix_timestamp1 12.9 14 14.5 1.19X 1.21X 1.21X
+// unix_timestamp2 18.1 19.9 20.4 1.67X 1.72X 1.7X
+// from_unix1 9.54 10.7 11 0.884X 0.924X 0.918X
+// from_unix2 14.4 16.1 16.7 1.34X 1.39X 1.4X
+// from_unix3 9.79 10.9 11.3 0.907X 0.945X 0.948X
+// year 12.4 14.2 14.5 1.15X 1.23X 1.21X
+// month 13 14.4 14.6 1.2X 1.24X 1.22X
+// day of month 13.2 14.5 14.9 1.22X 1.25X 1.25X
+// day of year 11.7 12.7 12.9 1.08X 1.09X 1.08X
+// week of year 11.8 12.8 13.1 1.09X 1.11X 1.1X
+// hour(timestamp) 6.71 7.26 7.41 0.622X 0.627X 0.619X
+// minute(timestamp) 6.48 7.25 7.41 0.601X 0.626X 0.619X
+// second(timestamp) 6.55 7.24 7.41 0.607X 0.626X 0.619X
+// millisecond(timestamp) 6.55 7.28 7.41 0.606X 0.629X 0.619X
+// hour(string) 10 11 11.2 0.927X 0.946X 0.933X
+// minute(string) 9.91 11 11.2 0.918X 0.947X 0.933X
+// second(string) 9.9 10.9 11.2 0.917X 0.942X 0.933X
+// millisecond(string) 9.88 10.6 11 0.915X 0.916X 0.918X
+// to date 6.07 6.98 7.25 0.563X 0.603X 0.606X
+// date diff 5.75 6.39 6.55 0.533X 0.551X 0.547X
+// from utc 7.56 8.25 8.55 0.7X 0.712X 0.714X
+// to utc 6.6 7.28 7.49 0.612X 0.629X 0.626X
+// now 103 109 112 9.5X 9.43X 9.4X
+// unix_timestamp 80.2 86.4 88.8 7.43X 7.46X 7.42X
//
// TimestampFnCodegen: Function iters/ms 10%ile 50%ile 90%ile 10%ile 50%ile 90%ile
// (relative) (relative) (relative)
// ---------------------------------------------------------------------------------------------------------
-// literal 38.2 38.4 38.6 1X 1X 1X
-// to_string 15 15.3 15.3 0.392X 0.398X 0.396X
-// add_year 18.3 18.3 18.5 0.479X 0.477X 0.478X
-// sub_month 17.9 18.1 18.1 0.469X 0.47X 0.47X
-// add_weeks 23.8 23.8 23.9 0.622X 0.619X 0.619X
-// sub_days 22.6 22.6 22.7 0.59X 0.588X 0.589X
-// add 23.4 23.4 23.5 0.613X 0.61X 0.609X
-// sub_hours 21.8 21.8 21.9 0.569X 0.566X 0.566X
-// add_minutes 21.9 21.9 22 0.574X 0.571X 0.57X
-// sub_seconds 21.9 21.9 22 0.574X 0.571X 0.57X
-// add_milli 20.9 20.9 21.1 0.547X 0.545X 0.547X
-// sub_micro 20.1 20.1 20.2 0.525X 0.523X 0.523X
-// add_nano 21.1 21.1 21.3 0.552X 0.549X 0.551X
-// unix_timestamp1 47 47.2 47.5 1.23X 1.23X 1.23X
-// unix_timestamp2 61.3 61.5 61.8 1.6X 1.6X 1.6X
-// from_unix1 34.8 35 35.3 0.91X 0.911X 0.914X
-// from_unix2 51.5 51.5 52 1.35X 1.34X 1.35X
-// from_unix3 34.8 35 35.3 0.91X 0.911X 0.914X
-// year 57.4 57.8 58.5 1.5X 1.5X 1.51X
-// month 58.4 58.6 59.1 1.53X 1.53X 1.53X
-// day of month 58.9 59.1 59.4 1.54X 1.54X 1.54X
-// day of year 53.2 53.7 54.3 1.39X 1.4X 1.41X
-// week of year 50.9 51.1 51.4 1.33X 1.33X 1.33X
-// hour 125 132 134 3.26X 3.43X 3.48X
-// minute 132 133 134 3.46X 3.46X 3.48X
-// second 132 133 135 3.46X 3.47X 3.49X
-// to date 24.2 24.3 24.6 0.632X 0.631X 0.637X
-// date diff 22.6 22.6 22.7 0.591X 0.588X 0.589X
-// from utc 38.1 38.5 38.8 0.995X 1X 1X
-// to utc 22.1 22.1 22.4 0.579X 0.576X 0.58X
-// now 517 520 524 13.5X 13.5X 13.6X
-// unix_timestamp 399 403 406 10.4X 10.5X 10.5X
+// literal 11.4 11.8 12.1 1X 1X 1X
+// to_string 4.83 5 5.09 0.425X 0.424X 0.42X
+// add_year 6.98 7.18 7.36 0.615X 0.609X 0.607X
+// sub_month 6.69 6.93 7.05 0.589X 0.588X 0.581X
+// add_weeks 8.56 8.86 9.02 0.754X 0.752X 0.743X
+// sub_days 8.1 8.39 8.58 0.714X 0.712X 0.707X
+// add 8.56 8.86 9.02 0.754X 0.752X 0.743X
+// sub_hours 7.87 8.09 8.34 0.693X 0.687X 0.688X
+// add_minutes 7.76 8 8.18 0.683X 0.679X 0.674X
+// sub_seconds 7.75 7.95 8.1 0.683X 0.674X 0.668X
+// add_milli 7.3 7.46 7.6 0.643X 0.633X 0.626X
+// sub_micro 7.33 7.59 7.79 0.645X 0.644X 0.642X
+// add_nano 7.2 7.46 7.59 0.634X 0.633X 0.625X
+// unix_timestamp1 13.5 14 14.3 1.19X 1.19X 1.18X
+// unix_timestamp2 19.3 20 20.4 1.7X 1.7X 1.68X
+// from_unix1 10.3 10.6 11 0.908X 0.901X 0.905X
+// from_unix2 15.8 16.4 16.9 1.4X 1.39X 1.39X
+// from_unix3 10.5 11.1 11.4 0.922X 0.945X 0.937X
+// year 14.3 14.8 15.1 1.26X 1.26X 1.24X
+// month 14.3 14.8 15.1 1.26X 1.26X 1.24X
+// day of month 14.3 14.6 14.9 1.26X 1.24X 1.23X
+// day of year 12.6 13.2 13.4 1.11X 1.12X 1.11X
+// week of year 12.2 13 13.4 1.08X 1.1X 1.1X
+// hour(timestamp) 7.28 7.41 7.6 0.641X 0.629X 0.626X
+// minute(timestamp) 7.16 7.41 7.55 0.63X 0.629X 0.622X
+// second(timestamp) 7.16 7.4 7.6 0.63X 0.628X 0.626X
+// millisecond(timestamp) 7.2 7.41 7.59 0.634X 0.629X 0.625X
+// hour(string) 10.5 11 11.3 0.928X 0.93X 0.929X
+// minute(string) 10.5 11 11.2 0.921X 0.93X 0.92X
+// second(string) 10.6 11 11.3 0.933X 0.93X 0.93X
+// millisecond(string) 10.6 11 11.4 0.933X 0.93X 0.937X
+// to date 6.43 7.11 7.23 0.566X 0.603X 0.596X
+// date diff 6.3 6.43 6.55 0.554X 0.545X 0.539X
+// from utc 8.14 8.57 8.73 0.716X 0.727X 0.719X
+// to utc 7.13 7.41 7.55 0.628X 0.629X 0.622X
+// now 107 110 113 9.45X 9.36X 9.34X
+// unix_timestamp 84.4 86.4 88.7 7.43X 7.33X 7.31X
Benchmark* BenchmarkTimestampFunctions(bool codegen) {
Benchmark* suite = new Benchmark(BenchmarkName("TimestampFn", codegen));
BENCHMARK("literal", "cast('2012-01-01 09:10:11.123456789' as timestamp)");
@@ -880,9 +891,18 @@ Benchmark* BenchmarkTimestampFunctions(bool codegen) {
BENCHMARK("day of month", "dayofmonth(cast('2011-12-22' as timestamp))");
BENCHMARK("day of year", "dayofyear(cast('2011-12-22' as timestamp))");
BENCHMARK("week of year", "weekofyear(cast('2011-12-22' as timestamp))");
- BENCHMARK("hour", "hour(cast('09:10:11.000000' as timestamp))");
- BENCHMARK("minute", "minute(cast('09:10:11.000000' as timestamp))");
- BENCHMARK("second", "second(cast('09:10:11.000000' as timestamp))");
+ BENCHMARK("hour(timestamp)",
+ "hour(cast('1970-01-01 09:10:11.130000' as timestamp))");
+ BENCHMARK("minute(timestamp)",
+ "minute(cast('1970-01-01 09:10:11.130000' as timestamp))");
+ BENCHMARK(
+ "second(timestamp)", "second(cast('1970-01-01 09:10:11.130000' as timestamp))");
+ BENCHMARK(
+ "millisecond(timestamp)", "millisecond(cast('1970-01-01 09:10:11.130000' as timestamp))");
+ BENCHMARK("hour(string)", "hour('09:10:11.130000')");
+ BENCHMARK("minute(string)", "minute('09:10:11.130000')");
+ BENCHMARK("second(string)", "second('09:10:11.130000')");
+ BENCHMARK("millisecond(string)", "millisecond('09:10:11.130000')");
BENCHMARK("to date",
"to_date(cast('2011-12-22 09:10:11.12345678' as timestamp))");
BENCHMARK("date diff", "datediff(cast('2011-12-22 09:10:11.12345678' as timestamp), "
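The "(relative)" columns appear to be each row's iters/ms divided by the baseline row's (the literal row, printed as 1X) at the same percentile; for example, the new hour(string) row gives 10 / 10.8, about 0.926, matching the printed 0.927X up to rounding of the displayed values. A minimal check of that reading, assuming this definition of the baseline; Relative() and MatchesPrinted() are hypothetical helpers:

```cpp
#include <cassert>
#include <cmath>

// Relative throughput of one benchmark row against the baseline row
// (assumed to be the first row, "literal", which is printed as 1X).
double Relative(double iters_per_ms, double baseline_iters_per_ms) {
  assert(baseline_iters_per_ms > 0.0);
  return iters_per_ms / baseline_iters_per_ms;
}

// True if a ratio recomputed from the rounded table values matches the
// printed ratio within 'tolerance' (the table shows about 3 significant digits).
bool MatchesPrinted(double recomputed, double printed, double tolerance = 0.01) {
  return std::fabs(recomputed - printed) <= tolerance;
}
```

Per the commit message, the removed hour/minute/second rows benchmarked cast('09:10:11.000000' as timestamp), which returns NULL, so their large ratios (for example 2.41X) are not comparable with the new rows.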
diff --git a/be/src/exprs/expr-test.cc b/be/src/exprs/expr-test.cc
index d8a29cb47..688f8f74e 100644
--- a/be/src/exprs/expr-test.cc
+++ b/be/src/exprs/expr-test.cc
@@ -6954,6 +6954,25 @@ TEST_P(ExprTest, TimestampFunctions) {
TestStringValue(
"to_date(cast('2011-12-22 09:10:11.12345678' as timestamp))", "2011-12-22");
+ // These expressions directly extract hour/minute/second/millis from STRING type
+ // to support these functions for timestamp strings without a date part (IMPALA-11355).
+ TestValue("hour('09:10:11.000000')", TYPE_INT, 9);
+ TestValue("minute('09:10:11.000000')", TYPE_INT, 10);
+ TestValue("second('09:10:11.000000')", TYPE_INT, 11);
+ TestValue("millisecond('09:10:11.123456')", TYPE_INT, 123);
+ TestValue("millisecond('09:10:11')", TYPE_INT, 0);
+ // Test the functions above with invalid inputs.
+ TestIsNull("hour('09:10:1')", TYPE_INT);
+ TestIsNull("hour('838:59:59')", TYPE_INT);
+ TestIsNull("minute('09-10-11')", TYPE_INT);
+ TestIsNull("second('09:aa:11.000000')", TYPE_INT);
+ TestIsNull("second('09:10:11pm')", TYPE_INT);
+ TestIsNull("millisecond('24:11:11.123')", TYPE_INT);
+ TestIsNull("millisecond('09:61:11.123')", TYPE_INT);
+ TestIsNull("millisecond('09:10:61.123')", TYPE_INT);
+ TestIsNull("millisecond('09:10:11.123aaa')", TYPE_INT);
+ TestIsNull("millisecond('')", TYPE_INT);
+
// Check that timeofday() does not crash or return incorrect results
TestIsNotNull("timeofday()", TYPE_STRING);
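The accept/reject cases above imply a small grammar for time-of-day strings. The following is a hypothetical, standalone restatement of it, written only to agree with these test cases; it is not the Impala parser. It assumes "HH:mm:ss" with hours 00-23 and minutes and seconds 00-59, optionally followed by '.' and 1 to 9 fractional digits, and it returns std::nullopt where the SQL functions return NULL.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical restatement of the time-of-day grammar exercised by the tests;
// not Impala's parser.
struct TimeOfDay {
  int hour;
  int minute;
  int second;
  int nanos;  // fractional seconds scaled to nanoseconds
};

// Parses two ASCII digits at s[pos], or returns std::nullopt.
static std::optional<int> TwoDigits(std::string_view s, std::size_t pos) {
  if (pos + 2 > s.size()) return std::nullopt;
  const char a = s[pos], b = s[pos + 1];
  if (a < '0' || a > '9' || b < '0' || b > '9') return std::nullopt;
  return (a - '0') * 10 + (b - '0');
}

// Accepts "HH:mm:ss" optionally followed by '.' and 1 to 9 digits.
std::optional<TimeOfDay> ParseTimeOfDay(std::string_view s) {
  if (s.size() < 8 || s[2] != ':' || s[5] != ':') return std::nullopt;
  const auto h = TwoDigits(s, 0), m = TwoDigits(s, 3), sec = TwoDigits(s, 6);
  if (!h || !m || !sec || *h > 23 || *m > 59 || *sec > 59) return std::nullopt;
  int nanos = 0;
  if (s.size() > 8) {
    if (s[8] != '.') return std::nullopt;  // rejects e.g. "09:10:11pm"
    const std::string_view frac = s.substr(9);
    if (frac.empty() || frac.size() > 9) return std::nullopt;
    for (const char c : frac) {
      if (c < '0' || c > '9') return std::nullopt;  // rejects e.g. "123aaa"
      nanos = nanos * 10 + (c - '0');
    }
    for (std::size_t i = frac.size(); i < 9; ++i) nanos *= 10;  // scale to ns
  }
  return TimeOfDay{*h, *m, *sec, nanos};
}

// Models millisecond(STRING): NULL (std::nullopt) on malformed input.
std::optional<int> MillisecondOf(std::string_view s) {
  const std::optional<TimeOfDay> t = ParseTimeOfDay(s);
  if (!t) return std::nullopt;
  return t->nanos / 1000000;
}
```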
diff --git a/be/src/exprs/timestamp-functions-ir.cc b/be/src/exprs/timestamp-functions-ir.cc
index 1cc06e184..79a0d59f1 100644
--- a/be/src/exprs/timestamp-functions-ir.cc
+++ b/be/src/exprs/timestamp-functions-ir.cc
@@ -276,6 +276,47 @@ IntVal TimestampFunctions::Millisecond(FunctionContext* context,
return IntVal(time.total_milliseconds() - time.total_seconds() * 1000);
}
+bool StringToTimeOfDay(
+ const StringVal& str_val, boost::posix_time::time_duration* time) {
+ if (str_val.is_null) return false;
+ boost::gregorian::date dummy_date;
+ return TimestampParser::ParseSimpleDateFormat(
+ reinterpret_cast<char*>(str_val.ptr), str_val.len, &dummy_date, time, true);
+}
+
+IntVal TimestampFunctions::Hour(FunctionContext* context, const StringVal& str_val) {
+ boost::posix_time::time_duration time;
+ if (!StringToTimeOfDay(str_val, &time)) {
+ return IntVal::null();
+ }
+ return IntVal(time.hours());
+}
+
+IntVal TimestampFunctions::Minute(FunctionContext* context, const StringVal& str_val) {
+ boost::posix_time::time_duration time;
+ if (!StringToTimeOfDay(str_val, &time)) {
+ return IntVal::null();
+ }
+ return IntVal(time.minutes());
+}
+
+IntVal TimestampFunctions::Second(FunctionContext* context, const StringVal& str_val) {
+ boost::posix_time::time_duration time;
+ if (!StringToTimeOfDay(str_val, &time)) {
+ return IntVal::null();
+ }
+ return IntVal(time.seconds());
+}
+
+IntVal TimestampFunctions::Millisecond(
+ FunctionContext* context, const StringVal& str_val) {
+ boost::posix_time::time_duration time;
+ if (!StringToTimeOfDay(str_val, &time)) {
+ return IntVal::null();
+ }
+ return IntVal(time.total_milliseconds() - time.total_seconds() * 1000);
+}
+
TimestampVal TimestampFunctions::Now(FunctionContext* context) {
const TimestampValue* now = context->impl()->state()->now();
TimestampVal return_val;
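The Millisecond() overload computes the sub-second part as whole milliseconds minus whole seconds times 1000. A sketch of the same arithmetic using std::chrono durations in place of boost::posix_time::time_duration (a substitution made only to keep the example dependency-free); the TimeOfDay alias and the *Of() helpers are hypothetical:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for boost::posix_time::time_duration: a time of day
// stored as nanoseconds since midnight.
using TimeOfDay = std::chrono::nanoseconds;

// Like time_duration::hours(): whole hours in the duration.
int64_t HourOf(TimeOfDay t) {
  return std::chrono::duration_cast<std::chrono::hours>(t).count();
}

// Like time_duration::minutes(): minutes within the current hour.
int64_t MinuteOf(TimeOfDay t) {
  return std::chrono::duration_cast<std::chrono::minutes>(t).count() % 60;
}

// Like time_duration::seconds(): seconds within the current minute.
int64_t SecondOf(TimeOfDay t) {
  return std::chrono::duration_cast<std::chrono::seconds>(t).count() % 60;
}

// Same formula as Millisecond(): total_milliseconds() - total_seconds() * 1000.
int64_t MillisecondOf(TimeOfDay t) {
  using namespace std::chrono;
  assert(t >= TimeOfDay::zero());  // a time of day is never negative
  return duration_cast<milliseconds>(t).count() -
         duration_cast<seconds>(t).count() * 1000;
}
```

For non-negative durations the formula is equivalent to total_milliseconds() % 1000.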
diff --git a/be/src/exprs/timestamp-functions.h b/be/src/exprs/timestamp-functions.h
index 47a2d8643..68995a2fd 100644
--- a/be/src/exprs/timestamp-functions.h
+++ b/be/src/exprs/timestamp-functions.h
@@ -168,9 +168,13 @@ class TimestampFunctions {
static IntVal DayOfYear(FunctionContext* context, const TimestampVal& ts_val);
static IntVal WeekOfYear(FunctionContext* context, const TimestampVal& ts_val);
static IntVal Hour(FunctionContext* context, const TimestampVal& ts_val);
+ static IntVal Hour(FunctionContext* context, const StringVal& str_val);
static IntVal Minute(FunctionContext* context, const TimestampVal& ts_val);
+ static IntVal Minute(FunctionContext* context, const StringVal& str_val);
static IntVal Second(FunctionContext* context, const TimestampVal& ts_val);
+ static IntVal Second(FunctionContext* context, const StringVal& str_val);
static IntVal Millisecond(FunctionContext* context, const TimestampVal& ts_val);
+ static IntVal Millisecond(FunctionContext* context, const StringVal& str_val);
/// Date/time functions.
static TimestampVal Now(FunctionContext* context);
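Because the new declarations overload Hour(), Minute(), Second() and Millisecond() on StringVal, each overload gets its own mangled symbol, and impala_functions.py registers the STRING variants by those symbols. On GCC or Clang (Itanium C++ ABI), abi::__cxa_demangle from <cxxabi.h> can show which C++ signature a registry entry points to. The Demangle() wrapper below is a hypothetical helper, not part of Impala.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Frees the buffer that abi::__cxa_demangle allocates with malloc.
struct FreeDeleter {
  void operator()(char* p) const { std::free(p); }
};

// Hypothetical helper: demangles an Itanium C++ ABI symbol (GCC/Clang only).
// Returns an empty string if 'mangled' is not a valid mangled name.
std::string Demangle(const char* mangled) {
  assert(mangled != nullptr);
  int status = 0;
  std::unique_ptr<char, FreeDeleter> out(
      abi::__cxa_demangle(mangled, nullptr, nullptr, &status));
  if (status != 0 || !out) return std::string();
  return std::string(out.get());
}
```

For instance, the STRING entry registered for hour should demangle to impala::TimestampFunctions::Hour(impala_udf::FunctionContext*, impala_udf::StringVal const&), while the TIMESTAMP entry names impala_udf::TimestampVal instead.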
diff --git a/be/src/runtime/date-parse-util.cc b/be/src/runtime/date-parse-util.cc
index 8db8def80..ea8698d82 100644
--- a/be/src/runtime/date-parse-util.cc
+++ b/be/src/runtime/date-parse-util.cc
@@ -100,7 +100,7 @@ bool DateParser::ParseSimpleDateFormat(const char* str, int len, bool accept_tim
const DateTimeFormatContext* dt_ctx =
SimpleDateFormatTokenizer::GetDefaultFormatContext(str, trimmed_len,
- accept_time_toks);
+ accept_time_toks, false);
if (dt_ctx != nullptr) return ParseSimpleDateFormat(str, trimmed_len, *dt_ctx, date);
// Generating context lazily as a fall back if default formats fail.
diff --git a/be/src/runtime/datetime-simple-date-format-parser.cc b/be/src/runtime/datetime-simple-date-format-parser.cc
index dac71d81b..5a937540e 100644
--- a/be/src/runtime/datetime-simple-date-format-parser.cc
+++ b/be/src/runtime/datetime-simple-date-format-parser.cc
@@ -34,6 +34,8 @@ namespace datetime_parse_util {
bool SimpleDateFormatTokenizer::initialized = false;
const int SimpleDateFormatTokenizer::DEFAULT_DATE_FMT_LEN = 10;
+const int SimpleDateFormatTokenizer::DEFAULT_TIME_FMT_LEN = 8;
+const int SimpleDateFormatTokenizer::DEFAULT_TIME_FRAC_FMT_LEN = 18;
const int SimpleDateFormatTokenizer::DEFAULT_SHORT_DATE_TIME_FMT_LEN = 19;
const int SimpleDateFormatTokenizer::DEFAULT_DATE_TIME_FMT_LEN = 29;
const int SimpleDateFormatTokenizer::FRACTIONAL_MAX_LEN = 9;
@@ -41,8 +43,10 @@ const int SimpleDateFormatTokenizer::FRACTIONAL_MAX_LEN = 9;
DateTimeFormatContext SimpleDateFormatTokenizer::DEFAULT_SHORT_DATE_TIME_CTX;
DateTimeFormatContext SimpleDateFormatTokenizer::DEFAULT_SHORT_ISO_DATE_TIME_CTX;
DateTimeFormatContext SimpleDateFormatTokenizer::DEFAULT_DATE_CTX;
+DateTimeFormatContext SimpleDateFormatTokenizer::DEFAULT_TIME_CTX;
DateTimeFormatContext SimpleDateFormatTokenizer::DEFAULT_DATE_TIME_CTX[10];
DateTimeFormatContext SimpleDateFormatTokenizer::DEFAULT_ISO_DATE_TIME_CTX[10];
+DateTimeFormatContext SimpleDateFormatTokenizer::DEFAULT_TIME_FRAC_CTX[10];
void SimpleDateFormatTokenizer::InitCtx() {
if (initialized) return;
@@ -74,6 +78,17 @@ void SimpleDateFormatTokenizer::InitCtx() {
DEFAULT_DATE_CTX.Reset("yyyy-MM-dd");
Tokenize(&DEFAULT_DATE_CTX, PARSE);
+ // Setup the default short time context HH:mm:ss
+ DEFAULT_TIME_CTX.Reset("HH:mm:ss");
+ Tokenize(&DEFAULT_TIME_CTX, PARSE, true, true);
+
+ // Setup the default short time context with fractional seconds HH:mm:ss.SSSSSSSSS
+ for (int i = FRACTIONAL_MAX_LEN; i >= 0; --i) {
+ DEFAULT_TIME_FRAC_CTX[i].Reset(DATE_TIME_CTX_FMT + 11,
+ DEFAULT_TIME_FRAC_FMT_LEN - (FRACTIONAL_MAX_LEN - i));
+ Tokenize(&DEFAULT_TIME_FRAC_CTX[i], PARSE, true, true);
+ }
+
// Flag that the parser is ready.
initialized = true;
}
@@ -97,7 +112,8 @@ bool SimpleDateFormatTokenizer::IsValidTZOffset(const char* str_begin,
}
bool SimpleDateFormatTokenizer::Tokenize(
- DateTimeFormatContext* dt_ctx, CastDirection cast_mode, bool accept_time_toks) {
+ DateTimeFormatContext* dt_ctx, CastDirection cast_mode, bool accept_time_toks,
+ bool accept_time_toks_only) {
DCHECK(dt_ctx != NULL);
DCHECK(dt_ctx->fmt != NULL);
DCHECK(dt_ctx->fmt_len > 0);
@@ -180,8 +196,8 @@ bool SimpleDateFormatTokenizer::Tokenize(
}
dt_ctx->toks.push_back(tok);
}
- if (cast_mode == PARSE) return (dt_ctx->has_date_toks);
- return (dt_ctx->has_date_toks || dt_ctx->has_time_toks);
+ if (cast_mode == PARSE && !accept_time_toks_only) return dt_ctx->has_date_toks;
+ return dt_ctx->has_date_toks || dt_ctx->has_time_toks;
}
const char* SimpleDateFormatTokenizer::ParseDigitToken(const char* str,
@@ -340,12 +356,13 @@ bool SimpleDateFormatTokenizer::TokenizeByStr( DateTimeFormatContext* dt_ctx,
}
const DateTimeFormatContext* SimpleDateFormatTokenizer::GetDefaultFormatContext(
- const char* str, int len, bool accept_time_toks) {
+ const char* str, int len, bool accept_time_toks, bool accept_time_toks_only) {
DCHECK(initialized);
DCHECK(str != nullptr);
DCHECK(len > 0);
+ DCHECK(!accept_time_toks_only || accept_time_toks);
- if (LIKELY(len >= DEFAULT_DATE_FMT_LEN)) {
+ if (LIKELY(len >= DEFAULT_TIME_FMT_LEN)) {
// Check if this string starts with a date component
if (str[4] == '-' && str[7] == '-') {
// Do we have a date component only?
@@ -398,6 +415,13 @@ const DateTimeFormatContext* SimpleDateFormatTokenizer::GetDefaultFormatContext(
break;
}
}
+ } else if (accept_time_toks_only && str[2] == ':' && str[5] == ':') {
+ if (len == DEFAULT_TIME_FMT_LEN) return &DEFAULT_TIME_CTX;
+ // There is only time component.
+ len = min(len, DEFAULT_TIME_FRAC_FMT_LEN);
+ if (len > DEFAULT_TIME_FMT_LEN && str[8] == '.') {
+ return &DEFAULT_TIME_FRAC_CTX[len - DEFAULT_TIME_FMT_LEN - 1];
+ }
}
}
return nullptr;
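The new branch in GetDefaultFormatContext() picks a default context for strings without a date part purely from the length and the separator positions; the digits themselves are checked later, when the string is parsed against the chosen context. The sketch below restates that selection as a standalone function that returns the chosen pattern instead of a DateTimeFormatContext pointer. TimeOnlyFormat() and the k* constants are hypothetical names, and the sketch assumes the input was already found not to start with a date.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical names mirroring DEFAULT_TIME_FMT_LEN ("HH:mm:ss") and
// DEFAULT_TIME_FRAC_FMT_LEN ("HH:mm:ss.SSSSSSSSS").
constexpr int kTimeFmtLen = 8;
constexpr int kTimeFracFmtLen = 18;

// Returns the pattern of the default context the time-only branch would
// select, or std::nullopt when it selects none. Assumes 'str' was already
// found not to start with a "yyyy-MM-dd" date.
std::optional<std::string> TimeOnlyFormat(std::string_view str,
                                          bool accept_time_toks_only) {
  int len = static_cast<int>(str.size());
  if (!accept_time_toks_only || len < kTimeFmtLen) return std::nullopt;
  if (str[2] != ':' || str[5] != ':') return std::nullopt;
  if (len == kTimeFmtLen) return std::string("HH:mm:ss");  // DEFAULT_TIME_CTX
  // The selected pattern is capped at 9 fractional digits, as in the branch.
  len = std::min(len, kTimeFracFmtLen);
  if (str[8] == '.') {
    // DEFAULT_TIME_FRAC_CTX[len - 9] has (len - 9) 'S' tokens.
    return "HH:mm:ss." + std::string(len - kTimeFmtLen - 1, 'S');
  }
  return std::nullopt;
}
```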
diff --git a/be/src/runtime/datetime-simple-date-format-parser.h b/be/src/runtime/datetime-simple-date-format-parser.h
index 30b0812ff..2e0ce98bb 100644
--- a/be/src/runtime/datetime-simple-date-format-parser.h
+++ b/be/src/runtime/datetime-simple-date-format-parser.h
@@ -74,6 +74,8 @@ class SimpleDateFormatTokenizer {
public:
/// Constants to hold default format lengths.
static const int DEFAULT_DATE_FMT_LEN;
+ static const int DEFAULT_TIME_FMT_LEN;
+ static const int DEFAULT_TIME_FRAC_FMT_LEN;
static const int DEFAULT_SHORT_DATE_TIME_FMT_LEN;
static const int DEFAULT_DATE_TIME_FMT_LEN;
static const int FRACTIONAL_MAX_LEN;
@@ -85,7 +87,7 @@ public:
/// cast_mode -- indicates if it is a 'datetime to string' or 'string to datetime' cast
/// Return true if the parse was successful.
static bool Tokenize(DateTimeFormatContext* dt_ctx, CastDirection cast_mode,
- bool accept_time_toks = true);
+ bool accept_time_toks = true, bool accept_time_toks_only = false);
/// Parse the date/time string to generate the DateTimeFormatToken required by
/// DateTimeFormatContext. Similar to Tokenize() this function will take the string
@@ -106,10 +108,12 @@ public:
/// len -- length of the string to parse (must be > 0)
/// accept_time_toks -- if true, time tokens are accepted. Otherwise time tokens are
/// rejected.
+ /// accept_time_toks_only -- if true, time tokens without date tokens are accepted.
+ /// Otherwise, they are rejected.
/// Return the corresponding default format context if parsing succeeded, or nullptr
/// otherwise.
static const DateTimeFormatContext* GetDefaultFormatContext(const char* str, int len,
- bool accept_time_toks);
+ bool accept_time_toks, bool accept_time_toks_only);
/// Return default date/time format context for a timestamp parsing.
/// If 'time' has a fractional seconds, context with pattern
@@ -134,8 +138,10 @@ private:
static DateTimeFormatContext DEFAULT_SHORT_DATE_TIME_CTX;
static DateTimeFormatContext DEFAULT_SHORT_ISO_DATE_TIME_CTX;
static DateTimeFormatContext DEFAULT_DATE_CTX;
+ static DateTimeFormatContext DEFAULT_TIME_CTX;
static DateTimeFormatContext DEFAULT_DATE_TIME_CTX[10];
static DateTimeFormatContext DEFAULT_ISO_DATE_TIME_CTX[10];
+ static DateTimeFormatContext DEFAULT_TIME_FRAC_CTX[10];
/// Checks if str_begin point to the beginning of a valid timezone offset.
static bool IsValidTZOffset(const char* str_begin, const char* str_end);
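The new accept_time_toks_only parameter of Tokenize() defaults to false, so existing callers keep their behavior, while InitCtx() passes true when tokenizing DEFAULT_TIME_CTX and DEFAULT_TIME_FRAC_CTX. A truth-table sketch of the changed return statement at the end of Tokenize(); the enum below is a local stand-in for Impala's CastDirection, assumed here to have FORMAT and PARSE values, and TokenizeSucceeds() is a hypothetical name:

```cpp
#include <cassert>

// Local stand-in for Impala's CastDirection; the FORMAT value is assumed.
enum CastDirection { FORMAT, PARSE };

// Restates the return statement at the end of SimpleDateFormatTokenizer::Tokenize().
bool TokenizeSucceeds(CastDirection cast_mode, bool has_date_toks,
                      bool has_time_toks, bool accept_time_toks_only) {
  // Default PARSE behavior: the pattern must contain date tokens.
  if (cast_mode == PARSE && !accept_time_toks_only) return has_date_toks;
  // FORMAT, or PARSE with accept_time_toks_only: any date or time token will do.
  return has_date_toks || has_time_toks;
}
```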
diff --git a/be/src/runtime/timestamp-parse-util.cc b/be/src/runtime/timestamp-parse-util.cc
index 506b69f30..98339dba7 100644
--- a/be/src/runtime/timestamp-parse-util.cc
+++ b/be/src/runtime/timestamp-parse-util.cc
@@ -63,7 +63,8 @@ static bool IndicateTimestampParseFailure(date* d, time_duration* t) {
}
bool TimestampParser::ParseSimpleDateFormat(const char* str, int len,
- boost::gregorian::date* d, boost::posix_time::time_duration* t) {
+ boost::gregorian::date* d, boost::posix_time::time_duration* t,
+ bool accept_time_toks_only) {
DCHECK(d != nullptr);
DCHECK(t != nullptr);
if (UNLIKELY(str == nullptr)) return IndicateTimestampParseFailure(d, t);
@@ -100,7 +101,8 @@ bool TimestampParser::ParseSimpleDateFormat(const char* str, int len,
SimpleDateFormatTokenizer::DEFAULT_DATE_TIME_FMT_LEN);
// Determine the default formatting context that's required for parsing.
const DateTimeFormatContext* dt_ctx =
- SimpleDateFormatTokenizer::GetDefaultFormatContext(str, default_fmt_len, true);
+ SimpleDateFormatTokenizer::GetDefaultFormatContext(
+ str, default_fmt_len, true, accept_time_toks_only);
if (dt_ctx != nullptr) {
return ParseSimpleDateFormat(str, default_fmt_len, *dt_ctx, d, t);
}
diff --git a/be/src/runtime/timestamp-parse-util.h b/be/src/runtime/timestamp-parse-util.h
index ad61a81c3..60eb8888a 100644
--- a/be/src/runtime/timestamp-parse-util.h
+++ b/be/src/runtime/timestamp-parse-util.h
@@ -39,13 +39,17 @@ class TimestampParser {
/// date may be specified. All components are required in either the
/// date or time except for the fractional seconds following the period. In the case
/// of just a date, the time will be set to 00:00:00.
+ /// In case accept_time_toks_only=true, HH:mm:ss.SSSSSSSSS is also accepted; if
+ /// there is no date part in the string, the output date is set to invalid.
/// str -- valid pointer to the string to parse
/// len -- length of the string to parse (must be > 0)
/// d -- the date value where the results of the parsing will be placed
/// t -- the time value where the results of the parsing will be placed
+ /// accept_time_toks_only -- if true, also accepts a time-of-day string without a date part
/// Returns true if the date/time was successfully parsed.
static bool ParseSimpleDateFormat(const char* str, int len, boost::gregorian::date* d,
- boost::posix_time::time_duration* t) WARN_UNUSED_RESULT;
+ boost::posix_time::time_duration* t,
+ bool accept_time_toks_only = false) WARN_UNUSED_RESULT;
/// Parse a date/time string. The data must adhere to SimpleDateFormat, otherwise it
/// will be rejected i.e. no missing tokens. In the case of just a date, the time will
diff --git a/common/function-registry/impala_functions.py b/common/function-registry/impala_functions.py
index dd81a692d..d6956aa60 100644
--- a/common/function-registry/impala_functions.py
+++ b/common/function-registry/impala_functions.py
@@ -133,9 +133,13 @@ visible_functions = [
[['dayofyear'], 'INT', ['TIMESTAMP'], '_ZN6impala18TimestampFunctions9DayOfYearEPN10impala_udf15FunctionContextERKNS1_12TimestampValE'],
[['week', 'weekofyear'], 'INT', ['TIMESTAMP'], '_ZN6impala18TimestampFunctions10WeekOfYearEPN10impala_udf15FunctionContextERKNS1_12TimestampValE'],
[['hour'], 'INT', ['TIMESTAMP'], '_ZN6impala18TimestampFunctions4HourEPN10impala_udf15FunctionContextERKNS1_12TimestampValE'],
+ [['hour'], 'INT', ['STRING'], '_ZN6impala18TimestampFunctions4HourEPN10impala_udf15FunctionContextERKNS1_9StringValE'],
[['minute'], 'INT', ['TIMESTAMP'], '_ZN6impala18TimestampFunctions6MinuteEPN10impala_udf15FunctionContextERKNS1_12TimestampValE'],
+ [['minute'], 'INT', ['STRING'], '_ZN6impala18TimestampFunctions6MinuteEPN10impala_udf15FunctionContextERKNS1_9StringValE'],
[['second'], 'INT', ['TIMESTAMP'], '_ZN6impala18TimestampFunctions6SecondEPN10impala_udf15FunctionContextERKNS1_12TimestampValE'],
+ [['second'], 'INT', ['STRING'], '_ZN6impala18TimestampFunctions6SecondEPN10impala_udf15FunctionContextERKNS1_9StringValE'],
[['millisecond'], 'INT', ['TIMESTAMP'], '_ZN6impala18TimestampFunctions11MillisecondEPN10impala_udf15FunctionContextERKNS1_12TimestampValE'],
+ [['millisecond'], 'INT', ['STRING'], '_ZN6impala18TimestampFunctions11MillisecondEPN10impala_udf15FunctionContextERKNS1_9StringValE'],
[['to_date'], 'STRING', ['TIMESTAMP'], '_ZN6impala18TimestampFunctions6ToDateEPN10impala_udf15FunctionContextERKNS1_12TimestampValE'],
[['dayname'], 'STRING', ['TIMESTAMP'], '_ZN6impala18TimestampFunctions11LongDayNameEPN10impala_udf15FunctionContextERKNS1_12TimestampValE'],
[['date_trunc'], 'TIMESTAMP', ['STRING', 'TIMESTAMP'],