You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by "Marcel Kornacker (Code Review)" <ge...@cloudera.org> on 2016/04/07 05:01:27 UTC

[Impala-CR](cdh5-trunk) IMPALA-1903: Add support for partitioning by TIMESTAMP

Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-1903: Add support for partitioning by TIMESTAMP
......................................................................


Patch Set 7:

(25 comments)

http://gerrit.cloudera.org:8080/#/c/1621/7/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

Line 204:           // would lead to both partitions having the same key modulo ignored constant
i don't understand what "would lead to both partitions..." actually refers to (is this assuming a specific implementation of the partition selection logic? an implementation that doesn't exist?)


Line 206:           // partition_key_values for constant values, since only that is written to.
this entire paragraph is a very contorted and confusing way of saying "only partitions whose key values match the constant partition key values of the PARTITION clause of this insert stmt can receive data in this insert stmt".  please streamline. (i know you didn't write this, but might as well do some clean-up while passing through it, kind of like moving a few rocks out of a frequently taken path).


Line 214:                 TYPE_TIMESTAMP) {
what if you have a null literal for a timestamp partition col, shouldn't you be able to insert into that?


http://gerrit.cloudera.org:8080/#/c/1621/7/be/src/exprs/expr.cc
File be/src/exprs/expr.cc:

Line 210:         *expr = pool->Add(new NullLiteral(texpr_node));
hm, we're silently ignoring a parse error here. this should really be governed by abort_on_error, 

we already have customers complaining about our lack of consistency when it comes to invalid data (out-of-bounds integers are silently truncated, decimals aren't), we shouldn't add to this. i think this needs to obey abort_on_error (and throw an analysis exception in the fe).

or maybe we should always return an error for invalid timestamp literals? check with santosh.


http://gerrit.cloudera.org:8080/#/c/1621/7/be/src/exprs/literal.cc
File be/src/exprs/literal.cc:

Line 248
what was this? looks bizarre.


http://gerrit.cloudera.org:8080/#/c/1621/7/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

Line 359:   static inline bool IsValidTimestamp(const ::std::string s) {
const std::string&


http://gerrit.cloudera.org:8080/#/c/1621/7/common/thrift/Types.thrift
File common/thrift/Types.thrift:

Line 18: typedef i64 TDate
why not an i32?


http://gerrit.cloudera.org:8080/#/c/1621/7/fe/src/main/java/com/cloudera/impala/analysis/TimestampLiteral.java
File fe/src/main/java/com/cloudera/impala/analysis/TimestampLiteral.java:

Line 48:   // equals() and compareTo() should not be implemented for TimestampListeral, since
implement them and throw the appropriate exception in the body?


Line 73:     if (targetType.equals(this.type_)) {
single line


http://gerrit.cloudera.org:8080/#/c/1621/7/testdata/workloads/functional-query/queries/QueryTest/partition-col-types.test
File testdata/workloads/functional-query/queries/QueryTest/partition-col-types.test:

Line 163: SELECT * from all_insert_partition_col_types
these should be planner tests


http://gerrit.cloudera.org:8080/#/c/1621/7/testdata/workloads/functional-query/queries/QueryTest/timestamp-partitions.test
File testdata/workloads/functional-query/queries/QueryTest/timestamp-partitions.test:

Line 24: -- Basic partition pruning works
this doesn't demonstrate that partition pruning works, you need a planner test for that


Line 74: insert into t partition (ts = '') values (4);
that's kind of weird, is that intentional? ('' being silently interpreted as null, rather than returning an error)


Line 112: select i from t where ts = '1995-08-28 00:00:00.00000' and ts <=> '1995-08-28' order by i
why the <=>?


Line 135: -- Bare times (without dates) are supported
i'm not sure that should be the case. if you look at timestamp-value.h, an unset date doesn't really make sense (although the comments pretend it does).


Line 141: from t where ts < '1400-01-01' or ts > '9999-12-31 23:59:59.999999999'
aren't those the supported timestamp ranges?


Line 160: ---- TYPES
you can get rid of these sections for stuff like show partitions, it doesn't really add anything (for query results we want to verify that a timestamp came back, but that's not the case here)


Line 166: select * from t where extract(ts,'day') is not null and ts < '1993-07-01'
why the extract()?


Line 173: select count(*) from t where extract(ts,'day') is not null and ts < '1993-06-26 04:05:06'
does this add anything beyond what you already demonstrated with 

select * from t where extract(ts,'day') is not null and ts < '1993-07-01'

(ie, that it parsed it correctly; we should have tests elsewhere that make sure that comparisons on correct timestamp values work correctly)


Line 180: select * from t where extract(ts,'day') is not null and ts = '1993-06-26 04:05:06'
same here


Line 304: insert into t5 partition (p = '1997-01-01 00:00:00') values ('1997-01-01');
why does this match the 'in', and also the preceding row?


Line 314: select count(*) from t5 where t in (p);
is all of this cast to string?


http://gerrit.cloudera.org:8080/#/c/1621/7/tests/query_test/test_partitioning.py
File tests/query_test/test_partitioning.py:

Line 100:     assert (3 == len(self.client.execute("show partitions hive_timestamp_partitions.t")
let's switch back to standard/non-yoda conditions.

also, i don't think you need the parentheses.


Line 103:             self.client.execute("select count(*) from hive_timestamp_partitions.t "
what's the value of this, given the next statement?


Line 109:             self.client.execute("select count(*) from hive_timestamp_partitions.t "
same here


Line 143:                        "partition (ts = 'last week') "
why is that legal?


-- 
To view, visit http://gerrit.cloudera.org:8080/1621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icad7dcdc1b199cce9483dc414072bbe24efd625c
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes