Posted to reviews@impala.apache.org by "Gabor Kaszab (Code Review)" <ge...@cloudera.org> on 2020/01/07 11:01:31 UTC

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14982 )


Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................

IMPALA-8801: Date type support for ORC scanner

Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
---
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/orc-metadata-utils.cc
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/data/README
A testdata/data/hive2_pre_gregorian.orc
A testdata/data/out_of_range_date.orc
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/date-fileformat-support.test
M testdata/workloads/functional-query/queries/QueryTest/hive2-pre-gregorian-date.test
M testdata/workloads/functional-query/queries/QueryTest/out-of-range-date.test
M tests/query_test/test_scanners.py
15 files changed, 134 insertions(+), 21 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/14982/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 9: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Jan 2020 13:54:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5429/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Jan 2020 14:05:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 8:

PS8 is a rebase onto master to resolve a conflict.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Jan 2020 09:53:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5410/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 Jan 2020 14:24:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 3: Code-Review+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Jan 2020 13:02:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 10: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Jan 2020 18:54:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14982/5/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/14982/5/tests/query_test/test_scanners.py@305
PS5, Line 305: class TestOrcDateType(ImpalaTestSuite):
flake8: E302 expected 2 blank lines, found 1
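
For reference, E302 is the check that a top-level `def` or `class` statement is preceded by two blank lines. The sketch below shows a layout that satisfies it. `ExistingSuite` and both method bodies are placeholders for illustration, not the actual contents of test_scanners.py.

```python
class ExistingSuite(object):
    """Placeholder for whatever top-level definition precedes the new class."""

    def test_something(self):
        return True


class TestOrcDateType(ExistingSuite):
    """Placeholder body. Note the two blank lines above the class statement;
    with only one blank line flake8 reports E302."""

    def test_placeholder(self):
        return True
```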




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 Jan 2020 13:55:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-metadata-utils.cc@181
PS2, Line 181:     case orc::TypeKind::DATE:
             :       if (type.type == TYPE_DATE) return Status::OK();
             :       break;
             :     d
> I gave this a quick look and it seems that supporting the use case mentioned
I'm ok to do this in another JIRA.

We can modify OrcTimestampReader to support reading orc::TimestampVectorBatch into Date type slots. Its constructor knows which kind of slot (timestamp or date) it is writing to, so ReadValue() can behave differently depending on the mode (timestamp values => timestamp slots / timestamp values => date slots). We can do the same in OrcDateColumnReader to let it read ORC Date values into Timestamp type slots.

Note that the life cycle of an OrcColumnReader is within the life cycle of the HdfsOrcScanner, which only reads a split of an ORC file, and an ORC file can't have two types for one column (e.g. column1 is timestamp in stripe1 and date in stripe2). So we don't need to deal with different batch types in UpdateInputBatch().
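
The mode selection described above could look roughly like the following. This is a minimal Python sketch of the idea only, not the C++ readers themselves: the class names, the string type tags, and the conversion rules (dropping the time of day for timestamp => date, using midnight for date => timestamp) are all assumptions made for illustration.

```python
from datetime import date, datetime, time

# Hypothetical slot type tags standing in for the real type enum.
TYPE_DATE = "DATE"
TYPE_TIMESTAMP = "TIMESTAMP"


class TimestampReaderSketch(object):
    """Hypothetical stand-in for a reader of ORC timestamp values."""

    def __init__(self, slot_type):
        # The target slot type is known at construction time, so the
        # conversion is chosen once instead of being re-decided per value.
        if slot_type == TYPE_TIMESTAMP:
            self._convert = lambda ts: ts
        elif slot_type == TYPE_DATE:
            # Assumption: a timestamp becomes a date by dropping time of day.
            self._convert = lambda ts: ts.date()
        else:
            raise ValueError("unsupported slot type: %s" % slot_type)

    def read_value(self, value):
        return self._convert(value)


class DateReaderSketch(object):
    """Hypothetical stand-in for a reader of ORC date values."""

    def __init__(self, slot_type):
        if slot_type == TYPE_DATE:
            self._convert = lambda d: d
        elif slot_type == TYPE_TIMESTAMP:
            # Assumption: a date becomes a timestamp at midnight.
            self._convert = lambda d: datetime.combine(d, time.min)
        else:
            raise ValueError("unsupported slot type: %s" % slot_type)

    def read_value(self, value):
        return self._convert(value)
```

Because the file's column type cannot change between stripes, the conversion picked in the constructor stays valid for the whole lifetime of the reader.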

BTW, it'd be better to add test coverage for this type compatibility check in test_scanners.py (see TestOrc.test_type_conversions).




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 Jan 2020 08:14:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 3: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14982/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14982/3//COMMIT_MSG@11
PS3, Line 11: Unix epoch.
I would add "(proleptic Gregorian)", as a big part of the test code deals with this, even if the mentioned IMPALA-7370 has more explanation about this.


http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-metadata-utils.cc@181
PS2, Line 181:     case orc::TypeKind::DATE:
             :       if (type.type == TYPE_DATE) return Status::OK();
             :       break;
             :     d
> I think ORC supports schema evolution too.
I am not too enthusiastic about adding this feature now. My main concern is that future developments like min/max filters could become more complex to implement and test if we have to support multiple ORC type to Impala type mappings.
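
To make the trade-off concrete, here is a minimal Python sketch of a strict type check in the spirit of the quoted snippet. Only the DATE => DATE pairing comes from the code under review; the table layout, the TIMESTAMP entry, the function name, and the error text are hypothetical.

```python
# Hypothetical table of accepted pairings, keyed by ORC type kind.
# Only the DATE entry mirrors the quoted check; the rest is illustrative.
ALLOWED_SLOT_TYPES = {
    "DATE": {"DATE"},
    "TIMESTAMP": {"TIMESTAMP"},
}


def validate_column_type(orc_kind, slot_type):
    """Returns None if the pairing is accepted, else an error string."""
    if slot_type in ALLOWED_SLOT_TYPES.get(orc_kind, set()):
        return None
    # Illustrative message only, not the scanner's actual error text.
    return "ORC type %s cannot be read into a %s column" % (orc_kind, slot_type)
```

Allowing DATE => TIMESTAMP would amount to widening one set in this table, but every feature that relies on the mapping being one-to-one would then have to handle the extra pairing as well.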




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Jan 2020 14:21:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 4: Code-Review+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 09 Jan 2020 13:41:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 6:

(3 comments)

Thanks for creating the JIRA! The patch actually looks good to me. Just replying to the discussion for IMPALA-9288.

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-metadata-utils.cc@181
PS2, Line 181:     case orc::TypeKind::DATE:
             :       if (type.type == TYPE_DATE) return Status::OK();
             :       break;
             :     d
> If I extend the DATE case to also allow "type.type == TYPE_TIMESTAMP" then the DCHECK fails in UpdateInputBatch().

Yes, in this case we need to modify the casting in OrcTimestampReader::UpdateInputBatch(), since we would allow reading ORC Date values and materializing them into Timestamp slots. Note that we create the reader based on the slot descriptor, not the ORC type (see OrcColumnReader::Create). So for orc_type = Date and slot_type = Timestamp, we are actually creating an OrcTimestampReader.
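
The dispatch described here can be sketched as follows. This is a Python illustration under stated assumptions, not the C++ code: the class names, the string type tags, and the `update_input_batch` signature are hypothetical, and the only point taken from the discussion is that the reader is chosen by slot type, so a timestamp reader may be handed a date batch.

```python
class OrcDateReaderSketch(object):
    """Hypothetical reader that fills DATE slots."""

    def __init__(self, orc_kind):
        self.orc_kind = orc_kind


class OrcTimestampReaderSketch(object):
    """Hypothetical reader that fills TIMESTAMP slots."""

    # Batch kinds this reader would accept once DATE => TIMESTAMP is allowed.
    # A check that only accepted "TIMESTAMP" would fail for date columns.
    ACCEPTED_BATCH_KINDS = ("TIMESTAMP", "DATE")

    def __init__(self, orc_kind):
        self.orc_kind = orc_kind
        self.batch_kind = None

    def update_input_batch(self, batch_kind):
        if batch_kind not in self.ACCEPTED_BATCH_KINDS:
            raise TypeError("unexpected batch kind: %s" % batch_kind)
        self.batch_kind = batch_kind


# The reader class is looked up by the slot type, not by the ORC type.
READER_BY_SLOT_TYPE = {
    "DATE": OrcDateReaderSketch,
    "TIMESTAMP": OrcTimestampReaderSketch,
}


def create_reader(orc_kind, slot_type):
    return READER_BY_SLOT_TYPE[slot_type](orc_kind)
```

With this shape, `create_reader("DATE", "TIMESTAMP")` yields the timestamp reader, which is why its batch check has to tolerate date batches.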

> BTW, it'd be better to add test coverage for this type compatibility check in test_scanners.py (see TestOrc.test_type_conversions).

Maybe you missed the last line of my previous comment. Do you think we should add a test for this?


http://gerrit.cloudera.org:8080/#/c/14982/6/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/14982/6/tests/query_test/test_scanners.py@306
PS6, Line 306: class TestOrcDateType(ImpalaTestSuite):
nit: merge this into TestOrc below?


http://gerrit.cloudera.org:8080/#/c/14982/6/tests/query_test/test_scanners.py@1335
PS6, Line 1335:   def test_type_conversions(self, vector, unique_database):
It'd be better to add test coverage for the date and timestamp type compatibility check here.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 Jan 2020 11:52:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5452/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Jan 2020 13:54:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 9: Code-Review+1

LGTM



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Jan 2020 13:17:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5436/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Jan 2020 10:03:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14982/5/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/14982/5/tests/query_test/test_scanners.py@305
PS5, Line 305: 
> flake8: E302 expected 2 blank lines, found 1
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 Jan 2020 14:03:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14982/4/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/14982/4/tests/query_test/test_scanners.py@352
PS4, Line 352:     orc_tbl_name = "out_of_range_date_orc"
             :     create_sql = "create table %s.%s (d date) stored as orc" % (unique_database,
             :         orc_tbl_name)
             :     create_table_and_copy_files(self.client, create_sql, unique_database, orc_tbl_name,
             :         ["/testdata/data/out_of_range_date.orc"])
             : 
             :     new_vector = deepcopy(vector)
             :     del new_vector.get_value('exec_option')['abort_on_error']
             :     self.run_test_case('QueryTest/out-of-range-date', new_vector, unique_database)
             : 
             :   def test_pre_gregorian_date(self, vector, unique_database):
             :     """Test date interoperability issues between Impala and Hive 2.1.1 when scanning
             :        a parquet table that contains dates that precede the introduction of Gregorian
             :        calendar in 1582-10-15.
             :     """
             :     create_table_from_parquet(self.client, unique_database, "hive2_pre_gregorian")
             : 
             :     orc_tbl_name = "hive2_pre_gregorian_orc"
             :     create_sql = "create table %s.%s (d date) stored as orc" % (unique_database,
             :         orc_tbl_name)
             :     create_table_and_copy_files(self.client, create_sql, unique_database, orc_tbl_name,
             :         ["/testdata/data/hive2_pre_gregorian.orc"])
These tests are related to ORC, but we are in class TestParquet.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 09 Jan 2020 16:22:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Norbert Luksa, Zoltan Borok-Nagy, Attila Jeges, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14982

to look at the new patch set (#7).

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................

IMPALA-8801: Date type support for ORC scanner

Implements the read path for the date type in the ORC scanner. The
internal representation of a date is an int32 holding the number of
days since the Unix epoch in the proleptic Gregorian calendar.

Similarly to the Parquet implementation (IMPALA-7370), this
representation introduces an interoperability issue between Impala
and older versions of Hive (before 3.1). For more details see the
commit message of the mentioned Parquet implementation.

Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
---
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/orc-metadata-utils.cc
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/data/README
A testdata/data/hive2_pre_gregorian.orc
A testdata/data/out_of_range_date.orc
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M testdata/workloads/functional-query/queries/QueryTest/date-fileformat-support.test
A testdata/workloads/functional-query/queries/QueryTest/hive2-pre-gregorian-date-orc.test
A testdata/workloads/functional-query/queries/QueryTest/out-of-range-date-orc.test
M tests/query_test/test_scanners.py
16 files changed, 168 insertions(+), 27 deletions(-)
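
As an illustration of the representation named in the commit message, the sketch below converts between calendar dates and a day count relative to the Unix epoch. It is a Python approximation, not the scanner's code: Python's `datetime.date` happens to use the proleptic Gregorian calendar as well, and treating anything outside `datetime.date`'s range as invalid is an assumption of this sketch rather than a statement about Impala's limits. The helper names are hypothetical.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def date_to_days(d):
    """Number of days from the Unix epoch to d (negative before 1970)."""
    return (d - EPOCH).days


def days_to_date(days):
    """Returns the date for a day count, or None if it cannot be represented.

    Assumption: values outside int32 or outside datetime.date's range are
    treated as out of range instead of raising.
    """
    if not INT32_MIN <= days <= INT32_MAX:
        return None
    try:
        return EPOCH + timedelta(days=days)
    except OverflowError:
        return None
```

A date before the Gregorian cutover, such as 1582-10-14, still round-trips through these helpers because the Gregorian rules are simply extended backwards. A writer that uses a different calendar for such dates would store a different day count for the same label, which is the interoperability issue the commit message refers to.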


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/14982/7

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5378/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Jan 2020 13:20:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14982/8/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/14982/8/tests/query_test/test_scanners.py@1302
PS8, Line 1302: an
> nit: the comments could be updated a bit, since not only one illtypes table
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Jan 2020 13:06:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................

IMPALA-8801: Date type support for ORC scanner

Implements read path for the date type in ORC scanner. The internal
representation of a date is an int32 meaning the number of days since
Unix epoch using proleptic Gregorian calendar.

Similarly to the Parquet implementation (IMPALA-7370) this
representation introduces an interoperability issue between Impala
and older versions of Hive (before 3.1). For more details see the
commit message of the mentioned Parquet implementation.
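
For illustration only, the day-count representation described above can be
sketched in Python, whose datetime.date type also follows the proleptic
Gregorian calendar. The function name days_to_date and the choice to return
None for values outside 0001-01-01..9999-12-31 are assumptions made for this
sketch; it is not the scanner's actual code.

```python
from datetime import date, timedelta

UNIX_EPOCH = date(1970, 1, 1)


def days_to_date(days_since_epoch):
    """Map a signed day count to a proleptic Gregorian calendar date.

    Returns None when the result would fall outside date.min..date.max
    (0001-01-01..9999-12-31). Treating such values as NULL is an
    assumption of this sketch.
    """
    try:
        return UNIX_EPOCH + timedelta(days=days_since_epoch)
    except OverflowError:
        # Raised both for an oversized timedelta and for an out-of-range date.
        return None
```

For example, days_to_date(0) is 1970-01-01 and days_to_date(18262) is
2020-01-01, while a value such as 2**31 - 1 yields None.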

Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Reviewed-on: http://gerrit.cloudera.org:8080/14982
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/orc-metadata-utils.cc
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/data/README
A testdata/data/hive2_pre_gregorian.orc
A testdata/data/out_of_range_date.orc
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M testdata/workloads/functional-query/queries/QueryTest/date-fileformat-support.test
A testdata/workloads/functional-query/queries/QueryTest/hive2-pre-gregorian-date-orc.test
A testdata/workloads/functional-query/queries/QueryTest/out-of-range-date-orc.test
M tests/query_test/test_scanners.py
16 files changed, 170 insertions(+), 29 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 11
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5437/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Jan 2020 10:22:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Norbert Luksa, Zoltan Borok-Nagy, Attila Jeges, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14982

to look at the new patch set (#4).

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................

IMPALA-8801: Date type support for ORC scanner

Implements read path for the date type in ORC scanner. The internal
representation of a date is an int32 meaning the number of days since
Unix epoch using proleptic Gregorian calendar.

Similarly to the Parquet implementation (IMPALA-7370) this
representation introduces an interoperability issue between Impala
and older versions of Hive (before 3.1). For more details see the
commit message of the mentioned Parquet implementation.
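
The interoperability issue can be illustrated with a small Python sketch. It
assumes that Hive before 3.1 wrote dates using the hybrid Julian/Gregorian
calendar, so a date label before the 1582 switch-over maps to a different day
count than it does under the proleptic Gregorian calendar. The helper names
are hypothetical, and the conversion uses the standard Julian Day Number
formula for Julian calendar dates.

```python
from datetime import date, timedelta

UNIX_EPOCH = date(1970, 1, 1)
UNIX_EPOCH_JDN = 2440588  # Julian Day Number of 1970-01-01


def julian_to_epoch_days(year, month, day):
    """Days since the Unix epoch for a date given in the Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    return jdn - UNIX_EPOCH_JDN


def read_as_proleptic_gregorian(days):
    """Interpret a stored day count with the proleptic Gregorian calendar."""
    return UNIX_EPOCH + timedelta(days=days)


# A writer using the Julian calendar stores 1582-10-04 as a day count...
stored = julian_to_epoch_days(1582, 10, 4)
# ...and a proleptic Gregorian reader attaches a different label to it.
seen = read_as_proleptic_gregorian(stored)
```

Under these assumptions, a value written as 1582-10-04 is read back as
1582-10-14, a ten-day shift.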

Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
---
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/orc-metadata-utils.cc
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/data/README
A testdata/data/hive2_pre_gregorian.orc
A testdata/data/out_of_range_date.orc
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/date-fileformat-support.test
M testdata/workloads/functional-query/queries/QueryTest/hive2-pre-gregorian-date.test
M testdata/workloads/functional-query/queries/QueryTest/out-of-range-date.test
M tests/query_test/test_scanners.py
15 files changed, 123 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/14982/4

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Norbert Luksa, Zoltan Borok-Nagy, Attila Jeges, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14982

to look at the new patch set (#5).

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................

IMPALA-8801: Date type support for ORC scanner

Implements read path for the date type in ORC scanner. The internal
representation of a date is an int32 meaning the number of days since
Unix epoch using proleptic Gregorian calendar.

Similarly to the Parquet implementation (IMPALA-7370) this
representation introduces an interoperability issue between Impala
and older versions of Hive (before 3.1). For more details see the
commit message of the mentioned Parquet implementation.

Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
---
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/orc-metadata-utils.cc
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/data/README
A testdata/data/hive2_pre_gregorian.orc
A testdata/data/out_of_range_date.orc
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/date-fileformat-support.test
A testdata/workloads/functional-query/queries/QueryTest/hive2-pre-gregorian-date-orc.test
A testdata/workloads/functional-query/queries/QueryTest/out-of-range-date-orc.test
M tests/query_test/test_scanners.py
15 files changed, 152 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/14982/5

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 2: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-metadata-utils.cc@181
PS2, Line 181: {
nit: extra braces


http://gerrit.cloudera.org:8080/#/c/14982/2/testdata/workloads/functional-query/queries/QueryTest/date-fileformat-support.test
File testdata/workloads/functional-query/queries/QueryTest/date-fileformat-support.test:

http://gerrit.cloudera.org:8080/#/c/14982/2/testdata/workloads/functional-query/queries/QueryTest/date-fileformat-support.test@112
PS2, Line 112: # Querying the ORC partition separately.
             : select date_part, date_col from $DATABASE.date_tbl where date_part='2099-12-31';
we might not need this query anymore since now we query the whole table in the previous query.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Jan 2020 13:50:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Norbert Luksa, Zoltan Borok-Nagy, Attila Jeges, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14982

to look at the new patch set (#8).

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................

IMPALA-8801: Date type support for ORC scanner

Implements read path for the date type in ORC scanner. The internal
representation of a date is an int32 meaning the number of days since
Unix epoch using proleptic Gregorian calendar.

Similarly to the Parquet implementation (IMPALA-7370) this
representation introduces an interoperability issue between Impala
and older versions of Hive (before 3.1). For more details see the
commit message of the mentioned Parquet implementation.

Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
---
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/orc-metadata-utils.cc
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/data/README
A testdata/data/hive2_pre_gregorian.orc
A testdata/data/out_of_range_date.orc
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M testdata/workloads/functional-query/queries/QueryTest/date-fileformat-support.test
A testdata/workloads/functional-query/queries/QueryTest/hive2-pre-gregorian-date-orc.test
A testdata/workloads/functional-query/queries/QueryTest/out-of-range-date-orc.test
M tests/query_test/test_scanners.py
16 files changed, 168 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/14982/8

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5411/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 Jan 2020 14:30:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 7:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-metadata-utils.cc@181
PS2, Line 181:     case orc::TypeKind::DATE:
             :       if (type.type == TYPE_DATE) return Status::OK();
             :       break;
             :     d
> > If I extend the DATE case to also allow "type.type == TYPE_TIMESTAMP" the
Sorry, I missed addressing the end of the comment. Done


http://gerrit.cloudera.org:8080/#/c/14982/6/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/14982/6/tests/query_test/test_scanners.py@306
PS6, Line 306: class TestParquet(ImpalaTestSuite):
> nit: merge this into TestOrc below?
sure, Done


http://gerrit.cloudera.org:8080/#/c/14982/6/tests/query_test/test_scanners.py@1335
PS6, Line 1335:         % unique_database)
> It'd be better to add test coverage for date and timestamp type compactibil
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Jan 2020 09:34:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-metadata-utils.cc@181
PS2, Line 181:     case orc::TypeKind::DATE: {
             :       if (type.type == TYPE_DATE) return Status::OK();
             :       break;
             :     }
I think ORC supports schema evolution too.

I tested this quickly in Hive:

1. Create an ORC table TBL1 with a DATE column.
2. Create an ORC table TBL2 with a TIMESTAMP column that has the same location as TBL1.
3. Insert some DATE values into TBL1 and some TIMESTAMP values into TBL2.
4. select from TBL1 returns both DATE and TIMESTAMP values (converted to DATE).
5. select from TBL2 returns both DATE and TIMESTAMP values. The DATE values are converted to TIMESTAMP.
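
The conversions observed in steps 4 and 5 can be sketched in Python. This
assumes that a DATE read as TIMESTAMP becomes midnight of that day and that a
TIMESTAMP read as DATE drops its time-of-day part; the function names are
hypothetical and this is neither Hive's nor Impala's code.

```python
from datetime import date, datetime, time


def date_to_timestamp(d):
    """Widen a date to a timestamp at midnight (assumed semantics)."""
    return datetime.combine(d, time.min)


def timestamp_to_date(ts):
    """Narrow a timestamp to a date by dropping the time of day."""
    return ts.date()


def read_column(values, as_timestamp):
    """Read mixed date/datetime file values under a single table schema."""
    out = []
    for v in values:
        # datetime is a subclass of date, so test for datetime explicitly.
        is_ts = isinstance(v, datetime)
        if as_timestamp:
            out.append(v if is_ts else date_to_timestamp(v))
        else:
            out.append(timestamp_to_date(v) if is_ts else v)
    return out
```

With one DATE and one TIMESTAMP value in the shared location, reading through
the DATE schema yields two dates and reading through the TIMESTAMP schema
yields two timestamps, which matches the behaviour described in the steps.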




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Jan 2020 13:21:04 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Norbert Luksa, Zoltan Borok-Nagy, Attila Jeges, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14982

to look at the new patch set (#6).

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................

IMPALA-8801: Date type support for ORC scanner

Implements read path for the date type in ORC scanner. The internal
representation of a date is an int32 meaning the number of days since
Unix epoch using proleptic Gregorian calendar.

Similarly to the Parquet implementation (IMPALA-7370) this
representation introduces an interoperability issue between Impala
and older versions of Hive (before 3.1). For more details see the
commit message of the mentioned Parquet implementation.

Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
---
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/orc-metadata-utils.cc
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/data/README
A testdata/data/hive2_pre_gregorian.orc
A testdata/data/out_of_range_date.orc
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/date-fileformat-support.test
A testdata/workloads/functional-query/queries/QueryTest/hive2-pre-gregorian-date-orc.test
A testdata/workloads/functional-query/queries/QueryTest/out-of-range-date-orc.test
M tests/query_test/test_scanners.py
15 files changed, 154 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/14982/6

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.h
File be/src/exec/orc-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.h@224
PS2, Line 224:  public:
> nit: space is missing?
Done


http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.h@235
PS2, Line 235:  private:
> nit: space is missing? (see other classes below)
Done


http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.cc
File be/src/exec/orc-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.cc@101
PS2, Line 101:       case TYPE_DATE:
> nit: the braces can be omitted
Done


http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.cc@213
PS2, Line 213: tNullSlot(tup
> I think you could add UNLIKELY here.
Done


http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-metadata-utils.cc@181
PS2, Line 181: 
> nit: extra braces
Done


http://gerrit.cloudera.org:8080/#/c/14982/2/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
File fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java:

http://gerrit.cloudera.org:8080/#/c/14982/2/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java@30
PS2, Line 30: Supported HDFS file formats. Every file format specifies:
> Not related to this commit specifically, but would be good to have this com
Done


http://gerrit.cloudera.org:8080/#/c/14982/2/testdata/workloads/functional-query/queries/QueryTest/date-fileformat-support.test
File testdata/workloads/functional-query/queries/QueryTest/date-fileformat-support.test:

http://gerrit.cloudera.org:8080/#/c/14982/2/testdata/workloads/functional-query/queries/QueryTest/date-fileformat-support.test@112
PS2, Line 112: 
             : 
> we might not need this query anymore since now we query the whole table in 
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Jan 2020 12:55:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14982

to look at the new patch set (#2).

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................

IMPALA-8801: Date type support for ORC scanner

Implements read path for the date type in ORC scanner. The internal
representation of a date is an int32 meaning the number of days since
Unix epoch.

Similarly to the Parquet implementation (IMPALA-7370) this
representation introduces an interoperability issue between Impala
and older versions of Hive (before 3.1). For more details see the
commit message of the mentioned Parquet implementation.

Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
---
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/orc-metadata-utils.cc
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/data/README
A testdata/data/hive2_pre_gregorian.orc
A testdata/data/out_of_range_date.orc
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/date-fileformat-support.test
M testdata/workloads/functional-query/queries/QueryTest/hive2-pre-gregorian-date.test
M testdata/workloads/functional-query/queries/QueryTest/out-of-range-date.test
M tests/query_test/test_scanners.py
15 files changed, 132 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/14982/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Norbert Luksa, Zoltan Borok-Nagy, Attila Jeges, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14982

to look at the new patch set (#9).

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................

IMPALA-8801: Date type support for ORC scanner

Implements read path for the date type in ORC scanner. The internal
representation of a date is an int32 meaning the number of days since
Unix epoch using proleptic Gregorian calendar.

Similarly to the Parquet implementation (IMPALA-7370) this
representation introduces an interoperability issue between Impala
and older versions of Hive (before 3.1). For more details see the
commit message of the mentioned Parquet implementation.

Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
---
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/orc-metadata-utils.cc
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/data/README
A testdata/data/hive2_pre_gregorian.orc
A testdata/data/out_of_range_date.orc
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M testdata/workloads/functional-query/queries/QueryTest/date-fileformat-support.test
A testdata/workloads/functional-query/queries/QueryTest/hive2-pre-gregorian-date-orc.test
A testdata/workloads/functional-query/queries/QueryTest/out-of-range-date-orc.test
M tests/query_test/test_scanners.py
16 files changed, 170 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/14982/9
-- 
To view, visit http://gerrit.cloudera.org:8080/14982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Norbert Luksa, Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14982

to look at the new patch set (#3).

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................

IMPALA-8801: Date type support for ORC scanner

Implements read path for the date type in ORC scanner. The internal
representation of a date is an int32 meaning the number of days since
Unix epoch.
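
As a rough illustration of the representation described above (not Impala's
actual code), the sketch below maps a day count to a calendar date in Python.
The function names are hypothetical. It relies on datetime.date, which uses
the proleptic Gregorian calendar and covers years 1 through 9999; returning
None for anything outside that range is only an assumption about how an
out-of-range value might be handled.

```python
from datetime import date, timedelta
from typing import Optional

UNIX_EPOCH = date(1970, 1, 1)


def days_since_epoch_to_date(days: int) -> Optional[date]:
    """Convert a day count relative to 1970-01-01 into a calendar date.

    Returns None when the result does not fit into datetime.date
    (years 1..9999). This is an illustrative choice, not Impala's behavior.
    """
    try:
        return UNIX_EPOCH + timedelta(days=days)
    except OverflowError:
        # Raised both for an oversized timedelta and for a date out of range.
        return None


def date_to_days_since_epoch(d: date) -> int:
    """Inverse mapping: calendar date to days since the Unix epoch."""
    return (d - UNIX_EPOCH).days
```

Negative day counts denote dates before 1970, so a value of -1 corresponds to
1969-12-31.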

Similarly to the Parquet implementation (IMPALA-7370) this
representation introduces an interoperability issue between Impala
and older versions of Hive (before 3.1). For more details see the
commit message of the mentioned Parquet implementation.
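
To illustrate the kind of mismatch meant here, the following sketch assumes
that an older writer interprets a pre-1582 calendar date with Julian calendar
rules while the reader uses the proleptic Gregorian calendar. That assumption
and all function names are illustrative only; the commit message of
IMPALA-7370 remains the authoritative description.

```python
from datetime import date

# Julian Day Number of 1970-01-01 in the Gregorian calendar.
JDN_UNIX_EPOCH = 2440588


def julian_calendar_days_since_epoch(year: int, month: int, day: int) -> int:
    """Days since the Unix epoch for a date label read with Julian rules."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    # Standard Julian-calendar-to-Julian-Day-Number conversion.
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    return jdn - JDN_UNIX_EPOCH


def gregorian_days_since_epoch(year: int, month: int, day: int) -> int:
    """Days since the Unix epoch for the same label, proleptic Gregorian."""
    return (date(year, month, day) - date(1970, 1, 1)).days


def calendar_gap_in_days(year: int, month: int, day: int) -> int:
    """How far apart the two interpretations of one date label are."""
    return (julian_calendar_days_since_epoch(year, month, day)
            - gregorian_days_since_epoch(year, month, day))
```

Under these assumptions the label 1582-10-05 differs by 10 days between the
two calendars, so a day count written under one interpretation and read under
the other would show up as a shifted date.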

Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
---
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/orc-metadata-utils.cc
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/data/README
A testdata/data/hive2_pre_gregorian.orc
A testdata/data/out_of_range_date.orc
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/date-fileformat-support.test
M testdata/workloads/functional-query/queries/QueryTest/hive2-pre-gregorian-date.test
M testdata/workloads/functional-query/queries/QueryTest/out-of-range-date.test
M tests/query_test/test_scanners.py
15 files changed, 123 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/14982/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5371/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Jan 2020 11:42:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.h
File be/src/exec/orc-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.h@224
PS2, Line 224: public:
nit: space is missing?


http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.h@235
PS2, Line 235: private:
nit: space is missing? (see other classes below)


http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.cc
File be/src/exec/orc-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.cc@101
PS2, Line 101:       case TYPE_DATE: {
nit: the braces can be omitted


http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.cc@213
PS2, Line 213: !dv.IsValid()
I think you could add UNLIKELY here.


http://gerrit.cloudera.org:8080/#/c/14982/2/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
File fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java:

http://gerrit.cloudera.org:8080/#/c/14982/2/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java@30
PS2, Line 30: Supported HDFS file formats. Every file format specifies:
Not related to this commit specifically, but would be good to have this comment updated with the sixth parameter's description.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Jan 2020 13:12:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14982/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14982/3//COMMIT_MSG@11
PS3, Line 11: Unix epoch using proleptic Gregorian calendar.
> I would add "(proleptic Gregorian)", as a big part of the test code deals w
Done


http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-metadata-utils.cc@181
PS2, Line 181:     case orc::TypeKind::DATE:
             :       if (type.type == TYPE_DATE) return Status::OK();
             :       break;
             :     d
> I am not too enthusiastic about adding this feature now. My main concern is
I gave this a quick look and it seems that supporting the use case mentioned by Attila is not that straightforward either.
- Extending this check here to "type == TYPE_DATE || type == TYPE_TIMESTAMP" causes issues in OrcDateColumnReader::UpdateInputBatch(), where a DCHECK fails.
- I could get around this by doing a timestamp cast if the date cast fails, but in order to do this batch_ has to be a ColumnVectorBatch instead of a LongVectorBatch, as that is the common parent of the date and timestamp representations.
- Once that is done, L??? in OrcDateColumnReader::ReadValue() doesn't compile, as ColumnVectorBatch doesn't have a member called data.

There would be multiple ways to get around these, but in my opinion it would require more effort than is reasonable to handle as a review comment. What about opening a Jira for this?




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 09 Jan 2020 10:54:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5370/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Jan 2020 11:33:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-metadata-utils.cc@181
PS2, Line 181:     case orc::TypeKind::DATE:
             :       if (type.type == TYPE_DATE) return Status::OK();
             :       break;
             :     d
> I'm ok to do this in another JIRA.
Thanks for the explanation, Quanlong! I opened a separate Jira for this: https://issues.apache.org/jira/browse/IMPALA-9290

One thing I don't get here. If I leave this file intact then I get the "Type mismatch" below. If I extend the DATE case to also allow "type.type == TYPE_TIMESTAMP" then the DCHECK fails in UpdateInputBatch().


http://gerrit.cloudera.org:8080/#/c/14982/4/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/14982/4/tests/query_test/test_scanners.py@352
PS4, Line 352:   def test_parquet(self, vector):
             :     self.run_test_case('QueryTest/parquet', vector)
             : 
             :   def test_corrupt_files(self, vector):
             :     new_vector = deepcopy(vector)
             :     del new_vector.get_value('exec_option')['num_nodes']  # .test file sets num_nodes
             :     new_vector.get_value('exec_option')['abort_on_error'] = 0
             :     self.run_test_case('QueryTest/parquet-continue-on-error', new_vector)
             :     new_vector.get_value('exec_option')['abort_on_error'] = 1
             :     self.run_test_case('QueryTest/parquet-abort-on-error', new_vector)
             : 
             :   def test_timestamp_out_of_range(self, vector, unique_database):
             :     """IMPALA-4363: Test scanning parquet files with an out of range timestamp.
             :        Also tests IMPALA-7595: Test Parquet timestamp columns where the time part
             :        is out of the valid range [0..24H).
             :     """
             :     # out of range date part
             :     create_table_from_parquet(self.client, unique_database, "out_of_range_timestamp")
             : 
             :     # out of range time part
             :     create_table_from_parquet(self.client, unique_database, "out_of_range_time_of_day")
             : 
> These tests are related to ORC, but we are in class TestParquet.
Indeed :) Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 Jan 2020 13:54:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 10: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Jan 2020 14:05:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
......................................................................


Patch Set 8: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14982/8/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/14982/8/tests/query_test/test_scanners.py@1302
PS8, Line 1302: an
nit: the comments could be updated a bit, since more than one illtypes table is now created




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Jan 2020 12:26:44 +0000
Gerrit-HasComments: Yes