Posted to reviews@impala.apache.org by "Gabor Kaszab (Code Review)" <ge...@cloudera.org> on 2022/08/25 14:27:16 UTC

[Impala-ASF-CR] IMPALA-11529: FILE__POSITION virtual column for ORC tables

Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18909 )


Change subject: IMPALA-11529: FILE__POSITION virtual column for ORC tables
......................................................................

IMPALA-11529: FILE__POSITION virtual column for ORC tables

IMPALA-11350 implemented the FILE__POSITION virtual column for Parquet
files. This ticket does the same for ORC files. Note that full ACID ORC
tables already had an implementation of row__id, which could simply be
reused for this ticket.

Testing:
 - TestScannersVirtualColumns.test_virtual_column_file_position_generic
   now runs on ORC as well. I don't think further testing is required,
   as this functionality already existed for row__id; we just reused it
   for FILE__POSITION.

Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
TODO: create a Jira for this.
---
M be/src/exec/hdfs-scanner.cc
M be/src/exec/orc/hdfs-orc-scanner.cc
M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test
M tests/query_test/test_scanners.py
4 files changed, 22 insertions(+), 15 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/18909/1
-- 
To view, visit http://gerrit.cloudera.org:8080/18909
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
Gerrit-Change-Number: 18909
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>

[Impala-ASF-CR] IMPALA-11529: FILE__POSITION virtual column for ORC tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18909 )

Change subject: IMPALA-11529: FILE__POSITION virtual column for ORC tables
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8510/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
Gerrit-Change-Number: 18909
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Aug 2022 11:50:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11529: FILE__POSITION virtual column for ORC tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18909 )

Change subject: IMPALA-11529: FILE__POSITION virtual column for ORC tables
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18909/1/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18909/1/be/src/exec/hdfs-scanner.cc@79
PS1, Line 79:   if (file_format() != THdfsFileFormat::PARQUET && file_format() != THdfsFileFormat::ORC) {
line too long (91 > 90)
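
One way to bring the flagged line under the 90-character limit is to factor the format check into a small helper. The following is a minimal sketch only: the enum is a stand-in for Impala's Thrift-generated THdfsFileFormat, and SupportsFilePosition is a hypothetical name that does not come from the patch.

```cpp
// Stand-in for Impala's Thrift-generated enum; only the values needed here.
enum class THdfsFileFormat { TEXT, PARQUET, ORC };

// Hypothetical helper: true for the formats that implement FILE__POSITION
// after this change (Parquet and ORC), false for everything else.
constexpr bool SupportsFilePosition(THdfsFileFormat format) {
  return format == THdfsFileFormat::PARQUET
      || format == THdfsFileFormat::ORC;
}
```

With such a helper the condition at the call site would shrink to something like `if (!SupportsFilePosition(file_format())) {`, which fits comfortably within the limit. Simply breaking the original condition after the `&&` would work just as well.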




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
Gerrit-Change-Number: 18909
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Aug 2022 14:28:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11529: FILE__POSITION virtual column for ORC tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18909 )

Change subject: IMPALA-11529: FILE__POSITION virtual column for ORC tables
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18909/4/be/src/exec/orc/hdfs-orc-scanner.h
File be/src/exec/orc/hdfs-orc-scanner.h:

http://gerrit.cloudera.org:8080/#/c/18909/4/be/src/exec/orc/hdfs-orc-scanner.h@275
PS4, Line 275:   /// be used for the synthetic rowid column in original files of a full ACID table, or for
line too long (91 > 90)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
Gerrit-Change-Number: 18909
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Aug 2022 09:57:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11529: FILE__POSITION virtual column for ORC tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18909 )

Change subject: IMPALA-11529: FILE__POSITION virtual column for ORC tables
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11227/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
Gerrit-Change-Number: 18909
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Aug 2022 15:39:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11529: FILE__POSITION virtual column for ORC tables

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/18909 )

Change subject: IMPALA-11529: FILE__POSITION virtual column for ORC tables
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18909/3/be/src/exec/orc/hdfs-orc-scanner.h
File be/src/exec/orc/hdfs-orc-scanner.h:

http://gerrit.cloudera.org:8080/#/c/18909/3/be/src/exec/orc/hdfs-orc-scanner.h@274
PS3, Line 274: purposes: It c
> The term 'original files' is only used in the context of full ACID tables.
Thanks for the explanation! Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
Gerrit-Change-Number: 18909
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Aug 2022 11:22:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11529: FILE__POSITION virtual column for ORC tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18909 )

Change subject: IMPALA-11529: FILE__POSITION virtual column for ORC tables
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11251/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
Gerrit-Change-Number: 18909
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Aug 2022 10:16:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11529: FILE__POSITION virtual column for ORC tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18909 )

Change subject: IMPALA-11529: FILE__POSITION virtual column for ORC tables
......................................................................

IMPALA-11529: FILE__POSITION virtual column for ORC tables

IMPALA-11350 implemented the FILE__POSITION virtual column for Parquet
files. This ticket does the same for ORC files. Note that full ACID ORC
tables already had an implementation of row__id, which could simply be
reused for this ticket.

Testing:
 - TestScannersVirtualColumns.test_virtual_column_file_position_generic
   now runs on ORC as well. I don't think further testing is required,
   as this functionality already existed for row__id; we just reused it
   for FILE__POSITION.

Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
Reviewed-on: http://gerrit.cloudera.org:8080/18909
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/hdfs-scanner.cc
M be/src/exec/orc/hdfs-orc-scanner.cc
M be/src/exec/orc/hdfs-orc-scanner.h
M be/src/exec/orc/orc-column-readers.cc
M be/src/exec/orc/orc-column-readers.h
M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test
M tests/query_test/test_scanners.py
7 files changed, 35 insertions(+), 25 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
Gerrit-Change-Number: 18909
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11529: FILE__POSITION virtual column for ORC tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18909 )

Change subject: IMPALA-11529: FILE__POSITION virtual column for ORC tables
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11252/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
Gerrit-Change-Number: 18909
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Aug 2022 10:57:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11529: FILE__POSITION virtual column for ORC tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18909 )

Change subject: IMPALA-11529: FILE__POSITION virtual column for ORC tables
......................................................................


Patch Set 2:

(3 comments)

Looks good!

http://gerrit.cloudera.org:8080/#/c/18909/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18909/2//COMMIT_MSG@21
PS2, Line 21: TODO: create a Jira for this.
Remove TODO?


http://gerrit.cloudera.org:8080/#/c/18909/2/be/src/exec/orc/hdfs-orc-scanner.cc
File be/src/exec/orc/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18909/2/be/src/exec/orc/hdfs-orc-scanner.cc@606
PS2, Line 606: acid_synthetic_rowid_
nit: maybe we should rename this variable to 'file_position_'


http://gerrit.cloudera.org:8080/#/c/18909/2/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/18909/2/tests/query_test/test_scanners.py@161
PS2, Line 161: especially ORC
Would be nice if we could add some tests here as well.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
Gerrit-Change-Number: 18909
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Aug 2022 14:19:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11529: FILE__POSITION virtual column for ORC tables

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/18909 )

Change subject: IMPALA-11529: FILE__POSITION virtual column for ORC tables
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18909/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18909/2//COMMIT_MSG@21
PS2, Line 21: 
> Remove TODO?
Ooops, some leftover :) Done


http://gerrit.cloudera.org:8080/#/c/18909/2/be/src/exec/orc/hdfs-orc-scanner.cc
File be/src/exec/orc/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18909/2/be/src/exec/orc/hdfs-orc-scanner.cc@606
PS2, Line 606: file_position_ = slot
> nit: maybe we should rename this variable to 'file_position_'
Done


http://gerrit.cloudera.org:8080/#/c/18909/2/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/18909/2/tests/query_test/test_scanners.py@161
PS2, Line 161: especially ORC
> Would be nice if we could add some tests here as well.
Agreed, but the underlying .test file uses tables that only exist in Parquet, so I figured I'd save the hassle of creating them in ORC as well.
I did some manual experiments based on the tests in mixing-virtual-columns.test and verified that functional_parquet and functional_orc_def return the same results.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
Gerrit-Change-Number: 18909
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 29 Aug 2022 12:36:17 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11529: FILE__POSITION virtual column for ORC tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18909 )

Change subject: IMPALA-11529: FILE__POSITION virtual column for ORC tables
......................................................................


Patch Set 6: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
Gerrit-Change-Number: 18909
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Aug 2022 16:45:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11529: FILE__POSITION virtual column for ORC tables

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18909

to look at the new patch set (#5).

Change subject: IMPALA-11529: FILE__POSITION virtual column for ORC tables
......................................................................

IMPALA-11529: FILE__POSITION virtual column for ORC tables

IMPALA-11350 implemented the FILE__POSITION virtual column for Parquet
files. This ticket does the same for ORC files. Note that full ACID ORC
tables already had an implementation of row__id, which could simply be
reused for this ticket.

Testing:
 - TestScannersVirtualColumns.test_virtual_column_file_position_generic
   now runs on ORC as well. I don't think further testing is required,
   as this functionality already existed for row__id; we just reused it
   for FILE__POSITION.

Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
---
M be/src/exec/hdfs-scanner.cc
M be/src/exec/orc/hdfs-orc-scanner.cc
M be/src/exec/orc/hdfs-orc-scanner.h
M be/src/exec/orc/orc-column-readers.cc
M be/src/exec/orc/orc-column-readers.h
M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test
M tests/query_test/test_scanners.py
7 files changed, 35 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/18909/5

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
Gerrit-Change-Number: 18909
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11529: FILE__POSITION virtual column for ORC tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18909 )

Change subject: IMPALA-11529: FILE__POSITION virtual column for ORC tables
......................................................................


Patch Set 5: Code-Review+2

LGTM!



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
Gerrit-Change-Number: 18909
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Aug 2022 11:50:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11529: FILE__POSITION virtual column for ORC tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18909 )

Change subject: IMPALA-11529: FILE__POSITION virtual column for ORC tables
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18909/3/be/src/exec/orc/hdfs-orc-scanner.h
File be/src/exec/orc/hdfs-orc-scanner.h:

http://gerrit.cloudera.org:8080/#/c/18909/3/be/src/exec/orc/hdfs-orc-scanner.h@274
PS3, Line 274: original files
The term 'original files' is only used in the context of full ACID tables.

Now this member is more general as it can serve the synthetic rowid column in original files in full ACID tables, and it can also serve the virtual column FILE__POSITION.
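
The dual role described here can be pictured as a single running counter whose values are written into whichever slot asks for them. This is an illustration under stated assumptions, not the code from the patch: FilePositionSketch, Fill, and the plain vector are hypothetical stand-ins that do not mirror Impala's scanner or column reader classes, and the counter is assumed to start at 0 for each file.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical scanner state: one counter tracks the index of the next row
// within the current file, independent of which column consumes it.
struct FilePositionSketch {
  int64_t file_position = 0;  // Assumed to be reset to 0 for each new file.

  // Appends the positions of the next 'num_rows' rows to 'out' and advances
  // the counter. The same values could back a synthetic row ID in a full
  // ACID table or a FILE__POSITION slot.
  void Fill(int num_rows, std::vector<int64_t>* out) {
    assert(num_rows >= 0 && out != nullptr);
    for (int i = 0; i < num_rows; ++i) out->push_back(file_position++);
  }
};
```

Calling Fill for consecutive batches yields consecutive positions across batch boundaries, which is why one member can serve both consumers.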




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
Gerrit-Change-Number: 18909
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 29 Aug 2022 15:38:02 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11529: FILE__POSITION virtual column for ORC tables

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18909

to look at the new patch set (#3).

Change subject: IMPALA-11529: FILE__POSITION virtual column for ORC tables
......................................................................

IMPALA-11529: FILE__POSITION virtual column for ORC tables

IMPALA-11350 implemented the FILE__POSITION virtual column for Parquet
files. This ticket does the same for ORC files. Note that full ACID ORC
tables already had an implementation of row__id, which could simply be
reused for this ticket.

Testing:
 - TestScannersVirtualColumns.test_virtual_column_file_position_generic
   now runs on ORC as well. I don't think further testing is required,
   as this functionality already existed for row__id; we just reused it
   for FILE__POSITION.

Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
---
M be/src/exec/hdfs-scanner.cc
M be/src/exec/orc/hdfs-orc-scanner.cc
M be/src/exec/orc/hdfs-orc-scanner.h
M be/src/exec/orc/orc-column-readers.cc
M be/src/exec/orc/orc-column-readers.h
M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test
M tests/query_test/test_scanners.py
7 files changed, 33 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/18909/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
Gerrit-Change-Number: 18909
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11529: FILE__POSITION virtual column for ORC tables

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18909

to look at the new patch set (#2).

Change subject: IMPALA-11529: FILE__POSITION virtual column for ORC tables
......................................................................

IMPALA-11529: FILE__POSITION virtual column for ORC tables

IMPALA-11350 implemented the FILE__POSITION virtual column for Parquet
files. This ticket does the same for ORC files. Note that full ACID ORC
tables already had an implementation of row__id, which could simply be
reused for this ticket.

Testing:
 - TestScannersVirtualColumns.test_virtual_column_file_position_generic
   now runs on ORC as well. I don't think further testing is required,
   as this functionality already existed for row__id; we just reused it
   for FILE__POSITION.

Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
TODO: create a Jira for this.
---
M be/src/exec/hdfs-scanner.cc
M be/src/exec/orc/hdfs-orc-scanner.cc
M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test
M tests/query_test/test_scanners.py
4 files changed, 23 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/18909/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
Gerrit-Change-Number: 18909
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-11529: FILE__POSITION virtual column for ORC tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18909 )

Change subject: IMPALA-11529: FILE__POSITION virtual column for ORC tables
......................................................................


Patch Set 6: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
Gerrit-Change-Number: 18909
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Aug 2022 11:50:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11529: FILE__POSITION virtual column for ORC tables

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18909

to look at the new patch set (#4).

Change subject: IMPALA-11529: FILE__POSITION virtual column for ORC tables
......................................................................

IMPALA-11529: FILE__POSITION virtual column for ORC tables

IMPALA-11350 implemented the FILE__POSITION virtual column for Parquet
files. This ticket does the same for ORC files. Note that full ACID ORC
tables already had an implementation of row__id, which could simply be
reused for this ticket.

Testing:
 - TestScannersVirtualColumns.test_virtual_column_file_position_generic
   now runs on ORC as well. I don't think further testing is required,
   as this functionality already existed for row__id; we just reused it
   for FILE__POSITION.

Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
---
M be/src/exec/hdfs-scanner.cc
M be/src/exec/orc/hdfs-orc-scanner.cc
M be/src/exec/orc/hdfs-orc-scanner.h
M be/src/exec/orc/orc-column-readers.cc
M be/src/exec/orc/orc-column-readers.h
M testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test
M tests/query_test/test_scanners.py
7 files changed, 35 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/18909/4

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
Gerrit-Change-Number: 18909
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11529: FILE__POSITION virtual column for ORC tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18909 )

Change subject: IMPALA-11529: FILE__POSITION virtual column for ORC tables
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11238/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
Gerrit-Change-Number: 18909
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 29 Aug 2022 12:54:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11529: FILE__POSITION virtual column for ORC tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18909 )

Change subject: IMPALA-11529: FILE__POSITION virtual column for ORC tables
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11225/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
Gerrit-Change-Number: 18909
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Aug 2022 14:50:17 +0000
Gerrit-HasComments: No