You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org> on 2021/02/08 13:39:41 UTC

[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution

Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17039


Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
......................................................................

IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution

Currently ORC schema resolution is position-based only. Then the
positional information is reused during ORC column reader creation.
This prevents adding other column resolution strategies, e.g.
column resolution by name or Iceberg field id. It also prevents
schema evolution, as table metadata and file metadata should be
in sync.

This patch makes column reader creation independent of schema
resolution. It does this by creating a mapping between slot/tuple
descriptors and ORC type ids during schema resolution and use
this mapping during column reader creation.

Now further patches just need to add support for other column
resolution strategies, the column readers will be created
accordingly.

Testing:
 * no additional tests as it is neither a bug fix nor a new feature

Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
6 files changed, 64 insertions(+), 129 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/17039/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e
Gerrit-Change-Number: 17039
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 )

Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6904/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e
Gerrit-Change-Number: 17039
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Feb 2021 17:26:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Gabor Kaszab, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17039

to look at the new patch set (#4).

Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
......................................................................

IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution

Currently ORC schema resolution is position-based only. Then the
positional information is reused during ORC column reader creation.
This prevents adding other column resolution strategies, e.g.
column resolution by name or Iceberg field id. It also prevents
schema evolution, as table metadata and file metadata should be
in sync.

This patch makes column reader creation independent of schema
resolution. It does this by creating a mapping between slot/tuple
descriptors and ORC type ids during schema resolution and use
this mapping during column reader creation.

Now further patches just need to add support for other column
resolution strategies, the column readers will be created
accordingly.

Testing:
 * no additional tests as it is neither a bug fix nor a new feature

Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M tests/query_test/test_scanners.py
7 files changed, 80 insertions(+), 138 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/17039/4
-- 
To view, visit http://gerrit.cloudera.org:8080/17039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e
Gerrit-Change-Number: 17039
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Gabor Kaszab, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17039

to look at the new patch set (#3).

Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
......................................................................

IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution

Currently ORC schema resolution is position-based only. Then the
positional information is reused during ORC column reader creation.
This prevents adding other column resolution strategies, e.g.
column resolution by name or Iceberg field id. It also prevents
schema evolution, as table metadata and file metadata should be
in sync.

This patch makes column reader creation independent of schema
resolution. It does this by creating a mapping between slot/tuple
descriptors and ORC type ids during schema resolution and use
this mapping during column reader creation.

Now further patches just need to add support for other column
resolution strategies, the column readers will be created
accordingly.

Testing:
 * no additional tests as it is neither a bug fix nor a new feature

Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M tests/query_test/test_scanners.py
7 files changed, 71 insertions(+), 137 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/17039/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e
Gerrit-Change-Number: 17039
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 )

Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
......................................................................


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17039/1/be/src/exec/orc-column-readers.cc
File be/src/exec/orc-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/17039/1/be/src/exec/orc-column-readers.cc@291
PS1, Line 291: inline uint64_t OrcComplexColumnReader::GetTargetColId(
             :     const SlotDescriptor* slot_desc) const {
             :   return slot_desc->type().IsCollectionType() ?
             :  
> This is no longer used.
Done


http://gerrit.cloudera.org:8080/#/c/17039/1/be/src/exec/orc-column-readers.cc@296
PS1, Line 296: }
> This is no longer used too.
Done


http://gerrit.cloudera.org:8080/#/c/17039/1/be/src/exec/orc-column-readers.cc@311
PS1, Line 311:  * Return the result in '*child' and its index inside the children. Returns false for
> nit: inline?
Done


http://gerrit.cloudera.org:8080/#/c/17039/1/be/src/exec/orc-column-readers.cc@316
PS1, Line 316:   DCHECK(parent.getKind() == orc::TypeKind::STRUCT ||
> Could you move the original comments of FindChild here?
Added a bit more verbose comment.



-- 
To view, visit http://gerrit.cloudera.org:8080/17039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e
Gerrit-Change-Number: 17039
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Feb 2021 15:23:42 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 )

Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
......................................................................


Patch Set 3: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/hdfs-orc-scanner.h
File be/src/exec/hdfs-orc-scanner.h:

http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/hdfs-orc-scanner.h@212
PS3, Line 212: slot/tuple
If I saw correctly, we always know exactly whether it is a slot or a tuple that we get/put. We could use 2 separate maps instead for slots and tuples - this would be a few extra lines of code but give better type safety.



-- 
To view, visit http://gerrit.cloudera.org:8080/17039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e
Gerrit-Change-Number: 17039
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Feb 2021 17:04:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 )

Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/hdfs-orc-scanner.h
File be/src/exec/hdfs-orc-scanner.h:

http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/hdfs-orc-scanner.h@212
PS3, Line 212: from slot/
> If I saw correctly, we always know exactly whether it is a slot or a tuple 
Done


http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/orc-column-readers.cc
File be/src/exec/orc-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/orc-column-readers.cc@308
PS3, Line 308: tiple children, i.
> I think we should mention that LIST also has one child type but we won't us
Re-worded a bit to emphasize it's for complex types only that might have multiple children.


http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/orc-column-readers.cc@321
PS3, Line 321: node
> nit: should we use DCHECK(node != nullptr) instead?
We can have a nullptr here in case of MAPs. It happens when we only query one child of the map.

E.g. when we query the "value" of the map, the ORC lib still creates two children of the MAP type, one child being nullptr. I think it's because this way 0 always mean KEY and 1 always mean VALUE.


http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/orc-column-readers.cc@728
PS3, Line 728: 0) {
> nit: For code readability, I think we should use 0 here, because 'field' is
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e
Gerrit-Change-Number: 17039
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Feb 2021 16:18:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 )

Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8172/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e
Gerrit-Change-Number: 17039
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Feb 2021 16:48:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17039 )

Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
......................................................................

IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution

Currently ORC schema resolution is position-based only. Then the
positional information is reused during ORC column reader creation.
This prevents adding other column resolution strategies, e.g.
column resolution by name or Iceberg field id. It also prevents
schema evolution, as table metadata and file metadata should be
in sync.

This patch makes column reader creation independent of schema
resolution. It does this by creating a mapping between slot/tuple
descriptors and ORC type ids during schema resolution and use
this mapping during column reader creation.

Now further patches just need to add support for other column
resolution strategies, the column readers will be created
accordingly.

Testing:
 * no additional tests as it is neither a bug fix nor a new feature

Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e
Reviewed-on: http://gerrit.cloudera.org:8080/17039
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M tests/query_test/test_scanners.py
7 files changed, 80 insertions(+), 138 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/17039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e
Gerrit-Change-Number: 17039
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 )

Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
......................................................................


Patch Set 5: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e
Gerrit-Change-Number: 17039
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Feb 2021 23:07:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 )

Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
......................................................................


Patch Set 4: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/orc-column-readers.cc
File be/src/exec/orc-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/orc-column-readers.cc@321
PS3, Line 321: node
> We can have a nullptr here in case of MAPs. It happens when we only query o
Ah yes, thanks for the explanation!



-- 
To view, visit http://gerrit.cloudera.org:8080/17039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e
Gerrit-Change-Number: 17039
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Feb 2021 16:53:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 )

Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8090/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e
Gerrit-Change-Number: 17039
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 08 Feb 2021 14:01:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 )

Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
......................................................................


Patch Set 3: Code-Review+1

(3 comments)

LGTM. We can bump it to +2 after resolving the discussions.

http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/orc-column-readers.cc
File be/src/exec/orc-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/orc-column-readers.cc@308
PS3, Line 308: i.e. STRUCT or MAP
I think we should mention that LIST also has one child type but we won't use this function for it.


http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/orc-column-readers.cc@321
PS3, Line 321: node
nit: should we use DCHECK(node != nullptr) instead?


http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/orc-column-readers.cc@728
PS3, Line 728: SchemaPathConstants::MAP_KEY
nit: For code readability, I think we should use 0 here, because 'field' is now a subcolumn index of an ORC MAP type instead of a field in the SchemaPath. It's unrelative to SchemaPathConstants anymore.



-- 
To view, visit http://gerrit.cloudera.org:8080/17039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e
Gerrit-Change-Number: 17039
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Feb 2021 13:39:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Gabor Kaszab, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17039

to look at the new patch set (#2).

Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
......................................................................

IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution

Currently ORC schema resolution is position-based only. Then the
positional information is reused during ORC column reader creation.
This prevents adding other column resolution strategies, e.g.
column resolution by name or Iceberg field id. It also prevents
schema evolution, as table metadata and file metadata should be
in sync.

This patch makes column reader creation independent of schema
resolution. It does this by creating a mapping between slot/tuple
descriptors and ORC type ids during schema resolution and use
this mapping during column reader creation.

Now further patches just need to add support for other column
resolution strategies, the column readers will be created
accordingly.

Testing:
 * no additional tests as it is neither a bug fix nor a new feature

Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M tests/query_test/test_scanners.py
7 files changed, 69 insertions(+), 137 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/17039/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e
Gerrit-Change-Number: 17039
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 )

Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
......................................................................


Patch Set 5: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e
Gerrit-Change-Number: 17039
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Feb 2021 17:26:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 )

Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8118/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e
Gerrit-Change-Number: 17039
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Feb 2021 15:43:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 )

Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8116/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e
Gerrit-Change-Number: 17039
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Feb 2021 15:35:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 )

Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
......................................................................


Patch Set 1:

(5 comments)

Good to see a negative size patch! Thanks for the simplification! I'm going to have a detailed look. Left some minor comments first.

http://gerrit.cloudera.org:8080/#/c/17039/1/be/src/exec/orc-column-readers.cc
File be/src/exec/orc-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/17039/1/be/src/exec/orc-column-readers.cc@291
PS1, Line 291: inline bool PathContains(const SchemaPath& path, const SchemaPath& sub_path) {
             :   return path.size() >= sub_path.size() &&
             :       std::equal(sub_path.begin(), sub_path.end(), path.begin());
             : }
This is no longer used.


http://gerrit.cloudera.org:8080/#/c/17039/1/be/src/exec/orc-column-readers.cc@296
PS1, Line 296: inline const SchemaPath& GetTargetColPath(const SlotDescriptor* slot_desc) {
This is no longer used too.


http://gerrit.cloudera.org:8080/#/c/17039/1/be/src/exec/orc-column-readers.cc@311
PS1, Line 311: bool IsDescendant(const orc::Type& node, uint64_t candidate_col_id) {
nit: inline?


http://gerrit.cloudera.org:8080/#/c/17039/1/be/src/exec/orc-column-readers.cc@316
PS1, Line 316: 
Could you move the original comments of FindChild here?


http://gerrit.cloudera.org:8080/#/c/17039/1/be/src/exec/orc-column-readers.cc@606
PS1, Line 606:   // We have a position slot descriptor if it refers to this LIST ORC type, but it isn't
             :   // a collection slot.
It may be useful to mention "See how position slots are resolved in HdfsOrcScanner::ResolveColumns()."



-- 
To view, visit http://gerrit.cloudera.org:8080/17039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e
Gerrit-Change-Number: 17039
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 10 Feb 2021 09:57:06 +0000
Gerrit-HasComments: Yes