You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Gabor Kaszab (Code Review)" <ge...@cloudera.org> on 2020/01/24 10:36:26 UTC

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15104


Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................

IMPALA-9228: ORC scanner reads rows into scratch batch

Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
5 files changed, 153 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/15104/1
-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................


Patch Set 9:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15104/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15104/5//COMMIT_MSG@20
PS5, Line 20: Note, 
> typo: testing
Done


http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/hdfs-columnar-scanner-ir.cc
File be/src/exec/hdfs-columnar-scanner-ir.cc:

http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/hdfs-columnar-scanner-ir.cc@23
PS5, Line 23: scratch_batch_
> nit: the != nullptr in written out in most of Impala
Done


http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/hdfs-orc-scanner.cc@702
PS5, Line 702: 
> nit: the != nullptr in written out in most of Impala
Done


http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/orc-column-readers.cc
File be/src/exec/orc-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/orc-column-readers.cc@264
PS5, Line 264: >col_id_path_map_[
> Can you mention it in the commit message that no scratch batch is used if t
I've already added a note about this to the commit msg.



-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Mar 2020 10:01:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5403/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Mar 2020 10:45:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15104

to look at the new patch set (#8).

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................

IMPALA-9228: ORC scanner reads rows into scratch batch

Because of performance considerations this change enhances ORC
scanner to populate a scratch batch on a column-by-column manner
using data from the column readers. Once this is done the parquet
code was reused to apply runtime filter and conjuncts and to
populate the outgoing row batch.

This approach reduces the number of virtual function calls and takes
advantage of the columnar orientation of the data to enhance scan
performance. Additionally, introducing the scratch batch concept also
opens the door for codegen runtime filtering and applying conjuncts.

Note, this change doesn't cover collection types just primitive types
and struct.

Tesing:
  - Re-run the full test suite to verify that no regression is
    introduced.
  - Checked the performance impact by running TPCH workload on a
    scale 25 database using single_node_perf_run.py. The total query
    runtime is decreased by 0-20% depending on how scan heavy the
    particular query was. The more scan heavy the query is the more
    performance gain I observe.

Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
R be/src/exec/hdfs-columnar-scanner-ir.cc
A be/src/exec/hdfs-columnar-scanner.cc
A be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
R be/src/exec/scratch-tuple-batch.h
15 files changed, 425 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/15104/8
-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5377/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Mar 2020 10:19:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] PoC: IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has abandoned this change. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: PoC: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................


Abandoned
-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15104

to look at the new patch set (#6).

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................

IMPALA-9228: ORC scanner reads rows into scratch batch

Because of performance considerations this change enhances ORC
scanner to populate a scratch batch on a column-by-column manner
using data from the column readers. Once this is done the parquet
code was reused to apply runtime filter and conjuncts and to
populate the outgoing row batch.

This approach reduces the number of virtual function calls and takes
advantage of the columnar orientation of the data to enhance scan
performance. Additionally, introducing the scratch batch concept also
opens the door for codegen runtime filtering and applying conjuncts.

Tesing:
  - Re-run the full test suite to verify that no regression is
    introduced.
  - Checked the performance impact by running TPCH workload on a
    scale 25 database using single_node_perf_run.py. The total query
    runtime is decreased by 0-20% depending on how scan heavy the
    particular query was. The more scan heavy the query is the more
    performance gain I observe.

Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
R be/src/exec/hdfs-columnar-scanner-ir.cc
A be/src/exec/hdfs-columnar-scanner.cc
A be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
R be/src/exec/scratch-tuple-batch.h
15 files changed, 423 insertions(+), 140 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/15104/6
-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................


Patch Set 8: Code-Review+2

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15104/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15104/5//COMMIT_MSG@20
PS5, Line 20: Note, 
typo: testing


http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/hdfs-columnar-scanner-ir.cc
File be/src/exec/hdfs-columnar-scanner-ir.cc:

http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/hdfs-columnar-scanner-ir.cc@23
PS5, Line 23: scratch_batch_
nit: the != nullptr in written out in most of Impala


http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/hdfs-orc-scanner.cc@702
PS5, Line 702: 
nit: the != nullptr in written out in most of Impala


http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/orc-column-readers.cc
File be/src/exec/orc-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/orc-column-readers.cc@264
PS5, Line 264: >col_id_path_map_[
Can you mention it in the commit message that no scratch batch is used if there are collections? This will lead to also not having codegen in this case if the we want to use the same functions as Parquet, which always uses scratch batches.



-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Mar 2020 15:30:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15104

to look at the new patch set (#5).

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................

IMPALA-9228: ORC scanner reads rows into scratch batch

Because of performance considerations this change enhances ORC
scanner to populate a scratch batch on a column-by-column manner
using data from the column readers. Once this is done the parquet
code was reused to apply runtime filter and conjuncts and to
populate the outgoing row batch.

This approach reduces the number of virtual function calls and takes
advantage of the columnar orientation of the data to enhance scan
performance. Additionally, introducing the scratch batch concept also
opens the door for codegen runtime filtering and applying conjuncts.

Tesing:
  - Re-run the full test suite to verify that no regression is
    introduced.
  - Checked the performance impact by running TPCH workload on a
    scale 25 database using single_node_perf_run.py. The total query
    runtime is decreased by 0-20% depending on how scan heavy the
    particular query was. The more scan heavy the query is the more
    performance gain I observe.

Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
R be/src/exec/hdfs-columnar-scanner-ir.cc
A be/src/exec/hdfs-columnar-scanner.cc
A be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
R be/src/exec/scratch-tuple-batch.h
15 files changed, 423 insertions(+), 140 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/15104/5
-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5384/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Mar 2020 16:11:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................


Patch Set 5: Code-Review+1

(3 comments)

Only found nits, other than that LGTM

http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/hdfs-scanner.h@a566
PS5, Line 566: 
             : 
             : 
             : 
             : 
nit: maybe these could be moved together to public.


http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/orc-column-readers.h
File be/src/exec/orc-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/orc-column-readers.h@188
PS5, Line 188:   friend class OrcPrimitiveColumnReader<OrcBoolColumnReader>;
Since ReadValue() is public I think you don't need the friend declarations anymore.


http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/orc-column-readers.cc
File be/src/exec/orc-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/orc-column-readers.cc@221
PS5, Line 221: Status OrcDecimal16ColumnReader::ReadValue(int row_idx, Tuple* tuple,
             :     MemPool* pool) {
nit: fits single line again



-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Mar 2020 12:55:26 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5444/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Mar 2020 10:20:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................


Patch Set 7: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15104/7/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/15104/7/be/src/exec/hdfs-scanner.h@176
PS7, Line 176: Not inlined in IR so it can be replaced with a constant.
nit: this comment is applicable to both 'tuple_byte_size' functions, so it should be put before L174, similarly to Base.



-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Mar 2020 13:35:23 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................


Patch Set 10: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Mar 2020 10:20:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15104

to look at the new patch set (#9).

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................

IMPALA-9228: ORC scanner reads rows into scratch batch

Because of performance considerations this change enhances ORC
scanner to populate a scratch batch on a column-by-column manner
using data from the column readers. Once this is done the parquet
code was reused to apply runtime filter and conjuncts and to
populate the outgoing row batch.

This approach reduces the number of virtual function calls and takes
advantage of the columnar orientation of the data to enhance scan
performance. Additionally, introducing the scratch batch concept also
opens the door for codegen runtime filtering and applying conjuncts.

Note, this change doesn't cover collection types just primitive types
and struct. Collection types will follow the previous row-by-row
approach.

Testing:
  - Re-run the full test suite to verify that no regression is
    introduced.
  - Checked the performance impact by running TPCH workload on a
    scale 25 database using single_node_perf_run.py. The total query
    runtime is decreased by 0-20% depending on how scan heavy the
    particular query was. The more scan heavy the query is the more
    performance gain I observe.

Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
R be/src/exec/hdfs-columnar-scanner-ir.cc
A be/src/exec/hdfs-columnar-scanner.cc
A be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
R be/src/exec/scratch-tuple-batch.h
15 files changed, 425 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/15104/9
-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] PoC: IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: PoC: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................


Patch Set 1:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/5513/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 24 Jan 2020 11:22:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15104

to look at the new patch set (#7).

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................

IMPALA-9228: ORC scanner reads rows into scratch batch

Because of performance considerations this change enhances ORC
scanner to populate a scratch batch on a column-by-column manner
using data from the column readers. Once this is done the parquet
code was reused to apply runtime filter and conjuncts and to
populate the outgoing row batch.

This approach reduces the number of virtual function calls and takes
advantage of the columnar orientation of the data to enhance scan
performance. Additionally, introducing the scratch batch concept also
opens the door for codegen runtime filtering and applying conjuncts.

Tesing:
  - Re-run the full test suite to verify that no regression is
    introduced.
  - Checked the performance impact by running TPCH workload on a
    scale 25 database using single_node_perf_run.py. The total query
    runtime is decreased by 0-20% depending on how scan heavy the
    particular query was. The more scan heavy the query is the more
    performance gain I observe.

Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
R be/src/exec/hdfs-columnar-scanner-ir.cc
A be/src/exec/hdfs-columnar-scanner.cc
A be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
R be/src/exec/scratch-tuple-batch.h
15 files changed, 426 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/15104/7
-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] PoC: IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15104

to look at the new patch set (#2).

Change subject: PoC: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................

PoC: IMPALA-9228: ORC scanner reads rows into scratch batch

Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
5 files changed, 153 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/15104/2
-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] PoC: IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has restored this change. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: PoC: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................


Restored
-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: restore
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/15104/4/be/src/exec/hdfs-columnar-scanner.h
File be/src/exec/hdfs-columnar-scanner.h:

http://gerrit.cloudera.org:8080/#/c/15104/4/be/src/exec/hdfs-columnar-scanner.h@40
PS4, Line 40: ().
nit: "...in the derived classes."


http://gerrit.cloudera.org:8080/#/c/15104/4/be/src/exec/hdfs-columnar-scanner.h@45
PS4, Line 45: ;
nit: you could add "= nullptr" to make the default value of it obvious. And then you don't need to init it in the constructor.


http://gerrit.cloudera.org:8080/#/c/15104/4/be/src/exec/hdfs-orc-scanner.h
File be/src/exec/hdfs-orc-scanner.h:

http://gerrit.cloudera.org:8080/#/c/15104/4/be/src/exec/hdfs-orc-scanner.h@258
PS4, Line 258: has at least a
             :   /// collection amongst its children.
nit: sounds weird to me, maybe just "has a collection amongst..."


http://gerrit.cloudera.org:8080/#/c/15104/4/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15104/4/be/src/exec/hdfs-orc-scanner.cc@703
PS4, Line 703:       DCHECK_EQ(scratch_batch_->total_allocated_bytes(), 0);
We hit this DCHECK when we come from the loop of AssembleRows(). We can also have allocated bytes if the scratch batch was compacted by the row batch, in that case the scratch batch don't transfer the ownership of 'tuple_mem_pool'.

I think it's enough to only have this DCHECK Close().


http://gerrit.cloudera.org:8080/#/c/15104/4/be/src/exec/orc-column-readers.h
File be/src/exec/orc-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/15104/4/be/src/exec/orc-column-readers.h@130
PS4, Line 130: to implement "curiously recurring template pattern".
nit: to implement static polymorphism via the "curiously recurring template pattern".


http://gerrit.cloudera.org:8080/#/c/15104/4/be/src/exec/orc-column-readers.h@161
PS4, Line 161: derived->ReadValueHelper(row_idx + i, tuple, pool)
If you'd add the 'final' specifier to the ReadValue() implementations then the compiler could devirtualize these calls, i.e. you wouldn't need ReadValueHelper.


http://gerrit.cloudera.org:8080/#/c/15104/4/be/src/exec/orc-column-readers.h@469
PS4, Line 469:   virtual Status ReadValueBatch(ScratchTupleBatch* scratch_batch, MemPool* pool)
You need to rename this to make clang-tidy happy as it hides virtual function with the same name.



-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Feb 2020 15:20:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................

IMPALA-9228: ORC scanner reads rows into scratch batch

Because of performance considerations this change enhances ORC
scanner to populate a scratch batch on a column-by-column manner
using data from the column readers. Once this is done the parquet
code was reused to apply runtime filter and conjuncts and to
populate the outgoing row batch.

This approach reduces the number of virtual function calls and takes
advantage of the columnar orientation of the data to enhance scan
performance. Additionally, introducing the scratch batch concept also
opens the door for codegen runtime filtering and applying conjuncts.

Note, this change doesn't cover collection types just primitive types
and struct. Collection types will follow the previous row-by-row
approach.

Testing:
  - Re-run the full test suite to verify that no regression is
    introduced.
  - Checked the performance impact by running TPCH workload on a
    scale 25 database using single_node_perf_run.py. The total query
    runtime is decreased by 0-20% depending on how scan heavy the
    particular query was. The more scan heavy the query is the more
    performance gain I observe.

Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Reviewed-on: http://gerrit.cloudera.org:8080/15104
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
R be/src/exec/hdfs-columnar-scanner-ir.cc
A be/src/exec/hdfs-columnar-scanner.cc
A be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
R be/src/exec/scratch-tuple-batch.h
15 files changed, 425 insertions(+), 144 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 11
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15104/7/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/15104/7/be/src/exec/hdfs-scanner.h@176
PS7, Line 176: Not inlined in IR so it can be replaced with a constant.
> nit: this comment is applicable to both 'tuple_byte_size' functions, so it 
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Mar 2020 15:26:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/15104/4/be/src/exec/hdfs-columnar-scanner.h
File be/src/exec/hdfs-columnar-scanner.h:

http://gerrit.cloudera.org:8080/#/c/15104/4/be/src/exec/hdfs-columnar-scanner.h@40
PS4, Line 40: () 
> nit: "...in the derived classes."
Done


http://gerrit.cloudera.org:8080/#/c/15104/4/be/src/exec/hdfs-columnar-scanner.h@45
PS4, Line 45:  
> nit: you could add "= nullptr" to make the default value of it obvious. And
Done


http://gerrit.cloudera.org:8080/#/c/15104/4/be/src/exec/hdfs-orc-scanner.h
File be/src/exec/hdfs-orc-scanner.h:

http://gerrit.cloudera.org:8080/#/c/15104/4/be/src/exec/hdfs-orc-scanner.h@258
PS4, Line 258: has a collection
             :   /// amongst its children.
> nit: sounds weird to me, maybe just "has a collection amongst..."
Done


http://gerrit.cloudera.org:8080/#/c/15104/4/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15104/4/be/src/exec/hdfs-orc-scanner.cc@703
PS4, Line 703:       DCHECK(scratch_batch_->AtEnd());
> We hit this DCHECK when we come from the loop of AssembleRows(). We can als
Done


http://gerrit.cloudera.org:8080/#/c/15104/4/be/src/exec/orc-column-readers.h
File be/src/exec/orc-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/15104/4/be/src/exec/orc-column-readers.h@130
PS4, Line 130: to implement static polymorphism via the "curiously
> nit: to implement static polymorphism via the "curiously recurring template
Done


http://gerrit.cloudera.org:8080/#/c/15104/4/be/src/exec/orc-column-readers.h@161
PS4, Line 161: 
> If you'd add the 'final' specifier to the ReadValue() implementations then 
Done


http://gerrit.cloudera.org:8080/#/c/15104/4/be/src/exec/orc-column-readers.h@469
PS4, Line 469:   /// Keep row index if we're top level readers
> You need to rename this to make clang-tidy happy as it hides virtual functi
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Mar 2020 09:45:55 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................


Patch Set 7:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/hdfs-scanner.h@a566
PS5, Line 566: 
             : 
             : 
             : 
             : 
> nit: maybe these could be moved together to public.
Done


http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/orc-column-readers.h
File be/src/exec/orc-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/orc-column-readers.h@188
PS5, Line 188:   Status ReadValue(int row_idx, Tuple* tuple, MemPool* pool) final WARN_UNUSED_RESULT;
> Since ReadValue() is public I think you don't need the friend declarations 
Friend declarations are still needed for accessing 'derived->batch_' from OrcPrimitiveColumnReader::ReadValueBatch()


http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/orc-column-readers.cc
File be/src/exec/orc-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/15104/5/be/src/exec/orc-column-readers.cc@221
PS5, Line 221:     ErrorMsg msg(errorCode, scanner_->filename(), orc_column_id_);
             :     return scanner_-
> nit: fits single line again
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Mar 2020 13:14:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15104/1/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15104/1/be/src/exec/hdfs-orc-scanner.cc@693
PS1, Line 693:     // TODO kaszab: call EvalConjuncts on rows in scratch_batch before incrementiong row_id
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/15104/1/be/src/exec/orc-column-readers.h
File be/src/exec/orc-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/15104/1/be/src/exec/orc-column-readers.h@181
PS1, Line 181:       // Params of GetValueBatch(): int max_values, int tuple_size, uint8_t* RESTRICT tuple_mem, int* RESTRICT num_values
line too long (121 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 24 Jan 2020 10:37:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................


Patch Set 10: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Mar 2020 15:19:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................


Patch Set 6:

PS6 is a rebase with master to resolve conflict with the ORC string allocations enhancement.


-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Mar 2020 12:57:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5379/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Mar 2020 13:41:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................


Patch Set 4:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/5359/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 27 Feb 2020 19:58:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15104

to look at the new patch set (#4).

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................

IMPALA-9228: ORC scanner reads rows into scratch batch

Because of performance considerations this change enhances ORC
scanner to populate a scratch batch on a column-by-column manner
using data from the column readers. Once this is done the parquet
code was reused to apply runtime filter and conjuncts and to
populate the outgoing row batch.

This approach reduces the number of virtual function calls and takes
advantage of the columnar orientation of the data to enhance scan
performance. Additionally, introducing the scratch batch concept also
opens the door for codegen runtime filtering and applying conjuncts.

Tesing:
  - Re-run the full test suite to verify that no regression is
    introduced.
  - Checked the performance impact by running TPCH workload on a
    scale 25 database using single_node_perf_run.py. The total query
    runtime is decreased by 0-20% depending on how scan heavy the
    particular query was. The more scan heavy the query is the more
    performance gain I observe.

Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
R be/src/exec/hdfs-columnar-scanner-ir.cc
A be/src/exec/hdfs-columnar-scanner.cc
A be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
R be/src/exec/scratch-tuple-batch.h
15 files changed, 464 insertions(+), 140 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/15104/4
-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5381/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Mar 2020 13:58:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/15104 )

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................


Patch Set 9: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Mar 2020 10:20:35 +0000
Gerrit-HasComments: No