Posted to reviews@impala.apache.org by "Joe McDonnell (Code Review)" <ge...@cloudera.org> on 2021/04/01 02:33:42 UTC

[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts

Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17259 )


Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................

IMPALA-10629: Fix parquet compression codecs for data load scripts

Currently, the dataload scripts don't respect non-default
compression codecs when loading Parquet data. They always
load Snappy, even when something else is specified, such as
--table_format=parquet/zstd.

This fixes the dataload scripts so that they specify the
compression_codec query option correctly and thus use the
right codec when loading Parquet.

This should make it easier to do performance testing on
various Parquet codecs (like ZSTD).

Testing:
 - Ran bin/load-data.py -w tpch --table_format=parquet/zstd
   and checked the codec in the file with the parquet-reader
   utility

Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
---
M testdata/bin/generate-schema-statements.py
1 file changed, 29 insertions(+), 6 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/17259/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17259

to look at the new patch set (#3).

Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................

IMPALA-10629: Fix parquet compression codecs for data load scripts

Currently, the dataload scripts don't respect non-default
compression codecs when loading Parquet data. They always
load Snappy, even when something else is specified, such as
--table_format=parquet/zstd.

This fixes the dataload scripts so that they specify the
compression_codec query option correctly and thus use the
right codec when loading Parquet.

For backwards compatibility, this preserves the behavior
that parquet/none corresponds to the default compression
codec (which is Snappy).

This should make it easier to do performance testing on
various Parquet codecs (like ZSTD).

Testing:
 - Ran bin/load-data.py -w tpch --table_format=parquet/zstd
   and checked the codec in the file with the parquet-reader
   utility

Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
---
M testdata/bin/generate-schema-statements.py
1 file changed, 34 insertions(+), 6 deletions(-)
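
The mapping described in the commit message can be sketched roughly as
follows. This is a minimal illustration, not the code from
generate-schema-statements.py: the function name, the mapping table, and
every codec key other than "zstd" and "none" are assumptions made for
this example. The only parts taken from the message are the
--table_format=parquet/<codec> form, the compression_codec query option,
and the rule that parquet/none keeps the default codec (Snappy).

```python
# Hypothetical mapping from the codec part of --table_format to the value
# given to Impala's COMPRESSION_CODEC query option. Only "zstd" and "none"
# appear in the commit message; the other keys are illustrative assumptions.
PARQUET_CODEC_MAP = {
    "none": "SNAPPY",  # backwards compatibility: parquet/none keeps the default
    "zstd": "ZSTD",
    "gzip": "GZIP",
    "lz4": "LZ4",
}


def parquet_codec_statement(table_format):
    """Return a SET statement for a table format such as 'parquet/zstd'."""
    file_format, _, codec = table_format.partition("/")
    if file_format != "parquet":
        raise ValueError("not a Parquet table format: %r" % table_format)
    # A bare "parquet" is treated like "parquet/none" in this sketch.
    codec = (codec or "none").lower()
    if codec not in PARQUET_CODEC_MAP:
        raise ValueError("unsupported Parquet codec: %r" % codec)
    return "SET COMPRESSION_CODEC=%s;" % PARQUET_CODEC_MAP[codec]
```

With this sketch, parquet_codec_statement("parquet/zstd") returns
"SET COMPRESSION_CODEC=ZSTD;", and "parquet/none" still yields SNAPPY,
so existing parquet/none loads would keep producing Snappy files.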


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/17259/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17259 )

Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................


Patch Set 6: Code-Review+2

Carry +2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Wed, 07 Apr 2021 04:40:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17259 )

Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................


Patch Set 7: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Apr 2021 20:46:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17259 )

Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17259/1/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/17259/1/testdata/bin/generate-schema-statements.py@163
PS1, Line 163: }
flake8: E123 closing bracket does not match indentation of opening bracket's line


http://gerrit.cloudera.org:8080/#/c/17259/1/testdata/bin/generate-schema-statements.py@438
PS1, Line 438: def build_impala_parquet_codec_statement(codec):
flake8: E302 expected 2 blank lines, found 1


http://gerrit.cloudera.org:8080/#/c/17259/1/testdata/bin/generate-schema-statements.py@493
PS1, Line 493: n
flake8: E501 line too long (92 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/17259/1/testdata/bin/generate-schema-statements.py@498
PS1, Line 498: "
flake8: E501 line too long (94 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/17259/1/testdata/bin/generate-schema-statements.py@787
PS1, Line 787: u
flake8: E501 line too long (94 > 90 characters)
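
The E501 reports above imply a 90-character line limit. As a rough
approximation of that one check (not flake8's or pycodestyle's actual
implementation; the function and constant names are made up for this
sketch), overlong lines can be found like this before uploading a patch
set:

```python
MAX_LINE_LENGTH = 90  # limit implied by the "(92 > 90 characters)" messages


def find_long_lines(source, max_length=MAX_LINE_LENGTH):
    """Return (line_number, length) pairs for lines longer than max_length."""
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > max_length
    ]
```

Calling find_long_lines() on a file's text gives 1-based line numbers, so
the results line up with the "Line 493" style references in the review
comments.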




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Apr 2021 02:34:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17259 )

Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................


Patch Set 7: Code-Review+2

Carry +2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Apr 2021 03:12:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17259

to look at the new patch set (#2).

Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................

Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
---
M testdata/bin/generate-schema-statements.py
1 file changed, 29 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/17259/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17259 )

Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8486/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Apr 2021 02:53:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17259 )

Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................

IMPALA-10629: Fix parquet compression codecs for data load scripts

Currently, the dataload scripts don't respect non-default
compression codecs when loading Parquet data. They always
load Snappy, even when something else is specified, such as
--table_format=parquet/zstd.

This fixes the dataload scripts so that they specify the
compression_codec query option correctly and thus use the
right codec when loading Parquet.

For backwards compatibility, this preserves the behavior
that parquet/none corresponds to the default compression
codec (which is Snappy).

This should make it easier to do performance testing on
various Parquet codecs (like ZSTD).

Testing:
 - Ran bin/load-data.py -w tpch --table_format=parquet/zstd
   and checked the codec in the file with the parquet-reader
   utility

Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Reviewed-on: http://gerrit.cloudera.org:8080/17259
Reviewed-by: Joe McDonnell <jo...@cloudera.com>
Tested-by: Joe McDonnell <jo...@cloudera.com>
---
M testdata/bin/generate-schema-statements.py
1 file changed, 34 insertions(+), 6 deletions(-)

Approvals:
  Joe McDonnell: Looks good to me, approved; Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17259 )

Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8487/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Apr 2021 02:59:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17259 )

Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................


Patch Set 5: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 06 Apr 2021 13:56:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17259 )

Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8488/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Apr 2021 03:14:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17259 )

Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................


Patch Set 7:

IMPALA-9997/IMPALA-9998 stacked on top of this got a +1 verified.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Apr 2021 20:46:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17259

to look at the new patch set (#4).

Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................

Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
---
M testdata/bin/generate-schema-statements.py
1 file changed, 34 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/17259/4

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>