Posted to reviews@impala.apache.org by "Joe McDonnell (Code Review)" <ge...@cloudera.org> on 2021/04/01 02:33:42 UTC
[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts
Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17259 )
Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................
IMPALA-10629: Fix parquet compression codecs for data load scripts
Currently, the dataload scripts don't respect non-standard
compression codecs when loading Parquet data. They always
load Snappy, even when something else is specified, like
--table_format=parquet/zstd.
This fixes the dataload scripts so that they specify the
compression_codec query option correctly and thus use the
right codec when loading Parquet.
This should make it easier to do performance testing on
various Parquet codecs (like ZSTD).
Testing:
- Ran bin/load-data.py -w tpch --table_format=parquet/zstd
and checked the codec in the file with the parquet-reader
utility
Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
---
M testdata/bin/generate-schema-statements.py
1 file changed, 29 insertions(+), 6 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/17259/1
--
To view, visit http://gerrit.cloudera.org:8080/17259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts
Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/17259
to look at the new patch set (#3).
Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................
IMPALA-10629: Fix parquet compression codecs for data load scripts
Currently, the dataload scripts don't respect non-standard
compression codecs when loading Parquet data. They always
load Snappy, even when something else is specified, like
--table_format=parquet/zstd.
This fixes the dataload scripts so that they specify the
compression_codec query option correctly and thus use the
right codec when loading Parquet.
For backwards compatibility, this preserves the behavior
that parquet/none corresponds to the default compression
codec (which is Snappy).
This should make it easier to do performance testing on
various Parquet codecs (like ZSTD).
Testing:
- Ran bin/load-data.py -w tpch --table_format=parquet/zstd
and checked the codec in the file with the parquet-reader
utility
Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
---
M testdata/bin/generate-schema-statements.py
1 file changed, 34 insertions(+), 6 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/17259/3
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
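The commit message in this patch set describes two rules: the codec named in --table_format is passed to Impala through the compression_codec query option, and parquet/none keeps meaning the default codec (Snappy). A minimal sketch of that mapping follows. It is not the code from generate-schema-statements.py: the function name, the codec table, and the error handling are all assumptions made for illustration.

```python
# Hypothetical sketch: the names and the codec table below are assumptions,
# not the contents of testdata/bin/generate-schema-statements.py.
PARQUET_CODEC_FOR_FORMAT = {
    "none": "SNAPPY",  # parquet/none keeps the historical default (Snappy)
    "snappy": "SNAPPY",
    "gzip": "GZIP",
    "zstd": "ZSTD",
}


def parquet_codec_set_statement(table_format):
    """Return a SET statement for the codec named in e.g. 'parquet/zstd'."""
    file_format, _, codec = table_format.partition("/")
    if file_format != "parquet":
        raise ValueError("not a Parquet table format: %r" % table_format)
    # A bare 'parquet' is treated like 'parquet/none' (an assumption).
    codec = (codec or "none").lower()
    if codec not in PARQUET_CODEC_FOR_FORMAT:
        raise ValueError("unsupported Parquet codec: %r" % codec)
    return "SET COMPRESSION_CODEC=%s;" % PARQUET_CODEC_FOR_FORMAT[codec]
```

With this sketch, parquet_codec_set_statement("parquet/zstd") yields a statement that selects ZSTD, while "parquet/none" still selects Snappy, which is the backwards-compatible behavior the commit message describes.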
[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts
Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17259 )
Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................
Patch Set 6: Code-Review+2
Carry +2
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Wed, 07 Apr 2021 04:40:02 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts
Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17259 )
Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................
Patch Set 7: Verified+1
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Apr 2021 20:46:10 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17259 )
Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................
Patch Set 1:
(5 comments)
http://gerrit.cloudera.org:8080/#/c/17259/1/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:
http://gerrit.cloudera.org:8080/#/c/17259/1/testdata/bin/generate-schema-statements.py@163
PS1, Line 163: }
flake8: E123 closing bracket does not match indentation of opening bracket's line
http://gerrit.cloudera.org:8080/#/c/17259/1/testdata/bin/generate-schema-statements.py@438
PS1, Line 438: def build_impala_parquet_codec_statement(codec):
flake8: E302 expected 2 blank lines, found 1
http://gerrit.cloudera.org:8080/#/c/17259/1/testdata/bin/generate-schema-statements.py@493
PS1, Line 493: n
flake8: E501 line too long (92 > 90 characters)
http://gerrit.cloudera.org:8080/#/c/17259/1/testdata/bin/generate-schema-statements.py@498
PS1, Line 498: "
flake8: E501 line too long (94 > 90 characters)
http://gerrit.cloudera.org:8080/#/c/17259/1/testdata/bin/generate-schema-statements.py@787
PS1, Line 787: u
flake8: E501 line too long (94 > 90 characters)
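The three flake8 codes reported above are style-only. The snippet below is an illustrative sketch of the shape each fix usually takes; the names are hypothetical and none of these lines come from generate-schema-statements.py.

```python
# Illustrative only: hypothetical names, not lines from the patch.

# E123: the closing bracket lines up with the start of the line that opened it.
CODEC_LABELS = {
    "snappy": "Snappy",
    "zstd": "Zstandard",
}


# E302: two blank lines separate a top-level definition from what precedes it.
def describe_codec(codec):
    # E501: a long string is split across lines inside parentheses so each
    # physical line stays within the configured limit (90 characters here).
    return (
        "Loading Parquet data with the %s codec; "
        "unknown codecs fall back to their raw name."
        % CODEC_LABELS.get(codec, codec)
    )
```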
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Apr 2021 02:34:28 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts
Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17259 )
Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................
Patch Set 7: Code-Review+2
Carry +2
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Apr 2021 03:12:20 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts
Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/17259
to look at the new patch set (#2).
Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................
IMPALA-10629: Fix parquet compression codecs for data load scripts
Currently, the dataload scripts don't respect non-standard
compression codecs when loading Parquet data. They always
load Snappy, even when something else is specified, like
--table_format=parquet/zstd.
This fixes the dataload scripts so that they specify the
compression_codec query option correctly and thus use the
right codec when loading Parquet.
For backwards compatibility, this preserves the behavior
that parquet/none corresponds to the default compression
codec (which is Snappy).
This should make it easier to do performance testing on
various Parquet codecs (like ZSTD).
Testing:
- Ran bin/load-data.py -w tpch --table_format=parquet/zstd
and checked the codec in the file with the parquet-reader
utility
Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
---
M testdata/bin/generate-schema-statements.py
1 file changed, 29 insertions(+), 6 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/17259/2
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17259 )
Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................
Patch Set 1:
Build Successful
https://jenkins.impala.io/job/gerrit-code-review-checks/8486/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Apr 2021 02:53:15 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts
Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17259 )
Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................
IMPALA-10629: Fix parquet compression codecs for data load scripts
Currently, the dataload scripts don't respect non-standard
compression codecs when loading Parquet data. They always
load Snappy, even when something else is specified, like
--table_format=parquet/zstd.
This fixes the dataload scripts so that they specify the
compression_codec query option correctly and thus use the
right codec when loading Parquet.
For backwards compatibility, this preserves the behavior
that parquet/none corresponds to the default compression
codec (which is Snappy).
This should make it easier to do performance testing on
various Parquet codecs (like ZSTD).
Testing:
- Ran bin/load-data.py -w tpch --table_format=parquet/zstd
and checked the codec in the file with the parquet-reader
utility
Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Reviewed-on: http://gerrit.cloudera.org:8080/17259
Reviewed-by: Joe McDonnell <jo...@cloudera.com>
Tested-by: Joe McDonnell <jo...@cloudera.com>
---
M testdata/bin/generate-schema-statements.py
1 file changed, 34 insertions(+), 6 deletions(-)
Approvals:
Joe McDonnell: Looks good to me, approved; Verified
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17259 )
Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................
Patch Set 3:
Build Successful
https://jenkins.impala.io/job/gerrit-code-review-checks/8487/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Apr 2021 02:59:51 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts
Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17259 )
Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................
Patch Set 5: Code-Review+2
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 06 Apr 2021 13:56:43 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17259 )
Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................
Patch Set 4:
Build Successful
https://jenkins.impala.io/job/gerrit-code-review-checks/8488/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Apr 2021 03:14:43 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts
Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17259 )
Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................
Patch Set 7:
IMPALA-9997/IMPALA-9998 stacked on top of this got a +1 verified.
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Apr 2021 20:46:33 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10629: Fix parquet compression codecs for data load scripts
Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/17259
to look at the new patch set (#4).
Change subject: IMPALA-10629: Fix parquet compression codecs for data load scripts
......................................................................
IMPALA-10629: Fix parquet compression codecs for data load scripts
Currently, the dataload scripts don't respect non-standard
compression codecs when loading Parquet data. They always
load Snappy, even when something else is specified, like
--table_format=parquet/zstd.
This fixes the dataload scripts so that they specify the
compression_codec query option correctly and thus use the
right codec when loading Parquet.
For backwards compatibility, this preserves the behavior
that parquet/none corresponds to the default compression
codec (which is Snappy).
This should make it easier to do performance testing on
various Parquet codecs (like ZSTD).
Testing:
- Ran bin/load-data.py -w tpch --table_format=parquet/zstd
and checked the codec in the file with the parquet-reader
utility
Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
---
M testdata/bin/generate-schema-statements.py
1 file changed, 34 insertions(+), 6 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/17259/4
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1a346de3e5c4e38328e5a8ce8162697b7dd6553a
Gerrit-Change-Number: 17259
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>