You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Alex Behm (Code Review)" <ge...@cloudera.org> on 2018/02/04 18:34:23 UTC

[Impala-ASF-CR] IMPALA-5037: Default PARQUET ARRAY RESOLUTION=THREE LEVEL

Alex Behm has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9210


Change subject: IMPALA-5037: Default PARQUET_ARRAY_RESOLUTION=THREE_LEVEL
......................................................................

IMPALA-5037: Default PARQUET_ARRAY_RESOLUTION=THREE_LEVEL

Changes the default value for the PARQUET_ARRAY_RESOLUTION
query option to conform to the Parquet standard.
Before: TWO_LEVEL_THEN_THREE_LEVEL
After:  THREE_LEVEL

For more information see:
* IMPALA-4725
* https://github.com/apache/parquet-format/blob/master/LogicalTypes.md

Testing:
- expands and cleans up the existing tests for more coverage
  over the different resolution policies
- private core/hdfs run passed

Cherry-picks: not for 2.x.

Change-Id: Ib8f7e9010c4354d667305d9df7b78862efb23fe1
---
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_nested_types.py
3 files changed, 138 insertions(+), 183 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/9210/1
-- 
To view, visit http://gerrit.cloudera.org:8080/9210
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib8f7e9010c4354d667305d9df7b78862efb23fe1
Gerrit-Change-Number: 9210
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm <al...@cloudera.com>

[Impala-ASF-CR] IMPALA-5037: Default PARQUET ARRAY RESOLUTION=THREE LEVEL

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9210

to look at the new patch set (#3).

Change subject: IMPALA-5037: Default PARQUET_ARRAY_RESOLUTION=THREE_LEVEL
......................................................................

IMPALA-5037: Default PARQUET_ARRAY_RESOLUTION=THREE_LEVEL

Changes the default value for the PARQUET_ARRAY_RESOLUTION
query option to conform to the Parquet standard.
Before: TWO_LEVEL_THEN_THREE_LEVEL
After:  THREE_LEVEL

For more information see:
* IMPALA-4725
* https://github.com/apache/parquet-format/blob/master/LogicalTypes.md

Testing:
- expands and cleans up the existing tests for more coverage
  over the different resolution policies
- private core/hdfs run passed

Cherry-picks: not for 2.x.

Change-Id: Ib8f7e9010c4354d667305d9df7b78862efb23fe1
---
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_nested_types.py
3 files changed, 153 insertions(+), 193 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/9210/3
-- 
To view, visit http://gerrit.cloudera.org:8080/9210
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib8f7e9010c4354d667305d9df7b78862efb23fe1
Gerrit-Change-Number: 9210
Gerrit-PatchSet: 3
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5037: Default PARQUET ARRAY RESOLUTION=THREE LEVEL

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9210 )

Change subject: IMPALA-5037: Default PARQUET_ARRAY_RESOLUTION=THREE_LEVEL
......................................................................


Patch Set 3: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/9210
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8f7e9010c4354d667305d9df7b78862efb23fe1
Gerrit-Change-Number: 9210
Gerrit-PatchSet: 3
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 06 Feb 2018 22:58:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5037: Default PARQUET ARRAY RESOLUTION=THREE LEVEL

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9210 )

Change subject: IMPALA-5037: Default PARQUET_ARRAY_RESOLUTION=THREE_LEVEL
......................................................................


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1889/


-- 
To view, visit http://gerrit.cloudera.org:8080/9210
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8f7e9010c4354d667305d9df7b78862efb23fe1
Gerrit-Change-Number: 9210
Gerrit-PatchSet: 3
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 06 Feb 2018 19:18:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5037: Default PARQUET ARRAY RESOLUTION=THREE LEVEL

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9210 )

Change subject: IMPALA-5037: Default PARQUET_ARRAY_RESOLUTION=THREE_LEVEL
......................................................................

IMPALA-5037: Default PARQUET_ARRAY_RESOLUTION=THREE_LEVEL

Changes the default value for the PARQUET_ARRAY_RESOLUTION
query option to conform to the Parquet standard.
Before: TWO_LEVEL_THEN_THREE_LEVEL
After:  THREE_LEVEL

For more information see:
* IMPALA-4725
* https://github.com/apache/parquet-format/blob/master/LogicalTypes.md

Testing:
- expands and cleans up the existing tests for more coverage
  over the different resolution policies
- private core/hdfs run passed

Cherry-picks: not for 2.x.

Change-Id: Ib8f7e9010c4354d667305d9df7b78862efb23fe1
Reviewed-on: http://gerrit.cloudera.org:8080/9210
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_nested_types.py
3 files changed, 153 insertions(+), 193 deletions(-)

Approvals:
  Alex Behm: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/9210
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ib8f7e9010c4354d667305d9df7b78862efb23fe1
Gerrit-Change-Number: 9210
Gerrit-PatchSet: 4
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5037: Default PARQUET ARRAY RESOLUTION=THREE LEVEL

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9210 )

Change subject: IMPALA-5037: Default PARQUET_ARRAY_RESOLUTION=THREE_LEVEL
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9210/1/tests/query_test/test_nested_types.py
File tests/query_test/test_nested_types.py:

http://gerrit.cloudera.org:8080/#/c/9210/1/tests/query_test/test_nested_types.py@134
PS1, Line 134: ARRAY_RESOUTION_POLICIES
> Typo: RESOLUTION
Done


http://gerrit.cloudera.org:8080/#/c/9210/1/tests/query_test/test_nested_types.py@208
PS1, Line 208:     self.client.execute("set parquet_array_resolution=%s" % arr_res)
> Will this setting stick around when the client is recycled? Should we be re
Good point. Switched to self.execute_query() which internally calls uses self.client.set_configuration().



-- 
To view, visit http://gerrit.cloudera.org:8080/9210
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8f7e9010c4354d667305d9df7b78862efb23fe1
Gerrit-Change-Number: 9210
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 06 Feb 2018 02:18:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5037: Default PARQUET ARRAY RESOLUTION=THREE LEVEL

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9210 )

Change subject: IMPALA-5037: Default PARQUET_ARRAY_RESOLUTION=THREE_LEVEL
......................................................................


Patch Set 3: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/9210
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8f7e9010c4354d667305d9df7b78862efb23fe1
Gerrit-Change-Number: 9210
Gerrit-PatchSet: 3
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 06 Feb 2018 19:17:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5037: Default PARQUET ARRAY RESOLUTION=THREE LEVEL

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9210 )

Change subject: IMPALA-5037: Default PARQUET_ARRAY_RESOLUTION=THREE_LEVEL
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9210/1/tests/query_test/test_nested_types.py
File tests/query_test/test_nested_types.py:

http://gerrit.cloudera.org:8080/#/c/9210/1/tests/query_test/test_nested_types.py@134
PS1, Line 134: ARRAY_RESOUTION_POLICIES
Typo: RESOLUTION


http://gerrit.cloudera.org:8080/#/c/9210/1/tests/query_test/test_nested_types.py@208
PS1, Line 208:     self.client.execute("set parquet_array_resolution=%s" % arr_res)
Will this setting stick around when the client is recycled? Should we be resetting it?

Should we also be using client.set_configuration() instead of executing the server-side statement?



-- 
To view, visit http://gerrit.cloudera.org:8080/9210
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8f7e9010c4354d667305d9df7b78862efb23fe1
Gerrit-Change-Number: 9210
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 06 Feb 2018 01:25:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5037: Default PARQUET ARRAY RESOLUTION=THREE LEVEL

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9210 )

Change subject: IMPALA-5037: Default PARQUET_ARRAY_RESOLUTION=THREE_LEVEL
......................................................................


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9210/2/tests/query_test/test_nested_types.py
File tests/query_test/test_nested_types.py:

http://gerrit.cloudera.org:8080/#/c/9210/2/tests/query_test/test_nested_types.py@355
PS2, Line 355:     arr_res = vector.get_value('parquet_array_resolution')
Seems ok, but we could probably factor these three lines out into a function.



-- 
To view, visit http://gerrit.cloudera.org:8080/9210
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8f7e9010c4354d667305d9df7b78862efb23fe1
Gerrit-Change-Number: 9210
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 06 Feb 2018 16:37:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5037: Default PARQUET ARRAY RESOLUTION=THREE LEVEL

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9210 )

Change subject: IMPALA-5037: Default PARQUET_ARRAY_RESOLUTION=THREE_LEVEL
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9210/2/tests/query_test/test_nested_types.py
File tests/query_test/test_nested_types.py:

http://gerrit.cloudera.org:8080/#/c/9210/2/tests/query_test/test_nested_types.py@355
PS2, Line 355:     arr_res = vector.get_value('parquet_array_resolution')
> Seems ok, but we could probably factor these three lines out into a functio
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/9210
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8f7e9010c4354d667305d9df7b78862efb23fe1
Gerrit-Change-Number: 9210
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 06 Feb 2018 19:16:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5037: Default PARQUET ARRAY RESOLUTION=THREE LEVEL

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9210

to look at the new patch set (#2).

Change subject: IMPALA-5037: Default PARQUET_ARRAY_RESOLUTION=THREE_LEVEL
......................................................................

IMPALA-5037: Default PARQUET_ARRAY_RESOLUTION=THREE_LEVEL

Changes the default value for the PARQUET_ARRAY_RESOLUTION
query option to conform to the Parquet standard.
Before: TWO_LEVEL_THEN_THREE_LEVEL
After:  THREE_LEVEL

For more information see:
* IMPALA-4725
* https://github.com/apache/parquet-format/blob/master/LogicalTypes.md

Testing:
- expands and cleans up the existing tests for more coverage
  over the different resolution policies
- private core/hdfs run passed

Cherry-picks: not for 2.x.

Change-Id: Ib8f7e9010c4354d667305d9df7b78862efb23fe1
---
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_nested_types.py
3 files changed, 155 insertions(+), 193 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/9210/2
-- 
To view, visit http://gerrit.cloudera.org:8080/9210
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib8f7e9010c4354d667305d9df7b78862efb23fe1
Gerrit-Change-Number: 9210
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>