You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Tim Armstrong (Code Review)" <ge...@cloudera.org> on 2018/02/07 18:36:07 UTC

[Impala-ASF-CR] IMPALA-6077: remove Parquet BIT PACKED def level support

Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9241


Change subject: IMPALA-6077: remove Parquet BIT_PACKED def level support
......................................................................

IMPALA-6077: remove Parquet BIT_PACKED def level support

The encoding was added in an early version of the Parquet
spec and deprecated even in the Parquet 1.0 spec.

Parquet-MR switched to generating RLE at the same time as
the spec changed in mid-2013. Impala always wrote RLE:
see commit 6e293090e60aea300f9e83db67f56a5efd07c35c.

The Impala implementation of BIT_PACKED was never correct
because it implemented little endian bit unpacking instead of
the big endian unpacking required by the spec for levels.

Testing:
Updated tests to reflect expected behaviour for supported
and unsupported def level encodings.

Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M common/thrift/generate_error_codes.py
M testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test
M tests/query_test/test_scanners.py
5 files changed, 37 insertions(+), 32 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/9241/1
-- 
To view, visit http://gerrit.cloudera.org:8080/9241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
Gerrit-Change-Number: 9241
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-6077: remove Parquet BIT PACKED def level support

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9241 )

Change subject: IMPALA-6077: remove Parquet BIT_PACKED def level support
......................................................................


Patch Set 2:

(4 comments)

Thanks for catching my mistake in the tests.

http://gerrit.cloudera.org:8080/#/c/9241/2/testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test:

http://gerrit.cloudera.org:8080/#/c/9241/2/testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test@55
PS2, Line 55: # IMPALA-6077: unsupported BIT_PACKED encoding fails when materializing columns.
            : select count(id), count(tinyint_col), count(smallint_col), count(int_col),
            :   count(bigint_col), count(float_col), count(double_col), count(date_string_col),
            :   count(string_col), count(timestamp_col), count(year), count(month), count(day)
            : from alltypesagg_bitpacked
> This query seems to be the same as the next query, and should not materiali
Done


http://gerrit.cloudera.org:8080/#/c/9241/2/testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test@64
PS2, Line 64: materializing
> I am not 100% sure about this, but I think that if a column is not complex,
Yeah that's a good point, I switched it to just selecting the column value. We don't currently serve count(colname) from stats but could easily since it only depends on the non-null count.

I also made a mistake here forgetting to update a query - one of them is meant to be a count(*), which we have always served from stats.


http://gerrit.cloudera.org:8080/#/c/9241/2/testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test@65
PS2, Line 65: select count(id), count(tinyint_col), count(smallint_col), count(int_col),
            :   count(bigint_col), count(float_col), count(double_col), count(date_string_col),
            :   count(string_col), count(timestamp_col), count(year), count(month), count(day)
> Is it necessary to list every column here? If one column is enough for the 
Done


http://gerrit.cloudera.org:8080/#/c/9241/2/testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test@71
PS2, Line 71: ====
I also added a couple of queries to future-proof this, where we could serve them from metadata in theory but don't right now.



-- 
To view, visit http://gerrit.cloudera.org:8080/9241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
Gerrit-Change-Number: 9241
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 09 Feb 2018 16:45:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6077: remove Parquet BIT PACKED def level support

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, Csaba Ringhofer, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9241

to look at the new patch set (#3).

Change subject: IMPALA-6077: remove Parquet BIT_PACKED def level support
......................................................................

IMPALA-6077: remove Parquet BIT_PACKED def level support

The encoding was added in an early version of the Parquet
spec and deprecated even in the Parquet 1.0 spec.

Parquet-MR switched to generating RLE at the same time as
the spec changed in mid-2013. Impala always wrote RLE:
see commit 6e293090e60aea300f9e83db67f56a5efd07c35c.

The Impala implementation of BIT_PACKED was never correct
because it implemented little endian bit unpacking instead of
the big endian unpacking required by the spec for levels.

Testing:
Updated tests to reflect expected behaviour for supported
and unsupported def level encodings.

Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M common/thrift/generate_error_codes.py
M testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test
M tests/query_test/test_scanners.py
5 files changed, 53 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/9241/3
-- 
To view, visit http://gerrit.cloudera.org:8080/9241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
Gerrit-Change-Number: 9241
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>

[Impala-ASF-CR] IMPALA-6077: remove Parquet BIT PACKED def level support

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9241 )

Change subject: IMPALA-6077: remove Parquet BIT_PACKED def level support
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1918/


-- 
To view, visit http://gerrit.cloudera.org:8080/9241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
Gerrit-Change-Number: 9241
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sat, 10 Feb 2018 04:30:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6077: remove Parquet BIT PACKED def level support

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9241 )

Change subject: IMPALA-6077: remove Parquet BIT_PACKED def level support
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1920/


-- 
To view, visit http://gerrit.cloudera.org:8080/9241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
Gerrit-Change-Number: 9241
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 12 Feb 2018 18:05:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6077: remove Parquet BIT PACKED def level support

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9241 )

Change subject: IMPALA-6077: remove Parquet BIT_PACKED def level support
......................................................................


Patch Set 3: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/9241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
Gerrit-Change-Number: 9241
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 09 Feb 2018 19:42:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6077: remove Parquet BIT PACKED def level support

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9241 )

Change subject: IMPALA-6077: remove Parquet BIT_PACKED def level support
......................................................................


Patch Set 5: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1915/


-- 
To view, visit http://gerrit.cloudera.org:8080/9241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
Gerrit-Change-Number: 9241
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sat, 10 Feb 2018 01:44:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6077: remove Parquet BIT PACKED def level support

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9241 )

Change subject: IMPALA-6077: remove Parquet BIT_PACKED def level support
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9241/2/testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test:

http://gerrit.cloudera.org:8080/#/c/9241/2/testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test@55
PS2, Line 55: # IMPALA-6077: unsupported BIT_PACKED encoding fails when materializing columns.
            : select count(id), count(tinyint_col), count(smallint_col), count(int_col),
            :   count(bigint_col), count(float_col), count(double_col), count(date_string_col),
            :   count(string_col), count(timestamp_col), count(year), count(month), count(day)
            : from alltypesagg_bitpacked
This query seems to be the same as the next query, and should not materialize columns.


http://gerrit.cloudera.org:8080/#/c/9241/2/testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test@64
PS2, Line 64: materializing
I am not 100% sure about this, but I think that if a column is not complex, and the stats are filled, then count can be served from column chunk stats without reading any data page, so this error will not be returned.

This may not be a problem for this specific parquet file, but I would mention it in a comment, or replace the query with something that has to read the data pages.


http://gerrit.cloudera.org:8080/#/c/9241/2/testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test@65
PS2, Line 65: select count(id), count(tinyint_col), count(smallint_col), count(int_col),
            :   count(bigint_col), count(float_col), count(double_col), count(date_string_col),
            :   count(string_col), count(timestamp_col), count(year), count(month), count(day)
Is it necessary to list every column here? If one column is enough for the test, then I would prefer if it were shorter (for the sake of readability).



-- 
To view, visit http://gerrit.cloudera.org:8080/9241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
Gerrit-Change-Number: 9241
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Comment-Date: Fri, 09 Feb 2018 14:58:23 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6077: remove Parquet BIT PACKED def level support

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, Csaba Ringhofer, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9241

to look at the new patch set (#4).

Change subject: IMPALA-6077: remove Parquet BIT_PACKED def level support
......................................................................

IMPALA-6077: remove Parquet BIT_PACKED def level support

The encoding was added in an early version of the Parquet
spec and deprecated even in the Parquet 1.0 spec.

Parquet-MR switched to generating RLE at the same time as
the spec changed in mid-2013. Impala always wrote RLE:
see commit 6e293090e60aea300f9e83db67f56a5efd07c35c.

The Impala implementation of BIT_PACKED was never correct
because it implemented little endian bit unpacking instead of
the big endian unpacking required by the spec for levels.

Testing:
Updated tests to reflect expected behaviour for supported
and unsupported def level encodings.

Cherry-picks: not for 2.x.

Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M common/thrift/generate_error_codes.py
M testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test
M tests/query_test/test_scanners.py
5 files changed, 53 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/9241/4
-- 
To view, visit http://gerrit.cloudera.org:8080/9241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
Gerrit-Change-Number: 9241
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-6077: remove Parquet BIT PACKED def level support

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9241 )

Change subject: IMPALA-6077: remove Parquet BIT_PACKED def level support
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1915/


-- 
To view, visit http://gerrit.cloudera.org:8080/9241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
Gerrit-Change-Number: 9241
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 09 Feb 2018 21:51:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6077: remove Parquet BIT PACKED def level support

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/9241 )

Change subject: IMPALA-6077: remove Parquet BIT_PACKED def level support
......................................................................


Patch Set 3: Code-Review+2

I assume this is not going to the 2.x branch?


-- 
To view, visit http://gerrit.cloudera.org:8080/9241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
Gerrit-Change-Number: 9241
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 09 Feb 2018 21:38:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6077: remove Parquet BIT PACKED def level support

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9241 )

Change subject: IMPALA-6077: remove Parquet BIT_PACKED def level support
......................................................................


Patch Set 5:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1918/


-- 
To view, visit http://gerrit.cloudera.org:8080/9241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
Gerrit-Change-Number: 9241
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sat, 10 Feb 2018 08:04:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6077: remove Parquet BIT PACKED def level support

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9241 )

Change subject: IMPALA-6077: remove Parquet BIT_PACKED def level support
......................................................................


Patch Set 6: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/9241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
Gerrit-Change-Number: 9241
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 12 Feb 2018 18:05:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6077: remove Parquet BIT PACKED def level support

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9241 )

Change subject: IMPALA-6077: remove Parquet BIT_PACKED def level support
......................................................................


Patch Set 6: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/9241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
Gerrit-Change-Number: 9241
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 12 Feb 2018 21:59:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6077: remove Parquet BIT PACKED def level support

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9241 )

Change subject: IMPALA-6077: remove Parquet BIT_PACKED def level support
......................................................................

IMPALA-6077: remove Parquet BIT_PACKED def level support

The encoding was added in an early version of the Parquet
spec and deprecated even in the Parquet 1.0 spec.

Parquet-MR switched to generating RLE at the same time as
the spec changed in mid-2013. Impala always wrote RLE:
see commit 6e293090e60aea300f9e83db67f56a5efd07c35c.

The Impala implementation of BIT_PACKED was never correct
because it implemented little endian bit unpacking instead of
the big endian unpacking required by the spec for levels.

Testing:
Updated tests to reflect expected behaviour for supported
and unsupported def level encodings.

Cherry-picks: not for 2.x.

Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
Reviewed-on: http://gerrit.cloudera.org:8080/9241
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M common/thrift/generate_error_codes.py
M testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test
M tests/query_test/test_scanners.py
5 files changed, 53 insertions(+), 32 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/9241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
Gerrit-Change-Number: 9241
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-6077: remove Parquet BIT PACKED def level support

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9241 )

Change subject: IMPALA-6077: remove Parquet BIT_PACKED def level support
......................................................................


Patch Set 5: Code-Review+2

Good point, updated the commit message.


-- 
To view, visit http://gerrit.cloudera.org:8080/9241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
Gerrit-Change-Number: 9241
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 09 Feb 2018 21:51:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6077: remove Parquet BIT PACKED def level support

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9241

to look at the new patch set (#2).

Change subject: IMPALA-6077: remove Parquet BIT_PACKED def level support
......................................................................

IMPALA-6077: remove Parquet BIT_PACKED def level support

The encoding was added in an early version of the Parquet
spec and deprecated even in the Parquet 1.0 spec.

Parquet-MR switched to generating RLE at the same time as
the spec changed in mid-2013. Impala always wrote RLE:
see commit 6e293090e60aea300f9e83db67f56a5efd07c35c.

The Impala implementation of BIT_PACKED was never correct
because it implemented little endian bit unpacking instead of
the big endian unpacking required by the spec for levels.

Testing:
Updated tests to reflect expected behaviour for supported
and unsupported def level encodings.

Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M common/thrift/generate_error_codes.py
M testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test
M tests/query_test/test_scanners.py
5 files changed, 37 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/9241/2
-- 
To view, visit http://gerrit.cloudera.org:8080/9241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
Gerrit-Change-Number: 9241
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>