Posted to reviews@impala.apache.org by "Daniel Becker (Code Review)" <ge...@cloudera.org> on 2019/04/26 11:30:53 UTC

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12621 )


Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................

IMPALA-8253: Draft - Parquet delta encoding and decoding.

Implemented an encoder and decoder for the Parquet delta encoding (see
https://github.com/apache/parquet-format/blob/master/Encodings.md).
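
As a rough illustration of the core idea in the linked Encodings.md (store a
first value, then a minimum delta, then the remaining deltas relative to that
minimum in as few bits as possible), here is a minimal sketch. It is not code
from this change: DeltaBlock and ComputeDeltaBlock are hypothetical names, and
the sketch treats the whole input as one block, whereas the real format splits
values into blocks and miniblocks, each miniblock with its own bit width.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type for illustration only; not part of the patch.
struct DeltaBlock {
  int64_t first_value = 0;                // stored once in the header
  int64_t min_delta = 0;                  // smallest delta in the block
  std::vector<uint64_t> relative_deltas;  // delta - min_delta, never negative
  int bit_width = 0;                      // bits needed for the largest relative delta
};

// Computes what a delta encoder would have to store for 'values', treating the
// whole input as a single block (an assumption made to keep the sketch short).
inline DeltaBlock ComputeDeltaBlock(const std::vector<int64_t>& values) {
  DeltaBlock block;
  if (values.empty()) return block;
  block.first_value = values[0];
  if (values.size() == 1) return block;

  // Differences of consecutive values. Unsigned arithmetic keeps wrap-around
  // well defined when the difference does not fit in int64_t.
  std::vector<int64_t> deltas;
  for (size_t i = 1; i < values.size(); ++i) {
    deltas.push_back(static_cast<int64_t>(
        static_cast<uint64_t>(values[i]) - static_cast<uint64_t>(values[i - 1])));
  }
  block.min_delta = *std::min_element(deltas.begin(), deltas.end());

  // Subtracting the minimum makes every stored delta non-negative.
  uint64_t max_relative = 0;
  for (int64_t d : deltas) {
    uint64_t rel = static_cast<uint64_t>(d) - static_cast<uint64_t>(block.min_delta);
    block.relative_deltas.push_back(rel);
    max_relative = std::max(max_relative, rel);
  }

  // Number of bits needed to bit-pack the largest relative delta.
  while (max_relative != 0) {
    ++block.bit_width;
    max_relative >>= 1;
  }
  return block;
}
```

For the input 7, 5, 3, 1, 2, 3, 4, 5 the deltas are -2, -2, -2, 1, 1, 1, 1, so
the minimum delta is -2 and the relative deltas 0, 0, 0, 3, 3, 3, 3 fit in
2 bits each.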

The coders are not integrated with Impala yet; they provide an interface
that Impala could use.

TODO: Currently the delta coders only support 32-bit integers. For
64-bit integers, we have to extend the functionality of BitWriter and
BatchedBitReader.

Added new methods to BitWriter and BatchedBitReader handling Uleb and
ZigZag integers for 64 bits.

Testing:
  - Added new tests for the encoder and decoder
  - Tests covering the additions in BitWriter and BatchedBitReader.

Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/parquet-delta-benchmark.cc
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/parquet-common.h
A be/src/exec/parquet/parquet-delta-coder-test-data.h
A be/src/exec/parquet/parquet-delta-coder-test.cc
A be/src/exec/parquet/parquet-delta-decoder.h
A be/src/exec/parquet/parquet-delta-encoder.h
M be/src/util/CMakeLists.txt
M be/src/util/bit-packing-test.cc
M be/src/util/bit-packing.h
M be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils-test.cc
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
15 files changed, 4,238 insertions(+), 56 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/12621/6
-- 
To view, visit http://gerrit.cloudera.org:8080/12621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 6
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12621/8/be/src/exec/parquet/parquet-delta-coder-test-data.h
File be/src/exec/parquet/parquet-delta-coder-test-data.h:

http://gerrit.cloudera.org:8080/#/c/12621/8/be/src/exec/parquet/parquet-delta-coder-test-data.h@454
PS8, Line 454: const std::vector<int32_t> values_are_the_same_plain = {3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/12621/8/be/src/exec/parquet/parquet-delta-coder-test-data.h@471
PS8, Line 471: const std::vector<int32_t> delta_is_zero_for_each_block_plain = {0, 0, 0, 0, 0, 0, 0, 0, 0,
line too long (91 > 90)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 29 Apr 2019 10:00:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................


Patch Set 7:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/2933/ : Initial code review checks failed. See linked job for details on the failure.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Apr 2019 13:26:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Parquet delta encoding and decoding.
......................................................................

IMPALA-8253: Parquet delta encoding and decoding.

Implemented an encoder and decoder for the Parquet delta encoding (see
https://github.com/apache/parquet-format/blob/master/Encodings.md).

The coders are not integrated with Impala yet; they provide an interface
that Impala could use.

Added new methods to BitWriter and BatchedBitReader handling Uleb and
ZigZag integers for 64 bits.
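
For readers unfamiliar with the two integer formats named above, the following
is a minimal standalone sketch of 64-bit ZigZag and ULEB128 coding. It does not
reproduce the BitWriter or BatchedBitReader methods added by this change; all
function names and signatures below are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// ZigZag maps signed integers to unsigned ones so that values of small
// magnitude get small codes: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
inline uint64_t ZigZagEncode64(int64_t v) {
  const uint64_t sign_mask = v < 0 ? ~uint64_t{0} : uint64_t{0};
  return (static_cast<uint64_t>(v) << 1) ^ sign_mask;
}

inline int64_t ZigZagDecode64(uint64_t u) {
  return static_cast<int64_t>(u >> 1) ^ -static_cast<int64_t>(u & 1);
}

// ULEB128 stores 7 payload bits per byte, least significant group first.
// The high bit of each byte signals that another byte follows.
inline void PutUleb128(uint64_t v, std::vector<uint8_t>* out) {
  while (v >= 0x80) {
    out->push_back(static_cast<uint8_t>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out->push_back(static_cast<uint8_t>(v));
}

// Reads one ULEB128 value from 'buf'. Returns false if the input ends before
// the value is complete or if it is longer than the 10 bytes a 64-bit value
// can need. Excess high bits in the tenth byte are silently dropped here; a
// production decoder would probably want to reject them.
inline bool GetUleb128(const uint8_t* buf, size_t len, uint64_t* v, size_t* used) {
  uint64_t result = 0;
  for (size_t i = 0; i < len && i < 10; ++i) {
    result |= static_cast<uint64_t>(buf[i] & 0x7F) << (7 * i);
    if ((buf[i] & 0x80) == 0) {
      *v = result;
      *used = i + 1;
      return true;
    }
  }
  return false;
}
```

A signed value would typically be written as PutUleb128(ZigZagEncode64(x), &out)
and read back by applying ZigZagDecode64 to the result of GetUleb128.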

Also added a benchmark (parquet-delta-benchmark.cc) that compares the
space and CPU performance of plain, dictionary and delta encoding.

Testing:
  - Added new tests for the encoder and decoder
  - Tests covering the additions in BitPacking, BitWriter and
    BatchedBitReader.

Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Reviewed-on: http://gerrit.cloudera.org:8080/12621
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/parquet-delta-benchmark.cc
M be/src/exec/parquet/CMakeLists.txt
A be/src/exec/parquet/parquet-delta-coder-test-data.h
A be/src/exec/parquet/parquet-delta-coder-test.cc
A be/src/exec/parquet/parquet-delta-decoder.cc
A be/src/exec/parquet/parquet-delta-decoder.h
A be/src/exec/parquet/parquet-delta-encoder.cc
A be/src/exec/parquet/parquet-delta-encoder.h
M be/src/util/bit-packing-test.cc
M be/src/util/bit-packing.cc
M be/src/util/bit-packing.h
M be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
15 files changed, 4,888 insertions(+), 13 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 22
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................


Patch Set 13:

We decided to split this into multiple subtasks. A part of this change (with some modifications) is https://gerrit.cloudera.org/#/c/13737/.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 13
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 27 Jun 2019 12:09:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, Zoltan Borok-Nagy, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12621

to look at the new patch set (#7).

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................

IMPALA-8253: Draft - Parquet delta encoding and decoding.

Implemented an encoder and decoder for the Parquet delta encoding (see
https://github.com/apache/parquet-format/blob/master/Encodings.md).

The coders are not integrated with Impala yet; they provide an interface
that Impala could use.

TODO: Currently the delta coders only support 32-bit integers. For
64-bit integers, we have to extend the functionality of BitWriter and
BatchedBitReader.

Added new methods to BitWriter and BatchedBitReader handling Uleb and
ZigZag integers for 64 bits.

Testing:
  - Added new tests for the encoder and decoder
  - Tests covering the additions in BitWriter and BatchedBitReader.

Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/parquet-delta-benchmark.cc
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/parquet-common.h
A be/src/exec/parquet/parquet-delta-coder-test-data.h
A be/src/exec/parquet/parquet-delta-coder-test.cc
A be/src/exec/parquet/parquet-delta-decoder.h
A be/src/exec/parquet/parquet-delta-encoder.h
M be/src/util/CMakeLists.txt
M be/src/util/bit-packing-test.cc
M be/src/util/bit-packing.h
M be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils-test.cc
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
15 files changed, 4,239 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/12621/7

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8253: Parquet delta encoding and decoding.

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Parquet delta encoding and decoding.
......................................................................


Patch Set 16:

Hmm, this CR somehow got forgotten. Anyway, I'm planning to take a look in the coming days. Daniel, do you plan to continue this work?



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 16
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Jun 2020 19:12:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................

IMPALA-8253: Draft - Parquet delta encoding and decoding.

Implemented an encoder and decoder for the Parquet delta encoding (see
https://github.com/apache/parquet-format/blob/master/Encodings.md).

The coders are not integrated with Impala yet; they provide an interface
that Impala could use.

Added new methods to BitWriter and BatchedBitReader handling Uleb and
ZigZag integers for 64 bits.

Testing:
  - Added new tests for the encoder and decoder
  - Tests covering the additions in BitPacking, BitWriter and
    BatchedBitReader.

Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/parquet-delta-benchmark.cc
M be/src/exec/parquet/CMakeLists.txt
A be/src/exec/parquet/parquet-delta-coder-test-data.h
A be/src/exec/parquet/parquet-delta-coder-test.cc
A be/src/exec/parquet/parquet-delta-decoder.h
A be/src/exec/parquet/parquet-delta-encoder.h
M be/src/util/bit-packing-test.cc
M be/src/util/bit-packing.cc
M be/src/util/bit-packing.h
M be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
13 files changed, 4,227 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/12621/14

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 14
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................

IMPALA-8253: Draft - Parquet delta encoding and decoding.

Implemented an encoder and decoder for the Parquet delta encoding (see
https://github.com/apache/parquet-format/blob/master/Encodings.md).

The coders are not integrated with Impala yet; they provide an interface
that Impala could use.

Added new methods to BitWriter and BatchedBitReader handling Uleb and
ZigZag integers for 64 bits.

Testing:
  - Added new tests for the encoder and decoder
  - Tests covering the additions in BitWriter and BatchedBitReader.

Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/parquet-delta-benchmark.cc
M be/src/exec/parquet/CMakeLists.txt
A be/src/exec/parquet/parquet-delta-coder-test-data.h
A be/src/exec/parquet/parquet-delta-coder-test.cc
A be/src/exec/parquet/parquet-delta-decoder.h
A be/src/exec/parquet/parquet-delta-encoder.h
M be/src/util/CMakeLists.txt
M be/src/util/bit-packing-test.cc
M be/src/util/bit-packing.h
M be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils-test.cc
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
14 files changed, 4,247 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/12621/10

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 10
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8253: Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Parquet delta encoding and decoding.
......................................................................


Patch Set 21: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 21
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 May 2023 15:23:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Parquet delta encoding and decoding.

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#19). ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Parquet delta encoding and decoding.
......................................................................

IMPALA-8253: Parquet delta encoding and decoding.

Implemented an encoder and decoder for the Parquet delta encoding (see
https://github.com/apache/parquet-format/blob/master/Encodings.md).

The coders are not integrated with Impala yet; they provide an interface
that Impala could use.

Added new methods to BitWriter and BatchedBitReader handling Uleb and
ZigZag integers for 64 bits.

TODO: Mention the benchmarks.

Testing:
  - Added new tests for the encoder and decoder
  - Tests covering the additions in BitPacking, BitWriter and
    BatchedBitReader.

Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/parquet-delta-benchmark.cc
M be/src/exec/parquet/CMakeLists.txt
A be/src/exec/parquet/parquet-delta-coder-test-data.h
A be/src/exec/parquet/parquet-delta-coder-test.cc
A be/src/exec/parquet/parquet-delta-decoder.cc
A be/src/exec/parquet/parquet-delta-decoder.h
A be/src/exec/parquet/parquet-delta-encoder.cc
A be/src/exec/parquet/parquet-delta-encoder.h
M be/src/util/bit-packing-test.cc
M be/src/util/bit-packing.cc
M be/src/util/bit-packing.h
M be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
15 files changed, 4,751 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/12621/19

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 19
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................


Patch Set 10:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/2963/ : Initial code review checks failed. See linked job for details on the failure.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 10
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 29 Apr 2019 13:35:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Parquet delta encoding and decoding.

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Parquet delta encoding and decoding.
......................................................................


Patch Set 16:

(8 comments)

I've gone over it and left a few nit comments.

http://gerrit.cloudera.org:8080/#/c/12621/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12621/16//COMMIT_MSG@17
PS16, Line 17: 
Could you write something about the benchmark? At least state the results of the measurements.


http://gerrit.cloudera.org:8080/#/c/12621/16/be/src/benchmarks/parquet-delta-benchmark.cc
File be/src/benchmarks/parquet-delta-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/12621/16/be/src/benchmarks/parquet-delta-benchmark.cc@436
PS16, Line 436: ,
nit: line too long


http://gerrit.cloudera.org:8080/#/c/12621/16/be/src/benchmarks/parquet-delta-benchmark.cc@480
PS16, Line 480: ride);
nit: line too long


http://gerrit.cloudera.org:8080/#/c/12621/16/be/src/benchmarks/parquet-delta-benchmark.cc@1058
PS16, Line 1058: 200, 400};
nit: line too long


http://gerrit.cloudera.org:8080/#/c/12621/16/be/src/benchmarks/parquet-delta-benchmark.cc@1068
PS16, Line 1068: tride == 100 || config.stride == 400 */)
nit: line too long


http://gerrit.cloudera.org:8080/#/c/12621/16/be/src/exec/parquet/parquet-delta-decoder.h
File be/src/exec/parquet/parquet-delta-decoder.h:

http://gerrit.cloudera.org:8080/#/c/12621/16/be/src/exec/parquet/parquet-delta-decoder.h@44
PS16, Line 44:     {
             :     }
nit: you could put these on L43.


http://gerrit.cloudera.org:8080/#/c/12621/2/be/src/exec/parquet/parquet-delta-encoder.h
File be/src/exec/parquet/parquet-delta-encoder.h:

http://gerrit.cloudera.org:8080/#/c/12621/2/be/src/exec/parquet/parquet-delta-encoder.h@149
PS2, Line 149: er_pos_, output_buffer_len
> Checking the code of CountLeadingZeros, I can see it uses __builtin_clz, wh
Yeah, CountLeadingZeros should check this.
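
The quoted comment above is cut off, so the exact concern is not recoverable
here. One well-known pitfall it may refer to is that the GCC/Clang builtin
__builtin_clz (and its 64-bit variant __builtin_clzll) has an undefined result
for an input of 0. Under that assumption, a guarded wrapper could look like
the sketch below. SafeCountLeadingZeros64 and BitWidth64 are hypothetical names
and are not Impala's CountLeadingZeros.

```cpp
#include <cstdint>

// Hypothetical wrapper: returns 64 for a zero input instead of passing 0 to
// the builtin, whose result is undefined in that case. Assumes a GCC or Clang
// compiler and a 64-bit unsigned long long.
inline int SafeCountLeadingZeros64(uint64_t v) {
  if (v == 0) return 64;
  return __builtin_clzll(v);
}

// Number of bits needed to represent 'v'; 0 needs no bits at all, which is
// the case a delta encoder hits when all deltas in a miniblock are equal.
inline int BitWidth64(uint64_t v) {
  return 64 - SafeCountLeadingZeros64(v);
}
```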


http://gerrit.cloudera.org:8080/#/c/12621/16/be/src/util/bit-packing.cc
File be/src/util/bit-packing.cc:

http://gerrit.cloudera.org:8080/#/c/12621/16/be/src/util/bit-packing.cc@69
PS16, Line 69: bool* __restrict__ decode_error);
nit: fits previous line




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 16
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 29 Jun 2020 17:10:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8253: Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Parquet delta encoding and decoding.
......................................................................


Patch Set 16:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5015/ DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 16
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 27 Sep 2019 13:13:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................


Patch Set 8:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/2953/ : Initial code review checks failed. See linked job for details on the failure.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 29 Apr 2019 10:20:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................

IMPALA-8253: Draft - Parquet delta encoding and decoding.

Implemented an encoder and decoder for the Parquet delta encoding (see
https://github.com/apache/parquet-format/blob/master/Encodings.md).

The coders are not integrated with Impala yet; they provide an interface
that Impala could use.

Added new methods to BitWriter and BatchedBitReader handling Uleb and
ZigZag integers for 64 bits.

Testing:
  - Added new tests for the encoder and decoder
  - Tests covering the additions in BitWriter and BatchedBitReader.

Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/parquet-delta-benchmark.cc
M be/src/exec/parquet/CMakeLists.txt
A be/src/exec/parquet/parquet-delta-coder-test-data.h
A be/src/exec/parquet/parquet-delta-coder-test.cc
A be/src/exec/parquet/parquet-delta-decoder.h
A be/src/exec/parquet/parquet-delta-encoder.h
M be/src/util/CMakeLists.txt
M be/src/util/bit-packing-test.cc
M be/src/util/bit-packing.h
M be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils-test.cc
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
14 files changed, 4,247 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/12621/12

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 12
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, Zoltan Borok-Nagy, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12621

to look at the new patch set (#8).

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................

IMPALA-8253: Draft - Parquet delta encoding and decoding.

Implemented an encoder and decoder for the Parquet delta encoding (see
https://github.com/apache/parquet-format/blob/master/Encodings.md).

The coders are not integrated with Impala yet; they provide an interface
that Impala could use.

TODO: Currently the delta coders only support 32-bit integers. For
64-bit integers, we have to extend the functionality of BitWriter and
BatchedBitReader.

Added new methods to BitWriter and BatchedBitReader handling Uleb and
ZigZag integers for 64 bits.

Testing:
  - Added new tests for the encoder and decoder
  - Tests covering the additions in BitWriter and BatchedBitReader.

Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
---
M be/src/exec/parquet/CMakeLists.txt
A be/src/exec/parquet/parquet-delta-coder-test-data.h
A be/src/exec/parquet/parquet-delta-coder-test.cc
A be/src/exec/parquet/parquet-delta-decoder.h
A be/src/exec/parquet/parquet-delta-encoder.h
M be/src/util/bit-stream-utils-test.cc
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
8 files changed, 2,123 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/12621/8
-- 
To view, visit http://gerrit.cloudera.org:8080/12621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8253: Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Parquet delta encoding and decoding.
......................................................................


Patch Set 16: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5009/


-- 
To view, visit http://gerrit.cloudera.org:8080/12621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 16
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 26 Sep 2019 22:50:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Parquet delta encoding and decoding.
......................................................................


Patch Set 19:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12871/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 19
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 26 Apr 2023 09:51:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12621/8/be/src/exec/parquet/parquet-delta-coder-test-data.h
File be/src/exec/parquet/parquet-delta-coder-test-data.h:

PS8: 
This patch set was added by mistake.



-- 
To view, visit http://gerrit.cloudera.org:8080/12621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 29 Apr 2019 10:02:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8253: Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Parquet delta encoding and decoding.
......................................................................


Patch Set 16: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/12621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 16
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 27 Sep 2019 17:25:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................


Patch Set 12:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3715/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 12
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Sat, 22 Jun 2019 10:30:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................


Patch Set 6:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/2932/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/12621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 6
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Apr 2019 12:21:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Parquet delta encoding and decoding.

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Parquet delta encoding and decoding.
......................................................................


Patch Set 19:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/benchmarks/parquet-delta-benchmark.cc
File be/src/benchmarks/parquet-delta-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/benchmarks/parquet-delta-benchmark.cc@1060
PS19, Line 1060:   const std::vector<int> strides = {4, 8, 12, 16, 20, 30, 40, 50, 80, 100, 120, 150, 180, 200, 400};
nit: long line


http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/benchmarks/parquet-delta-benchmark.cc@1062
PS19, Line 1062:   /// This is used to tune which configurations should be measured.
               :   auto filter = [] (const Config& config) -> bool {
               :     return true
               :         && config.parquet_type == Config::INT32
               :         && config.out_type == Config::int32
               :         // && (int) config.parquet_type == (int) config.out_type
               :         && config.access == Config::BATCH
               :         // && config.encoding == Config::PLAIN && config.mean_delta == 1
               :         && (config.stride == 4 /* || config.stride == 8 || config.stride == 20 || config.stride == 100 || config.stride == 400 */)
               :         && config.mean_delta == 1
               :         ;
What is the long-term intention with this filter?
It would be nice to find 8-16 combinations to run and insert the results at the beginning of this file.


http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-decoder.cc
File be/src/exec/parquet/parquet-delta-decoder.cc:

http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-decoder.cc@181
PS19, Line 181:     num_buffered_values_ - next_buffered_value_index_;
nit: +2 indentation


http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-decoder.cc@298
PS19, Line 298: template int ParquetDeltaDecoder<int32_t>::NextValuesConverted<int8_t>(int num_values,
> I'm not sure it is a good idea to explicitly instantiate NextValuesConverte
We may be able to remove most int64_t ones, see my comment in bit-packing.cc


http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-encoder.h
File be/src/exec/parquet/parquet-delta-encoder.h:

http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-encoder.h@54
PS19, Line 54:     static constexpr int MAX_TOTAL_VALUE_COUNT = 16000;
Note that there is a query option for setting max row count in page: parquet_page_row_count_limit


http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-encoder.cc
File be/src/exec/parquet/parquet-delta-encoder.cc:

http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-encoder.cc@117
PS19, Line 117:   std::memmove(new_data_start_address, old_data_start_address, data_len);
Can you skip this if reserved_space_for_header_ == actual_header_size? Even if memmove would optimize this case, I think the code would be clearer.

Alternatively, we could avoid the copy and point to a slightly later position, similarly to Arrow.
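
The two options can be sketched roughly as follows. This assumes a buffer laid out as [reserved header space][data]; the Page struct and both function names are hypothetical and are not taken from parquet-delta-encoder.cc or from Arrow.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical result type: where the finished page starts and how long it is.
struct Page {
  const uint8_t* start;
  int len;
};

// Option 1: write the header at the front and move the data down to follow it,
// skipping the move when the header fills the reserved space exactly.
inline Page FinalizeByMoving(uint8_t* buf, int reserved, const uint8_t* header,
    int header_len, int data_len) {
  assert(header_len <= reserved);
  std::memcpy(buf, header, header_len);
  if (header_len != reserved) {
    std::memmove(buf + header_len, buf + reserved, data_len);
  }
  return {buf, header_len + data_len};
}

// Option 2: leave the data where it is, write the header directly in front of
// it, and report a start pointer that lies a few bytes into the buffer.
inline Page FinalizeInPlace(uint8_t* buf, int reserved, const uint8_t* header,
    int header_len, int data_len) {
  assert(header_len <= reserved);
  uint8_t* start = buf + (reserved - header_len);
  std::memcpy(start, header, header_len);
  return {start, header_len + data_len};
}
```

Option 2 never copies the data, but callers must use the returned start pointer instead of assuming the page begins at the start of the buffer.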


http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-encoder.cc@289
PS19, Line 289:       miniblock_index * miniblock_size_in_values_;
nit: indentation


http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/util/bit-packing.cc
File be/src/util/bit-packing.cc:

http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/util/bit-packing.cc@75
PS19, Line 75: INSTANTIATE_UNPACK_AND_DELTA_DECODE(int8_t, int64_t);
             : INSTANTIATE_UNPACK_AND_DELTA_DECODE(int16_t, int64_t);
             : INSTANTIATE_UNPACK_AND_DELTA_DECODE(int32_t, int64_t);
Do we need these combinations?
We only seem to support BIGINT with int64 Parquet columns:
https://github.com/apache/impala/blob/cf28a4c5292fdb3504d1fe11183c78243ed148a4/be/src/exec/parquet/parquet-metadata-utils.cc#L54
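
As an illustration of the point, explicit instantiation can be limited to the type pairs that can actually occur. The ConvertDecoded template and the INSTANTIATE_CONVERT macro below are hypothetical stand-ins for UnpackAndDeltaDecodeValues and its macro, and the list of pairs is an assumption based on the comment above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a decode routine templated on both the output
// type and the physical Parquet type.
template <typename OutType, typename ParquetType>
OutType ConvertDecoded(ParquetType value) {
  const OutType out = static_cast<OutType>(value);
  // The narrowing conversion must not lose information.
  assert(static_cast<ParquetType>(out) == value);
  return out;
}

#define INSTANTIATE_CONVERT(OUT_TYPE, PARQUET_TYPE) \
  template OUT_TYPE ConvertDecoded<OUT_TYPE, PARQUET_TYPE>(PARQUET_TYPE)

// An INT32 Parquet column can back several narrower or equal output types.
INSTANTIATE_CONVERT(int8_t, int32_t);
INSTANTIATE_CONVERT(int16_t, int32_t);
INSTANTIATE_CONVERT(int32_t, int32_t);
// If INT64 columns only back BIGINT, a single INT64 pair is enough.
INSTANTIATE_CONVERT(int64_t, int64_t);
```

Dropping the unused pairs reduces compile time and object size without changing behaviour for the supported combinations.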



-- 
To view, visit http://gerrit.cloudera.org:8080/12621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 19
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 May 2023 17:21:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................


Patch Set 14:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4332/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 14
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Aug 2019 17:25:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................

IMPALA-8253: Draft - Parquet delta encoding and decoding.

Implemented an encoder and decoder for the Parquet delta encoding (see
https://github.com/apache/parquet-format/blob/master/Encodings.md).

The coders are not integrated with Impala yet; they provide an interface
that Impala could use.

Added new methods to BitWriter and BatchedBitReader handling Uleb and
ZigZag integers for 64 bits.

Testing:
  - Added new tests for the encoder and decoder
  - Tests covering the additions in BitPacking, BitWriter and
    BatchedBitReader.

Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/parquet-delta-benchmark.cc
M be/src/exec/parquet/CMakeLists.txt
A be/src/exec/parquet/parquet-delta-coder-test-data.h
A be/src/exec/parquet/parquet-delta-coder-test.cc
A be/src/exec/parquet/parquet-delta-decoder.h
A be/src/exec/parquet/parquet-delta-encoder.h
M be/src/util/bit-packing-test.cc
M be/src/util/bit-packing.cc
M be/src/util/bit-packing.h
M be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
13 files changed, 4,228 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/12621/15
-- 
To view, visit http://gerrit.cloudera.org:8080/12621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 15
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................


Patch Set 10:

Rebasing. Adding and correcting licence headers.


-- 
To view, visit http://gerrit.cloudera.org:8080/12621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 10
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 29 Apr 2019 12:45:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Parquet delta encoding and decoding.
......................................................................


Patch Set 16:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4653/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 16
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 26 Sep 2019 18:57:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Parquet delta encoding and decoding.
......................................................................


Patch Set 21: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/12621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 21
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 May 2023 09:57:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Parquet delta encoding and decoding.
......................................................................


Patch Set 16:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5009/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/12621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 16
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 26 Sep 2019 18:36:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................


Patch Set 13:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3720/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 13
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 24 Jun 2019 10:59:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................


Patch Set 15:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4382/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 15
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Aug 2019 11:00:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................


Patch Set 9:

No Builds Executed


-- 
To view, visit http://gerrit.cloudera.org:8080/12621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 9
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 29 Apr 2019 10:33:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Parquet delta encoding and decoding.

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Parquet delta encoding and decoding.
......................................................................


Patch Set 20:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/benchmarks/parquet-delta-benchmark.cc
File be/src/benchmarks/parquet-delta-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/benchmarks/parquet-delta-benchmark.cc@1060
PS19, Line 1060: 
> nit: long line
Done


http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-decoder.cc
File be/src/exec/parquet/parquet-delta-decoder.cc:

http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-decoder.cc@181
PS19, Line 181:   if (!SKIP) {
> nit: +2 indentation
Done


http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-decoder.cc@298
PS19, Line 298:   if (UNLIKELY(!min_delta_read)) return false;
> We may be able to remove most int64_t ones, see my comment in bit-packing.c
Done, also removed the unsigned instantiations as Impala has only signed types.


http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-encoder.h
File be/src/exec/parquet/parquet-delta-encoder.h:

http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-encoder.h@54
PS19, Line 54:     static constexpr int DEFAULT_MAX_TOTAL_VALUE_COUNT 
> Note that there is a query option for setting max row count in page: parque
I introduced a new parameter to Init(), 'max_page_value_count', which can be used to set the value to 'parquet_page_row_count_limit' when we integrate the encoder with the Parquet writer.

Left 16000 as a default.
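
Roughly, the shape of that change is a defaulted parameter on Init(). The class below is a simplified, hypothetical stand-in: only the parameter name 'max_page_value_count' and the default of 16000 come from this reply, and the real encoder's Init() signature and value handling are not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in that models only the per-page value limit.
class DeltaEncoderSketch {
 public:
  static constexpr int DEFAULT_MAX_TOTAL_VALUE_COUNT = 16000;

  // A writer could pass the value of 'parquet_page_row_count_limit' here.
  void Init(int max_page_value_count = DEFAULT_MAX_TOTAL_VALUE_COUNT) {
    assert(max_page_value_count > 0);
    max_page_value_count_ = max_page_value_count;
    values_.clear();
  }

  // Returns false when the page is full and the caller should start a new one.
  bool Add(int32_t value) {
    if (static_cast<int>(values_.size()) >= max_page_value_count_) return false;
    values_.push_back(value);
    return true;
  }

  int max_page_value_count() const { return max_page_value_count_; }

 private:
  int max_page_value_count_ = DEFAULT_MAX_TOTAL_VALUE_COUNT;
  std::vector<int32_t> values_;
};
```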


http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-encoder.cc
File be/src/exec/parquet/parquet-delta-encoder.cc:

http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-encoder.cc@117
PS19, Line 117:   const int data_len = output_buffer_pos_ - old_data_start_address;
> Can you skip this if reserved_space_for_header_ == actual_header_size? Even
Done


http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-encoder.cc@289
PS19, Line 289:     (delta_buffer_.size() + miniblock_size_in_values_ - 1) / miniblock_size_in_values_;
> nit: indentation
Done


http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/util/bit-packing.cc
File be/src/util/bit-packing.cc:

http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/util/bit-packing.cc@75
PS19, Line 75: 
             : // Required for bit-packing-benchmark.cc.
             : template
> Do we need these combinations?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/12621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 20
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 May 2023 17:24:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................


Patch Set 14:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/12621/14/be/src/util/bit-packing.cc
File be/src/util/bit-packing.cc:

http://gerrit.cloudera.org:8080/#/c/12621/14/be/src/util/bit-packing.cc@63
PS14, Line 63: #define INSTANTIATE_UNPACK_AND_DELTA_DECODE(OUT_TYPE, PARQUET_TYPE)                                         \
line too long (109 > 90)


http://gerrit.cloudera.org:8080/#/c/12621/14/be/src/util/bit-packing.cc@65
PS14, Line 65:   BitPacking::UnpackAndDeltaDecodeValues<OUT_TYPE>(int bit_width,                            \
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/12621/14/be/src/util/bit-packing.cc@67
PS14, Line 67:       PARQUET_TYPE delta_offset, int64_t num_values, OUT_TYPE* __restrict__ out, int64_t stride, \
line too long (98 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/12621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 14
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Aug 2019 16:43:23 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................

IMPALA-8253: Draft - Parquet delta encoding and decoding.

Implemented an encoder and decoder for the Parquet delta encoding (see
https://github.com/apache/parquet-format/blob/master/Encodings.md).

The coders are not integrated with Impala yet; they provide an interface
that Impala could use.

Added new methods to BitWriter and BatchedBitReader handling Uleb and
ZigZag integers for 64 bits.

Testing:
  - Added new tests for the encoder and decoder
  - Tests covering the additions in BitWriter and BatchedBitReader.

Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/parquet-delta-benchmark.cc
M be/src/exec/parquet/CMakeLists.txt
A be/src/exec/parquet/parquet-delta-coder-test-data.h
A be/src/exec/parquet/parquet-delta-coder-test.cc
A be/src/exec/parquet/parquet-delta-decoder.h
A be/src/exec/parquet/parquet-delta-encoder.h
M be/src/util/CMakeLists.txt
M be/src/util/bit-packing-test.cc
M be/src/util/bit-packing.h
M be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils-test.cc
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
14 files changed, 4,252 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/12621/13
-- 
To view, visit http://gerrit.cloudera.org:8080/12621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 13
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/12621/6/be/src/exec/parquet/parquet-delta-coder-test-data.h
File be/src/exec/parquet/parquet-delta-coder-test-data.h:

http://gerrit.cloudera.org:8080/#/c/12621/6/be/src/exec/parquet/parquet-delta-coder-test-data.h@454
PS6, Line 454: const std::vector<int32_t> values_are_the_same_plain = {3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/12621/6/be/src/exec/parquet/parquet-delta-coder-test-data.h@471
PS6, Line 471: const std::vector<int32_t> delta_is_zero_for_each_block_plain = {0, 0, 0, 0, 0, 0, 0, 0, 0,
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/12621/6/be/src/exec/parquet/parquet-delta-encoder.h
File be/src/exec/parquet/parquet-delta-encoder.h:

http://gerrit.cloudera.org:8080/#/c/12621/6/be/src/exec/parquet/parquet-delta-encoder.h@47
PS6, Line 47:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/12621/6/be/src/exec/parquet/parquet-delta-encoder.h@115
PS6, Line 115:       const int header_size = HeaderSize(most_negative_first_value, MAX_TOTAL_VALUE_COUNT);
line too long (91 > 90)



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 6
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Apr 2019 11:31:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8253: Parquet delta encoding and decoding.

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Parquet delta encoding and decoding.
......................................................................


Patch Set 20: Code-Review+2

Thanks for the work on this! It can be merged from my side.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 20
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 May 2023 06:31:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Parquet delta encoding and decoding.
......................................................................


Patch Set 21:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9338/ DRY_RUN=false


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 21
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 May 2023 09:57:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Parquet delta encoding and decoding.
......................................................................


Patch Set 20:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13091/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 20
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 May 2023 17:46:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has restored this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................


Restored
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: restore
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has abandoned this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................


Abandoned
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8253: Parquet delta encoding and decoding.

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Parquet delta encoding and decoding.
......................................................................

IMPALA-8253: Parquet delta encoding and decoding.

Implemented an encoder and decoder for the Parquet delta encoding (see
https://github.com/apache/parquet-format/blob/master/Encodings.md).

The coders are not integrated with Impala yet; they provide an interface
that Impala could use.

Added new methods to BitWriter and BatchedBitReader handling Uleb and
ZigZag integers for 64 bits.

Testing:
  - Added new tests for the encoder and decoder
  - Tests covering the additions in BitPacking, BitWriter and
    BatchedBitReader.

Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/parquet-delta-benchmark.cc
M be/src/exec/parquet/CMakeLists.txt
A be/src/exec/parquet/parquet-delta-coder-test-data.h
A be/src/exec/parquet/parquet-delta-coder-test.cc
A be/src/exec/parquet/parquet-delta-decoder.h
A be/src/exec/parquet/parquet-delta-encoder.h
M be/src/util/bit-packing-test.cc
M be/src/util/bit-packing.cc
M be/src/util/bit-packing.h
M be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
13 files changed, 4,227 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/12621/16
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 16
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8253: Parquet delta encoding and decoding.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Parquet delta encoding and decoding.
......................................................................


Patch Set 16:

Is this something that still needs a review? This would still be a great feature, obviously.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 16
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Jun 2020 22:39:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................


Patch Set 11:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3056/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 11
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 03 May 2019 12:49:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................

IMPALA-8253: Draft - Parquet delta encoding and decoding.

Implemented an encoder and decoder for the Parquet delta encoding (see
https://github.com/apache/parquet-format/blob/master/Encodings.md).

The coders are not integrated with Impala yet; they provide an interface
that Impala could use.

Added new methods to BitWriter and BatchedBitReader handling Uleb and
ZigZag integers for 64 bits.

Testing:
  - Added new tests for the encoder and decoder
  - Tests covering the additions in BitWriter and BatchedBitReader.

Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/parquet-delta-benchmark.cc
M be/src/exec/parquet/CMakeLists.txt
A be/src/exec/parquet/parquet-delta-coder-test-data.h
A be/src/exec/parquet/parquet-delta-coder-test.cc
A be/src/exec/parquet/parquet-delta-decoder.h
A be/src/exec/parquet/parquet-delta-encoder.h
M be/src/util/CMakeLists.txt
M be/src/util/bit-packing-test.cc
M be/src/util/bit-packing.h
M be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils-test.cc
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
14 files changed, 4,247 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/12621/11
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 11
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8253: Draft - Parquet delta encoding and decoding.

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Hello Gabor Kaszab, Zoltan Borok-Nagy, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12621

to look at the new patch set (#9).

Change subject: IMPALA-8253: Draft - Parquet delta encoding and decoding.
......................................................................

IMPALA-8253: Draft - Parquet delta encoding and decoding.

Implemented an encoder and decoder for the Parquet delta encoding (see
https://github.com/apache/parquet-format/blob/master/Encodings.md).

The coders are not integrated with Impala yet; they provide an interface
that Impala could use.

TODO: Currently the delta coders only support 32-bit integers. For
64-bit integers, we have to extend the functionality of BitWriter and
BatchedBitReader.

Added new methods to BitWriter and BatchedBitReader handling Uleb and
ZigZag integers for 64 bits.

Testing:
  - Added new tests for the encoder and decoder
  - Tests covering the additions in BitWriter and BatchedBitReader.

Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/parquet-delta-benchmark.cc
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/parquet-common.h
A be/src/exec/parquet/parquet-delta-coder-test-data.h
A be/src/exec/parquet/parquet-delta-coder-test.cc
A be/src/exec/parquet/parquet-delta-decoder.h
A be/src/exec/parquet/parquet-delta-encoder.h
M be/src/util/CMakeLists.txt
M be/src/util/bit-packing-test.cc
M be/src/util/bit-packing.h
M be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils-test.cc
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
15 files changed, 4,239 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/12621/9
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 9
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8253: Parquet delta encoding and decoding.

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Parquet delta encoding and decoding.
......................................................................


Patch Set 19:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-decoder.cc
File be/src/exec/parquet/parquet-delta-decoder.cc:

http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-decoder.cc@298
PS19, Line 298: template int ParquetDeltaDecoder<int32_t>::NextValuesConverted<int8_t>(int num_values,
I'm not sure it is a good idea to explicitly instantiate NextValuesConverted() here because there are too many possibilities and the compile time of this file is very long. Maybe we should move the NextValuesConverted() implementation back to the header.
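
To make the trade-off concrete, here is a minimal sketch of the explicit
instantiation pattern under discussion. DeltaDecoderSketch and its signature
are hypothetical stand-ins, not the real ParquetDeltaDecoder interface. With
the definition in a .cc file, every (internal type, output type) pair needs its
own instantiation line; with the definition in the header, the compiler
instantiates only the pairs that callers actually use.

```cpp
#include <cstdint>

// Hypothetical stand-in for the decoder class; declaration as in a header.
template <typename InternalT>
class DeltaDecoderSketch {
 public:
  // Copies 'num_values' values from 'in' to 'out', converting to OutT.
  // Returns the number of values written.
  template <typename OutT>
  int NextValuesConverted(int num_values, const InternalT* in, OutT* out);
};

// Definition as it would appear in the .cc file.
template <typename InternalT>
template <typename OutT>
int DeltaDecoderSketch<InternalT>::NextValuesConverted(
    int num_values, const InternalT* in, OutT* out) {
  for (int i = 0; i < num_values; ++i) out[i] = static_cast<OutT>(in[i]);
  return num_values;
}

// Explicit instantiations: one line per supported combination. The list grows
// with the product of internal types and output types.
template int DeltaDecoderSketch<int32_t>::NextValuesConverted<int8_t>(
    int, const int32_t*, int8_t*);
template int DeltaDecoderSketch<int32_t>::NextValuesConverted<int32_t>(
    int, const int32_t*, int32_t*);
template int DeltaDecoderSketch<int64_t>::NextValuesConverted<int64_t>(
    int, const int64_t*, int64_t*);
```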


http://gerrit.cloudera.org:8080/#/c/12621/2/be/src/exec/parquet/parquet-delta-encoder.h
File be/src/exec/parquet/parquet-delta-encoder.h:

http://gerrit.cloudera.org:8080/#/c/12621/2/be/src/exec/parquet/parquet-delta-encoder.h@149
PS2, Line 149: 
> Yeah, CountLeadingZeros should check this.
I opened IMPALA-12086 for this.


http://gerrit.cloudera.org:8080/#/c/12621/16/be/src/util/bit-packing.cc
File be/src/util/bit-packing.cc:

http://gerrit.cloudera.org:8080/#/c/12621/16/be/src/util/bit-packing.cc@69
PS16, Line 69: 
> nit: fits previous line
Done



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 19
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 26 Apr 2023 09:28:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8253: Parquet delta encoding and decoding.

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Parquet delta encoding and decoding.
......................................................................

IMPALA-8253: Parquet delta encoding and decoding.

Implemented an encoder and decoder for the Parquet delta encoding (see
https://github.com/apache/parquet-format/blob/master/Encodings.md).

The coders are not integrated with Impala yet; they provide an interface
that Impala could use.

Added new methods to BitWriter and BatchedBitReader handling Uleb and
ZigZag integers for 64 bits.

Also added a benchmark (parquet-delta-benchmark.cc) that compares the
space and CPU performance of plain, dictionary and delta encoding.

Testing:
  - Added new tests for the encoder and decoder
  - Tests covering the additions in BitPacking, BitWriter and
    BatchedBitReader.
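
As background on the encoding itself, the sketch below shows the core
arithmetic for a single block as described in Encodings.md: compute the deltas
between consecutive values, subtract the minimum delta so that all stored
values are non-negative, and find the bit width needed to pack them. It is a
simplification under stated assumptions: it ignores the page header and the
split into miniblocks, it uses 64-bit arithmetic for the deltas instead of
wrapping in the value type, and the names are hypothetical rather than those
used in parquet-delta-encoder.h.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the per-block quantities.
struct DeltaBlockSketch {
  int64_t min_delta;               // smallest delta in the block
  std::vector<uint64_t> adjusted;  // delta - min_delta, always >= 0
  int bit_width;                   // bits needed for the largest adjusted delta
};

inline DeltaBlockSketch MakeDeltaBlock(const std::vector<int32_t>& values) {
  DeltaBlockSketch block{0, {}, 0};
  if (values.size() < 2) return block;  // no deltas to encode

  // Deltas between consecutive values, widened to avoid overflow.
  std::vector<int64_t> deltas;
  for (size_t i = 1; i < values.size(); ++i) {
    deltas.push_back(static_cast<int64_t>(values[i]) - values[i - 1]);
  }

  block.min_delta = *std::min_element(deltas.begin(), deltas.end());

  uint64_t max_adjusted = 0;
  for (int64_t d : deltas) {
    const uint64_t a = static_cast<uint64_t>(d - block.min_delta);
    block.adjusted.push_back(a);
    max_adjusted = std::max(max_adjusted, a);
  }

  // Number of bits needed to represent max_adjusted (0 if all deltas match).
  while (max_adjusted != 0) {
    ++block.bit_width;
    max_adjusted >>= 1;
  }
  return block;
}
```

For the values 7, 5, 3, 1, 2, 3, 4, 5 the deltas are -2, -2, -2, 1, 1, 1, 1,
the minimum delta is -2, the adjusted deltas are 0, 0, 0, 3, 3, 3, 3, and the
bit width is 2.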

Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/parquet-delta-benchmark.cc
M be/src/exec/parquet/CMakeLists.txt
A be/src/exec/parquet/parquet-delta-coder-test-data.h
A be/src/exec/parquet/parquet-delta-coder-test.cc
A be/src/exec/parquet/parquet-delta-decoder.cc
A be/src/exec/parquet/parquet-delta-decoder.h
A be/src/exec/parquet/parquet-delta-encoder.cc
A be/src/exec/parquet/parquet-delta-encoder.h
M be/src/util/bit-packing-test.cc
M be/src/util/bit-packing.cc
M be/src/util/bit-packing.h
M be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
15 files changed, 4,888 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/12621/20
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 20
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>