You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org> on 2017/11/01 00:54:38 UTC

[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8436


Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
......................................................................

IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

During dictionary constructon, most types are copied from the parquet
dictionary page, but StringValues keep pointers to it. In this case,
the dictionary page must be kept and attached to the last row batch
that references it. In case of other types, it is safe do delete
the dictionary page after the construction of the dictionary.

This patch contains two optimizations:
- dictionary pages are deleted as soon as possible for non string types
- in non-compressed and non-string case, an unnecessary copy is avoided

Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
2 files changed, 42 insertions(+), 14 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8436/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>

[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8436

to look at the new patch set (#2).

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
......................................................................

IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

During dictionary constructon, most types are copied from the parquet
dictionary page, but StringValues keep pointers to it. In this case,
the dictionary page must be kept and attached to the last row batch
that references it. In case of other types, it is safe do delete
the dictionary page after the construction of the dictionary.

This patch contains two optimizations:
- dictionary pages are deleted as soon as possible for non string types
- in non-compressed and non-string case, an unnecessary copy is avoided

Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
2 files changed, 38 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8436/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>

[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8436 )

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
......................................................................


Patch Set 5: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Nov 2017 22:58:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8436 )

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
......................................................................


Patch Set 8:

> Uploaded patch set 7.
Oops, there was a rebase... to see the changes that fix CHAR type handling, see diff between patch 6 and 8.

Tim, can you rerun the tests for me?


-- 
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Nov 2017 00:04:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8436 )

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
......................................................................


Patch Set 9: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Nov 2017 01:12:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8436 )

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
......................................................................


Patch Set 6: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1491/


-- 
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Nov 2017 03:02:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8436 )

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
......................................................................


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@934
PS4, Line 934: null
> nullptr
Done


http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@942
PS4, Line 942:   // 3. If the column type is not string, and the dictionary page is not compressed,
> I'm not 100% confident that we have test coverage for this case - we have s
I have checked, and this branch is executed if exploration strategy > core.


http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@944
PS4, Line 944:   ScopedBuffer uncompressed_buffer(parent_->dictionary_pool_->mem_tracker());
> nit: unnecessary blank line
Done


http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@964
PS4, Line 964:  
> nit: != nullptr (we prefer explicit checks against null to make it visually
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 Nov 2017 00:14:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8436 )

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1491/


-- 
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Nov 2017 23:33:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8436 )

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@959
PS1, Line 959:       if (tmp_buffer.TryAllocate(uncompressed_size)) {
It turned out that this can violate a dcheck, if uncompressed_size  is 0, which seems valid if every value of a column in a partition is NULL. This is the case in a partition of functional_parquet.alltypesagg/tinyint_col, which lead to a crash in one of the tests. I think that there should be a table/column with all NULL values by design to test this case explicitly.



-- 
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Comment-Date: Fri, 03 Nov 2017 00:44:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8436 )

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
......................................................................

IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

During dictionary constructon, most types are copied from the parquet
dictionary page, but StringValues keep pointers to it. In this case,
the dictionary page must be kept and attached to the last row batch
that references it. In case of other types, it is safe do delete
the dictionary page after the construction of the dictionary.

This patch contains two optimizations:
- dictionary pages are deleted as soon as possible for non string types
- in non-compressed and non-string case, an unnecessary copy is avoided

Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Reviewed-on: http://gerrit.cloudera.org:8080/8436
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/exec/parquet-column-readers.cc
1 file changed, 33 insertions(+), 18 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 10
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8436 )

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
......................................................................


Patch Set 9: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Nov 2017 04:40:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8436

to look at the new patch set (#5).

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
......................................................................

IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

During dictionary constructon, most types are copied from the parquet
dictionary page, but StringValues keep pointers to it. In this case,
the dictionary page must be kept and attached to the last row batch
that references it. In case of other types, it is safe do delete
the dictionary page after the construction of the dictionary.

This patch contains two optimizations:
- dictionary pages are deleted as soon as possible for non string types
- in non-compressed and non-string case, an unnecessary copy is avoided

Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
---
M be/src/exec/parquet-column-readers.cc
1 file changed, 33 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8436/5
-- 
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8436 )

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
......................................................................


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.h@376
PS1, Line 376:   virtual bool DictionaryReferencesBuffer() const { return false; }
Add comment. Make pure if possible. Alternatively, you could do something similar to PageContainsTupleData().


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@272
PS1, Line 272:   virtual bool DictionaryReferencesBuffer() const {
add "override"


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@541
PS1, Line 541: some
how about rewording this to "Column readers for types that require ..."


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@542
PS1, Line 542:   inline bool DictionaryReferencesBufferInline() const {
nit: single line


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@945
PS1, Line 945: tmp_buffer
better name


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@947
PS1, Line 947:   int uncompressed_size = decompressor_.get() != nullptr
Move this to L954


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@951
PS1, Line 951:   if (decompressor_.get() == nullptr && !DictionaryReferencesBuffer()) {
Can you add a brief comment to explain how the 4 different cases are handled?


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@987
PS1, Line 987:     if (DictionaryReferencesBuffer()) {
             :       memcpy(dict_values, data_, data_size);
             :     }
Single line :)



-- 
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Comment-Date: Wed, 01 Nov 2017 19:16:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8436 )

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
......................................................................


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.h@376
PS1, Line 376:   virtual bool DictionaryReferencesBuffer() const { return false; }
> Add comment. Make pure if possible. Alternatively, you could do something s
I have removed this and the other similar functions, and used directly slot_desc_->type().IsVarLenStringType() instead.


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@272
PS1, Line 272:   virtual bool DictionaryReferencesBuffer() const {
> add "override"
This function was removed.


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@541
PS1, Line 541: some
> how about rewording this to "Column readers for types that require ..."
This function was removed.


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@542
PS1, Line 542:   inline bool DictionaryReferencesBufferInline() const {
> nit: single line
This function was removed.


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@945
PS1, Line 945: tmp_buffer
> better name
Done


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@947
PS1, Line 947:   int uncompressed_size = decompressor_.get() != nullptr
> Move this to L954
Done


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@951
PS1, Line 951:   if (decompressor_.get() == nullptr && !DictionaryReferencesBuffer()) {
> Can you add a brief comment to explain how the 4 different cases are handle
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Comment-Date: Fri, 03 Nov 2017 22:39:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8436

to look at the new patch set (#3).

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
......................................................................

IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

During dictionary constructon, most types are copied from the parquet
dictionary page, but StringValues keep pointers to it. In this case,
the dictionary page must be kept and attached to the last row batch
that references it. In case of other types, it is safe do delete
the dictionary page after the construction of the dictionary.

This patch contains two optimizations:
- dictionary pages are deleted as soon as possible for non string types
- in non-compressed and non-string case, an unnecessary copy is avoided

Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
---
M be/src/exec/parquet-column-readers.cc
1 file changed, 36 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8436/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>

[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8436

to look at the new patch set (#7).

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
......................................................................

IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

During dictionary constructon, most types are copied from the parquet
dictionary page, but StringValues keep pointers to it. In this case,
the dictionary page must be kept and attached to the last row batch
that references it. In case of other types, it is safe do delete
the dictionary page after the construction of the dictionary.

This patch contains two optimizations:
- dictionary pages are deleted as soon as possible for non string types
- in non-compressed and non-string case, an unnecessary copy is avoided

Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
---
M be/src/exec/parquet-column-readers.cc
1 file changed, 33 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8436/7
-- 
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8436

to look at the new patch set (#4).

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
......................................................................

IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

During dictionary constructon, most types are copied from the parquet
dictionary page, but StringValues keep pointers to it. In this case,
the dictionary page must be kept and attached to the last row batch
that references it. In case of other types, it is safe do delete
the dictionary page after the construction of the dictionary.

This patch contains two optimizations:
- dictionary pages are deleted as soon as possible for non string types
- in non-compressed and non-string case, an unnecessary copy is avoided

Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
---
M be/src/exec/parquet-column-readers.cc
1 file changed, 35 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8436/4
-- 
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>

[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8436 )

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
......................................................................


Patch Set 4:

(4 comments)

Looks close.

http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@934
PS4, Line 934: NULL
nullptr


http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@942
PS4, Line 942:   // 3. If the column type is not string, and the dictionary page is not compressed,
I'm not 100% confident that we have test coverage for this case - we have somewhat limited testing of uncompressed parquet (the parquet/none tests are actually snappy-compressed since that's the default).

I think the main coverage we have is tests that insert into parquet and read it back. I took a look and it seems like tests/query_test/test_insert_parquet.py should exercise this code path, but could you confirm? E.g. add a log message on each branch and make sure that they appear in the logs.


http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@944
PS4, Line 944: 
nit: unnecessary blank line


http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@964
PS4, Line 964: )
nit: != nullptr (we prefer explicit checks against null to make it visually obvious that it's a null check).



-- 
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 10 Nov 2017 00:39:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8436 )

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
......................................................................


Patch Set 6: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Nov 2017 23:33:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8436 )

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
......................................................................


Patch Set 6:

> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1491/

There seems to be some problem with the handling if char type, I will check it.


-- 
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Nov 2017 15:06:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8436 )

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
......................................................................


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1504/


-- 
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Nov 2017 01:12:44 +0000
Gerrit-HasComments: No