Posted to dev@impala.apache.org by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org> on 2016/09/01 19:44:56 UTC
[Impala-CR] IMPALA-3662: Don't double allocate tuples buffer in parquet scanner
Hello Michael Ho, Internal Jenkins,
I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/4250
to review the following change.
Change subject: IMPALA-3662: Don't double allocate tuples buffer in parquet scanner
......................................................................
IMPALA-3662: Don't double allocate tuples buffer in parquet scanner
HdfsScanner::StartNewRowBatch() is called once per row batch
by the parquet scanner to allocate a new row batch and a tuple
buffer. HdfsParquetScanner::AssembleRows() also creates a
scratch batch for each row batch, and that scratch batch
holds its own tuple buffer. Only the tuple buffer in the
scratch batch is actually used, so the tuple buffer allocated
by HdfsScanner::StartNewRowBatch() is wasted memory in the
parquet scanner.
This change fixes the problem by implementing
HdfsParquetScanner::StartNewRowBatch(), which creates
a new row batch without allocating the tuple buffer.
With this patch, memory consumption when materializing
very wide tuples is reduced by half.
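The idea described above can be illustrated with a minimal, self-contained C++ sketch. This is not the Impala code: Scanner, ParquetLikeScanner, RowBatch, and all member names are hypothetical stand-ins, and std::vector replaces Impala's memory pools. It only shows the pattern of overriding StartNewRowBatch() in a subclass so that the buffer the subclass never reads is not allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a row batch; not Impala's RowBatch.
struct RowBatch {
  explicit RowBatch(int capacity) : capacity(capacity) { assert(capacity > 0); }
  int capacity;
};

// Hypothetical stand-in for the generic scanner base class.
class Scanner {
 public:
  Scanner(int batch_capacity, std::size_t tuple_byte_size)
      : batch_capacity_(batch_capacity), tuple_byte_size_(tuple_byte_size) {}
  virtual ~Scanner() = default;

  // Generic behaviour: create a new row batch and a tuple buffer sized for it.
  virtual void StartNewRowBatch() {
    batch_ = std::make_unique<RowBatch>(batch_capacity_);
    tuple_buffer_.assign(BufferBytes(), 0);
  }

  bool HasBatch() const { return batch_ != nullptr; }
  std::size_t TupleBufferBytes() const { return tuple_buffer_.size(); }

 protected:
  std::size_t BufferBytes() const {
    return static_cast<std::size_t>(batch_capacity_) * tuple_byte_size_;
  }

  int batch_capacity_;
  std::size_t tuple_byte_size_;
  std::unique_ptr<RowBatch> batch_;
  std::vector<std::uint8_t> tuple_buffer_;
};

// Hypothetical stand-in for a scanner that materializes tuples into its own
// scratch buffer and therefore never reads the base class tuple buffer.
class ParquetLikeScanner : public Scanner {
 public:
  using Scanner::Scanner;

  // Override: create only the row batch and skip the unused tuple buffer.
  void StartNewRowBatch() override {
    batch_ = std::make_unique<RowBatch>(batch_capacity_);
  }

  // Models the per-batch scratch allocation made while assembling rows.
  void AssembleRows() { scratch_tuple_buffer_.assign(BufferBytes(), 0); }

  std::size_t ScratchBufferBytes() const { return scratch_tuple_buffer_.size(); }

 private:
  std::vector<std::uint8_t> scratch_tuple_buffer_;
};
```

In this sketch, calling StartNewRowBatch() followed by AssembleRows() on a ParquetLikeScanner leaves exactly one tuple-sized buffer allocated per batch, whereas calling the base class version first would leave two.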
Change-Id: I826061a2be10fd0528ca4dd1e97146e3cb983370
Reviewed-on: http://gerrit.cloudera.org:8080/4064
Reviewed-by: Michael Ho <kw...@cloudera.com>
Tested-by: Internal Jenkins
(cherry picked from commit 1522da3510a36635e3fc694b26211554fcd2793a)
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scanner.h
3 files changed, 14 insertions(+), 3 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/50/4250/1
--
To view, visit http://gerrit.cloudera.org:8080/4250
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I826061a2be10fd0528ca4dd1e97146e3cb983370
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
[Impala-CR] IMPALA-3662: Don't double allocate tuples buffer in parquet scanner
Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has abandoned this change.
Change subject: IMPALA-3662: Don't double allocate tuples buffer in parquet scanner
......................................................................
Abandoned
--
To view, visit http://gerrit.cloudera.org:8080/4250
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: abandon