Posted to reviews@impala.apache.org by "Alex Behm (Code Review)" <ge...@cloudera.org> on 2017/04/04 01:18:02 UTC
[Impala-ASF-CR] IMPALA-3905: Implements HdfsScanner::GetNext() for text scans.
Hello Marcel Kornacker,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/6000
to look at the new patch set (#5).
Change subject: IMPALA-3905: Implements HdfsScanner::GetNext() for text scans.
......................................................................
IMPALA-3905: Implements HdfsScanner::GetNext() for text scans.
Implements HdfsTextScanner::GetNext() and changes
ProcessSplit() to repeatedly call GetNext(), so that the legacy
ProcessSplit() interface and the new GetNext() interface share
the core scanning code.
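For readers unfamiliar with the pattern, here is a minimal sketch of a
legacy push-style entry point driving a pull-style GetNext(). It is not
the code in this patch: SketchBatch, SketchScanner, the int rows and the
std::queue are hypothetical stand-ins for illustration only.

```cpp
#include <cstddef>
#include <memory>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for a row batch; not Impala's RowBatch.
struct SketchBatch {
  std::vector<int> rows;
};

// Hypothetical scanner in which GetNext() is the only code path that
// produces rows.
class SketchScanner {
 public:
  SketchScanner(std::vector<int> input, std::size_t batch_size)
      : input_(std::move(input)), batch_size_(batch_size), pos_(0) {}

  // Pull-style interface: fills at most one batch per call and sets
  // *eos once the input is exhausted.
  void GetNext(SketchBatch* batch, bool* eos) {
    batch->rows.clear();
    while (pos_ < input_.size() && batch->rows.size() < batch_size_) {
      batch->rows.push_back(input_[pos_++]);
    }
    *eos = pos_ >= input_.size();
  }

  // Push-style interface: calls GetNext() in a loop and hands every
  // non-empty batch to the output queue.
  void ProcessSplit(std::queue<std::unique_ptr<SketchBatch>>* out) {
    bool eos = false;
    while (!eos) {
      std::unique_ptr<SketchBatch> batch(new SketchBatch());
      GetNext(batch.get(), &eos);
      if (!batch->rows.empty()) out->push(std::move(batch));
    }
  }

 private:
  std::vector<int> input_;
  std::size_t batch_size_;
  std::size_t pos_;
};
```

With five input rows and a batch size of two, ProcessSplit() in this
sketch enqueues three batches, all produced by the same GetNext() code.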
These changes were tricky:
- The scanner used to rely on the ability to attach a batch
to the row-batch queue for freeing resources
- This patch attempts to preserve the resource-freeing behavior
by clearing resources as soon as they are complete
- In particular, the scanner attempts to skip corrupt/invalid
data blocks, and we should avoid accumulating memory
unnecessarily
The other changes are mostly straightforward:
- Add a RowBatch parameter to various functions
- Add a MemPool parameter to various functions for attaching
memory of completed resources that may still be referenced
by returned batches
- Change Close() to free all resources when a nullptr
RowBatch is passed
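As a rough illustration of the Close() contract described above, the
sketch below frees everything when given nullptr. Handing memory to a
non-null batch is an assumption made for the example, and ToyPool,
ToyRowBatch, ToyScanner, TakeFrom() and Allocate() are hypothetical
names, not Impala's MemPool or RowBatch APIs.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a memory pool; not Impala's MemPool.
struct ToyPool {
  std::vector<std::string> chunks;

  // Moves every chunk from 'src' into this pool and leaves 'src' empty.
  void TakeFrom(ToyPool* src) {
    for (std::string& chunk : src->chunks) chunks.push_back(std::move(chunk));
    src->chunks.clear();
  }

  void FreeAll() { chunks.clear(); }
};

// Hypothetical stand-in for a row batch that owns a pool.
struct ToyRowBatch {
  ToyPool pool;
};

class ToyScanner {
 public:
  // Records a chunk of memory owned by the scanner.
  void Allocate(std::string data) { pool_.chunks.push_back(std::move(data)); }

  // Assumption for this sketch: a non-null batch takes over memory that
  // returned rows may still reference; a nullptr batch frees it all.
  void Close(ToyRowBatch* batch) {
    if (batch != nullptr) {
      batch->pool.TakeFrom(&pool_);
    } else {
      pool_.FreeAll();
    }
  }

  // Number of chunks the scanner still owns.
  std::size_t held() const { return pool_.chunks.size(); }

 private:
  ToyPool pool_;
};
```

Either way the scanner holds no memory after Close(). The only
difference is whether a batch keeps the memory alive.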
Testing:
- Exhaustive tests passed on debug
- Core tests passed on asan
- TODO: Perf testing on cluster
Change-Id: Id193aa223434d7cc40061a42f81bbb29dcd0404b
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
15 files changed, 293 insertions(+), 216 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/6000/5
--
To view, visit http://gerrit.cloudera.org:8080/6000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id193aa223434d7cc40061a42f81bbb29dcd0404b
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>