Posted to reviews@impala.apache.org by "Alex Behm (Code Review)" <ge...@cloudera.org> on 2017/04/04 01:18:02 UTC

[Impala-ASF-CR] IMPALA-3905: Implements HdfsScanner::GetNext() for text scans.

Hello Marcel Kornacker,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6000

to look at the new patch set (#5).

Change subject: IMPALA-3905: Implements HdfsScanner::GetNext() for text scans.
......................................................................

IMPALA-3905: Implements HdfsScanner::GetNext() for text scans.

Implements HdfsTextScanner::GetNext() and changes
ProcessSplit() to repeatedly call GetNext(), so that the core
scanning code is shared between the legacy ProcessSplit()
interface and the new GetNext() interface.
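
A minimal sketch of that control flow, not the code in this patch:
ToyBatch and ToyTextScanner are hypothetical stand-ins for RowBatch
and the text scanner, and the "queue" is a plain vector. It only shows
how a push-style ProcessSplit() can be written as a loop over a
pull-style GetNext().

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a row batch; not Impala's RowBatch.
struct ToyBatch {
  std::vector<std::string> rows;
};

// Hypothetical scanner that returns its input in fixed-size batches.
class ToyTextScanner {
 public:
  ToyTextScanner(std::vector<std::string> rows, std::size_t batch_size)
      : rows_(std::move(rows)),
        batch_size_(std::max<std::size_t>(1, batch_size)) {}

  // Pull interface: fills 'batch' with up to batch_size_ rows and sets
  // '*eos' once the input is exhausted.
  void GetNext(ToyBatch* batch, bool* eos) {
    batch->rows.clear();
    const std::size_t end = std::min(pos_ + batch_size_, rows_.size());
    for (; pos_ < end; ++pos_) batch->rows.push_back(rows_[pos_]);
    *eos = (pos_ == rows_.size());
  }

  // Legacy push interface: repeatedly calls GetNext() and appends every
  // non-empty batch to 'queue', so both interfaces share one scan path.
  void ProcessSplit(std::vector<ToyBatch>* queue) {
    bool eos = false;
    while (!eos) {
      ToyBatch batch;
      GetNext(&batch, &eos);
      if (!batch.rows.empty()) queue->push_back(std::move(batch));
    }
  }

 private:
  std::vector<std::string> rows_;
  std::size_t batch_size_;
  std::size_t pos_ = 0;
};
```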

These changes were tricky:
- The scanner used to rely on the ability to attach a batch
  to the row-batch queue for freeing resources
- This patch attempts to preserve the resource-freeing behavior
  by clearing resources as soon as they are complete
- In particular, the scanner attempts to skip corrupt/invalid
  data blocks, and we should avoid accumulating memory
  unnecessarily

The other changes are mostly straightforward:
- Add a RowBatch parameter to various functions
- Add a MemPool parameter to various functions for attaching
  memory of completed resources that may still be referenced
  by returned batches
- Change Close() to free all resources when a nullptr
  RowBatch is passed
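
The ownership rule in the last two items can be sketched as follows.
This is an illustration under assumptions, not Impala's API: ToyPool,
ToyRowBatch and ToyOwningScanner are hypothetical names, and the pool
is just a vector of owned buffers. Closing with a batch hands the
buffers over so pointers held by that batch stay valid; closing with
nullptr frees everything.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pool that owns buffers; not Impala's MemPool.
struct ToyPool {
  std::vector<std::unique_ptr<std::string>> buffers;

  // Takes ownership of every buffer in 'src'. The buffers themselves do
  // not move in memory, so existing pointers to them remain valid.
  void AcquireData(ToyPool* src) {
    for (auto& b : src->buffers) buffers.push_back(std::move(b));
    src->buffers.clear();
  }

  void FreeAll() { buffers.clear(); }
};

// Hypothetical row batch that carries the memory its rows point into.
struct ToyRowBatch {
  ToyPool pool;
};

class ToyOwningScanner {
 public:
  // Allocates a buffer owned by the scanner and returns a pointer to it.
  const std::string* Allocate(std::string data) {
    pool_.buffers.push_back(
        std::unique_ptr<std::string>(new std::string(std::move(data))));
    return pool_.buffers.back().get();
  }

  // With a batch: transfer all buffers to it. With nullptr: free them.
  void Close(ToyRowBatch* batch) {
    if (batch != nullptr) {
      batch->pool.AcquireData(&pool_);
    } else {
      pool_.FreeAll();
    }
  }

  std::size_t num_buffers() const { return pool_.buffers.size(); }

 private:
  ToyPool pool_;
};
```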

Testing:
- Exhaustive tests passed on debug
- Core tests passed on asan
- TODO: Perf testing on cluster

Change-Id: Id193aa223434d7cc40061a42f81bbb29dcd0404b
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
15 files changed, 293 insertions(+), 216 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/6000/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id193aa223434d7cc40061a42f81bbb29dcd0404b
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>