Posted to dev@impala.apache.org by "Henry Robinson (Code Review)" <ge...@cloudera.org> on 2016/06/28 22:28:25 UTC

[Impala-CR](cdh5-2.6.0 5.8.0) IMPALA-3798: Disable per-split filtering for sequence-based scanners

Henry Robinson has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3526

Change subject: IMPALA-3798: Disable per-split filtering for sequence-based scanners
......................................................................

IMPALA-3798: Disable per-split filtering for sequence-based scanners

If a runtime filter rejects the header split of a sequence-based format
(but not the entire file, which may happen if the filter has not arrived
in time), the scanner never marks all splits for that file complete.
This is because BaseSequenceScanner issues scan ranges after parsing the
header splits, and until those ranges are processed, RangeComplete() and
AddDiskIoRanges() are not called. Those methods update progress_ and
num_unqueued_files_ respectively. HdfsScanNode::ScannerThread() reads
those variables to decide whether to exit, and as a result spins forever.

The bug therefore only shows up when a file has more than one scan range.
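
To make the failure mode concrete, here is a toy Python model of the
bookkeeping described above. It is only a sketch, not Impala code:
ToyScanNode, process_header_split and can_exit are hypothetical names,
and the exit condition is a simplified guess at what ScannerThread()
checks. It assumes that a filtered header split leaves both counters
untouched.

```python
from dataclasses import dataclass


@dataclass
class ToyScanNode:
    """Hypothetical stand-in for the scan node's termination bookkeeping."""
    total_ranges: int        # every scan range the node expects to finish
    num_unqueued_files: int  # files whose data ranges are not yet issued
    ranges_done: int = 0     # plays the role of progress_

    def range_complete(self) -> None:
        # Models RangeComplete(): one more range is finished.
        self.ranges_done += 1

    def add_disk_io_ranges(self) -> None:
        # Models AddDiskIoRanges(): one file's data ranges are now queued.
        self.num_unqueued_files -= 1

    def can_exit(self) -> bool:
        # Simplified exit test: nothing left to queue, nothing left to scan.
        return (self.num_unqueued_files == 0
                and self.ranges_done == self.total_ranges)


def process_header_split(node: ToyScanNode, data_ranges: int,
                         header_filtered: bool) -> None:
    """Handle one file: parse its header split, then scan its data ranges."""
    if header_filtered:
        # Assumption: a rejected header split issues no data ranges and
        # updates neither counter, so the node can never finish this file.
        return
    node.add_disk_io_ranges()  # data ranges issued after header parsing
    node.range_complete()      # the header split itself
    for _ in range(data_ranges):
        node.range_complete()  # each data range in turn
```

With one file of three ranges (one header split plus two data ranges),
can_exit() becomes true on the unfiltered path and stays false forever
once the header split is filtered, which corresponds to the spinning
scanner thread.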

As a stopgap, this patch disables per-split filtering for Avro, RC and
sequence files. A permanent fix would mark all scan ranges for a file as
done as soon as one range is filtered out.
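
The workaround amounts to a per-format gate on split-level filtering. A
minimal Python sketch of that idea follows; the real change lives in
be/src/exec/hdfs-scan-node.cc, and the format labels and the function
name per_split_filtering_allowed are assumptions made for illustration.

```python
# Hypothetical labels for the formats handled by sequence-based scanners.
SEQUENCE_BASED_FORMATS = frozenset({"AVRO", "RC_FILE", "SEQUENCE_FILE"})


def per_split_filtering_allowed(file_format: str) -> bool:
    """Return True if runtime filters may reject individual splits.

    Sequence-based formats are excluded so that a header split can never
    be filtered out while the rest of the file is still pending.
    """
    return file_format.upper() not in SEQUENCE_BASED_FORMATS
```

Other formats keep per-split filtering in this sketch; only the formats
whose data ranges are issued after header parsing are excluded.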

Testing:

A custom cluster test is added which disables file filtering, emulating
the race condition that leads to the hang when a query that filters
scan ranges is run. Without the fix, this test hangs; with the fix, the
query completes as expected. MAX_SCAN_RANGE_LENGTH is used to ensure
that each file has more than one scan range.
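
To illustrate why capping the scan range length yields several ranges
per file, here is a small Python sketch. It is not how Impala computes
its ranges; split_into_scan_ranges is a hypothetical helper, and
treating a non-positive cap as "no limit" is an assumption of this toy.

```python
from typing import List, Tuple


def split_into_scan_ranges(file_length: int,
                           max_scan_range_length: int) -> List[Tuple[int, int]]:
    """Split a file into (offset, length) ranges no longer than the cap."""
    if file_length < 0:
        raise ValueError("file_length must be non-negative")
    if max_scan_range_length <= 0:
        # Assumption for this sketch: no cap means one range for the file.
        return [(0, file_length)]
    ranges = []
    offset = 0
    while offset < file_length:
        length = min(max_scan_range_length, file_length - offset)
        ranges.append((offset, length))
        offset += length
    return ranges
```

Any cap smaller than the file length produces at least two ranges, which
is the precondition for the hang described in this change.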

Change-Id: I4770dd77fd4258c24115d72b572c727b770bd75d
---
M be/src/common/global-flags.cc
M be/src/exec/hdfs-scan-node.cc
A tests/custom_cluster/test_seq_file_filtering.py
3 files changed, 86 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/26/3526/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3526
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I4770dd77fd4258c24115d72b572c727b770bd75d
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.6.0_5.8.0
Gerrit-Owner: Henry Robinson <he...@cloudera.com>

[Impala-CR](cdh5-2.6.0 5.8.0) IMPALA-3798: Disable per-split filtering for sequence-based scanners

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has submitted this change and it was merged.

Change subject: IMPALA-3798: Disable per-split filtering for sequence-based scanners
......................................................................


Change-Id: I4770dd77fd4258c24115d72b572c727b770bd75d
Reviewed-on: http://gerrit.cloudera.org:8080/3526
Reviewed-by: Dan Hecht <dh...@cloudera.com>
Tested-by: Henry Robinson <he...@cloudera.com>
---
M be/src/common/global-flags.cc
M be/src/exec/hdfs-scan-node.cc
A tests/custom_cluster/test_seq_file_filtering.py
3 files changed, 86 insertions(+), 10 deletions(-)

Approvals:
  Henry Robinson: Verified
  Dan Hecht: Looks good to me, approved




Gerrit-MessageType: merged
Gerrit-Change-Id: I4770dd77fd4258c24115d72b572c727b770bd75d
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.6.0_5.8.0
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>

[Impala-CR](cdh5-2.6.0 5.8.0) IMPALA-3798: Disable per-split filtering for sequence-based scanners

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3798: Disable per-split filtering for sequence-based scanners
......................................................................


Patch Set 1: Code-Review+2


Gerrit-MessageType: comment
Gerrit-Change-Id: I4770dd77fd4258c24115d72b572c727b770bd75d
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.6.0_5.8.0
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-2.6.0 5.8.0) IMPALA-3798: Disable per-split filtering for sequence-based scanners

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-3798: Disable per-split filtering for sequence-based scanners
......................................................................


Patch Set 1: Verified+1

Core build passed: http://sandbox.jenkins.cloudera.com/job/impala-umbrella-build-and-test/2144/


Gerrit-MessageType: comment
Gerrit-Change-Id: I4770dd77fd4258c24115d72b572c727b770bd75d
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.6.0_5.8.0
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-HasComments: No