You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/10/27 05:20:00 UTC
[jira] [Commented] (IMPALA-7335) Assertion Failure - test_corrupt_files

    [ https://issues.apache.org/jira/browse/IMPALA-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16665921#comment-16665921 ] 

ASF subversion and git services commented on IMPALA-7335:
---------------------------------------------------------

Commit 449fe73d2145bd22f0f857623c3652a097f06d73 in impala's branch refs/heads/master from [~tarmstrong@cloudera.com]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=449fe73 ]

IMPALA-7662: fix error race when scanner open fails

This is very similar to IMPALA-7335, except happens
when 'progress_' is incremented in the call chain
HdfsScanNode::ProcessSplit
-> HdfsScanNodeBase::CreateAndOpenScanner()
-> HdfsScanner::Close()

The fix required restructuring the code so that
SetDoneInternal() is called with the error *before*
HdfsScanner::Close(). This required a refactoring because
HdfsScanNodeBase doesn't actually know about SetDoneInternal().

My fix is to put the common logic between HdfsScanNode and
HdfsScanNodeMt into a helper in HdfsScanNodeBase, then in
HdfsScanNode, make sure to call SetDoneInternal() before
closing the scanner.

I also reworked HdfsScanNode::ProcessSplit() to handle error propagation
internally. I think the joint responsibility between ProcessSplit() and
its caller for handling errors made things harder than necessary.

Testing:
Added a debug action and test that reproduced the race before the fix.

Change-Id: I45a61210ca7d057b048c77d9f2f2695ec450f19b
Reviewed-on: http://gerrit.cloudera.org:8080/11596
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Assertion Failure - test_corrupt_files
> --------------------------------------
>
>                 Key: IMPALA-7335
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7335
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 3.1.0
>            Reporter: nithya
>            Assignee: Pooja Nilangekar
>            Priority: Blocker
>              Labels: broken-build
>             Fix For: Impala 3.1.0
>
>
> test_corrupt_files fails 
>  
> query_test.test_scanners.TestParquet.test_corrupt_files[exec_option: \\{'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] (from pytest)
>  
> {code:java}
> Error Message
> query_test/test_scanners.py:300: in test_corrupt_files     self.run_test_case('QueryTest/parquet-abort-on-error', vector) common/impala_test_suite.py:420: in run_test_case     assert False, "Expected exception: %s" % expected_str E   AssertionError: Expected exception: Column metadata states there are 11 values, but read 10 values from column id.
> STACKTRACE
> query_test/test_scanners.py:300: in test_corrupt_files
>     self.run_test_case('QueryTest/parquet-abort-on-error', vector)
> common/impala_test_suite.py:420: in run_test_case
>     assert False, "Expected exception: %s" % expected_str
> E   AssertionError: Expected exception: Column metadata states there are 11 values, but read 10 values from column id.
> Standard Error
> -- executing against localhost:21000
> use functional_parquet;
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=False;
> SET abort_on_error=0;
> SET exec_single_node_rows_threshold=0;
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id, cnt from bad_column_metadata t, (select count(*) cnt from t.int_array) v;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id from bad_column_metadata;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> -- executing against localhost:21000
> SELECT * from bad_parquet_strings_negative_len;
> -- executing against localhost:21000
> SELECT * from bad_parquet_strings_out_of_bounds;
> -- executing against localhost:21000
> use functional_parquet;
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=False;
> SET abort_on_error=1;
> SET exec_single_node_rows_threshold=0;
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id, cnt from bad_column_metadata t, (select count(*) cnt from t.int_array) v;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id from bad_column_metadata;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> {code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org