Posted to issues@impala.apache.org by "Michael Brown (JIRA)" <ji...@apache.org> on 2017/11/20 20:39:00 UTC

[jira] [Resolved] (IMPALA-6109) Hbase in minicluster appears to be flaky

     [ https://issues.apache.org/jira/browse/IMPALA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Brown resolved IMPALA-6109.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.11.0

{noformat}
commit 10334260d893c8bca03349f23e34ad8832aa1d51
Author: Lars Volker <lv...@cloudera.com>
Date:   Fri Nov 17 15:24:22 2017 -0800

    IMPALA-6109: xfail TestHdfsUnknownErrors::test_hdfs_safe_mode_error_255

    The test puts the HDFS name node into safe mode to trigger an "Unknown
    Error 255" and verifies that the error details can be obtained correctly
    via the libHDFS API. However, putting the name node into safe mode can
    trip up HBase (HBASE-18738), which causes sporadic failures of our other
    HBase tests. To prevent this, we xfail the test until the HBase issue
    has been addressed (or we find a better way to trigger a 255 error).
    IMPALA-6212 tracks re-enabling the test in the future.

    Change-Id: I55979bed07147409949b798d4beb7a3b3b7ec5c3
    Reviewed-on: http://gerrit.cloudera.org:8080/8590
    Reviewed-by: Sailesh Mukil <sa...@cloudera.com>
    Tested-by: Impala Public Jenkins
{noformat}
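
For readers unfamiliar with xfail, here is a minimal sketch of how a pytest test can be marked as an expected failure without being executed. Only the class and test names come from the commit subject above; the marker arguments, the reason string, and the placeholder body are assumptions for illustration and may differ from the actual change in the Impala tree.

```python
import pytest


class TestHdfsUnknownErrors(object):
    # Hypothetical stand-in for the real test class; only the class and
    # method names are taken from the commit subject.

    @pytest.mark.xfail(
        run=False,  # do not execute the body, so safe mode is never entered
        reason="IMPALA-6109: HDFS safe mode can trip up HBase (HBASE-18738)")
    def test_hdfs_safe_mode_error_255(self):
        # Placeholder body (assumption). The real test puts the name node
        # into safe mode and checks the error details reported via libHDFS.
        raise NotImplementedError("illustrative placeholder")
```

With run=False, pytest reports the test as xfailed without running it, which matters here because merely running the test is what disturbs HBase.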

> Hbase in minicluster appears to be flaky
> ----------------------------------------
>
>                 Key: IMPALA-6109
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6109
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 2.11.0
>            Reporter: Tim Armstrong
>            Assignee: Lars Volker
>            Priority: Critical
>              Labels: flaky
>             Fix For: Impala 2.11.0
>
>
> I saw a number of HBase-related tests fail with errors like the ones shown after this list:
>     metadata.test_compute_stats.TestHbaseComputeStats.test_hbase_compute_stats[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: hbase/none]
>     metadata.test_compute_stats.TestHbaseComputeStats.test_hbase_compute_stats_incremental[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: hbase/none]
>     query_test.test_hbase_queries.TestHBaseQueries.test_hbase_scan_node[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: hbase/none]
>     query_test.test_join_queries.TestJoinQueries.test_joins_against_hbase[batch_size: 0 | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]
>     query_test.test_hbase_queries.TestHBaseQueries.test_hbase_row_key[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: hbase/none]
>     query_test.test_observability.TestObservability.test_scan_summary
>     query_test.test_hbase_queries.TestHBaseQueries.test_hbase_filters[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: hbase/none]
>     query_test.test_scanners.TestScannersAllTableFormats.test_scanners[batch_size: 0 | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: hbase/none]
>     query_test.test_mt_dop.TestMtDop.test_mt_dop[mt_dop: 2 | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: hbase/none]
>     query_test.test_hbase_queries.TestHBaseQueries.test_hbase_subquery[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: hbase/none]
>     failure.test_failpoints.TestFailpoints.test_failpoints[table_format: hbase/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 4 | location: CLOSE | action: FAIL | query: select 1 from alltypessmall order by id]
>     failure.test_failpoints.TestFailpoints.test_failpoints[table_format: hbase/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 0 | location: OPEN | action: CANCEL | query: select row_number() over (partition by int_col order by id) from alltypessmall]
>     failure.test_failpoints.TestFailpoints.test_failpoints[table_format: hbase/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 4 | location: GETNEXT_SCANNER | action: MEM_LIMIT_EXCEEDED | query: select * from alltypes]
>     failure.test_failpoints.TestFailpoints.test_failpoints[table_format: hbase/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 0 | location: GETNEXT | action: MEM_LIMIT_EXCEEDED | query: select 1 from alltypessmall order by id limit 100]
>     failure.test_failpoints.TestFailpoints.test_failpoints[table_format: hbase/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 0 | location: PREPARE_SCANNER | action: MEM_LIMIT_EXCEEDED | query: select count(int_col) from alltypessmall group by id]
>     failure.test_failpoints.TestFailpoints.test_failpoints[table_format: hbase/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 0 | location: OPEN | action: MEM_LIMIT_EXCEEDED | query: select count(*) from alltypessmall]
>     failure.test_failpoints.TestFailpoints.test_failpoints[table_format: hbase/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 0 | location: CLOSE | action: MEM_LIMIT_EXCEEDED | query: select c from (select id c from alltypessmall order by id limit 10) v where c = 1]
>     failure.test_failpoints.TestFailpoints.test_failpoints[table_format: hbase/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 0 | location: CLOSE | action: MEM_LIMIT_EXCEEDED | query: select * from alltypessmall union all select * from alltypessmall]
>     failure.test_failpoints.TestFailpoints.test_failpoints[table_format: hbase/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 0 | location: CLOSE | action: MEM_LIMIT_EXCEEDED | query: select 1 from alltypessmall a join alltypessmall b on a.id != b.id]
>     failure.test_failpoints.TestFailpoints.test_failpoints[table_format: hbase/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 4 | location: PREPARE | action: FAIL | query: select 1 from alltypessmall a join alltypessmall b on a.id = b.id]
> {noformat}
> E    Query aborted:RuntimeException: couldn't retrieve HBase table (functional_hbase.alltypessmall) info:
> E   This server is in the failed servers list: localhost/127.0.0.1:16202
> E   CAUSED BY: FailedServerException: This server is in the failed servers list: localhost/127.0.0.1:16202
> {noformat}
> {noformat}
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E    INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'>
> E    MESSAGE: RuntimeException: couldn't retrieve HBase table (functional_hbase.alltypessmall) info:
> E   Connection refused
> E   CAUSED BY: ConnectException: Connection refused
> {noformat}
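
Both errors above point at a server on localhost that is not accepting connections. As a rough triage aid, the sketch below checks whether a TCP port accepts connections. The helper name can_connect is hypothetical and not part of Impala or HBase; port 16202 is simply the one named in the first error message.

```python
import socket


def can_connect(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection raises OSError (for example ConnectionRefusedError
        # or a timeout) when nothing is accepting connections on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (assumes a local minicluster):
#   can_connect("localhost", 16202)
```

A False result only shows that nothing is listening at that moment; it does not say why the server went away, so the HBase logs are still the place to look for the cause.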


