Posted to issues-all@impala.apache.org by "Attila Jeges (JIRA)" <ji...@apache.org> on 2019/04/25 15:45:00 UTC
[jira] [Comment Edited] (IMPALA-8452) Avro scanner seems broken

    [ https://issues.apache.org/jira/browse/IMPALA-8452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826186#comment-16826186 ] 

Attila Jeges edited comment on IMPALA-8452 at 4/25/19 3:44 PM:
---------------------------------------------------------------

[~kwho] Both 'test_fuzz_alltypes' and 'test_tpch_scan_ranges' rely on random test values, so it makes sense that they don't fail consistently in all test runs.

I can confirm that the 'test_tpch_scan_ranges' failure is not related to IMPALA-7368. I was able to reproduce it consistently without IMPALA-7368 on my Ubuntu dev box after modifying the test as follows:
{code}
  def test_tpch_scan_ranges(self, vector):
    # Remove randomness from the test:
    ### Randomly adjust the scan range length to exercise different code paths.
    ## max_scan_range_length = \
    ##     int(vector.get_value('scan_range_length') * (random.random() + 0.5))
    max_scan_range_length = 8412307

    LOG.info("max_scan_range_length={0}".format(max_scan_range_length))
    vector.get_value('exec_option')['max_scan_range_length'] = max_scan_range_length
    self.run_test_case('tpch-scan-range-lengths', vector)
{code}
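
The original test scales 'scan_range_length' by a random factor in [0.5, 1.5), which is why a failing value does not recur on every run. Below is a minimal sketch of one way to keep that randomness while allowing a failing value to be replayed. The helper name, the environment variable, and the base length are all hypothetical and are not part of the Impala test suite:

```python
import os
import random


def pick_max_scan_range_length(base_length, rng=random, override=None):
    """Return a scan range length in [0.5 * base_length, 1.5 * base_length).

    Hypothetical helper, not part of the Impala test suite. If `override` is
    given, it is returned as an int so a failing value can be replayed.
    """
    if override is not None:
        return int(override)
    # Same formula as the original test: scale by a factor in [0.5, 1.5).
    return int(base_length * (rng.random() + 0.5))


# Hypothetical environment variable used to pin the value from a failed run.
pinned = os.environ.get("MAX_SCAN_RANGE_LENGTH_OVERRIDE")
base = 8 * 1024 * 1024  # illustrative base length (an assumption)
value = pick_max_scan_range_length(base, override=pinned)
print("max_scan_range_length={0}".format(value))
```

With the (hypothetical) variable set to 8412307, the sketch returns the value from the failing run; when it is unset, the sketch falls back to the randomized value, so other code paths are still exercised.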

I didn't have time to reproduce 'test_fuzz_alltypes' since it is more involved, but I suspect that it is not related to IMPALA-7368 either.


was (Author: attilaj):
[~kwho] Both 'test_fuzz_alltypes' and 'test_tpch_scan_ranges' rely on random test values, so it makes sense that they don't fail consistently in all test runs.

I can confirm that the 'test_tpch_scan_ranges' failure is not related to IMPALA-7368. I was able to reproduce it consistently without IMPALA-7368 on my Ubuntu dev box after modifying the test as follows:
{code}
  def test_tpch_scan_ranges(self, vector):
    # Remove randomness from the test:
    ## Randomly adjust the scan range length to exercise different code paths.
    # max_scan_range_length = \
    #     int(vector.get_value('scan_range_length') * (random.random() + 0.5))
    max_scan_range_length = 8412307

    LOG.info("max_scan_range_length={0}".format(max_scan_range_length))
    vector.get_value('exec_option')['max_scan_range_length'] = max_scan_range_length
    self.run_test_case('tpch-scan-range-lengths', vector)
{code}

I didn't have time to reproduce 'test_fuzz_alltypes' since it is more involved, but I suspect that it is not related to IMPALA-7368 either.

> Avro scanner seems broken
> -------------------------
>
>                 Key: IMPALA-8452
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8452
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.3.0
>            Reporter: Michael Ho
>            Assignee: Attila Jeges
>            Priority: Blocker
>              Labels: broken-build, wrongresults
>
> A few scanner tests started failing recently on CentOS 6. Coincidentally, both of them only started happening after [this commit|https://github.com/apache/impala/commit/b5805de3e65fd1c7154e4169b323bb38ddc54f4f]. [~attilaj], can you please take a look and reassign if you think that commit is unrelated?
> Oddly enough, this has shown up on CentOS 6. Other exhaustive runs on CentOS 7 seem to work fine. Maybe it's related to some platform library?
> In the first case, a select count(*) from an Avro table hangs for 2 hours:
> {noformat}
> query_test/test_scanners_fuzz.py:83: in test_fuzz_alltypes
>     self.run_fuzz_test(vector, src_db, table_name, unique_database, table_name)
> query_test/test_scanners_fuzz.py:201: in run_fuzz_test
>     result = self.execute_query(query, query_options = query_options)
> common/impala_test_suite.py:619: in wrapper
>     return function(*args, **kwargs)
> common/impala_test_suite.py:650: in execute_query
>     return self.__execute_query(self.client, query, query_options)
> common/impala_test_suite.py:721: in __execute_query
>     return impalad_client.execute(query, user=user)
> common/impala_connection.py:180: in execute
>     return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:183: in execute
>     handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:360: in __execute_query
>     self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:384: in wait_for_finished
>     time.sleep(0.05)
> E   Failed: Timeout >7200s
> SET client_identifier=query_test/test_scanners_fuzz.py::TestScannersFuzzing::()::test_fuzz_alltypes[protocol:beeswax|exec_option:{'debug_action':None;'abort_on_error':False;'mem_limit':'512m';'num_nodes':0}|table_format:avro/none];
> SET batch_size=1;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=True;
> SET abort_on_error=False;
> SET mem_limit=512m;
> -- executing against localhost:21000
> select count(*) from test_fuzz_alltypes_2cdcb963.alltypes q;
> -- 2019-04-24 04:14:31,857 INFO     MainThread: Started query 2049069f9f5e3aa8:f2fd47ff00000000
> {noformat}
> The second case is an incorrect number of rows returned by a select count(*) from tpch_avro.lineitem:
> {noformat}
> query_test/test_scanners.py:947: in test_tpch_scan_ranges
>     self.run_test_case('tpch-scan-range-lengths', vector)
> common/impala_test_suite.py:517: in run_test_case
>     self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:370: in __verify_results_and_errors
>     replace_filenames_with_placeholder)
> common/test_result_verifier.py:449: in verify_raw_results
>     VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:271: in verify_query_result_is_equal
>     assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E     6001215 != 6000679
> -- 2019-04-24 03:43:42,805 INFO     MainThread: max_scan_range_length=8412307
> SET client_identifier=query_test/test_scanners.py::TestTpchScanRangeLengths::()::test_tpch_scan_ranges[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_threshold':0}|;
> -- executing against localhost:21000
> use tpch_avro;
> -- 2019-04-24 03:43:42,814 INFO     MainThread: Started query c04e1968443b52fc:5b99b1b300000000
> SET client_identifier=query_test/test_scanners.py::TestTpchScanRangeLengths::()::test_tpch_scan_ranges[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_threshold':0}|;
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=False;
> SET abort_on_error=1;
> SET max_scan_range_length=8412307;
> SET exec_single_node_rows_threshold=0;
> -- executing against localhost:21000
> select count(*)
> from lineitem;
> {noformat}


