Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/04/26 15:47:01 UTC

[jira] [Commented] (IMPALA-10559) TestScratchLimit seems flaky

    [ https://issues.apache.org/jira/browse/IMPALA-10559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332526#comment-17332526 ] 

ASF subversion and git services commented on IMPALA-10559:
----------------------------------------------------------

Commit 91d2ab2116293b8f45afd02d029d47233877c538 in impala's branch refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=91d2ab2 ]

IMPALA-10584: Defer advancing read page if stream only has 2 pages.

TestScratchLimit::test_with_unlimited_scratch_limit has been
intermittently crashing in the ubuntu-16.04-dockerised-tests environment
since result spooling was enabled by default in IMPALA-9856. A DCHECK
violation occurs in ReservationTracker::CheckConsistency() because
BufferedTupleStream wrongly tries to reclaim memory reservation while
unpinning the stream.

For this bug to surface, all of the following need to happen:
- Stream is in pinned mode.
- There are only 2 pages in the stream: 1 read and 1 write.
- Stream cannot increase its reservation anymore, either due to memory
  pressure or a low buffer/memory limit.
- The stream read page has been fully read and is attached to the output
  RowBatch, but the output RowBatch has not been cleaned up yet.
- BufferedTupleStream::UnpinStream is invoked.

The memory accounting bug happens because UnpinStream proceeds to
NextReadPage, where the read page buffer is mistakenly assumed to have
been released. default_page_len_ bytes are added to
write_page_reservation_, which then violates the total memory
reservation.
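
The accounting error can be illustrated with a toy model. This is only a
sketch and not Impala code: ToyReservation, advance_read_page and the
PAGE_LEN value are hypothetical names invented for the illustration, and
the real logic lives in the C++ BufferedTupleStream and
ReservationTracker classes. It assumes a simplified invariant in which
bytes backing live buffers plus bytes saved for the write page must not
exceed the limit.

```python
from dataclasses import dataclass

# Stands in for default_page_len_; the value is arbitrary (assumption).
PAGE_LEN = 64 * 1024


@dataclass
class ToyReservation:
    """Toy accounting: bytes backing live buffers plus bytes saved for the
    next write page must never exceed the limit."""
    limit: int
    buffers_in_use: int = 0
    saved_for_write: int = 0  # plays the role of write_page_reservation_

    def is_consistent(self) -> bool:
        return self.buffers_in_use + self.saved_for_write <= self.limit


def advance_read_page(res: ToyReservation, buffer_released: bool) -> None:
    """Credit one page to the write reservation, as if the read page buffer
    had been released. Only when buffer_released is True is the buffer
    really given back."""
    if buffer_released:
        res.buffers_in_use -= PAGE_LEN
    res.saved_for_write += PAGE_LEN


# Two pages (1 read, 1 write) and no room to grow the reservation.
buggy = ToyReservation(limit=2 * PAGE_LEN, buffers_in_use=2 * PAGE_LEN)
# The read page is still attached to the output batch, so nothing is freed.
advance_read_page(buggy, buffer_released=False)

healthy = ToyReservation(limit=2 * PAGE_LEN, buffers_in_use=2 * PAGE_LEN)
advance_read_page(healthy, buffer_released=True)
```

In this model the first scenario ends up accounting for three pages
against a two-page limit, which is the kind of inconsistency a
consistency check would flag; the second stays within the limit.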

This patch fixes the bug by deferring advancement of the read iterator
in UnpinStream if the read page is attached to the output RowBatch and
there are only 2 pages in the stream. This is OK because after
UnpinStream finishes, the stream is in unpinned mode and
has_read_write_page is false. The next AddRow operation is then allowed
to unpin the previous write page first before reusing the reservation to
allocate a new write page. The next GetNext call is responsible for
advancing the read page.
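
The deferral described above can be sketched as a small state machine.
This is a simplified illustration under stated assumptions, not the
actual patch: ToyStream and all of its fields are hypothetical, and the
real change is in the C++ BufferedTupleStream. The sketch only shows the
control flow: unpin skips the read-page advance in the 2-page, attached
case, and the next get_next performs it.

```python
from dataclasses import dataclass


@dataclass
class ToyStream:
    """Hypothetical stand-in for a tuple stream; tracks only what is needed
    to show when the read page is advanced."""
    num_pages: int
    read_page_fully_read: bool
    read_page_attached: bool  # buffer handed to the output batch, not freed yet
    pinned: bool = True
    advance_pending: bool = False
    advances: int = 0

    def unpin(self) -> None:
        if self.read_page_fully_read:
            if self.num_pages == 2 and self.read_page_attached:
                # Defer: the read page buffer is still held by the batch.
                self.advance_pending = True
            else:
                self._advance_read_page()
        self.pinned = False

    def get_next(self) -> None:
        # The deferred advance happens on the next read.
        if self.advance_pending:
            self._advance_read_page()
            self.advance_pending = False

    def _advance_read_page(self) -> None:
        self.advances += 1


deferred = ToyStream(num_pages=2, read_page_fully_read=True,
                     read_page_attached=True)
deferred.unpin()
advances_after_unpin = deferred.advances
deferred.get_next()

eager = ToyStream(num_pages=3, read_page_fully_read=True,
                  read_page_attached=True)
eager.unpin()
```

With 2 pages and an attached read page, the advance happens in
get_next rather than in unpin; with more pages, unpin advances
immediately as before.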

Testing:
- Add backend test DeferAdvancingReadPage.
- Loop TestScratchLimit::test_with_unlimited_scratch_limit on my
  local dev machine and verify that each run passes without triggering
  the DCHECK violation.
- Re-enable result spooling in TestScratchLimit, which was disabled in
  IMPALA-10559.
- Pass core tests.
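
Looping a flaky test can be scripted along the following lines. This is
a generic sketch, not the procedure actually used for this commit:
count_failures is a hypothetical helper, and the command passed to it is
a placeholder to be replaced with whatever test invocation applies to
the local setup.

```python
import subprocess
import sys
from typing import Sequence


def count_failures(cmd: Sequence[str], iterations: int) -> int:
    """Run cmd `iterations` times and return how many runs exited non-zero."""
    failures = 0
    for i in range(iterations):
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            failures += 1
            print(f"run {i + 1} failed with exit code {result.returncode}",
                  file=sys.stderr)
    return failures


# Example call (placeholder command, replace with the real test invocation):
# count_failures(["<test runner>", "<path to test>"], 100)
```

A return value of 0 after many iterations is evidence, though not proof,
that an intermittent failure no longer reproduces.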

Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Reviewed-on: http://gerrit.cloudera.org:8080/17195
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> TestScratchLimit seems flaky
> ----------------------------
>
>                 Key: IMPALA-10559
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10559
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.0
>            Reporter: Yida Wu
>            Assignee: Riza Suminto
>            Priority: Blocker
>              Labels: broken-build
>             Fix For: Impala 4.0
>
>
>  
> The TestScratchLimit test case runs fine alone. But if TestScratchDir runs first and TestScratchLimit runs afterwards, TestScratchLimit always fails as shown below, unless the services are restarted (e.g. using "./buildall.sh -noclean -notests -format -start_minicluster -start_impala_cluster").
> {code:java}
> query_test/test_scratch_limit.py::TestScratchLimit::test_with_low_scratch_limit[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> query_test/test_scratch_limit.py::TestScratchLimit::test_without_specifying_scratch_limit[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> query_test/test_scratch_limit.py::TestScratchLimit::test_with_high_scratch_limit[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> query_test/test_scratch_limit.py::TestScratchLimit::test_with_zero_scratch_limit[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> query_test/test_scratch_limit.py::TestScratchLimit::test_with_unlimited_scratch_limit[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> query_test/test_scratch_limit.py::TestScratchLimit::test_with_zero_scratch_limit_no_memory_limit[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> [gw1] PASSED query_test/test_scratch_limit.py::TestScratchLimit::test_with_zero_scratch_limit[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> [gw5] FAILED query_test/test_scratch_limit.py::TestScratchLimit::test_with_low_scratch_limit[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> [gw4] FAILED query_test/test_scratch_limit.py::TestScratchLimit::test_with_high_scratch_limit[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> [gw2] FAILED query_test/test_scratch_limit.py::TestScratchLimit::test_without_specifying_scratch_limit[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> [gw3] FAILED query_test/test_scratch_limit.py::TestScratchLimit::test_with_unlimited_scratch_limit[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> [gw0] PASSED query_test/test_scratch_limit.py::TestScratchLimit::test_with_zero_scratch_limit_no_memory_limit[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> ------------------------------------------- generated xml file: /home/yida/Impala/logs/ee_tests/results/TEST-impala-parallel.xml --------------------------------------------
> ========================================================================== short test summary info ==========================================================================
> FAIL query_test/test_scratch_limit.py::TestScratchLimit::()::test_with_low_scratch_limit[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> FAIL query_test/test_scratch_limit.py::TestScratchLimit::()::test_with_high_scratch_limit[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> FAIL query_test/test_scratch_limit.py::TestScratchLimit::()::test_without_specifying_scratch_limit[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> FAIL query_test/test_scratch_limit.py::TestScratchLimit::()::test_with_unlimited_scratch_limit[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> ================================================================================= FAILURES ==================================================================================
>  TestScratchLimit.test_with_low_scratch_limit[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> [gw5] linux2 -- Python 2.7.16 /home/yida/Impala/bin/../infra/python/env-gcc7.5.0/bin/python
> query_test/test_scratch_limit.py:90: in test_with_low_scratch_limit
>  assert expected_error % scratch_limit_in_bytes in str(e)
> E assert ('Scratch space limit of %s bytes exceeded' % 25165824) in 'ImpalaBeeswaxException:\n Query aborted:Could not create files in any configured scratch directories (--scratch_dirs=...ous errors that may have prevented creating or writing scratch files. The following directories were at capacity: \n\n'
> E + where 'ImpalaBeeswaxException:\n Query aborted:Could not create files in any configured scratch directories (--scratch_dirs=...ous errors that may have prevented creating or writing scratch files. The following directories were at capacity: \n\n' = str(ImpalaBeeswaxException())
> --------------------------------------------------------------------------- Captured stderr setup ---------------------------------------------------------------------------
> SET client_identifier=query_test/test_scratch_limit.py::TestScratchLimit::()::test_with_low_scratch_limit[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':5000;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_threshol;
> -- connecting to: localhost:21000
> -- connecting to localhost:21050 with impyla
> -- 2021-03-03 07:12:08,115 INFO MainThread: Closing active operation
> -- connecting to localhost:28000 with impyla
> -- 2021-03-03 07:12:08,140 INFO MainThread: Closing active operation
> --------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------
> SET client_identifier=query_test/test_scratch_limit.py::TestScratchLimit::()::test_with_low_scratch_limit[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':5000;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_threshol;
> SET scratch_limit=24m;
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=5000;
> SET disable_codegen=False;
> SET abort_on_error=1;
> SET exec_single_node_rows_threshold=0;
> SET buffer_pool_limit=32m;
> -- executing against localhost:21000
>  select o_orderdate, o_custkey, o_comment
>  from tpch.orders
>  order by o_orderdate
>  ;
> -- 2021-03-03 07:12:08,329 INFO MainThread: Started query f44b4a50bec6b10d:e45a053d00000000
>  TestScratchLimit.test_with_high_scratch_limit[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> [gw4] linux2 -- Python 2.7.16 /home/yida/Impala/bin/../infra/python/env-gcc7.5.0/bin/python
> query_test/test_scratch_limit.py:74: in test_with_high_scratch_limit
>  self.execute_query_expect_success(self.client, self.spilling_sort_query, exec_option)
> common/impala_test_suite.py:809: in wrapper
>  return function(*args, **kwargs)
> common/impala_test_suite.py:817: in execute_query_expect_success
>  result = cls.__execute_query(impalad_client, query, query_options, user)
> common/impala_test_suite.py:918: in __execute_query
>  return impalad_client.execute(query, user=user)
> common/impala_connection.py:205: in execute
>  return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:187: in execute
>  handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:365: in __execute_query
>  self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:386: in wait_for_finished
>  raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E ImpalaBeeswaxException: ImpalaBeeswaxException:
> E Query aborted:Could not create files in any configured scratch directories (--scratch_dirs=) on backend 'yida-OptiPlex-7060:27001'. 0 of scratch is currently in use by this Impala Daemon (0 by this query). See logs for previous errors that may have prevented creating or writing scratch files. The following directories were at capacity:
> --------------------------------------------------------------------------- Captured stderr setup ---------------------------------------------------------------------------
> SET client_identifier=query_test/test_scratch_limit.py::TestScratchLimit::()::test_with_high_scratch_limit[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':5000;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_thresho;
> -- connecting to: localhost:21000
> -- connecting to localhost:21050 with impyla
> -- 2021-03-03 07:12:08,114 INFO MainThread: Closing active operation
> -- connecting to localhost:28000 with impyla
> -- 2021-03-03 07:12:08,136 INFO MainThread: Closing active operation
> --------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------
> SET client_identifier=query_test/test_scratch_limit.py::TestScratchLimit::()::test_with_high_scratch_limit[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':5000;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_thresho;
> SET scratch_limit=500m;
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=5000;
> SET disable_codegen=False;
> SET abort_on_error=1;
> SET exec_single_node_rows_threshold=0;
> SET buffer_pool_limit=32m;
> -- executing against localhost:21000
>  select o_orderdate, o_custkey, o_comment
>  from tpch.orders
>  order by o_orderdate
>  ;
> -- 2021-03-03 07:12:08,329 INFO MainThread: Started query 814fccc9c1242331:83738b0800000000
>  TestScratchLimit.test_without_specifying_scratch_limit[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> [gw2] linux2 -- Python 2.7.16 /home/yida/Impala/bin/../infra/python/env-gcc7.5.0/bin/python
> query_test/test_scratch_limit.py:118: in test_without_specifying_scratch_limit
>  self.execute_query_expect_success(self.client, self.spilling_sort_query, exec_option)
> common/impala_test_suite.py:809: in wrapper
>  return function(*args, **kwargs)
> common/impala_test_suite.py:817: in execute_query_expect_success
>  result = cls.__execute_query(impalad_client, query, query_options, user)
> common/impala_test_suite.py:918: in __execute_query
>  return impalad_client.execute(query, user=user)
> common/impala_connection.py:205: in execute
>  return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:187: in execute
>  handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:365: in __execute_query
>  self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:386: in wait_for_finished
>  raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E ImpalaBeeswaxException: ImpalaBeeswaxException:
> E Query aborted:Could not create files in any configured scratch directories (--scratch_dirs=) on backend 'yida-OptiPlex-7060:27002'. 0 of scratch is currently in use by this Impala Daemon (0 by this query). See logs for previous errors that may have prevented creating or writing scratch files. The following directories were at capacity:
> --------------------------------------------------------------------------- Captured stderr setup ---------------------------------------------------------------------------
> SET client_identifier=query_test/test_scratch_limit.py::TestScratchLimit::()::test_without_specifying_scratch_limit[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':5000;'disable_codegen':False;'abort_on_error':1;'exec_single_node_row;
> -- connecting to: localhost:21000
> -- connecting to localhost:21050 with impyla
> -- 2021-03-03 07:12:08,114 INFO MainThread: Closing active operation
> -- connecting to localhost:28000 with impyla
> -- 2021-03-03 07:12:08,137 INFO MainThread: Closing active operation
> --------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------
> SET client_identifier=query_test/test_scratch_limit.py::TestScratchLimit::()::test_without_specifying_scratch_limit[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':5000;'disable_codegen':False;'abort_on_error':1;'exec_single_node_row;
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=5000;
> SET disable_codegen=False;
> SET abort_on_error=1;
> SET exec_single_node_rows_threshold=0;
> SET buffer_pool_limit=32m;
> -- executing against localhost:21000
>  select o_orderdate, o_custkey, o_comment
>  from tpch.orders
>  order by o_orderdate
>  ;
> -- 2021-03-03 07:12:08,329 INFO MainThread: Started query 944af93b81ee62cf:ab33f08800000000
>  TestScratchLimit.test_with_unlimited_scratch_limit[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> [gw3] linux2 -- Python 2.7.16 /home/yida/Impala/bin/../infra/python/env-gcc7.5.0/bin/python
> query_test/test_scratch_limit.py:110: in test_with_unlimited_scratch_limit
>  self.execute_query_expect_success(self.client, self.spilling_sort_query, exec_option)
> common/impala_test_suite.py:809: in wrapper
>  return function(*args, **kwargs)
> common/impala_test_suite.py:817: in execute_query_expect_success
>  result = cls.__execute_query(impalad_client, query, query_options, user)
> common/impala_test_suite.py:918: in __execute_query
>  return impalad_client.execute(query, user=user)
> common/impala_connection.py:205: in execute
>  return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:187: in execute
>  handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:365: in __execute_query
>  self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:386: in wait_for_finished
>  raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E ImpalaBeeswaxException: ImpalaBeeswaxException:
> E Query aborted:Could not create files in any configured scratch directories (--scratch_dirs=) on backend 'yida-OptiPlex-7060:27001'. 0 of scratch is currently in use by this Impala Daemon (0 by this query). See logs for previous errors that may have prevented creating or writing scratch files. The following directories were at capacity:
> --------------------------------------------------------------------------- Captured stderr setup ---------------------------------------------------------------------------
> SET client_identifier=query_test/test_scratch_limit.py::TestScratchLimit::()::test_with_unlimited_scratch_limit[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':5000;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_th;
> -- connecting to: localhost:21000
> -- connecting to localhost:21050 with impyla
> -- 2021-03-03 07:12:08,114 INFO MainThread: Closing active operation
> -- connecting to localhost:28000 with impyla
> -- 2021-03-03 07:12:08,146 INFO MainThread: Closing active operation
> --------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------
> SET client_identifier=query_test/test_scratch_limit.py::TestScratchLimit::()::test_with_unlimited_scratch_limit[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':5000;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_th;
> SET scratch_limit=-1;
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=5000;
> SET disable_codegen=False;
> SET abort_on_error=1;
> SET exec_single_node_rows_threshold=0;
> SET buffer_pool_limit=32m;
> -- executing against localhost:21000
>  select o_orderdate, o_custkey, o_comment
>  from tpch.orders
>  order by o_orderdate
>  ;
> -- 2021-03-03 07:12:08,329 INFO MainThread: Started query 5c492cfd62287b03:f9c3370b00000000
> ==================================================================== 4 failed, 2 passed in 14.11 seconds ====================================================================
> Running TestExecutor with args: ['--ignore', 'query_test', '--ignore', 'catalog_service', '--ignore', 'hs2', '--ignore', 'experiments', '--ignore', 'data_errors', '--ignore', 'performance', '--ignore', 'beeswax', '--ignore', 'authorization', '--ignore', 'metadata', '--ignore', 'shell', '--ignore', 'benchmark', '--ignore', 'infra', '--ignore', 'statestore', '--ignore', 'failure', '--ignore', 'webserver', '--ignore', 'comparison', '--ignore', 'stress', '--ignore', 'util', '--ignore', 'custom_cluster', '--ignore', 'common', '--ignore', 'observability', '--ignore', 'unittests', '--resultlog', '/home/yida/Impala/logs/ee_tests/results/TEST-impala-verify-metrics.log', '--junitxml', '/home/yida/Impala/logs/ee_tests/results/TEST-impala-verify-metrics.xml', 'verifiers/test_verify_metrics.py']
> rootLoggerLevel = INFO
> {code}
>  



