Posted to dev@impala.apache.org by Jin Chul Kim <ji...@gmail.com> on 2017/11/16 02:05:58 UTC

A test failure at Jenkins build is not reproducible on my local

Hi,

I am trying to look into the build error:
https://jenkins.impala.io/job/gerrit-verify-dryrun/1472/
(The relevant code change: https://gerrit.cloudera.org/#/c/8355/)

There was a test failure at "TestAllocFail.test_alloc_fail_init". I ran the
following command but it always passed on my change: ./tests/run-tests.py
tests/custom_cluster/test_alloc_fail.py

I also set the options below in impala-shell and retried, but the failure
did not occur.
exec_option: {'batch_size': 0, 'num_nodes': 0,
'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
'abort_on_error': 1, 'exec_single_node_rows_threshold': 0}

I am using Ubuntu 16.04 LTS and a debug binary. Are there any differences
between the Jenkins environment and mine?

] =========================== short test summary info
============================
] XFAIL
custom_cluster/test_alloc_fail.py::TestAllocFail::()::test_alloc_fail_update[exec_option:
{'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0,
'disable_codegen': False, 'abort_on_error': 1,
'exec_single_node_rows_threshold': 0} | table_format: text/none]
]   IMPALA-2925: the execution is not deterministic so some tests sometimes
don't fail as expected
] FAIL
custom_cluster/test_alloc_fail.py::TestAllocFail::()::test_alloc_fail_init[exec_option:
{'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0,
'disable_codegen': False, 'abort_on_error': 1,
'exec_single_node_rows_threshold': 0} | table_format: text/none]
] =================================== FAILURES
===================================
]  TestAllocFail.test_alloc_fail_init[exec_option: {'batch_size': 0,
'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen':
False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} |
table_format: text/none]
] custom_cluster/test_alloc_fail.py:34: in test_alloc_fail_init
]     self.run_test_case('QueryTest/alloc-fail-init', vector)
] common/impala_test_suite.py:398: in run_test_case
]     self.__verify_exceptions(test_section['CATCH'], str(e), use_db)
] common/impala_test_suite.py:277: in __verify_exceptions
]     (expected_str, actual_str)
] E   AssertionError: Unexpected exception string. Expected:
FunctionContext::Allocate() failed to allocate 4 bytes.
] E   Not found in actual: ImpalaBeeswaxException: Query
aborted:FunctionContext::Allocate() failed to allocate 5000 bytes.

Best regards,
Jinchul

Re: A test failure at Jenkins build is not reproducible on my local

Posted by Philip Zeyliger <ph...@cloudera.com>.
>
>
> Regarding guidance on E2E tests in the "How to load and run Impala tests
> <https://cwiki.apache.org/confluence/display/IMPALA/How+to+load+and+run+Impala+tests>"
> wiki page, I wish there were more detailed information. Let me put the
> information on the wiki after you answer my question: What is the
> difference between "./tests/run-tests.py" and "impala-py.test"?
>

Thank you for volunteering to update the wiki page.

py.test (https://docs.pytest.org/en/latest/) is a framework for writing
tests in Python. It's got a bunch of features.

impala-py.test is a small wrapper that invokes py.test in the right
environment (i.e., from the "virtualenv") and with LD_LIBRARY_PATH updated.
When you're using impala-py.test, you're using py.test's mechanisms.

Meanwhile, tests/run-tests.py is a wrapper that invokes py.test a few times.
It uses environment variables for control. First it executes any "serial"
tests, then it executes some stress tests, and finally it instructs
py.test to execute the "parallel" tests (those not marked serial or stress)
with a bunch of parallelism. If you plotted your CPU usage while this was
happening, you'd see something like the image below. You can tell where the
parallel testing happens, about 3/4 of the way through.
[image: Inline image 1]

To make things even more complicated, there's bin/run-all-tests.sh, which
invokes run-tests.py (but also "mvn test" and the C++ tests), and
buildall.sh, which invokes run-all-tests.sh.
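
The ordering described above for run-tests.py can be sketched roughly as
follows. This is only an illustration of the idea, not the actual script: the
marker names "serial" and "stress", the helper plan_phases(), and the example
path are assumptions made for this sketch. The -m option is py.test's marker
filter and -n comes from the xdist plugin.

```python
# Illustrative sketch only: the marker names ("serial", "stress") and the
# helper plan_phases() are assumptions, not taken from tests/run-tests.py.

def plan_phases(test_path, workers=4):
    """Return py.test argument lists in the order a wrapper might run them."""
    return [
        # 1. Tests marked serial run first, one at a time.
        ["py.test", "-m", "serial", test_path],
        # 2. Stress tests run next.
        ["py.test", "-m", "stress", test_path],
        # 3. Everything else runs in parallel; -n is the pytest-xdist option.
        ["py.test", "-m", "not serial and not stress",
         "-n", str(workers), test_path],
    ]


if __name__ == "__main__":
    # "path/to/tests" is a placeholder, not a real Impala directory.
    for argv in plan_phases("path/to/tests"):
        print(" ".join(argv))
```

Running impala-py.test directly skips this phasing and hands the arguments to
a single py.test invocation.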

Cheers,

-- Philip

Re: A test failure at Jenkins build is not reproducible on my local

Posted by Michael Brown <mi...@cloudera.com>.
>  Does "tests/custom_cluster/test_alloc_fail.py" run with "run-tests.py"?

The answer to this is no, though I couldn't tell you the historical reasons
for this, if any. For now I have cleared up cwiki and also answered you at
IMPALA-6229.

Re: A test failure at Jenkins build is not reproducible on my local

Posted by Jin Chul Kim <ji...@gmail.com>.
Hi Michael,

As I described in the previous email, I am using a debug binary. I could
reproduce the issue using either impala-py.test or the last option, which
restarts the Impala cluster with stress_fn_ctx_alloc enabled. My change
replaces rand_r from the C library with std::mt19937 from the C++ library, so
Allocate<std::mt19937>() is introduced. FYI, sizeof(std::mt19937) is 5000
bytes.
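
For reference, the 5000-byte figure is consistent with a simple back-of-the-
envelope calculation. The sketch below assumes the libstdc++ layout on 64-bit
Linux (624 state words stored as 8-byte uint_fast32_t plus one size_t index);
that layout is an assumption on my part, and other standard libraries or
platforms may give a different size.

```python
# Assumed layout (libstdc++ on 64-bit Linux); other platforms may differ.
STATE_WORDS = 624   # number of state words in the MT19937 algorithm
WORD_BYTES = 8      # assumption: sizeof(uint_fast32_t) on x86-64 glibc
INDEX_BYTES = 8     # assumption: sizeof(size_t) on a 64-bit platform


def mt19937_size(state_words=STATE_WORDS, word_bytes=WORD_BYTES,
                 index_bytes=INDEX_BYTES):
    """Estimate sizeof(std::mt19937) as state array plus position index."""
    return state_words * word_bytes + index_bytes


if __name__ == "__main__":
    print(mt19937_size())  # 5000
```

By comparison, the seed used by rand_r is a single unsigned int, which matches
the 4 bytes in the old expected error string.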

By the way, I am still curious about one thing. The command below with
"run-tests.py" looks fine to me because it reports that the tests finished
successfully. I guess this is a false positive. If
"tests/custom_cluster/test_alloc_fail.py" cannot run with "run-tests.py", the
run should finish with an error reporting the incompatibility. Would you
please clarify two things?
1. The false positive
2. Does "tests/custom_cluster/test_alloc_fail.py" run with "run-tests.py"?

Regarding guidance on E2E tests in the "How to load and run Impala tests
<https://cwiki.apache.org/confluence/display/IMPALA/How+to+load+and+run+Impala+tests>"
wiki page, I wish there were more detailed information. Let me put the
information on the wiki after you answer my question: What is the
difference between "./tests/run-tests.py" and "impala-py.test"?

$ ./tests/run-tests.py tests/custom_cluster/test_alloc_fail.py
...
============================= test session starts ==============================
platform linux2 -- Python 2.7.12, pytest-2.9.2, py-1.4.32, pluggy-0.3.1 -- /home/jinchulkim/Impala/bin/../infra/python/env/bin/python
cachedir: .cache
rootdir: /home/jinchulkim/Impala/tests, inifile: pytest.ini
plugins: xdist-1.15.0, random-0.2
collected 2 items

verifiers/test_verify_metrics.py::TestValidateMetrics::test_metrics_are_zero PASSED
verifiers/test_verify_metrics.py::TestValidateMetrics::test_num_unused_buffers PASSED
=========================== 2 passed in 0.11 seconds ===========================

Best regards,
Jinchul

2017-11-16 11:23 GMT+09:00 Michael Ho <kw...@cloudera.com>:

> That's odd. Please double check you are running debug builds. Alternately,
> you can try:
>
> impala-py.test tests/custom_cluster/test_alloc_fail.py
>
> Another approach is to try doing what the test is trying to achieve
> manually:
>
> ./bin/start-impala-cluster.py --impalad_args="--stress_fn_ctx_alloc=1"
> ./bin/impala-shell.sh -q "select rand() from functional.alltypes"
>
> My suspicion is that your change modified some states used for rand() and
> it bloated from 4 bytes to 5000 bytes.
> The error string needs to be updated as a result. That said, did anyone ask
> about the change in memory usage during code review ?
>
> --
> Thanks,
> Michael
>

Re: A test failure at Jenkins build is not reproducible on my local

Posted by Jim Apple <jb...@cloudera.com>.
I missed this in code review. Michael, will you weigh in on that issue
on https://gerrit.cloudera.org/#/c/8355/?

On Wed, Nov 15, 2017 at 6:23 PM, Michael Ho <kw...@cloudera.com> wrote:
> That's odd. Please double check you are running debug builds. Alternately,
> you can try:
>
> impala-py.test tests/custom_cluster/test_alloc_fail.py
>
> Another approach is to try doing what the test is trying to achieve
> manually:
>
> ./bin/start-impala-cluster.py --impalad_args="--stress_fn_ctx_alloc=1"
> ./bin/impala-shell.sh -q "select rand() from functional.alltypes"
>
> My suspicion is that your change modified some states used for rand() and
> it bloated from 4 bytes to 5000 bytes.
> The error string needs to be updated as a result. That said, did anyone ask
> about the change in memory usage during code review ?
>
> --
> Thanks,
> Michael

Re: A test failure at Jenkins build is not reproducible on my local

Posted by Michael Ho <kw...@cloudera.com>.
That's odd. Please double check you are running debug builds. Alternately,
you can try:

impala-py.test tests/custom_cluster/test_alloc_fail.py

Another approach is to try doing what the test is trying to achieve
manually:

./bin/start-impala-cluster.py --impalad_args="--stress_fn_ctx_alloc=1"
./bin/impala-shell.sh -q "select rand() from functional.alltypes"

My suspicion is that your change modified some state used for rand(), and
that state grew from 4 bytes to 5000 bytes.
The error string needs to be updated as a result. That said, did anyone ask
about the change in memory usage during code review?
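
To illustrate why the test fails, here is a minimal sketch of a substring
check against the two strings from the Jenkins log. The helper matches() is a
hypothetical stand-in written for this example; it is not the actual
__verify_exceptions code in common/impala_test_suite.py.

```python
# matches() is a hypothetical stand-in for the test framework's check,
# written only to show why the old expected string no longer matches.

def matches(expected_str, actual_str):
    """Return True if the expected message appears in the actual error."""
    return expected_str in actual_str


# Both strings are copied from the failure output in this thread.
ACTUAL = ("ImpalaBeeswaxException: Query aborted:"
          "FunctionContext::Allocate() failed to allocate 5000 bytes.")
OLD_EXPECTED = "FunctionContext::Allocate() failed to allocate 4 bytes."
NEW_EXPECTED = "FunctionContext::Allocate() failed to allocate 5000 bytes."

if __name__ == "__main__":
    print(matches(OLD_EXPECTED, ACTUAL))  # False
    print(matches(NEW_EXPECTED, ACTUAL))  # True
```

Under that assumption, updating the expected string to the new allocation
size would make the check pass, independent of whether the larger allocation
is acceptable.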

-- 
Thanks,
Michael