Posted to issues@impala.apache.org by "Gergely Fürnstáhl (Jira)" <ji...@apache.org> on 2022/08/03 09:43:00 UTC

[jira] [Resolved] (IMPALA-11114) calculate_tval fails with ZeroDivisionError if the standard deviations are 0

     [ https://issues.apache.org/jira/browse/IMPALA-11114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gergely Fürnstáhl resolved IMPALA-11114.
----------------------------------------
    Fix Version/s: Impala 4.1.0
       Resolution: Fixed

> calculate_tval fails with ZeroDivisionError if the standard deviations are 0
> ----------------------------------------------------------------------------
>
>                 Key: IMPALA-11114
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11114
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Gergely Fürnstáhl
>            Assignee: Gergely Fürnstáhl
>            Priority: Minor
>             Fix For: Impala 4.1.0
>
>
> Possible cause:
> _Rounding of the data or other forms of truncation could give zero standard deviation when in fact you have some. And if the difference that you are trying to measure is within your measurement error that is a problem not addressed by the t-test._
> [https://stats.stackexchange.com/questions/78570/t-test-with-sample-standard-deviation-of-zero-possible/275879]
> Full log:
> {code:python}
> Traceback (most recent call last):
>   File "/home/gfurnstahl/Impala/tests/benchmark/report_benchmark_results.py", line 1131, in <module>
>     report = Report(grouped, ref_grouped)
>   File "/home/gfurnstahl/Impala/tests/benchmark/report_benchmark_results.py", line 494, in __init__
>     self.__analyze()
>   File "/home/gfurnstahl/Impala/tests/benchmark/report_benchmark_results.py", line 514, in __analyze
>     query_comparison_row = Report.QueryComparisonRow(results, ref_results)
>   File "/home/gfurnstahl/Impala/tests/benchmark/report_benchmark_results.py", line 370, in __init__
>     self.__check_perf_change_significance(results, ref_results))
>   File "/home/gfurnstahl/Impala/tests/benchmark/report_benchmark_results.py", line 390, in __check_perf_change_significance
>     ref_stat[AVG], ref_stat[STDDEV], ref_stat[ITERATIONS])
>   File "/home/gfurnstahl/Impala/tests/util/calculation_util.py", line 65, in calculate_tval
>     return (avg - ref_avg) / sem
> ZeroDivisionError: float division by zero
> Traceback (most recent call last):
>   File "bin/single_node_perf_run.py", line 359, in <module>
>     main()
>   File "bin/single_node_perf_run.py", line 349, in main
>     perf_ab_test(options, args)
>   File "bin/single_node_perf_run.py", line 267, in perf_ab_test
>     compare(temp_dir, hash_a, hash_b)
>   File "bin/single_node_perf_run.py", line 175, in compare
>     report_benchmark_results(file_a, file_b, description)
>   File "bin/single_node_perf_run.py", line 166, in report_benchmark_results
>     stdout=f)
>   File "/home/gfurnstahl/Impala/toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/subprocess.py", line 190, in check_call
>     raise CalledProcessError(retcode, cmd)
> subprocess.CalledProcessError: Command '['/home/gfurnstahl/Impala/tests/benchmark/report_benchmark_results.py', '--reference_result_file=/home/gfurnstahl/Impala/perf_results/perf_run_0SdUw7/a87f8c5df9f6fbf8d468921642d7ec3d37c5f4de.json', '--input_result_file=/home/gfurnstahl/Impala/perf_results/perf_run_0SdUw7/b4d04112559c3f04ebf42b36deb1cd537dea78c4.json', '--report_description="a87f8c5df9f6fbf8d468921642d7ec3d37c5f4de vs b4d04112559c3f04ebf42b36deb1cd537dea78c4"']' returned non-zero exit status 1
> {code}
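
The traceback ends at `return (avg - ref_avg) / sem`, so the division fails whenever the standard error is exactly zero, which happens when both standard deviations are 0. One way to guard against this is sketched below. It is an illustration, not the change committed for this issue. The helper name `calculate_tval_safe`, the Welch-style formula for `sem`, and the policy for the zero case are all assumptions, since the log shows only the final line of `calculate_tval`.

```python
import math


def calculate_tval_safe(avg, stddev, iters, ref_avg, ref_stddev, ref_iters):
    """Return a t-value, tolerating a zero standard error.

    Hypothetical helper: the name, the sem formula, and the zero-handling
    policy are assumptions, not the code from the Impala repository.
    """
    # Standard error of the difference between the two sample means
    # (Welch-style; assumed, the traceback does not show this line).
    sem = math.sqrt(stddev ** 2 / float(iters)
                    + ref_stddev ** 2 / float(ref_iters))
    if sem == 0:
        # Both samples have zero variance, so the t-statistic is undefined.
        # Assumed policy: equal means give 0.0, otherwise a signed infinity.
        if avg == ref_avg:
            return 0.0
        return math.copysign(float("inf"), avg - ref_avg)
    return (avg - ref_avg) / sem
```

Returning a signed infinity treats any difference between two zero-variance samples as significant. As the quoted StackExchange answer notes, a zero standard deviation may only reflect rounding or truncation, so a caller might prefer to report such a comparison as inconclusive instead.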