Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/02/15 23:43:00 UTC

[jira] [Commented] (IMPALA-11113) single_node_perf_run.py throws UnicodeDecodeError for TPCDS dataset

    [ https://issues.apache.org/jira/browse/IMPALA-11113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492911#comment-17492911 ] 

ASF subversion and git services commented on IMPALA-11113:
----------------------------------------------------------

Commit 182617ee87aaf23abe46dd0cc5b82133e4d41803 in impala's branch refs/heads/master from Gergely Fürnstáhl
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=182617e ]

IMPALA-11113 and IMPALA-11114: fixed single_node_perf_run.py for TPCDS

Fixed the UTF-8 UnicodeDecodeError which was thrown while dumping and
loading the json file. Now the script ignores non-decodable characters.
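A minimal Python 3 sketch of the "ignore non-decodable characters" approach follows. It assumes the result map may contain byte strings, and it decodes them with errors="ignore" before serializing. The helper names sanitize_for_json, dump_results and load_results are hypothetical and are not taken from run-workload.py. That script runs on Python 2.7, where the same bad byte (0xc9 in the traceback below) surfaces inside json.dump instead.

```python
import json


def sanitize_for_json(obj):
    """Recursively decode byte strings, silently dropping undecodable bytes.

    Hypothetical helper illustrating the commit's approach; it is not the
    actual code from run-workload.py.
    """
    if isinstance(obj, bytes):
        # errors="ignore" drops bytes such as 0xc9 that are not valid UTF-8
        return obj.decode("utf-8", errors="ignore")
    if isinstance(obj, dict):
        return {sanitize_for_json(k): sanitize_for_json(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [sanitize_for_json(v) for v in obj]
    return obj


def dump_results(result_map, path):
    """Write result_map as JSON after sanitizing any byte strings."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sanitize_for_json(result_map), f)


def load_results(path):
    """Read the JSON back, ignoring any bytes that are not valid UTF-8."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return json.load(f)
```

For example, sanitize_for_json(b"caf\xc9 x") returns "caf x". The trade-off is that dropped bytes disappear silently. Using errors="replace" instead would leave a visible U+FFFD marker in the output.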

Fixed the ZeroDivisionError coming from the t-test when the standard
deviations were 0. "(N/A) Invalid t-test type" is now shown for significant
changes, and a hint is printed at the end if any invalid t-test was detected.
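The following is a hedged sketch of a zero-standard-deviation guard. The commit does not say which t-test variant the script uses, so this example uses Welch's t statistic as a stand-in. It also uses a fixed critical value of 2.0 as a simplification. The names welch_t and report, and the hint wording, are hypothetical.

```python
import math
from statistics import mean, stdev


def welch_t(base, new):
    """Welch's t statistic for two samples, or None when it is undefined.

    Hypothetical helper; each sample needs at least two values for stdev().
    """
    standard_error = math.sqrt(stdev(base) ** 2 / len(base) +
                               stdev(new) ** 2 / len(new))
    if standard_error == 0:
        # Both standard deviations are 0: dividing would raise ZeroDivisionError.
        return None
    return (mean(base) - mean(new)) / standard_error


def report(results, critical_t=2.0):
    """Return one line per (name, base, new) tuple, plus a trailing hint if
    any t-test could not be computed. critical_t is a simplification."""
    lines = []
    invalid_found = False
    for name, base, new in results:
        t = welch_t(base, new)
        if t is None:
            if mean(base) != mean(new):
                # The means differ, but there is no variance to test against.
                invalid_found = True
                lines.append(f"{name}: (N/A) Invalid t-test")
            else:
                lines.append(f"{name}: no change")
        elif abs(t) > critical_t:
            lines.append(f"{name}: significant change (t={t:.2f})")
        else:
            lines.append(f"{name}: no significant change")
    if invalid_found:
        lines.append("Hint: (N/A) means the standard deviations were 0, "
                     "so the t-test was skipped.")
    return lines
```

Returning None instead of raising lets the report mark only the affected queries. The rest of the comparison still completes, and a single hint is appended at the end.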

Change-Id: I094763188a1f3ddf40b7140c65acf95918a6597f
Reviewed-on: http://gerrit.cloudera.org:8080/18215
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Quanlong Huang <hu...@gmail.com>


> single_node_perf_run.py throws UnicodeDecodeError for TPCDS dataset
> -------------------------------------------------------------------
>
>                 Key: IMPALA-11113
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11113
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Gergely Fürnstáhl
>            Assignee: Gergely Fürnstáhl
>            Priority: Minor
>
> Possible fix:
> [https://stackoverflow.com/questions/19872773/unicodedecodeerror-while-using-json-dumps]
> Exception:
> Traceback (most recent call last):
>   File "/home/gfurnstahl/Impala/bin/run-workload.py", line 280, in <module>
>     json.dump(result_map, f, cls=CustomJSONEncoder)
>   File "/home/gfurnstahl/Impala/toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/json/__init__.py", line 189, in dump
>     for chunk in iterable:
>   File "/home/gfurnstahl/Impala/toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/json/encoder.py", line 434, in _iterencode
>     for chunk in _iterencode_dict(o, _current_indent_level):
>   File "/home/gfurnstahl/Impala/toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/json/encoder.py", line 408, in _iterencode_dict
>     for chunk in chunks:
>   File "/home/gfurnstahl/Impala/toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/json/encoder.py", line 332, in _iterencode_list
>     for chunk in chunks:
>   File "/home/gfurnstahl/Impala/toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/json/encoder.py", line 443, in _iterencode
>     for chunk in _iterencode(o, _current_indent_level):
>   File "/home/gfurnstahl/Impala/toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/json/encoder.py", line 434, in _iterencode
>     for chunk in _iterencode_dict(o, _current_indent_level):
>   File "/home/gfurnstahl/Impala/toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/json/encoder.py", line 408, in _iterencode_dict
>     for chunk in chunks:
>   File "/home/gfurnstahl/Impala/toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/json/encoder.py", line 313, in _iterencode_list
>     yield buf + _encoder(value)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xc9 in position 47: invalid continuation byte
> Traceback (most recent call last):
>   File "./bin/single_node_perf_run.py", line 359, in <module>
>     main()
>   File "./bin/single_node_perf_run.py", line 349, in main
>     perf_ab_test(options, args)
>   File "./bin/single_node_perf_run.py", line 256, in perf_ab_test
>     run_workload(temp_dir, workloads, options)
>   File "./bin/single_node_perf_run.py", line 154, in run_workload
>     configured_call(run_workload)
>   File "./bin/single_node_perf_run.py", line 94, in configured_call
>     return subprocess.check_call(["bash", "-c", cmd])
>   File "/home/gfurnstahl/Impala/toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/subprocess.py", line 190, in check_call
>     raise CalledProcessError(retcode, cmd)
> subprocess.CalledProcessError: Command '['bash', '-c', 'source /home/gfurnstahl/Impala/bin/impala-config.sh && /home/gfurnstahl/Impala/bin/run-workload.py --workloads=tpcds:10 --impalads=localhost:21000 --results_json_file=/home/gfurnstahl/Impala/perf_results/perf_run_l1WHcn/27a1b4c1203fd1fc7929d23659eed0861703e9e1.json --query_iterations=3 --table_formats=parquet/none --plan_first']' returned non-zero exit status 1



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
