Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2019/09/07 18:57:00 UTC

[jira] [Commented] (IMPALA-5802) COMPUTE STATS uses MT_DOP=4 by default

    [ https://issues.apache.org/jira/browse/IMPALA-5802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924961#comment-16924961 ] 

ASF subversion and git services commented on IMPALA-5802:
---------------------------------------------------------

Commit eb680e4633256562f1a1a33feec5005e8fc60e2f in impala's branch refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=eb680e4 ]

IMPALA-5802: use mt scan node for all formats

This removes special-casing of the sequence-based
file formats (Avro, RC, Seq) where they used the
legacy scan node instead of the multi-threaded
scan node when mt_dop was enabled.

There was no particular reason to do this: the
code paths are all already in use and should be more
resource-efficient with multithreading enabled.
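
For readers unfamiliar with the planner, the behaviour change described
above can be sketched roughly as follows. This is an illustration only,
not Impala's actual planner code: the function names, the format strings
and the "mt"/"legacy" return values are hypothetical, and the sketch
assumes that mt_dop=0 keeps using the legacy scan node both before and
after the change.

```python
# Hypothetical labels for the sequence-based formats named in the commit
# message (Avro, RC, Seq); the real identifiers in Impala may differ.
SEQUENCE_BASED_FORMATS = {"AVRO", "RC", "SEQ"}


def choose_scan_node_before(file_format: str, mt_dop: int) -> str:
    """Old behaviour: sequence-based formats fell back to the legacy node."""
    if mt_dop > 0 and file_format not in SEQUENCE_BASED_FORMATS:
        return "mt"
    return "legacy"


def choose_scan_node_after(file_format: str, mt_dop: int) -> str:
    """New behaviour: every format uses the MT scan node when mt_dop > 0.

    Assumption: with mt_dop=0 the legacy node is still chosen.
    """
    return "mt" if mt_dop > 0 else "legacy"


if __name__ == "__main__":
    for fmt in ("PARQUET", "AVRO", "RC", "SEQ"):
        print(fmt, choose_scan_node_before(fmt, 4), choose_scan_node_after(fmt, 4))
```

In this sketch the only inputs whose result changes are the three
sequence-based formats with mt_dop enabled.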

Testing:
Updated planner tests to reflect that MT scan
is used. Removed PARALLELPLANS for the Hive
3 Avro test because it does not provide
important coverage and required updating.

Performance:
Some targeted benchmarks showed no difference in
performance.

Query:
set mt_dop=4; select min(l_orderkey), min(l_comment) from lineitem;

tpch_avro Before: 0.51 0.41 0.51 0.41 0.51
tpch_avro After: 0.41 0.41 0.41 0.41 0.41
tpch_rc Before: 0.31 0.31 0.31 0.31 0.31
tpch_rc After: 0.31 0.31 0.31 0.31 0.31
tpch_seq_gzip Before: 2.32 2.22 2.22 2.22 2.32
tpch_seq_gzip After: 2.22 2.22 2.22 2.32 2.32

Query:
unset mt_dop; compute stats lineitem;

tpch_avro Before: 1.21 1.21 1.21 1.21 1.21
tpch_avro After: 1.21 1.31 1.21 1.31 1.21
tpch_rc Before: 1.31 1.41 1.31 1.31 1.31
tpch_rc After: 1.31 1.41 1.31 1.31 1.31
tpch_seq_gzip Before: 2.82 2.72 2.71 2.92 2.71
tpch_seq_gzip After: 2.82 2.82 2.81 2.71 2.92
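
One way to compare the runs above is to take the median of each set of
five timings. The following is a small sketch, not part of the original
change: the numbers are copied from the two listings above, the units
are not stated in this message, and the dictionary keys and the
summarize() helper are made up for illustration.

```python
from statistics import median

# Timings copied from the listings above as (before, after) pairs.
# Units are not stated in the commit message.
RUNS = {
    "select, mt_dop=4": {
        "tpch_avro": ([0.51, 0.41, 0.51, 0.41, 0.51], [0.41, 0.41, 0.41, 0.41, 0.41]),
        "tpch_rc": ([0.31, 0.31, 0.31, 0.31, 0.31], [0.31, 0.31, 0.31, 0.31, 0.31]),
        "tpch_seq_gzip": ([2.32, 2.22, 2.22, 2.22, 2.32], [2.22, 2.22, 2.22, 2.32, 2.32]),
    },
    "compute stats, mt_dop unset": {
        "tpch_avro": ([1.21, 1.21, 1.21, 1.21, 1.21], [1.21, 1.31, 1.21, 1.31, 1.21]),
        "tpch_rc": ([1.31, 1.41, 1.31, 1.31, 1.31], [1.31, 1.41, 1.31, 1.31, 1.31]),
        "tpch_seq_gzip": ([2.82, 2.72, 2.71, 2.92, 2.71], [2.82, 2.82, 2.81, 2.71, 2.92]),
    },
}


def summarize(runs):
    """Return {(query, table): (median_before, median_after, delta)}."""
    summary = {}
    for query, tables in runs.items():
        for table, (before, after) in tables.items():
            m_before, m_after = median(before), median(after)
            summary[(query, table)] = (m_before, m_after, round(m_after - m_before, 2))
    return summary


if __name__ == "__main__":
    for (query, table), (m_before, m_after, delta) in summarize(RUNS).items():
        print(f"{query:28s} {table:14s} {m_before:.2f} -> {m_after:.2f} ({delta:+.2f})")
```

With five runs per configuration, a median is less sensitive to a single
slow run than a mean, which is why it is used here.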

Change-Id: I8a91d2e5c2ebb617b7643cd676cb3490c190a68a
Reviewed-on: http://gerrit.cloudera.org:8080/14171
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> COMPUTE STATS uses MT_DOP=4 by default
> --------------------------------------
>
>                 Key: IMPALA-5802
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5802
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend, Frontend
>    Affects Versions: Impala 2.9.0
>            Reporter: Alexander Behm
>            Assignee: Tim Armstrong
>            Priority: Major
>              Labels: compute-stats
>
> Now that IMPALA-3905 has been completely addressed we should run COMPUTE STATS with MT_DOP=4 by default, regardless of file format. The motivation is consistency and speeding up COMPUTE STATS in most cases.
> This task is a continuation of IMPALA-4572.


