You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/03/27 17:23:00 UTC

[jira] [Commented] (IMPALA-8005) Randomize partitioning exchanges destinations

    [ https://issues.apache.org/jira/browse/IMPALA-8005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068920#comment-17068920 ] 

ASF subversion and git services commented on IMPALA-8005:
---------------------------------------------------------

Commit df6196e064bc7453bee8c7e644bb591391ee3ce2 in impala's branch refs/heads/master from Anurag Mantripragada
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=df6196e ]

IMPALA-8005: Randomize partitioning exchanges.

Currently, we use the same hash seed for partitioning exchanges at
the sender. For a table with skew in distribution in the shuffling
keys, multiple queries using the same shuffling keys for exchanges
will end up hashing to the same destination fragments running on
a particular host and potentially overloading that host.

This patch seeds the hash with query id. This will ensure that
the partitioning exchanges do not always hash to the
same destination with same shuffling keys.

Testing:
Added a test to data-stream-test to verify the data values at
destination are different for different queries.

Change-Id: I1936e6cc3e8d66420a5a9301f49221ca38f3e468
Reviewed-on: http://gerrit.cloudera.org:8080/15497
Reviewed-by: Joe McDonnell <jo...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Randomize partitioning exchanges destinations
> ---------------------------------------------
>
>                 Key: IMPALA-8005
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8005
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Distributed Exec
>    Affects Versions: Impala 3.1.0
>            Reporter: Michael Ho
>            Assignee: Anurag Mantripragada
>            Priority: Major
>              Labels: ramp-up
>
> Currently, we use the same hash seed for partitioning exchanges at the sender. For a table with skew in distribution in the shuffling keys, multiple queries using the same shuffling keys for exchanges will end up hashing to the same destination fragments running on particular host and potentially overloading that host.
> We should consider using the query id or other query specific information to seed the hashing function to randomize the destinations for different queries. Thanks to [~tlipcon] for pointing this problem out.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org