Posted to commits@samza.apache.org by "Lakshmi Manasa Gaduputi (Jira)" <ji...@apache.org> on 2022/03/11 03:42:00 UTC

[jira] [Created] (SAMZA-2728) [Elasticity] improve distribution of messages across elastic tasks

Lakshmi Manasa Gaduputi created SAMZA-2728:
----------------------------------------------

             Summary: [Elasticity] improve distribution of messages across elastic tasks
                 Key: SAMZA-2728
                 URL: https://issues.apache.org/jira/browse/SAMZA-2728
             Project: Samza
          Issue Type: Improvement
            Reporter: Lakshmi Manasa Gaduputi
            Assignee: Lakshmi Manasa Gaduputi


Symptom: When elasticity is enabled, for certain kinds of input streams, some of the containers do not process any messages when container count = elastic task count = elasticity factor x original task count.

Cause: The input stream where this was observed had message keys such that key.hashCode()%elasticityFactor was always even for some partitions and always odd for other partitions. This led to some of the elastic tasks not getting any messages. This is not a bug in the elasticity code but rather a skew in the input stream’s key distribution.

Fix: This can be fixed in the key bucket computation: key.hashCode()%elasticityFactor is modified to (key.hashCode()%31)%elasticityFactor, and the max value for the elasticity factor is limited to 16 so that 31 can be used safely.
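
The two bucket computations can be compared with a minimal Java sketch. This is an illustration only, not the Samza implementation: the class name, method names and constant names are hypothetical, the sample keys (even integers, whose hashCode() is always even) are made up to mimic a skewed partition, and the use of Math.floorMod to keep buckets non-negative is an assumption of this sketch rather than something stated in the issue.

```java
public class KeyBucketSketch {
    // Values taken from the issue text; the constant names are hypothetical.
    static final int PRIME = 31;
    static final int MAX_ELASTICITY_FACTOR = 16;

    // Original computation: key.hashCode() % elasticityFactor.
    // Math.floorMod (an assumption of this sketch) avoids negative buckets.
    static int oldBucket(Object key, int elasticityFactor) {
        return Math.floorMod(key.hashCode(), elasticityFactor);
    }

    // Proposed computation: (key.hashCode() % 31) % elasticityFactor,
    // with the elasticity factor capped at 16 as described in the issue.
    static int newBucket(Object key, int elasticityFactor) {
        if (elasticityFactor < 1 || elasticityFactor > MAX_ELASTICITY_FACTOR) {
            throw new IllegalArgumentException(
                "elasticity factor must be between 1 and " + MAX_ELASTICITY_FACTOR);
        }
        int reduced = Math.floorMod(key.hashCode(), PRIME);
        return Math.floorMod(reduced, elasticityFactor);
    }

    // Counts how many of the given keys land in each bucket.
    static int[] countBuckets(int[] keys, int elasticityFactor, boolean useNew) {
        int[] counts = new int[elasticityFactor];
        for (int k : keys) {
            Integer key = k; // Integer.hashCode() returns the int value itself
            int bucket = useNew ? newBucket(key, elasticityFactor)
                                : oldBucket(key, elasticityFactor);
            counts[bucket]++;
        }
        return counts;
    }

    // Made-up skewed input: 100 keys whose hash codes are all even.
    static int[] evenKeys() {
        int[] keys = new int[100];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = 2 * i;
        }
        return keys;
    }

    public static void main(String[] args) {
        int[] keys = evenKeys();
        System.out.println("old: " + java.util.Arrays.toString(countBuckets(keys, 2, false)));
        System.out.println("new: " + java.util.Arrays.toString(countBuckets(keys, 2, true)));
    }
}
```

With an elasticity factor of 2, the original computation sends every one of these keys to bucket 0, so the second elastic task stays idle, while the modified computation places keys in both buckets.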



--
This message was sent by Atlassian Jira
(v8.20.1#820001)