Posted to dev@apex.apache.org by "Pramod Immaneni (JIRA)" <ji...@apache.org> on 2016/02/17 18:54:18 UTC

[jira] [Created] (APEXCORE-348) Load based stream partitioning

Pramod Immaneni created APEXCORE-348:
----------------------------------------

             Summary: Load based stream partitioning
                 Key: APEXCORE-348
                 URL: https://issues.apache.org/jira/browse/APEXCORE-348
             Project: Apache Apex Core
          Issue Type: Improvement
            Reporter: Pramod Immaneni
            Assignee: Pramod Immaneni


There are scenarios where the downstream partitions of an upstream operator do not perform uniformly, so overall performance is sub-optimal and dictated by the slowest partitions. The cause may be data related, such as some partitions receiving more data to process than others, or environment related, such as some partitions running slower than others because they are on heavily loaded nodes.

A solution based on functionality currently available in the engine would be to write a StreamCodec implementation that distributes data among the partitions so that each partition receives a similar amount of data to process. We should consider adding StreamCodecs like these to the library; however, they do not solve the problem when it is environment related.
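
As a rough illustration of the even-distribution idea, here is a minimal, standalone Java sketch. The class RoundRobinPartitioner and its getPartition method are hypothetical names chosen for this example and are not part of the Apex API. A real StreamCodec would also have to serialize tuples, and how the returned value maps to a physical partition is decided by the engine and is not modeled here.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not part of the Apex API: spreads tuples evenly
// across a fixed number of partitions, ignoring the tuple contents.
final class RoundRobinPartitioner {
    private final int partitionCount;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinPartitioner(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.partitionCount = partitionCount;
    }

    // Returns a partition index in [0, partitionCount).
    // floorMod keeps the result non-negative even after the counter overflows.
    int getPartition(Object tuple) {
        return Math.floorMod(next.getAndIncrement(), partitionCount);
    }
}
```

A scheme like this evens out the number of tuples per partition, but it has no knowledge of how fast each partition actually consumes them.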

For that, a better and more comprehensive approach would be to look at how data is being consumed by the downstream partitions from the BufferServer and use that information to decide how to send future data. If some partitions are behind others in consuming data, new data can be directed to the other partitions. One way to do this would be to relay this kind of statistical and positional information from the BufferServer to the upstream publishers. The publishers could then use it, for example by making it available to StreamCodecs to influence the destination of future data.
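
One possible shape for such a feedback-driven decision is sketched below in standalone Java. Everything here is an assumption made for illustration: the class LagAwarePartitioner, the reportConsumed callback standing in for information relayed from the BufferServer, and the "smallest backlog wins" policy are not existing Apex APIs or a settled design.

```java
// Hypothetical sketch, not part of the Apex API: routes each new tuple to the
// partition with the smallest backlog (tuples published minus tuples consumed).
final class LagAwarePartitioner {
    private final long[] published;
    private final long[] consumed;

    LagAwarePartitioner(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.published = new long[partitionCount];
        this.consumed = new long[partitionCount];
    }

    // Assumed feedback path: the total number of tuples a partition has
    // consumed so far. Stale (smaller) reports are ignored.
    void reportConsumed(int partition, long totalConsumed) {
        consumed[partition] = Math.max(consumed[partition], totalConsumed);
    }

    long backlog(int partition) {
        return published[partition] - consumed[partition];
    }

    // Picks the least-backlogged partition (lowest index wins ties)
    // and counts the tuple as published to it.
    int choosePartition() {
        int best = 0;
        for (int i = 1; i < published.length; i++) {
            if (backlog(i) < backlog(best)) {
                best = i;
            }
        }
        published[best]++;
        return best;
    }
}
```

With this policy a partition on a heavily loaded node accumulates backlog and automatically receives fewer new tuples. The sketch ignores key affinity, so it would only suit operators that do not require tuples with the same key to reach the same partition.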



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)