Posted to dev@apex.apache.org by "Pramod Immaneni (JIRA)" <ji...@apache.org> on 2016/02/17 18:54:18 UTC
[jira] [Created] (APEXCORE-348) Load based stream partitioning
Pramod Immaneni created APEXCORE-348:
----------------------------------------
Summary: Load based stream partitioning
Key: APEXCORE-348
URL: https://issues.apache.org/jira/browse/APEXCORE-348
Project: Apache Apex Core
Issue Type: Improvement
Reporter: Pramod Immaneni
Assignee: Pramod Immaneni
There are scenarios where the downstream partitions of an upstream operator do not perform uniformly, resulting in sub-optimal overall performance dictated by the slowest partitions. The cause can be data related, such as some partitions receiving more data to process than others, or environment related, such as some partitions running slower than others because they are on heavily loaded nodes.
A solution based on functionality currently available in the engine would be to write a StreamCodec implementation that distributes data among the partitions so that each partition receives a similar amount of data to process. We should consider adding StreamCodecs like these to the library; however, they do not solve the problem when it is environment related.
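As a rough illustration of the data-balancing idea, the sketch below cycles through partitions so that each one receives a near-equal number of tuples. It is a stand-alone, hypothetical class (RoundRobinPartitionChooser and choosePartition are names invented here) and does not implement the actual StreamCodec interface; a real codec would also have to handle serialization and map its result onto the engine's partition assignment.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-alone sketch; NOT the real Apex StreamCodec interface.
final class RoundRobinPartitionChooser<T> {
    private final int partitionCount;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinPartitionChooser(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.partitionCount = partitionCount;
    }

    // Ignores the tuple's content and cycles through the partitions,
    // so every partition receives a near-equal share of tuples.
    int choosePartition(T tuple) {
        // floorMod keeps the index non-negative even after the counter overflows.
        return Math.floorMod(next.getAndIncrement(), partitionCount);
    }
}
```

A scheme like this only suits streams where tuples with the same key do not have to reach the same partition, and it balances tuple counts rather than processing cost.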
For that, a better and more comprehensive approach would be to look at how data is being consumed by the downstream partitions from the BufferServer and use that information to decide how to send future data. If some partitions are behind others in consuming data, then data can be directed to the other partitions. One way to do this would be to relay this type of statistical and positional information from the BufferServer to the upstream publishers. The publishers can then use this information in various ways, such as making it available to StreamCodecs to affect the destination of future data.
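One possible shape for the publisher-side decision is sketched below, assuming the publisher is periodically told how many tuples each partition has consumed. Everything here is hypothetical: BacklogAwarePartitionChooser, reportConsumed and choosePartition are illustrative names, and reportConsumed merely stands in for whatever feedback the BufferServer would relay. The sketch is single-threaded and sends each tuple to the partition with the smallest backlog (published minus consumed).

```java
// Hypothetical sketch of load-based partition selection; no Apex APIs are used.
final class BacklogAwarePartitionChooser {
    private final long[] published; // tuples sent to each partition so far
    private final long[] consumed;  // tuples each partition is known to have consumed

    BacklogAwarePartitionChooser(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.published = new long[partitionCount];
        this.consumed = new long[partitionCount];
    }

    // Stand-in for consumption statistics relayed from the BufferServer (assumed).
    void reportConsumed(int partition, long totalConsumed) {
        consumed[partition] = totalConsumed;
    }

    // Picks the partition with the smallest backlog; ties go to the lowest index.
    int choosePartition() {
        int best = 0;
        long bestBacklog = published[0] - consumed[0];
        for (int p = 1; p < published.length; p++) {
            long backlog = published[p] - consumed[p];
            if (backlog < bestBacklog) {
                best = p;
                bestBacklog = backlog;
            }
        }
        published[best]++; // record the tuple about to be sent
        return best;
    }
}
```

With two partitions and no feedback, the chooser alternates between them; once partition 1 reports that it has consumed its two tuples while partition 0 has reported nothing, the next two tuples go to partition 1 before the backlogs even out again.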
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)