Posted to commits@samza.apache.org by "Chris Riccomini (JIRA)" <ji...@apache.org> on 2014/07/17 01:09:05 UTC

[jira] [Updated] (SAMZA-245) Improve SystemConsumers performance

     [ https://issues.apache.org/jira/browse/SAMZA-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Riccomini updated SAMZA-245:
----------------------------------

    Attachment: SAMZA-245-1.patch

Attaching a rebased patch. All tests pass. The ReviewBoard (RB) review is available at:

https://reviews.apache.org/r/23588/

# Rebased original patch.
# Fixed tests to compile and pass again.
# Found a bug in the original patch where dropping messages that failed deserialization could lead to never consuming from the stream again, and fixed it.
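The symptom in the last item, where a dropped message stops all further consumption from a stream, is easy to reproduce in a toy model. The sketch below is a hypothetical illustration of one way this kind of bug can arise. It is not the actual patch: the PartitionPoller class, its `to_poll` bookkeeping, and the `fixed` flag are invented for this example. The point is that dropping a message has to go through the same "buffer drained, so poll again" bookkeeping as delivering one.

```python
from collections import deque


class PartitionPoller:
    """Hypothetical model, not Samza code: a partition is polled only while it
    is in `to_poll`. It leaves the set when messages are buffered for it and
    rejoins once its buffer has drained."""

    def __init__(self, partitions, fetch, deserialize, fixed=True):
        self.to_poll = set(partitions)
        self.buffers = {p: deque() for p in partitions}
        self.fetch = fetch              # partition -> list of raw messages
        self.deserialize = deserialize  # raises ValueError on bad input
        self.fixed = fixed

    def poll(self):
        for p in list(self.to_poll):
            raw = self.fetch(p)
            if raw:
                self.buffers[p].extend(raw)
                self.to_poll.discard(p)

    def next_message(self, p):
        raw = self.buffers[p].popleft()
        try:
            msg = self.deserialize(raw)
        except ValueError:
            msg = None  # drop the message that failed deserialization
            if not self.fixed:
                return None  # BUG: skips the re-enable step below
        if not self.buffers[p]:
            self.to_poll.add(p)  # buffer drained, so poll this partition again
        return msg


def run(fixed):
    feed = iter([["bad"], ["ok"]])  # the first fetch is undeserializable

    def fetch(partition):
        return next(feed, [])

    def deserialize(raw):
        if raw == "bad":
            raise ValueError(raw)
        return raw

    poller = PartitionPoller(["p0"], fetch, deserialize, fixed)
    received = []
    for _ in range(3):
        poller.poll()
        if poller.buffers["p0"]:
            msg = poller.next_message("p0")
            if msg is not None:
                received.append(msg)
    return received


print(run(fixed=True))   # ['ok']
print(run(fixed=False))  # []: p0 is never polled again
```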

One thing to discuss here is the refreshThreshold. We need a way to trigger polling, so I introduced this setting: a global lower bound on the number of buffered messages, below which we start polling systems for more messages.

The problem is that this concept conflicts with the TieredPriorityChooser (see SAMZA-342), where we might want to consume real-time messages immediately, even while processing batch messages. With this patch, the batch messages would fill the buffer and keep the real-time streams from being polled until all of the batch messages are processed. One workaround would be to raise refreshThreshold to a very high number, so that we're always polling, but perhaps there's a better solution. Maybe stream-specific polling thresholds?
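The refreshThreshold trade-off can be made concrete with a small Python model. This is a hypothetical sketch, not Samza's SystemConsumers. Several pieces are illustrative assumptions: the class names, the poll callables, the strict real-time-over-batch tiering (loosely standing in for the TieredPriorityChooser), and a real-time stream that always has one new message waiting. The model compares three setups: a low global threshold, a very high global threshold (the workaround), and per-stream thresholds.

```python
from collections import deque


class GlobalThresholdConsumers:
    """Hypothetical model: per-stream buffers, refilled only when the TOTAL
    number of buffered messages falls below one global refresh_threshold."""

    def __init__(self, refresh_threshold, poll_fns):
        self.refresh_threshold = refresh_threshold
        self.poll_fns = poll_fns  # stream name -> callable returning new messages
        self.buffers = {name: deque() for name in poll_fns}

    def maybe_poll(self):
        total = sum(len(buf) for buf in self.buffers.values())
        if total < self.refresh_threshold:
            for name, poll in self.poll_fns.items():
                self.buffers[name].extend(poll())

    def choose(self, priority):
        # Tiered choice: take a message from the highest-priority non-empty buffer.
        self.maybe_poll()
        for name in priority:
            if self.buffers[name]:
                return name, self.buffers[name].popleft()
        return None


class PerStreamThresholdConsumers(GlobalThresholdConsumers):
    """Hypothetical alternative: each stream is refilled when its OWN buffer
    falls below its own threshold."""

    def __init__(self, thresholds, poll_fns):
        super().__init__(0, poll_fns)
        self.thresholds = thresholds

    def maybe_poll(self):
        for name, poll in self.poll_fns.items():
            if len(self.buffers[name]) < self.thresholds[name]:
                self.buffers[name].extend(poll())


def realtime_steps(consumers, steps=200):
    """Return the processing steps at which a real-time message was chosen."""
    hits = []
    for step in range(steps):
        chosen = consumers.choose(["realtime", "batch"])
        if chosen is not None and chosen[0] == "realtime":
            hits.append(step)
    return hits


def batch_poll():
    return [f"batch-{i}" for i in range(100)]  # a large backlog on every poll


def realtime_poll():
    return ["rt"]  # assume one new real-time message is always waiting


streams = {"realtime": realtime_poll, "batch": batch_poll}
print(realtime_steps(GlobalThresholdConsumers(10, streams))[:2])     # [0, 92]
print(realtime_steps(GlobalThresholdConsumers(10**9, streams))[:2])  # [0, 1]
print(realtime_steps(PerStreamThresholdConsumers(
    {"realtime": 1, "batch": 10}, streams))[:2])                     # [0, 1]
```

With a global threshold of 10, the second real-time message is chosen only at step 92, after 91 batch messages. A huge threshold restores real-time latency, but it polls every system on every call and, in this model, lets the batch buffer grow without bound. Per-stream thresholds keep the real-time stream fresh while capping the batch buffer. The strict tiering still starves batch for as long as real-time traffic never stops, but that is a property of the chooser, not of the thresholds.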

> Improve SystemConsumers performance
> -----------------------------------
>
>                 Key: SAMZA-245
>                 URL: https://issues.apache.org/jira/browse/SAMZA-245
>             Project: Samza
>          Issue Type: Bug
>          Components: container
>    Affects Versions: 0.6.0
>            Reporter: Chris Riccomini
>            Assignee: Chris Riccomini
>             Fix For: 0.8.0
>
>         Attachments: SAMZA-245-1.patch, SAMZA-245.0.patch
>
>
> As part of SAMZA-220, a more radical patch was proposed. This patch appears to improve SystemConsumers' performance pretty significantly, while also reducing its complexity. The decision was made to move this change into the 0.8.0 release, rather than the 0.7.0 release, since it's a fairly risky change.
> This ticket is to explore updating SystemConsumers to eliminate almost all loops in order to increase performance in the Samza container.



--
This message was sent by Atlassian JIRA
(v6.2#6252)