You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@samza.apache.org by "Jakob Homan (JIRA)" <ji...@apache.org> on 2014/04/17 21:35:16 UTC
[jira] [Assigned] (SAMZA-220) SystemConsumers is slow when
consuming from a large number of partitions
[ https://issues.apache.org/jira/browse/SAMZA-220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jakob Homan reassigned SAMZA-220:
---------------------------------
Assignee: Chris Riccomini
> SystemConsumers is slow when consuming from a large number of partitions
> ------------------------------------------------------------------------
>
> Key: SAMZA-220
> URL: https://issues.apache.org/jira/browse/SAMZA-220
> Project: Samza
> Issue Type: Bug
> Components: container
> Affects Versions: 0.6.0
> Reporter: Chris Riccomini
> Assignee: Chris Riccomini
> Attachments: SAMZA-220.0.patch, SAMZA-220.1.patch
>
>
> We have observed poor throughput when a SamzaContainer is consuming many partitions (100s). The more partitions, the worse the performance gets.
> When hooking up VisualVM, two operations take up more than 65% of the CPU in SystemConsumers:
> {code}
> refresh.maybeCall()
> updateMessageChooser
> {code}
> The problem is that we run each of these operations once before every process() call to a StreamTask. Both of these operations iterate over *all* SystemStreamPartitions that the SystemConsumers is consuming from. If you have hundreds of partitions, it means you do two loops of 100+ items for every message you process. This is true even if the SystemConsumers buffer has a lot of messages (10,000+), and also true even if most systemStreamPartitions have no messages available.
> I have two proposed solutions to this problem:
> 1. Only call refresh.maybeCall() when the total number of buffered messages in the SystemConsumers has dropped below some low watermark.
> 2. Only have updateMessageChooser call messageChooser.update for systemStreamPartitions that actually *have* a message.
> I have implemented this and deployed it on a few jobs, and I am seeing significant performance improvement. From 10k-20k msgs/sec to 50k+.
> The trade off, as I see it is really around (1), which will introduce a little latency for topics that are low volume. In such a case, the time from when a message arrives to when it gets refreshed in the buffer, and updated in the chooser increases.
--
This message was sent by Atlassian JIRA
(v6.2#6252)