Posted to dev@storm.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/09/24 18:12:05 UTC
[jira] [Commented] (STORM-855) Add tuple batching

    [ https://issues.apache.org/jira/browse/STORM-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906568#comment-14906568 ] 

ASF GitHub Bot commented on STORM-855:
--------------------------------------

Github user harshach commented on the pull request:

    https://github.com/apache/storm/pull/694#issuecomment-142975644
  
    @mjsax did you get a chance to look at the ack tuples issue? This is going to be a great perf improvement, and I would like to see it merged in.


> Add tuple batching
> ------------------
>
>                 Key: STORM-855
>                 URL: https://issues.apache.org/jira/browse/STORM-855
>             Project: Apache Storm
>          Issue Type: New Feature
>            Reporter: Matthias J. Sax
>            Assignee: Matthias J. Sax
>            Priority: Minor
>
> In order to increase Storm's throughput, multiple tuples can be grouped together in a batch of tuples (i.e., a fat-tuple) and transferred from producer to consumer at once.
> The initial idea is taken from https://github.com/mjsax/aeolus. However, we aim to integrate this feature deep into the system (in contrast to building it on top), which has multiple advantages:
>   - batching can be even more transparent to the user (e.g., no extra direct-streams needed to mimic Storm's data distribution patterns)
>   - fault-tolerance (anchoring/acking) can be done at tuple granularity (not at batch granularity, which would lead to many more replayed tuples, and duplicate results, in case of failure)
> The aim is to extend the TopologyBuilder interface with an additional parameter 'batch_size' to expose this feature to the user. By default, batching will be disabled.
> This batching feature serves purely as a tuple transport mechanism, i.e., tuple-by-tuple processing semantics are preserved. An output batch is assembled at the producer and completely disassembled at the consumer. The consumer's output can be batched again, however, independently of whether its input was batched. Thus, batches can be of different size for each producer-consumer pair. Furthermore, consumers can receive batches of different sizes from different producers (including regular non-batched input).
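
The assemble-at-producer, disassemble-at-consumer idea described in the issue can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the code from the pull request and not Storm's API: the names BatchingSketch, BatchingEmitter, UnbatchingReceiver, emit, flush, and receive are all hypothetical, and the "transport" is just an in-process callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch; none of these classes exist in Apache Storm.
final class BatchingSketch {

    /** Producer side: buffers tuples and hands them over as one batch. */
    static final class BatchingEmitter<T> {
        private final int batchSize;
        private final Consumer<List<T>> transport;
        private List<T> buffer = new ArrayList<>();

        BatchingEmitter(int batchSize, Consumer<List<T>> transport) {
            if (batchSize < 1) {
                throw new IllegalArgumentException("batchSize must be >= 1");
            }
            this.batchSize = batchSize;
            this.transport = transport;
        }

        /** Adds one tuple; ships the batch once it is full. */
        void emit(T tuple) {
            buffer.add(tuple);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        /** Ships whatever is buffered, even if the batch is not full. */
        void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            List<T> batch = buffer;
            buffer = new ArrayList<>();
            transport.accept(batch);
        }
    }

    /** Consumer side: disassembles each batch and processes tuple by tuple. */
    static final class UnbatchingReceiver<T> {
        private final Consumer<T> processor;
        int batchesReceived = 0;

        UnbatchingReceiver(Consumer<T> processor) {
            this.processor = processor;
        }

        void receive(List<T> batch) {
            batchesReceived++;
            for (T tuple : batch) {
                processor.accept(tuple); // per-tuple semantics are preserved
            }
        }
    }

    public static void main(String[] args) {
        List<String> processed = new ArrayList<>();
        UnbatchingReceiver<String> receiver = new UnbatchingReceiver<>(processed::add);
        BatchingEmitter<String> emitter = new BatchingEmitter<>(3, receiver::receive);

        for (int i = 0; i < 7; i++) {
            emitter.emit("tuple-" + i);
        }
        emitter.flush(); // ship the final, partially filled batch

        // prints "7 tuples in 3 batches"
        System.out.println(processed.size() + " tuples in "
                + receiver.batchesReceived + " batches");
    }
}
```

Because the receiver hands tuples to the processor one at a time, the processing logic never sees a batch. That is why each producer-consumer pair could use its own batch size, and why a batch size of 1 behaves like non-batched input in this sketch.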



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)