Posted to dev@storm.apache.org by "Rick Kellogg (JIRA)" <ji...@apache.org> on 2015/09/26 04:49:04 UTC

[jira] [Updated] (STORM-1028) Eventhub spout meta data

     [ https://issues.apache.org/jira/browse/STORM-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rick Kellogg updated STORM-1028:
--------------------------------
    Component/s: storm-kafka
                 Storm-eventhubs

> Eventhub spout meta data
> ------------------------
>
>                 Key: STORM-1028
>                 URL: https://issues.apache.org/jira/browse/STORM-1028
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: Storm-eventhubs, storm-kafka
>            Reporter: Mads Mætzke Tandrup
>
> Event Hubs (and Kafka) fit well into event-sourcing architectures as the event ingest point for later Storm processing that feeds downstream stateful consumers.
> Advanced event stream processing, such as replaying parts of a stream, requires that the downstream consumers can synchronise different "stream runs" with their stateful view, which itself can be seen as an aggregation of all previous events. To set up the right context for re-processing the stream deterministically, they need to sync their view with the incoming old data. To do this, they need to know the sequenceNumber and partition of each event.
> For example, suppose a bolt calculates total_order_amount for a stream of orders and emits order tuples carrying the total_order_amount calculated over all previous orders. Replaying an order event should not change total_order_amount. That is, orders with a higher sequenceNumber than the order being processed should not be included in total_order_amount.
> This synchronisation can be achieved if the bolt has access to the partition and sequenceNumber from Event Hubs.
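
A minimal sketch of the synchronisation described above, written in plain Python rather than against Storm's Java API. It assumes each tuple carries a partition, a sequence number and an order amount, and that sequence numbers are only comparable within one partition. OrderTotalState and its process method are hypothetical names for illustration, not part of storm-eventhubs.

```python
from bisect import bisect_right, insort
from collections import defaultdict


class OrderTotalState:
    """Hypothetical stateful view kept by a bolt: one order history per partition."""

    def __init__(self):
        # partition id -> sorted list of sequence numbers seen so far
        self._seqs = defaultdict(list)
        # (partition id, sequence number) -> order amount
        self._amounts = {}

    def process(self, partition, sequence_number, amount):
        """Record the order if it is new and return total_order_amount as of this event."""
        key = (partition, sequence_number)
        if key not in self._amounts:
            # First time this event is seen: remember it.
            insort(self._seqs[partition], sequence_number)
            self._amounts[key] = amount
        # Sum only orders at or before this sequence number, so a replayed
        # event ignores orders that came later in the original run.
        seqs = self._seqs[partition]
        upto = bisect_right(seqs, sequence_number)
        return sum(self._amounts[(partition, s)] for s in seqs[:upto])


state = OrderTotalState()
# Original run: three orders on partition "0".
first_run = [state.process("0", seq, amt) for seq, amt in [(1, 10), (2, 5), (3, 7)]]
# Replay of the second order: the order with sequence number 3 is excluded.
replayed = state.process("0", 2, 5)
```

Without the partition and sequence number on the tuple, the bolt could not tell a replayed order from a new one, and the replay would be added on top of the existing total.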



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)