Posted to issues@storm.apache.org by "Roshan Naik (JIRA)" <ji...@apache.org> on 2017/01/19 21:13:26 UTC
[jira] [Created] (STORM-2308) Support for Non-replayable Sources
Roshan Naik created STORM-2308:
----------------------------------
Summary: Support for Non-replayable Sources
Key: STORM-2308
URL: https://issues.apache.org/jira/browse/STORM-2308
Project: Apache Storm
Issue Type: Sub-task
Components: storm-core
Affects Versions: 2.0.0
Reporter: Roshan Naik
In order to recover from failures without data loss, Storm (and other streaming systems) places the responsibility of buffering events on the source system. In the event of a crash or other failure, in-flight events can be re-fetched from the source and their processing can be retried on recovery. A nice benefit of this approach is that it keeps Storm’s architecture simple.
While it is desirable to avoid the complexity of building an internal reliable buffering system, it is not necessary to restrict spouts to accepting data only from persistent sources such as Kafka, HDFS or databases. Some amount of data loss is acceptable in many use cases. Storm already supports such use cases by allowing ACK-ing to be disabled.
Users who can tolerate data loss benefit from spouts that accept data directly from a wider variety of sources such as HTTP, TCP/UDP, Syslog, Flume, etc. For such use cases, not forcing all data through a system like Kafka improves end-to-end latency, simplifies management, and reduces the cost of the data pipeline. Users who cannot afford to lose data can always funnel incoming data through Kafka or another persistent store and enable ACKs.
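To make the idea concrete, here is a minimal sketch of the kind of in-memory hand-off a spout for a non-replayable source might use. It is only an illustration under stated assumptions, not a proposed design: the class name LossyEventBuffer and its methods are hypothetical and are not part of Storm. The assumption is that a listener thread (for example, one reading from a TCP socket) pushes events in, the spout drains them from its polling loop, and events are simply dropped when the buffer is full because the source cannot be asked to resend them.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of Storm: a bounded in-memory hand-off between
// a network listener thread and a spout's polling loop.
final class LossyEventBuffer {
    private final ArrayBlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    LossyEventBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Called by the listener thread. Never blocks: if the buffer is full the
    // event is discarded and counted, since a non-replayable source cannot
    // be asked to send it again.
    boolean offer(String event) {
        boolean accepted = queue.offer(event);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    // Called from the spout's polling loop. Returns null when nothing is pending.
    String poll() {
        return queue.poll();
    }

    // Number of events discarded so far; useful as a data-loss metric.
    long droppedCount() {
        return dropped.get();
    }
}
```

In this sketch the buffer capacity is the only knob: a larger buffer absorbs longer stalls at the cost of memory, and the dropped-event counter gives users a way to see how much data loss they are actually tolerating.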
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)