Posted to dev@samoa.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/05/05 09:49:13 UTC
[jira] [Commented] (SAMOA-40) Add Kafka stream reader modules to
consume data from Kafka framework
[ https://issues.apache.org/jira/browse/SAMOA-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272131#comment-15272131 ]
ASF GitHub Bot commented on SAMOA-40:
-------------------------------------
Github user nicolas-kourtellis commented on the pull request:
https://github.com/apache/incubator-samoa/pull/32#issuecomment-217114668
Should we merge this? Anything else to be added?
> Add Kafka stream reader modules to consume data from Kafka framework
> --------------------------------------------------------------------
>
> Key: SAMOA-40
> URL: https://issues.apache.org/jira/browse/SAMOA-40
> Project: SAMOA
> Issue Type: Task
> Components: Infrastructure, SAMOA-API
> Environment: OS X Version 10.10.3
> Reporter: Vishal Karande
> Priority: Minor
> Labels: features
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Apache SAMOA is designed to process streaming data and to develop streaming machine learning
> algorithms. Currently, the SAMOA framework supports reading stream data from ARFF files only.
> Thus, when SAMOA is used as a streaming machine learning component in real-time use cases,
> writing data to files and reading it back is slow and inefficient.
> A single Kafka broker can handle hundreds of megabytes of reads and writes per second
> from thousands of clients. The ability to read data directly from Apache Kafka into SAMOA will
> not only improve performance but also make SAMOA pluggable into many real-time machine
> learning use cases, such as the Internet of Things (IoT).
> GOAL:
> Add code that enables SAMOA to read data from Apache Kafka as a data stream.
> The Kafka stream reader supports the following streaming options:
> a) Topic selection - Kafka topic to read data from
> b) Partition selection - Kafka partition to read data from
> c) Batching - Number of data instances fetched from Kafka in one read request
> d) Configuration options - Kafka port number, seed information, time delay between two read requests
> Components:
> KafkaReader - Consists of APIs to read data from Kafka
> KafkaStream - Stream source for SAMOA providing data read from Kafka
> Dependencies for Kafka are added to pom.xml in the samoa-api component.
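
For readers skimming the thread, the options listed in the issue (topic, partition, batch size, port, seed information, delay between reads) can be illustrated with a small, self-contained Java sketch. This is not the code from the pull request: the class KafkaBatchSketch, the ReaderOptions holder, and the readBatch/readAll helpers are hypothetical names chosen for illustration, and an in-memory Iterator stands in for a Kafka partition so that the example runs without a broker or the Kafka client library. Treat it as a rough picture of how batching and the inter-request delay might interact, under those assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: none of these names come from the SAMOA pull request.
public class KafkaBatchSketch {

    /** Holds the reader options named in the issue (field names are assumptions). */
    static final class ReaderOptions {
        final String topic;             // a) Kafka topic to read from
        final int partition;            // b) Kafka partition to read from
        final int batchSize;            // c) instances fetched per read request
        final int port;                 // d) Kafka port number
        final List<String> seedBrokers; // d) seed information (broker hosts)
        final long delayMillis;         // d) delay between two read requests

        ReaderOptions(String topic, int partition, int batchSize, int port,
                      List<String> seedBrokers, long delayMillis) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.topic = topic;
            this.partition = partition;
            this.batchSize = batchSize;
            this.port = port;
            this.seedBrokers = seedBrokers;
            this.delayMillis = delayMillis;
        }
    }

    /** Pulls up to batchSize records; returns fewer if the source runs out. */
    static List<String> readBatch(Iterator<String> source, int batchSize) {
        List<String> batch = new ArrayList<>(batchSize);
        while (batch.size() < batchSize && source.hasNext()) {
            batch.add(source.next());
        }
        return batch;
    }

    /** Drains the source batch by batch, pausing between read requests. */
    static List<List<String>> readAll(Iterator<String> source, ReaderOptions options) {
        List<List<String>> batches = new ArrayList<>();
        while (source.hasNext()) {
            batches.add(readBatch(source, options.batchSize));
            if (source.hasNext() && options.delayMillis > 0) {
                try {
                    Thread.sleep(options.delayMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop reading early.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        ReaderOptions options = new ReaderOptions(
                "samoa-instances", 0, 2, 9092, Arrays.asList("localhost"), 10L);
        // In-memory stand-in for the records of one Kafka partition.
        Iterator<String> fakePartition =
                Arrays.asList("1.0,2.0,yes", "0.5,1.5,no", "3.0,0.1,yes").iterator();
        for (List<String> batch : readAll(fakePartition, options)) {
            System.out.println(batch.size() + " instance(s): " + batch);
        }
    }
}
```

In a real reader the Iterator would be replaced by fetch requests against the configured topic and partition, and each record would be converted into a SAMOA instance before being handed to the stream source.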
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)