Posted to issues@kylin.apache.org by "Shaofeng SHI (JIRA)" <ji...@apache.org> on 2019/06/30 02:57:00 UTC

[jira] [Assigned] (KYLIN-3679) Fetch Kafka topic with Spark streaming

     [ https://issues.apache.org/jira/browse/KYLIN-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shaofeng SHI reassigned KYLIN-3679:
-----------------------------------

    Assignee: weibin0516

Awesome! [~codingforfun], please go ahead; a pull request to the Kylin GitHub repository is welcome.

> Fetch Kafka topic with Spark streaming
> --------------------------------------
>
>                 Key: KYLIN-3679
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3679
>             Project: Kylin
>          Issue Type: New Feature
>          Components: Spark Engine
>            Reporter: Shaofeng SHI
>            Assignee: weibin0516
>            Priority: Major
>
> Currently, Kylin uses an MR (MapReduce) job to fetch Kafka messages in parallel and then persists them to HDFS for subsequent processing. If the user selects the Spark engine, we can use the Spark Streaming API to do this instead. Spark Streaming can read the Kafka messages in a given offset range as an RDD, which would make them easy to process:
> https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html 
> With Spark Streaming, Kylin could also easily connect to other data sources such as Kinesis, Flume, etc.
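
To make the "given offset range" idea in the description concrete, here is a minimal, dependency-free sketch of how a span of offsets on one partition might be cut into bounded ranges before being handed to a batch reader. It does not use Spark or Kafka and is not Kylin's implementation: the class `OffsetRangePlanner`, its nested `Range` type, the `split` method, and the topic name `events` are all hypothetical names chosen for illustration. In Spark's Kafka 0-10 integration, ranges of this shape would be expressed as `OffsetRange` objects and passed to `KafkaUtils.createRDD`.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetRangePlanner {

    /** Hypothetical stand-in for a Kafka offset range on a single partition. */
    public static final class Range {
        public final String topic;
        public final int partition;
        public final long fromOffset;   // inclusive
        public final long untilOffset;  // exclusive

        public Range(String topic, int partition, long fromOffset, long untilOffset) {
            this.topic = topic;
            this.partition = partition;
            this.fromOffset = fromOffset;
            this.untilOffset = untilOffset;
        }

        /** Number of messages covered by this range. */
        public long count() {
            return untilOffset - fromOffset;
        }
    }

    /**
     * Splits the half-open span [from, until) of one partition into consecutive
     * ranges that each cover at most maxPerRange messages.
     * Returns an empty list when from >= until.
     */
    public static List<Range> split(String topic, int partition,
                                    long from, long until, long maxPerRange) {
        if (maxPerRange <= 0) {
            throw new IllegalArgumentException("maxPerRange must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        for (long start = from; start < until; start += maxPerRange) {
            long end = Math.min(start + maxPerRange, until);
            ranges.add(new Range(topic, partition, start, end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Assumed example values: partition 0 of topic "events", offsets 100..349.
        for (Range r : split("events", 0, 100L, 350L, 100L)) {
            System.out.println(r.topic + "-" + r.partition
                    + " [" + r.fromOffset + ", " + r.untilOffset + ") count=" + r.count());
        }
    }
}
```

Each resulting range is independent of the others, so in a Spark job each one could map to a single RDD partition and be read in parallel.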



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)