Posted to commits@hudi.apache.org by "leesf (Jira)" <ji...@apache.org> on 2019/11/18 12:37:00 UTC

[jira] [Comment Edited] (HUDI-288) Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

    [ https://issues.apache.org/jira/browse/HUDI-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976500#comment-16976500 ] 

leesf edited comment on HUDI-288 at 11/18/19 12:36 PM:
-------------------------------------------------------

[~vinoth] Sorry for the late feedback. After a closer look at the code paths, I prefer the second solution: write a new tool that wraps the current DeltaStreamer, uses the kafka topic regex to identify all topics that need to be ingested, and creates one delta streamer per topic within a SINGLE spark application. This solution is easier than the first one.

A few questions.
First, if the topics that need to be ingested do not match the regex pattern, should we also allow users to list all topics explicitly?
Second, in the current data flow, the relationship of kafka topic to _targetBasePath_ is one-to-one. Should we allow users to specify multiple targetBasePaths while consuming many topics? I think a single targetBasePath is simpler, but does it make sense? The same question applies to the config _targetTableName_ in hive.
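
A minimal sketch of the topic-selection step discussed above, in plain Java with no Hudi, Kafka, or Spark calls. The class MultiTopicPlanner, its methods, and the "root path + topic name" layout are all hypothetical and only illustrate one possible answer to the two questions: topics are picked by regex, an explicit list can add topics the regex misses, and each topic keeps its own base path derived from a single configured root.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of Hudi: plans one ingestion job per Kafka topic.
class MultiTopicPlanner {

    // Keeps every available topic whose full name matches the regex,
    // then adds the explicitly listed topics (assumed to exist).
    static Set<String> selectTopics(List<String> availableTopics,
                                    String topicRegex,
                                    List<String> explicitTopics) {
        Pattern pattern = Pattern.compile(topicRegex);
        Set<String> selected = new LinkedHashSet<>();
        for (String topic : availableTopics) {
            if (pattern.matcher(topic).matches()) {
                selected.add(topic);
            }
        }
        selected.addAll(explicitTopics);
        return selected;
    }

    // Maps each selected topic to its own base path under one shared root,
    // so the topic-to-path relationship stays one-to-one (assumed layout).
    static Map<String, String> plan(List<String> availableTopics,
                                    String topicRegex,
                                    List<String> explicitTopics,
                                    String rootPath) {
        Map<String, String> topicToBasePath = new LinkedHashMap<>();
        for (String topic : selectTopics(availableTopics, topicRegex, explicitTopics)) {
            topicToBasePath.put(topic, rootPath + "/" + topic);
        }
        return topicToBasePath;
    }
}
```

In a wrapper tool, each entry of the returned map would then drive the creation of one delta streamer for that topic inside the shared spark application. How that streamer is constructed is deliberately left out here, since it depends on the actual DeltaStreamer code paths.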


was (Author: xleesf):
[~vinoth] Sorry for the late feedback. After a closer look at the code paths, I prefer the second solution: write a new tool that wraps the current DeltaStreamer, uses the kafka topic regex to identify all topics that need to be ingested, and creates one delta streamer per topic within a SINGLE spark application. This solution is easier than the first one.

Two questions. First, if the topics that need to be ingested do not match the regex pattern, should we also allow users to list all topics explicitly?
Second, in the current data flow, the relationship of kafka topic to _targetBasePath_ is one-to-one. Should we allow users to specify multiple targetBasePaths while consuming many topics? The same applies to the config _targetTableName_ in hive.

> Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment
> -------------------------------------------------------------------------------------
>
>                 Key: HUDI-288
>                 URL: https://issues.apache.org/jira/browse/HUDI-288
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>          Components: deltastreamer
>            Reporter: Vinoth Chandar
>            Assignee: leesf
>            Priority: Major
>
> https://lists.apache.org/thread.html/3a69934657c48b1c0d85cba223d69cb18e18cd8aaa4817c9fd72cef6@<dev.hudi.apache.org> has all the context



--
This message was sent by Atlassian Jira
(v8.3.4#803005)