Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/07/04 01:13:28 UTC

[GitHub] [hudi] masterlemmi opened a new issue #1791: [SUPPORT] Does DeltaStreamer support listening to multiple kafka topics and upserting to multiple tables?

masterlemmi opened a new issue #1791:
URL: https://github.com/apache/hudi/issues/1791


   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)? Yes
   
   - Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   1. I need to listen to multiple Kafka topics and save the messages to corresponding tables. Does DeltaStreamer allow that? And does the processing of each stream/topic run in parallel?
   
   2. I am also exploring Spark Streaming with the Spark DataSource. Basically, something like this:
   `DSTREAM.map(x => (x.topic, List(x.value())))
      .reduceByKey(_ ::: _)
      .map(processAndSavetoHudi)
      .print()
   `
   
   Is it possible to run Hudi upserts from the executor tasks (i.e. from the DStream.map function)? The foreachRDD function doesn't process streams in parallel, so I am trying to use the map function and save each stream to Hudi from the workers.
   
   **Environment Description**
   
   * Hudi version : 5.2
   
   * Spark version : 2.4.5.
   
   * Hive version : 2.3.3
   
   * Hadoop version : 2.8
   
   * Storage (HDFS/S3/GCS..) : no
   
   * Running on Docker? (yes/no) :
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on issue #1791: [SUPPORT] Does DeltaStreamer support listening to multiple kafka topics and upserting to multiple tables?

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1791:
URL: https://github.com/apache/hudi/issues/1791#issuecomment-655207777


   @masterlemmi while we wait for @pratyakshsharma, you can build Hudi off master and run this class
   https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java#L206
   
   using spark-submit (instead of the usual DeltaStreamer class). Hopefully, the CLI is similar to DeltaStreamer's and self-explanatory.
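   
   For illustration, here is a minimal sketch of what a shared properties file for that class might look like when passed to spark-submit through DeltaStreamer's `--props` option. The key names are recalled from the Hudi documentation and are an assumption here, so check them against the linked source before relying on them. The database and table names, topic name, and file paths are hypothetical placeholders.
   
   ```properties
   # Hypothetical shared config for HoodieMultiTableDeltaStreamer.
   # Key names are an assumption based on the Hudi docs; verify against your build.
   
   # Comma-separated <database>.<table> entries, one per Kafka topic/table pair.
   hoodie.deltastreamer.ingestion.tablesToBeIngested=db1.table1,db1.table2
   
   # Per-table overrides live in separate files (paths are placeholders).
   hoodie.deltastreamer.ingestion.db1.table1.configFile=file:///tmp/hudi-ingestion-config/db1_table1.properties
   hoodie.deltastreamer.ingestion.db1.table2.configFile=file:///tmp/hudi-ingestion-config/db1_table2.properties
   
   # Each per-table file would then name its own topic, e.g. in db1_table1.properties:
   # hoodie.deltastreamer.source.kafka.topic=topic1
   ```
   
   Keeping one file per table means each topic can carry its own record key, partition path, and schema settings, while the shared file only lists which tables to ingest.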
   
   @pratyakshsharma that reminds me: did we file a task to get this documented?





[GitHub] [hudi] pratyakshsharma commented on issue #1791: [SUPPORT] Does DeltaStreamer support listening to multiple kafka topics and upserting to multiple tables?

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on issue #1791:
URL: https://github.com/apache/hudi/issues/1791#issuecomment-653754592


   Ack. @masterlemmi have you checked the code for HoodieMultiTableDeltaStreamer in the master branch? Does that suffice for your use case?





[GitHub] [hudi] bvaradar closed issue #1791: [SUPPORT] Does DeltaStreamer support listening to multiple kafka topics and upserting to multiple tables?

Posted by GitBox <gi...@apache.org>.
bvaradar closed issue #1791:
URL: https://github.com/apache/hudi/issues/1791


   





[GitHub] [hudi] bvaradar commented on issue #1791: [SUPPORT] Does DeltaStreamer support listening to multiple kafka topics and upserting to multiple tables?

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1791:
URL: https://github.com/apache/hudi/issues/1791#issuecomment-668660685


   Closing this ticket, as we have a JIRA to track this.





[GitHub] [hudi] masterlemmi commented on issue #1791: [SUPPORT] Does DeltaStreamer support listening to multiple kafka topics and upserting to multiple tables?

Posted by GitBox <gi...@apache.org>.
masterlemmi commented on issue #1791:
URL: https://github.com/apache/hudi/issues/1791#issuecomment-653780147


   @pratyakshsharma I haven't yet, actually. I have only been experimenting with the Spark DataSource, and I saw DeltaStreamer in the documentation, which I thought might be useful for my use case. Do you have a sample config/code that I can try out to help me get started with multiple topics/tables? Thanks.





[GitHub] [hudi] pratyakshsharma commented on issue #1791: [SUPPORT] Does DeltaStreamer support listening to multiple kafka topics and upserting to multiple tables?

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on issue #1791:
URL: https://github.com/apache/hudi/issues/1791#issuecomment-656616321


   @vinothchandar Yes, we do have tasks already filed: https://issues.apache.org/jira/browse/HUDI-766, https://issues.apache.org/jira/browse/HUDI-769. Resuming the work on these from today. :)





[GitHub] [hudi] vinothchandar commented on issue #1791: [SUPPORT] Does DeltaStreamer support listening to multiple kafka topics and upserting to multiple tables?

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1791:
URL: https://github.com/apache/hudi/issues/1791#issuecomment-653706540


   Support for this has landed on master.
   @pratyakshsharma can you chime in here and possibly work closely with @masterlemmi to get it hardened more before the 0.6.0 release?





[GitHub] [hudi] masterlemmi commented on issue #1791: [SUPPORT] Does DeltaStreamer support listening to multiple kafka topics and upserting to multiple tables?

Posted by GitBox <gi...@apache.org>.
masterlemmi commented on issue #1791:
URL: https://github.com/apache/hudi/issues/1791#issuecomment-655226661


   sure. will do. thanks @vinothchandar 

