
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/07/20 17:14:12 UTC

[GitHub] [hudi] vinothchandar commented on issue #1839: Question, Add Support to Hudi datasets to spark structured streaming

vinothchandar commented on issue #1839:
URL: https://github.com/apache/hudi/issues/1839#issuecomment-661204597


   @rubenssoto Yes, we already support incremental queries using the Spark datasource. It seems the only thing missing here is the Spark Structured Streaming integration you are asking for (which we can add after 0.6.0).
   https://hudi.apache.org/docs/querying_data.html#spark-incr-query
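
   As a rough illustration of the incremental query mentioned above, here is a minimal PySpark sketch. The helper names (`incremental_read_options`, `read_incremental`), the table path, and the instant time are hypothetical; the option keys are the ones documented for the Spark datasource in recent releases, and older releases may use `hoodie.datasource.view.type` instead, so check the docs for your version.

```python
from typing import Dict, Optional


def incremental_read_options(begin_instant: str,
                             end_instant: Optional[str] = None) -> Dict[str, str]:
    """Build Spark datasource options for a Hudi incremental query.

    begin_instant / end_instant are commit instant times (for example
    "20200720171412"); only records committed after begin_instant are read.
    """
    opts = {
        # Assumption: key name as documented for recent Hudi releases.
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }
    if end_instant is not None:
        # Optional upper bound on the commit range.
        opts["hoodie.datasource.read.end.instanttime"] = end_instant
    return opts


def read_incremental(spark, base_path: str, begin_instant: str,
                     end_instant: Optional[str] = None):
    """Hypothetical helper: load records changed in the given commit range.

    `spark` is an existing SparkSession with the Hudi bundle on its classpath.
    """
    return (
        spark.read.format("org.apache.hudi")
        .options(**incremental_read_options(begin_instant, end_instant))
        .load(base_path)
    )
```

   A periodic batch job could call `read_incremental(spark, "/path/to/hudi_table", last_processed_instant)` and then record the newest `_hoodie_commit_time` it saw as the starting point for the next run.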
   
   https://www.youtube.com/watch?v=1w3IpavhSWA actually talks about a production use-case we built using an incremental query plus some grouping on the sink side. Unlike Delta Lake, Hudi has record-level metadata around arrival times and thus does not need anything like ignoreChanges.
   
   I am not sure if I am missing something about your use-case, but it feels like you should be able to get this working incrementally end-to-end with what we have today (again, we can add Spark streaming read support if there are hands to help. cc @garyli1019? :))
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org