Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/08/20 12:24:03 UTC

[GitHub] [hudi] poiyyq opened a new issue #1999: What difference from spark and deltaStreamer? which more efficient？

poiyyq opened a new issue #1999:
URL: https://github.com/apache/hudi/issues/1999


   As far as I know, DeltaStreamer is a tool for operating on Hudi.
   
   If it's just a tool, I can just use Spark syntax instead of DeltaStreamer, right?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] poiyyq closed issue #1999: What difference from spark and deltaStreamer? which more efficient？

Posted by GitBox <gi...@apache.org>.

poiyyq closed issue #1999:
URL: https://github.com/apache/hudi/issues/1999


   


[GitHub] [hudi] bvaradar commented on issue #1999: What difference from spark and deltaStreamer? which more efficient？

Posted by GitBox <gi...@apache.org>.

bvaradar commented on issue #1999:
URL: https://github.com/apache/hudi/issues/1999#issuecomment-678038887


   DeltaStreamer gives you the ability to continuously ingest data from upstream sources such as Kafka, DFS log files, and other Hudi tables. It also manages checkpoints for you. Yes, you can also use the Spark DataSource for writes where your input batch is a Spark Dataset.
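
To make the Spark DataSource path mentioned above concrete, here is a minimal PySpark-style sketch, not taken from this thread. The helper names (`hudi_write_options`, `write_batch`), the table name `trips`, and the field names `uuid`, `ts`, and `region` are hypothetical. The `hoodie.*` keys are the commonly documented DataSource write options, but please verify them against the docs for your Hudi version. Unlike DeltaStreamer, this path leaves source checkpointing to the caller.

```python
def hudi_write_options(table_name, record_key, precombine_field,
                       partition_field, operation="upsert"):
    """Build the option map for a Hudi DataSource write.

    All argument values are supplied by the caller; nothing here is
    specific to the original issue.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.operation": operation,
    }


def write_batch(df, base_path, options):
    """Write one Spark DataFrame batch to a Hudi table at base_path.

    `df` is assumed to be a pyspark.sql.DataFrame created elsewhere,
    in a session that has the Hudi Spark bundle on its classpath.
    """
    (df.write
       .format("org.apache.hudi")
       .options(**options)
       .mode("append")
       .save(base_path))


# Hypothetical table and field names, for illustration only.
opts = hudi_write_options("trips", "uuid", "ts", "region")
```

A call such as `write_batch(df, "/tmp/hudi/trips", opts)` would then upsert the batch. The path is a placeholder.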


[GitHub] [hudi] poiyyq commented on issue #1999: What difference from spark and deltaStreamer? which more efficient？

Posted by GitBox <gi...@apache.org>.

poiyyq commented on issue #1999:
URL: https://github.com/apache/hudi/issues/1999#issuecomment-678060098


   So I can use Spark Streaming or Flink to consume Kafka data, then write to a Hudi table with the Spark DataSource, right?


[GitHub] [hudi] poiyyq commented on issue #1999: What difference from spark and deltaStreamer? which more efficient？

Posted by GitBox <gi...@apache.org>.

poiyyq commented on issue #1999:
URL: https://github.com/apache/hudi/issues/1999#issuecomment-678060189


   So I can use Spark Streaming or Flink to consume Kafka data, then write to a Hudi table with the Spark DataSource, right?


[GitHub] [hudi] bvaradar commented on issue #1999: What difference from spark and deltaStreamer? which more efficient？

Posted by GitBox <gi...@apache.org>.

bvaradar commented on issue #1999:
URL: https://github.com/apache/hudi/issues/1999#issuecomment-678112475


   Flink support is still in the works. Structured Streaming support is already available. Please look out for the 0.6.0 release sometime next week, in which we have added support for running async compaction in Structured Streaming.
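
As a rough illustration of the Structured Streaming route described above, the sketch below starts a streaming write into a Hudi table. It is an assumption-laden example, not code from the thread: the helper names, table name, and field names are hypothetical, and the async compaction key `hoodie.datasource.compaction.async.enable` is quoted from memory of the 0.6.0 options, so check it against the release documentation. Compaction only applies to MERGE_ON_READ tables, which is why that table type is set here.

```python
def hudi_streaming_options(table_name, record_key, precombine_field,
                           partition_field):
    """Option map for a streaming upsert into a MERGE_ON_READ table."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        # Assumed key for async compaction; verify for your Hudi version.
        "hoodie.datasource.compaction.async.enable": "true",
    }


def start_hudi_stream(stream_df, base_path, checkpoint_path, options):
    """Start a Structured Streaming query that writes to Hudi.

    `stream_df` is assumed to be a streaming DataFrame, for example one
    read from Kafka with spark.readStream. Spark tracks source offsets
    in `checkpoint_path`. Returns the StreamingQuery handle.
    """
    return (stream_df.writeStream
            .format("org.apache.hudi")
            .options(**options)
            .option("checkpointLocation", checkpoint_path)
            .outputMode("append")
            .start(base_path))


# Hypothetical table and field names, for illustration only.
stream_opts = hudi_streaming_options("trips", "uuid", "ts", "region")
```

In this setup Spark's own checkpoint directory plays the role that DeltaStreamer's managed checkpoints play in the tool-based approach.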


[GitHub] [hudi] poiyyq removed a comment on issue #1999: What difference from spark and deltaStreamer? which more efficient？

Posted by GitBox <gi...@apache.org>.

poiyyq removed a comment on issue #1999:
URL: https://github.com/apache/hudi/issues/1999#issuecomment-678060189


   So I can use Spark Streaming or Flink to consume Kafka data, then write to a Hudi table with the Spark DataSource, right?

