Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/09/30 15:02:49 UTC

[GitHub] [hudi] codope commented on issue #3724: [SUPPORT] Spark start reading stream from hudi dataset starting from given commit time

codope commented on issue #3724:
URL: https://github.com/apache/hudi/issues/3724#issuecomment-931404620


   Generally, for incremental queries, we need to set the following configs:
   ```
   "hoodie.datasource.query.type" : "incremental",
   "hoodie.datasource.read.begin.instanttime" : "commit_time_to_read_from"
   ```
   Did you try using these configs? You can also take a look at [TestStructuredStreaming](https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestStructuredStreaming.scala#L101) for an example of their usage.
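   To make the wiring concrete, here is a minimal PySpark-flavored sketch of how those two configs might be passed to a Hudi read. The table path and begin instant time are placeholders I made up for illustration, not values from this issue:
   ```python
   # The two configs from above, collected as Spark datasource options.
   hudi_read_options = {
       "hoodie.datasource.query.type": "incremental",
       # Placeholder instant time; commits after this point are returned.
       "hoodie.datasource.read.begin.instanttime": "20210930120000",
   }

   # With a live SparkSession, the read would look roughly like:
   # df = (spark.read.format("hudi")
   #       .options(**hudi_read_options)
   #       .load("/path/to/hudi/table"))  # placeholder path
   ```
   The same options dict can be reused across batch runs, bumping the begin instant time to the last commit you processed.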
   
   > What I'm trying to do is to obtain changes that are happening in one hudi dataset to then create incremental pipeline in spark and process them further.
   
   For this, I would also suggest taking a look at HoodieIncrSource and setting up a deltastreamer job with that source. For an example, take a look at [TestHoodieDeltaStreamer](https://github.com/apache/hudi/blob/47ed91799943271f219419cf209793a98b3f09b5/hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java#L1225).
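   As a rough sketch of what such a job could look like on the command line, with the jar name, paths, and table name all placeholders (check your Hudi version's docs for the exact flags it supports):
   ```
   spark-submit \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
     hudi-utilities-bundle.jar \
     --source-class org.apache.hudi.utilities.sources.HoodieIncrSource \
     --target-base-path /path/to/target/table \
     --target-table target_table \
     --hoodie-conf hoodie.deltastreamer.source.hoodie.path=/path/to/source/hudi/table
   ```
   The key piece is `hoodie.deltastreamer.source.hoodie.path`, which points HoodieIncrSource at the upstream Hudi table whose changes you want to pull incrementally.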


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org