Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/09/23 23:28:10 UTC

[GitHub] [hudi] vinothchandar commented on issue #3161: [SUPPORT] Spark Structured Streaming from Kafka to HoodieStreamingSink missing messages

vinothchandar commented on issue #3161:
URL: https://github.com/apache/hudi/issues/3161#issuecomment-926234522


   @cb149 A couple of things can cause message loss in this setup. I can't tell from here which, if any, is happening.
   
   - If Kafka itself runs out of retention, i.e. deletes log segments, you may lose some messages. So if you could check whether the missing offset range was ever close to the tail of the log, that could explain this. (See the offset-check sketch after the config below.)
   - The streaming writer can choose whether or not to ignore a failed micro-batch. If a batch fails for whatever reason, say S3 is flaky, it moves on by default to keep the app running; see the config below and the usage sketch after it.
   
   ```
     val STREAMING_IGNORE_FAILED_BATCH: ConfigProperty[String] = ConfigProperty
       .key("hoodie.datasource.write.streaming.ignore.failed.batch")
       .defaultValue("true")
       .withDocumentation("Config to indicate whether to ignore any non exception error (e.g. writestatus error)"
         + " within a streaming microbatch")
   ```
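
   If you want the query to fail hard instead of skipping a bad batch, you can flip this option to "false" on the write. A minimal sketch, assuming a streaming DataFrame `df` already read from Kafka; the table name, checkpoint location, and base path are placeholders:

   ```
   df.writeStream
     .format("hudi")
     // fail the streaming query on a failed micro-batch instead of moving on
     .option("hoodie.datasource.write.streaming.ignore.failed.batch", "false")
     .option("hoodie.table.name", "my_table")                           // placeholder
     .option("checkpointLocation", "s3://bucket/checkpoints/my_table")  // placeholder
     .outputMode("append")
     .start("s3://bucket/hudi/my_table")                                // placeholder
   ```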

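   To check the first possibility, here is a minimal sketch using Kafka's Java consumer (assuming it is on the classpath; the broker address and topic name are placeholders) that prints the earliest retained offset per partition. If the missing offset range falls below these, retention has already deleted those messages:

   ```
   import java.util.Properties
   import scala.collection.JavaConverters._
   import org.apache.kafka.clients.consumer.KafkaConsumer
   import org.apache.kafka.common.TopicPartition

   val props = new Properties()
   props.put("bootstrap.servers", "broker:9092") // placeholder
   props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")
   props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")

   val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
   try {
     val partitions = consumer.partitionsFor("my-topic").asScala // placeholder topic
       .map(p => new TopicPartition(p.topic, p.partition))
     // earliest offset still retained per partition; anything below this
     // was deleted by retention and can no longer be replayed
     consumer.beginningOffsets(partitions.asJava).asScala.foreach {
       case (tp, offset) => println(s"$tp earliest retained offset = $offset")
     }
   } finally consumer.close()
   ```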
