Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/09/15 21:52:26 UTC

[GitHub] [hudi] cb149 commented on issue #3161: [SUPPORT] Spark Structured Streaming from Kafka to HoodieStreamingSink missing messages

cb149 commented on issue #3161:
URL: https://github.com/apache/hudi/issues/3161#issuecomment-920408212


   @codope `startingOffsets` is set to `earliest`.
   
   I changed my application's schedule from every hour to every two hours so that more new messages accumulate in the topic between runs, and I have seen the issue only a couple of times since then.
   
   I added a failsafe that compares `numInputRows` with `endOffset - startOffset`, but in the last month or so the only time I was alerted to a mismatch between them was when there were 0 new messages in the input topic and one of the offsets increased by 1 for some reason.
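
A check of that kind could look roughly like the sketch below. This is not the actual code from my application. It assumes the streaming progress is available as a dict shaped like Spark's `StreamingQueryProgress` JSON, where each Kafka source reports `startOffset` and `endOffset` as `{topic: {partition: offset}}` together with `numInputRows`. The helper names `offset_delta` and `find_mismatches` are made up for illustration.

```python
def offset_delta(start, end):
    """Sum (end - start) over every topic partition present in both maps."""
    total = 0
    for topic, partitions in end.items():
        for partition, end_offset in partitions.items():
            start_offset = start.get(topic, {}).get(partition)
            if start_offset is None:
                # No known start for this partition (e.g. newly added),
                # so it cannot be checked and is skipped.
                continue
            total += end_offset - start_offset
    return total


def find_mismatches(progress):
    """Return (description, offset_delta, numInputRows) for each source
    whose offset range does not match the number of rows read."""
    mismatches = []
    for source in progress.get("sources", []):
        start = source.get("startOffset")
        end = source.get("endOffset")
        # The first batch may have no start offset; skip what cannot be compared.
        if not isinstance(start, dict) or not isinstance(end, dict):
            continue
        expected = offset_delta(start, end)
        actual = source.get("numInputRows", 0)
        if expected != actual:
            mismatches.append((source.get("description"), expected, actual))
    return mismatches


# Hypothetical progress: offsets advance by 1 but no rows were read,
# which is the one case that triggered an alert.
progress = {
    "sources": [
        {
            "description": "KafkaV2[Subscribe[events]]",
            "startOffset": {"events": {"0": 100, "1": 40}},
            "endOffset": {"events": {"0": 101, "1": 40}},
            "numInputRows": 0,
        }
    ]
}
print(find_mismatches(progress))
```

Kafka offsets are not guaranteed to be contiguous (transaction markers, for example, occupy offsets without producing records), so a mismatch from a check like this is a reason to investigate rather than proof of lost messages.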
   
   I get the feeling that the issue is unrelated to Hudi and is probably caused by Kafka or Spark, possibly by the relatively unbalanced Kafka partitions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
