Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/08/29 17:23:44 UTC

[GitHub] [spark] gaborgsomogyi commented on issue #25618: [SPARK-28908][SS]Implement Kafka EOS sink for Structured Streaming

URL: https://github.com/apache/spark/pull/25618#issuecomment-526282547
 
 
   Reading the doc without super deep understanding, I've found this in the caveats section:
   ```
    If the job fails before ResumeTransaction for more than 60 seconds (the default
    value of the configuration transaction.timeout.ms), data sent to the Kafka
    cluster will be discarded, leading to data loss. So we set
    transaction.timeout.ms to 900000, the default value of
    max.transaction.timeout.ms in the Kafka cluster, to reduce the risk of data
    loss if the user has not defined it.
   ```
   The `to reduce the risk of data loss` part disturbs me a bit. Is it exactly-once then or not?
   @HeartSaVioR AFAIK you've proposed an exactly-once SPIP before, but there were concerns.
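   For context, here is a minimal sketch of a plain transactional Kafka producer showing where `transaction.timeout.ms` comes into play; this is the standard Kafka client API, not the PR's sink code, and the broker address, `transactional.id`, and topic are made-up values:
   ```scala
   import java.util.Properties
   import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

   val props = new Properties()
   props.put("bootstrap.servers", "localhost:9092")
   // Enables the transactional producer; the id must be stable across restarts
   // so a restarted job can resume/fence its own transactions.
   props.put("transactional.id", "example-eos-sink")
   // If the transaction is neither committed nor aborted within this window,
   // the broker aborts it; that is the data-loss window described above.
   // 900000 ms matches the broker-side default ceiling mentioned in the doc.
   props.put("transaction.timeout.ms", "900000")
   props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
   props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

   val producer = new KafkaProducer[String, String](props)
   producer.initTransactions()
   producer.beginTransaction()
   producer.send(new ProducerRecord("example-topic", "key", "value"))
   // Data only becomes visible to read_committed consumers at commit time.
   producer.commitTransaction()
   producer.close()
   ```
   So with the default 60-second timeout, a job that dies mid-transaction and is not restarted within a minute has its buffered output aborted by the broker, which seems to be exactly the failure mode the caveat is hedging against.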
   
