Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/08/03 15:02:04 UTC

[GitHub] [iceberg] davseitsev commented on issue #1286: Slow parallel operations fail to commit

davseitsev commented on issue #1286:
URL: https://github.com/apache/iceberg/issues/1286#issuecomment-668071699


   > You mean the Spark streaming job will commit to the Iceberg table every 10 seconds? If so, then it seems we commit to Iceberg too frequently, which easily causes transaction conflicts.
   
   Because of the way Spark Structured Streaming works in micro-batch mode, infrequent triggering (say, once every 5 minutes) causes spikes in resource usage. I'd like to keep the batches small if possible.
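   
   A rough sketch of what I mean (Scala; the table identifier and checkpoint path are placeholders, and `df` is assumed to be the streaming DataFrame). The trigger interval is effectively the commit interval, so making it longer just trades commit conflicts for spikier load:
   
       import org.apache.spark.sql.streaming.Trigger
   
       // Every trigger fires one micro-batch and therefore one Iceberg commit.
       val query = df.writeStream
         .format("iceberg")
         .outputMode("append")
         .trigger(Trigger.ProcessingTime("10 seconds"))            // commit cadence == trigger interval
         .option("path", "db.events")                              // placeholder table identifier
         .option("checkpointLocation", "/tmp/checkpoints/events")  // placeholder path
         .start()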
   
   > Or just write the data files without committing the transaction, and commit the transaction after another interval.
   
   I'm not sure how to implement this with Spark Streaming. I would appreciate it if you could point me to the documentation or source code where I can read about it.
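   
   If I understand the idea correctly, on the core API side it might look roughly like the untested sketch below (the pending buffer and the flush helper are names I made up), but I don't see how to hook something like this into the Structured Streaming sink:
   
       import org.apache.iceberg.{DataFile, Table}
       import scala.collection.mutable.ArrayBuffer
   
       // DataFiles already written to storage by earlier micro-batches but not yet committed
       val pending = ArrayBuffer.empty[DataFile]
   
       // Hypothetical helper: fold many micro-batches of files into a single metadata commit
       def flush(table: Table): Unit = {
         val append = table.newAppend()
         pending.foreach(f => append.appendFile(f))
         append.commit()
         pending.clear()
       }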

