Posted to users@hudi.apache.org by Max Zheng <mz...@plaid.com.INVALID> on 2021/02/09 17:28:55 UTC

Recovery on concurrent writes to single dataset?

Hi all,

Does anyone know whether recovery is possible if multiple writers
concurrently write to a single Hudi dataset? For example, if two duplicate
Spark applications that write to the same dataset are started by accident,
would the dataset be permanently corrupted or difficult to recover? As far
as I can tell this is undefined behavior, but I was wondering whether Hudi
can/will roll back successfully if this happens. I'm currently using 0.6.0.

Thanks!
Max