Posted to dev@hudi.apache.org by Pratyaksh Sharma <pr...@gmail.com> on 2020/02/23 11:35:30 UTC

Multiple clean instants with same timestamp

Hi,

I recently came across a strange issue with table T. Two clean instants with
the same timestamp were present in the .hoodie folder, one in completed state
and the other in inflight state. As a result, running the cleaner or
DeltaStreamer on table T fails with the exception below:

20/02/23 09:44:25 INFO HoodieCleanClient: There were previously unfinished cleaner operations. Finishing Instant=[==>20200210174836__clean__INFLIGHT]
20/02/23 09:44:25 INFO HoodieCleanClient: hoodie clean instant in execution: [20200210174836__clean__COMPLETED], with state: COMPLETED
20/02/23 09:44:25 INFO HoodieCleanClient: clean instant is inflight: false
20/02/23 09:44:25 ERROR ApplicationMaster: User class threw exception: java.lang.IllegalArgumentException
java.lang.IllegalArgumentException
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:76)
    at org.apache.hudi.HoodieCleanClient.runClean(HoodieCleanClient.java:145)
    at org.apache.hudi.HoodieCleanClient.lambda$clean$0(HoodieCleanClient.java:85)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)

Has anyone else faced a similar situation? Is there a workaround other than
manually deleting the file from the S3 folder?

The attached screenshot shows the instants in question. The code is also
attached, with custom log statements added.
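
For anyone who wants to check their own table for the same state, here is a
minimal sketch that scans a list of .hoodie file names and reports timestamps
that have both a completed and an inflight clean instant. It assumes the
timeline files are named <timestamp>.clean, <timestamp>.clean.inflight and
<timestamp>.clean.requested, which is an assumption and is not confirmed
anywhere in this thread. The class name, the method names and all sample file
names except the 20200210174836 timestamp are hypothetical. It only reads a
list of names and does not touch the table or use any Hudi API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of Hudi.
class DuplicateCleanInstants {

    // Groups clean instant states by timestamp.
    // Assumed naming: <ts>.clean (completed), <ts>.clean.inflight, <ts>.clean.requested
    static Map<String, Set<String>> cleanStatesByTimestamp(List<String> fileNames) {
        Map<String, Set<String>> states = new TreeMap<>();
        for (String name : fileNames) {
            String[] parts = name.split("\\.");
            // Skip anything that is not a clean instant file under the assumed naming.
            if (parts.length < 2 || parts.length > 3 || !parts[1].equals("clean")) {
                continue;
            }
            String state = parts.length == 2
                    ? "COMPLETED"
                    : parts[2].toUpperCase(Locale.ROOT);
            states.computeIfAbsent(parts[0], k -> new TreeSet<>()).add(state);
        }
        return states;
    }

    // Returns timestamps that have both a COMPLETED and an INFLIGHT clean instant.
    static List<String> conflictingTimestamps(List<String> fileNames) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : cleanStatesByTimestamp(fileNames).entrySet()) {
            Set<String> s = e.getValue();
            if (s.contains("COMPLETED") && s.contains("INFLIGHT")) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample listing; only the first timestamp comes from the log above.
        List<String> names = Arrays.asList(
                "20200210174836.clean",
                "20200210174836.clean.inflight",
                "20200211120000.clean",
                "20200211120000.commit");
        System.out.println(conflictingTimestamps(names)); // prints [20200210174836]
    }
}
```

The file names could come from any listing of the .hoodie folder (for example
an S3 listing saved to a text file), so the check can be run without write
access to the table.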