Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/05/20 04:34:07 UTC

[GitHub] [hudi] KarthickAN opened a new issue #2970: [SUPPORT] Failed to upsert for commit time

KarthickAN opened a new issue #2970:
URL: https://github.com/apache/hudi/issues/2970


   Hi,
   I keep getting the following error intermittently and I'm not sure what causes it. There may be two different Hudi jobs running in parallel and writing to the same bucket. Would that be an issue? Please also guide me in resolving the following error.
   
   py4j.protocol.Py4JJavaError: An error occurred while calling o318.save.
   : org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20210520040253
   	at org.apache.hudi.table.action.commit.WriteHelper.write(WriteHelper.java:62)
   	at org.apache.hudi.table.action.commit.UpsertCommitActionExecutor.execute(UpsertCommitActionExecutor.java:45)
   	at org.apache.hudi.table.HoodieCopyOnWriteTable.upsert(HoodieCopyOnWriteTable.java:88)
   	at org.apache.hudi.client.HoodieWriteClient.upsert(HoodieWriteClient.java:193)
   	at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:260)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
   	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
   	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
   	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   	at py4j.Gateway.invoke(Gateway.java:282)
   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
   	at py4j.GatewayConnection.run(GatewayConnection.java:238)
   	at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.IllegalArgumentException
   	at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:327)
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionRequestedToInflight(HoodieActiveTimeline.java:384)
   	at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.saveWorkloadProfileMetadataToInflight(BaseCommitActionExecutor.java:139)
   	at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.execute(BaseCommitActionExecutor.java:89)
   	at org.apache.hudi.table.action.commit.WriteHelper.write(WriteHelper.java:55)
   	... 38 more
   
   Below is my Hudi config:
   
   SmallFileSize = 104857600
   MaxFileSize = 125829120
   RecordSize = 35
   CompressionRatio = 5
   InsertSplitSize = 3500000
   IndexBloomNumEntries = 1500000
   KeyGenClass = org.apache.hudi.keygen.ComplexKeyGenerator
   RecordKeyFields = sourceid,sourceassetid,sourceeventid,value,timestamp
   TableType = COPY_ON_WRITE
   PartitionPathFields = date,sourceid
   HiveStylePartitioning = True
   WriteOperation = upsert
   CompressionCodec = snappy
   CommitsRetained = 1
   CombineBeforeInsert = True
   PrecombineField = timestamp
   InsertDropDuplicates = False
   InsertShuffleParallelism = 100
   
   Environment Description
   
   Hudi version : 0.6.0
   
   Spark version : 2.4.3
   
   Hadoop version : 2.8.5-amzn-1
   
   Storage (HDFS/S3/GCS..) : S3
   
   Running on Docker? (yes/no) : No. Running on AWS Glue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] deep-teliacompany commented on issue #2970: [SUPPORT] Failed to upsert for commit time

Posted by GitBox <gi...@apache.org>.
deep-teliacompany commented on issue #2970:
URL: https://github.com/apache/hudi/issues/2970#issuecomment-860487690


   Hi, does Hudi 0.8.0 support concurrency, or from which version is concurrency supported?





[GitHub] [hudi] KarthickAN commented on issue #2970: [SUPPORT] Failed to upsert for commit time

Posted by GitBox <gi...@apache.org>.
KarthickAN commented on issue #2970:
URL: https://github.com/apache/hudi/issues/2970#issuecomment-851736504


   I made sure no other jobs were running in parallel, and I no longer hit this issue. Thank you. We can close this.





[GitHub] [hudi] deep-teliacompany commented on issue #2970: [SUPPORT] Failed to upsert for commit time

Posted by GitBox <gi...@apache.org>.
deep-teliacompany commented on issue #2970:
URL: https://github.com/apache/hudi/issues/2970#issuecomment-922052140


   If you want to run jobs in parallel that update the same directory, you can try the Hoodie locking mechanism:
   https://hudi.apache.org/docs/concurrency_control/
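
As a rough illustration of what the linked page describes, the sketch below builds the extra writer options for optimistic concurrency control with a ZooKeeper-based lock. The helper name `with_optimistic_concurrency` and all host, key, and path values are hypothetical; the option keys are taken from the Hudi concurrency control documentation and should be checked against the Hudi release actually in use, since releases without multi-writer support will not honor them.

```python
def with_optimistic_concurrency(base_options, zk_host, zk_port, lock_key, zk_base_path):
    """Return a copy of base_options with multi-writer settings added.

    Hypothetical helper: it only assembles a dict of option strings and
    does not talk to Spark, Hudi, or ZooKeeper.
    """
    options = dict(base_options)  # do not mutate the caller's dict
    options.update({
        # Ask writers to detect conflicting commits instead of assuming
        # a single writer.
        "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
        # Clean up failed writes lazily so one writer does not roll back
        # another writer's in-progress commit.
        "hoodie.cleaner.policy.failed.writes": "LAZY",
        # Lock provider and its connection settings.
        "hoodie.write.lock.provider":
            "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider",
        "hoodie.write.lock.zookeeper.url": zk_host,
        "hoodie.write.lock.zookeeper.port": str(zk_port),
        "hoodie.write.lock.zookeeper.lock_key": lock_key,
        "hoodie.write.lock.zookeeper.base_path": zk_base_path,
    })
    return options


# Example with placeholder values (all assumptions, not from this thread).
base = {"hoodie.datasource.write.operation": "upsert"}
opts = with_optimistic_concurrency(base, "zk.example.internal", 2181, "my_table", "/hudi/locks")
```

In a PySpark job the resulting dict would typically be passed as `df.write.format("hudi").options(**opts)`, and every job writing to the table would need the same lock settings.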


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Natielle edited a comment on issue #2970: [SUPPORT] Failed to upsert for commit time

Posted by GitBox <gi...@apache.org>.
Natielle edited a comment on issue #2970:
URL: https://github.com/apache/hudi/issues/2970#issuecomment-921882784


   I had the same problem, and I was sure that no other jobs were running in parallel. The root problem was the partition column containing the value ".". To solve it, I transformed "." into a null value (in Python, a None/NaN value).
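
A minimal sketch of that workaround in plain Python, assuming the records are available as dictionaries before they are written. The function names and the `region` field are hypothetical; in a Spark job the same replacement would normally be done with a column expression instead.

```python
def clean_partition_value(value, invalid=(".",)):
    """Return None when a partition value is one of the invalid markers."""
    if value is None:
        return None
    return None if str(value).strip() in invalid else value


def clean_records(records, partition_fields):
    """Return new records with invalid partition values replaced by None.

    Only the fields named in partition_fields are touched; the input
    records are left unmodified.
    """
    return [
        {
            key: clean_partition_value(val) if key in partition_fields else val
            for key, val in record.items()
        }
        for record in records
    ]


# Hypothetical sample data: "region" stands in for the partition column.
rows = [
    {"id": 1, "region": "."},
    {"id": 2, "region": "eu.west"},
]
cleaned = clean_records(rows, {"region"})
```

Only a value that is exactly "." is replaced; values that merely contain a dot, such as "eu.west" here, are kept as they are.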





[GitHub] [hudi] Natielle commented on issue #2970: [SUPPORT] Failed to upsert for commit time

Posted by GitBox <gi...@apache.org>.
Natielle commented on issue #2970:
URL: https://github.com/apache/hudi/issues/2970#issuecomment-921882784


   I had the same problem, and I was sure that no other jobs were running in parallel. The root problem was the partition column containing the value ".".





[GitHub] [hudi] puremachinery commented on issue #2970: [SUPPORT] Failed to upsert for commit time

Posted by GitBox <gi...@apache.org>.
puremachinery commented on issue #2970:
URL: https://github.com/apache/hudi/issues/2970#issuecomment-873490018


   I'm getting this issue using hudi 0.8.0 and with no other jobs running in parallel.





[GitHub] [hudi] n3nash commented on issue #2970: [SUPPORT] Failed to upsert for commit time

Posted by GitBox <gi...@apache.org>.
n3nash commented on issue #2970:
URL: https://github.com/apache/hudi/issues/2970#issuecomment-847398667


   @KarthickAN Yes, as we discussed over Slack, Hudi 0.6.0 doesn't allow concurrent writes. To give you an idea of what's happening, Hudi timeline transitions go from `requested` to `inflight` to `completed`. At any point in time, this transition can be performed only once. This exception is basically saying that the transition has already happened and someone else is trying to do the same transition. This mostly happens when two different jobs are writing to the same table with the same `writeClient` instance.
   Can you make sure that only a single writer is writing to the table? If you still get the exception, that would be a bug that needs investigation.
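
The behaviour described above can be pictured with a small toy model. This is not Hudi's implementation: `ToyTimeline`, `State`, and `simulate_two_writers` are hypothetical names, and `ValueError` merely stands in for the `IllegalArgumentException` seen in the stack trace. The sketch only shows why a second attempt at the same `requested` to `inflight` transition has to fail.

```python
from enum import Enum


class State(Enum):
    REQUESTED = "requested"
    INFLIGHT = "inflight"
    COMPLETED = "completed"


# Allowed forward transitions, in the order described in the comment.
NEXT_STATE = {State.REQUESTED: State.INFLIGHT, State.INFLIGHT: State.COMPLETED}


class ToyTimeline:
    """Toy stand-in for a table timeline (assumption, not Hudi code)."""

    def __init__(self):
        self._instants = {}  # commit time -> current State

    def request(self, commit_time):
        if commit_time in self._instants:
            raise ValueError(f"instant {commit_time} already exists")
        self._instants[commit_time] = State.REQUESTED

    def transition(self, commit_time, from_state, to_state):
        current = self._instants.get(commit_time)
        current_name = current.value if current is not None else "missing"
        # Valid only if the instant is still in from_state and to_state
        # is the next step; otherwise someone already moved it.
        if current is not from_state or NEXT_STATE.get(from_state) is not to_state:
            raise ValueError(
                f"cannot move {commit_time} from {from_state.value} to "
                f"{to_state.value}; current state is {current_name}"
            )
        self._instants[commit_time] = to_state


def simulate_two_writers(commit_time="20210520040253"):
    """Return the error message the second writer gets, or None."""
    timeline = ToyTimeline()
    timeline.request(commit_time)
    # Writer A performs the transition first.
    timeline.transition(commit_time, State.REQUESTED, State.INFLIGHT)
    # Writer B attempts the same transition for the same instant.
    try:
        timeline.transition(commit_time, State.REQUESTED, State.INFLIGHT)
    except ValueError as err:
        return str(err)
    return None
```

Calling `simulate_two_writers()` returns the second writer's error message, which reports that the instant is already `inflight`.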





[GitHub] [hudi] nochimow commented on issue #2970: [SUPPORT] Failed to upsert for commit time

Posted by GitBox <gi...@apache.org>.
nochimow commented on issue #2970:
URL: https://github.com/apache/hudi/issues/2970#issuecomment-894396347


   Same as @puremachinery. 





[GitHub] [hudi] n3nash commented on issue #2970: [SUPPORT] Failed to upsert for commit time

Posted by GitBox <gi...@apache.org>.
n3nash commented on issue #2970:
URL: https://github.com/apache/hudi/issues/2970#issuecomment-851719044


   @KarthickAN Any updates on this one ?





[GitHub] [hudi] matthiasdg commented on issue #2970: [SUPPORT] Failed to upsert for commit time

Posted by GitBox <gi...@apache.org>.
matthiasdg commented on issue #2970:
URL: https://github.com/apache/hudi/issues/2970#issuecomment-895837764


   Should this work with 0.8.0 and jobs running in parallel? Here it doesn't.





[GitHub] [hudi] KarthickAN closed issue #2970: [SUPPORT] Failed to upsert for commit time

Posted by GitBox <gi...@apache.org>.
KarthickAN closed issue #2970:
URL: https://github.com/apache/hudi/issues/2970


   

