Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/05/20 04:34:07 UTC
[GitHub] [hudi] KarthickAN opened a new issue #2970: [SUPPORT] Failed to upsert for commit time
KarthickAN opened a new issue #2970:
URL: https://github.com/apache/hudi/issues/2970
Hi,
I keep getting the following error intermittently, and I'm not sure what causes it. There may be two different Hudi jobs running in parallel and writing to the same bucket. Could that be an issue? Also, please guide me in resolving the following error.
py4j.protocol.Py4JJavaError: An error occurred while calling o318.save.
: org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20210520040253
at org.apache.hudi.table.action.commit.WriteHelper.write(WriteHelper.java:62)
at org.apache.hudi.table.action.commit.UpsertCommitActionExecutor.execute(UpsertCommitActionExecutor.java:45)
at org.apache.hudi.table.HoodieCopyOnWriteTable.upsert(HoodieCopyOnWriteTable.java:88)
at org.apache.hudi.client.HoodieWriteClient.upsert(HoodieWriteClient.java:193)
at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:260)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException
at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)
at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:327)
at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionRequestedToInflight(HoodieActiveTimeline.java:384)
at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.saveWorkloadProfileMetadataToInflight(BaseCommitActionExecutor.java:139)
at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.execute(BaseCommitActionExecutor.java:89)
at org.apache.hudi.table.action.commit.WriteHelper.write(WriteHelper.java:55)
... 38 more
Below is my Hudi config:
SmallFileSize = 104857600
MaxFileSize = 125829120
RecordSize = 35
CompressionRatio = 5
InsertSplitSize = 3500000
IndexBloomNumEntries = 1500000
KeyGenClass = org.apache.hudi.keygen.ComplexKeyGenerator
RecordKeyFields = sourceid,sourceassetid,sourceeventid,value,timestamp
TableType = COPY_ON_WRITE
PartitionPathFields = date,sourceid
HiveStylePartitioning = True
WriteOperation = upsert
CompressionCodec = snappy
CommitsRetained = 1
CombineBeforeInsert = True
PrecombineField = timestamp
InsertDropDuplicates = False
InsertShuffleParallelism = 100
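For reference, the settings above could be expressed as Hudi datasource write options roughly as follows. This is a minimal sketch: the mapping from the labels above to option keys is an assumption, and `table_name` is a hypothetical parameter because the table name isn't listed in the report.

```python
def build_hudi_options(table_name: str) -> dict:
    """Hypothetical helper: Hudi write options mirroring the config above.

    The label-to-key mapping is an assumption; `table_name` is a placeholder
    (hoodie.table.name must be set, but the report does not give it).
    """
    return {
        "hoodie.table.name": table_name,
        # File sizing: 100 MiB small-file limit, 120 MiB max Parquet file size
        "hoodie.parquet.small.file.limit": "104857600",
        "hoodie.parquet.max.file.size": "125829120",
        "hoodie.copyonwrite.record.size.estimate": "35",
        "hoodie.parquet.compression.ratio": "5",
        "hoodie.copyonwrite.insert.split.size": "3500000",
        "hoodie.index.bloom.num_entries": "1500000",
        # Record key and partition path
        "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
        "hoodie.datasource.write.recordkey.field": "sourceid,sourceassetid,sourceeventid,value,timestamp",
        "hoodie.datasource.write.partitionpath.field": "date,sourceid",
        "hoodie.datasource.write.hive_style_partitioning": "true",
        # Table type and write behaviour
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.precombine.field": "timestamp",
        "hoodie.combine.before.insert": "true",
        "hoodie.datasource.write.insert.drop.duplicates": "false",
        "hoodie.insert.shuffle.parallelism": "100",
        "hoodie.parquet.compression.codec": "snappy",
        # Cleaner keeps only the latest commit
        "hoodie.cleaner.commits.retained": "1",
    }

# Usage in PySpark (not executed here; `df` and `base_path` are placeholders):
# df.write.format("org.apache.hudi").options(**build_hudi_options("my_table")) \
#     .mode("append").save(base_path)
```

All values are passed as strings, which is how Spark datasource options are handed to Hudi.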
Environment Description
Hudi version : 0.6.0
Spark version : 2.4.3
Hadoop version : 2.8.5-amzn-1
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : No. Running on AWS Glue
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] deep-teliacompany commented on issue #2970: [SUPPORT] Failed to upsert for commit time
deep-teliacompany commented on issue #2970:
URL: https://github.com/apache/hudi/issues/2970#issuecomment-860487690
Hi, does Hudi 0.8.0 support concurrency? If not, from which version is concurrency supported?
[GitHub] [hudi] KarthickAN commented on issue #2970: [SUPPORT] Failed to upsert for commit time
KarthickAN commented on issue #2970:
URL: https://github.com/apache/hudi/issues/2970#issuecomment-851736504
After making sure that no other jobs were running in parallel, I didn't face this issue again. Thank you, we can close this.
[GitHub] [hudi] deep-teliacompany commented on issue #2970: [SUPPORT] Failed to upsert for commit time
deep-teliacompany commented on issue #2970:
URL: https://github.com/apache/hudi/issues/2970#issuecomment-922052140
If you want to run jobs in parallel that update the same directory, you can try Hudi's locking mechanism:
https://hudi.apache.org/docs/concurrency_control/
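The linked page describes optimistic concurrency control (OCC) guarded by an external lock provider. Multi-writer OCC first shipped as an experimental feature in Hudi 0.8.0, so it does not help the 0.6.0 setup in the original report. A minimal sketch of the writer options, assuming the ZooKeeper-based lock provider; the host, port, lock key and base path are hypothetical placeholders:

```python
def build_occ_options(zk_url: str, zk_port: int, lock_key: str, base_path: str) -> dict:
    """Hypothetical helper: multi-writer options for a ZooKeeper lock provider.

    All arguments are placeholders; every writer of the table must use the
    same lock settings.
    """
    return {
        # Enable optimistic concurrency control between writers
        "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
        # Clean up failed writes lazily (via the cleaner) instead of eager rollback
        "hoodie.cleaner.policy.failed.writes": "LAZY",
        "hoodie.write.lock.provider": "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider",
        "hoodie.write.lock.zookeeper.url": zk_url,
        "hoodie.write.lock.zookeeper.port": str(zk_port),
        "hoodie.write.lock.zookeeper.lock_key": lock_key,
        "hoodie.write.lock.zookeeper.base_path": base_path,
    }
```

These would be merged with the table's regular write options, e.g. `{**table_opts, **build_occ_options("zk-host", 2181, "my_table", "/hudi/locks")}`.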
[GitHub] [hudi] Natielle edited a comment on issue #2970: [SUPPORT] Failed to upsert for commit time
Natielle edited a comment on issue #2970:
URL: https://github.com/apache/hudi/issues/2970#issuecomment-921882784
I had the same problem, and I was sure that no other jobs were running in parallel. The root problem was the partition column containing the value ".". To solve it, I transformed "." into a null value (in Python, a None/NaN value).
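The fix described above can be sketched in plain Python as a per-value mapping. These are hypothetical helpers, not part of Hudi; they treat only the exact value "." as bad, since that is the only case reported here.

```python
from typing import Iterable, Optional

# Only "." is reported in this thread as breaking the write.
BAD_PARTITION_VALUES = frozenset({"."})


def sanitize_partition_value(value: Optional[str]) -> Optional[str]:
    """Map a problematic partition value to None (null); keep all others."""
    if value in BAD_PARTITION_VALUES:
        return None
    return value


def sanitize_rows(rows: Iterable[dict], partition_fields: set) -> list:
    """Apply sanitize_partition_value to the partition columns of each row only."""
    return [
        {k: (sanitize_partition_value(v) if k in partition_fields else v) for k, v in row.items()}
        for row in rows
    ]
```

In a PySpark job the same mapping would typically be applied column-wise before the write, e.g. `F.when(F.col("date") == ".", F.lit(None)).otherwise(F.col("date"))` with `pyspark.sql.functions` imported as `F`.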
[GitHub] [hudi] puremachinery commented on issue #2970: [SUPPORT] Failed to upsert for commit time
puremachinery commented on issue #2970:
URL: https://github.com/apache/hudi/issues/2970#issuecomment-873490018
I'm getting this issue with Hudi 0.8.0, with no other jobs running in parallel.
[GitHub] [hudi] n3nash commented on issue #2970: [SUPPORT] Failed to upsert for commit time
n3nash commented on issue #2970:
URL: https://github.com/apache/hudi/issues/2970#issuecomment-847398667
@KarthickAN Yes, as we discussed over Slack, Hudi 0.6.0 doesn't allow concurrent writes. To give you an idea of what's happening: Hudi timeline transitions go from `requested` to `inflight` to `completed`, and each transition can be performed only once. This exception is basically saying that the transition has already happened and someone else is trying to perform the same transition. This most likely happens when 2 different jobs are writing to the same table with the same `writeClient` instance.
Can you make sure that only a single writer is writing to the table? If you still get the exception, that would be a bug that needs investigation.
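To make the explanation above concrete, here is a toy Python model of an instant moving from requested to inflight to completed. It is a simplified, hypothetical illustration, not Hudi's `HoodieActiveTimeline`; it raises `ValueError` where Hudi throws `IllegalArgumentException`.

```python
from enum import Enum


class State(Enum):
    REQUESTED = "requested"
    INFLIGHT = "inflight"
    COMPLETED = "completed"


# In this toy model only forward, one-step transitions are allowed.
_NEXT = {State.REQUESTED: State.INFLIGHT, State.INFLIGHT: State.COMPLETED}


class ToyTimeline:
    """Hypothetical model of instant states; not Hudi's implementation."""

    def __init__(self) -> None:
        self._instants = {}

    def create_requested(self, instant_time: str) -> None:
        self._instants[instant_time] = State.REQUESTED

    def state(self, instant_time: str):
        return self._instants.get(instant_time)

    def transition(self, instant_time: str, from_state: State, to_state: State) -> None:
        current = self._instants.get(instant_time)
        # Reject the move if the instant is no longer in the expected source
        # state, e.g. because another writer already performed this transition.
        if current is not from_state or _NEXT.get(from_state) is not to_state:
            raise ValueError(
                f"cannot move {instant_time} from {from_state} to {to_state}; current state is {current}"
            )
        self._instants[instant_time] = to_state
```

Replaying the same requested-to-inflight transition a second time, as a second writer sharing the instant would, is rejected, which corresponds to the failure in `transitionRequestedToInflight` in the stack trace above.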
[GitHub] [hudi] nochimow commented on issue #2970: [SUPPORT] Failed to upsert for commit time
nochimow commented on issue #2970:
URL: https://github.com/apache/hudi/issues/2970#issuecomment-894396347
Same as @puremachinery.
[GitHub] [hudi] n3nash commented on issue #2970: [SUPPORT] Failed to upsert for commit time
n3nash commented on issue #2970:
URL: https://github.com/apache/hudi/issues/2970#issuecomment-851719044
@KarthickAN Any updates on this one?
[GitHub] [hudi] matthiasdg commented on issue #2970: [SUPPORT] Failed to upsert for commit time
matthiasdg commented on issue #2970:
URL: https://github.com/apache/hudi/issues/2970#issuecomment-895837764
Should this work with 0.8.0 and jobs running in parallel? Here it doesn't.
[GitHub] [hudi] KarthickAN closed issue #2970: [SUPPORT] Failed to upsert for commit time
KarthickAN closed issue #2970:
URL: https://github.com/apache/hudi/issues/2970