Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/09 15:36:56 UTC

[GitHub] [hudi] tfontana1 opened a new issue, #5821: [SUPPORT] Failed to upsert for commit time - PySpark Unable to Write to Hudi with ABFS

tfontana1 opened a new issue, #5821:
URL: https://github.com/apache/hudi/issues/5821

   I am using a [Spark on k8s Operator](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator) to submit Spark applications to an executor and driver. When trying to write the results of the application back to Azure storage, I am getting the following error.
   
   Using the following jar versions on the executor and driver
   
       com.microsoft.azure:azure-storage:8.6.6
       org.apache.hadoop:hadoop-azure:3.3.1
       org.apache.hadoop:hadoop-common:3.3.1
       org.apache.hudi:hudi-spark3-bundle_2.12:0.10.0
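   Dependencies like these are often supplied through `spark.jars.packages` when building the session from PySpark. A hedged sketch (the Maven coordinates are the ones listed above; the app name and Kryo setting follow the Hudi quickstart and are illustrative):
   
   ```python
   # Maven coordinates from the report, joined into the comma-separated
   # form that spark.jars.packages expects.
   HUDI_DEPS = [
       "com.microsoft.azure:azure-storage:8.6.6",
       "org.apache.hadoop:hadoop-azure:3.3.1",
       "org.apache.hadoop:hadoop-common:3.3.1",
       "org.apache.hudi:hudi-spark3-bundle_2.12:0.10.0",
   ]
   PACKAGES = ",".join(HUDI_DEPS)
   
   def build_session():
       # Import inside the function so the sketch also loads without pyspark.
       from pyspark.sql import SparkSession
       return (
           SparkSession.builder
           .appName("hudi-abfs-demo")  # illustrative app name
           .config("spark.jars.packages", PACKAGES)
           .config("spark.serializer",
                   "org.apache.spark.serializer.KryoSerializer")
           .getOrCreate()
       )
   ```
   
   Note that when jars are instead baked into the driver and executor images (as with the operator setup here), the same versions must be present on both sides.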
   
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Submit a job from the operator to a Spark executor
   2. The Spark executor tries to write files to Hudi
   3. The write fails
   
   **Expected behavior**
   
   Hudi should be able to write files to Azure storage through ABFS.
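   The intended write can be sketched as follows. This is a hedged illustration: only the abfs path comes from the log below; the table name, record key, and precombine field are hypothetical placeholders.
   
   ```python
   # Base path taken from the stack trace in this report.
   BASE_PATH = "abfs://container@storageaccount.dfs.core.windows.net/folder/host"
   
   # Minimal Hudi upsert options; the field names are hypothetical.
   hudi_options = {
       "hoodie.table.name": "host",                       # hypothetical table name
       "hoodie.datasource.write.recordkey.field": "id",   # hypothetical key column
       "hoodie.datasource.write.precombine.field": "ts",  # hypothetical ordering column
       "hoodie.datasource.write.operation": "upsert",
   }
   
   def write_hudi(df):
       # df is a Spark DataFrame produced by the job.
       (df.write.format("hudi")
          .options(**hudi_options)
          .mode("append")
          .save(BASE_PATH))
   ```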
   
   **Environment Description**
   
   * Hudi version : org.apache.hudi:hudi-spark3-bundle_2.12:0.10.0
   
   * Spark version : Apache Spark GA spark-3.2.1-bin-hadoop3.2.tgz 
   
   * Hive version :
   
   * Hadoop version : presumably 3.3.1; we are using 'spark-3.2.1-bin-hadoop3.2.tgz'
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) : yes
   
   
   **Stacktrace**
   
   ```
   Writing to abfs://container@storageaccount.dfs.core.windows.net/folder/host
   22/06/08 20:45:15 ERROR TaskSetManager: Task 0 in stage 1.0 failed 4 times; aborting job
   Hudi write failed: host
   An error occurred while calling o146.save.
   : org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20220608204512566
   at org.apache.hudi.table.action.commit.AbstractWriteHelper.write(AbstractWriteHelper.java:62)
   at org.apache.hudi.table.action.commit.SparkUpsertCommitActionExecutor.execute(SparkUpsertCommitActionExecutor.java:46)
   at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:119)
   at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:103)
   at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:159)
   at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:214)
   at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:275)
   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
   at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
   at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
   at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
   at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
   at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
   at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
   at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
   at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
   at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
   at java.base/java.lang.reflect.Method.invoke(Unknown Source)
   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   at py4j.Gateway.invoke(Gateway.java:282)
   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   at py4j.commands.CallCommand.execute(CallCommand.java:79)
   at py4j.GatewayConnection.run(GatewayConnection.java:238)
   at java.base/java.lang.Thread.run(Unknown Source)
   Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 8) (10.42.5.12 executor 1): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD
   at java.base/java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(Unknown Source)
   at java.base/java.io.ObjectStreamClass$FieldReflector.checkObjectFieldValueTypes(Unknown Source)
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] rkkalluri commented on issue #5821: [SUPPORT] Failed to upsert for commit time - PySpark Unable to Write to Hudi with ABFS

Posted by GitBox <gi...@apache.org>.
rkkalluri commented on issue #5821:
URL: https://github.com/apache/hudi/issues/5821#issuecomment-1152995429

   @tfontana1 what versions of Java and Scala are you using at runtime? Can you give us some code and steps to reproduce this error? Can you reproduce this on your local machine first?
   
   It seems to be an issue with version mismatch somewhere.
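   A `ClassCastException` on `java.lang.invoke.SerializedLambda` in a Spark job is a common symptom of a Scala binary-version mismatch between the bundled jar and the runtime. One quick sanity check is to compare the Scala suffix of the Hudi bundle against the runtime Scala version. A hedged helper sketch (names are illustrative, not a Hudi API):
   
   ```python
   import re
   
   def scala_binary_version(artifact: str) -> str:
       """Extract the Scala binary-version suffix from a Maven artifact id,
       e.g. 'hudi-spark3-bundle_2.12' -> '2.12'."""
       m = re.search(r"_(\d+\.\d+)$", artifact)
       if not m:
           raise ValueError(f"no Scala suffix in {artifact!r}")
       return m.group(1)
   
   def is_compatible(artifact: str, runtime_scala: str) -> bool:
       """True when the artifact's Scala suffix matches the runtime's
       binary version (e.g. runtime '2.12.15' matches suffix '2.12')."""
       suffix = scala_binary_version(artifact)
       return runtime_scala == suffix or runtime_scala.startswith(suffix + ".")
   
   # In a live PySpark session the runtime versions can be read roughly
   # like this (going through the py4j gateway; left as a comment so the
   # sketch runs without Spark):
   #   runtime = spark.sparkContext._jvm.scala.util.Properties.versionNumberString()
   #   print(spark.version, runtime,
   #         is_compatible("hudi-spark3-bundle_2.12", runtime))
   ```
   
   The same check should be run on both the driver and the executor images, since the mismatch can sit on either side.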




[GitHub] [hudi] nsivabalan commented on issue #5821: [SUPPORT] Failed to upsert for commit time - PySpark Unable to Write to Hudi with ABFS

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #5821:
URL: https://github.com/apache/hudi/issues/5821#issuecomment-1152986007

   @rajesh kalluri : 
   




[GitHub] [hudi] rkkalluri commented on issue #5821: [SUPPORT] Failed to upsert for commit time - PySpark Unable to Write to Hudi with ABFS

Posted by GitBox <gi...@apache.org>.
rkkalluri commented on issue #5821:
URL: https://github.com/apache/hudi/issues/5821#issuecomment-1152996149

   I would also recommend trying https://hudi.apache.org/releases/release-0.10.1/ instead of 0.10.0




[GitHub] [hudi] tfontana1 commented on issue #5821: [SUPPORT] Failed to upsert for commit time - PySpark Unable to Write to Hudi with ABFS

Posted by GitBox <gi...@apache.org>.
tfontana1 commented on issue #5821:
URL: https://github.com/apache/hudi/issues/5821#issuecomment-1162023459

   Thank you, I will close this. I ended up using Hudi 0.11.0, but the issue was that my operator and drivers had mismatched jar versions.




[GitHub] [hudi] tfontana1 closed issue #5821: [SUPPORT] Failed to upsert for commit time - PySpark Unable to Write to Hudi with ABFS

Posted by GitBox <gi...@apache.org>.
tfontana1 closed issue #5821: [SUPPORT] Failed to upsert for commit time - PySpark Unable to Write to Hudi with ABFS
URL: https://github.com/apache/hudi/issues/5821

