Posted to commits@hudi.apache.org by "Ethan Guo (Jira)" <ji...@apache.org> on 2022/03/23 19:13:00 UTC

[jira] [Assigned] (HUDI-3642) NullPointerException during multi-writer conflict resolution

     [ https://issues.apache.org/jira/browse/HUDI-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo reassigned HUDI-3642:
-------------------------------

    Assignee: Sagar Sumit

> NullPointerException during multi-writer conflict resolution
> ------------------------------------------------------------
>
>                 Key: HUDI-3642
>                 URL: https://issues.apache.org/jira/browse/HUDI-3642
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Ethan Guo
>            Assignee: Sagar Sumit
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.11.0
>
>
> Scenario: multi-writer test; one writer ingesting with Deltastreamer in continuous mode (COW table, inserts, async clustering and cleaning) into partitions under 2022/1 and 2022/2, while another writer uses the Spark datasource to backfill different partitions (2021/12).
> Upgrade path: 0.10.0 without the metadata table, with a clustering instant left inflight (the writer was failed mid-clustering, before the upgrade) ➝ 0.11 with the metadata table enabled, keeping the same multi-writer configuration as before.
> On 0.10.0, the backfill job hits the NPE below.  Need to verify whether this can still happen on latest master.
> {code:java}
> java.lang.NullPointerException
>   at org.apache.hudi.client.transaction.ConcurrentOperation.init(ConcurrentOperation.java:121)
>   at org.apache.hudi.client.transaction.ConcurrentOperation.<init>(ConcurrentOperation.java:61)
>   at org.apache.hudi.client.utils.TransactionUtils.lambda$resolveWriteConflictIfAny$0(TransactionUtils.java:69)
>   at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
>   at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:743)
>   at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
>   at org.apache.hudi.client.utils.TransactionUtils.resolveWriteConflictIfAny(TransactionUtils.java:67)
>   at org.apache.hudi.client.SparkRDDWriteClient.preCommit(SparkRDDWriteClient.java:501)
>   at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:195)
>   at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:124)
>   at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:633)
>   at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:284)
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
>   at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
>   at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
>   at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
>   at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
>   at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
>   at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
>   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
>   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
>   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
>   at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
>   at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
>   at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
>   at $anonfun$res0$1(backfill_before.scala:57)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
>   ... 60 elided {code}
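For context on the frame at the top of the trace: `TransactionUtils.resolveWriteConflictIfAny` ultimately reduces each pair of concurrent instants to a file-overlap test — the backfill commit conflicts with a concurrent ingestion/clustering instant only if both touched at least one common file. The sketch below illustrates that idea only; the class and method names are hypothetical and this is not Hudi's actual implementation (where the NPE is thrown while building the `ConcurrentOperation` metadata, before such a check can run).

```java
import java.util.HashSet;
import java.util.Set;

// Minimal sketch of a file-level write-conflict check between two concurrent
// commits. Hypothetical names; not Hudi's implementation.
public class ConflictCheckSketch {

    // Returns true when the two writers modified at least one common file.
    static boolean hasConflict(Set<String> currentWriterFiles, Set<String> otherWriterFiles) {
        Set<String> overlap = new HashSet<>(currentWriterFiles);
        overlap.retainAll(otherWriterFiles);   // set intersection
        return !overlap.isEmpty();
    }

    public static void main(String[] args) {
        // Disjoint partitions, as in the scenario above: no conflict expected.
        Set<String> ingestion = Set.of("2022/1/file-a", "2022/2/file-b");
        Set<String> backfill  = Set.of("2021/12/file-c");
        System.out.println(hasConflict(ingestion, backfill));                  // false

        // Overlapping file: conflict.
        System.out.println(hasConflict(ingestion, Set.of("2022/1/file-a")));   // true
    }
}
```

Under this model, the backfill to 2021/12 should resolve cleanly against the ingestion writer, which is why the NPE (rather than a detected conflict) during resolution points at missing or unreadable instant metadata, e.g. from the inflight clustering instant carried over from 0.10.0.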



--
This message was sent by Atlassian Jira
(v8.20.1#820001)