Posted to commits@hudi.apache.org by "Ethan Guo (Jira)" <ji...@apache.org> on 2021/05/10 07:00:12 UTC
[jira] [Updated] (HUDI-1888) Fix NPE in `RowKeyGeneratorHelper#getNestedFieldVal` when row writer is enabled
[ https://issues.apache.org/jira/browse/HUDI-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Guo updated HUDI-1888:
----------------------------
Description:
When the row writer is enabled, a NullPointerException is thrown while inserting records whose partition path is a nested field.
To reproduce:
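{code:java}
// Scala; assumes `df` (a DataFrame) and `basePath` (a String) are already defined.
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig
import org.apache.spark.sql.SaveMode

df.write.format("hudi")
  .option(OPERATION_OPT_KEY, "bulk_insert")
  .option(PRECOMBINE_FIELD_OPT_KEY, "timestamp")
  .option(RECORDKEY_FIELD_OPT_KEY, "_row_key")
  // The partition path points at a nested field.
  .option(PARTITIONPATH_FIELD_OPT_KEY, "fare.currency")
  .option(HoodieWriteConfig.TABLE_NAME, "hoodie_test")
  .option("hoodie.metadata.enable", "true")
  // Enabling the row writer is what exposes the NPE.
  .option("hoodie.datasource.write.row.writer.enable", "true")
  .option("hoodie.bulkinsert.shuffle.parallelism", "2")
  .mode(SaveMode.Overwrite)
  .save(basePath)
{code}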
Stacktrace:
{code:java}
Caused by: java.lang.NullPointerException
    at org.apache.hudi.keygen.RowKeyGeneratorHelper.lambda$getPartitionPathFromRow$1(RowKeyGeneratorHelper.java:117)
    at java.util.stream.IntPipeline$4$1.accept(IntPipeline.java:250)
    at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
    at java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.hudi.keygen.RowKeyGeneratorHelper.getPartitionPathFromRow(RowKeyGeneratorHelper.java:124)
    at org.apache.hudi.keygen.SimpleKeyGenerator.getPartitionPath(SimpleKeyGenerator.java:72)
    at org.apache.spark.sql.UDFRegistration$$anonfun$259.apply(UDFRegistration.scala:759)
    ... 22 more
{code}
This happens when the nested field used for the partition path holds a null value. The key-generation code in the stack trace above (`RowKeyGeneratorHelper#getPartitionPathFromRow`) does not handle this null value.
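To illustrate the kind of null handling that is missing, here is a minimal sketch of a null-safe nested lookup. It is not Hudi's code: a plain `Map` stands in for a Spark `Row`, and the class name `NestedFieldSketch`, the method `partitionPath`, and the `"default"` placeholder are all hypothetical names chosen for this example.

```java
import java.util.Map;

/** Sketch only: a Map stands in for a Spark Row; names here are hypothetical, not Hudi's API. */
final class NestedFieldSketch {
    // Hypothetical placeholder returned when the partition value is missing or null.
    static final String DEFAULT_PARTITION = "default";

    /**
     * Walks a dotted path such as "fare.currency" through nested maps.
     * Returns DEFAULT_PARTITION instead of dereferencing a null parent or a null leaf value.
     */
    static String partitionPath(Map<String, Object> record, String dottedPath) {
        Object current = record;
        for (String part : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                // The parent is null or not a struct: stop before dereferencing it.
                return DEFAULT_PARTITION;
            }
            current = ((Map<?, ?>) current).get(part);
        }
        return current == null ? DEFAULT_PARTITION : current.toString();
    }
}
```

With a record whose `fare.currency` is `"USD"` the sketch returns that value; when either `fare` or `fare.currency` is null it returns the placeholder rather than throwing. Whether a placeholder or an explicit error is the right behavior is a design choice this ticket does not specify.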
> Fix NPE in `RowKeyGeneratorHelper#getNestedFieldVal` when row writer is enabled
> -------------------------------------------------------------------------------
>
> Key: HUDI-1888
> URL: https://issues.apache.org/jira/browse/HUDI-1888
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Major
>