Posted to commits@hudi.apache.org by "Ethan Guo (Jira)" <ji...@apache.org> on 2021/05/10 07:00:12 UTC
[jira] [Updated] (HUDI-1888) Fix NPE in `RowKeyGeneratorHelper#getNestedFieldVal` when row writer is enabled

     [ https://issues.apache.org/jira/browse/HUDI-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-1888:
----------------------------
    Description: 
When the row writer is enabled, a NullPointerException is thrown on bulk insert of records whose partition path is a nested field.

To reproduce:
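{code:java}
// Imports assumed for the option constants used below (Hudi Spark datasource, Scala API)
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig
import org.apache.spark.sql.SaveMode

// df: a DataFrame with a nested struct column `fare` whose `currency` field is null in some rows
df.write.format("hudi")
  .option(OPERATION_OPT_KEY, "bulk_insert")
  .option(PRECOMBINE_FIELD_OPT_KEY, "timestamp")
  .option(RECORDKEY_FIELD_OPT_KEY, "_row_key")
  .option(PARTITIONPATH_FIELD_OPT_KEY, "fare.currency")
  .option(HoodieWriteConfig.TABLE_NAME, "hoodie_test")
  .option("hoodie.metadata.enable", "true")
  .option("hoodie.datasource.write.row.writer.enable", "true")
  .option("hoodie.bulkinsert.shuffle.parallelism", "2")
  .mode(SaveMode.Overwrite)
  .save(basePath)
{code}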
 

Stacktrace:
{code:java}
Caused by: java.lang.NullPointerException
	at org.apache.hudi.keygen.RowKeyGeneratorHelper.lambda$getPartitionPathFromRow$1(RowKeyGeneratorHelper.java:117)
	at java.util.stream.IntPipeline$4$1.accept(IntPipeline.java:250)
	at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
	at java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
	at org.apache.hudi.keygen.RowKeyGeneratorHelper.getPartitionPathFromRow(RowKeyGeneratorHelper.java:124)
	at org.apache.hudi.keygen.SimpleKeyGenerator.getPartitionPath(SimpleKeyGenerator.java:72)
	at org.apache.spark.sql.UDFRegistration$$anonfun$259.apply(UDFRegistration.scala:759)
	... 22 more
{code}
 

This happens when the value of the nested partition path field is null. The method at the top of the stack trace, `RowKeyGeneratorHelper#getPartitionPathFromRow`, does not handle a null value properly.
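
For illustration only, here is a minimal sketch of a null-safe nested lookup. It is not Hudi's code and not the actual fix: the class `NestedFieldSketch`, its methods, and the fallback value "default" are all hypothetical, and plain nested maps stand in for a Spark Row so the example runs without Spark or Hudi on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a nested-row lookup; this is not Hudi's RowKeyGeneratorHelper. */
class NestedFieldSketch {
    // Assumed fallback partition value for this sketch only.
    static final String FALLBACK_PARTITION = "default";

    /** Walks a dot-separated path through nested maps; returns null if any level is missing or null. */
    static Object getNestedValue(Map<String, Object> record, String dottedPath) {
        Object current = record;
        for (String part : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // parent is null or not a struct, so stop instead of dereferencing it
            }
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }

    /** Maps a null nested value to the fallback instead of calling toString() on null. */
    static String partitionPath(Map<String, Object> record, String dottedPath) {
        Object value = getNestedValue(record, dottedPath);
        return value == null ? FALLBACK_PARTITION : value.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> fare = new HashMap<>();
        fare.put("currency", "USD");
        Map<String, Object> record = new HashMap<>();
        record.put("fare", fare);
        System.out.println(partitionPath(record, "fare.currency")); // prints "USD"

        fare.put("currency", null); // the case described in this issue
        System.out.println(partitionPath(record, "fare.currency")); // prints "default"
    }
}
```

The sketch checks for null at every level of the path, so a null parent struct (`fare`) and a null leaf (`fare.currency`) both resolve to the fallback rather than throwing.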

> Fix NPE in `RowKeyGeneratorHelper#getNestedFieldVal` when row writer is enabled 
> --------------------------------------------------------------------------------
>
>                 Key: HUDI-1888
>                 URL: https://issues.apache.org/jira/browse/HUDI-1888
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Major