Posted to commits@hudi.apache.org by "Rahil Chertara (Jira)" <ji...@apache.org> on 2022/06/13 21:07:00 UTC

[jira] [Assigned] (HUDI-4240) Revisit TestCOWDataSourceStorage#testCopyOnWriteStorage

     [ https://issues.apache.org/jira/browse/HUDI-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rahil Chertara reassigned HUDI-4240:
------------------------------------

    Assignee: Rahil Chertara

> Revisit TestCOWDataSourceStorage#testCopyOnWriteStorage
> -------------------------------------------------------
>
>                 Key: HUDI-4240
>                 URL: https://issues.apache.org/jira/browse/HUDI-4240
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Rahil Chertara
>            Assignee: Rahil Chertara
>            Priority: Major
>
> In the PR "Support Hadoop 3.x Hive 3.x and Spark 3.x" (https://github.com/apache/hudi/pull/5786), testCopyOnWriteStorage fails for the test case where `nation` is added to the recordKeys. Further debugging suggests this is caused by Avro 1.10.2 being used, since it adds the following to the schema in HoodieTestDataGenerator line 319
>  
>  
> ```
> "nation":"Canada"
> ```
> instead of adding
> ```
> "nation":
> { "bytes":"Canada" }
> ```
> This later causes an exception in this test case: when `nation` is retrieved from the record, `getNestedFieldVal` expects the value to be nested rather than a plain String.
> ```
>  at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(HoodieAvroUtils.java:514)
>  at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldValAsString(HoodieAvroUtils.java:487)
>  at org.apache.hudi.keygen.KeyGenUtils.getRecordKey(KeyGenUtils.java:96)
>  at org.apache.hudi.keygen.ComplexAvroKeyGenerator.getRecordKey(ComplexAvroKeyGenerator.java:47)
>  at org.apache.hudi.keygen.ComplexKeyGenerator.getRecordKey(ComplexKeyGenerator.java:53)
>  at org.apache.hudi.keygen.BaseKeyGenerator.getKey(BaseKeyGenerator.java:65)
>  at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$write$10(HoodieSparkSqlWriter.scala:279)
>  at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
>  at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:224)
>  at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:352)
>  at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
>  at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
>  at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
>  at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295)
>  at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>  at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
>  at org.apache.spark.scheduler.Task.run(Task.scala:131)
>  at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:750)
> ```
>  
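
The failure mode described above can be illustrated with a minimal sketch. This is not Hudi's `getNestedFieldVal` and it does not use Avro: the class `NestedFieldSketch`, the method `getNestedValue`, the use of plain `Map` values as records, and the dotted path `nation.bytes` are all assumptions made for illustration. It only shows why a lookup that descends into a nested value breaks when the value arrives as a bare String.

```java
import java.util.Map;

public class NestedFieldSketch {

    /**
     * Hypothetical, simplified lookup (not the Hudi implementation).
     * Follows a dotted path such as "nation.bytes" and requires every
     * value it descends into to be a nested map.
     */
    public static Object getNestedValue(Map<String, Object> record, String path) {
        Object current = record;
        for (String part : path.split("\\.")) {
            if (!(current instanceof Map)) {
                // A plain String (or any other non-map value) cannot be descended into.
                throw new IllegalArgumentException(
                    "Cannot read '" + part + "' from non-nested value: " + current);
            }
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }

    public static void main(String[] args) {
        // Shape the lookup expects: the value is wrapped in a nested object.
        Map<String, Object> nested = Map.of("nation", Map.of("bytes", "Canada"));
        System.out.println(getNestedValue(nested, "nation.bytes"));

        // Shape reported in the issue: the value is a bare string.
        Map<String, Object> flat = Map.of("nation", "Canada");
        try {
            getNestedValue(flat, "nation.bytes");
        } catch (IllegalArgumentException e) {
            System.out.println("lookup failed: " + e.getMessage());
        }
    }
}
```

With the nested shape the lookup returns the inner value. With the flat shape it fails on the second path segment, which mirrors the point in the stack trace where the key generator asks for the nested field.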



--
This message was sent by Atlassian Jira
(v8.20.7#820007)