Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2019/11/18 10:35:07 UTC
[GitHub] [incubator-hudi] simonqin opened a new issue #1021: how can i deal this problem when partition's value changed with the same row_key?

simonqin opened a new issue #1021: how can i deal this problem when partition's value changed with the same row_key? 
URL: https://github.com/apache/incubator-hudi/issues/1021
 
 
    // Create the write client to write some records in
        HoodieWriteConfig cfg = HoodieWriteConfig.newBuilder().withPath(tablePath)
                .withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA).withParallelism(2, 2)
                .forTable(tableName)
                .withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(IndexType.GLOBAL_BLOOM).build())
                .withCompactionConfig(HoodieCompactionConfig.newBuilder().archiveCommitsWith(11, 12).build())
                .build();
   
    First, I insert one record into the table:
       String partitionPath = "2016/03/15";
       HoodieKey key = new HoodieKey("1", partitionPath);
       HoodieRecord record = new HoodieRecord(key,HoodieTestDataGenerator.generateRandomValue(key, commitTime));
    Second, I upsert a record with the same row key ("1") but a different partition path:
        String partitionPath = "2016/04/15";
        HoodieKey key = new HoodieKey("1", partitionPath);
        HoodieRecord record = new HoodieRecord(key, HoodieTestDataGenerator.generateRandomValue(key, commitTime));
   
    The upsert then fails with this error log:
   14738 [Executor task launch worker-0] INFO  com.uber.hoodie.common.table.timeline.HoodieActiveTimeline  - Loaded instants java.util.stream.ReferencePipeline$Head@d02b1c7
   14738 [Executor task launch worker-0] INFO  com.uber.hoodie.common.table.view.AbstractTableFileSystemView  - Building file system view for partition (2016/04/15)
   14738 [Executor task launch worker-0] INFO  com.uber.hoodie.common.table.view.AbstractTableFileSystemView  - #files found in partition (2016/04/15) =0, Time taken =0
   14738 [Executor task launch worker-0] INFO  com.uber.hoodie.common.table.view.AbstractTableFileSystemView  - addFilesToView: NumFiles=0, FileGroupsCreationTime=0, StoreTimeTaken=0
   14738 [Executor task launch worker-0] INFO  com.uber.hoodie.common.table.view.HoodieTableFileSystemView  - Adding file-groups for partition :2016/04/15, #FileGroups=0
   14738 [Executor task launch worker-0] INFO  com.uber.hoodie.common.table.view.AbstractTableFileSystemView  - Time to load partition (2016/04/15) =0
   14754 [Executor task launch worker-0] ERROR com.uber.hoodie.table.HoodieCopyOnWriteTable  - Error upserting bucketType UPDATE for partition :0
   java.util.NoSuchElementException: No value present
   	at com.uber.hoodie.common.util.Option.get(Option.java:112)
   	at com.uber.hoodie.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:71)
   	at com.uber.hoodie.table.HoodieCopyOnWriteTable.getUpdateHandle(HoodieCopyOnWriteTable.java:226)
   	at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:180)
   	at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:263)
   	at com.uber.hoodie.HoodieWriteClient.lambda$upsertRecordsInternal$7ef77fd$1(HoodieWriteClient.java:442)
   	at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
   	at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
   	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
   	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
   	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
   	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
   	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:973)
   	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
   	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
   	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
   	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
   	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
   	at org.apache.spark.scheduler.Task.run(Task.scala:99)
   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
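
One possible reading of the log above, sketched as a toy model and not as Hudi's actual code: the global index finds row key "1" regardless of partition, so the record is routed as an UPDATE, but the file to merge into is then looked up in the incoming partition (2016/04/15), which has no files yet. The class name, both maps, and the insert/upsert helpers below are hypothetical stand-ins; only java.util.Optional is real, and it is assumed to behave like the com.uber.hoodie Option seen in the trace.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Optional;

public class GlobalIndexPartitionMismatch {
    // Hypothetical global index: row key -> file id, with no partition in the key.
    static final Map<String, String> globalIndex = new HashMap<>();
    // Hypothetical file-system view: partition path -> (file id -> base file path).
    static final Map<String, Map<String, String>> filesByPartition = new HashMap<>();

    static void insert(String rowKey, String partitionPath) {
        String fileId = "file-" + rowKey;
        globalIndex.put(rowKey, fileId);
        filesByPartition
                .computeIfAbsent(partitionPath, p -> new HashMap<>())
                .put(fileId, partitionPath + "/" + fileId + ".parquet");
    }

    static String upsert(String rowKey, String partitionPath) {
        String fileId = globalIndex.get(rowKey);
        if (fileId == null) {
            // Key unknown to the index: treated as a plain insert.
            insert(rowKey, partitionPath);
            return "INSERT";
        }
        // Key known to the index: treated as an update. The base file is looked up
        // in the partition of the INCOMING record, not the partition it was written to.
        Map<String, String> files =
                filesByPartition.getOrDefault(partitionPath, Collections.<String, String>emptyMap());
        Optional<String> baseFile = Optional.ofNullable(files.get(fileId));
        // Throws NoSuchElementException when the partition has no such file.
        return "UPDATE " + baseFile.get();
    }

    public static void main(String[] args) {
        insert("1", "2016/03/15");
        try {
            upsert("1", "2016/04/15");
        } catch (NoSuchElementException e) {
            System.out.println("upsert failed: " + e.getMessage());
        }
    }
}
```

In this model the same upsert succeeds when the partition path is left at 2016/03/15, which matches the title of this issue: the failure only appears when the partition value changes for an existing row key.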
