Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2019/11/18 10:35:07 UTC
[GitHub] [incubator-hudi] simonqin opened a new issue #1021: how can i deal
this problem when partition's value changed with the same row_key?
simonqin opened a new issue #1021: How can I deal with this problem when the partition value changes for the same row_key?
URL: https://github.com/apache/incubator-hudi/issues/1021
// Create the write client to write some records in
HoodieWriteConfig cfg = HoodieWriteConfig.newBuilder().withPath(tablePath)
    .withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA).withParallelism(2, 2)
    .forTable(tableName)
    .withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(IndexType.GLOBAL_BLOOM).build())
    .withCompactionConfig(
        HoodieCompactionConfig.newBuilder().archiveCommitsWith(11, 12).build())
    .build();
First, I insert one record into the table:
String partitionPath = "2016/03/15";
HoodieKey key = new HoodieKey("1", partitionPath);
HoodieRecord record = new HoodieRecord(key, HoodieTestDataGenerator.generateRandomValue(key, commitTime));
Second, I upsert one record with the same row key but a different partition path:
String partitionPath = "2016/04/15";
HoodieKey key = new HoodieKey("1", partitionPath);
HoodieRecord record = new HoodieRecord(key,
    HoodieTestDataGenerator.generateRandomValue(key, commitTime));
Error log:
14738 [Executor task launch worker-0] INFO com.uber.hoodie.common.table.timeline.HoodieActiveTimeline - Loaded instants java.util.stream.ReferencePipeline$Head@d02b1c7
14738 [Executor task launch worker-0] INFO com.uber.hoodie.common.table.view.AbstractTableFileSystemView - Building file system view for partition (2016/04/15)
14738 [Executor task launch worker-0] INFO com.uber.hoodie.common.table.view.AbstractTableFileSystemView - #files found in partition (2016/04/15) =0, Time taken =0
14738 [Executor task launch worker-0] INFO com.uber.hoodie.common.table.view.AbstractTableFileSystemView - addFilesToView: NumFiles=0, FileGroupsCreationTime=0, StoreTimeTaken=0
14738 [Executor task launch worker-0] INFO com.uber.hoodie.common.table.view.HoodieTableFileSystemView - Adding file-groups for partition :2016/04/15, #FileGroups=0
14738 [Executor task launch worker-0] INFO com.uber.hoodie.common.table.view.AbstractTableFileSystemView - Time to load partition (2016/04/15) =0
14754 [Executor task launch worker-0] ERROR com.uber.hoodie.table.HoodieCopyOnWriteTable - Error upserting bucketType UPDATE for partition :0
java.util.NoSuchElementException: No value present
at com.uber.hoodie.common.util.Option.get(Option.java:112)
at com.uber.hoodie.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:71)
at com.uber.hoodie.table.HoodieCopyOnWriteTable.getUpdateHandle(HoodieCopyOnWriteTable.java:226)
at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:180)
at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:263)
at com.uber.hoodie.HoodieWriteClient.lambda$upsertRecordsInternal$7ef77fd$1(HoodieWriteClient.java:442)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:973)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
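A note on reading the log above: the file system view for partition 2016/04/15 reports zero files, and the failure happens in an UPDATE bucket with "No value present". One plausible reading, which is an assumption and not confirmed by the issue, is that the global index recognized row key "1" as an existing record, while the merge handle then looked for its base file under the new partition path. The sketch below is not Hudi code. `GlobalIndexSketch` and all of its methods are hypothetical names that only model the idea of routing an update to the partition already recorded for the key, instead of the partition carried by the incoming record.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for a table with a global index; none of these names are Hudi APIs.
class GlobalIndexSketch {
    // Record key -> partition path where the key was first written (one entry per key).
    private final Map<String, String> keyToPartition = new HashMap<>();
    // Partition path -> record keys stored in that partition.
    private final Map<String, Set<String>> partitionFiles = new HashMap<>();

    // Global lookup: finds the key regardless of the partition named by the incoming record.
    Optional<String> lookupPartition(String recordKey) {
        return Optional.ofNullable(keyToPartition.get(recordKey));
    }

    // True only if at least one record has been written under this partition path.
    boolean partitionHasData(String partitionPath) {
        return partitionFiles.containsKey(partitionPath);
    }

    // Upsert that sends an update to the partition recorded in the index,
    // and falls back to the incoming partition only for keys not seen before.
    String upsert(String recordKey, String incomingPartition) {
        String target = lookupPartition(recordKey).orElse(incomingPartition);
        keyToPartition.putIfAbsent(recordKey, target);
        partitionFiles.computeIfAbsent(target, p -> new HashSet<>()).add(recordKey);
        return target;
    }

    public static void main(String[] args) {
        GlobalIndexSketch table = new GlobalIndexSketch();
        System.out.println(table.upsert("1", "2016/03/15")); // insert: key is new
        System.out.println(table.upsert("1", "2016/04/15")); // update: routed to the old partition
        System.out.println(table.partitionHasData("2016/04/15"));
    }
}
```

In this model the second upsert lands in 2016/03/15 and nothing is ever looked up under 2016/04/15, so the empty-partition lookup seen in the log cannot occur. Whether the record should instead move to the new partition is a separate design choice that this sketch does not address.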