Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/07/25 23:53:57 UTC
[GitHub] [hudi] shenh062326 edited a comment on pull request #1868: [HUDI-1083] Optimization in determining insert bucket location for a given key
shenh062326 edited a comment on pull request #1868:
URL: https://github.com/apache/hudi/pull/1868#issuecomment-663917822
Added a performance test that inserts 100000 records into 1000 fileGroups, each fileGroup having a weight of 0.001.
```
public void partitionWeightPerformance() throws Exception {
  final String testPartitionPath = "2016/09/26";
  int totalInsertNum = 100000;

  // No small-file handling and a fixed split size of 100 records per bucket,
  // so the 100000 inserts are spread over 1000 insert buckets.
  HoodieWriteConfig config = makeHoodieClientConfigBuilder()
      .withCompactionConfig(HoodieCompactionConfig.newBuilder().compactionSmallFileSize(0)
          .insertSplitSize(100).autoTuneInsertSplits(false).build()).build();

  HoodieClientTestUtils.fakeCommit(basePath, "001");
  metaClient = HoodieTableMetaClient.reload(metaClient);
  HoodieCopyOnWriteTable table = (HoodieCopyOnWriteTable) HoodieTable.create(metaClient, config, hadoopConf);

  HoodieTestDataGenerator dataGenerator = new HoodieTestDataGenerator(new String[]{testPartitionPath});
  List<HoodieRecord> insertRecords = dataGenerator.generateInserts("001", totalInsertNum);
  WorkloadProfile profile = new WorkloadProfile(jsc.parallelize(insertRecords));
  UpsertPartitioner partitioner = new UpsertPartitioner(profile, jsc, table, config);

  // Time ten passes of getPartition() over all records.
  for (int i = 0; i < 10; i++) {
    long start = System.currentTimeMillis();
    Map<Integer, Integer> partition2numRecords = new HashMap<>();
    for (HoodieRecord hoodieRecord : insertRecords) {
      int partition = partitioner.getPartition(new Tuple2<>(
          hoodieRecord.getKey(), Option.ofNullable(hoodieRecord.getCurrentLocation())));
      // Count how many records land in each partition.
      partition2numRecords.merge(partition, 1, Integer::sum);
    }
    System.out.println("cost: " + (System.currentTimeMillis() - start));
  }
}
```
I ran it ten times. The results before the optimization (cost in ms):
```
cost: 190
cost: 122
cost: 150
cost: 100
cost: 104
cost: 114
cost: 104
cost: 110
cost: 104
cost: 117
```
The results after the optimization (cost in ms):
```
cost: 154
cost: 83
cost: 77
cost: 84
cost: 85
cost: 84
cost: 87
cost: 99
cost: 102
cost: 85
```
Averaged over the ten runs, the cost drops from about 121.5 ms to 94 ms, a reduction of roughly 23%.
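For readers who want a feel for why the bucket lookup can get cheaper with 1000 buckets, here is a minimal standalone sketch. It is an assumption about the general technique (a binary search over precomputed cumulative weights instead of a linear scan), not the code from this pull request. The class `BucketLookupSketch` and all of its method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration only; not Hudi code.
public class BucketLookupSketch {

  // Prefix sums of the bucket weights: cumulative[i] = w[0] + ... + w[i].
  static double[] cumulativeWeights(double[] weights) {
    double[] cumulative = new double[weights.length];
    double total = 0.0;
    for (int i = 0; i < weights.length; i++) {
      total += weights[i];
      cumulative[i] = total;
    }
    return cumulative;
  }

  // O(n) lookup: first bucket whose cumulative weight reaches r.
  static int linearScan(double[] cumulative, double r) {
    for (int i = 0; i < cumulative.length; i++) {
      if (r <= cumulative[i]) {
        return i;
      }
    }
    return 0; // r is past the last bucket (e.g. rounding); default to the first
  }

  // O(log n) lookup over the same strictly increasing prefix sums.
  static int binarySearch(double[] cumulative, double r) {
    int idx = Arrays.binarySearch(cumulative, r);
    if (idx < 0) {
      idx = -idx - 1; // insertion point: index of the first element greater than r
    }
    return idx < cumulative.length ? idx : 0;
  }

  public static void main(String[] args) {
    // Same shape as the test above: 1000 buckets, each with weight 0.001.
    double[] weights = new double[1000];
    Arrays.fill(weights, 0.001);
    double[] cumulative = cumulativeWeights(weights);

    // Both lookups must pick the same bucket for every r in [0, 1).
    for (int k = 0; k < 100000; k++) {
      double r = k / 100000.0;
      if (linearScan(cumulative, r) != binarySearch(cumulative, r)) {
        throw new AssertionError("mismatch at r=" + r);
      }
    }
    System.out.println("linear scan and binary search agree");
  }
}
```

The sketch assumes all weights are positive, so the prefix sums are strictly increasing and the binary search is well defined. It checks only that both lookups agree; it does not reproduce the timings reported above.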