Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/07/25 23:50:53 UTC

[GitHub] [hudi] shenh062326 commented on pull request #1868: [HUDI-1083] Minor optimization in determining insert bucket location for a given key

shenh062326 commented on pull request #1868:
URL: https://github.com/apache/hudi/pull/1868#issuecomment-663917822


   Added a performance test that inserts 100,000 records into 1,000 fileGroups, each with a weight of 0.001.
   ```
     public void partitionWeightPerformance() throws Exception {
       final String testPartitionPath = "2016/09/26";
       int totalInsertNum = 100000;
   
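       // Small-file size 0 and a fixed split size of 100 (auto-tuning off), so that
       // 100000 inserts are spread over 1000 insert buckets of equal weight.
       HoodieWriteConfig config = makeHoodieClientConfigBuilder()
           .withCompactionConfig(HoodieCompactionConfig.newBuilder().compactionSmallFileSize(0)
               .insertSplitSize(100).autoTuneInsertSplits(false).build()).build();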
   
       HoodieClientTestUtils.fakeCommit(basePath, "001");
       metaClient = HoodieTableMetaClient.reload(metaClient);
       HoodieCopyOnWriteTable table = (HoodieCopyOnWriteTable) HoodieTable.create(metaClient, config, hadoopConf);
       HoodieTestDataGenerator dataGenerator = new HoodieTestDataGenerator(new String[]{testPartitionPath});
       List<HoodieRecord> insertRecords = dataGenerator.generateInserts("001", totalInsertNum);
       WorkloadProfile profile = new WorkloadProfile(jsc.parallelize(insertRecords));
       UpsertPartitioner partitioner = new UpsertPartitioner(profile, jsc, table, config);
   
       for (int i = 0; i < 10; i++) {
         long start = System.currentTimeMillis();
         Map<Integer, Integer> partition2numRecords = new HashMap<Integer, Integer>();
         for (HoodieRecord hoodieRecord : insertRecords) {
           int partition = partitioner.getPartition(new Tuple2<>(
               hoodieRecord.getKey(), Option.ofNullable(hoodieRecord.getCurrentLocation())));
           if (!partition2numRecords.containsKey(partition)) {
             partition2numRecords.put(partition, 0);
           }
           partition2numRecords.put(partition, partition2numRecords.get(partition) + 1);
         }
   
         System.out.println("cost: " + (System.currentTimeMillis() - start));
       }
     }
   ```
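   The comment does not show the lookup that `getPartition` performs, so the following is only a rough illustration of why 1,000 equally weighted buckets stress that path. It contrasts a linear scan over bucket weights with a binary search over precomputed cumulative weights. This is an assumption about the general technique, not a statement of what the PR changes, and `BucketLookupSketch`, `linearLookup`, `cumulative` and `binaryLookup` are hypothetical names that are not part of Hudi's API.

   ```java
   import java.util.Arrays;

   // Hypothetical illustration only; none of these names exist in Hudi.
   class BucketLookupSketch {

     // Linear scan: add up weights until the running total passes r. O(n) per record.
     static int linearLookup(double[] weights, double r) {
       double total = 0.0;
       for (int i = 0; i < weights.length; i++) {
         total += weights[i];
         if (r < total) {
           return i;
         }
       }
       return weights.length - 1; // r was not below any running total (rounding)
     }

     // Compute the running totals once, so every later lookup can search them.
     static double[] cumulative(double[] weights) {
       double[] cum = new double[weights.length];
       double total = 0.0;
       for (int i = 0; i < weights.length; i++) {
         total += weights[i];
         cum[i] = total;
       }
       return cum;
     }

     // Binary search for the first index whose cumulative weight exceeds r. O(log n) per record.
     static int binaryLookup(double[] cum, double r) {
       int lo = 0;
       int hi = cum.length - 1;
       while (lo < hi) {
         int mid = (lo + hi) >>> 1;
         if (r < cum[mid]) {
           hi = mid;
         } else {
           lo = mid + 1;
         }
       }
       return lo;
     }

     public static void main(String[] args) {
       // Same shape as the test above: 1000 buckets, each with weight 0.001.
       double[] weights = new double[1000];
       Arrays.fill(weights, 0.001);
       double[] cum = cumulative(weights);
       for (int k = 0; k < 100000; k++) {
         double r = k / 100000.0;
         if (linearLookup(weights, r) != binaryLookup(cum, r)) {
           throw new IllegalStateException("lookups disagree for r=" + r);
         }
       }
       System.out.println("lookups agree");
     }
   }
   ```

   With 1,000 buckets the scan touches about 500 entries per record on average, while the binary search needs about 10 comparisons, which is the kind of gap a per-record benchmark like the one above would expose.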
   Ran the test ten times. Results before the optimization (elapsed milliseconds per pass over all records):
   ```
     cost: 190
     cost: 122
     cost: 150
     cost: 100
     cost: 104
     cost: 114
     cost: 104
     cost: 110
     cost: 104
     cost: 117
   ```
   
   Results after the optimization (same units):
   ```
     cost: 154
     cost: 83
     cost: 77
     cost: 84
     cost: 85
     cost: 84
     cost: 87
     cost: 99
     cost: 102
     cost: 85
   ```
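
   To make the two listings easier to compare, here is a small sketch that averages them. The numbers are copied from the results above. `CostSummary` and `mean` are hypothetical helper names, and skipping the first run is an assumption that it mostly reflects warm-up rather than steady-state cost.

   ```java
   // Hypothetical helper for summarizing the timings; not part of the PR.
   class CostSummary {
     // Elapsed milliseconds, copied from the "before" and "after" listings.
     static final long[] BEFORE = {190, 122, 150, 100, 104, 114, 104, 110, 104, 117};
     static final long[] AFTER = {154, 83, 77, 84, 85, 84, 87, 99, 102, 85};

     // Mean of costs[from], costs[from + 1], ..., up to the last element.
     static double mean(long[] costs, int from) {
       long sum = 0;
       for (int i = from; i < costs.length; i++) {
         sum += costs[i];
       }
       return (double) sum / (costs.length - from);
     }

     public static void main(String[] args) {
       System.out.println("all runs: before " + mean(BEFORE, 0) + " ms, after " + mean(AFTER, 0) + " ms");
       // Assumption: the first run is dominated by warm-up, so also report without it.
       System.out.println("skipping run 1: before " + mean(BEFORE, 1) + " ms, after " + mean(AFTER, 1) + " ms");
     }
   }
   ```

   Over all ten runs the mean drops from 121.5 ms to 94.0 ms. Skipping the first run, it drops from about 113.9 ms to about 87.3 ms. Either way the reduction is roughly 23%.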

