Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/08/26 08:47:28 UTC

[GitHub] [hudi] aresa7796 commented on issue #3533: [SUPPORT]How to use MOR Table to Merge small file?

aresa7796 commented on issue #3533:
URL: https://github.com/apache/hudi/issues/3533#issuecomment-906216326


   Hi @liujinhui1994, I tried clustering and got the same result.
   
   ```
   import org.apache.hudi.DataSourceWriteOptions
   import org.apache.hudi.QuickstartUtils.getQuickstartWriteConfigs
   import org.apache.hudi.config.HoodieIndexConfig
   import org.apache.hudi.index.HoodieIndex
   import org.apache.spark.sql.SaveMode

   df.write
     .format("org.apache.hudi")
     .options(getQuickstartWriteConfigs)
     // Write as a MERGE_ON_READ table.
     .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL)
     .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL)
     .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "timestamp")
     .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "_track_id")
     .option(HoodieIndexConfig.BLOOM_INDEX_UPDATE_PARTITION_PATH, "true")
     // 0 turns off small-file handling at write time.
     .option("hoodie.parquet.small.file.limit", "0")
     // Inline clustering, with a frequency of every 4 commits.
     .option("hoodie.clustering.inline", "true")
     .option("hoodie.clustering.inline.max.commits", "4")
     // Target file size of 1 GiB; files under 600 MiB are clustering candidates.
     .option("hoodie.clustering.plan.strategy.target.file.max.bytes", "1073741824")
     .option("hoodie.clustering.plan.strategy.small.file.limit", "629145600")
     .option(HoodieIndexConfig.INDEX_TYPE_PROP, HoodieIndex.IndexType.GLOBAL_BLOOM.name())
     .option("hoodie.table.name", "hudi_example")
     .mode(SaveMode.Append)
     .save("/opt/hudi_example")
   ```
   I executed this code 6 times. The table directory now looks like this:
   ```
   ├── .hoodie
   │   ├── .20210826162656.deltacommit.crc
   │   ├── .20210826162656.deltacommit.inflight.crc
   │   ├── .20210826162656.deltacommit.requested.crc
   │   ├── .20210826162754.deltacommit.crc
   │   ├── .20210826162754.deltacommit.inflight.crc
   │   ├── .20210826162754.deltacommit.requested.crc
   │   ├── .20210826162830.deltacommit.crc
   │   ├── .20210826162830.deltacommit.inflight.crc
   │   ├── .20210826162830.deltacommit.requested.crc
   │   ├── .20210826162859.deltacommit.crc
   │   ├── .20210826162859.deltacommit.inflight.crc
   │   ├── .20210826162859.deltacommit.requested.crc
   │   ├── .20210826162904.replacecommit.crc
   │   ├── .20210826162904.replacecommit.inflight.crc
   │   ├── .20210826162904.replacecommit.requested.crc
   │   ├── .20210826162935.deltacommit.crc
   │   ├── .20210826162935.deltacommit.inflight.crc
   │   ├── .20210826162935.deltacommit.requested.crc
   │   ├── .aux
   │   │   └── .bootstrap
   │   │       ├── .fileids
   │   │       └── .partitions
   │   ├── .hoodie.properties.crc
   │   ├── .temp
   │   │   └── 20210826162904
   │   │       └── default
   │   │           ├── .bb37db55-ebb9-4a07-977b-5fb0d4340193-0_0-44-48_20210826162904.parquet.marker.CREATE.crc
   │   │           └── bb37db55-ebb9-4a07-977b-5fb0d4340193-0_0-44-48_20210826162904.parquet.marker.CREATE
   │   ├── 20210826162656.deltacommit
   │   ├── 20210826162656.deltacommit.inflight
   │   ├── 20210826162656.deltacommit.requested
   │   ├── 20210826162754.deltacommit
   │   ├── 20210826162754.deltacommit.inflight
   │   ├── 20210826162754.deltacommit.requested
   │   ├── 20210826162830.deltacommit
   │   ├── 20210826162830.deltacommit.inflight
   │   ├── 20210826162830.deltacommit.requested
   │   ├── 20210826162859.deltacommit
   │   ├── 20210826162859.deltacommit.inflight
   │   ├── 20210826162859.deltacommit.requested
   │   ├── 20210826162904.replacecommit
   │   ├── 20210826162904.replacecommit.inflight
   │   ├── 20210826162904.replacecommit.requested
   │   ├── 20210826162935.deltacommit
   │   ├── 20210826162935.deltacommit.inflight
   │   ├── 20210826162935.deltacommit.requested
   │   ├── archived
   │   └── hoodie.properties
   └── default
       ├── ..hoodie_partition_metadata.crc
       ├── .3e5a8289-01bc-4769-9c6f-f2ae6c355420-0_0-30-36_20210826162859.parquet.crc
       ├── .5f4d5381-ea37-4eb1-b8c7-2717facd0a50-0_0-30-33_20210826162754.parquet.crc
       ├── .bb37db55-ebb9-4a07-977b-5fb0d4340193-0_0-44-48_20210826162904.parquet.crc
       ├── .c1e69535-55b2-4a3a-ad58-e9be0a999304-0_0-29-29_20210826162656.parquet.crc
       ├── .cd2bf248-92bc-426c-a568-245ee89a0a17-0_0-30-34_20210826162830.parquet.crc
       ├── .d8942dd4-9097-4bf4-b2ea-59eaab48af77-0_0-30-33_20210826162935.parquet.crc
       ├── .hoodie_partition_metadata
       ├── 3e5a8289-01bc-4769-9c6f-f2ae6c355420-0_0-30-36_20210826162859.parquet
       ├── 5f4d5381-ea37-4eb1-b8c7-2717facd0a50-0_0-30-33_20210826162754.parquet
       ├── bb37db55-ebb9-4a07-977b-5fb0d4340193-0_0-44-48_20210826162904.parquet
       ├── c1e69535-55b2-4a3a-ad58-e9be0a999304-0_0-29-29_20210826162656.parquet
       ├── cd2bf248-92bc-426c-a568-245ee89a0a17-0_0-30-34_20210826162830.parquet
       └── d8942dd4-9097-4bf4-b2ea-59eaab48af77-0_0-30-33_20210826162935.parquet
   
   ```
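
   To read a listing like the one above more quickly, the completed instants can be grouped by action. The following is only a minimal sketch: `TimelineSummary` and `completedByAction` are hypothetical names, not part of Hudi, and it assumes that a completed instant file is named `<timestamp>.<action>` while pending ones carry an extra `.requested` or `.inflight` suffix.

   ```scala
   // Hypothetical helper, not part of Hudi: groups completed instants by action.
   object TimelineSummary {
     // Assumption: completed instants are named "<14-digit timestamp>.<action>".
     // Files ending in .requested, .inflight or .crc do not match and are skipped.
     private val Completed = """^(\d{14})\.(deltacommit|replacecommit|commit)$""".r

     def completedByAction(fileNames: Seq[String]): Map[String, Seq[String]] =
       fileNames
         .collect { case Completed(ts, action) => action -> ts }
         .groupBy(_._1)
         .map { case (action, pairs) => action -> pairs.map(_._2).sorted }

     def main(args: Array[String]): Unit = {
       // A subset of the file names from the .hoodie directory listed above.
       val names = Seq(
         "20210826162656.deltacommit",
         "20210826162656.deltacommit.inflight",
         "20210826162754.deltacommit",
         "20210826162830.deltacommit",
         "20210826162859.deltacommit",
         "20210826162904.replacecommit",
         "20210826162904.replacecommit.requested",
         "20210826162935.deltacommit"
       )
       completedByAction(names).toSeq.sortBy(_._1).foreach { case (action, ts) =>
         println(s"$action: ${ts.size} (${ts.mkString(", ")})")
       }
     }
   }
   ```

   For the listing above this gives five completed deltacommits and one replacecommit (20210826162904), which is the action type that clustering produces.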
   

