Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/05/15 08:08:01 UTC

[GitHub] [incubator-hudi] rolandjohann commented on issue #1625: [SUPPORT] MOR upsert table grows in size when ingesting same records

rolandjohann commented on issue #1625:
URL: https://github.com/apache/incubator-hudi/issues/1625#issuecomment-629095378


   @bvaradar
   After 15 runs, the filesystem looks like this:
   ```bash
   $ tree -a /tmp/visitors_hudi_mor/                                                                               
   /tmp/visitors_hudi_mor/
   ├── .hoodie
   │   ├── .20200514221320.clean.crc
   │   ├── .20200514221320.clean.inflight.crc
   │   ├── .20200514221320.clean.requested.crc
   │   ├── .20200514221407.clean.crc
   │   ├── .20200514221407.clean.inflight.crc
   │   ├── .20200514221407.clean.requested.crc
   │   ├── .20200514221449.clean.crc
   │   ├── .20200514221449.clean.inflight.crc
   │   ├── .20200514221449.clean.requested.crc
   │   ├── .20200514221539.clean.crc
   │   ├── .20200514221539.clean.inflight.crc
   │   ├── .20200514221539.clean.requested.crc
   │   ├── .20200514221623.clean.crc
   │   ├── .20200514221623.clean.inflight.crc
   │   ├── .20200514221623.clean.requested.crc
   │   ├── .20200514221623.deltacommit.crc
   │   ├── .20200514221623.deltacommit.inflight.crc
   │   ├── .20200514221623.deltacommit.requested.crc
   │   ├── .20200514221648.commit.crc
   │   ├── .20200514221648.compaction.inflight.crc
   │   ├── .20200514221648.compaction.requested.crc
   │   ├── .20200514221714.clean.crc
   │   ├── .20200514221714.clean.inflight.crc
   │   ├── .20200514221714.clean.requested.crc
   │   ├── .20200514221714.deltacommit.crc
   │   ├── .20200514221714.deltacommit.inflight.crc
   │   ├── .20200514221714.deltacommit.requested.crc
   │   ├── .20200514221759.clean.crc
   │   ├── .20200514221759.clean.inflight.crc
   │   ├── .20200514221759.clean.requested.crc
   │   ├── .20200514221759.deltacommit.crc
   │   ├── .20200514221759.deltacommit.inflight.crc
   │   ├── .20200514221759.deltacommit.requested.crc
   │   ├── .20200514221829.commit.crc
   │   ├── .20200514221829.compaction.inflight.crc
   │   ├── .20200514221829.compaction.requested.crc
   │   ├── .20200514221902.clean.crc
   │   ├── .20200514221902.clean.inflight.crc
   │   ├── .20200514221902.clean.requested.crc
   │   ├── .20200514221902.deltacommit.crc
   │   ├── .20200514221902.deltacommit.inflight.crc
   │   ├── .20200514221902.deltacommit.requested.crc
   │   ├── .20200514221947.clean.crc
   │   ├── .20200514221947.clean.inflight.crc
   │   ├── .20200514221947.clean.requested.crc
   │   ├── .20200514221947.deltacommit.crc
   │   ├── .20200514221947.deltacommit.inflight.crc
   │   ├── .20200514221947.deltacommit.requested.crc
   │   ├── .20200514222010.commit.crc
   │   ├── .20200514222010.compaction.inflight.crc
   │   ├── .20200514222010.compaction.requested.crc
   │   ├── .20200514222036.clean.crc
   │   ├── .20200514222036.clean.inflight.crc
   │   ├── .20200514222036.clean.requested.crc
   │   ├── .20200514222036.deltacommit.crc
   │   ├── .20200514222036.deltacommit.inflight.crc
   │   ├── .20200514222036.deltacommit.requested.crc
   │   ├── .20200514222122.clean.crc
   │   ├── .20200514222122.clean.inflight.crc
   │   ├── .20200514222122.clean.requested.crc
   │   ├── .20200514222122.deltacommit.crc
   │   ├── .20200514222122.deltacommit.inflight.crc
   │   ├── .20200514222122.deltacommit.requested.crc
   │   ├── .20200514222145.commit.crc
   │   ├── .20200514222145.compaction.inflight.crc
   │   ├── .20200514222145.compaction.requested.crc
   │   ├── .20200515094100.clean.crc
   │   ├── .20200515094100.clean.inflight.crc
   │   ├── .20200515094100.clean.requested.crc
   │   ├── .20200515094100.deltacommit.crc
   │   ├── .20200515094100.deltacommit.inflight.crc
   │   ├── .20200515094100.deltacommit.requested.crc
   │   ├── .20200515094159.clean.crc
   │   ├── .20200515094159.clean.inflight.crc
   │   ├── .20200515094159.clean.requested.crc
   │   ├── .20200515094159.deltacommit.crc
   │   ├── .20200515094159.deltacommit.inflight.crc
   │   ├── .20200515094159.deltacommit.requested.crc
   │   ├── .20200515094227.commit.crc
   │   ├── .20200515094227.compaction.inflight.crc
   │   ├── .20200515094227.compaction.requested.crc
   │   ├── .20200515094301.clean.crc
   │   ├── .20200515094301.clean.inflight.crc
   │   ├── .20200515094301.clean.requested.crc
   │   ├── .20200515094301.deltacommit.crc
   │   ├── .20200515094301.deltacommit.inflight.crc
   │   ├── .20200515094301.deltacommit.requested.crc
   │   ├── .20200515094401.clean.crc
   │   ├── .20200515094401.clean.inflight.crc
   │   ├── .20200515094401.clean.requested.crc
   │   ├── .20200515094401.deltacommit.crc
   │   ├── .20200515094401.deltacommit.inflight.crc
   │   ├── .20200515094401.deltacommit.requested.crc
   │   ├── .20200515094431.commit.crc
   │   ├── .20200515094431.compaction.inflight.crc
   │   ├── .20200515094431.compaction.requested.crc
   │   ├── .20200515094508.clean.crc
   │   ├── .20200515094508.clean.inflight.crc
   │   ├── .20200515094508.clean.requested.crc
   │   ├── .20200515094508.deltacommit.crc
   │   ├── .20200515094508.deltacommit.inflight.crc
   │   ├── .20200515094508.deltacommit.requested.crc
   │   ├── .20200515094611.clean.crc
   │   ├── .20200515094611.clean.inflight.crc
   │   ├── .20200515094611.clean.requested.crc
   │   ├── .20200515094611.deltacommit.crc
   │   ├── .20200515094611.deltacommit.inflight.crc
   │   ├── .20200515094611.deltacommit.requested.crc
   │   ├── .20200515094647.commit.crc
   │   ├── .20200515094647.compaction.inflight.crc
   │   ├── .20200515094647.compaction.requested.crc
   │   ├── .20200515094724.clean.crc
   │   ├── .20200515094724.clean.inflight.crc
   │   ├── .20200515094724.clean.requested.crc
   │   ├── .20200515094724.deltacommit.crc
   │   ├── .20200515094724.deltacommit.inflight.crc
   │   ├── .20200515094724.deltacommit.requested.crc
   │   ├── .20200515094830.clean.crc
   │   ├── .20200515094830.clean.inflight.crc
   │   ├── .20200515094830.clean.requested.crc
   │   ├── .20200515094830.deltacommit.crc
   │   ├── .20200515094830.deltacommit.inflight.crc
   │   ├── .20200515094830.deltacommit.requested.crc
   │   ├── .20200515094903.commit.crc
   │   ├── .20200515094903.compaction.inflight.crc
   │   ├── .20200515094903.compaction.requested.crc
   │   ├── .20200515094936.clean.crc
   │   ├── .20200515094936.clean.inflight.crc
   │   ├── .20200515094936.clean.requested.crc
   │   ├── .20200515094936.deltacommit.crc
   │   ├── .20200515094936.deltacommit.inflight.crc
   │   ├── .20200515094936.deltacommit.requested.crc
   │   ├── .20200515095034.clean.crc
   │   ├── .20200515095034.clean.inflight.crc
   │   ├── .20200515095034.clean.requested.crc
   │   ├── .20200515095034.deltacommit.crc
   │   ├── .20200515095034.deltacommit.inflight.crc
   │   ├── .20200515095034.deltacommit.requested.crc
   │   ├── .20200515095102.commit.crc
   │   ├── .20200515095102.compaction.inflight.crc
   │   ├── .20200515095102.compaction.requested.crc
   │   ├── .20200515095136.clean.crc
   │   ├── .20200515095136.clean.inflight.crc
   │   ├── .20200515095136.clean.requested.crc
   │   ├── .20200515095136.deltacommit.crc
   │   ├── .20200515095136.deltacommit.inflight.crc
   │   ├── .20200515095136.deltacommit.requested.crc
   │   ├── .20200515095249.clean.crc
   │   ├── .20200515095249.clean.inflight.crc
   │   ├── .20200515095249.clean.requested.crc
   │   ├── .20200515095249.deltacommit.crc
   │   ├── .20200515095249.deltacommit.inflight.crc
   │   ├── .20200515095249.deltacommit.requested.crc
   │   ├── .20200515095320.commit.crc
   │   ├── .20200515095320.compaction.inflight.crc
   │   ├── .20200515095320.compaction.requested.crc
   │   ├── .aux
   │   │   ├── .20200514221648.compaction.requested.crc
   │   │   ├── .20200514221829.compaction.requested.crc
   │   │   ├── .20200514222010.compaction.requested.crc
   │   │   ├── .20200514222145.compaction.requested.crc
   │   │   ├── .20200515094227.compaction.requested.crc
   │   │   ├── .20200515094431.compaction.requested.crc
   │   │   ├── .20200515094647.compaction.requested.crc
   │   │   ├── .20200515094903.compaction.requested.crc
   │   │   ├── .20200515095102.compaction.requested.crc
   │   │   ├── .20200515095320.compaction.requested.crc
   │   │   ├── 20200514221648.compaction.requested
   │   │   ├── 20200514221829.compaction.requested
   │   │   ├── 20200514222010.compaction.requested
   │   │   ├── 20200514222145.compaction.requested
   │   │   ├── 20200515094227.compaction.requested
   │   │   ├── 20200515094431.compaction.requested
   │   │   ├── 20200515094647.compaction.requested
   │   │   ├── 20200515094903.compaction.requested
   │   │   ├── 20200515095102.compaction.requested
   │   │   └── 20200515095320.compaction.requested
   │   ├── .hoodie.properties.crc
   │   ├── .temp
   │   ├── 20200514221320.clean
   │   ├── 20200514221320.clean.inflight
   │   ├── 20200514221320.clean.requested
   │   ├── 20200514221407.clean
   │   ├── 20200514221407.clean.inflight
   │   ├── 20200514221407.clean.requested
   │   ├── 20200514221449.clean
   │   ├── 20200514221449.clean.inflight
   │   ├── 20200514221449.clean.requested
   │   ├── 20200514221539.clean
   │   ├── 20200514221539.clean.inflight
   │   ├── 20200514221539.clean.requested
   │   ├── 20200514221623.clean
   │   ├── 20200514221623.clean.inflight
   │   ├── 20200514221623.clean.requested
   │   ├── 20200514221623.deltacommit
   │   ├── 20200514221623.deltacommit.inflight
   │   ├── 20200514221623.deltacommit.requested
   │   ├── 20200514221648.commit
   │   ├── 20200514221648.compaction.inflight
   │   ├── 20200514221648.compaction.requested
   │   ├── 20200514221714.clean
   │   ├── 20200514221714.clean.inflight
   │   ├── 20200514221714.clean.requested
   │   ├── 20200514221714.deltacommit
   │   ├── 20200514221714.deltacommit.inflight
   │   ├── 20200514221714.deltacommit.requested
   │   ├── 20200514221759.clean
   │   ├── 20200514221759.clean.inflight
   │   ├── 20200514221759.clean.requested
   │   ├── 20200514221759.deltacommit
   │   ├── 20200514221759.deltacommit.inflight
   │   ├── 20200514221759.deltacommit.requested
   │   ├── 20200514221829.commit
   │   ├── 20200514221829.compaction.inflight
   │   ├── 20200514221829.compaction.requested
   │   ├── 20200514221902.clean
   │   ├── 20200514221902.clean.inflight
   │   ├── 20200514221902.clean.requested
   │   ├── 20200514221902.deltacommit
   │   ├── 20200514221902.deltacommit.inflight
   │   ├── 20200514221902.deltacommit.requested
   │   ├── 20200514221947.clean
   │   ├── 20200514221947.clean.inflight
   │   ├── 20200514221947.clean.requested
   │   ├── 20200514221947.deltacommit
   │   ├── 20200514221947.deltacommit.inflight
   │   ├── 20200514221947.deltacommit.requested
   │   ├── 20200514222010.commit
   │   ├── 20200514222010.compaction.inflight
   │   ├── 20200514222010.compaction.requested
   │   ├── 20200514222036.clean
   │   ├── 20200514222036.clean.inflight
   │   ├── 20200514222036.clean.requested
   │   ├── 20200514222036.deltacommit
   │   ├── 20200514222036.deltacommit.inflight
   │   ├── 20200514222036.deltacommit.requested
   │   ├── 20200514222122.clean
   │   ├── 20200514222122.clean.inflight
   │   ├── 20200514222122.clean.requested
   │   ├── 20200514222122.deltacommit
   │   ├── 20200514222122.deltacommit.inflight
   │   ├── 20200514222122.deltacommit.requested
   │   ├── 20200514222145.commit
   │   ├── 20200514222145.compaction.inflight
   │   ├── 20200514222145.compaction.requested
   │   ├── 20200515094100.clean
   │   ├── 20200515094100.clean.inflight
   │   ├── 20200515094100.clean.requested
   │   ├── 20200515094100.deltacommit
   │   ├── 20200515094100.deltacommit.inflight
   │   ├── 20200515094100.deltacommit.requested
   │   ├── 20200515094159.clean
   │   ├── 20200515094159.clean.inflight
   │   ├── 20200515094159.clean.requested
   │   ├── 20200515094159.deltacommit
   │   ├── 20200515094159.deltacommit.inflight
   │   ├── 20200515094159.deltacommit.requested
   │   ├── 20200515094227.commit
   │   ├── 20200515094227.compaction.inflight
   │   ├── 20200515094227.compaction.requested
   │   ├── 20200515094301.clean
   │   ├── 20200515094301.clean.inflight
   │   ├── 20200515094301.clean.requested
   │   ├── 20200515094301.deltacommit
   │   ├── 20200515094301.deltacommit.inflight
   │   ├── 20200515094301.deltacommit.requested
   │   ├── 20200515094401.clean
   │   ├── 20200515094401.clean.inflight
   │   ├── 20200515094401.clean.requested
   │   ├── 20200515094401.deltacommit
   │   ├── 20200515094401.deltacommit.inflight
   │   ├── 20200515094401.deltacommit.requested
   │   ├── 20200515094431.commit
   │   ├── 20200515094431.compaction.inflight
   │   ├── 20200515094431.compaction.requested
   │   ├── 20200515094508.clean
   │   ├── 20200515094508.clean.inflight
   │   ├── 20200515094508.clean.requested
   │   ├── 20200515094508.deltacommit
   │   ├── 20200515094508.deltacommit.inflight
   │   ├── 20200515094508.deltacommit.requested
   │   ├── 20200515094611.clean
   │   ├── 20200515094611.clean.inflight
   │   ├── 20200515094611.clean.requested
   │   ├── 20200515094611.deltacommit
   │   ├── 20200515094611.deltacommit.inflight
   │   ├── 20200515094611.deltacommit.requested
   │   ├── 20200515094647.commit
   │   ├── 20200515094647.compaction.inflight
   │   ├── 20200515094647.compaction.requested
   │   ├── 20200515094724.clean
   │   ├── 20200515094724.clean.inflight
   │   ├── 20200515094724.clean.requested
   │   ├── 20200515094724.deltacommit
   │   ├── 20200515094724.deltacommit.inflight
   │   ├── 20200515094724.deltacommit.requested
   │   ├── 20200515094830.clean
   │   ├── 20200515094830.clean.inflight
   │   ├── 20200515094830.clean.requested
   │   ├── 20200515094830.deltacommit
   │   ├── 20200515094830.deltacommit.inflight
   │   ├── 20200515094830.deltacommit.requested
   │   ├── 20200515094903.commit
   │   ├── 20200515094903.compaction.inflight
   │   ├── 20200515094903.compaction.requested
   │   ├── 20200515094936.clean
   │   ├── 20200515094936.clean.inflight
   │   ├── 20200515094936.clean.requested
   │   ├── 20200515094936.deltacommit
   │   ├── 20200515094936.deltacommit.inflight
   │   ├── 20200515094936.deltacommit.requested
   │   ├── 20200515095034.clean
   │   ├── 20200515095034.clean.inflight
   │   ├── 20200515095034.clean.requested
   │   ├── 20200515095034.deltacommit
   │   ├── 20200515095034.deltacommit.inflight
   │   ├── 20200515095034.deltacommit.requested
   │   ├── 20200515095102.commit
   │   ├── 20200515095102.compaction.inflight
   │   ├── 20200515095102.compaction.requested
   │   ├── 20200515095136.clean
   │   ├── 20200515095136.clean.inflight
   │   ├── 20200515095136.clean.requested
   │   ├── 20200515095136.deltacommit
   │   ├── 20200515095136.deltacommit.inflight
   │   ├── 20200515095136.deltacommit.requested
   │   ├── 20200515095249.clean
   │   ├── 20200515095249.clean.inflight
   │   ├── 20200515095249.clean.requested
   │   ├── 20200515095249.deltacommit
   │   ├── 20200515095249.deltacommit.inflight
   │   ├── 20200515095249.deltacommit.requested
   │   ├── 20200515095320.commit
   │   ├── 20200515095320.compaction.inflight
   │   ├── 20200515095320.compaction.requested
   │   ├── archived
   │   │   ├── ..commits_.archive.1_1-0-1.crc
   │   │   ├── ..commits_.archive.2_1-0-1.crc
   │   │   ├── ..commits_.archive.3_1-0-1.crc
   │   │   ├── .commits_.archive.1_1-0-1
   │   │   ├── .commits_.archive.2_1-0-1
   │   │   └── .commits_.archive.3_1-0-1
   │   └── hoodie.properties
   ├── date=18386
   │   ├── ..36f23f0e-f2dc-41f2-be77-978a29b87251-0_20200515094647.log.1_0-362-3729.crc
   │   ├── ..36f23f0e-f2dc-41f2-be77-978a29b87251-0_20200515094647.log.2_0-415-4262.crc
   │   ├── ..36f23f0e-f2dc-41f2-be77-978a29b87251-0_20200515094903.log.1_0-470-4801.crc
   │   ├── ..36f23f0e-f2dc-41f2-be77-978a29b87251-0_20200515094903.log.2_0-523-5334.crc
   │   ├── ..36f23f0e-f2dc-41f2-be77-978a29b87251-0_20200515095102.log.1_0-578-5873.crc
   │   ├── ..36f23f0e-f2dc-41f2-be77-978a29b87251-0_20200515095102.log.2_0-631-6406.crc
   │   ├── ..hoodie_partition_metadata.crc
   │   ├── .36f23f0e-f2dc-41f2-be77-978a29b87251-0_1-317-3200_20200515094647.parquet.crc
   │   ├── .36f23f0e-f2dc-41f2-be77-978a29b87251-0_1-425-4272_20200515094903.parquet.crc
   │   ├── .36f23f0e-f2dc-41f2-be77-978a29b87251-0_1-533-5344_20200515095102.parquet.crc
   │   ├── .36f23f0e-f2dc-41f2-be77-978a29b87251-0_1-641-6416_20200515095320.parquet.crc
   │   ├── .36f23f0e-f2dc-41f2-be77-978a29b87251-0_20200515094647.log.1_0-362-3729
   │   ├── .36f23f0e-f2dc-41f2-be77-978a29b87251-0_20200515094647.log.2_0-415-4262
   │   ├── .36f23f0e-f2dc-41f2-be77-978a29b87251-0_20200515094903.log.1_0-470-4801
   │   ├── .36f23f0e-f2dc-41f2-be77-978a29b87251-0_20200515094903.log.2_0-523-5334
   │   ├── .36f23f0e-f2dc-41f2-be77-978a29b87251-0_20200515095102.log.1_0-578-5873
   │   ├── .36f23f0e-f2dc-41f2-be77-978a29b87251-0_20200515095102.log.2_0-631-6406
   │   ├── .hoodie_partition_metadata
   │   ├── 36f23f0e-f2dc-41f2-be77-978a29b87251-0_1-317-3200_20200515094647.parquet
   │   ├── 36f23f0e-f2dc-41f2-be77-978a29b87251-0_1-425-4272_20200515094903.parquet
   │   ├── 36f23f0e-f2dc-41f2-be77-978a29b87251-0_1-533-5344_20200515095102.parquet
   │   └── 36f23f0e-f2dc-41f2-be77-978a29b87251-0_1-641-6416_20200515095320.parquet
   ├── date=18389
   │   ├── ..047ce897-2276-401f-9381-b4058fc94b07-0_20200515094647.log.1_1-362-3730.crc
   │   ├── ..047ce897-2276-401f-9381-b4058fc94b07-0_20200515094647.log.2_1-415-4263.crc
   │   ├── ..047ce897-2276-401f-9381-b4058fc94b07-0_20200515094903.log.1_1-470-4802.crc
   │   ├── ..047ce897-2276-401f-9381-b4058fc94b07-0_20200515094903.log.2_1-523-5335.crc
   │   ├── ..047ce897-2276-401f-9381-b4058fc94b07-0_20200515095102.log.1_1-578-5874.crc
   │   ├── ..047ce897-2276-401f-9381-b4058fc94b07-0_20200515095102.log.2_1-631-6407.crc
   │   ├── ..hoodie_partition_metadata.crc
   │   ├── .047ce897-2276-401f-9381-b4058fc94b07-0_0-317-3199_20200515094647.parquet.crc
   │   ├── .047ce897-2276-401f-9381-b4058fc94b07-0_0-425-4271_20200515094903.parquet.crc
   │   ├── .047ce897-2276-401f-9381-b4058fc94b07-0_0-533-5343_20200515095102.parquet.crc
   │   ├── .047ce897-2276-401f-9381-b4058fc94b07-0_0-641-6415_20200515095320.parquet.crc
   │   ├── .047ce897-2276-401f-9381-b4058fc94b07-0_20200515094647.log.1_1-362-3730
   │   ├── .047ce897-2276-401f-9381-b4058fc94b07-0_20200515094647.log.2_1-415-4263
   │   ├── .047ce897-2276-401f-9381-b4058fc94b07-0_20200515094903.log.1_1-470-4802
   │   ├── .047ce897-2276-401f-9381-b4058fc94b07-0_20200515094903.log.2_1-523-5335
   │   ├── .047ce897-2276-401f-9381-b4058fc94b07-0_20200515095102.log.1_1-578-5874
   │   ├── .047ce897-2276-401f-9381-b4058fc94b07-0_20200515095102.log.2_1-631-6407
   │   ├── .hoodie_partition_metadata
   │   ├── 047ce897-2276-401f-9381-b4058fc94b07-0_0-317-3199_20200515094647.parquet
   │   ├── 047ce897-2276-401f-9381-b4058fc94b07-0_0-425-4271_20200515094903.parquet
   │   ├── 047ce897-2276-401f-9381-b4058fc94b07-0_0-533-5343_20200515095102.parquet
   │   └── 047ce897-2276-401f-9381-b4058fc94b07-0_0-641-6415_20200515095320.parquet
   └── date=18390
       ├── ..00a2ef18-6d72-432e-a45b-920ea2e68f97-0_20200515094647.log.1_2-362-3731.crc
       ├── ..00a2ef18-6d72-432e-a45b-920ea2e68f97-0_20200515094647.log.2_2-415-4264.crc
       ├── ..00a2ef18-6d72-432e-a45b-920ea2e68f97-0_20200515094903.log.1_2-470-4803.crc
       ├── ..00a2ef18-6d72-432e-a45b-920ea2e68f97-0_20200515094903.log.2_2-523-5336.crc
       ├── ..00a2ef18-6d72-432e-a45b-920ea2e68f97-0_20200515095102.log.1_2-578-5875.crc
       ├── ..00a2ef18-6d72-432e-a45b-920ea2e68f97-0_20200515095102.log.2_2-631-6408.crc
       ├── ..hoodie_partition_metadata.crc
       ├── .00a2ef18-6d72-432e-a45b-920ea2e68f97-0_2-317-3201_20200515094647.parquet.crc
       ├── .00a2ef18-6d72-432e-a45b-920ea2e68f97-0_2-425-4273_20200515094903.parquet.crc
       ├── .00a2ef18-6d72-432e-a45b-920ea2e68f97-0_2-533-5345_20200515095102.parquet.crc
       ├── .00a2ef18-6d72-432e-a45b-920ea2e68f97-0_2-641-6417_20200515095320.parquet.crc
       ├── .00a2ef18-6d72-432e-a45b-920ea2e68f97-0_20200515094647.log.1_2-362-3731
       ├── .00a2ef18-6d72-432e-a45b-920ea2e68f97-0_20200515094647.log.2_2-415-4264
       ├── .00a2ef18-6d72-432e-a45b-920ea2e68f97-0_20200515094903.log.1_2-470-4803
       ├── .00a2ef18-6d72-432e-a45b-920ea2e68f97-0_20200515094903.log.2_2-523-5336
       ├── .00a2ef18-6d72-432e-a45b-920ea2e68f97-0_20200515095102.log.1_2-578-5875
       ├── .00a2ef18-6d72-432e-a45b-920ea2e68f97-0_20200515095102.log.2_2-631-6408
       ├── .hoodie_partition_metadata
       ├── 00a2ef18-6d72-432e-a45b-920ea2e68f97-0_2-317-3201_20200515094647.parquet
       ├── 00a2ef18-6d72-432e-a45b-920ea2e68f97-0_2-425-4273_20200515094903.parquet
       ├── 00a2ef18-6d72-432e-a45b-920ea2e68f97-0_2-533-5345_20200515095102.parquet
       └── 00a2ef18-6d72-432e-a45b-920ea2e68f97-0_2-641-6417_20200515095320.parquet
   ```
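To make the tree easier to read, here is a minimal Python sketch that groups one partition's files into file slices: one base Parquet file plus the log files written against that base instant. The two file-name patterns are inferred from the names in the listing above, not taken from Hudi's source, and `group_file_slices` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Name patterns inferred from the listing above (an assumption, not Hudi's spec):
#   base file: <fileId>_<writeToken>_<instantTime>.parquet
#   log file:  .<fileId>_<baseInstantTime>.log.<version>_<writeToken>
BASE_RE = re.compile(
    r"^(?P<file_id>.+)_(?P<token>\d+-\d+-\d+)_(?P<instant>\d{14})\.parquet$"
)
LOG_RE = re.compile(
    r"^\.(?P<file_id>.+)_(?P<instant>\d{14})\.log\.(?P<version>\d+)_(?P<token>\d+-\d+-\d+)$"
)


def group_file_slices(names):
    """Map (file_id, base_instant) -> {"base": name or None, "logs": [names]}.

    Hadoop checksum files (*.crc) and partition metadata are ignored.
    """
    slices = defaultdict(lambda: {"base": None, "logs": []})
    for name in names:
        if name.endswith(".crc"):
            continue
        m = BASE_RE.match(name)
        if m:
            slices[(m["file_id"], m["instant"])]["base"] = name
            continue
        m = LOG_RE.match(name)
        if m:
            slices[(m["file_id"], m["instant"])]["logs"].append(name)
    return dict(slices)
```

Fed the file names from `date=18389`, it yields four slices (base instants `20200515094647`, `20200515094903`, `20200515095102` and `20200515095320`); the first three each carry two log files, and the newest has none yet.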
   The delta log files (Avro) are much larger than the base Parquet files: about 20M per log file versus 4.7M per Parquet file in this partition:
   ```bash
   $ ls -lah /tmp/visitors_hudi_mor/date=18389/                                                                    
   total 139M
   drwxr-xr-x 24 rolandjohann  768 Mai 15 09:53 .
   drwxr-xr-x  6 rolandjohann  192 Mai 14 22:04 ..
   -rw-r--r--  1 rolandjohann 155K Mai 15 09:47 ..047ce897-2276-401f-9381-b4058fc94b07-0_20200515094647.log.1_1-362-3730.crc
   -rw-r--r--  1 rolandjohann 155K Mai 15 09:49 ..047ce897-2276-401f-9381-b4058fc94b07-0_20200515094647.log.2_1-415-4263.crc
   -rw-r--r--  1 rolandjohann 155K Mai 15 09:50 ..047ce897-2276-401f-9381-b4058fc94b07-0_20200515094903.log.1_1-470-4802.crc
   -rw-r--r--  1 rolandjohann 155K Mai 15 09:51 ..047ce897-2276-401f-9381-b4058fc94b07-0_20200515094903.log.2_1-523-5335.crc
   -rw-r--r--  1 rolandjohann 155K Mai 15 09:52 ..047ce897-2276-401f-9381-b4058fc94b07-0_20200515095102.log.1_1-578-5874.crc
   -rw-r--r--  1 rolandjohann 155K Mai 15 09:53 ..047ce897-2276-401f-9381-b4058fc94b07-0_20200515095102.log.2_1-631-6407.crc
   -rw-r--r--  1 rolandjohann   12 Mai 14 22:04 ..hoodie_partition_metadata.crc
   -rw-r--r--  1 rolandjohann  38K Mai 15 09:46 .047ce897-2276-401f-9381-b4058fc94b07-0_0-317-3199_20200515094647.parquet.crc
   -rw-r--r--  1 rolandjohann  38K Mai 15 09:49 .047ce897-2276-401f-9381-b4058fc94b07-0_0-425-4271_20200515094903.parquet.crc
   -rw-r--r--  1 rolandjohann  38K Mai 15 09:51 .047ce897-2276-401f-9381-b4058fc94b07-0_0-533-5343_20200515095102.parquet.crc
   -rw-r--r--  1 rolandjohann  38K Mai 15 09:53 .047ce897-2276-401f-9381-b4058fc94b07-0_0-641-6415_20200515095320.parquet.crc
   -rw-r--r--  1 rolandjohann  20M Mai 15 09:47 .047ce897-2276-401f-9381-b4058fc94b07-0_20200515094647.log.1_1-362-3730
   -rw-r--r--  1 rolandjohann  20M Mai 15 09:49 .047ce897-2276-401f-9381-b4058fc94b07-0_20200515094647.log.2_1-415-4263
   -rw-r--r--  1 rolandjohann  20M Mai 15 09:50 .047ce897-2276-401f-9381-b4058fc94b07-0_20200515094903.log.1_1-470-4802
   -rw-r--r--  1 rolandjohann  20M Mai 15 09:51 .047ce897-2276-401f-9381-b4058fc94b07-0_20200515094903.log.2_1-523-5335
   -rw-r--r--  1 rolandjohann  20M Mai 15 09:52 .047ce897-2276-401f-9381-b4058fc94b07-0_20200515095102.log.1_1-578-5874
   -rw-r--r--  1 rolandjohann  20M Mai 15 09:53 .047ce897-2276-401f-9381-b4058fc94b07-0_20200515095102.log.2_1-631-6407
   -rw-r--r--  1 rolandjohann   94 Mai 14 22:04 .hoodie_partition_metadata
   -rw-r--r--  1 rolandjohann 4,7M Mai 15 09:46 047ce897-2276-401f-9381-b4058fc94b07-0_0-317-3199_20200515094647.parquet
   -rw-r--r--  1 rolandjohann 4,7M Mai 15 09:49 047ce897-2276-401f-9381-b4058fc94b07-0_0-425-4271_20200515094903.parquet
   -rw-r--r--  1 rolandjohann 4,7M Mai 15 09:51 047ce897-2276-401f-9381-b4058fc94b07-0_0-533-5343_20200515095102.parquet
   -rw-r--r--  1 rolandjohann 4,7M Mai 15 09:53 047ce897-2276-401f-9381-b4058fc94b07-0_0-641-6415_20200515095320.parquet
   ```
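A quick way to quantify this per partition is sketched below in Python. `size_by_kind` is a hypothetical helper; it treats any file name containing `.log.` as a log file (an assumption based on the names above) and skips the `.crc` checksum side files and partition metadata.

```python
from pathlib import Path


def size_by_kind(partition_dir):
    """Sum on-disk bytes of base (.parquet) vs. log files in one partition.

    Checksum side files (*.crc) and .hoodie_partition_metadata are excluded.
    Log files are recognised by the ".log." infix seen in the listing.
    """
    totals = {"parquet": 0, "log": 0}
    for entry in Path(partition_dir).iterdir():
        if not entry.is_file() or entry.name.endswith(".crc"):
            continue
        if entry.name.endswith(".parquet"):
            totals["parquet"] += entry.stat().st_size
        elif ".log." in entry.name:
            totals["log"] += entry.stat().st_size
    return totals
```

Based on the rounded sizes in the `ls` output, `size_by_kind("/tmp/visitors_hudi_mor/date=18389")` should report roughly 120M of logs (6 x 20M) against about 19M of Parquet (4 x 4.7M).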
   
   Our current use case is to ingest records as a stream into HDFS, register Hive tables for analytics, and avoid the small-file problem. For this, we implemented our own compaction job that reads the ingested files, coalesces them into a small number of date-based partitions, and writes them back to the target paths on HDFS. To avoid read/write races, we also implemented multi-version partitioning ourselves: each partition version gets its own subdirectory, and that specific path is registered as the partition location in Hive. Periodically, we delete partition versions older than 24 hours.
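The expiry rule described above can be sketched in Python roughly as follows. The directory names and `expired_versions` are hypothetical illustrations rather than the actual job, and never returning the newest version of a partition is an added safeguard (an assumption, not part of the described setup) so that the path registered in Hive always stays readable.

```python
from datetime import datetime, timedelta, timezone


def expired_versions(versions, now, max_age=timedelta(hours=24)):
    """Return version paths of ONE partition that are older than max_age.

    `versions` maps a partition-version directory (hypothetical layout,
    e.g. "date=18389/v1") to its creation time. The newest version is
    never returned (assumed safeguard), so at least one copy survives.
    """
    if not versions:
        return []
    newest = max(versions, key=versions.get)
    return sorted(
        path
        for path, created in versions.items()
        if path != newest and now - created > max_age
    )


if __name__ == "__main__":
    now = datetime(2020, 5, 15, 12, 0, tzinfo=timezone.utc)
    demo = {
        "date=18389/v1": now - timedelta(hours=30),
        "date=18389/v2": now - timedelta(hours=2),
    }
    print(expired_versions(demo, now))
```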
   
   As far as I can see, Hudi is a perfect fit for this logic. The only things we are struggling with are compaction and the complete cleanup of previous versions after 24 hours.
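Because Hudi's `KEEP_LATEST_COMMITS` cleaner policy retains a number of commits rather than a time window, one rough way to approximate a 24-hour retention is to derive the commit count from the ingest interval. The sketch below assumes a steady cadence of one commit per interval, with every write counted as one commit by the cleaner; `hudi_retention_options` is hypothetical, and the archival margins are an arbitrary choice.

```python
import math


def hudi_retention_options(commit_interval_minutes, retention_hours=24):
    """Translate a time-based retention goal into commit-count Hudi options.

    Assumes one completed commit every `commit_interval_minutes`; the
    mapping only holds while the ingest cadence stays regular.
    """
    if commit_interval_minutes <= 0:
        raise ValueError("commit_interval_minutes must be positive")
    retained = math.ceil(retention_hours * 60 / commit_interval_minutes)
    # Keep the archival bounds above the cleaner retention so archiving
    # does not outrun cleaning; the +10 gaps are an arbitrary margin.
    keep_min = retained + 10
    keep_max = keep_min + 10
    return {
        "hoodie.cleaner.policy": "KEEP_LATEST_COMMITS",
        "hoodie.cleaner.commits.retained": str(retained),
        "hoodie.keep.min.commits": str(keep_min),
        "hoodie.keep.max.commits": str(keep_max),
    }
```

The resulting dict could be passed to the writer alongside the usual options, e.g. `df.write.format("org.apache.hudi").options(**opts)`. Note the trade-off: a larger retention keeps more file slices on disk, which is the kind of growth visible in the listings above.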

