Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/02/12 17:09:27 UTC

[GitHub] [hudi] mauropelucchi commented on issue #2564: Hoodie clean is not deleting old files

mauropelucchi commented on issue #2564:
URL: https://github.com/apache/hudi/issues/2564#issuecomment-778321055


   @bvaradar We are running this configuration against two separate locations:
   
   hudi_options = {
    'hoodie.table.name': table_name,
    'hoodie.datasource.write.recordkey.field': 'key',
    'hoodie.datasource.write.partitionpath.field': 'range_partition',
    'hoodie.datasource.write.table.name': 'tablename',
    'hoodie.datasource.write.precombine.field': 'update_date',
    'hoodie.datasource.write.table.type': 'MERGE_ON_READ',
    'hoodie.cleaner.policy': 'KEEP_LATEST_COMMITS',
    'hoodie.consistency.check.enabled': True,
    'hoodie.bloom.index.filter.type': 'dynamic_v0',
    'hoodie.bloom.index.bucketized.checking': False,
    'hoodie.memory.merge.max.size': '2004857600000',
    'hoodie.upsert.shuffle.parallelism': parallelism,
    'hoodie.insert.shuffle.parallelism': parallelism,
    'hoodie.bulkinsert.shuffle.parallelism': parallelism,
    'hoodie.parquet.small.file.limit': '204857600',
    'hoodie.parquet.max.file.size': '434217728',
    'hoodie.memory.compaction.fraction': '384402653184',
    'hoodie.write.buffer.limit.bytes': str(128 * 1024 * 1024),
    'hoodie.compact.inline': True,
    'hoodie.compact.inline.max.delta.commits': 1,
    'hoodie.datasource.compaction.async.enable': False,
    'hoodie.parquet.compression.ratio': '0.35',
    'hoodie.logfile.max.size': '268435456',
    'hoodie.logfile.to.parquet.compression.ratio': '0.5',
    'hoodie.datasource.write.hive_style_partitioning': True,
    'hoodie.keep.min.commits': 5,
    'hoodie.keep.max.commits': 6,
    'hoodie.copyonwrite.record.size.estimate': 32,
    'hoodie.cleaner.commits.retained': 4,
    'hoodie.clean.automatic': True,
    'hoodie.datasource.write.operation': 'upsert'
   }
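
The three retention settings in the options above interact with each other. As far as we understand Hudi's config validation, it expects `hoodie.cleaner.commits.retained` < `hoodie.keep.min.commits` < `hoodie.keep.max.commits`. The following is a minimal sketch of that check in plain Python; `check_retention` and `retention_options` are hypothetical names for illustration and are not part of Hudi.

```python
def check_retention(options):
    """Return a list of problems found in the cleaner/archival retention settings.

    Assumption: Hudi expects
    cleaner.commits.retained < keep.min.commits < keep.max.commits.
    """
    retained = int(options['hoodie.cleaner.commits.retained'])
    keep_min = int(options['hoodie.keep.min.commits'])
    keep_max = int(options['hoodie.keep.max.commits'])

    problems = []
    if not retained < keep_min:
        problems.append(
            'hoodie.keep.min.commits (%d) should be greater than '
            'hoodie.cleaner.commits.retained (%d)' % (keep_min, retained))
    if not keep_min < keep_max:
        problems.append(
            'hoodie.keep.max.commits (%d) should be greater than '
            'hoodie.keep.min.commits (%d)' % (keep_max, keep_min))
    return problems


# Only the retention-related values, copied from the options above.
retention_options = {
    'hoodie.cleaner.commits.retained': 4,
    'hoodie.keep.min.commits': 5,
    'hoodie.keep.max.commits': 6,
}

for problem in check_retention(retention_options):
    print(problem)
```

With the values above (4 < 5 < 6) the check reports no problems, so under this assumption the retention ordering alone would not explain why cleaning runs for one folder and not for the other.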
   
   For this folder, we do not see the cleaning task being triggered, even though hoodie.clean.automatic is enabled:
    
   ![image](https://user-images.githubusercontent.com/16307145/107798638-b968bd00-6d5c-11eb-819b-6c552414ca42.png)
   
   For this folder, however, we do see the cleaning task being triggered:
   
   ![image](https://user-images.githubusercontent.com/16307145/107798701-cb4a6000-6d5c-11eb-8fc3-1f0de651cb75.png)
   
   The scripts that merge the data into the two folders both use the same configuration.
   
   We also tried to force the cleaning task from hudi-cli with the cleans run command, without success.
   

