Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/06/21 09:54:14 UTC
[GitHub] [hudi] mauropelucchi commented on issue #2564: Hoodie clean is not deleting old files
mauropelucchi commented on issue #2564:
URL: https://github.com/apache/hudi/issues/2564#issuecomment-864899684
Hello @vinothchandar @n3nash
We continue to see this type of issue.
Let me share our situation. The configuration is the same for all the tables in our environment.
![image](https://user-images.githubusercontent.com/16307145/122742906-bb9cd400-d286-11eb-8942-369d0ce69f41.png)
![image](https://user-images.githubusercontent.com/16307145/122742944-c8212c80-d286-11eb-9b3c-48972ce75fe6.png)
Second table:
![image](https://user-images.githubusercontent.com/16307145/122743276-25b57900-d287-11eb-89d9-86f2f7337ef3.png)
This is our current configuration:
```
def _get_hudi_options(self, table_name: str, parallelism: int):
    return {
        'hoodie.table.name': table_name,
        'hoodie.datasource.write.recordkey.field': 'posting_key',
        'hoodie.datasource.write.partitionpath.field': 'range_partition',
        'hoodie.datasource.write.table.name': table_name,
        'hoodie.datasource.write.precombine.field': 'update_date',
        'hoodie.datasource.write.table.type': 'MERGE_ON_READ',
        'hoodie.cleaner.policy': 'KEEP_LATEST_COMMITS',
        'hoodie.consistency.check.enabled': True,
        'hoodie.bloom.index.filter.type': 'dynamic_v0',
        'hoodie.bloom.index.bucketized.checking': False,
        'hoodie.memory.merge.max.size': '2004857600000',
        'hoodie.upsert.shuffle.parallelism': parallelism,
        'hoodie.insert.shuffle.parallelism': parallelism,
        'hoodie.bulkinsert.shuffle.parallelism': parallelism,
        'hoodie.parquet.small.file.limit': '204857600',
        'hoodie.parquet.max.file.size': str(self.__parquet_max_file_size_byte),
        'hoodie.memory.compaction.fraction': '384402653184',
        'hoodie.write.buffer.limit.bytes': str(128 * 1024 * 1024),
        'hoodie.compact.inline': True,
        'hoodie.compact.inline.max.delta.commits': 1,
        'hoodie.datasource.compaction.async.enable': False,
        'hoodie.parquet.compression.ratio': '0.35',
        'hoodie.logfile.max.size': '268435456',
        'hoodie.logfile.to.parquet.compression.ratio': '0.5',
        'hoodie.datasource.write.hive_style_partitioning': True,
        'hoodie.keep.min.commits': 5,
        'hoodie.keep.max.commits': 6,
        'hoodie.copyonwrite.record.size.estimate': 32,
        'hoodie.cleaner.commits.retained': 4,
        'hoodie.clean.automatic': True
    }
```
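To rule out a simple misconfiguration, here is a minimal sketch that checks the retention-related values above against each other. It assumes the ordering `hoodie.cleaner.commits.retained` < `hoodie.keep.min.commits` < `hoodie.keep.max.commits`, which is my understanding of what Hudi expects and not something confirmed in this thread. The helper name `check_retention_settings` is hypothetical and is not part of Hudi.

```python
def check_retention_settings(options: dict) -> list:
    """Return a list of problems found in the cleaner/archival settings.

    Assumption (not confirmed in this thread): Hudi expects
    cleaner.commits.retained < keep.min.commits < keep.max.commits.
    """
    retained = int(options['hoodie.cleaner.commits.retained'])
    keep_min = int(options['hoodie.keep.min.commits'])
    keep_max = int(options['hoodie.keep.max.commits'])

    problems = []
    if keep_min <= retained:
        problems.append(
            f'hoodie.keep.min.commits={keep_min} should be greater than '
            f'hoodie.cleaner.commits.retained={retained}'
        )
    if keep_max <= keep_min:
        problems.append(
            f'hoodie.keep.max.commits={keep_max} should be greater than '
            f'hoodie.keep.min.commits={keep_min}'
        )
    return problems


# Only the retention-related keys from the configuration above.
options = {
    'hoodie.cleaner.commits.retained': 4,
    'hoodie.keep.min.commits': 5,
    'hoodie.keep.max.commits': 6,
}

for problem in check_retention_settings(options):
    print(problem)
```

With the values we use (4, 5, 6) the check reports nothing, so under that assumption the three settings are at least consistent with each other.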
Any ideas?