Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2023/01/10 04:13:51 UTC
[GitHub] [hudi] kepplertreet commented on issue #7628: [SUPPORT] Hudi Metadata Column Stats Fail
kepplertreet commented on issue #7628:
URL: https://github.com/apache/hudi/issues/7628#issuecomment-1376709063
Hi @alexeykudinkin
We are using an integer `id` column as the `hoodie.datasource.write.recordkey.field`.
A few sample values:
```
1263633528
1263633530
1263633531
```
As for the Hudi configurations, we are using:
```
hoodie.table.version 5
hoodie.datasource.write.operation upsert
hoodie.datasource.write.hive_style_partitioning true
hoodie.datasource.write.precombine.field _commit_time_ms
hoodie.datasource.write.commitmeta.key.prefix _
hoodie.datasource.write.insert.drop.duplicates true
hoodie.datasource.hive_sync.enable true
hoodie.datasource.hive_sync.use_jdbc true
hoodie.datasource.hive_sync.auto_create_database true
hoodie.datasource.hive_sync.support_timestamp false
hoodie.datasource.hive_sync.skip_ro_suffix true
hoodie.parquet.compression.codec snappy
hoodie.metrics.on false
hoodie.metrics.reporter.type PROMETHEUS_PUSHGATEWAY
hoodie.metrics.pushgateway.host <ip_address>
hoodie.metrics.pushgateway.port <port_number>
hoodie.metrics.pushgateway.random.job.name.suffix false
hoodie.metrics.pushgateway.report.period.seconds 30
hoodie.metadata.enable true
hoodie.metadata.metrics.enable true
hoodie.metadata.clean.async true
hoodie.metadata.index.column.stats.enable true
hoodie.metadata.index.bloom.filter.enable true
hoodie.metadata.index.async true
hoodie.write.concurrency.mode OPTIMISTIC_CONCURRENCY_CONTROL
hoodie.write.lock.provider org.apache.hudi.client.transaction.lock.FileSy...
hoodie.datasource.compaction.async.enable true
hoodie.compact.schedule.inline false
hoodie.compact.inline.trigger.strategy NUM_COMMITS
hoodie.compact.inline.max.delta.commits 2
hoodie.index.type BLOOM
hoodie.cleaner.policy.failed.writes LAZY
hoodie.clean.automatic true
hoodie.clean.async true
hoodie.cleaner.commits.retained 4
hoodie.write.lock.client.num_retries 10
hoodie.write.lock.wait_time_ms_between_retry 1000
hoodie.write.lock.num_retries 15
hoodie.write.lock.wait_time_ms 60000
hoodie.write.lock.zookeeper.connection_timeout_ms 15000
hoodie.bloom.index.use.metadata true
hoodie.archive.async true
hoodie.parquet.max.file.size 1073741824
hoodie.parquet.small.file.limit 1610612736
hoodie.table.name <table_name>
hoodie.datasource.write.table.name <table_name>
hoodie.datasource.write.table.type MERGE_ON_READ
hoodie.datasource.write.recordkey.field id
hoodie.datasource.write.partitionpath.field _year_month
hoodie.datasource.write.keygenerator.class org.apache.hudi.keygen.SimpleKeyGenerator
hoodie.datasource.hive_sync.table <table_name>
hoodie.datasource.hive_sync.database <database_name>
hoodie.metrics.pushgateway.job.name <database_name>.<table_name>
hoodie.write.lock.filesystem.path <table_name>
hoodie.insert.shuffle.parallelism 4
hoodie.upsert.shuffle.parallelism 4
hoodie.delete.shuffle.parallelism 4
```
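For anyone trying to reproduce this, here is a minimal sketch of how a listing like the one above could be turned into an options dict for the Spark writer. The helper name `parse_hudi_options`, the `sample` subset, and the `df` / `base_path` names in the trailing comment are assumptions for illustration, not part of our actual job.

```python
def parse_hudi_options(listing: str) -> dict:
    """Turn 'key value' rows (optionally prefixed by a row index) into a dict."""
    options = {}
    for raw in listing.splitlines():
        parts = raw.split()
        # Drop a leading numeric row index if one is present.
        if parts and parts[0].isdigit():
            parts = parts[1:]
        if len(parts) < 2:
            continue  # skip blank or malformed rows
        key, value = parts[0], " ".join(parts[1:])
        options[key] = value
    return options


# A small subset of the listing, used here only as sample input.
sample = """
hoodie.datasource.write.recordkey.field id
hoodie.datasource.write.partitionpath.field _year_month
hoodie.datasource.write.table.type MERGE_ON_READ
hoodie.metadata.index.column.stats.enable true
"""
opts = parse_hudi_options(sample)
print(opts["hoodie.datasource.write.recordkey.field"])

# With a Spark DataFrame `df` and a target `base_path` (both hypothetical here),
# the dict could be passed to the writer roughly like this:
# df.write.format("hudi").options(**opts).mode("append").save(base_path)
```

All values stay strings, which is what the Spark `options(...)` call accepts; placeholders such as `<table_name>` would need real values before an actual write.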