Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2023/01/10 04:13:51 UTC

[GitHub] [hudi] kepplertreet commented on issue #7628: [SUPPORT] Hudi Metadata Column Stats Fail

kepplertreet commented on issue #7628:
URL: https://github.com/apache/hudi/issues/7628#issuecomment-1376709063

   Hi @alexeykudinkin  
   
   We are using an integer `id` column as the `hoodie.datasource.write.recordkey.field`.
   A few sample values:
   ```
   1263633528
   1263633530
   1263633531 
   ``` 
   
   As for the Hudi configurations we are using:
   ```
   0                                hoodie.table.version                                                  5
   1                   hoodie.datasource.write.operation                                             upsert
   2     hoodie.datasource.write.hive_style_partitioning                                               true
   3            hoodie.datasource.write.precombine.field                                    _commit_time_ms
   4       hoodie.datasource.write.commitmeta.key.prefix                                                  _
   5      hoodie.datasource.write.insert.drop.duplicates                                               true
   6                  hoodie.datasource.hive_sync.enable                                               true
   7                hoodie.datasource.hive_sync.use_jdbc                                               true
   8    hoodie.datasource.hive_sync.auto_create_database                                               true
   9       hoodie.datasource.hive_sync.support_timestamp                                              false
   10         hoodie.datasource.hive_sync.skip_ro_suffix                                               true
   11                   hoodie.parquet.compression.codec                                             snappy
   12                                  hoodie.metrics.on                                              false
   13                       hoodie.metrics.reporter.type                             PROMETHEUS_PUSHGATEWAY
   14                    hoodie.metrics.pushgateway.host                                       <ip_address>
   15                    hoodie.metrics.pushgateway.port                                       <port_number>
   16  hoodie.metrics.pushgateway.random.job.name.suffix                                              false
   17   hoodie.metrics.pushgateway.report.period.seconds                                                 30
   18                             hoodie.metadata.enable                                               true
   19                     hoodie.metadata.metrics.enable                                               true
   20                        hoodie.metadata.clean.async                                               true
   21          hoodie.metadata.index.column.stats.enable                                               true
   22          hoodie.metadata.index.bloom.filter.enable                                               true
   23                        hoodie.metadata.index.async                                               true
   24                      hoodie.write.concurrency.mode                     OPTIMISTIC_CONCURRENCY_CONTROL
   25                         hoodie.write.lock.provider  org.apache.hudi.client.transaction.lock.FileSy...
   26          hoodie.datasource.compaction.async.enable                                               true
   27                     hoodie.compact.schedule.inline                                              false
   28             hoodie.compact.inline.trigger.strategy                                        NUM_COMMITS
   29            hoodie.compact.inline.max.delta.commits                                                  2
   30                                  hoodie.index.type                                              BLOOM
   31                hoodie.cleaner.policy.failed.writes                                               LAZY
   32                             hoodie.clean.automatic                                               true
   33                                 hoodie.clean.async                                               true
   34                    hoodie.cleaner.commits.retained                                                  4
   35               hoodie.write.lock.client.num_retries                                                 10
   36       hoodie.write.lock.wait_time_ms_between_retry                                               1000
   37                      hoodie.write.lock.num_retries                                                 15
   38                     hoodie.write.lock.wait_time_ms                                              60000
   39  hoodie.write.lock.zookeeper.connection_timeout_ms                                              15000
   40                    hoodie.bloom.index.use.metadata                                               true
   41                               hoodie.archive.async                                               true
   42                       hoodie.parquet.max.file.size                                         1073741824
   43                    hoodie.parquet.small.file.limit                                         1610612736
   44                                  hoodie.table.name                                         <table_name>
   45                 hoodie.datasource.write.table.name                                         <table_name>
   46                 hoodie.datasource.write.table.type                                      MERGE_ON_READ
   47            hoodie.datasource.write.recordkey.field                                                 id
   48        hoodie.datasource.write.partitionpath.field                                        _year_month
   49         hoodie.datasource.write.keygenerator.class          org.apache.hudi.keygen.SimpleKeyGenerator
   50                  hoodie.datasource.hive_sync.table                                       <table_name>
   51               hoodie.datasource.hive_sync.database                                     <database_name>
   52                hoodie.metrics.pushgateway.job.name                          <database_name>.<table_name>
   53                  hoodie.write.lock.filesystem.path                                          <table_name>
   54                  hoodie.insert.shuffle.parallelism                                                  4
   55                  hoodie.upsert.shuffle.parallelism                                                  4
   56                  hoodie.delete.shuffle.parallelism                                                  4
   ```
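
   A minimal sketch of how a subset of these options could be passed to a PySpark write, assuming the Hudi Spark bundle is on the classpath. The values are copied from the table above; the helper name `write_batch` and its `base_path` argument are hypothetical and only illustrate the shape of the call.

   ```python
   # Subset of the options listed above (values copied from the table).
   # "<table_name>" is the same placeholder used in the listing.
   hudi_options = {
       "hoodie.table.name": "<table_name>",
       "hoodie.datasource.write.table.type": "MERGE_ON_READ",
       "hoodie.datasource.write.operation": "upsert",
       "hoodie.datasource.write.recordkey.field": "id",
       "hoodie.datasource.write.partitionpath.field": "_year_month",
       "hoodie.datasource.write.precombine.field": "_commit_time_ms",
       "hoodie.metadata.enable": "true",
       "hoodie.metadata.index.column.stats.enable": "true",
       "hoodie.metadata.index.bloom.filter.enable": "true",
       "hoodie.metadata.index.async": "true",
       "hoodie.index.type": "BLOOM",
       "hoodie.bloom.index.use.metadata": "true",
   }


   def write_batch(df, base_path):
       """Hypothetical helper: upsert a Spark DataFrame into the table at base_path."""
       (
           df.write.format("hudi")
           .options(**hudi_options)
           .mode("append")  # append mode is what an upsert into an existing table uses
           .save(base_path)
       )
   ```

   Only the metadata and index settings most relevant to the column stats question are included here; the lock, compaction, cleaner and metrics options from the full listing would be added to the same dictionary.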
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org