Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/11/21 16:28:52 UTC

[GitHub] [hudi] nikita-sheremet-clearscale edited a comment on issue #4044: [SUPPORT] Question on hudi's insert statment taking too long

nikita-sheremet-clearscale edited a comment on issue #4044:
URL: https://github.com/apache/hudi/issues/4044#issuecomment-974849011


   @xushiyan 
   Many thanks for the quick reply!!!
   
   The Hudi config is:
   ```
   hoodie.datasource.hive_sync.database -> hudi
   hoodie.datasource.write.row.writer.enable -> false
   hoodie.parquet.small.file.limit -> 104857600
   hoodie.datasource.hive_sync.partition_fields -> y,m,d
   hoodie.bloom.index.filter.dynamic.max.entries -> 100000000
   hoodie.datasource.hive_sync.partition_extractor_class -> org.apache.hudi.hive.MultiPartKeysValueExtractor
   hoodie.parquet.max.file.size -> 209715200
   hoodie.parquet.page.size -> 1048576
   hoodie.bloom.index.prune.by.ranges -> false
   hoodie.parquet.block.size -> 209715200
   hoodie.datasource.hive_sync.table -> table
   hoodie.index.type -> BLOOM
   hoodie.datasource.write.operation -> bulk_insert
   hoodie.datasource.hive_sync.enable -> true
   hoodie.datasource.write.recordkey.field -> device_uuid
   hoodie.table.name -> table
   hoodie.datasource.write.table.type -> COPY_ON_WRITE
   hoodie.datasource.write.hive_style_partitioning -> true
   hoodie.bulkinsert.sort.mode -> NONE
   hoodie.datasource.write.keygenerator.class -> org.apache.hudi.keygen.ComplexKeyGenerator
   hoodie.datasource.write.drop.partition.columns -> true
   hoodie.datasource.write.partitionpath.field -> y,m,d
   hoodie.bloom.index.filter.type -> DYNAMIC_V0
   hoodie.bulkinsert.shuffle.parallelism -> 15000
   hoodie.index.bloom.num_entries -> 100000
   ```
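The size-related values in the config above are given in bytes. As a rough aid for reading them, here is a minimal sketch that parses a few of the `key -> value` lines and converts the Parquet sizes to MiB. The helper names `parse_config` and `to_mib` are made up for this illustration and are not part of Hudi.

```python
# A few lines copied from the config dump above.
CONFIG_DUMP = """
hoodie.parquet.small.file.limit -> 104857600
hoodie.parquet.max.file.size -> 209715200
hoodie.parquet.page.size -> 1048576
hoodie.parquet.block.size -> 209715200
hoodie.bulkinsert.shuffle.parallelism -> 15000
"""


def parse_config(dump):
    """Turn 'key -> value' lines into a dict of strings (hypothetical helper)."""
    options = {}
    for line in dump.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" -> ")
        options[key.strip()] = value.strip()
    return options


def to_mib(value):
    """Convert a byte count given as a string to MiB (hypothetical helper)."""
    return int(value) / (1024 * 1024)


options = parse_config(CONFIG_DUMP)

# Only the hoodie.parquet.* keys in this excerpt are byte sizes.
sizes_mib = {
    key: to_mib(value)
    for key, value in options.items()
    if key.startswith("hoodie.parquet.")
}

for key, mib in sizes_mib.items():
    print(f"{key}: {mib:g} MiB")
```

So the small-file limit is 100 MiB, the maximum file size and block size are 200 MiB each, and the page size is 1 MiB. A dict like `options` is also the shape that PySpark's `DataFrameWriter.options(**options)` accepts, assuming the job is written in PySpark.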
   
   The file 20211115193538.rollback contains the following:
   ```
   Objavro.schema�{"type":"record","name":"HoodieRollbackMetadata","namespace":"org.apache.hudi.avro.model","fields":[{"name":"startRollbackTime","type":{"type":"string","avro.java.string":"String"}},{"name":"timeTakenInMillis","type":"long"},{"name":"totalFilesDeleted","type":"int"},{"name":"commitsRollback","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}},{"name":"partitionMetadata","type":{"type":"map","values":{"type":"record","name":"HoodieRollbackPartitionMetadata","fields":[{"name":"partitionPath","type":{"type":"string","avro.java.string":"String"}},{"name":"successDeleteFiles","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}},{"name":"failedDeleteFiles","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}},{"name":"rollbackLogFiles","type":["null",{"type":"map","values":"long","avro.java.string":"String"}],"default":null},{"name":"writtenLogFiles","type":["null",{"type":"map","values":"long","avro.java.string":"String"}],"default":null}]},"avro.java.string":"String"}},{"name":"version","type":["int","null"],"default":1},{"name":"instantsRollback","type":{"type":"array","items":{"type":"record","name":"HoodieInstantInfo","fields":[{"name":"commitTime","type":{"type":"string","avro.java.string":"String"}},{"name":"action","type":{"type":"string","avro.java.string":"String"}}]},"default":[]},"default":[]}]} �H�y�ܘ�3��M��_20211115193538���G20211108130330 � y=2021/m=04/d=30 y=2021/m=04/d=30     y=2020/m=06/d=30 y=2020/m=06/d=30     y=2020/m=11/d=30 y=2020/m=11/d=30     y=2020/m=11/d=23 y=2020/m=11/d=23     y=2020/m=11/d=24 y=2020/m=11/d=24     y=2020/m=11/d=21 y=2020/m=11/d=21     y=2020/m=11/d=22 y=2020/m=11/d=22     y=2020/m=11/d=27 y=2020/m=11/d=27     y=2020/m=11/d=28 y=2020/m=11/d=28     y=2020/m=11/d=25 y=2020/m=11/d=25     y=2020/m=11/d=26 y=2020/m=11/d=26     y=2020/m=11/d=20 y=2020/m=11/d=20     y=2020/m=11/d=29
   ```
   This is followed by a list of partition keys and files on S3. Why is the rollback operation performed on so many partitions? Only the y=2021/m=07 partition was processed in this run.
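To make the mismatch concrete, here is a minimal sketch that pulls the distinct partition paths out of a text dump like the one above and lists those outside the partition that was expected to be touched. It assumes the `y=YYYY/m=MM/d=DD` layout seen in the dump. The `sample` string is a short fragment of that dump, and the helper names are made up for this illustration. The file itself is an Avro container, so a real Avro reader would be the more reliable way to inspect it.

```python
import re

# Matches hive-style partition paths as they appear in the dump above.
PARTITION_RE = re.compile(r"y=\d{4}/m=\d{2}/d=\d{2}")


def distinct_partitions(text):
    """Return partition paths in order of first appearance (hypothetical helper)."""
    seen = []
    for path in PARTITION_RE.findall(text):
        if path not in seen:
            seen.append(path)
    return seen


def outside_prefix(paths, prefix):
    """Return the paths that do not start with the expected prefix."""
    return [p for p in paths if not p.startswith(prefix)]


# Short fragment of the rollback dump; each path appears twice there.
sample = (
    "y=2021/m=04/d=30 y=2021/m=04/d=30     "
    "y=2020/m=06/d=30 y=2020/m=06/d=30     "
    "y=2020/m=11/d=30 y=2020/m=11/d=30"
)

paths = distinct_partitions(sample)
unexpected = outside_prefix(paths, "y=2021/m=07")

print(f"{len(paths)} distinct partitions, {len(unexpected)} outside y=2021/m=07")
for path in unexpected:
    print(path)
```

For this fragment, every partition listed in the rollback metadata falls outside y=2021/m=07.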
   
   Btw, Hudi has both the data and the table. What happens when some partitions are deleted and then indexed?

