Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/11/21 16:28:52 UTC
[GitHub] [hudi] nikita-sheremet-clearscale edited a comment on issue #4044: [SUPPORT] Question on hudi's insert statment taking too long
nikita-sheremet-clearscale edited a comment on issue #4044:
URL: https://github.com/apache/hudi/issues/4044#issuecomment-974849011
@xushiyan
Many thanks for the quick reply!!!
The Hudi config is:
```
hoodie.datasource.hive_sync.database -> hudi
hoodie.datasource.write.row.writer.enable -> false
hoodie.parquet.small.file.limit -> 104857600
hoodie.datasource.hive_sync.partition_fields -> y,m,d
hoodie.bloom.index.filter.dynamic.max.entries -> 100000000
hoodie.datasource.hive_sync.partition_extractor_class -> org.apache.hudi.hive.MultiPartKeysValueExtractor
hoodie.parquet.max.file.size -> 209715200
hoodie.parquet.page.size -> 1048576
hoodie.bloom.index.prune.by.ranges -> false
hoodie.parquet.block.size -> 209715200
hoodie.datasource.hive_sync.table -> table
hoodie.index.type -> BLOOM
hoodie.datasource.write.operation -> bulk_insert
hoodie.datasource.hive_sync.enable -> true
hoodie.datasource.write.recordkey.field -> device_uuid
hoodie.table.name -> table
hoodie.datasource.write.table.type -> COPY_ON_WRITE
hoodie.datasource.write.hive_style_partitioning -> true
hoodie.bulkinsert.sort.mode -> NONE
hoodie.datasource.write.keygenerator.class -> org.apache.hudi.keygen.ComplexKeyGenerator
hoodie.datasource.write.drop.partition.columns -> true
hoodie.datasource.write.partitionpath.field -> y,m,d
hoodie.bloom.index.filter.type -> DYNAMIC_V0
hoodie.bulkinsert.shuffle.parallelism -> 15000
hoodie.index.bloom.num_entries -> 100000
```
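For anyone trying to reproduce this, here is a rough sketch of how the options listed above could be passed to a Spark write. It assumes PySpark with the Hudi bundle on the classpath. The names `write_bulk_insert`, `df` and `base_path` are hypothetical and not taken from the actual job, and the `append` save mode is an assumption as well.

```python
# Options copied from the config listed above, as strings.
HUDI_OPTIONS = {
    "hoodie.table.name": "table",
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.operation": "bulk_insert",
    "hoodie.datasource.write.row.writer.enable": "false",
    "hoodie.datasource.write.recordkey.field": "device_uuid",
    "hoodie.datasource.write.partitionpath.field": "y,m,d",
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "hoodie.datasource.write.drop.partition.columns": "true",
    "hoodie.bulkinsert.sort.mode": "NONE",
    "hoodie.bulkinsert.shuffle.parallelism": "15000",
    "hoodie.index.type": "BLOOM",
    "hoodie.index.bloom.num_entries": "100000",
    "hoodie.bloom.index.filter.type": "DYNAMIC_V0",
    "hoodie.bloom.index.filter.dynamic.max.entries": "100000000",
    "hoodie.bloom.index.prune.by.ranges": "false",
    "hoodie.parquet.small.file.limit": "104857600",
    "hoodie.parquet.max.file.size": "209715200",
    "hoodie.parquet.block.size": "209715200",
    "hoodie.parquet.page.size": "1048576",
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.database": "hudi",
    "hoodie.datasource.hive_sync.table": "table",
    "hoodie.datasource.hive_sync.partition_fields": "y,m,d",
    "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
}


def write_bulk_insert(df, base_path, options=HUDI_OPTIONS):
    """Write a Spark DataFrame to a Hudi table with the options above.

    `df` is assumed to be a pyspark.sql.DataFrame and `base_path` the
    table location (for example an s3:// URI). Both are placeholders.
    """
    # Save mode "append" is an assumption; the real job may use another mode.
    df.write.format("hudi").options(**options).mode("append").save(base_path)
```

Nothing here changes the settings; it only shows where they would plug in.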
The file 20211115193538.rollback contains the following:
```
Objavro.schema�{"type":"record","name":"HoodieRollbackMetadata","namespace":"org.apache.hudi.avro.model","fields":[{"name":"startRollbackTime","type":{"type":"string","avro.java.string":"String"}},{"name":"timeTakenInMillis","type":"long"},{"name":"totalFilesDeleted","type":"int"},{"name":"commitsRollback","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}},{"name":"partitionMetadata","type":{"type":"map","values":{"type":"record","name":"HoodieRollbackPartitionMetadata","fields":[{"name":"partitionPath","type":{"type":"string","avro.java.string":"String"}},{"name":"successDeleteFiles","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}},{"name":"failedDeleteFiles","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}},{"name":"rollbackLogFiles","type":["null",{"type":"map","values":"long","avro.java.string":"String"}],"default":null},{"name":"writtenLogFiles","type":["null",{"type":"map","values":"
long","avro.java.string":"String"}],"default":null}]},"avro.java.string":"String"}},{"name":"version","type":["int","null"],"default":1},{"name":"instantsRollback","type":{"type":"array","items":{"type":"record","name":"HoodieInstantInfo","fields":[{"name":"commitTime","type":{"type":"string","avro.java.string":"String"}},{"name":"action","type":{"type":"string","avro.java.string":"String"}}]},"default":[]},"default":[]}]} �H�y�ܘ�3��M��_20211115193538���G20211108130330 � y=2021/m=04/d=30 y=2021/m=04/d=30 y=2020/m=06/d=30 y=2020/m=06/d=30 y=2020/m=11/d=30 y=2020/m=11/d=30 y=2020/m=11/d=23 y=2020/m=11/d=23 y=2020/m=11/d=24 y=2020/m=11/d=24 y=2020/m=11/d=21 y=2020/m=11/d=21 y=2020/m=11/d=22 y=2020/m=11/d=22 y=2020/m=11/d=27 y=2020/m=11/d=27 y=2020/m=11/d=28 y=2020/m=11/d=28 y=2020/m=11/d=25 y=2020/m=11/d=25 y=2020/m=11/d=26 y=2020/m=11/d=26 y=2020/m=11/d=20 y=2020/m=11/d=20 y=2020
/m=11/d=29
```
After that comes a list of partition keys and files on S3. Why is the rollback performed on so many partitions? The job only ran for the partition with y=2021/m=07.
By the way, Hudi has both the data and the table. What happens when some partitions are deleted and then indexed?
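To get a quick overview of which partitions the rollback file mentions, something like the following sketch could be used. It does not decode the Avro file properly; it only scans the raw bytes for hive-style `y=/m=/d=` paths like the ones in the dump above, so treat the counts as a rough indication. The function names and the idea of working on a local copy of the `.rollback` file are my assumptions.

```python
import re
from collections import Counter

# Matches hive-style partition paths such as y=2020/m=11/d=29.
PARTITION_RE = re.compile(r"y=(\d{4})/m=(\d{2})/d=(\d{2})")


def partitions_by_month(dump_text):
    """Count distinct partition paths per year-month found in a text dump."""
    # Drop whitespace first so that a path wrapped across lines is rejoined.
    joined = re.sub(r"\s+", "", dump_text)
    # Each path appears more than once in the dump, so deduplicate.
    distinct = set(PARTITION_RE.findall(joined))
    counts = Counter(f"{y}-{m}" for (y, m, _d) in distinct)
    return dict(sorted(counts.items()))


def partitions_in_file(path):
    """Same as above, for a local copy of a .rollback file (hypothetical path)."""
    with open(path, "rb") as fh:
        # latin-1 maps every byte to a character, so decoding never fails.
        return partitions_by_month(fh.read().decode("latin-1"))
```

If months other than 2021-07 show up in the result, that would confirm the rollback touched partitions outside the one the job was supposed to write.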