Posted to commits@hudi.apache.org by "Prashant Wason (Jira)" <ji...@apache.org> on 2022/05/14 00:51:00 UTC

[jira] [Commented] (HUDI-53) Implement Record level Index to map a record key to a <partition path, FileID> pair #90

    [ https://issues.apache.org/jira/browse/HUDI-53?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536945#comment-17536945 ] 

Prashant Wason commented on HUDI-53:
------------------------------------

The [schema for the record index|https://github.com/apache/hudi/pull/5581/files#diff-66abb79a1d28adea3315c48a6fc334247c7fb9a795bd59f093f9c8bc2da1a91a] is optimized to store the record_key -> fileID mapping compactly. With GZIP compression it currently takes about 48 to 50 bytes per mapping stored in the record index. I think ZSTD may save a few more bytes.
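At roughly 50 bytes per mapping, every field in an entry matters. Below is a minimal sketch of one hypothetical way to shrink an entry: store a UUID-based fileID as two 64-bit longs plus a small integer suffix instead of as a string. The `<uuid>-<index>` fileID layout and the `CompactFileId` class are assumptions for illustration, not the schema in the linked PR.

```java
import java.util.UUID;

// Hypothetical compact representation of a fileID of the assumed form "<uuid>-<index>".
// The 36-character UUID string becomes two longs (16 bytes) plus one int.
final class CompactFileId {
    final long highBits;
    final long lowBits;
    final int fileIndex;

    CompactFileId(long highBits, long lowBits, int fileIndex) {
        this.highBits = highBits;
        this.lowBits = lowBits;
        this.fileIndex = fileIndex;
    }

    // The UUID itself contains dashes, so split on the last dash only.
    static CompactFileId encode(String fileId) {
        int sep = fileId.lastIndexOf('-');
        if (sep <= 0) {
            throw new IllegalArgumentException("Unexpected fileID format: " + fileId);
        }
        UUID uuid = UUID.fromString(fileId.substring(0, sep));
        int index = Integer.parseInt(fileId.substring(sep + 1));
        return new CompactFileId(uuid.getMostSignificantBits(), uuid.getLeastSignificantBits(), index);
    }

    // Rebuilds the string form (UUID.toString emits lowercase hex).
    String decode() {
        return new UUID(highBits, lowBits) + "-" + fileIndex;
    }
}
```

A general-purpose codec such as GZIP or ZSTD then compresses whatever remains; a fixed-width binary layout like this simply gives it less to work with in the first place.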

 

[Various configs|https://github.com/apache/hudi/pull/5581/files#diff-11e9ef6bd53ef1001b669a1dc68dde2aba9b33c9eb72cc1e4198750336d79772] for the metadata table (MT) record index.

 

[A record key iterator|https://github.com/apache/hudi/pull/5581/files#diff-afa34d95ad0690283d7a741ccfe1d3fc7df9e2f561bc9cc9c5ba25fa3b57a30b] for the various base file readers ([HFileReader|https://github.com/apache/hudi/pull/5581/files#diff-0abe0627b252c5eef221374b5e91f34d09f457f52e5d9798aee5ef79111c5adb], [ORCReader|https://github.com/apache/hudi/pull/5581/files#diff-3abdb5ba0f56065ad767e0b5690a80493fcefccdfbc8e3500a3e68f0f8f6ca8b], [ParquetReader|https://github.com/apache/hudi/pull/5581/files#diff-d7264f7fc03aefba56a28e84cf897ad88f1e99e79a107df8ba27b546514ce1e4]). This is used to read the record keys while initializing the record index.
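A minimal sketch of the idea behind a record key iterator: wrap a reader's row iterator and lazily project each row to its record key, so that keys are streamed during initialization instead of being materialized as full records. Rows are modeled here as plain maps keyed by Hudi's `_hoodie_record_key` meta column. The `RecordKeyIterator` class is hypothetical and is not the HFile/ORC/Parquet reader API from the PR.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical: lazily projects rows from a base file reader to their record keys.
final class RecordKeyIterator implements Iterator<String> {
    static final String RECORD_KEY_FIELD = "_hoodie_record_key";

    private final Iterator<Map<String, Object>> rows;

    RecordKeyIterator(Iterator<Map<String, Object>> rows) {
        this.rows = rows;
    }

    @Override
    public boolean hasNext() {
        return rows.hasNext();
    }

    @Override
    public String next() {
        if (!rows.hasNext()) {
            throw new NoSuchElementException();
        }
        // Only the key column is extracted; the rest of the row is discarded immediately.
        return String.valueOf(rows.next().get(RECORD_KEY_FIELD));
    }
}
```

A real columnar reader (ORC, Parquet) can go further and read only the key column from disk. The iterator shape is what lets every engine consume keys the same way.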

 

Changed how metadata table partitions are enabled
In the current code, we enable and check for metadata table partitions through the WriteConfig. This does not work well for synchronous updates to the table, because a faulty config can render the MT inconsistent.

[Changed this to save the enabled state of an MT partition in the hoodie.properties file after initialization|https://github.com/apache/hudi/pull/5581/files#diff-53ae78ff1f1bd5d8b0f87cb69853299e5228b44f30b770e27f60c0c3c27d4185]. The checks are now {{table.getMetaClient().getTableConfig().isMetadataTableEnabled()}} instead of {{config.isMetadataTableEnabled()}}.
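A minimal sketch of the distinction: the enabled state of each MT partition is read from the table's own persisted properties rather than from each writer's config. The property key `hoodie.table.metadata.partitions`, its comma-separated format, and the `TableConfigSketch` class are assumptions for illustration. Writing the properties back to storage is omitted.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical stand-in for table-level config persisted in hoodie.properties.
final class TableConfigSketch {
    // Assumed key holding a comma-separated list of fully initialized MT partitions.
    static final String METADATA_PARTITIONS_KEY = "hoodie.table.metadata.partitions";

    private final Properties props;

    private TableConfigSketch(Properties props) {
        this.props = props;
    }

    // Parses properties-file text, as it would appear in hoodie.properties.
    static TableConfigSketch load(String contents) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(contents));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new TableConfigSketch(p);
    }

    Set<String> enabledMetadataPartitions() {
        Set<String> result = new LinkedHashSet<>();
        for (String part : props.getProperty(METADATA_PARTITIONS_KEY, "").split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    // Writers consult the table's persisted state, not their own write config.
    boolean isMetadataPartitionEnabled(String partition) {
        return enabledMetadataPartitions().contains(partition);
    }

    // Intended to be called only after a partition finishes initializing,
    // so a half-built partition is never recorded as enabled.
    void markPartitionInitialized(String partition) {
        Set<String> partitions = enabledMetadataPartitions();
        partitions.add(partition);
        props.setProperty(METADATA_PARTITIONS_KEY, String.join(",", partitions));
    }
}
```

Because every writer reads the same persisted state, one writer with a stale or wrong config cannot silently skip updating a partition that the table already relies on.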

A new HoodieIndex type, RECORD_INDEX, [and its implementation|https://github.com/apache/hudi/pull/5581/files#diff-b22610e17825aeccb587f64b3dd0fedfe428d4f33b0d2a25a8d258a23cd66323]. This index uses the MT record_index partition to perform the tag and update operations.
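A minimal sketch of what tagging and updating against a record-level index amounts to, with a `HashMap` standing in for the MT record_index partition. The class, record, and method shapes are hypothetical and only mirror the tag/update roles described above. Deletes and records moving between partitions are ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for the MT record_index partition.
final class RecordIndexSketch {
    // Where a record lives: data-table partition path and file group ID.
    record Location(String partitionPath, String fileId) {}

    // Result of tagging: the incoming key plus its current location, if any.
    record Tagged(String recordKey, Optional<Location> location) {
        boolean isUpdate() {
            return location.isPresent();
        }
    }

    private final Map<String, Location> index = new HashMap<>();

    // Tag: look up each incoming key; a hit means the record is an update to an existing file group.
    List<Tagged> tagLocation(List<String> recordKeys) {
        List<Tagged> out = new ArrayList<>(recordKeys.size());
        for (String key : recordKeys) {
            out.add(new Tagged(key, Optional.ofNullable(index.get(key))));
        }
        return out;
    }

    // Update: after a successful write, record where each written key now lives.
    void updateLocation(Map<String, Location> written) {
        index.putAll(written);
    }
}
```

Unlike a Bloom-filter index, a hit here is exact, so no candidate files need to be opened to confirm whether a key exists.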

By default, HoodieWriteHandle does not track the written records within WriteStatus, in order to save memory. But MT partitions like record_index need information about the records inserted into or updated in the dataset. Hence, [we need to track written records within WriteStatus|https://github.com/apache/hudi/pull/5581/files#diff-63a77e05c924278c190061a1a18a992a7f9480af14f0f34f4328bf72ae673fe9] in two cases:
      1. When the HoodieIndex being used is not implicit with storage
      2. When any of the metadata table partitions that require written-record tracking (record index, etc.) are enabled
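The two cases above reduce to a small predicate. Below is a minimal sketch. The `WriteTrackingPolicy` class and the assumed set of partitions that need written records are hypothetical.

```java
import java.util.Set;

// Hypothetical helper deciding whether a write handle keeps written records in WriteStatus.
final class WriteTrackingPolicy {
    // Assumed set of MT partitions that need per-record write information.
    static final Set<String> PARTITIONS_NEEDING_RECORDS = Set.of("record_index");

    static boolean shouldTrackWrittenRecords(boolean indexImplicitWithStorage,
                                             Set<String> enabledMtPartitions) {
        // Case 1: the HoodieIndex in use is not implicit with storage.
        if (!indexImplicitWithStorage) {
            return true;
        }
        // Case 2: an enabled MT partition requires written-record tracking.
        for (String partition : enabledMtPartitions) {
            if (PARTITIONS_NEEDING_RECORDS.contains(partition)) {
                return true;
            }
        }
        // Otherwise skip tracking to save memory.
        return false;
    }
}
```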

 

File groups in each partition are fixed at creation time, and we never want them to be split into multiple files. [Hence we use a very large base file size|https://github.com/apache/hudi/pull/5581/files#diff-b20dd7a7d374928dc9936cc33789ba1839da3a10883fc65d62fcccf84b81ed4f] in the metadata table.

In the metadata table, the [log blocks should be as large as the maximum log file size|https://github.com/apache/hudi/pull/5581/files#diff-b20dd7a7d374928dc9936cc33789ba1839da3a10883fc65d62fcccf84b81ed4f]. This reduces the overall number of log blocks and speeds up key lookups in HFileLogBlocks.

 

[Initialization of the record index|https://github.com/apache/hudi/pull/5581/files#diff-b20dd7a7d374928dc9936cc33789ba1839da3a10883fc65d62fcccf84b81ed4f] for all engines, by reading record keys from the base files.
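A minimal, single-threaded sketch of initialization: walk the latest base files, read each file's record keys, and emit one key -> (partition path, fileID) entry per key. The `BaseFile` and `Location` records are hypothetical, and the duplicate-key check assumes keys are unique across the table. A real engine would read files in parallel and write the entries into the MT rather than into a `HashMap`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: build initial record index entries from the latest base files.
final class RecordIndexBootstrap {
    record BaseFile(String partitionPath, String fileId, List<String> recordKeys) {}

    record Location(String partitionPath, String fileId) {}

    static Map<String, Location> initialize(List<BaseFile> latestBaseFiles) {
        Map<String, Location> entries = new HashMap<>();
        for (BaseFile file : latestBaseFiles) {
            Location loc = new Location(file.partitionPath(), file.fileId());
            for (String key : file.recordKeys()) {
                Location previous = entries.putIfAbsent(key, loc);
                // Assumes table-wide key uniqueness; the same key in two file groups is an error.
                if (previous != null && !previous.equals(loc)) {
                    throw new IllegalStateException("Duplicate record key: " + key);
                }
            }
        }
        return entries;
    }
}
```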

 

[Estimation of the file group count to use for an MT partition|https://github.com/apache/hudi/pull/5581/files#diff-b20dd7a7d374928dc9936cc33789ba1839da3a10883fc65d62fcccf84b81ed4f]. Different partitions store different amounts of information and hence need different file group counts. These are hard to estimate manually when thousands of datasets are involved in a production rollout, so by default this code estimates an appropriate file group count for each partition. The WriteConfig can still be used to override it with a manual value.
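A minimal sketch of one way such an estimate could work: size the partition from an expected record count and a per-entry size (for the record index, the 48 to 50 bytes per mapping mentioned earlier), divide by a target maximum file group size, and clamp the result to configurable bounds. The method name, parameters, and bounds are hypothetical. The linked code may use a different formula.

```java
// Hypothetical estimator; parameter names and clamping bounds are assumptions.
final class FileGroupCountEstimator {
    static int estimate(long recordCount, long bytesPerRecord, long maxFileGroupBytes,
                        int minFileGroups, int maxFileGroups) {
        if (recordCount <= 0) {
            return minFileGroups;
        }
        long totalBytes = recordCount * bytesPerRecord;
        // Ceiling division: enough file groups that none exceeds maxFileGroupBytes.
        long needed = (totalBytes + maxFileGroupBytes - 1) / maxFileGroupBytes;
        // Clamp to [minFileGroups, maxFileGroups]; the result always fits in an int.
        return (int) Math.max(minFileGroups, Math.min(maxFileGroups, needed));
    }
}
```

For example, 270 billion entries at 50 bytes each with 1 GiB file groups gives `estimate(270_000_000_000L, 50, 1L << 30, 1, 100_000)` = 12573 file groups.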

 

[Bulk insert into the MT when a partition is being initialized|https://github.com/apache/hudi/pull/5581/files#diff-51e81a343e90f5c52e69c184b4eb6718542affde99dee3c85af9edb6425a5e19] for the first time. This has several benefits at scale:

 - Avoids building the Workload Profile, which needs a lot of memory and is slow

 - Is fast: 270 billion records were indexed in 7.5 hours
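As a rough check of what the quoted figure implies, assuming the 7.5 hours is wall-clock time for the whole initialization, the average indexing throughput works out to:

```latex
\frac{270 \times 10^{9}\ \text{records}}{7.5\ \text{h} \times 3600\ \text{s/h}}
  = \frac{2.7 \times 10^{11}\ \text{records}}{2.7 \times 10^{4}\ \text{s}}
  = 10^{7}\ \text{records/s}
```

That is about 10 million records per second on average, summed across however many executors the job used.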

 

[A new BulkInsertPartitioner for the MT|https://github.com/apache/hudi/pull/5581/files#diff-65089796097739c8b1a6b5be58cc4a9d15c8f754f55f25b03e7a2871cfe5e9d3], which is required for sharding the records into the correct file groups.
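A minimal sketch of the sharding step. It assumes keys are assigned to file groups by hashing the key modulo the fixed file group count, so the same key always maps to the same file group at both write time and lookup time. The hash choice and the `RecordIndexSharding` names are assumptions, not necessarily what the linked partitioner does. Keys are sorted within each shard because HFile base files must be written in key order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sharding of record keys into a fixed number of MT file groups.
final class RecordIndexSharding {
    static int fileGroupIndex(String recordKey, int numFileGroups) {
        if (numFileGroups <= 0) {
            throw new IllegalArgumentException("numFileGroups must be positive");
        }
        // floorMod keeps the result in [0, numFileGroups) even for negative hash codes.
        return Math.floorMod(recordKey.hashCode(), numFileGroups);
    }

    // Groups keys by target file group and sorts each group for an ordered HFile write.
    static Map<Integer, List<String>> shard(List<String> recordKeys, int numFileGroups) {
        Map<Integer, List<String>> shards = new TreeMap<>();
        for (String key : recordKeys) {
            shards.computeIfAbsent(fileGroupIndex(key, numFileGroups), i -> new ArrayList<>()).add(key);
        }
        shards.values().forEach(keys -> keys.sort(Comparator.naturalOrder()));
        return shards;
    }
}
```

Because the file group count is fixed at creation time, this mapping stays stable for the lifetime of the partition, which is why file groups must never split.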

[Metrics for the Hudi Bloom indexes|https://github.com/apache/hudi/pull/5581/files#diff-04cb169f456ebe056b91868bcacea9eb8e26a99816b777ca9d84ffe5eb8521a7] and [HBaseIndex|https://github.com/apache/hudi/pull/5581/files#diff-5fc348b9beea8b086a96808f76f9527a6076334d631d45c5cefb261e7155cad4]. These are useful for comparing the indexes as well as for debugging ingestion issues.

 

 

[Parallel reading of keys from the MT|https://github.com/apache/hudi/pull/5581/files#diff-7c43aea81a02b4f135452b50eaa36d5868081e72b37d43101ca9de1f9ebb5195] with an interface optimized for a large number of reads (e.g. tagLocation on millions of keys). The existing interface uses a List<>, which is less performant than using a Map.
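A minimal sketch of why a batched, Map-returning interface helps. Keys are grouped by the file group they hash to, and each group is looked up once (here concurrently via a parallel stream). The results are then merged into a single Map that callers can probe per key in constant time. The `lookupInFileGroup` callback stands in for reading an MT file group, and the hash-mod grouping is an assumption for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;
import java.util.stream.Collectors;

// Hypothetical batched, parallel key lookup against sharded MT file groups.
final class ParallelKeyLookup {
    static Map<String, String> lookup(List<String> keys, int numFileGroups,
            BiFunction<Integer, List<String>, Map<String, String>> lookupInFileGroup) {
        // Group the distinct keys by the file group they hash to.
        Map<Integer, List<String>> byGroup = keys.stream()
            .distinct()
            .collect(Collectors.groupingBy(k -> Math.floorMod(k.hashCode(), numFileGroups)));
        // Each file group is read once; results merge into a thread-safe map.
        // Keys not found are simply absent. Returned maps must not contain null values.
        Map<String, String> result = new ConcurrentHashMap<>();
        byGroup.entrySet().parallelStream()
            .forEach(e -> result.putAll(lookupInFileGroup.apply(e.getKey(), e.getValue())));
        return result;
    }
}
```

Returning a Map instead of a List avoids a linear scan per key when the caller later resolves each of millions of records.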

 

> Implement Record level Index to map a record key to a <partition path, FileID> pair #90
> ---------------------------------------------------------------------------------------
>
>                 Key: HUDI-53
>                 URL: https://issues.apache.org/jira/browse/HUDI-53
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: metadata, writer-core
>    Affects Versions: 0.9.0
>            Reporter: Vinoth Chandar
>            Assignee: Prashant Wason
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.12.0
>
>
> [https://github.com/uber/hudi/issues/90] 
>  
> feature-enquiry
>  * [https://github.com/apache/hudi/issues/4058]



--
This message was sent by Atlassian Jira
(v8.20.7#820007)