Posted to commits@hudi.apache.org by "Danny Chen (Jira)" <ji...@apache.org> on 2023/04/22 08:37:00 UTC

[jira] [Closed] (HUDI-6092) Reuse schema objects while reading large number of log blocks

     [ https://issues.apache.org/jira/browse/HUDI-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen closed HUDI-6092.
----------------------------
    Resolution: Fixed

Fixed via master branch: 09a7953f12535be9be746809fb79fd5f23df083f

> Reuse schema objects while reading large number of log blocks
> -------------------------------------------------------------
>
>                 Key: HUDI-6092
>                 URL: https://issues.apache.org/jira/browse/HUDI-6092
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Prashant Wason
>            Assignee: Prashant Wason
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.13.1, 0.14.0
>
>
> Some log files contain a large number of log blocks. When such a log file is read, the schema string is read from each log block's header and parsed into a Schema object. The schema string will most likely be the same for every block, so parsing it again and again incurs both parsing overhead and memory overhead.
> An optimization is to cache the parsed schema objects.
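
For illustration only, the caching idea described above could be sketched roughly as follows. This is not the code from the referenced commit. The class name SchemaCache, its get method, and the pluggable parser function are hypothetical; with Avro, the parser would presumably be something like s -> new Schema.Parser().parse(s), but a stand-in is used here so the sketch runs with the JDK alone.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical helper: caches parsed schema objects keyed by the raw schema
 * string, so that identical strings are parsed only once.
 */
final class SchemaCache<S> {
    private final Map<String, S> cache = new ConcurrentHashMap<>();
    private final Function<String, S> parser;

    SchemaCache(Function<String, S> parser) {
        this.parser = parser;
    }

    /** Returns the cached object for this schema string, parsing it on first use. */
    S get(String schemaString) {
        return cache.computeIfAbsent(schemaString, parser);
    }

    /** Number of distinct schema strings seen so far. */
    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        AtomicInteger parseCalls = new AtomicInteger();

        // Stand-in parser (assumption): counts invocations and returns a new object.
        SchemaCache<StringBuilder> schemas = new SchemaCache<>(s -> {
            parseCalls.incrementAndGet();
            return new StringBuilder(s);
        });

        // Simulate reading many log blocks whose headers carry the same schema string.
        String header = "{\"type\":\"record\",\"name\":\"r\",\"fields\":[]}";
        for (int block = 0; block < 1000; block++) {
            schemas.get(header);
        }

        System.out.println("parse calls: " + parseCalls.get());
        System.out.println("cached schemas: " + schemas.size());
    }
}
```

Keying the cache on the schema string means blocks with different schemas still get their own parsed object, while the common case of identical headers shares a single instance.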



--
This message was sent by Atlassian Jira
(v8.20.10#820010)