Posted to commits@hudi.apache.org by "Prashant Wason (Jira)" <ji...@apache.org> on 2023/04/18 06:02:00 UTC

[jira] [Created] (HUDI-6092) Reuse schema objects while reading large number of log blocks

Prashant Wason created HUDI-6092:
------------------------------------

             Summary: Reuse schema objects while reading large number of log blocks
                 Key: HUDI-6092
                 URL: https://issues.apache.org/jira/browse/HUDI-6092
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: Prashant Wason
            Assignee: Prashant Wason


Some log files may contain a large number of log blocks. When such a log file is read, the schema string in each block's header is read and parsed into a Schema object. Because the schema string is most likely the same for every block, parsing it again and again adds parsing overhead as well as memory overhead for the duplicate Schema objects.

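An optimization is to cache the parsed Schema objects, keyed by the schema string, so that each distinct schema is parsed only once and the same object is reused for every block that carries it.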
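A minimal sketch of the caching idea, in Java since Hudi is a Java project. SchemaCache and getOrParse are hypothetical names, not Hudi's API. The parser is passed in as a function so the example runs without Avro. With Avro it would presumably be s -> new Schema.Parser().parse(s), using a fresh Schema.Parser per call, since a single parser instance remembers the named types it has already seen.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper (not part of Hudi): reuses the parsed schema object for
// every log block whose header carries an identical schema string.
class SchemaCache<S> {
  // Keyed by the raw schema string from the block header.
  private final Map<String, S> cache = new ConcurrentHashMap<>();
  // Parsing function, e.g. s -> new Schema.Parser().parse(s) with Avro.
  private final Function<String, S> parser;

  SchemaCache(Function<String, S> parser) {
    this.parser = parser;
  }

  // Parses the string the first time it is seen; later calls return the same object.
  S getOrParse(String schemaString) {
    return cache.computeIfAbsent(schemaString, parser);
  }

  // Number of distinct schema strings parsed so far.
  int size() {
    return cache.size();
  }
}

class SchemaCacheDemo {
  public static void main(String[] args) {
    AtomicInteger parseCalls = new AtomicInteger();
    // Stand-in parser so the sketch runs without Avro: counts invocations and
    // returns a fresh String object on each call.
    SchemaCache<String> cache = new SchemaCache<>(s -> {
      parseCalls.incrementAndGet();
      return new String(s);
    });

    // Simulate reading 1000 log blocks that all carry the same schema string.
    String header = "{\"type\":\"record\",\"name\":\"rec\",\"fields\":[]}";
    for (int block = 0; block < 1000; block++) {
      cache.getOrParse(header);
    }
    // Prints "parsed 1 time(s) for 1000 blocks"
    System.out.println("parsed " + parseCalls.get() + " time(s) for 1000 blocks");
  }
}
```

A ConcurrentHashMap keeps the cache safe if several readers share it. The map is unbounded, which assumes a log file carries only a handful of distinct schemas.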



--
This message was sent by Atlassian Jira
(v8.20.10#820010)