Posted to commits@hudi.apache.org by "Manoj Govindassamy (Jira)" <ji...@apache.org> on 2022/01/21 18:18:00 UTC

[jira] [Created] (HUDI-3301) Metadata table inline reading should be stateless and thread safe

Manoj Govindassamy created HUDI-3301:
----------------------------------------

             Summary: Metadata table inline reading should be stateless and thread safe
                 Key: HUDI-3301
                 URL: https://issues.apache.org/jira/browse/HUDI-3301
             Project: Apache Hudi
          Issue Type: Task
            Reporter: Manoj Govindassamy
            Assignee: Ethan Guo
             Fix For: 0.11.0


Metadata table inline reading (enable.full.scan.log.files = false) today alters instance member fields and is not thread safe.

 

When inline reading is enabled, HoodieMetadataMergedLogRecordReader does not do a full read of the log and base files and does not fill the ExternalSpillableMap records cache. By design, each getRecordsByKeys() call therefore re-reads the log and base files. The issue is that this read alters instance members, even though the records it fills in are relevant only to that request. Any concurrent getRecordsByKeys() call modifies the same member variables, leading to an NPE.

 

To avoid this, a temporary fix that makes getRecordsByKeys() a synchronized method has been pushed to master. But this fix doesn't cover all use cases. We need to make the whole class stateless and thread safe for inline reading.
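
As a rough illustration of what "stateless" could mean here, the sketch below keeps per-request records in a local variable instead of an instance member. It is not Hudi code: the class StatelessLookupSketch, its storage field, and the String key/value types are hypothetical stand-ins for the reader and the log and base file contents. Only the method name getRecordsByKeys() is borrowed from the description above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not the actual HoodieMetadataMergedLogRecordReader.
public class StatelessLookupSketch {
    // Stand-in for the log and base file contents (assumption).
    // It is never mutated after construction, so sharing it is safe.
    private final Map<String, String> storage;

    public StatelessLookupSketch(Map<String, String> storage) {
        this.storage = Collections.unmodifiableMap(new HashMap<>(storage));
    }

    // Every call fills its own local map. No instance member is written,
    // so concurrent callers cannot see or clobber each other's records.
    public Map<String, Optional<String>> getRecordsByKeys(List<String> keys) {
        Map<String, Optional<String>> records = new HashMap<>();
        for (String key : keys) {
            records.put(key, Optional.ofNullable(storage.get(key)));
        }
        return records;
    }
}
```

With this shape the method needs no synchronized keyword, because the only shared state is read-only. Whether the real reader can be restructured this way depends on details not covered in this issue.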



--
This message was sent by Atlassian Jira
(v8.20.1#820001)