Posted to commits@hudi.apache.org by "Ethan Guo (Jira)" <ji...@apache.org> on 2022/04/19 00:18:00 UTC

[jira] [Updated] (HUDI-3301) MergedLogRecordReader inline reading should be stateless and thread safe

     [ https://issues.apache.org/jira/browse/HUDI-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-3301:
----------------------------
    Priority: Blocker  (was: Critical)

> MergedLogRecordReader inline reading should be stateless and thread safe
> ------------------------------------------------------------------------
>
>                 Key: HUDI-3301
>                 URL: https://issues.apache.org/jira/browse/HUDI-3301
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: metadata
>            Reporter: Manoj Govindassamy
>            Assignee: Yue Zhang
>            Priority: Blocker
>             Fix For: 0.12.0
>
>
> Metadata table inline reading (enable.full.scan.log.files = false) currently mutates instance member fields and is not thread safe.
>  
> When inline reading is enabled, HoodieMetadataMergedLogRecordReader does not do a full read of the log and base files and does not fill the ExternalSpillableMap records cache. Each getRecordsByKeys() call therefore re-reads the log and base files, by design. The problem is that this read mutates instance members, even though the records it fills in are relevant only to that request. A concurrent getRecordsByKeys() call modifies the same member variables, which leads to an NPE.
>  
> To avoid this, a temporary fix that makes getRecordsByKeys() a synchronized method has been pushed to master, but it doesn't cover all use cases. We need to make the whole class stateless and thread safe for inline reading.
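
For illustration only, the stateless direction described above could look roughly like the sketch below. This is not Hudi code: StatelessInlineReader, InlineReadSketch, and countMismatches are hypothetical names, and an in-memory map stands in for the log and base files. The point it shows is that each getRecordsByKeys() call keeps its records in a local map instead of an instance field, so concurrent callers have no shared mutable state and no synchronized keyword is needed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a merged log record reader; not Hudi's actual class.
final class StatelessInlineReader {
    // Never modified after construction; simulates the contents of the log files.
    private final Map<String, String> logFileContents;

    StatelessInlineReader(Map<String, String> logFileContents) {
        this.logFileContents = Collections.unmodifiableMap(new HashMap<>(logFileContents));
    }

    // Every call builds its own result map, so no request can see or
    // overwrite the records that belong to another request.
    Map<String, String> getRecordsByKeys(List<String> keys) {
        Map<String, String> perRequestRecords = new HashMap<>();
        for (String key : keys) {
            String payload = logFileContents.get(key);
            if (payload != null) {
                perRequestRecords.put(key, payload);
            }
        }
        return perRequestRecords;
    }
}

final class InlineReadSketch {
    // Runs concurrent lookups against one shared reader and counts wrong answers.
    static int countMismatches(int threads, int requestsPerThread) {
        Map<String, String> contents = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            contents.put("key-" + i, "value-" + i);
        }
        StatelessInlineReader reader = new StatelessInlineReader(contents);

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int offset = t;
                Callable<Integer> task = () -> {
                    int mismatches = 0;
                    for (int r = 0; r < requestsPerThread; r++) {
                        int index = (offset + r) % 100;
                        String key = "key-" + index;
                        Map<String, String> result =
                            reader.getRecordsByKeys(Collections.singletonList(key));
                        if (result.size() != 1 || !("value-" + index).equals(result.get(key))) {
                            mismatches++;
                        }
                    }
                    return mismatches;
                };
                futures.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> future : futures) {
                total += future.get();
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException("concurrent lookup failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(8, 10_000));
    }
}
```

Compared with the synchronized workaround, this shape lets lookups run in parallel, at the cost of allocating a result map per request. Whether the real reader can be restructured this way depends on details not covered in this issue description.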



--
This message was sent by Atlassian Jira
(v8.20.1#820001)