Posted to commits@hudi.apache.org by "Manoj Govindassamy (Jira)" <ji...@apache.org> on 2022/01/22 07:41:00 UTC

[jira] [Commented] (HUDI-3300) Timeline server FSViewManager should avoid inline reading for metadata file partition

    [ https://issues.apache.org/jira/browse/HUDI-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480362#comment-17480362 ] 

Manoj Govindassamy commented on HUDI-3300:
------------------------------------------

I verified the timeline server: it has reuse enabled, the opened readers are retained, and the listed files are cached at a higher level, so requests are served faster. In all other cases, where {{reuse}} is false, we should not cache the reader handles at all. Today we cache the readers and close them towards the end, so multiple non-{{reuse}} requests arriving at the same time can end up using the same readers. Full scan versus inline scan is a totally different problem, and I am not going into it here. In the worst case, two non-{{reuse}} requests can share the same reader, along with the latest file slice that was retrieved when the reader was cached. If the file slice then moves forward for any reason, for example because of concurrent upserts, the readers would not know about it and would keep reading the old file slices.
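
The distinction between the two paths can be sketched in Java as below. This is only an illustration under stated assumptions, not Hudi code: {{SliceReader}}, {{ReaderProvider}}, {{readSliceFor}} and {{latestSlice}} are hypothetical names, and a plain string stands in for the latest file slice. With reuse enabled the reader is cached and stays pinned to the slice it was opened on; with reuse disabled each request opens its own reader and closes it, so it sees the slice that is current at request time.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ReaderReuseSketch {

    /** Hypothetical stand-in for a reader pinned to the file slice it was opened on. */
    static final class SliceReader implements AutoCloseable {
        final String fileSlice;
        boolean closed = false;

        SliceReader(String fileSlice) {
            this.fileSlice = fileSlice;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical lookup path: reader handles are cached only when reuse is true. */
    static final class ReaderProvider {
        private final boolean reuse;
        private final Map<String, SliceReader> cache = new ConcurrentHashMap<>();
        final AtomicInteger opened = new AtomicInteger();
        // Stand-in for the latest file slice; concurrent upserts may advance it.
        volatile String latestSlice = "slice-v1";

        ReaderProvider(boolean reuse) {
            this.reuse = reuse;
        }

        private SliceReader open() {
            opened.incrementAndGet();
            return new SliceReader(latestSlice);
        }

        /** Returns the file slice that the reader serving this request is reading. */
        String readSliceFor(String partition) {
            if (reuse) {
                // Reuse path: the cached reader keeps the slice it was opened on.
                return cache.computeIfAbsent(partition, p -> open()).fileSlice;
            }
            // Non-reuse path: a private reader per request, closed when the request ends.
            try (SliceReader reader = open()) {
                return reader.fileSlice;
            }
        }
    }

    public static void main(String[] args) {
        ReaderProvider shared = new ReaderProvider(true);
        ReaderProvider fresh = new ReaderProvider(false);
        shared.readSliceFor("files");
        fresh.readSliceFor("files");

        // Simulate a concurrent upsert moving the file slice forward.
        shared.latestSlice = "slice-v2";
        fresh.latestSlice = "slice-v2";

        System.out.println("reuse=true  reads " + shared.readSliceFor("files"));
        System.out.println("reuse=false reads " + fresh.readSliceFor("files"));
    }
}
```

The trade-off in this sketch is that the non-reuse path pays for opening a reader on every request, in exchange for never sharing a handle or a stale slice between requests.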

 

Siva: 
To my understanding, you are mostly right in your explanation, but tell me something. When the second reader comes through, if the latest commit time hasn't changed, why would there be new updates to the file slice? Conversely, if there were updates to the latest file slice (with new log appends), the latest commit time would have been updated, so the caller should have re-initialized the file system view, right? Or does this re-initialization happen only in the case of the timeline server, and elsewhere we don't keep refreshing the fileSystemView? I know this is very tricky. We definitely need a good understanding of every nitty-gritty detail here.
 

> Timeline server FSViewManager should avoid inline reading for metadata file partition
> -------------------------------------------------------------------------------------
>
>                 Key: HUDI-3300
>                 URL: https://issues.apache.org/jira/browse/HUDI-3300
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Manoj Govindassamy
>            Assignee: Ethan Guo
>            Priority: Blocker
>             Fix For: 0.11.0
>
>
> When inline reading is enabled, that is, hoodie.metadata.enable.full.scan.log.files = false, MetadataMergedLogRecordReader doesn't cache the file listing records via the ExternalSpillableMap. So every file listing leads to re-reading the log and base files of the metadata files partition. Since the files partition is small, the TimelineServer should construct the FSViewManager with inline reading disabled for the metadata files partition, even when inline reading is enabled.
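
The cost described in the issue can be sketched with a toy model in Java. Everything here is a hypothetical illustration, not Hudi's implementation: {{MetadataLogSketch}}, {{listFiles}} and {{storageReads}} are made-up names, and a plain {{HashMap}} stands in for the ExternalSpillableMap. With full scan on, the records are read once and later listings are served from the cache; with inline reading, every listing goes back to storage.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ListingCacheSketch {

    /** Hypothetical model of file-listing lookups against a metadata "files" partition. */
    static final class MetadataLogSketch {
        private final Map<String, List<String>> storedRecords;
        private final boolean fullScan;
        // Filled on the first lookup when fullScan is true; never used otherwise.
        private Map<String, List<String>> cache;
        int storageReads = 0;

        MetadataLogSketch(Map<String, List<String>> storedRecords, boolean fullScan) {
            this.storedRecords = storedRecords;
            this.fullScan = fullScan;
        }

        List<String> listFiles(String partition) {
            if (fullScan) {
                if (cache == null) {
                    storageReads++; // one full scan, then the records stay cached
                    cache = new HashMap<>(storedRecords);
                }
                return cache.getOrDefault(partition, Collections.emptyList());
            }
            storageReads++; // inline reading: each listing re-reads storage
            return storedRecords.getOrDefault(partition, Collections.emptyList());
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> records = new HashMap<>();
        records.put("2022/01/22", Arrays.asList("file-a.parquet", "file-b.parquet"));

        MetadataLogSketch cached = new MetadataLogSketch(records, true);
        MetadataLogSketch inline = new MetadataLogSketch(records, false);
        for (int i = 0; i < 3; i++) {
            cached.listFiles("2022/01/22");
            inline.listFiles("2022/01/22");
        }
        System.out.println("full scan storage reads: " + cached.storageReads);
        System.out.println("inline storage reads:    " + inline.storageReads);
    }
}
```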



--
This message was sent by Atlassian Jira
(v8.20.1#820001)