Posted to issues@hive.apache.org by "Stamatis Zampetakis (Jira)" <ji...@apache.org> on 2022/10/21 07:21:01 UTC

[jira] [Updated] (HIVE-9805) LLAP: consider specialized "transient" metadata cache

     [ https://issues.apache.org/jira/browse/HIVE-9805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stamatis Zampetakis updated HIVE-9805:
--------------------------------------

I cleared the fixVersion field since this ticket is still open. Please review this ticket and, if the fix is already committed to a specific version, set the version accordingly and mark the ticket as RESOLVED.

According to the [JIRA guidelines|https://cwiki.apache.org/confluence/display/Hive/HowToContribute], the fixVersion should be set only when the issue is resolved/closed.

> LLAP: consider specialized "transient" metadata cache
> -----------------------------------------------------
>
>                 Key: HIVE-9805
>                 URL: https://issues.apache.org/jira/browse/HIVE-9805
>             Project: Hive
>          Issue Type: Sub-task
>          Components: llap
>            Reporter: Sergey Shelukhin
>            Priority: Major
>             Fix For: llap
>
>
> Because of how the cache currently works (metadata cache + disk cache), reading data from ORC still involves a whole bunch of processing of metadata, columns, streams, contexts, offsets, etc. just to get at the data that is already in the cache. Essentially, only the disk reads are eliminated; everything else is done as if we were reading an unknown file.
> We could have a better metadata representation that is saved during the first read, for example (file, stripe) -> DiskRange[] (incl. cache buffers that are not locked) + a multi-dimensional array, per column per stream per RG (row group), pointing to offsets in the DiskRange array.
> That way, if such a structure is found in the cache, the reader can skip all of that calculation and just do a dumb conversion into the results to pass to the decoder, plus disk reads for the missing parts.
> This Java cache cannot take part in the main data eviction policy, so it should be small. With Java objects, no cache locking is needed: we can evict an entry while someone is still using the structure, and it will be GCed once the last user drops its reference.
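
A minimal sketch of the structure proposed in the description, to make the idea concrete. It is not Hive or LLAP code: the names TransientMetadataCache, StripeKey, Range, StripeLayout and rangeFor are hypothetical, Range is a simplified stand-in for DiskRange (cache buffers are omitted), and the LRU bound via LinkedHashMap is one assumed way to keep the cache small.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a small on-heap "transient" metadata cache; not Hive code. */
final class TransientMetadataCache {

    /** Cache key: one stripe of one file. */
    static final class StripeKey {
        final long fileId;
        final int stripeIx;

        StripeKey(long fileId, int stripeIx) {
            this.fileId = fileId;
            this.stripeIx = stripeIx;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof StripeKey)) {
                return false;
            }
            StripeKey k = (StripeKey) o;
            return fileId == k.fileId && stripeIx == k.stripeIx;
        }

        @Override
        public int hashCode() {
            return 31 * Long.hashCode(fileId) + stripeIx;
        }
    }

    /** Simplified stand-in for DiskRange: a byte range [offset, end) in the file. */
    static final class Range {
        final long offset;
        final long end;

        Range(long offset, long end) {
            this.offset = offset;
            this.end = end;
        }
    }

    /** Precomputed layout saved on first read of a stripe. */
    static final class StripeLayout {
        final Range[] ranges;
        /** [column][stream][rowGroup] -> index into ranges. */
        final int[][][] rangeIx;

        StripeLayout(Range[] ranges, int[][][] rangeIx) {
            this.ranges = ranges;
            this.rangeIx = rangeIx;
        }

        /** Direct lookup that replaces recomputing offsets from ORC metadata. */
        Range rangeFor(int column, int stream, int rowGroup) {
            return ranges[rangeIx[column][stream][rowGroup]];
        }
    }

    private final Map<StripeKey, StripeLayout> entries;

    TransientMetadataCache(final int maxEntries) {
        // Access-ordered map that drops the least recently used entry when full.
        this.entries = new LinkedHashMap<StripeKey, StripeLayout>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<StripeKey, StripeLayout> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // synchronized guards the map itself; entries are never pinned or ref-counted.
    synchronized StripeLayout get(StripeKey key) {
        return entries.get(key);
    }

    synchronized void put(StripeKey key, StripeLayout layout) {
        entries.put(key, layout);
    }

    synchronized int size() {
        return entries.size();
    }
}
```

In this sketch a reader that already holds a StripeLayout keeps using it after the entry has been evicted from the map; the object simply becomes garbage once that reader is done, which is the "no cache locking" property the description relies on.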



--
This message was sent by Atlassian Jira
(v8.20.10#820010)