Posted to issues@hive.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2015/08/12 19:53:47 UTC

[jira] [Comment Edited] (HIVE-11500) implement file footer / splits cache in HBase metastore

    [ https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693938#comment-14693938 ] 

Sergey Shelukhin edited comment on HIVE-11500 at 8/12/15 5:53 PM:
------------------------------------------------------------------

The filter should ideally use deserialized columns.
I am wary of implementing interfaces that are too general, especially since different caches have different semantics (e.g. the footer cache never expires, but the split location cache does) and different return types (as the doc states, we may not even return the footer from the footer cache for a request with a filter). I think we should apply the YAGNI principle and add interfaces as needed; if we find we actually need a general cache, we can design one then.
Having many methods on the metastore is not really that big of a deal, since they do different things. The only ones that could be removed are the outdated ones, e.g. where a method taking plain arguments was replaced by a request-response method.
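
To illustrate the point about differing semantics, here is a rough Java sketch of two narrow, purpose-specific cache interfaces instead of one general one. Every name in it (FooterCache, SplitLocationCache, the in-memory implementations, the TTL parameter) is hypothetical and made up for illustration; none of this is existing metastore API, and the real storage would be HBase rather than a HashMap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical narrow interface: footers keyed by fileId, entries never expire.
interface FooterCache {
  void put(long fileId, byte[] footer);

  // One result per requested fileId, in request order; null marks a miss.
  List<byte[]> get(List<Long> fileIds);

  // Explicit (lazy) removal, e.g. once a file is known to be erased or compacted.
  void remove(long fileId);
}

// Hypothetical narrow interface: split locations keyed by path, entries expire.
interface SplitLocationCache {
  void put(String path, List<String> hosts);

  // Returns null if the path is absent or its entry has expired.
  List<String> get(String path);
}

// In-memory stand-in for illustration only.
final class InMemoryFooterCache implements FooterCache {
  private final Map<Long, byte[]> footers = new HashMap<>();

  @Override
  public void put(long fileId, byte[] footer) {
    footers.put(fileId, footer);
  }

  @Override
  public List<byte[]> get(List<Long> fileIds) {
    List<byte[]> result = new ArrayList<>(fileIds.size());
    for (Long fileId : fileIds) {
      result.add(footers.get(fileId));
    }
    return result;
  }

  @Override
  public void remove(long fileId) {
    footers.remove(fileId);
  }
}

// In-memory stand-in with a time-to-live; the clock is injected so it can be tested.
final class ExpiringSplitLocationCache implements SplitLocationCache {
  private static final class Entry {
    final List<String> hosts;
    final long writtenAtMs;

    Entry(List<String> hosts, long writtenAtMs) {
      this.hosts = hosts;
      this.writtenAtMs = writtenAtMs;
    }
  }

  private final Map<String, Entry> entries = new HashMap<>();
  private final long ttlMs;
  private final LongSupplier clockMs;

  ExpiringSplitLocationCache(long ttlMs, LongSupplier clockMs) {
    this.ttlMs = ttlMs;
    this.clockMs = clockMs;
  }

  @Override
  public void put(String path, List<String> hosts) {
    entries.put(path, new Entry(hosts, clockMs.getAsLong()));
  }

  @Override
  public List<String> get(String path) {
    Entry entry = entries.get(path);
    if (entry == null) {
      return null;
    }
    if (clockMs.getAsLong() - entry.writtenAtMs >= ttlMs) {
      entries.remove(path); // expired: drop it and report a miss
      return null;
    }
    return entry.hosts;
  }
}
```

The two interfaces share almost nothing (key type, return type, expiry), which is roughly why a single general cache interface looks premature here.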


was (Author: sershe):
The filter should ideally use deserialized columns.
I am vary of implementing interfaces that are too general, especially since different caches have different semantics (i.e. footer cache never expires, but split location cache expires), different return types (as the doc states we may not even return the footer from footer cache with filter request). I think we should use YAGNI principle and add interfaces as needed unless we find we actually need a general cache. Then we can design a general cache.
Having many methods on metastore is not really that big of a deal, since they do different things. The only ones that could be removed are the outdated ones, i.e. where argument method was replaced by request-response methods, and such

> implement file footer / splits cache in HBase metastore
> -------------------------------------------------------
>
>                 Key: HIVE-11500
>                 URL: https://issues.apache.org/jira/browse/HIVE-11500
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Metastore
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HBase metastore split cache.pdf
>
>
> We need to cache file metadata (e.g. ORC file footers) for split generation (which, on FSes that support fileId, will be valid permanently and only needs to be removed lazily when the ORC file is erased or compacted), and potentially even some information about splits (e.g. grouping based on location, which would be good for some short time), in the HBase metastore.
> -It should be queryable by table. Partition predicate pushdown should be supported. If bucket pruning is added, that too.- Given that we cannot cache file lists (we have to check the FS for new/changed files anyway), and the difficulty of passing data about partitions/etc. to split generation compared to paths, we will probably just filter by paths and fileIds. It might be different for splits.
> In later phases, it would be nice to save the results (of the first category above) of expensive work done by jobs, e.g. data size after decompression/decoding per column, etc., to avoid surprises when ORC encoding is very good or very bad. Perhaps it can even be lazily generated. Here's a pony: 🐴
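
As a rough illustration of the lookup flow described above (list files from the FS, then fetch footers by fileId, reading only the misses), here is a minimal Java sketch. The class and method names are hypothetical, the Map stands in for the metastore-side store, and readFooter stands in for reading the footer from the file itself. dropMissing additionally assumes the cache only holds entries for the directory that was just listed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.LongFunction;

final class FooterLookupSketch {
  private FooterLookupSketch() {}

  // For each fileId returned by a fresh FS listing, use the cached footer if
  // present; otherwise read it via readFooter and remember it for next time.
  static Map<Long, byte[]> footersFor(List<Long> listedFileIds,
                                      Map<Long, byte[]> cache,
                                      LongFunction<byte[]> readFooter) {
    Map<Long, byte[]> result = new LinkedHashMap<>();
    for (long fileId : listedFileIds) {
      byte[] footer = cache.get(fileId);
      if (footer == null) {
        footer = readFooter.apply(fileId); // cache miss: go to the file
        cache.put(fileId, footer);
      }
      result.put(fileId, footer);
    }
    return result;
  }

  // Lazy cleanup: drop entries whose fileId no longer shows up in the listing
  // (e.g. the file was erased or compacted). Returns how many were dropped.
  static int dropMissing(Map<Long, byte[]> cache, Set<Long> stillPresent) {
    int before = cache.size();
    cache.keySet().retainAll(stillPresent);
    return before - cache.size();
  }
}
```

Since the file list always comes from the FS, the cache never has to answer "which files exist"; it only has to answer "what is the footer for this fileId".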


