Posted to issues@hive.apache.org by "Prasanth Jayachandran (JIRA)" <ji...@apache.org> on 2017/10/02 21:18:00 UTC

[jira] [Commented] (HIVE-17669) Cache to optimize SearchArgument deserialization

    [ https://issues.apache.org/jira/browse/HIVE-17669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16188856#comment-16188856 ] 

Prasanth Jayachandran commented on HIVE-17669:
----------------------------------------------

[~mithun] It would be better to use a bounded cache (Guava cache?), as an unbounded one can lead to a slow kill (and adds to GC pressure). An option to disable the cache would also be good.
Another option is to key the cache on an MD5 or SHA digest of the string, so that large SARG strings do not take up too much space (while logging the SARG string along with its MD5/SHA signature).
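
To make the suggestion concrete, here is a minimal sketch of a bounded, switchable cache keyed by a digest of the serialized string. It is not taken from the attached patch. The class name BoundedDigestCache, its methods, and the choice of SHA-256 are all assumptions for illustration, and a JDK LinkedHashMap in access order stands in for a Guava cache so that the example has no external dependencies.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical bounded LRU cache keyed by the SHA-256 digest of a serialized string. */
final class BoundedDigestCache<V> {
    private final boolean enabled;
    private final Map<String, V> entries;

    BoundedDigestCache(final int maxEntries, boolean enabled) {
        this.enabled = enabled;
        // accessOrder=true yields least-recently-used ordering;
        // removeEldestEntry enforces the size bound on every insert.
        this.entries = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Hex-encoded SHA-256 of the input, so keys stay small even for large strings. */
    static String sha256Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    /** Returns the cached value, or deserializes and caches it on a miss. */
    synchronized V get(String serialized, Function<String, V> deserializer) {
        if (!enabled) {
            return deserializer.apply(serialized); // cache switched off: always deserialize
        }
        String key = sha256Hex(serialized);
        V value = entries.get(key);
        if (value == null) {
            value = deserializer.apply(serialized);
            entries.put(key, value);
        }
        return value;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

A caller would pass the serialized SARG string and the existing deserialization routine as the function argument. Logging the string next to its digest on a miss, as suggested above, would let a digest seen in later log lines be traced back to the original expression.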

> Cache to optimize SearchArgument deserialization
> ------------------------------------------------
>
>                 Key: HIVE-17669
>                 URL: https://issues.apache.org/jira/browse/HIVE-17669
>             Project: Hive
>          Issue Type: Improvement
>          Components: ORC, Query Processor
>    Affects Versions: 2.2.0, 3.0.0
>            Reporter: Mithun Radhakrishnan
>            Assignee: Chris Drome
>         Attachments: HIVE-17699.1.patch
>
>
> And another, from [~selinazh] and [~cdrome]. (YHIVE-927)
> When a mapper needs to process multiple ORC files, it might end up having to use essentially the same {{SearchArgument}} over several files. It would be good not to have to deserialize it from its string form over and over again. Caching the object against the string form should speed things up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)