Posted to issues@hive.apache.org by "Jürgen Thomann (JIRA)" <ji...@apache.org> on 2016/05/18 13:18:12 UTC

[jira] [Commented] (HIVE-13531) Cache in json_tuple UDF grows larger than it should

    [ https://issues.apache.org/jira/browse/HIVE-13531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15288935#comment-15288935 ] 

Jürgen Thomann commented on HIVE-13531:
---------------------------------------

After the second heap dump I investigated the problem a bit more, and it can be reproduced when this UDF is used in multiple queries at the same time.
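
To illustrate, here is a minimal standalone sketch of what I think happens. The HashCache class is paraphrased from GenericUDTFJSONTuple, and the two threads stand in for two concurrent queries; since an unsynchronized LinkedHashMap has no defined behavior under concurrent writes, the final size can come out far above 16, or a run can even throw or hang:

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class HashCacheRace {
      // Paraphrase of the HashCache in GenericUDTFJSONTuple: an LRU map
      // that evicts the eldest entry once it holds more than CACHE_SIZE
      // entries.
      static class HashCache<K, V> extends LinkedHashMap<K, V> {
        private static final int CACHE_SIZE = 16;
        private static final int INIT_SIZE = 32;
        private static final float LOAD_FACTOR = 0.6f;

        HashCache() {
          super(INIT_SIZE, LOAD_FACTOR, true); // true = access order (LRU)
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
          return size() > CACHE_SIZE;
        }
      }

      // Shared and unsynchronized, like the static jsonObjectCache field.
      static final Map<String, Object> cache = new HashCache<String, Object>();

      public static void main(String[] args) throws InterruptedException {
        Runnable query = () -> {
          for (int i = 0; i < 1_000_000; i++) {
            cache.put(Thread.currentThread().getName() + i, new Object());
          }
        };
        Thread t1 = new Thread(query, "q1");
        Thread t2 = new Thread(query, "q2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Single-threaded this always prints 16; with two threads the racing
        // structural updates can leave the map much larger.
        System.out.println("cache size: " + cache.size());
      }
    }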

I'm not sure which is the best way to solve this, but there are at least two possible fixes (a sketch of both follows this list):
1. Change the HashCache to a synchronized map, which is easily done with Collections.synchronizedMap.
2. Remove the static modifier from the declaration of jsonObjectCache. I'm not sure why it is static, but if two different queries use json_tuple they currently share the same cache, which reduces the effective cache size for each query.
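
Roughly, with the HashCache class from the sketch above (referenced here as HashCacheRace.HashCache so it compiles next to that file; I'm assuming the field is declared as a Map<String, Object>, as in the current source):

    import java.util.Collections;
    import java.util.Map;

    public class JsonObjectCacheFixSketch {
      // Fix 1: keep the shared static cache, but make it thread-safe by
      // wrapping it; every get/put then synchronizes on the wrapper.
      static final Map<String, Object> jsonObjectCache =
          Collections.synchronizedMap(
              new HashCacheRace.HashCache<String, Object>());

      // Fix 2: alternatively, drop the static modifier so that every
      // GenericUDTFJSONTuple instance (and therefore every query) gets its
      // own cache:
      //
      //   private final Map<String, Object> jsonObjectCache =
      //       new HashCacheRace.HashCache<String, Object>();
    }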

Another thing is the use of INIT_SIZE = 32 and CACHE_SIZE = 16 with a load factor of 0.6f. Wouldn't it make more sense to increase the load factor to nearly one and to raise CACHE_SIZE to 28 or somewhere in that area? (See the sizing sketch below.)
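
For reference, a LinkedHashMap resizes once size > capacity * load factor, so the numbers work out as follows (0.9f is just an illustrative "nearly one" value):

    public class CacheSizingSketch {
      public static void main(String[] args) {
        int initSize = 32;
        // Current: 32 * 0.6 = 19.2 -> the 16-entry cache never resizes,
        // but roughly half of the table stays unused.
        System.out.printf("current threshold:  %.1f%n", initSize * 0.6f);
        // Suggested: 32 * 0.9 = 28.8 -> a CACHE_SIZE of 28 would still
        // avoid resizing while wasting far fewer buckets.
        System.out.printf("proposed threshold: %.1f%n", initSize * 0.9f);
      }
    }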

> Cache in json_tuple UDF grows larger than it should
> ---------------------------------------------------
>
>                 Key: HIVE-13531
>                 URL: https://issues.apache.org/jira/browse/HIVE-13531
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 1.1.0
>         Environment: CDH 5.5.0 with Java 1.8.0_45
>            Reporter: Jürgen Thomann
>            Assignee: Jason Dere
>            Priority: Minor
>
> According to the code in ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFJSONTuple.java, the HashCache should never grow larger than 16 entries. After the last OOM of HiveServer2 I found this HashCache holding over 1 million java.util.LinkedHashMap$Entry objects.
> The code looks right and works correctly single-threaded when I tested it in isolation. The only problem I can imagine, with my limited knowledge of the Hive source code, is that the cache is accessed concurrently and the cleanup via removeEldestEntry does not work in that case.
> I had this problem with Hive 1.1.0, but the current implementation of the HashCache in master looks the same.


