Posted to issues@hive.apache.org by "JinsuKim (JIRA)" <ji...@apache.org> on 2016/05/02 04:52:12 UTC

[jira] [Updated] (HIVE-13665) HS2 memory leak When multiple queries are running with get_json_object

     [ https://issues.apache.org/jira/browse/HIVE-13665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

JinsuKim updated HIVE-13665:
----------------------------
    Attachment: patch.lst.txt

> HS2 memory leak When multiple queries are running with get_json_object
> ----------------------------------------------------------------------
>
>                 Key: HIVE-13665
>                 URL: https://issues.apache.org/jira/browse/HIVE-13665
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: JinsuKim
>         Attachments: patch.lst.txt
>
>
> The extractObjectCache in UDFJson grows past its intended limit (CACHE_SIZE = 16) when multiple queries run concurrently in HS2 local mode (not mr/tez) with get_json_object or get_json_tuple:
> {code:java|title=HS2 heap_dump}
> Object at 0x515ab18f8
> instance of org.apache.hadoop.hive.ql.udf.UDFJson$HashCache@0x515ab18f8 (77 bytes)
> Class:
> class org.apache.hadoop.hive.ql.udf.UDFJson$HashCache
> Instance data members:
> accessOrder (Z) : false
> entrySet (L) : <null>
> hashSeed (I) : 0
> header (L) : java.util.LinkedHashMap$Entry@0x515a577d0 (60 bytes) 
> keySet (L) : <null>
> loadFactor (F) : 0.6
> modCount (I) : 4741146
> size (I) : 2733158                   <========== here!!
> table (L) : [Ljava.util.HashMap$Entry;@0x7163d8b70 (67108880 bytes) 
> threshold (I) : 5033165
> values (L) : <null>
> References to this object:
> {code}
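For context, the cache seen in the heap dump follows the standard bounded-LRU pattern built on LinkedHashMap. The sketch below is an illustrative reconstruction (not the actual Hive source); the limit and load factor match the CACHE_SIZE and loadFactor values visible in the dump:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative reconstruction of the UDFJson cache pattern: a LinkedHashMap
// subclass that evicts the eldest entry once the size exceeds CACHE_SIZE.
// Correct only when accessed from a single thread.
class HashCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private static final int CACHE_SIZE = 16;
    private static final float LOAD_FACTOR = 0.6f;

    HashCacheSketch() {
        super(CACHE_SIZE, LOAD_FACTOR); // insertion order, as in the dump (accessOrder = false)
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked by put(): evict the oldest entry once the cap is exceeded.
        return size() > CACHE_SIZE;
    }
}
```

Used from a single thread, the eviction bound holds and size never climbs past 16; the dump above shows size = 2733158, which is only possible if concurrent, unsynchronized puts corrupted the map's internal state.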
> I think this problem is caused by the LinkedHashMap-backed cache not being thread-safe; its javadoc warns:
> {code}
> * <p><strong>Note that this implementation is not synchronized.</strong>
>  * If multiple threads access a linked hash map concurrently, and at least
>  * one of the threads modifies the map structurally, it <em>must</em> be
>  * synchronized externally.  This is typically accomplished by
>  * synchronizing on some object that naturally encapsulates the map.
> {code}
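Following that javadoc advice, one minimal mitigation sketch (an assumption on my part, not necessarily what the attached patch does) is to synchronize the map externally, e.g. with Collections.synchronizedMap, so every put and its internal eviction run under one lock:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class SynchronizedCacheDemo {
    // A bounded LinkedHashMap as in the report, wrapped so that every access
    // holds a common lock, as the LinkedHashMap javadoc requires.
    static <K, V> Map<K, V> boundedSynchronizedCache(int limit) {
        return Collections.synchronizedMap(new LinkedHashMap<K, V>(limit, 0.6f) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > limit; // eviction runs inside put(), under the wrapper's lock
            }
        });
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> cache = boundedSynchronizedCache(16);
        Thread[] threads = new Thread[8];
        for (int t = 0; t < threads.length; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    cache.put("t" + id + "-k" + i, "v");
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) th.join();
        // With external synchronization the eviction bound holds under concurrency.
        System.out.println("size = " + cache.size());
    }
}
```

The trade-off is lock contention on a hot path; per-thread caches avoid the lock entirely.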
> Reproduce:
> # Run multiple queries concurrently with get_json_object on small input data (so they execute in HS2 local mode, not mr/tez)
> # Take a JVM heap dump and analyze it
> {code:title=test scenario}
> Multiple queries run concurrently with get_json_object on small input data (executed in HS2 local mode)
> 1.hql :
> SELECT get_json_object(body, '$.fileSize'), get_json_object(body, '$.ps_totalTimeSeconds'), get_json_object(body, '$.totalTimeSeconds') FROM xxx.tttt WHERE part_hour='2016040105' 
> 2.hql :
> SELECT get_json_object(body, '$.fileSize'), get_json_object(body, '$.ps_totalTimeSeconds'), get_json_object(body, '$.totalTimeSeconds') FROM xxx.tttt WHERE part_hour='2016040106'
> 3.hql :
> SELECT get_json_object(body, '$.fileSize'), get_json_object(body, '$.ps_totalTimeSeconds'), get_json_object(body, '$.totalTimeSeconds') FROM xxx.tttt WHERE part_hour='2016040107'
> 4.hql :
> SELECT get_json_object(body, '$.fileSize'), get_json_object(body, '$.ps_totalTimeSeconds'), get_json_object(body, '$.totalTimeSeconds') FROM xxx.tttt WHERE part_hour='2016040108'
>  
> run.sh :
> t_cnt=0
> while true
> do
>     echo "query executing..."
>     for i in 1 2 3 4
>     do
>         beeline -u jdbc:hive2://localhost:10000 -n hive --silent=true -f $i.hql > $i.log 2>&1 &
>     done
>     wait
>     t_cnt=`expr $t_cnt + 1`
>     echo "query count : $t_cnt"
>     sleep 2
> done
> jvm heap dump & analyze :
> jmap -dump:format=b,file=hive.dmp $PID
> jhat -J-mx48000m -port 8080 hive.dmp &
> {code}
> Finally, I have attached our patch.
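The attached patch itself is not reproduced here. One hypothetical alternative approach (names and structure are my own, and may differ from the patch) is to give each thread its own bounded cache via ThreadLocal, removing the shared mutable state so no thread can ever observe another thread's structural modifications:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not necessarily the attached patch: a per-thread
// bounded cache. No locking is needed because each thread only ever
// touches its own map instance.
public class ThreadLocalCacheSketch {
    static final int CACHE_SIZE = 16;

    static final ThreadLocal<Map<String, Object>> CACHE =
        ThreadLocal.withInitial(() -> new LinkedHashMap<String, Object>(CACHE_SIZE, 0.6f) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                return size() > CACHE_SIZE; // per-thread eviction bound
            }
        });

    static Object getOrCompute(String jsonPath) {
        // Reuse a cached result for this thread, computing it on a miss.
        return CACHE.get().computeIfAbsent(jsonPath, ThreadLocalCacheSketch::compile);
    }

    private static Object compile(String path) {
        return "compiled:" + path; // stand-in for parsing a JSON path expression
    }
}
```

The cost is one small cache per HS2 handler thread instead of one shared cache, which is usually an acceptable memory trade for lock-free reads on a hot path.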



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)