
Posted to dev@hive.apache.org by "Misha Dmitriev (JIRA)" <ji...@apache.org> on 2018/05/23 00:45:00 UTC

[jira] [Created] (HIVE-19668) 11.8% of the heap wasted due to duplicate org.antlr.runtime.CommonToken's

Misha Dmitriev created HIVE-19668:
-------------------------------------

             Summary: 11.8% of the heap wasted due to duplicate org.antlr.runtime.CommonToken's
                 Key: HIVE-19668
                 URL: https://issues.apache.org/jira/browse/HIVE-19668
             Project: Hive
          Issue Type: Improvement
          Components: HiveServer2
    Affects Versions: 3.0.0
            Reporter: Misha Dmitriev
            Assignee: Misha Dmitriev
         Attachments: image-2018-05-22-17-41-39-572.png

I've recently analyzed an HS2 heap dump, obtained when there was a huge memory spike during compilation of some big query. The analysis was done with jxray ([www.jxray.com|http://www.jxray.com/]). It turns out that more than 90% of the 20G heap was used by data structures associated with query parsing ({{org.apache.hadoop.hive.ql.parse.QBExpr}}). There are probably multiple opportunities for optimization here. One of them is to stop the code from creating duplicate instances of the {{org.antlr.runtime.CommonToken}} class. See a sample of these objects in the attached image:

!image-2018-05-22-17-41-39-572.png|width=879,height=399!

Looks like these particular {{CommonToken}} objects are constants that don't change once created. I see some code, e.g. in {{org.apache.hadoop.hive.ql.parse.CalcitePlanner}}, where such objects are apparently created repeatedly, e.g. with {{new CommonToken(HiveParser.TOK_INSERT, "TOK_INSERT")}}. If these 33 token kinds are instead created once and reused, we will save more than 1/10th of the heap in this scenario. Plus, since these objects are small but very numerous, getting rid of them will remove a great deal of pressure from the GC.
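
A minimal sketch of the "create once and reuse" idea, not the actual patch. {{SimpleToken}} and {{SharedTokens}} are hypothetical names; {{SimpleToken}} is an immutable stand-in for {{org.antlr.runtime.CommonToken}} so that the example compiles without the ANTLR jar. It assumes each token type always carries the same text. Note that the real {{CommonToken}} has setters, so sharing instances would only be safe if no caller modifies them afterwards.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical immutable stand-in for org.antlr.runtime.CommonToken;
// only the (type, text) pair matters for this sketch.
final class SimpleToken {
    private final int type;
    private final String text;

    SimpleToken(int type, String text) {
        this.type = type;
        this.text = text;
    }

    int getType() { return type; }
    String getText() { return text; }
}

// Hypothetical cache that hands out one shared instance per token type.
final class SharedTokens {
    private static final Map<Integer, SimpleToken> CACHE = new ConcurrentHashMap<>();

    private SharedTokens() {}

    // Returns the cached token for this type, creating it on first use.
    // Assumption: a given type is always paired with the same text,
    // so the text of later calls for the same type is ignored.
    static SimpleToken of(int type, String text) {
        return CACHE.computeIfAbsent(type, t -> new SimpleToken(t, text));
    }
}
```

With something like this, a call site would ask {{SharedTokens.of(...)}} for the token instead of calling {{new}} each time, so repeated requests for the same kind return the same object rather than allocating a new one.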



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)