Posted to issues-all@impala.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2018/10/15 19:57:00 UTC

[jira] [Commented] (IMPALA-7501) Slim down metastore Partition objects in LocalCatalog cache

    [ https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16650695#comment-16650695 ] 

Paul Rogers commented on IMPALA-7501:
-------------------------------------

Tests show that the fix works. It would be great to automate this, but there appears to be no ready-to-use heap dump library; the closest is [this one|https://github.com/eaftan/hprof-parser], which would require us to build the code ourselves. For now, I did this partly manually using the [Eclipse Memory Analyzer|http://www.eclipse.org/mat/].

To create a heap dump, I modified {{LocalCatalogTest.testPartitioning()}} as follows:

{code:java}
  @Test
  public void testCachePruning() throws Exception {
    FeFsTable t = (FeFsTable) catalog_.getTable("functional", "alltypes");
    CatalogTest.checkAllTypesPartitioning(t, /*checkFileDescriptors=*/false);
    System.gc();
    File dumpFile = new File("/tmp/cache-prune-dump.hprof");
    dumpFile.delete();
    HeapDumper.dumpHeap(dumpFile);
  }
{code}

I ran this before the fix, loaded the .hprof file into the Eclipse Memory Analyzer (MAT), and issued the following MAT query:

{noformat}
SELECT * FROM org.apache.hadoop.hive.metastore.api.Partition
{noformat}

This listed the dozen or so cached partitions and showed that, as expected, the "sd" field held onto the "cols" field with the field schemas.
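MAT's OQL also supports field-access paths, so the retained schema objects can be listed directly rather than by drilling down in the UI. This variant is an untested sketch of such a query (the field names {{sd}} and {{cols}} come from the metastore {{Partition}} Thrift classes discussed above):

{noformat}
SELECT p.sd.cols FROM org.apache.hadoop.hive.metastore.api.Partition p
{noformat}

Before the fix this should return a non-empty list per partition; after the fix the {{cols}} paths should resolve to null.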

Then I applied the fix from the prior comment and repeated the exercise. No "cols" objects were listed, showing that they are no longer retained in the heap.

The heap dump itself was done using the class shown below. (This probably won't be checked into Impala until there is some way to programmatically consume the output; the class is listed here for reference.)

{code:java}
// Adapted from: https://blogs.oracle.com/sundararajan/programmatically-dumping-heap-from-java-applications
// See also: https://vlkan.com/blog/post/2016/08/12/hotspot-heapdump-threadump/

package org.apache.impala.catalog.local;

import javax.management.MBeanServer;

import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumper {
  // This is the name of the HotSpot Diagnostic MBean
  private static final String HOTSPOT_BEAN_NAME =
       "com.sun.management:type=HotSpotDiagnostic";

  // volatile is required for the double-checked locking in instance() to be safe
  private static volatile HeapDumper instance;
  private final HotSpotDiagnosticMXBean hotspotMBean;

  private HeapDumper() {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    try {
      hotspotMBean =
          ManagementFactory.newPlatformMXBeanProxy(server,
          HOTSPOT_BEAN_NAME, HotSpotDiagnosticMXBean.class);
    } catch (IOException e) {
      throw new IllegalStateException("Failed to initialize the HotSpot MBean", e);
    }
  }

  /**
   * Dump the heap to a file.
   *
   * @param file the heap dump file
   * @param live true to dump only the live objects
   * @throws IOException if the dump fails
   */
  static void dumpHeap(File file, boolean live) throws IOException {
    instance().dump(file, live);
  }

  static void dumpHeap(File file) {
    try {
      dumpHeap(file, true);
    } catch (IOException e) {
      throw new RuntimeException("Failed to dump the heap", e);
    }
  }

  public static HeapDumper instance() {
    if (instance == null) {
      synchronized(HeapDumper.class) {
        if (instance == null) {
          instance = new HeapDumper();
        }
      }
    }
    return instance;
  }

  public synchronized void dump(File file, boolean live) throws IOException {
    hotspotMBean.dumpHeap(file.getAbsolutePath(), live);
  }
}
{code}
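As a side note, on Java 7+ the MBeanServer proxy above can be avoided entirely: {{ManagementFactory.getPlatformMXBean(Class)}} resolves the HotSpot diagnostic bean directly. This is a minimal standalone sketch of that variant (the class name {{HeapDumpExample}} and the dump path are illustrative, not part of the Impala code):

```java
import java.io.File;
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumpExample {
  public static void main(String[] args) throws Exception {
    // Since Java 7, the HotSpot diagnostic bean can be fetched directly,
    // without looking it up by name on the platform MBeanServer.
    HotSpotDiagnosticMXBean bean =
        ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    // Illustrative path; dumpHeap refuses to overwrite an existing file,
    // so delete any leftover dump first.
    File dumpFile = new File("/tmp/example-dump.hprof");
    dumpFile.delete();
    // live=true forces a GC and dumps only reachable objects.
    bean.dumpHeap(dumpFile.getAbsolutePath(), /*live=*/true);
    System.out.println("Dumped " + dumpFile.length() + " bytes");
  }
}
```

Note this relies on {{com.sun.management}}, so it is HotSpot-specific, just like the MBean-name approach in the class above.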

> Slim down metastore Partition objects in LocalCatalog cache
> -----------------------------------------------------------
>
>                 Key: IMPALA-7501
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7501
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Todd Lipcon
>            Priority: Minor
>
> I took a heap dump of an impalad running in LocalCatalog mode with a 2G limit after running a production workload simulation for a couple hours. It had 38.5M objects and 2.02GB heap (the vast majority of the heap is, as expected, in the LocalCatalog cache). Of this total footprint, 1.78GB and 34.6M objects are retained by 'Partition' objects. Drilling into those, 1.29GB and 33.6M objects are retained by FieldSchema, which, as far as I remember, are ignored on the partition level by the Impala planner. So, with a bit of slimming down of these objects, we could make a huge dent in effective cache capacity given a fixed budget. Reducing object count should also have the effect of improved GC performance (old gen GC is more closely tied to object count than size).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
