Posted to common-issues@hadoop.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/01/04 04:01:00 UTC

[jira] [Work logged] (HADOOP-18055) Async Profiler endpoint for Hadoop daemons

     [ https://issues.apache.org/jira/browse/HADOOP-18055?focusedWorklogId=703178&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703178 ]

ASF GitHub Bot logged work on HADOOP-18055:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Jan/22 04:00
            Start Date: 04/Jan/22 04:00
    Worklog Time Spent: 10m 
      Work Description: jojochuang commented on a change in pull request #3824:
URL: https://github.com/apache/hadoop/pull/3824#discussion_r777811736



##########
File path: hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/http/TestDisabledProfileServlet.java
##########
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.http;
+
+import java.io.IOException;
+import java.net.HttpURLConnection;
+import java.net.URL;
+import javax.servlet.http.HttpServletResponse;
+
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+/**
+ * Small test to cover the default-disabled prof endpoint.
+ */
+public class TestDisabledProfileServlet extends HttpServerFunctionalTest {
+
+  private static HttpServer2 server;
+  private static URL baseUrl;
+
+  @BeforeClass
+  public static void setup() throws Exception {
+    server = createTestServer();
+    server.start();
+    baseUrl = getServerURL(server);
+  }
+
+  @AfterClass
+  public static void cleanup() throws Exception {
+    server.stop();
+  }
+
+  @Test
+  public void testQuery() throws Exception {
+    try {
+      readOutput(new URL(baseUrl, "/prof"));
+      throw new IllegalStateException("Should not reach here");
+    } catch (IOException e) {
+      assertTrue(e.getMessage()
+          .contains(HttpServletResponse.SC_INTERNAL_SERVER_ERROR + " for URL: " + baseUrl));
+    }
+
+    // CORS headers
+    HttpURLConnection conn =
+        (HttpURLConnection) new URL(baseUrl, "/prof").openConnection();
+    assertEquals("GET", conn.getHeaderField(ProfileServlet.ACCESS_CONTROL_ALLOW_METHODS));
+    assertNotNull(conn.getHeaderField(ProfileServlet.ACCESS_CONTROL_ALLOW_ORIGIN));

Review comment:
       would it be a good idea to call conn.disconnect() to clean up?
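A minimal sketch of the cleanup pattern the comment suggests (class and method names here are hypothetical, not part of the patch). Note that java.net.URL.openConnection() performs no network I/O by itself, so the pattern can be shown without a live server:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class ProfConnectionSketch {

  // Opens a connection object and guarantees disconnect() in a finally block.
  // openConnection() only constructs the HttpURLConnection; no request is sent,
  // and disconnect() is safe to call even on a connection that never connected.
  static boolean openAndDisconnect(String url) {
    HttpURLConnection conn = null;
    try {
      conn = (HttpURLConnection) new URL(url).openConnection();
      return conn != null;
    } catch (IOException e) {
      return false; // MalformedURLException is an IOException subclass
    } finally {
      if (conn != null) {
        conn.disconnect(); // releases sockets/keep-alive state held by the connection
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(openAndDisconnect("http://localhost:9870/prof"));
  }
}
```

The finally block runs whether the header assertions pass or throw, which is the cleanup the comment is asking about.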

##########
File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/http/ProfileServlet.java
##########
@@ -0,0 +1,394 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.http;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.locks.Lock;
+import java.util.concurrent.locks.ReentrantLock;
+import javax.servlet.http.HttpServlet;
+import javax.servlet.http.HttpServletRequest;
+import javax.servlet.http.HttpServletResponse;
+
+import org.apache.hadoop.thirdparty.com.google.common.base.Joiner;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.util.ProcessUtils;
+
+/**
+ * Servlet that runs async-profiler as web-endpoint.
+ * <p>
+ * The following options from async-profiler can be specified as query parameters.
+ * //  -e event          profiling event: cpu|alloc|lock|cache-misses etc.
+ * //  -d duration       run profiling for 'duration' seconds (integer)
+ * //  -i interval       sampling interval in nanoseconds (long)
+ * //  -j jstackdepth    maximum Java stack depth (integer)
+ * //  -b bufsize        frame buffer size (long)
+ * //  -t                profile different threads separately
+ * //  -s                simple class names instead of FQN
+ * //  -o fmt[,fmt...]   output format: summary|traces|flat|collapsed|svg|tree|jfr|html
+ * //  --width px        SVG width pixels (integer)
+ * //  --height px       SVG frame height pixels (integer)
+ * //  --minwidth px     skip frames smaller than px (double)
+ * //  --reverse         generate stack-reversed FlameGraph / Call tree
+ * <p>
+ * Example:
+ * If Namenode http address is localhost:9870, and ResourceManager http address is localhost:8088,
+ * ProfileServlet running with async-profiler setup can be accessed with
+ * http://localhost:9870/prof and http://localhost:8088/prof for Namenode and ResourceManager
+ * processes respectively.
+ * Deep dive into some params:
+ * - To collect 10 second CPU profile of current process i.e. Namenode (returns FlameGraph svg)
+ * curl "http://localhost:9870/prof"
+ * - To collect 10 second CPU profile of pid 12345 (returns FlameGraph svg)
+ * curl "http://localhost:9870/prof?pid=12345" (For instance, provide pid of Datanode)
+ * - To collect 30 second CPU profile of pid 12345 (returns FlameGraph svg)
+ * curl "http://localhost:9870/prof?pid=12345&amp;duration=30"
+ * - To collect 1 minute CPU profile of current process and output in tree format (html)
+ * curl "http://localhost:9870/prof?output=tree&amp;duration=60"
+ * - To collect 10 second heap allocation profile of current process (returns FlameGraph svg)
+ * curl "http://localhost:9870/prof?event=alloc"
+ * - To collect lock contention profile of current process (returns FlameGraph svg)
+ * curl "http://localhost:9870/prof?event=lock"
+ * <p>
+ * The following event types are supported (default is 'cpu') (NOTE: not all OSes support all events)
+ * // Perf events:
+ * //    cpu
+ * //    page-faults
+ * //    context-switches
+ * //    cycles
+ * //    instructions
+ * //    cache-references
+ * //    cache-misses
+ * //    branches
+ * //    branch-misses
+ * //    bus-cycles
+ * //    L1-dcache-load-misses
+ * //    LLC-load-misses
+ * //    dTLB-load-misses
+ * //    mem:breakpoint
+ * //    trace:tracepoint
+ * // Java events:
+ * //    alloc
+ * //    lock
+ */
+@InterfaceAudience.Private
+public class ProfileServlet extends HttpServlet {
+
+  private static final long serialVersionUID = 1L;
+  private static final Logger LOG = LoggerFactory.getLogger(ProfileServlet.class);
+
+  static final String ACCESS_CONTROL_ALLOW_METHODS = "Access-Control-Allow-Methods";
+  static final String ACCESS_CONTROL_ALLOW_ORIGIN = "Access-Control-Allow-Origin";
+  private static final String ALLOWED_METHODS = "GET";
+  private static final String CONTENT_TYPE_TEXT = "text/plain; charset=utf-8";
+  private static final String ASYNC_PROFILER_HOME_ENV = "ASYNC_PROFILER_HOME";
+  private static final String ASYNC_PROFILER_HOME_SYSTEM_PROPERTY = "async.profiler.home";
+  private static final String PROFILER_SCRIPT = "/profiler.sh";
+  private static final int DEFAULT_DURATION_SECONDS = 10;
+  private static final AtomicInteger ID_GEN = new AtomicInteger(0);
+
+  static final String OUTPUT_DIR = System.getProperty("java.io.tmpdir") + "/prof-output";

Review comment:
       (related: HDDS-5387)

##########
File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/http/ProfileServlet.java
##########
@@ -0,0 +1,394 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.http;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.locks.Lock;
+import java.util.concurrent.locks.ReentrantLock;
+import javax.servlet.http.HttpServlet;
+import javax.servlet.http.HttpServletRequest;
+import javax.servlet.http.HttpServletResponse;
+
+import org.apache.hadoop.thirdparty.com.google.common.base.Joiner;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.util.ProcessUtils;
+
+/**
+ * Servlet that runs async-profiler as web-endpoint.
+ * <p>
+ * Following options from async-profiler can be specified as query paramater.
+ * //  -e event          profiling event: cpu|alloc|lock|cache-misses etc.
+ * //  -d duration       run profiling for 'duration' seconds (integer)
+ * //  -i interval       sampling interval in nanoseconds (long)
+ * //  -j jstackdepth    maximum Java stack depth (integer)
+ * //  -b bufsize        frame buffer size (long)
+ * //  -t                profile different threads separately
+ * //  -s                simple class names instead of FQN
+ * //  -o fmt[,fmt...]   output format: summary|traces|flat|collapsed|svg|tree|jfr|html
+ * //  --width px        SVG width pixels (integer)
+ * //  --height px       SVG frame height pixels (integer)
+ * //  --minwidth px     skip frames smaller than px (double)
+ * //  --reverse         generate stack-reversed FlameGraph / Call tree
+ * <p>
+ * Example:
+ * If Namenode http address is localhost:9870, and ResourceManager http address is localhost:8088,
+ * ProfileServlet running with async-profiler setup can be accessed with
+ * http://localhost:9870/prof and http://localhost:8088/prof for Namenode and ResourceManager
+ * processes respectively.
+ * Deep dive into some params:
+ * - To collect 10 second CPU profile of current process i.e. Namenode (returns FlameGraph svg)
+ * curl "http://localhost:9870/prof"
+ * - To collect 10 second CPU profile of pid 12345 (returns FlameGraph svg)
+ * curl "http://localhost:9870/prof?pid=12345" (For instance, provide pid of Datanode)
+ * - To collect 30 second CPU profile of pid 12345 (returns FlameGraph svg)
+ * curl "http://localhost:9870/prof?pid=12345&amp;duration=30"
+ * - To collect 1 minute CPU profile of current process and output in tree format (html)
+ * curl "http://localhost:9870/prof?output=tree&amp;duration=60"
+ * - To collect 10 second heap allocation profile of current process (returns FlameGraph svg)
+ * curl "http://localhost:9870/prof?event=alloc"
+ * - To collect lock contention profile of current process (returns FlameGraph svg)
+ * curl "http://localhost:9870/prof?event=lock"
+ * <p>
+ * Following event types are supported (default is 'cpu') (NOTE: not all OS'es support all events)
+ * // Perf events:
+ * //    cpu
+ * //    page-faults
+ * //    context-switches
+ * //    cycles
+ * //    instructions
+ * //    cache-references
+ * //    cache-misses
+ * //    branches
+ * //    branch-misses
+ * //    bus-cycles
+ * //    L1-dcache-load-misses
+ * //    LLC-load-misses
+ * //    dTLB-load-misses
+ * //    mem:breakpoint
+ * //    trace:tracepoint
+ * // Java events:
+ * //    alloc
+ * //    lock
+ */
+@InterfaceAudience.Private
+public class ProfileServlet extends HttpServlet {
+
+  private static final long serialVersionUID = 1L;
+  private static final Logger LOG = LoggerFactory.getLogger(ProfileServlet.class);
+
+  static final String ACCESS_CONTROL_ALLOW_METHODS = "Access-Control-Allow-Methods";
+  static final String ACCESS_CONTROL_ALLOW_ORIGIN = "Access-Control-Allow-Origin";
+  private static final String ALLOWED_METHODS = "GET";
+  private static final String CONTENT_TYPE_TEXT = "text/plain; charset=utf-8";
+  private static final String ASYNC_PROFILER_HOME_ENV = "ASYNC_PROFILER_HOME";
+  private static final String ASYNC_PROFILER_HOME_SYSTEM_PROPERTY = "async.profiler.home";
+  private static final String PROFILER_SCRIPT = "/profiler.sh";
+  private static final int DEFAULT_DURATION_SECONDS = 10;
+  private static final AtomicInteger ID_GEN = new AtomicInteger(0);
+
+  static final String OUTPUT_DIR = System.getProperty("java.io.tmpdir") + "/prof-output";

Review comment:
       because the ProfileServlet is used by several systems (Ozone, HBase, Hive), I would suggest to make the default output dir specific to Hadoop
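A sketch of that suggestion (the "prof-output-hadoop" directory name is an assumption for illustration, not the value committed in the PR):

```java
import java.nio.file.Paths;

public class OutputDirSketch {

  // Per the review suggestion: scope the default output directory to Hadoop so
  // that the Ozone/HBase/Hive copies of this servlet do not collide in java.io.tmpdir.
  static String defaultOutputDir() {
    return Paths.get(System.getProperty("java.io.tmpdir"), "prof-output-hadoop").toString();
  }

  public static void main(String[] args) {
    System.out.println(defaultOutputDir());
  }
}
```

Keeping the system-specific suffix in one constant means each downstream project only has to change a single string when it forks or shades the servlet.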




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 703178)
    Time Spent: 2h  (was: 1h 50m)

> Async Profiler endpoint for Hadoop daemons
> ------------------------------------------
>
>                 Key: HADOOP-18055
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18055
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Async profiler ([https://github.com/jvm-profiling-tools/async-profiler]) is a low overhead sampling profiler for Java that does not suffer from Safepoint bias problem. It features HotSpot-specific APIs to collect stack traces and to track memory allocations. The profiler works with OpenJDK, Oracle JDK and other Java runtimes based on the HotSpot JVM.
> Async profiler can also profile heap allocations, lock contention, and HW performance counters in addition to CPU.
> We have an httpserver-based servlet stack, hence we can use HIVE-20202 as an implementation template to provide the async profiler as a servlet for Hadoop daemons. Ideally we achieve these requirements:
>  * Retrieve flamegraph SVG generated from latest profile trace.
>  * Online enable and disable of profiling activity. (async-profiler does not do instrumentation-based profiling, so this should not cause the code-generation-related performance problems of that approach and can be safely toggled on and off while under production load.)
>  * CPU profiling.
>  * ALLOCATION profiling.
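The curl examples in the servlet javadoc above all reduce to the /prof base path plus query parameters; a throwaway helper makes that shape explicit (class and method names are hypothetical, for illustration only):

```java
public class ProfUrlSketch {

  // Builds the kind of query URL shown in the javadoc examples, e.g.
  // event=alloc for a heap allocation profile over a custom duration.
  static String profUrl(String hostPort, String event, int durationSeconds) {
    return "http://" + hostPort + "/prof?event=" + event + "&duration=" + durationSeconds;
  }

  public static void main(String[] args) {
    // Mirrors: curl "http://localhost:9870/prof?event=alloc&duration=30"
    System.out.println(profUrl("localhost:9870", "alloc", 30));
    // → http://localhost:9870/prof?event=alloc&duration=30
  }
}
```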



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org