Posted to reviews@spark.apache.org by "parthchandra (via GitHub)" <gi...@apache.org> on 2023/12/04 18:24:26 UTC

Re: [PR] [SPARK-46094] Add support for code profiling executors [spark]

parthchandra commented on code in PR #44021:
URL: https://github.com/apache/spark/pull/44021#discussion_r1410100843


##########
connector/profiler/README.md:
##########
@@ -0,0 +1,86 @@
+# Spark Code Profiler Plugin
+
+## Build
+
+To build:
+``` 
+  ./build/mvn clean package -P code-profiler
+```
+
+## Executor Code Profiling
+
+The Spark profiler module enables code profiling of executors in cluster mode, based on the [async profiler](https://github.com/async-profiler/async-profiler/blob/master/README.md), a low-overhead sampling profiler. This allows a Spark application to capture CPU and memory profiles for applications running on a cluster, which can later be analyzed for performance issues. The profiler captures [Java Flight Recorder (jfr)](https://developers.redhat.com/blog/2020/08/25/get-started-with-jdk-flight-recorder-in-openjdk-8u#) files for each executor; these can be read by many tools, including Java Mission Control and IntelliJ.
+
+The profiler writes the jfr files to the executor's working directory in the executor's local file system. The files can grow large, so it is advisable that the executor machines have adequate storage. The profiler can be configured to copy the jfr files to an HDFS location before the executor shuts down.
+
+Code profiling is currently only supported for

Review Comment:
   Because async-profiler requires specific POSIX signal capabilities that Windows implements differently, it does not support Windows. More here: https://github.com/async-profiler/async-profiler/issues/188



##########
connector/profiler/README.md:
##########
@@ -0,0 +1,86 @@
+# Spark Code Profiler Plugin
+
+## Build
+
+To build:
+``` 
+  ./build/mvn clean package -P code-profiler
+```
+
+## Executor Code Profiling
+
+The Spark profiler module enables code profiling of executors in cluster mode, based on the [async profiler](https://github.com/async-profiler/async-profiler/blob/master/README.md), a low-overhead sampling profiler. This allows a Spark application to capture CPU and memory profiles for applications running on a cluster, which can later be analyzed for performance issues. The profiler captures [Java Flight Recorder (jfr)](https://developers.redhat.com/blog/2020/08/25/get-started-with-jdk-flight-recorder-in-openjdk-8u#) files for each executor; these can be read by many tools, including Java Mission Control and IntelliJ.

Review Comment:
   Replaced with a link that references JDK 17.



##########
connector/profiler/README.md:
##########
@@ -0,0 +1,86 @@
+# Spark Code Profiler Plugin
+
+## Build
+
+To build:
+``` 
+  ./build/mvn clean package -P code-profiler
+```
+
+## Executor Code Profiling
+
+The Spark profiler module enables code profiling of executors in cluster mode, based on the [async profiler](https://github.com/async-profiler/async-profiler/blob/master/README.md), a low-overhead sampling profiler. This allows a Spark application to capture CPU and memory profiles for applications running on a cluster, which can later be analyzed for performance issues. The profiler captures [Java Flight Recorder (jfr)](https://developers.redhat.com/blog/2020/08/25/get-started-with-jdk-flight-recorder-in-openjdk-8u#) files for each executor; these can be read by many tools, including Java Mission Control and IntelliJ.
+
+The profiler writes the jfr files to the executor's working directory in the executor's local file system. The files can grow large, so it is advisable that the executor machines have adequate storage. The profiler can be configured to copy the jfr files to an HDFS location before the executor shuts down.
+
+Code profiling is currently only supported for
+
+*   Linux (x64)
+*   Linux (arm64)
+*   Linux (musl, x64)
+*   macOS
+
+To get maximum profiling information, set the following JVM options for the executor:
+
+```
+    -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+PreserveFramePointer

Review Comment:
   Done
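
As an aside, a minimal sketch of how these JVM flags would typically reach the executors, assuming a standard spark-submit invocation (the flags are the ones quoted above; everything else is a placeholder):

```
spark-submit \
  --conf "spark.executor.extraJavaOptions=-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+PreserveFramePointer" \
  ...
```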



##########
connector/profiler/README.md:
##########
@@ -0,0 +1,86 @@
+# Spark Code Profiler Plugin
+
+## Build
+
+To build:
+``` 
+  ./build/mvn clean package -P code-profiler
+```
+
+## Executor Code Profiling
+
+The Spark profiler module enables code profiling of executors in cluster mode, based on the [async profiler](https://github.com/async-profiler/async-profiler/blob/master/README.md), a low-overhead sampling profiler. This allows a Spark application to capture CPU and memory profiles for applications running on a cluster, which can later be analyzed for performance issues. The profiler captures [Java Flight Recorder (jfr)](https://developers.redhat.com/blog/2020/08/25/get-started-with-jdk-flight-recorder-in-openjdk-8u#) files for each executor; these can be read by many tools, including Java Mission Control and IntelliJ.
+
+The profiler writes the jfr files to the executor's working directory in the executor's local file system. The files can grow large, so it is advisable that the executor machines have adequate storage. The profiler can be configured to copy the jfr files to an HDFS location before the executor shuts down.
+
+Code profiling is currently only supported for
+
+*   Linux (x64)
+*   Linux (arm64)
+*   Linux (musl, x64)
+*   macOS
+
+To get maximum profiling information, set the following JVM options for the executor:
+
+```
+    -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+PreserveFramePointer
+```
+
+For more information on async-profiler, see the [Async Profiler Manual](https://krzysztofslusarski.github.io/2022/12/12/async-manual.html).
+
+
+To enable code profiling, first enable the code profiling plugin via
+
+```
+spark.plugins=org.apache.spark.executor.ExecutorProfilerPlugin
+```
+
+Then enable profiling in the configuration.
+
+
+### Code profiling configuration
+
+<table class="table">
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
+<tr>
+  <td><code>spark.executor.profiling.enabled</code></td>
+  <td>
+    <code>false</code>
+  </td>

Review Comment:
   Done



##########
connector/profiler/README.md:
##########
@@ -0,0 +1,86 @@
+# Spark Code Profiler Plugin
+
+## Build
+
+To build:
+``` 
+  ./build/mvn clean package -P code-profiler
+```
+
+## Executor Code Profiling
+
+The Spark profiler module enables code profiling of executors in cluster mode, based on the [async profiler](https://github.com/async-profiler/async-profiler/blob/master/README.md), a low-overhead sampling profiler. This allows a Spark application to capture CPU and memory profiles for applications running on a cluster, which can later be analyzed for performance issues. The profiler captures [Java Flight Recorder (jfr)](https://developers.redhat.com/blog/2020/08/25/get-started-with-jdk-flight-recorder-in-openjdk-8u#) files for each executor; these can be read by many tools, including Java Mission Control and IntelliJ.
+
+The profiler writes the jfr files to the executor's working directory in the executor's local file system. The files can grow large, so it is advisable that the executor machines have adequate storage. The profiler can be configured to copy the jfr files to an HDFS location before the executor shuts down.
+
+Code profiling is currently only supported for
+
+*   Linux (x64)
+*   Linux (arm64)
+*   Linux (musl, x64)
+*   macOS
+
+To get maximum profiling information, set the following JVM options for the executor:
+
+```
+    -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+PreserveFramePointer
+```
+
+For more information on async-profiler, see the [Async Profiler Manual](https://krzysztofslusarski.github.io/2022/12/12/async-manual.html).
+
+
+To enable code profiling, first enable the code profiling plugin via
+
+```
+spark.plugins=org.apache.spark.executor.ExecutorProfilerPlugin
+```
+
+Then enable profiling in the configuration.
+
+
+### Code profiling configuration
+
+<table class="table">
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
+<tr>
+  <td><code>spark.executor.profiling.enabled</code></td>
+  <td>
+    <code>false</code>
+  </td>
+  <td>
+    If true, enables code profiling.
+  </td>
+  <td>4.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.executor.profiling.outputDir</code></td>
+  <td></td>
+  <td>
+      An HDFS-compatible path to which the profiler's output files are copied. The output files will be written as <i>outputDir/application_id/profile-appname-exec-executor_id.jfr</i>. <br/>
+      If no outputDir is specified, the files are not copied.
+  </td>
+  <td>4.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.executor.profiling.localDir</code></td>
+  <td><code>.</code>, i.e. the executor's working directory</td>
+  <td>
+   The local directory in the executor container to which the jfr files are written. If not specified, the files are written to the executor's working directory.
+  </td>
+  <td>4.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.executor.profiling.fraction</code></td>
+  <td>0.10</td>
+  <td>
+    The fraction of executors on which to enable code profiling. The executors to be profiled are picked at random.  
+  </td>
+  <td>4.0.0</td>
+</tr>
+</table>
+
+### Kubernetes
+On Kubernetes, Spark will try to shut down the executor pods while the profiler files are still being saved. To prevent this, set
+```
+  spark.kubernetes.executor.deleteOnTermination=false

Review Comment:
   Done
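
Putting the pieces together, a sketch of a full submission with profiling enabled on Kubernetes (the master URL, output path, and fraction values are illustrative placeholders; the property names are the ones documented in the table above):

```
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:443 \
  --deploy-mode cluster \
  --conf spark.plugins=org.apache.spark.executor.ExecutorProfilerPlugin \
  --conf spark.executor.profiling.enabled=true \
  --conf spark.executor.profiling.fraction=0.25 \
  --conf spark.executor.profiling.outputDir=hdfs:///spark-profiles \
  --conf spark.kubernetes.executor.deleteOnTermination=false \
  ...
```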



##########
connector/profiler/src/main/scala/org/apache/spark/executor/ExecutorProfilerPlugin.scala:
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.executor
+
+import java.util.{Map => JMap}
+
+import scala.util.Random
+
+import org.apache.spark.SparkConf
+import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config.{EXECUTOR_CODE_PROFILING_ENABLED, EXECUTOR_CODE_PROFILING_FRACTION}
+
+
+/**
+ * Spark plugin that enables code profiling of executors.
+ */
+class ExecutorProfilerPlugin extends SparkPlugin {
+  // No-op: this plugin has no driver-side component.
+  override def driverPlugin(): DriverPlugin = null
+
+  override def executorPlugin(): ExecutorPlugin = new CodeProfilerExecutorPlugin
+}
+
+class CodeProfilerExecutorPlugin extends ExecutorPlugin with Logging {
+
+  private var sparkConf: SparkConf = _
+  private var pluginCtx: PluginContext = _
+  private var profiler: ExecutorCodeProfiler = _
+  private var codeProfilingEnabled: Boolean = _
+  private var codeProfilingFraction: Double = _
+  private val rand: Random = new Random(System.currentTimeMillis())
+
+  override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit = {
+    pluginCtx = ctx
+    sparkConf = ctx.conf()
+    codeProfilingEnabled = sparkConf.get(EXECUTOR_CODE_PROFILING_ENABLED)
+    codeProfilingFraction = sparkConf.get(EXECUTOR_CODE_PROFILING_FRACTION)
+
+    if (codeProfilingEnabled) {
+      if (rand.nextInt(100) * 0.01 < codeProfilingFraction) {
+        logInfo(s"Executor id ${pluginCtx.executorID()} selected for code profiling")
+        profiler = new ExecutorCodeProfiler(sparkConf, pluginCtx.executorID())
+        profiler.start()
+      }
+    }
+  }
+
+  override def shutdown(): Unit = {
+    logInfo("Executor code profiler shutting down")
+    if (profiler != null) {
+      profiler.stop()
+    }
+  }
+

Review Comment:
   Done
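
One note on the selection gate in the quoted init(): `rand.nextInt(100) * 0.01` quantizes the draw to hundredths. A more direct equivalent, shown here as a sketch rather than as the PR's code, draws a uniform double:

```scala
import scala.util.Random

val rand = new Random()
val codeProfilingFraction = 0.10
// nextDouble() is uniform on [0.0, 1.0), so each executor is selected
// for profiling with probability codeProfilingFraction.
val selected = rand.nextDouble() < codeProfilingFraction
```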



##########
connector/profiler/src/main/scala/org/apache/spark/executor/ExecutorProfilerPlugin.scala:
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.executor
+
+import java.util.{Map => JMap}
+
+import scala.util.Random
+
+import org.apache.spark.SparkConf
+import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config.{EXECUTOR_CODE_PROFILING_ENABLED, EXECUTOR_CODE_PROFILING_FRACTION}
+
+
+/**
+ * Spark plugin that enables code profiling of executors.
+ */
+class ExecutorProfilerPlugin extends SparkPlugin {
+  // No-op: this plugin has no driver-side component.
+  override def driverPlugin(): DriverPlugin = null
+
+  override def executorPlugin(): ExecutorPlugin = new CodeProfilerExecutorPlugin
+}
+
+class CodeProfilerExecutorPlugin extends ExecutorPlugin with Logging {
+
+  private var sparkConf: SparkConf = _
+  private var pluginCtx: PluginContext = _
+  private var profiler: ExecutorCodeProfiler = _
+  private var codeProfilingEnabled: Boolean = _
+  private var codeProfilingFraction: Double = _
+  private val rand: Random = new Random(System.currentTimeMillis())
+
+  override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit = {
+    pluginCtx = ctx
+    sparkConf = ctx.conf()
+    codeProfilingEnabled = sparkConf.get(EXECUTOR_CODE_PROFILING_ENABLED)
+    codeProfilingFraction = sparkConf.get(EXECUTOR_CODE_PROFILING_FRACTION)
+
+    if (codeProfilingEnabled) {
+      if (rand.nextInt(100) * 0.01 < codeProfilingFraction) {
+        logInfo(s"Executor id ${pluginCtx.executorID()} selected for code profiling")
+        profiler = new ExecutorCodeProfiler(sparkConf, pluginCtx.executorID())
+        profiler.start()
+      }
+    }
+  }
+
+  override def shutdown(): Unit = {

Review Comment:
   Done



##########
connector/profiler/README.md:
##########
@@ -0,0 +1,86 @@
+# Spark Code Profiler Plugin
+
+## Build
+
+To build:
+``` 
+  ./build/mvn clean package -P code-profiler
+```
+
+## Executor Code Profiling
+
+The Spark profiler module enables code profiling of executors in cluster mode, based on the [async profiler](https://github.com/async-profiler/async-profiler/blob/master/README.md), a low-overhead sampling profiler. This allows a Spark application to capture CPU and memory profiles for applications running on a cluster, which can later be analyzed for performance issues. The profiler captures [Java Flight Recorder (jfr)](https://developers.redhat.com/blog/2020/08/25/get-started-with-jdk-flight-recorder-in-openjdk-8u#) files for each executor; these can be read by many tools, including Java Mission Control and IntelliJ.
+
+The profiler writes the jfr files to the executor's working directory in the executor's local file system. The files can grow large, so it is advisable that the executor machines have adequate storage. The profiler can be configured to copy the jfr files to an HDFS location before the executor shuts down.
+
+Code profiling is currently only supported for
+
+*   Linux (x64)
+*   Linux (arm64)
+*   Linux (musl, x64)
+*   macOS
+
+To get maximum profiling information, set the following JVM options for the executor:
+
+```
+    -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+PreserveFramePointer
+```
+
+For more information on async-profiler, see the [Async Profiler Manual](https://krzysztofslusarski.github.io/2022/12/12/async-manual.html).
+
+
+To enable code profiling, first enable the code profiling plugin via
+
+```
+spark.plugins=org.apache.spark.executor.ExecutorProfilerPlugin
+```
+
+Then enable profiling in the configuration.
+
+
+### Code profiling configuration
+
+<table class="table">
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
+<tr>
+  <td><code>spark.executor.profiling.enabled</code></td>
+  <td>
+    <code>false</code>
+  </td>
+  <td>
+    If true, enables code profiling.
+  </td>
+  <td>4.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.executor.profiling.outputDir</code></td>
+  <td></td>
+  <td>
+      An HDFS-compatible path to which the profiler's output files are copied. The output files will be written as <i>outputDir/application_id/profile-appname-exec-executor_id.jfr</i>. <br/>
+      If no outputDir is specified, the files are not copied.
+  </td>
+  <td>4.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.executor.profiling.localDir</code></td>
+  <td><code>.</code>, i.e. the executor's working directory</td>
+  <td>
+   The local directory in the executor container to which the jfr files are written. If not specified, the files are written to the executor's working directory.
+  </td>
+  <td>4.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.executor.profiling.fraction</code></td>
+  <td>0.10</td>
+  <td>
+    The fraction of executors on which to enable code profiling. The executors to be profiled are picked at random.  
+  </td>
+  <td>4.0.0</td>
+</tr>
+</table>
+
+### Kubernetes

Review Comment:
   🙏🏾 



##########
connector/profiler/README.md:
##########
@@ -0,0 +1,86 @@
+# Spark Code Profiler Plugin
+
+## Build
+
+To build:
+``` 
+  ./build/mvn clean package -P code-profiler

Review Comment:
   Done



##########
assembly/pom.xml:
##########
@@ -204,6 +204,16 @@
         </dependency>
       </dependencies>
     </profile>
+    <profile>
+      <id>code-profiler</id>

Review Comment:
   Sure



##########
connector/profiler/README.md:
##########
@@ -0,0 +1,86 @@
+# Spark Code Profiler Plugin
+
+## Build
+
+To build:
+``` 
+  ./build/mvn clean package -P code-profiler
+```
+
+## Executor Code Profiling
+
+The Spark profiler module enables code profiling of executors in cluster mode, based on the [async profiler](https://github.com/async-profiler/async-profiler/blob/master/README.md), a low-overhead sampling profiler. This allows a Spark application to capture CPU and memory profiles for applications running on a cluster, which can later be analyzed for performance issues. The profiler captures [Java Flight Recorder (jfr)](https://developers.redhat.com/blog/2020/08/25/get-started-with-jdk-flight-recorder-in-openjdk-8u#) files for each executor; these can be read by many tools, including Java Mission Control and IntelliJ.
+
+The profiler writes the jfr files to the executor's working directory in the executor's local file system. The files can grow large, so it is advisable that the executor machines have adequate storage. The profiler can be configured to copy the jfr files to an HDFS location before the executor shuts down.
+
+Code profiling is currently only supported for
+
+*   Linux (x64)
+*   Linux (arm64)
+*   Linux (musl, x64)
+*   macOS
+
+To get maximum profiling information, set the following JVM options for the executor:
+
+```
+    -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+PreserveFramePointer
+```
+
+For more information on async-profiler, see the [Async Profiler Manual](https://krzysztofslusarski.github.io/2022/12/12/async-manual.html).
+
+
+To enable code profiling, first enable the code profiling plugin via
+
+```
+spark.plugins=org.apache.spark.executor.ExecutorProfilerPlugin
+```
+
+Then enable profiling in the configuration.
+
+
+### Code profiling configuration
+
+<table class="table">
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
+<tr>
+  <td><code>spark.executor.profiling.enabled</code></td>
+  <td>
+    <code>false</code>
+  </td>
+  <td>
+    If true, enables code profiling.
+  </td>
+  <td>4.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.executor.profiling.outputDir</code></td>
+  <td></td>
+  <td>
+      An HDFS-compatible path to which the profiler's output files are copied. The output files will be written as <i>outputDir/application_id/profile-appname-exec-executor_id.jfr</i>. <br/>

Review Comment:
   Done
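
To make the naming scheme concrete: with outputDir set to `hdfs:///spark-profiles`, an application id of `application_1700000000000_0001`, an app name of `myapp`, and executor id `7` (all values invented for illustration), the profile would be copied to:

```
hdfs:///spark-profiles/application_1700000000000_0001/profile-myapp-exec-7.jfr
```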



##########
connector/profiler/README.md:
##########
@@ -0,0 +1,86 @@
+# Spark Code Profiler Plugin
+
+## Build
+
+To build:
+``` 
+  ./build/mvn clean package -P code-profiler
+```
+
+## Executor Code Profiling
+
+The Spark profiler module enables code profiling of executors in cluster mode, based on the [async profiler](https://github.com/async-profiler/async-profiler/blob/master/README.md), a low-overhead sampling profiler. This allows a Spark application to capture CPU and memory profiles for applications running on a cluster, which can later be analyzed for performance issues. The profiler captures [Java Flight Recorder (jfr)](https://developers.redhat.com/blog/2020/08/25/get-started-with-jdk-flight-recorder-in-openjdk-8u#) files for each executor; these can be read by many tools, including Java Mission Control and IntelliJ.
+
+The profiler writes the jfr files to the executor's working directory in the executor's local file system. The files can grow large, so it is advisable that the executor machines have adequate storage. The profiler can be configured to copy the jfr files to an HDFS location before the executor shuts down.
+
+Code profiling is currently only supported for
+
+*   Linux (x64)
+*   Linux (arm64)
+*   Linux (musl, x64)
+*   macOS
+
+To get maximum profiling information, set the following JVM options for the executor:
+
+```
+    -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+PreserveFramePointer
+```
+
+For more information on async-profiler, see the [Async Profiler Manual](https://krzysztofslusarski.github.io/2022/12/12/async-manual.html).
+
+
+To enable code profiling, first enable the code profiling plugin via
+
+```
+spark.plugins=org.apache.spark.executor.ExecutorProfilerPlugin
+```
+
+Then enable profiling in the configuration.
+
+
+### Code profiling configuration
+
+<table class="table">
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
+<tr>
+  <td><code>spark.executor.profiling.enabled</code></td>
+  <td>
+    <code>false</code>
+  </td>
+  <td>
+    If true, enables code profiling.
+  </td>
+  <td>4.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.executor.profiling.outputDir</code></td>
+  <td></td>

Review Comment:
   Done



##########
connector/profiler/README.md:
##########
@@ -0,0 +1,86 @@
+# Spark Code Profiler Plugin
+
+## Build
+
+To build:
+``` 
+  ./build/mvn clean package -P code-profiler
+```
+
+## Executor Code Profiling
+
+The Spark profiler module enables code profiling of executors in cluster mode, based on the [async profiler](https://github.com/async-profiler/async-profiler/blob/master/README.md), a low-overhead sampling profiler. This allows a Spark application to capture CPU and memory profiles for applications running on a cluster, which can later be analyzed for performance issues. The profiler captures [Java Flight Recorder (jfr)](https://developers.redhat.com/blog/2020/08/25/get-started-with-jdk-flight-recorder-in-openjdk-8u#) files for each executor; these can be read by many tools, including Java Mission Control and IntelliJ.
+
+The profiler writes the jfr files to the executor's working directory in the executor's local file system. The files can grow large, so it is advisable that the executor machines have adequate storage. The profiler can be configured to copy the jfr files to an HDFS location before the executor shuts down.
+
+Code profiling is currently only supported for
+
+*   Linux (x64)
+*   Linux (arm64)
+*   Linux (musl, x64)
+*   macOS
+
+To get maximum profiling information, set the following JVM options for the executor:
+
+```
+    -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+PreserveFramePointer
+```
+
+For more information on async-profiler, see the [Async Profiler Manual](https://krzysztofslusarski.github.io/2022/12/12/async-manual.html).
+
+
+To enable code profiling, first enable the code profiling plugin via
+
+```
+spark.plugins=org.apache.spark.executor.ExecutorProfilerPlugin
+```
+
+Then enable profiling in the configuration.
+
+
+### Code profiling configuration
+
+<table class="table">
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
+<tr>
+  <td><code>spark.executor.profiling.enabled</code></td>
+  <td>
+    <code>false</code>
+  </td>
+  <td>
+    If true, enables code profiling.
+  </td>
+  <td>4.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.executor.profiling.outputDir</code></td>

Review Comment:
   Done



##########
connector/profiler/src/main/scala/org/apache/spark/executor/ExecutorCodeProfiler.scala:
##########
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.executor
+
+import java.io.{BufferedInputStream, FileInputStream, InputStream}
+import java.net.URI
+import java.util.concurrent.{ScheduledExecutorService, TimeUnit}
+
+import one.profiler.{AsyncProfiler, AsyncProfilerLoader}
+import org.apache.hadoop.fs.{FileSystem, FSDataOutputStream, Path}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+import org.apache.spark.util.ThreadUtils
+
+
+/**
+ * A class that enables the async code profiler.

Review Comment:
   Done



##########
connector/profiler/README.md:
##########
@@ -0,0 +1,86 @@
+# Spark Code Profiler Plugin
+
+## Build
+
+To build:
+``` 
+  ./build/mvn clean package -P code-profiler
+```
+
+## Executor Code Profiling
+
+The Spark profiler module enables code profiling of executors in cluster mode, based on the [async profiler](https://github.com/async-profiler/async-profiler/blob/master/README.md), a low-overhead sampling profiler. This allows a Spark application to capture CPU and memory profiles for applications running on a cluster, which can later be analyzed for performance issues. The profiler captures [Java Flight Recorder (jfr)](https://developers.redhat.com/blog/2020/08/25/get-started-with-jdk-flight-recorder-in-openjdk-8u#) files for each executor; these can be read by many tools, including Java Mission Control and IntelliJ.
+
+The profiler writes the jfr files to the executor's working directory in the executor's local file system. The files can grow large, so it is advisable that the executor machines have adequate storage. The profiler can be configured to copy the jfr files to an HDFS location before the executor shuts down.
+
+Code profiling is currently only supported for
+
+*   Linux (x64)
+*   Linux (arm64)
+*   Linux (musl, x64)
+*   macOS
+
+To get maximum profiling information, set the following JVM options for the executor:
+
+```
+    -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+PreserveFramePointer
+```
+
+For more information on async-profiler, see the [Async Profiler Manual](https://krzysztofslusarski.github.io/2022/12/12/async-manual.html).
+
+
+To enable code profiling, first enable the code profiling plugin via
+
+```
+spark.plugins=org.apache.spark.executor.ExecutorProfilerPlugin
+```
+
+Then enable profiling in the configuration.
+
+
+### Code profiling configuration
+
+<table class="table">
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
+<tr>
+  <td><code>spark.executor.profiling.enabled</code></td>
+  <td>
+    <code>false</code>
+  </td>
+  <td>
+    If true, enables code profiling.
+  </td>
+  <td>4.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.executor.profiling.outputDir</code></td>
+  <td></td>
+  <td>
+      An HDFS-compatible path to which the profiler's output files are copied. The output files will be written as <i>outputDir/application_id/profile-appname-exec-executor_id.jfr</i>. <br/>
+      If no outputDir is specified, the files are not copied.

Review Comment:
   Running out of space in the DFS will not affect the job; however, the jfr file may be corrupted. Added the warning.
   Also added a warning for localDir, where running out of space on the local file system may cause the job to fail on K8s.



##########
connector/profiler/src/main/scala/org/apache/spark/executor/ExecutorCodeProfiler.scala:
##########
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.executor
+
+import java.io.{BufferedInputStream, FileInputStream, InputStream}
+import java.net.URI
+import java.util.concurrent.{ScheduledExecutorService, TimeUnit}
+
+import one.profiler.{AsyncProfiler, AsyncProfilerLoader}
+import org.apache.hadoop.fs.{FileSystem, FSDataOutputStream, Path}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+import org.apache.spark.util.ThreadUtils
+
+
+/**
+ * A class that enables the async code profiler.
+ */
+private[spark] class ExecutorCodeProfiler(conf: SparkConf, executorId: String) extends Logging {

Review Comment:
   Renamed



##########
connector/profiler/README.md:
##########
@@ -0,0 +1,86 @@
+# Spark Code Profiler Plugin
+
+## Build
+
+To build:
+``` 
+  ./build/mvn clean package -P code-profiler
+```
+
+## Executor Code Profiling
+
+The Spark profiler module enables code profiling of executors in cluster mode, based on the [async profiler](https://github.com/async-profiler/async-profiler/blob/master/README.md), a low-overhead sampling profiler. This allows a Spark application to capture CPU and memory profiles for applications running on a cluster, which can later be analyzed for performance issues. The profiler captures [Java Flight Recorder (jfr)](https://developers.redhat.com/blog/2020/08/25/get-started-with-jdk-flight-recorder-in-openjdk-8u#) files for each executor; these can be read by many tools, including Java Mission Control and IntelliJ.
+
+The profiler writes the jfr files to the executor's working directory in the executor's local file system. The files can grow large, so it is advisable that the executor machines have adequate storage. The profiler can be configured to copy the jfr files to an HDFS location before the executor shuts down.
+
+Code profiling is currently only supported for
+
+*   Linux (x64)
+*   Linux (arm64)
+*   Linux (musl, x64)
+*   macOS
+
+To get maximum profiling information, set the following JVM options for the executor:
+
+```
+    -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+PreserveFramePointer
+```
+
+For more information on async-profiler, see the [Async Profiler Manual](https://krzysztofslusarski.github.io/2022/12/12/async-manual.html).
+
+
+To enable code profiling, first enable the code profiling plugin via
+
+```
+spark.plugins=org.apache.spark.executor.ExecutorProfilerPlugin
+```
+
+Then enable profiling in the configuration.
+
+
+### Code profiling configuration
+
+<table class="table">
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
+<tr>
+  <td><code>spark.executor.profiling.enabled</code></td>
+  <td>
+    <code>false</code>
+  </td>
+  <td>
+    If true, enables code profiling.
+  </td>
+  <td>4.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.executor.profiling.outputDir</code></td>
+  <td></td>
+  <td>
+      An HDFS-compatible path to which the profiler's output files are copied. The output files will be written as <i>outputDir/application_id/profile-appname-exec-executor_id.jfr</i>. <br/>
+      If no outputDir is specified, the files are not copied.
+  </td>
+  <td>4.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.executor.profiling.localDir</code></td>
+  <td><code>.</code>, i.e. the executor's working directory</td>
+  <td>
+   The local directory in the executor container to which the jfr files are written. If not specified, the files are written to the executor's working directory.
+  </td>
+  <td>4.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.executor.profiling.fraction</code></td>
+  <td>0.10</td>
+  <td>
+    The fraction of executors on which to enable code profiling. The executors to be profiled are picked at random.  
+  </td>
+  <td>4.0.0</td>
+</tr>
+</table>
+
+### Kubernetes

Review Comment:
   🙏🏾 



##########
connector/profiler/src/main/scala/org/apache/spark/executor/ExecutorProfilerPlugin.scala:
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.executor
+
+import java.util.{Map => JMap}
+
+import scala.util.Random
+
+import org.apache.spark.SparkConf
+import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config.{EXECUTOR_CODE_PROFILING_ENABLED, EXECUTOR_CODE_PROFILING_FRACTION}
+
+
+/**
+ * Spark plugin that enables code profiling of executors.

Review Comment:
   Done



##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -799,6 +799,44 @@ package object config {
     .intConf
     .createOptional
 
+  private[spark] val EXECUTOR_CODE_PROFILING_ENABLED =
+    ConfigBuilder("spark.executor.profiling.enabled")
+      .doc("Turn on code profiling via async_profiler in executors.")
+      .version("4.0.0")
+      .booleanConf
+      .createWithDefault(false)
+
+  private[spark] val EXECUTOR_CODE_PROFILING_OUTPUT_DIR =
+    ConfigBuilder("spark.executor.profiling.outputDir")
+      .doc("HDFS compatible file-system  path to where the profiler will write output jfr files.")
+      .version("4.0.0")
+      .stringConf
+      .createOptional
+
+  private[spark] val EXECUTOR_CODE_PROFILING_LOCAL_DIR =
+    ConfigBuilder("spark.executor.profiling.localDir")
+      .doc("Local file system path on executor where profiler output is saved. Defaults to the " +
+        "working directory of the executor process.")
+      .version("4.0.0")
+      .stringConf
+      .createWithDefault(".")
+
+  private[spark] val EXECUTOR_CODE_PROFILING_OPTIONS =
+    ConfigBuilder("spark.executor.profiling.options")
+      .doc("Options to pass on to the async profiler.")
+      .version("4.0.0")
+      .stringConf
+      .createWithDefault("event=wall,interval=10ms,alloc=2m,lock=10ms,chunktime=300s")
+
+  private[spark] val EXECUTOR_CODE_PROFILING_FRACTION =
+    ConfigBuilder("spark.executor.profiling.fraction")
+      .doc("Fraction of executors to profile")
+      .version("4.0.0")
+      .doubleConf
+      .checkValue(v => v >= 0.0 && v < 1.0,
+        "Fraction of executors to profile must be in [0,1)")
+      .createWithDefault(0.1)
+

Review Comment:
   Done. We don't need any YARN- or Kubernetes-specific configuration.
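
As a usage illustration of the spark.executor.profiling.options entry above: the string is passed through to async-profiler, so a CPU-only profile could be requested with standard async-profiler options such as the following (the specific values are illustrative):

```
spark.executor.profiling.options=event=cpu,interval=5ms,chunktime=300s
```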



##########
connector/profiler/src/main/scala/org/apache/spark/executor/ExecutorCodeProfiler.scala:
##########
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.executor
+
+import java.io.{BufferedInputStream, FileInputStream, InputStream}
+import java.net.URI
+import java.util.concurrent.{ScheduledExecutorService, TimeUnit}
+
+import one.profiler.{AsyncProfiler, AsyncProfilerLoader}
+import org.apache.hadoop.fs.{FileSystem, FSDataOutputStream, Path}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+import org.apache.spark.util.ThreadUtils
+
+
+/**
+ * A class that enables the async code profiler.
+ */
+private[spark] class ExecutorCodeProfiler(conf: SparkConf, executorId: String) extends Logging {
+
+  private var running = false
+  private val enableProfiler = conf.get(EXECUTOR_CODE_PROFILING_ENABLED)
+  private val profilerOptions = conf.get(EXECUTOR_CODE_PROFILING_OPTIONS)
+  private val profilerOutputDir = conf.get(EXECUTOR_CODE_PROFILING_OUTPUT_DIR)
+  private val profilerLocalDir = conf.get(EXECUTOR_CODE_PROFILING_LOCAL_DIR)
+
+  private val startcmd = s"start,$profilerOptions,file=$profilerLocalDir/profile.jfr"
+  private val stopcmd = s"stop,$profilerOptions,file=$profilerLocalDir/profile.jfr"
+  private val dumpcmd = s"dump,$profilerOptions,file=$profilerLocalDir/profile.jfr"
+  private val resumecmd = s"resume,$profilerOptions,file=$profilerLocalDir/profile.jfr"
+
+  private val UPLOAD_SIZE = 8 * 1024 * 1024 // 8 MB
+  private val WRITE_INTERVAL = 30 // seconds

Review Comment:
   I felt there were already too many configuration parameters, and I found this to be a good value for real use cases.
   Making this configurable.
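
A sketch of what such a config entry could look like, following the ConfigBuilder pattern quoted earlier in this thread (the property name spark.executor.profiling.writeInterval and its default are assumptions, not necessarily what the PR finally adds):

```scala
// Hypothetical config entry; the name and default are illustrative only.
private[spark] val EXECUTOR_CODE_PROFILING_WRITE_INTERVAL =
  ConfigBuilder("spark.executor.profiling.writeInterval")
    .doc("Time interval, in seconds, after which the profiler output is " +
      "incrementally copied to the output directory.")
    .version("4.0.0")
    .intConf
    .checkValue(_ >= 0, "Write interval should be non-negative")
    .createWithDefault(30)
```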



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org