You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/10/08 14:14:30 UTC

[GitHub] [spark] fsamuel-bs opened a new pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

fsamuel-bs opened a new pull request #29977:
URL: https://github.com/apache/spark/pull/29977


   ### What changes were proposed in this pull request?
   Proposing a new set of APIs for ExecutorPlugins, to provide callbacks invoked at the start and end of each task of a job. Not very opinionated on the shape of the API, tried to be as minimal as possible for now.
   
   ### Why are the changes needed?
   Changes described in detail on [SPARK-33088](https://issues.apache.org/jira/browse/SPARK-33088), but mostly this boils down to:
   
   1. This feature was considered when the ExecutorPlugin API was initially introduced in #21923, but never implemented.
   2. The use-case which **requires** this feature is to propagate tracing information from the driver to the executor, such that calls from the same job can all be traced.
     a. Tracing frameworks usually are setup in thread locals, therefore it's important for the setup to happen in the same thread which runs the tasks.
     b. Executors can be for multiple jobs, therefore it's not sufficient to set tracing information at executor startup time -- it needs to happen every time a task starts or ends.
   
   ### Does this PR introduce _any_ user-facing change?
   No. This PR introduces new features for future developers to use.
   
   ### How was this patch tested?
   Unit tests on `PluginContainerSuite`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] fsamuel-bs commented on a change in pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
fsamuel-bs commented on a change in pull request #29977:
URL: https://github.com/apache/spark/pull/29977#discussion_r504768749



##########
File path: core/src/main/scala/org/apache/spark/scheduler/Task.scala
##########
@@ -123,8 +125,12 @@ private[spark] abstract class Task[T](
       Option(taskAttemptId),
       Option(attemptNumber)).setCurrentContext()
 
+    plugins.foreach(_.onTaskStart())

Review comment:
       We catch Throwable on `ExecutorPluginContainer#onTaskStart` and siblings (see https://github.com/apache/spark/pull/29977/files#diff-5e4d939e9bb53b4be2c48d4eb53b885c162c729b9adc874f918f7701a352cdbbR157), so that's what I meant by "not propagate". I.e. if a plugin's `onTaskStart` throws, Spark will log, but won't fail the associated spark task.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #29977:
URL: https://github.com/apache/spark/pull/29977#discussion_r506773523



##########
File path: core/src/test/scala/org/apache/spark/internal/plugin/PluginContainerSuite.scala
##########
@@ -129,6 +129,38 @@ class PluginContainerSuite extends SparkFunSuite with BeforeAndAfterEach with Lo
     assert(TestSparkPlugin.driverPlugin != null)
   }
 
+  test("SPARK-33088: executor tasks trigger plugin calls") {
+    val conf = new SparkConf()
+      .setAppName(getClass().getName())
+      .set(SparkLauncher.SPARK_MASTER, "local[1]")
+      .set(PLUGINS, Seq(classOf[TestSparkPlugin].getName()))
+
+    sc = new SparkContext(conf)
+    sc.parallelize(1 to 10, 2).count()
+
+    assert(TestSparkPlugin.executorPlugin.numOnTaskStart == 2)
+    assert(TestSparkPlugin.executorPlugin.numOnTaskSucceeded == 2)
+    assert(TestSparkPlugin.executorPlugin.numOnTaskFailed == 0)
+  }
+
+  test("SPARK-33088: executor failed tasks trigger plugin calls") {
+    val conf = new SparkConf()
+      .setAppName(getClass().getName())
+      .set(SparkLauncher.SPARK_MASTER, "local[1]")
+      .set(PLUGINS, Seq(classOf[TestSparkPlugin].getName()))
+
+    sc = new SparkContext(conf)
+    try {
+      sc.parallelize(1 to 10, 2).foreach(i => throw new RuntimeException)
+    } catch {
+      case t: Throwable => // ignore exception
+    }
+
+    assert(TestSparkPlugin.executorPlugin.numOnTaskStart == 2)
+    assert(TestSparkPlugin.executorPlugin.numOnTaskSucceeded == 0)
+    assert(TestSparkPlugin.executorPlugin.numOnTaskFailed == 2)

Review comment:
       Hi, folks.
   It turns out that this is a flaky test. I filed a JIRA issue and made PR.
   - https://issues.apache.org/jira/browse/SPARK-33173
   - https://github.com/apache/spark/pull/30072




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-709340988


   **[Test build #129841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129841/testReport)** for PR 29977 at commit [`8a5e436`](https://github.com/apache/spark/commit/8a5e43671b3ffc16a6d3630886027cde97380558).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-705600259


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tgravescs commented on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
tgravescs commented on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-707140239


   That is exactly what I wanted to look at some more. 
   There are a few corner cases Executor handles like failing on memory leak. The other thing is that the fetchFailed error is private[spark] in the TaskContext. If something happens such that user/Spark wraps that exception you might not be able to tell that.  There is also the task commit denied exception and then failures due to just stopping early.
   Most of these are corner cases but really it comes down to do we care this if users using this api infer what the real Spark status was in these few cases. they just mostly just have to reproduce some of the logic that Executor does. like check for commit denied exception for instance.   This is a developer api and think it would be ok as long as we document appropriately.
   
   I think we should definitely keep something that allows user to tell if task passed or failed.
   
   @mridulm thoughts on leaving here vs we could just move the task end notification about into the Executor which would be in the same thread still. That feels like it would be more reliable with the plugin knowing same status of task that the scheduler will see.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #29977:
URL: https://github.com/apache/spark/pull/29977#discussion_r506773523



##########
File path: core/src/test/scala/org/apache/spark/internal/plugin/PluginContainerSuite.scala
##########
@@ -129,6 +129,38 @@ class PluginContainerSuite extends SparkFunSuite with BeforeAndAfterEach with Lo
     assert(TestSparkPlugin.driverPlugin != null)
   }
 
+  test("SPARK-33088: executor tasks trigger plugin calls") {
+    val conf = new SparkConf()
+      .setAppName(getClass().getName())
+      .set(SparkLauncher.SPARK_MASTER, "local[1]")
+      .set(PLUGINS, Seq(classOf[TestSparkPlugin].getName()))
+
+    sc = new SparkContext(conf)
+    sc.parallelize(1 to 10, 2).count()
+
+    assert(TestSparkPlugin.executorPlugin.numOnTaskStart == 2)
+    assert(TestSparkPlugin.executorPlugin.numOnTaskSucceeded == 2)
+    assert(TestSparkPlugin.executorPlugin.numOnTaskFailed == 0)
+  }
+
+  test("SPARK-33088: executor failed tasks trigger plugin calls") {
+    val conf = new SparkConf()
+      .setAppName(getClass().getName())
+      .set(SparkLauncher.SPARK_MASTER, "local[1]")
+      .set(PLUGINS, Seq(classOf[TestSparkPlugin].getName()))
+
+    sc = new SparkContext(conf)
+    try {
+      sc.parallelize(1 to 10, 2).foreach(i => throw new RuntimeException)
+    } catch {
+      case t: Throwable => // ignore exception
+    }
+
+    assert(TestSparkPlugin.executorPlugin.numOnTaskStart == 2)
+    assert(TestSparkPlugin.executorPlugin.numOnTaskSucceeded == 0)
+    assert(TestSparkPlugin.executorPlugin.numOnTaskFailed == 2)

Review comment:
       Hi, folks.
   It turns out that this is a flaky test. I filed a JIRA issue and PR.
   - https://issues.apache.org/jira/browse/SPARK-33173
   - https://github.com/apache/spark/pull/30072




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #29977:
URL: https://github.com/apache/spark/pull/29977#discussion_r501903929



##########
File path: core/src/main/scala/org/apache/spark/scheduler/Task.scala
##########
@@ -123,8 +125,12 @@ private[spark] abstract class Task[T](
       Option(taskAttemptId),
       Option(attemptNumber)).setCurrentContext()
 
+    plugins.foreach(_.onTaskStart())

Review comment:
       What is the expectation in case `onTaskStart` fails - do we want to invoke succeeded/failed ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-705731860






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm edited a comment on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
mridulm edited a comment on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-708933732


   Thanks @tgravescs, I did miss out on other cases where task can fail due to spark infra causing the task to fail (commit denied is a very good example).
   
   If it helps with catching more corner cases, without spi impl's having to duplicate what spark does - that is a great direction to take.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-705735424


   > You also have other situations where tasks are killed, which luckily I guess the TaskContext does have a isInterrupted field for that. The executor ends up making some pass/fail decisions that go to the scheduler
   
   Currently we document `isInterrupted` as a task kill by platform (instead of user code related task failure). Essentially we have:
   
   a) Succeeded.
   b) Failed.
   b.1) Task killed by platform
   b.2) Task failed due to user code.
   
   Do we want to distinguish b.1 from b.2 explicitly via api ? (`isInterrupted` currently does that - imo that should suffice).
   Is there any other failure flows ? (ignoring executor crash here).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rshkv commented on a change in pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
rshkv commented on a change in pull request #29977:
URL: https://github.com/apache/spark/pull/29977#discussion_r501837781



##########
File path: core/src/main/scala/org/apache/spark/executor/Executor.scala
##########
@@ -332,7 +332,8 @@ private[spark] class Executor(
 
   class TaskRunner(
       execBackend: ExecutorBackend,
-      private val taskDescription: TaskDescription)
+      private val taskDescription: TaskDescription,
+      private val plugins: Option[PluginContainer])

Review comment:
       Maybe we can make the `plugins` parameter optional or default to some `EmptyPluginContainer`?:
   ```suggestion
         private val plugins: Option[PluginContainer] = None)
   ```

##########
File path: core/src/main/scala/org/apache/spark/executor/Executor.scala
##########
@@ -332,7 +332,8 @@ private[spark] class Executor(
 
   class TaskRunner(
       execBackend: ExecutorBackend,
-      private val taskDescription: TaskDescription)
+      private val taskDescription: TaskDescription,
+      private val plugins: Option[PluginContainer])

Review comment:
       Same for `Task#run`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #29977:
URL: https://github.com/apache/spark/pull/29977#discussion_r506773523



##########
File path: core/src/test/scala/org/apache/spark/internal/plugin/PluginContainerSuite.scala
##########
@@ -129,6 +129,38 @@ class PluginContainerSuite extends SparkFunSuite with BeforeAndAfterEach with Lo
     assert(TestSparkPlugin.driverPlugin != null)
   }
 
+  test("SPARK-33088: executor tasks trigger plugin calls") {
+    val conf = new SparkConf()
+      .setAppName(getClass().getName())
+      .set(SparkLauncher.SPARK_MASTER, "local[1]")
+      .set(PLUGINS, Seq(classOf[TestSparkPlugin].getName()))
+
+    sc = new SparkContext(conf)
+    sc.parallelize(1 to 10, 2).count()
+
+    assert(TestSparkPlugin.executorPlugin.numOnTaskStart == 2)
+    assert(TestSparkPlugin.executorPlugin.numOnTaskSucceeded == 2)
+    assert(TestSparkPlugin.executorPlugin.numOnTaskFailed == 0)
+  }
+
+  test("SPARK-33088: executor failed tasks trigger plugin calls") {
+    val conf = new SparkConf()
+      .setAppName(getClass().getName())
+      .set(SparkLauncher.SPARK_MASTER, "local[1]")
+      .set(PLUGINS, Seq(classOf[TestSparkPlugin].getName()))
+
+    sc = new SparkContext(conf)
+    try {
+      sc.parallelize(1 to 10, 2).foreach(i => throw new RuntimeException)
+    } catch {
+      case t: Throwable => // ignore exception
+    }
+
+    assert(TestSparkPlugin.executorPlugin.numOnTaskStart == 2)
+    assert(TestSparkPlugin.executorPlugin.numOnTaskSucceeded == 0)
+    assert(TestSparkPlugin.executorPlugin.numOnTaskFailed == 2)

Review comment:
       Hi, folks.
   It turns out that this is a flaky test. I filed a JIRA issue and made PR.
   - SPARK-33173
   - https://github.com/apache/spark/pull/30072




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tgravescs commented on a change in pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #29977:
URL: https://github.com/apache/spark/pull/29977#discussion_r503323475



##########
File path: core/src/main/scala/org/apache/spark/executor/Executor.scala
##########
@@ -332,7 +332,8 @@ private[spark] class Executor(
 
   class TaskRunner(
       execBackend: ExecutorBackend,
-      private val taskDescription: TaskDescription)
+      private val taskDescription: TaskDescription,
+      private val plugins: Option[PluginContainer])

Review comment:
       @rshkv, what is the reason to make this default to None?  This is an internal api and only called from here. It's an option already so people can check it easily.  In some ways its nice to force it so you make sure all uses of it have been updated.  
   Are there cases you know this is used outside Spark?

##########
File path: core/src/main/java/org/apache/spark/api/plugin/ExecutorPlugin.java
##########
@@ -54,4 +54,39 @@ default void init(PluginContext ctx, Map<String, String> extraConf) {}
    */
   default void shutdown() {}
 
+  /**
+   * Perform any action before the task is run.
+   * <p>
+   * This method is invoked from the same thread the task will be executed.
+   * Task-specific information can be accessed via {@link org.apache.spark.TaskContext#get}.
+   * <p>
+   * Plugin authors should avoid expensive operations here, as this method will be called
+   * on every task, and doing something expensive can significantly slow down a job.
+   * It is not recommended for a user to call a remote service, for example.
+   * <p>
+   * Exceptions thrown from this method do not propagate - they're caught,
+   * logged, and suppressed. Therefore exceptions when executing this method won't
+   * make the job fail.
+   */

Review comment:
       add @since 3.1.0 to each of the new functions docs

##########
File path: core/src/main/scala/org/apache/spark/scheduler/Task.scala
##########
@@ -123,8 +125,12 @@ private[spark] abstract class Task[T](
       Option(taskAttemptId),
       Option(attemptNumber)).setCurrentContext()
 
+    plugins.foreach(_.onTaskStart())

Review comment:
       maybe I'm misunderstanding but the documentation states "Exceptions thrown from this method do not propagate", there is nothing here preventing that. I think perhaps you meant to say the user needs to make sure they don't propagate?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-709706320


   Thanks for the reviews @rshkv and @tgravescs !
   Merging to master.
   
   Thanks for the contribution @fsamuel-bs !
   I was not able to assign the jira to you @fsamuel-bs, your id was not showing up - can you reassign it to yourself pls ? Thx


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-705604463


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #29977:
URL: https://github.com/apache/spark/pull/29977#discussion_r501903929



##########
File path: core/src/main/scala/org/apache/spark/scheduler/Task.scala
##########
@@ -123,8 +125,12 @@ private[spark] abstract class Task[T](
       Option(taskAttemptId),
       Option(attemptNumber)).setCurrentContext()
 
+    plugins.foreach(_.onTaskStart())

Review comment:
       What is the expectation in case `onTaskStart` fails - do we want to invoke succeeded/failed ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tgravescs commented on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
tgravescs commented on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-710047148


   I added @fsamuel-bs as a contributor in lira and assigned it to him. thanks..


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-705600259






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-709369876


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34446/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-709464212






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] fsamuel-bs commented on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
fsamuel-bs commented on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-705600726


   @vanzin, @tgravescs, @LucaCanali tagging you all as you seem to have worked on the latest refactor on this area of the code on https://github.com/apache/spark/pull/26170.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tgravescs edited a comment on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
tgravescs edited a comment on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-710047148


   I added @fsamuel-bs as a contributor in jira and assigned it to him. thanks..


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-705604463


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-708933732


   Thanks @tgravescs, I did miss out on other cases where task can fail due to spark infra causing the task to fail (commit denied is a very good example).
   
   If it helps with catching more cornee cases, without spi impl's having to duplicate what spark does - that is a great direction to take.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rshkv commented on a change in pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
rshkv commented on a change in pull request #29977:
URL: https://github.com/apache/spark/pull/29977#discussion_r501837781



##########
File path: core/src/main/scala/org/apache/spark/executor/Executor.scala
##########
@@ -332,7 +332,8 @@ private[spark] class Executor(
 
   class TaskRunner(
       execBackend: ExecutorBackend,
-      private val taskDescription: TaskDescription)
+      private val taskDescription: TaskDescription,
+      private val plugins: Option[PluginContainer])

Review comment:
       Maybe we can make the `plugins` parameter optional or default to some `EmptyPluginContainer`?:
   ```suggestion
         private val plugins: Option[PluginContainer] = None)
   ```

##########
File path: core/src/main/scala/org/apache/spark/executor/Executor.scala
##########
@@ -332,7 +332,8 @@ private[spark] class Executor(
 
   class TaskRunner(
       execBackend: ExecutorBackend,
-      private val taskDescription: TaskDescription)
+      private val taskDescription: TaskDescription,
+      private val plugins: Option[PluginContainer])

Review comment:
       Same for `Task#run`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] fsamuel-bs commented on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
fsamuel-bs commented on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-705691969


   > You also have other situations where tasks are killed, which luckily I guess the TaskContext does have a isInterrupted field for that. The executor ends up making some pass/fail decisions that go to the scheduler
   
   I see. For my use-case, I don't really care about the distinction of `success` vs `fail`, only that a method will be called at least once when the task finishes in the same thread the task is executed on (for me to dismantle the tracing stuff I've built up `onTaskStart`).
   
   Could expose instead an `onTaskCompleted` method, which does not leak the fact that a task succeeded or failed. Or could move triggering `onTaskSucceeded`/`onTaskFailed` from the Executor. Think the latter is better, because it seems useful to have the distinction if a task succeeded or failed, but happy to implement whatever maintainers prefer.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] asfgit closed pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #29977:
URL: https://github.com/apache/spark/pull/29977


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #29977:
URL: https://github.com/apache/spark/pull/29977#discussion_r506786690



##########
File path: core/src/test/scala/org/apache/spark/internal/plugin/PluginContainerSuite.scala
##########
@@ -129,6 +129,38 @@ class PluginContainerSuite extends SparkFunSuite with BeforeAndAfterEach with Lo
     assert(TestSparkPlugin.driverPlugin != null)
   }
 
+  test("SPARK-33088: executor tasks trigger plugin calls") {
+    val conf = new SparkConf()
+      .setAppName(getClass().getName())
+      .set(SparkLauncher.SPARK_MASTER, "local[1]")
+      .set(PLUGINS, Seq(classOf[TestSparkPlugin].getName()))
+
+    sc = new SparkContext(conf)
+    sc.parallelize(1 to 10, 2).count()
+
+    assert(TestSparkPlugin.executorPlugin.numOnTaskStart == 2)
+    assert(TestSparkPlugin.executorPlugin.numOnTaskSucceeded == 2)
+    assert(TestSparkPlugin.executorPlugin.numOnTaskFailed == 0)
+  }
+
+  test("SPARK-33088: executor failed tasks trigger plugin calls") {
+    val conf = new SparkConf()
+      .setAppName(getClass().getName())
+      .set(SparkLauncher.SPARK_MASTER, "local[1]")
+      .set(PLUGINS, Seq(classOf[TestSparkPlugin].getName()))
+
+    sc = new SparkContext(conf)
+    try {
+      sc.parallelize(1 to 10, 2).foreach(i => throw new RuntimeException)
+    } catch {
+      case t: Throwable => // ignore exception
+    }
+
+    assert(TestSparkPlugin.executorPlugin.numOnTaskStart == 2)
+    assert(TestSparkPlugin.executorPlugin.numOnTaskSucceeded == 0)
+    assert(TestSparkPlugin.executorPlugin.numOnTaskFailed == 2)

Review comment:
       Thanks @dongjoon-hyun 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-705600259


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #29977:
URL: https://github.com/apache/spark/pull/29977#discussion_r505202682



##########
File path: core/src/main/scala/org/apache/spark/scheduler/Task.scala
##########
@@ -123,8 +125,12 @@ private[spark] abstract class Task[T](
       Option(taskAttemptId),
       Option(attemptNumber)).setCurrentContext()
 
+    plugins.foreach(_.onTaskStart())

Review comment:
       Perhaps reword it to say exceptions are ignored ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] fsamuel-bs commented on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
fsamuel-bs commented on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-705600726






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tgravescs commented on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
tgravescs commented on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-705653831


   thanks for adding this. I can definitely see use cases for this and have thought it would be nice to have something like this. 
   
   I want to look at the api details more. You also have other situations where tasks are killed, which luckily I guess the TaskContext does have a isInterrupted field for that.   The executor ends up making some pass/fail decisions that go to the scheduler
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-709385508






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-705600259


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tgravescs commented on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
tgravescs commented on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-705653831


   thanks for adding this. I can definitely see use cases for this and have thought it would be nice to have something like this. 
   
   I want to look at the api details more. You also have other situations where tasks are killed, which luckily I guess the TaskContext does have a isInterrupted field for that.   The executor ends up making some pass/fail decisions that go to the scheduler
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] fsamuel-bs commented on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
fsamuel-bs commented on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-709124296


   @mridulm @tgravescs: I've moved triggering the methods from `Task` to `Executor`. Also modified the signature of `onTaskFailed` to receive a `TaskFailedReason` instead of `throwable` to better match what it's sent to the scheduler. Let me know your thoughts.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-709340988


   **[Test build #129841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129841/testReport)** for PR 29977 at commit [`8a5e436`](https://github.com/apache/spark/commit/8a5e43671b3ffc16a6d3630886027cde97380558).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] fsamuel-bs commented on a change in pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
fsamuel-bs commented on a change in pull request #29977:
URL: https://github.com/apache/spark/pull/29977#discussion_r502469786



##########
File path: core/src/main/scala/org/apache/spark/scheduler/Task.scala
##########
@@ -123,8 +125,12 @@ private[spark] abstract class Task[T](
       Option(taskAttemptId),
       Option(attemptNumber)).setCurrentContext()
 
+    plugins.foreach(_.onTaskStart())

Review comment:
       That's what I documented on https://github.com/apache/spark/pull/29977/files#diff-6a99ec9983962323b4e0c1899134b5d6R76-R78 -- argument that came to mind is that it's easy for a plugin dev to track some state in a thread-local and clean decide if it wants to perform the succeeded/failed action or not.
   
   Happy to change it if we prefer not to put this burden on the plugin owner though.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-709385461


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34446/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-705731860


   The changes mostly look good, thanks for working on it @fsamuel-bs !
   Given `TaskContext.get()` exposes a pretty much everything about the task which the plugin could require, we are in a pretty good shape.
   Do we want to include additional metadata to be passed in from driver for a subset of stage/task to the plugin ?
   Currently global info can be passed in via `init` and job properties can be used to pass in metadata; but wondering if we need something more fine grained ... thoughts ?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-709385508






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tgravescs commented on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
tgravescs commented on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-709340255






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] fsamuel-bs commented on a change in pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
fsamuel-bs commented on a change in pull request #29977:
URL: https://github.com/apache/spark/pull/29977#discussion_r504766569



##########
File path: core/src/main/java/org/apache/spark/api/plugin/ExecutorPlugin.java
##########
@@ -54,4 +54,39 @@ default void init(PluginContext ctx, Map<String, String> extraConf) {}
    */
   default void shutdown() {}
 
+  /**
+   * Perform any action before the task is run.
+   * <p>
+   * This method is invoked from the same thread the task will be executed.
+   * Task-specific information can be accessed via {@link org.apache.spark.TaskContext#get}.
+   * <p>
+   * Plugin authors should avoid expensive operations here, as this method will be called
+   * on every task, and doing something expensive can significantly slow down a job.
+   * It is not recommended for a user to call a remote service, for example.
+   * <p>
+   * Exceptions thrown from this method do not propagate - they're caught,
+   * logged, and suppressed. Therefore exceptions when executing this method won't
+   * make the job fail.
+   */

Review comment:
       Done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-709464212






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29977: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29977:
URL: https://github.com/apache/spark/pull/29977#issuecomment-709462930


   **[Test build #129841 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129841/testReport)** for PR 29977 at commit [`8a5e436`](https://github.com/apache/spark/commit/8a5e43671b3ffc16a6d3630886027cde97380558).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org