Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/20 01:06:58 UTC

[GitHub] [spark] sarthfrey opened a new pull request #27640: Add all gather method to BarrierTaskContext

sarthfrey opened a new pull request #27640: Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] jiangxb1987 commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
jiangxb1987 commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589805019
 
 
   Thanks, merged to master/3.0!



[GitHub] [spark] AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589370138
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589474730
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118737/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589428604
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23488/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589428604
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23488/
   Test PASSed.



[GitHub] [spark] srowen commented on a change in pull request #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#discussion_r393029589
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala
 ##########
 @@ -163,6 +155,73 @@ class BarrierTaskContext private[spark] (
       timerTask.cancel()
       timer.purge()
     }
+    json
+  }
+
+  /**
+   * :: Experimental ::
+   * Sets a global barrier and waits until all tasks in this stage hit this barrier. Similar to
+   * MPI_Barrier function in MPI, the barrier() function call blocks until all tasks in the same
+   * stage have reached this routine.
+   *
+   * CAUTION! In a barrier stage, each task must have the same number of barrier() calls, in all
+   * possible code branches. Otherwise, you may get the job hanging or a SparkException after
+   * timeout. Some examples of '''misuses''' are listed below:
+   * 1. Only call barrier() function on a subset of all the tasks in the same barrier stage, it
+   * shall lead to timeout of the function call.
+   * {{{
+   *   rdd.barrier().mapPartitions { iter =>
+   *       val context = BarrierTaskContext.get()
+   *       if (context.partitionId() == 0) {
+   *           // Do nothing.
+   *       } else {
+   *           context.barrier()
+   *       }
+   *       iter
+   *   }
+   * }}}
+   *
+   * 2. Include barrier() function in a try-catch code block, this may lead to timeout of the
+   * second function call.
+   * {{{
+   *   rdd.barrier().mapPartitions { iter =>
+   *       val context = BarrierTaskContext.get()
+   *       try {
+   *           // Do something that might throw an Exception.
+   *           doSomething()
+   *           context.barrier()
+   *       } catch {
+   *           case e: Exception => logWarning("...", e)
+   *       }
+   *       context.barrier()
+   *       iter
+   *   }
+   * }}}
+   */
+  @Experimental
+  @Since("2.4.0")
+  def barrier(): Unit = {
+    runBarrier(RequestMethod.BARRIER)
+    ()
+  }
+
+  /**
+   * :: Experimental ::
+   * Blocks until all tasks in the same stage have reached this routine. Each task passes in
+   * a message and returns with a list of all the messages passed in by each of those tasks.
+   *
+   * CAUTION! The allGather method requires the same precautions as the barrier method
+   *
+   * The message is type String rather than Array[Byte] because it is more convenient for
+   * the user at the cost of worse performance.
+   */
+  @Experimental
+  @Since("3.0.0")
+  def allGather(message: String): ArrayBuffer[String] = {
 
 Review comment:
   Fair point; why not just `Seq`?
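
   The contract in the quoted Scaladoc (every task blocks, contributes one message, and receives all messages) can be sketched without Spark. This is only a local simulation under stated assumptions, not the implementation in this PR: `LocalAllGather`, `LocalAllGatherDemo`, the thread pool standing in for tasks, and the timeouts are all hypothetical, and the helper is good for a single round only.

```scala
import java.util.concurrent.{Callable, CyclicBarrier, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicReferenceArray

// Hypothetical stand-in for one barrier stage with `numTasks` tasks; not Spark code.
final class LocalAllGather(numTasks: Int) {
  private val slots = new AtomicReferenceArray[String](numTasks)
  private val barrier = new CyclicBarrier(numTasks)

  // Publish this task's message, wait for every peer, then read all slots
  // in partition-ID order.
  def allGather(partitionId: Int, message: String): Seq[String] = {
    slots.set(partitionId, message)
    barrier.await(10, TimeUnit.SECONDS) // blocks until all tasks have arrived
    Vector.tabulate(numTasks)(i => slots.get(i))
  }
}

object LocalAllGatherDemo {
  // Runs `numTasks` simulated tasks and returns what each one gathered.
  def run(numTasks: Int): Seq[Seq[String]] = {
    val gather = new LocalAllGather(numTasks)
    // One thread per task, so every task can reach the barrier concurrently.
    val pool = Executors.newFixedThreadPool(numTasks)
    try {
      val futures = (0 until numTasks).map { id =>
        pool.submit(new Callable[Seq[String]] {
          override def call(): Seq[String] = gather.allGather(id, s"msg-$id")
        })
      }
      futures.map(_.get(30, TimeUnit.SECONDS))
    } finally {
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit =
    run(3).foreach(println)
}
```

   Returning `Seq[String]` here is just one choice for the sketch; the thread is still discussing which collection type the real method should return.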



[GitHub] [spark] SparkQA removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589289464
 
 
   **[Test build #118733 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118733/testReport)** for PR 27640 at commit [`c1d1b0e`](https://github.com/apache/spark/commit/c1d1b0eccccf60b54b4268dee3505d98fa4cb59f).



[GitHub] [spark] AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589485680
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118738/
   Test PASSed.



[GitHub] [spark] sarthfrey commented on a change in pull request #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
sarthfrey commented on a change in pull request #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#discussion_r393143615
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala
 ##########
 @@ -163,6 +155,73 @@ class BarrierTaskContext private[spark] (
       timerTask.cancel()
       timer.purge()
     }
+    json
+  }
+
+  /**
+   * :: Experimental ::
+   * Sets a global barrier and waits until all tasks in this stage hit this barrier. Similar to
+   * MPI_Barrier function in MPI, the barrier() function call blocks until all tasks in the same
+   * stage have reached this routine.
+   *
+   * CAUTION! In a barrier stage, each task must have the same number of barrier() calls, in all
+   * possible code branches. Otherwise, you may get the job hanging or a SparkException after
+   * timeout. Some examples of '''misuses''' are listed below:
+   * 1. Only call barrier() function on a subset of all the tasks in the same barrier stage, it
+   * shall lead to timeout of the function call.
+   * {{{
+   *   rdd.barrier().mapPartitions { iter =>
+   *       val context = BarrierTaskContext.get()
+   *       if (context.partitionId() == 0) {
+   *           // Do nothing.
+   *       } else {
+   *           context.barrier()
+   *       }
+   *       iter
+   *   }
+   * }}}
+   *
+   * 2. Include barrier() function in a try-catch code block, this may lead to timeout of the
+   * second function call.
+   * {{{
+   *   rdd.barrier().mapPartitions { iter =>
+   *       val context = BarrierTaskContext.get()
+   *       try {
+   *           // Do something that might throw an Exception.
+   *           doSomething()
+   *           context.barrier()
+   *       } catch {
+   *           case e: Exception => logWarning("...", e)
+   *       }
+   *       context.barrier()
+   *       iter
+   *   }
+   * }}}
+   */
+  @Experimental
+  @Since("2.4.0")
+  def barrier(): Unit = {
+    runBarrier(RequestMethod.BARRIER)
+    ()
+  }
+
+  /**
+   * :: Experimental ::
+   * Blocks until all tasks in the same stage have reached this routine. Each task passes in
+   * a message and returns with a list of all the messages passed in by each of those tasks.
+   *
+   * CAUTION! The allGather method requires the same precautions as the barrier method
+   *
+   * The message is type String rather than Array[Byte] because it is more convenient for
+   * the user at the cost of worse performance.
+   */
+  @Experimental
+  @Since("3.0.0")
+  def allGather(message: String): ArrayBuffer[String] = {
 
 Review comment:
   I didn't have a particular reason in mind for `ArrayBuffer[String]` over `Array[String]`. @zhengruifeng, do you think the latter is preferable here, and if so, why? The returned collection is indexed and sorted by partition ID, so I preferred those two over `Seq`, which is vague about whether it is indexed or linear.
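
   The trade-off between the candidate return types can be illustrated with plain Scala collections. This is a minimal sketch, not code from the PR: `GatherResultTypes` and its sample data are hypothetical, and `IndexedSeq` is included only as one further option that states positional access in the type.

```scala
import scala.collection.mutable.ArrayBuffer

object GatherResultTypes {
  // Hypothetical gathered messages; index i holds the message from partition i.
  private val gathered = ArrayBuffer("p0", "p1", "p2")

  // Hands out the mutable buffer itself: callers could append to or overwrite it.
  def asArrayBuffer: ArrayBuffer[String] = gathered

  // Fixed-length copy with O(1) indexing; elements can still be reassigned in place.
  def asArray: Array[String] = gathered.toArray

  // A List satisfies Seq, but apply(i) has to walk i links to reach the element.
  def asSeq: Seq[String] = gathered.toList

  // Immutable copy whose type advertises efficient positional access.
  def asIndexedSeq: IndexedSeq[String] = gathered.toVector

  def main(args: Array[String]): Unit = {
    val partitionId = 1
    println(asArray(partitionId))
    println(asSeq(partitionId))
    println(asIndexedSeq(partitionId))
  }
}
```

   All four support lookup by partition ID; they differ in what the signature promises about mutability and access cost.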



[GitHub] [spark] AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589440480
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589368724
 
 
   **[Test build #118733 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118733/testReport)** for PR 27640 at commit [`c1d1b0e`](https://github.com/apache/spark/commit/c1d1b0eccccf60b54b4268dee3505d98fa4cb59f).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589440238
 
 
   **[Test build #118738 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118738/testReport)** for PR 27640 at commit [`7c259ac`](https://github.com/apache/spark/commit/7c259acd3e447450a18bcdeea47d14ac8514b1bc).



[GitHub] [spark] AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589474724
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589289990
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589289464
 
 
   **[Test build #118733 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118733/testReport)** for PR 27640 at commit [`c1d1b0e`](https://github.com/apache/spark/commit/c1d1b0eccccf60b54b4268dee3505d98fa4cb59f).



[GitHub] [spark] AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589370178
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118733/
   Test FAILed.



[GitHub] [spark] mengxr commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
mengxr commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589287298
 
 
   jenkins, add to whitelist



[GitHub] [spark] zhengruifeng commented on a change in pull request #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#discussion_r392801222
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala
 ##########
 @@ -163,6 +155,73 @@ class BarrierTaskContext private[spark] (
       timerTask.cancel()
       timer.purge()
     }
+    json
+  }
+
+  /**
+   * :: Experimental ::
+   * Sets a global barrier and waits until all tasks in this stage hit this barrier. Similar to
+   * MPI_Barrier function in MPI, the barrier() function call blocks until all tasks in the same
+   * stage have reached this routine.
+   *
+   * CAUTION! In a barrier stage, each task must have the same number of barrier() calls, in all
+   * possible code branches. Otherwise, you may get the job hanging or a SparkException after
+   * timeout. Some examples of '''misuses''' are listed below:
+   * 1. Only call barrier() function on a subset of all the tasks in the same barrier stage, it
+   * shall lead to timeout of the function call.
+   * {{{
+   *   rdd.barrier().mapPartitions { iter =>
+   *       val context = BarrierTaskContext.get()
+   *       if (context.partitionId() == 0) {
+   *           // Do nothing.
+   *       } else {
+   *           context.barrier()
+   *       }
+   *       iter
+   *   }
+   * }}}
+   *
+   * 2. Include barrier() function in a try-catch code block, this may lead to timeout of the
+   * second function call.
+   * {{{
+   *   rdd.barrier().mapPartitions { iter =>
+   *       val context = BarrierTaskContext.get()
+   *       try {
+   *           // Do something that might throw an Exception.
+   *           doSomething()
+   *           context.barrier()
+   *       } catch {
+   *           case e: Exception => logWarning("...", e)
+   *       }
+   *       context.barrier()
+   *       iter
+   *   }
+   * }}}
+   */
+  @Experimental
+  @Since("2.4.0")
+  def barrier(): Unit = {
+    runBarrier(RequestMethod.BARRIER)
+    ()
+  }
+
+  /**
+   * :: Experimental ::
+   * Blocks until all tasks in the same stage have reached this routine. Each task passes in
+   * a message and returns with a list of all the messages passed in by each of those tasks.
+   *
+   * CAUTION! The allGather method requires the same precautions as the barrier method
+   *
+   * The message is type String rather than Array[Byte] because it is more convenient for
+   * the user at the cost of worse performance.
+   */
+  @Experimental
+  @Since("3.0.0")
+  def allGather(message: String): ArrayBuffer[String] = {
 
 Review comment:
   cc @gatorsmile @srowen 
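
   For the second misuse listed in the quoted Scaladoc (a `barrier()` call inside a try block), one way to keep the number of barrier calls equal across tasks is to guard only the failing work and leave every barrier call outside the try. The sketch below simulates that locally with `java.util.concurrent.CyclicBarrier`; it is not Spark code, and `BarrierCountDemo`, `doSomething`, and the timeouts are hypothetical.

```scala
import java.util.concurrent.{Callable, CyclicBarrier, Executors, TimeUnit}

object BarrierCountDemo {
  // Hypothetical work step; fails on task 0 to exercise the catch branch.
  private def doSomething(taskId: Int): Unit =
    if (taskId == 0) throw new IllegalStateException(s"task $taskId failed")

  // Runs `numTasks` simulated tasks and returns how many barrier calls each made.
  def run(numTasks: Int): Seq[Int] = {
    val barrier = new CyclicBarrier(numTasks)
    val pool = Executors.newFixedThreadPool(numTasks)
    try {
      val futures = (0 until numTasks).map { id =>
        pool.submit(new Callable[Int] {
          override def call(): Int = {
            var calls = 0
            try {
              doSomething(id) // only the work that may throw is guarded
            } catch {
              case e: Exception => println(s"warning: ${e.getMessage}")
            }
            // Both barrier calls sit outside the try block, so every task
            // reaches them on every code path.
            barrier.await(10, TimeUnit.SECONDS)
            calls += 1
            barrier.await(10, TimeUnit.SECONDS)
            calls += 1
            calls
          }
        })
      }
      futures.map(_.get(30, TimeUnit.SECONDS))
    } finally {
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit =
    println(run(3))
}
```

   Had the first `await` stayed inside the try block, task 0 would skip it after the exception and the other tasks would wait at a barrier it never reaches.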



[GitHub] [spark] AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589440485
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23489/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-588559059
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589370178
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118733/
   Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589485680
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118738/
   Test PASSed.



[GitHub] [spark] jiangxb1987 closed pull request #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
jiangxb1987 closed pull request #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640
 
 
   



[GitHub] [spark] zhengruifeng commented on a change in pull request #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#discussion_r391402137
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala
 ##########
 @@ -163,6 +155,73 @@ class BarrierTaskContext private[spark] (
       timerTask.cancel()
       timer.purge()
     }
+    json
+  }
+
+  /**
+   * :: Experimental ::
+   * Sets a global barrier and waits until all tasks in this stage hit this barrier. Similar to
+   * MPI_Barrier function in MPI, the barrier() function call blocks until all tasks in the same
+   * stage have reached this routine.
+   *
+   * CAUTION! In a barrier stage, each task must have the same number of barrier() calls, in all
+   * possible code branches. Otherwise, you may get the job hanging or a SparkException after
+   * timeout. Some examples of '''misuses''' are listed below:
+   * 1. Only call barrier() function on a subset of all the tasks in the same barrier stage, it
+   * shall lead to timeout of the function call.
+   * {{{
+   *   rdd.barrier().mapPartitions { iter =>
+   *       val context = BarrierTaskContext.get()
+   *       if (context.partitionId() == 0) {
+   *           // Do nothing.
+   *       } else {
+   *           context.barrier()
+   *       }
+   *       iter
+   *   }
+   * }}}
+   *
+   * 2. Include barrier() function in a try-catch code block, this may lead to timeout of the
+   * second function call.
+   * {{{
+   *   rdd.barrier().mapPartitions { iter =>
+   *       val context = BarrierTaskContext.get()
+   *       try {
+   *           // Do something that might throw an Exception.
+   *           doSomething()
+   *           context.barrier()
+   *       } catch {
+   *           case e: Exception => logWarning("...", e)
+   *       }
+   *       context.barrier()
+   *       iter
+   *   }
+   * }}}
+   */
+  @Experimental
+  @Since("2.4.0")
+  def barrier(): Unit = {
+    runBarrier(RequestMethod.BARRIER)
+    ()
+  }
+
+  /**
+   * :: Experimental ::
+   * Blocks until all tasks in the same stage have reached this routine. Each task passes in
+   * a message and returns with a list of all the messages passed in by each of those tasks.
+   *
+   * CAUTION! The allGather method requires the same precautions as the barrier method
+   *
+   * The message is type String rather than Array[Byte] because it is more convenient for
+   * the user at the cost of worse performance.
+   */
+  @Experimental
+  @Since("3.0.0")
+  def allGather(message: String): ArrayBuffer[String] = {
 
 Review comment:
   friendly ping @jiangxb1987 @sarthfrey 
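
   For readers following the quoted Scaladoc, the all-gather semantics it describes (every task blocks, contributes one message, and receives all of them) can be sketched locally. This is only an illustration built on java.util.concurrent, not the code in this PR: `LocalAllGather` and `AllGatherDemo` are hypothetical names, threads stand in for barrier tasks, and the sketch supports a single round only.

   ```scala
   import java.util.concurrent.{Callable, CyclicBarrier, Executors, TimeUnit}
   import java.util.concurrent.atomic.AtomicReferenceArray

   // Hypothetical local stand-in for a barrier stage; NOT Spark's implementation.
   final class LocalAllGather(numTasks: Int) {
     private val slots = new AtomicReferenceArray[String](numTasks)
     private val barrier = new CyclicBarrier(numTasks)

     /** Blocks until every task has contributed, then returns all messages ordered by task id. */
     def allGather(taskId: Int, message: String): Array[String] = {
       slots.set(taskId, message)
       barrier.await() // every task must reach this call, mirroring the barrier() precaution
       Array.tabulate(numTasks)(i => slots.get(i))
     }
   }

   object AllGatherDemo {
     /** Runs numTasks threads, each gathering once; returns what each task received. */
     def run(numTasks: Int): Seq[Array[String]] = {
       val gather = new LocalAllGather(numTasks)
       val pool = Executors.newFixedThreadPool(numTasks)
       try {
         // Range.map is strict, so all tasks are submitted before any result is awaited.
         val futures = (0 until numTasks).map { id =>
           pool.submit(new Callable[Array[String]] {
             override def call(): Array[String] = gather.allGather(id, s"msg-from-$id")
           })
         }
         futures.map(_.get(10, TimeUnit.SECONDS))
       } finally {
         pool.shutdownNow()
       }
     }
   }
   ```

   Calling `AllGatherDemo.run(3)` gives every simulated task the same three messages, ordered by task id.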



[GitHub] [spark] AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589428592
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589289998
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23484/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589370138
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] jiangxb1987 commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
jiangxb1987 commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-588563461
 
 
   OK to test



[GitHub] [spark] mengxr commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
mengxr commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-588631042
 
 
   """
   /home/runner/work/spark/spark/python/pyspark/taskcontext.py:docstring of pyspark.BarrierTaskContext.getTaskInfos:2:Explicit markup ends without a blank line; unexpected unindent.
   """



[GitHub] [spark] AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589474730
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118737/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589427940
 
 
   **[Test build #118737 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118737/testReport)** for PR 27640 at commit [`3f1f709`](https://github.com/apache/spark/commit/3f1f7091cabbe536140fff1820247ce3a1213f70).



[GitHub] [spark] sarthfrey commented on a change in pull request #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
sarthfrey commented on a change in pull request #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#discussion_r393149791
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala
 ##########
 @@ -163,6 +155,73 @@ class BarrierTaskContext private[spark] (
       timerTask.cancel()
       timer.purge()
     }
+    json
+  }
+
+  /**
+   * :: Experimental ::
+   * Sets a global barrier and waits until all tasks in this stage hit this barrier. Similar to
+   * MPI_Barrier function in MPI, the barrier() function call blocks until all tasks in the same
+   * stage have reached this routine.
+   *
+   * CAUTION! In a barrier stage, each task must have the same number of barrier() calls, in all
+   * possible code branches. Otherwise, you may get the job hanging or a SparkException after
+   * timeout. Some examples of '''misuses''' are listed below:
+   * 1. Only call barrier() function on a subset of all the tasks in the same barrier stage, it
+   * shall lead to timeout of the function call.
+   * {{{
+   *   rdd.barrier().mapPartitions { iter =>
+   *       val context = BarrierTaskContext.get()
+   *       if (context.partitionId() == 0) {
+   *           // Do nothing.
+   *       } else {
+   *           context.barrier()
+   *       }
+   *       iter
+   *   }
+   * }}}
+   *
+   * 2. Include barrier() function in a try-catch code block, this may lead to timeout of the
+   * second function call.
+   * {{{
+   *   rdd.barrier().mapPartitions { iter =>
+   *       val context = BarrierTaskContext.get()
+   *       try {
+   *           // Do something that might throw an Exception.
+   *           doSomething()
+   *           context.barrier()
+   *       } catch {
+   *           case e: Exception => logWarning("...", e)
+   *       }
+   *       context.barrier()
+   *       iter
+   *   }
+   * }}}
+   */
+  @Experimental
+  @Since("2.4.0")
+  def barrier(): Unit = {
+    runBarrier(RequestMethod.BARRIER)
+    ()
+  }
+
+  /**
+   * :: Experimental ::
+   * Blocks until all tasks in the same stage have reached this routine. Each task passes in
+   * a message and returns with a list of all the messages passed in by each of those tasks.
+   *
+   * CAUTION! The allGather method requires the same precautions as the barrier method
+   *
+   * The message is type String rather than Array[Byte] because it is more convenient for
+   * the user at the cost of worse performance.
+   */
+  @Experimental
+  @Since("3.0.0")
+  def allGather(message: String): ArrayBuffer[String] = {
 
 Review comment:
   Gotcha, will submit a PR.
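
   The first misuse listed in the quoted Scaladoc (only a subset of tasks calling barrier()) can be reproduced without a cluster. The sketch below is a local simulation under stated assumptions: `BarrierMisuseDemo` is a hypothetical name, threads stand in for tasks, and a `CyclicBarrier` with a timeout stands in for the barrier coordinator. It only mimics the waiting tasks timing out; it does not model how Spark fails the stage.

   ```scala
   import java.util.concurrent.{BrokenBarrierException, Callable, CyclicBarrier, Executors, TimeUnit, TimeoutException}

   object BarrierMisuseDemo {
     /**
      * Runs numTasks local "tasks". Task ids in `skip` never call the barrier.
      * Returns, per task id, whether that task finished without a barrier timeout.
      */
     def run(numTasks: Int, skip: Set[Int], timeoutMs: Long): Seq[Boolean] = {
       val barrier = new CyclicBarrier(numTasks)
       val pool = Executors.newFixedThreadPool(numTasks)
       try {
         val futures = (0 until numTasks).map { id =>
           pool.submit(new Callable[Boolean] {
             override def call(): Boolean =
               if (skip.contains(id)) {
                 true // misuse: this task returns without ever reaching the barrier
               } else {
                 try {
                   barrier.await(timeoutMs, TimeUnit.MILLISECONDS)
                   true
                 } catch {
                   // The first waiter to time out breaks the barrier for the others.
                   case _: TimeoutException | _: BrokenBarrierException => false
                 }
               }
           })
         }
         futures.map(_.get())
       } finally {
         pool.shutdownNow()
       }
     }
   }
   ```

   With `run(3, Set(0), 200L)` the two waiting tasks give up after the timeout, while `run(3, Set.empty, 5000L)` lets all three pass, which is why every task needs the same number of barrier calls on every code path.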



[GitHub] [spark] AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589474724
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589440485
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23489/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589474095
 
 
   **[Test build #118737 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118737/testReport)** for PR 27640 at commit [`3f1f709`](https://github.com/apache/spark/commit/3f1f7091cabbe536140fff1820247ce3a1213f70).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589485673
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589440480
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] mengxr commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
mengxr commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589287537
 
 
   jenkins, test this please



[GitHub] [spark] AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-588559402
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-588559059
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] srowen commented on a change in pull request #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#discussion_r393146074
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala
 ##########
 @@ -163,6 +155,73 @@ class BarrierTaskContext private[spark] (
       timerTask.cancel()
       timer.purge()
     }
+    json
+  }
+
+  /**
+   * :: Experimental ::
+   * Sets a global barrier and waits until all tasks in this stage hit this barrier. Similar to
+   * MPI_Barrier function in MPI, the barrier() function call blocks until all tasks in the same
+   * stage have reached this routine.
+   *
+   * CAUTION! In a barrier stage, each task must have the same number of barrier() calls, in all
+   * possible code branches. Otherwise, you may get the job hanging or a SparkException after
+   * timeout. Some examples of '''misuses''' are listed below:
+   * 1. Only call barrier() function on a subset of all the tasks in the same barrier stage, it
+   * shall lead to timeout of the function call.
+   * {{{
+   *   rdd.barrier().mapPartitions { iter =>
+   *       val context = BarrierTaskContext.get()
+   *       if (context.partitionId() == 0) {
+   *           // Do nothing.
+   *       } else {
+   *           context.barrier()
+   *       }
+   *       iter
+   *   }
+   * }}}
+   *
+   * 2. Include barrier() function in a try-catch code block, this may lead to timeout of the
+   * second function call.
+   * {{{
+   *   rdd.barrier().mapPartitions { iter =>
+   *       val context = BarrierTaskContext.get()
+   *       try {
+   *           // Do something that might throw an Exception.
+   *           doSomething()
+   *           context.barrier()
+   *       } catch {
+   *           case e: Exception => logWarning("...", e)
+   *       }
+   *       context.barrier()
+   *       iter
+   *   }
+   * }}}
+   */
+  @Experimental
+  @Since("2.4.0")
+  def barrier(): Unit = {
+    runBarrier(RequestMethod.BARRIER)
+    ()
+  }
+
+  /**
+   * :: Experimental ::
+   * Blocks until all tasks in the same stage have reached this routine. Each task passes in
+   * a message and returns with a list of all the messages passed in by each of those tasks.
+   *
+   * CAUTION! The allGather method requires the same precautions as the barrier method
+   *
+   * The message is type String rather than Array[Byte] because it is more convenient for
+   * the user at the cost of worse performance.
+   */
+  @Experimental
+  @Since("3.0.0")
+  def allGather(message: String): ArrayBuffer[String] = {
 
 Review comment:
   OK, sure: IndexedSeq or Array is fine. Just something immutable.
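
   To make the return-type trade-off concrete, here is a small sketch of the three options discussed. It is not the code in this PR: `ReturnTypeSketch` and its `decode` helper are hypothetical stand-ins for whatever produces the gathered messages. Note that an Array has a fixed length, but its elements can still be reassigned.

   ```scala
   import scala.collection.mutable.ArrayBuffer

   object ReturnTypeSketch {
     // Hypothetical helper standing in for decoding the gathered messages.
     private def decode(raw: Seq[String]): ArrayBuffer[String] = ArrayBuffer(raw: _*)

     // Exposes a mutable buffer: callers can append to or overwrite the result.
     def gatherMutable(raw: Seq[String]): ArrayBuffer[String] = decode(raw)

     // Copies into an immutable Vector, exposed through the IndexedSeq interface.
     def gatherImmutable(raw: Seq[String]): IndexedSeq[String] = decode(raw).toVector

     // Copies into an Array: fixed length, though elements remain assignable.
     def gatherArray(raw: Seq[String]): Array[String] = decode(raw).toArray
   }
   ```

   Only `gatherMutable` lets a caller grow the result with `+=`; the other two return a copy whose size cannot change.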



[GitHub] [spark] AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589289998
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23484/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-588559402
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] SparkQA removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589427940
 
 
   **[Test build #118737 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118737/testReport)** for PR 27640 at commit [`3f1f709`](https://github.com/apache/spark/commit/3f1f7091cabbe536140fff1820247ce3a1213f70).



[GitHub] [spark] AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589428592
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] zhengruifeng commented on a change in pull request #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#discussion_r387492294
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala
 ##########
 @@ -163,6 +155,73 @@ class BarrierTaskContext private[spark] (
       timerTask.cancel()
       timer.purge()
     }
+    json
+  }
+
+  /**
+   * :: Experimental ::
+   * Sets a global barrier and waits until all tasks in this stage hit this barrier. Similar to
+   * MPI_Barrier function in MPI, the barrier() function call blocks until all tasks in the same
+   * stage have reached this routine.
+   *
+   * CAUTION! In a barrier stage, each task must have the same number of barrier() calls, in all
+   * possible code branches. Otherwise, you may get the job hanging or a SparkException after
+   * timeout. Some examples of '''misuses''' are listed below:
+   * 1. Only call barrier() function on a subset of all the tasks in the same barrier stage, it
+   * shall lead to timeout of the function call.
+   * {{{
+   *   rdd.barrier().mapPartitions { iter =>
+   *       val context = BarrierTaskContext.get()
+   *       if (context.partitionId() == 0) {
+   *           // Do nothing.
+   *       } else {
+   *           context.barrier()
+   *       }
+   *       iter
+   *   }
+   * }}}
+   *
+   * 2. Include barrier() function in a try-catch code block, this may lead to timeout of the
+   * second function call.
+   * {{{
+   *   rdd.barrier().mapPartitions { iter =>
+   *       val context = BarrierTaskContext.get()
+   *       try {
+   *           // Do something that might throw an Exception.
+   *           doSomething()
+   *           context.barrier()
+   *       } catch {
+   *           case e: Exception => logWarning("...", e)
+   *       }
+   *       context.barrier()
+   *       iter
+   *   }
+   * }}}
+   */
+  @Experimental
+  @Since("2.4.0")
+  def barrier(): Unit = {
+    runBarrier(RequestMethod.BARRIER)
+    ()
+  }
+
+  /**
+   * :: Experimental ::
+   * Blocks until all tasks in the same stage have reached this routine. Each task passes in
+   * a message and returns with a list of all the messages passed in by each of those tasks.
+   *
+   * CAUTION! The allGather method requires the same precautions as the barrier method
+   *
+   * The message is type String rather than Array[Byte] because it is more convenient for
+   * the user at the cost of worse performance.
+   */
+  @Experimental
+  @Since("3.0.0")
+  def allGather(message: String): ArrayBuffer[String] = {
 
 Review comment:
   Just out of curiosity, why return an `ArrayBuffer[String]` instead of an `Array[String]` here?
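
   For readers following the question, the all-gather semantics described in the Scaladoc above (each task passes in one message, blocks until every task has arrived, then receives all the messages) can be sketched locally without Spark. This is only an illustration under stated assumptions: `LocalAllGather`, `AllGatherSketch`, and `run` are hypothetical names, threads stand in for barrier tasks, and nothing here reflects how `BarrierTaskContext` is implemented. The sketch returns an `Array[String]`, the alternative the question raises.

```scala
import java.util.concurrent.{Callable, CyclicBarrier, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicReferenceArray

object AllGatherSketch {
  // Hypothetical local stand-in for all-gather semantics; NOT Spark's implementation.
  // Supports a single round only: reusing it could overwrite a slot before a slow
  // task has read the previous round's messages.
  final class LocalAllGather(numTasks: Int) {
    private val slots = new AtomicReferenceArray[String](numTasks)
    private val barrier = new CyclicBarrier(numTasks)

    def allGather(taskId: Int, message: String): Array[String] = {
      slots.set(taskId, message) // publish this task's message
      barrier.await()            // block until every task has published
      Array.tabulate(numTasks)(i => slots.get(i)) // immutable-size snapshot, ordered by task id
    }
  }

  // Runs one all-gather round across `numTasks` threads and returns each task's result.
  def run(numTasks: Int): Seq[Array[String]] = {
    val gather = new LocalAllGather(numTasks)
    val pool = Executors.newFixedThreadPool(numTasks)
    try {
      val futures = (0 until numTasks).map { id =>
        pool.submit(new Callable[Array[String]] {
          override def call(): Array[String] = gather.allGather(id, s"msg-$id")
        })
      }
      futures.map(_.get(10, TimeUnit.SECONDS))
    } finally {
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit = {
    // Every task sees the same list of messages, ordered by task id.
    run(3).foreach(result => println(result.mkString(",")))
  }
}
```

   As with `barrier()`, if one thread never calls `allGather` the others block until the timeout, which is the same hazard the CAUTION note describes for tasks in a barrier stage.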



[GitHub] [spark] SparkQA commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589485211
 
 
   **[Test build #118738 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118738/testReport)** for PR 27640 at commit [`7c259ac`](https://github.com/apache/spark/commit/7c259acd3e447450a18bcdeea47d14ac8514b1bc).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589289990
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589485673
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27640: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
URL: https://github.com/apache/spark/pull/27640#issuecomment-589440238
 
 
   **[Test build #118738 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118738/testReport)** for PR 27640 at commit [`7c259ac`](https://github.com/apache/spark/commit/7c259acd3e447450a18bcdeea47d14ac8514b1bc).
