Posted to reviews@spark.apache.org by sitalkedia <gi...@git.apache.org> on 2016/04/12 00:57:51 UTC

[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

GitHub user sitalkedia opened a pull request:

    https://github.com/apache/spark/pull/12309

    [SPARK-14542][CORE] PipeRDD should allow configurable buffer size for…

    ## What changes were proposed in this pull request?
    
    Currently, PipedRDD internally uses a PrintWriter to write data to the stdin of the piped process, which by default wraps a BufferedWriter with an 8k buffer. In our experiments, we have seen that an 8k buffer is too small and the job spends a significant amount of CPU time in system calls to copy the data. We should have a way to configure the buffer size for the writer.
    
    
    ## How was this patch tested?
    Ran PipedRDDSuite tests. 
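
A minimal sketch of the idea described above, not the patch itself: wrap the child's stdin stream in a `BufferedWriter` whose size is chosen by the caller instead of relying on `PrintWriter`'s default. The object name `StdinWriterSketch`, its helper names, and the in-memory `ByteArrayOutputStream` standing in for `proc.getOutputStream` are assumptions made for illustration.

```scala
import java.io.{BufferedWriter, ByteArrayOutputStream, OutputStream, OutputStreamWriter, PrintWriter}

object StdinWriterSketch {
  // Wrap `out` (in PipedRDD this would be proc.getOutputStream) in a
  // BufferedWriter with a caller-chosen buffer size, measured in chars.
  def newWriter(out: OutputStream, bufferSize: Int): PrintWriter =
    new PrintWriter(new BufferedWriter(new OutputStreamWriter(out), bufferSize))

  // Write each element on its own line, then close so buffered data is flushed.
  def writeAll(lines: Seq[String], out: OutputStream, bufferSize: Int): Unit = {
    val writer = newWriter(out, bufferSize)
    try lines.foreach(line => writer.println(line))
    finally writer.close()
  }

  // Helper for experimenting: capture what a child process would have received.
  def capture(lines: Seq[String], bufferSize: Int): String = {
    val sink = new ByteArrayOutputStream()
    writeAll(lines, sink, bufferSize)
    sink.toString()
  }
}
```

The buffer size only changes how often data is handed to the underlying stream, not what the child receives, so `capture` should return the same text for any positive `bufferSize`.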

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sitalkedia/spark bufferedPipedRDD

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12309.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12309
    
----
commit 697433d49fde2b5f76ab2a7b986e133c435efdc3
Author: Sital Kedia <sk...@fb.com>
Date:   2016-04-11T22:43:04Z

    [SPARK-14542][CORE] PipeRDD should allow configurable buffer size for the stdin writer

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r62014736
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala ---
    @@ -144,7 +142,8 @@ private[spark] class PipedRDD[T: ClassTag](
         new Thread(s"stdin writer for $command") {
           override def run(): Unit = {
             TaskContext.setTaskContext(context)
    -        val out = new PrintWriter(proc.getOutputStream)
    +        val out = new PrintWriter(new BufferedWriter(
    --- End diff --
    
    Yeah I agree with that; the problem is that it may quite reasonably vary from one RDD to another, so it didn't seem right as a global conf. Unless I've missed a third way, an optional param seemed most reasonable.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-217115858
  
    **[Test build #57874 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57874/consoleFull)** for PR 12309 at commit [`bd252b7`](https://github.com/apache/spark/commit/bd252b70294f4e53259ff2c568f43d99114ff2d8).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r59609962
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala ---
    @@ -144,7 +142,8 @@ private[spark] class PipedRDD[T: ClassTag](
         new Thread(s"stdin writer for $command") {
           override def run(): Unit = {
             TaskContext.setTaskContext(context)
    -        val out = new PrintWriter(proc.getOutputStream)
    +        val out = new PrintWriter(new BufferedWriter(
    --- End diff --
    
    Buffering here is probably a decent idea, with a small buffer. Is it even necessary to make it configurable? 8K is pretty standard; you've found a larger buffer (32K?) is better. Would you ever want to turn it off or make it much larger than that? I ask only because this requires changing a public API, and that's going to require additional steps.
    
    Separately, this needs to specify UTF-8 encoding. Actually, we have this same problem in the stderr and stdout readers above, that they rely on platform encoding. I can sort of see an argument that using platform encoding makes sense when dealing with platform binaries, but, there's still no particular reason to expect the JVM default more often matches whatever some binary is using.
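
To illustrate the encoding point, here is a small sketch, offered as an assumption about how an explicit charset could be supplied rather than as what the patch does. `EncodingSketch` and `encode` are hypothetical names. With `new OutputStreamWriter(out)` the bytes depend on the JVM default charset; passing a `Charset` pins them.

```scala
import java.io.{BufferedWriter, ByteArrayOutputStream, OutputStreamWriter, PrintWriter}
import java.nio.charset.{Charset, StandardCharsets}

object EncodingSketch {
  // Encode `text` the way a stdin writer would, but with an explicit charset
  // (an assumption for illustration) and a caller-chosen buffer size.
  def encode(text: String, charset: Charset, bufferSize: Int): Array[Byte] = {
    val sink = new ByteArrayOutputStream()
    val out = new PrintWriter(
      new BufferedWriter(new OutputStreamWriter(sink, charset), bufferSize))
    try out.print(text)
    finally out.close()
    sink.toByteArray
  }

  // The same character becomes different bytes under different charsets.
  val utf8Bytes: Seq[Byte] = encode("\u00e9", StandardCharsets.UTF_8, 8192).toSeq
  val latin1Bytes: Seq[Byte] = encode("\u00e9", StandardCharsets.ISO_8859_1, 8192).toSeq
}
```

A child process that expects one encoding and receives the other would see corrupted non-ASCII input, which is the risk of relying on the platform default.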




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-217160976
  
    I don't understand why the MiMa failure is still there. I have added exclusions for them. Any idea?
    ```
    [error]  * method pipe(scala.collection.Seq,scala.collection.Map,scala.Function1,scala.Function2,Boolean)org.apache.spark.rdd.RDD in class org.apache.spark.rdd.RDD does not have a correspondent in current version
    [error]    filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.rdd.RDD.pipe")
    [error]  * method pipe(java.util.List,java.util.Map,Boolean,Int)org.apache.spark.api.java.JavaRDD in trait org.apache.spark.api.java.JavaRDDLike is present only in current version
    [error]    filter with: ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.api.java.JavaRDDLike.pipe")
    ```




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r60207018
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala ---
    @@ -144,7 +142,8 @@ private[spark] class PipedRDD[T: ClassTag](
         new Thread(s"stdin writer for $command") {
           override def run(): Unit = {
             TaskContext.setTaskContext(context)
    -        val out = new PrintWriter(proc.getOutputStream)
    +        val out = new PrintWriter(new BufferedWriter(
    --- End diff --
    
    Hm, is it crazy to just set a large buffer size? 1M isn't that much since an executor isn't generally going to run lots of processes. I was thinking some conf parameter might be nicer than an API param, but, I suppose the right buffer size depends on what you're doing.
    
    Maybe an API method argument isn't so bad. I was going to say, this needs to be plumbed through to the Java API, but its `pipe` method already lacks most of the args of the core Scala version. Maybe it's sensible if we do this for 2.0.0, and, fix up the Java API? (Python API works entirely differently anyway here)
    
    @tejasapatil and erm, @andrewor14 or @tgravescs any particular opinions on adding a bufferSize arg to the pipe method?
    
    Yes, if there's an encoding issue it's not introduced by this change of course. In general we never want to depend on the platform encoding. I wonder ... if this is one exception, since it's communicating with platform binaries. It's still not so great, but maybe on second thought that can be left alone for now.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-216258479
  
    I'm still supportive of this change, but I think fleshing out the new Java API method would really complete it, as we'd fix up an inconsistency while making this change. It ought to be one line. It'd also be nice to change the existing pipe tests, or add new ones, to at least try setting a different buffer size.
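
A test along the lines suggested above might pipe a few lines through an external command with different buffer sizes and check that the output is unchanged. The following is only a sketch: it assumes a Unix-like system with `cat` on the PATH, and `PipeRoundTripSketch` and `pipeThrough` are hypothetical names, not part of PipedRDDSuite.

```scala
import java.io.{BufferedWriter, OutputStreamWriter, PrintWriter}
import scala.io.Source

object PipeRoundTripSketch {
  // Pipe `lines` through an external command and return what it printed.
  // The input here is tiny, so writing everything before reading cannot fill
  // the OS pipe; PipedRDD itself writes from a separate thread instead.
  def pipeThrough(command: Seq[String], lines: Seq[String], bufferSize: Int): List[String] = {
    val proc = new ProcessBuilder(command: _*).start()
    val out = new PrintWriter(
      new BufferedWriter(new OutputStreamWriter(proc.getOutputStream), bufferSize))
    // Closing the writer flushes the buffer and signals end of input to the child.
    try lines.foreach(line => out.println(line))
    finally out.close()
    val source = Source.fromInputStream(proc.getInputStream)
    try source.getLines().toList
    finally {
      source.close()
      proc.waitFor()
    }
  }
}
```

For example, `PipeRoundTripSketch.pipeThrough(Seq("cat"), Seq("1", "2", "3"), 16)` and the same call with a buffer size of 65536 should both echo the three input lines back.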




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-217416792
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-216695350
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12309




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-217401291
  
    Jenkins retest this please




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r61972694
  
    --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
    @@ -267,10 +267,19 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
       /**
        * Return an RDD created by piping elements to a forked external process.
        */
    -  def pipe(command: JList[String], env: java.util.Map[String, String]): JavaRDD[String] =
    +  def pipe(command: JList[String], env: JMap[String, String]): JavaRDD[String] =
         rdd.pipe(command.asScala, env.asScala)
     
       /**
    +   * Return an RDD created by piping elements to a forked external process.
    +   */
    +  def pipe(command: JList[String],
    +           env: JMap[String, String],
    +           separateWorkingDir: Boolean,
    +           bufferSize: Int): JavaRDD[String] =
    +    rdd.pipe(command.asScala, env.asScala, null, null, separateWorkingDir, bufferSize)
    --- End diff --
    
    style:
    ```
    def pipe(
        command: JList[String],
        ...
        bufferSize: Int): JavaRDD[String] = {
      rdd.pipe(...)
    }
    ```




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-216327912
  
    Yeah, I think it has to be something like `(a: String => Unit) => printPipeContext.call(new VoidFunction[String]() { override def call(s: String): Unit = a(s) })`. I'm happy to punt on this aspect since it's non-trivial and not the intent of your change. If you can create any test code to cover this path I think it's GTG.
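
The adapter described above could look roughly like this. It is a sketch under assumptions: `VoidFunction` is redeclared locally as a stand-in for Spark's Java functional interface so the snippet compiles on its own, and `PipeContextAdapter`, `toScala`, `javaStyle`, and `demo` are hypothetical names.

```scala
import scala.collection.mutable.ArrayBuffer

object PipeContextAdapter {
  // Local stand-in for org.apache.spark.api.java.function.VoidFunction.
  trait VoidFunction[T] { def call(t: T): Unit }

  // A Java caller passes a callback that receives a "printer" and calls it.
  // The Scala API expects (String => Unit) => Unit, so the Scala printer `a`
  // is wrapped in a VoidFunction before being handed to the Java callback.
  def toScala(printPipeContext: VoidFunction[VoidFunction[String]]): (String => Unit) => Unit =
    (a: String => Unit) =>
      printPipeContext.call(new VoidFunction[String] {
        override def call(s: String): Unit = a(s)
      })

  // Example Java-style callback that emits two header lines.
  val javaStyle: VoidFunction[VoidFunction[String]] =
    new VoidFunction[VoidFunction[String]] {
      override def call(printer: VoidFunction[String]): Unit = {
        printer.call("header-1")
        printer.call("header-2")
      }
    }

  // Run the adapted callback with a printer that records each line.
  def demo(): List[String] = {
    val seen = ArrayBuffer.empty[String]
    toScala(javaStyle)(line => { seen += line; () })
    seen.toList
  }
}
```

The nesting is the non-trivial part: the Java side needs a function of a function, so a plain `VoidFunction[String]` cannot represent `printPipeContext` by itself.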




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r61049111
  
    --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
    @@ -271,6 +271,15 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
         rdd.pipe(command.asScala, env.asScala)
     
       /**
    +   * Return an RDD created by piping elements to a forked external process.
    +   */
    +  def pipe(command: JList[String],
    +           env: java.util.Map[String, String],
    --- End diff --
    
    Optional: wouldn't mind importing this as JMap and using it consistently in this file while we're here.
    
    This can also take arguments for `printPipeContext` and `printRDDElement`. That's not part of your change of course, but it makes sense to fix this omission while adding the all-args API method to support the new arg.





[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-217210632
  
    Ah, I had made the exclusion for v1.6, but we are building v2.0. I moved the exclusion to v2.0 instead; hopefully that will fix the MiMa issue.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-217115873
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57874/
    Test FAILed.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r60100586
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala ---
    @@ -17,10 +17,7 @@
     
     package org.apache.spark.rdd
     
    -import java.io.File
    -import java.io.FilenameFilter
    -import java.io.IOException
    -import java.io.PrintWriter
    +import java.io._
    --- End diff --
    
    Will fix. 




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r62503986
  
    --- Diff: project/MimaExcludes.scala ---
    @@ -686,6 +686,10 @@ object MimaExcludes {
             ProblemFilters.exclude[IncompatibleMethTypeProblem](
               "org.apache.spark.sql.DataFrameReader.this")
           ) ++ Seq(
    +        // SPARK-14542 configurable buffer size for pipe RDD
    +        ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.rdd.RDD.pipe"),
    --- End diff --
    
    Yes, that's needed. Without it the MiMa tests failed. 
    
    ```
    [error]  * method pipe(scala.collection.Seq,scala.collection.Map,scala.Function1,scala.Function2,Boolean)org.apache.spark.rdd.RDD in class org.apache.spark.rdd.RDD does not have a correspondent in current version
    [error]    filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.rdd.RDD.pipe")
    ```
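
The error above is what one would expect when a parameter is added to an existing public method: at the JVM level the old signature disappears, even if source callers still compile thanks to a default value. The sketch below illustrates that general mechanism with plain reflection. `PipeApiV1`, `PipeApiV2`, and the default of 8192 are hypothetical stand-ins, not Spark's classes.

```scala
object BinaryCompatSketch {
  // "Previous version": pipe takes two arguments.
  class PipeApiV1 {
    def pipe(command: String, separateWorkingDir: Boolean = false): String = command
  }

  // "Current version": a third argument is added with a default value.
  class PipeApiV2 {
    def pipe(
        command: String,
        separateWorkingDir: Boolean = false,
        bufferSize: Int = 8192): String = command
  }

  // Parameter counts of every JVM method literally named "pipe" on the class.
  // Default-value getters have different names, so they are not counted.
  def pipeArities(cls: Class[_]): List[Int] =
    cls.getMethods.toList.filter(_.getName == "pipe").map(_.getParameterCount).sorted
}
```

A source call such as `new BinaryCompatSketch.PipeApiV2().pipe("cat")` still compiles, but code already compiled against the two-argument method would look for a signature that no longer exists. That is the situation MiMa reports, and it is resolved either by an exclusion or by keeping the old overload.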




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-209574351
  
    cc @srowen 




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-216505929
  
    Jenkins retest this please




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-216373275
  
    Thanks @srowen. I changed one of the tests to cover the code path. Let me know what you think.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-216270380
  
    @srowen - I think you missed my comment earlier. I totally agree with you that the new Java API should be in sync with the Scala API.
    
    Repeating my comment below - 
    
    Regarding the arguments for `printPipeContext` and `printRDDElement`, simply passing VoidFunction[String] is not sufficient because `printPipeContext` is of type `(String => Unit) => Unit`. Any idea how to deal with that?




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r62082998
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala ---
    @@ -45,7 +47,8 @@ private[spark] class PipedRDD[T: ClassTag](
         envVars: Map[String, String],
         printPipeContext: (String => Unit) => Unit,
         printRDDElement: (T, String => Unit) => Unit,
    -    separateWorkingDir: Boolean)
    +    separateWorkingDir: Boolean,
    +    bufferSize: Int)
    --- End diff --
    
    I have excluded the missing old method signature. 




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r62166781
  
    --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
    @@ -214,8 +214,8 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
       /**
        * Applies a function f to each partition of this RDD.
        */
    -  def foreachPartition(f: VoidFunction[java.util.Iterator[T]]) {
    -    rdd.foreachPartition(x => f.call(x.asJava))
    +  def foreachPartition(f: VoidFunction[JIterator[T]]) {
    +    rdd.foreachPartition((x => f.call(x.asJava)))
    --- End diff --
    
    Nit: this added extra parens which aren't needed. If you wouldn't mind, while changing this, end the line above with `): Unit = {` to standardize




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r61680334
  
    --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
    @@ -271,6 +271,15 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
         rdd.pipe(command.asScala, env.asScala)
     
       /**
    +   * Return an RDD created by piping elements to a forked external process.
    +   */
    +  def pipe(command: JList[String],
    +           env: java.util.Map[String, String],
    --- End diff --
    
    Okay, I changed it to use JMap and JIterator across the file.
    
    Regarding the arguments for `printPipeContext` and `printRDDElement`, simply passing `VoidFunction[String]` is not sufficient because `printPipeContext` is of type `(String => Unit) => Unit`. Any idea how to deal with that?
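
For `printRDDElement`, whose Scala type in the diff is `(T, String => Unit) => Unit`, one possible shape for a Java-facing wrapper is sketched below. This is an assumption rather than anything settled in the thread: `VoidFunction` and `VoidFunction2` are redeclared locally as stand-ins for Spark's Java functional interfaces, and `RddElementAdapter`, `toScala`, and `demo` are hypothetical names.

```scala
import scala.collection.mutable.ListBuffer

object RddElementAdapter {
  // Local stand-ins for Spark's Java functional interfaces.
  trait VoidFunction[T] { def call(t: T): Unit }
  trait VoidFunction2[T1, T2] { def call(t1: T1, t2: T2): Unit }

  // The Java callback receives the element plus a "printer"; the Scala printer
  // `a` is wrapped in a VoidFunction before being passed along.
  def toScala[T](
      printRDDElement: VoidFunction2[T, VoidFunction[String]]): (T, String => Unit) => Unit =
    (elem: T, a: String => Unit) =>
      printRDDElement.call(elem, new VoidFunction[String] {
        override def call(s: String): Unit = a(s)
      })

  // Example: format each (Int, String) element as one comma-separated line.
  def demo(): List[String] = {
    val asCsv = new VoidFunction2[(Int, String), VoidFunction[String]] {
      override def call(elem: (Int, String), printer: VoidFunction[String]): Unit =
        printer.call(elem._1.toString + "," + elem._2)
    }
    val out = ListBuffer.empty[String]
    val f = toScala(asCsv)
    f((1, "a"), line => { out += line; () })
    f((2, "b"), line => { out += line; () })
    out.toList
  }
}
```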




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r62189800
  
    --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
    @@ -214,8 +214,8 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
       /**
        * Applies a function f to each partition of this RDD.
        */
    -  def foreachPartition(f: VoidFunction[java.util.Iterator[T]]) {
    -    rdd.foreachPartition(x => f.call(x.asJava))
    +  def foreachPartition(f: VoidFunction[JIterator[T]]) {
    +    rdd.foreachPartition((x => f.call(x.asJava)))
    --- End diff --
    
    will fix. 




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-216583528
  
    Fixed checkstyle.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-216695354
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57687/
    Test FAILed.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-217556200
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58016/
    Test PASSed.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-218183909
  
    Thanks for the review @srowen. 




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-217530991
  
    **[Test build #58016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58016/consoleFull)** for PR 12309 at commit [`9efb2cf`](https://github.com/apache/spark/commit/9efb2cfbbc48a7929819172d22ec18b7de9e67b2).




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-216506752
  
    **[Test build #57634 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57634/consoleFull)** for PR 12309 at commit [`a193a47`](https://github.com/apache/spark/commit/a193a4760698e1d9425eba07e0de9ce2e2a64835).




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-216506996
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57634/
    Test FAILed.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r62189778
  
    --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
    @@ -596,7 +610,8 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
       /**
        * Returns the maximum element from this RDD as defined by the specified
        * Comparator[T].
    -   * @param comp the comparator that defines ordering
    +    *
    --- End diff --
    
    good eye, will fix. 




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-217489895
  
    I don't understand: `./dev/mima` passes on my laptop. I also verified that `./dev/mima` fails without my changes in `MimaExcludes.scala`. Is something wrong with the Jenkins build?




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-217416794
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57979/
    Test FAILed.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r59609978
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala ---
    @@ -17,10 +17,7 @@
     
     package org.apache.spark.rdd
     
    -import java.io.File
    -import java.io.FilenameFilter
    -import java.io.IOException
    -import java.io.PrintWriter
    +import java.io._
    --- End diff --
    
    (Don't collapse these)




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-217704387
  
    Aside from one final question I think this is OK.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r62429048
  
    --- Diff: project/MimaExcludes.scala ---
    @@ -686,6 +686,10 @@ object MimaExcludes {
             ProblemFilters.exclude[IncompatibleMethTypeProblem](
               "org.apache.spark.sql.DataFrameReader.this")
           ) ++ Seq(
    +        // SPARK-14542 configurable buffer size for pipe RDD
    +        ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.rdd.RDD.pipe"),
    --- End diff --
    
    Is this one still needed? I'd think MiMa is fine with the Scala API change, because no previously valid method invocation stops working.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-216693441
  
    **[Test build #57687 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57687/consoleFull)** for PR 12309 at commit [`a674b3c`](https://github.com/apache/spark/commit/a674b3c53165379643f1edc99e5a45afa56ed107).




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-208603933
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r60339435
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala ---
    @@ -144,7 +142,8 @@ private[spark] class PipedRDD[T: ClassTag](
         new Thread(s"stdin writer for $command") {
           override def run(): Unit = {
             TaskContext.setTaskContext(context)
    -        val out = new PrintWriter(proc.getOutputStream)
    +        val out = new PrintWriter(new BufferedWriter(
    --- End diff --
    
    Introducing a new configuration has a limitation: it forces the entire pipeline (which might have several pipe() operations) to use the same buffer size globally. I would prefer this to be in the API itself, and preferably backward compatible so that existing jobs are not affected.
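
The per-call alternative described in this comment can be sketched roughly as follows. This is only an illustration: `PipeApiSketch`, `Records`, and the string it returns are hypothetical stand-ins, not Spark's API, and the default of 8192 is assumed from the 8k figure in the PR description. A default argument keeps existing call sites compiling unchanged; whether the change is also binary compatible is a separate question, which is what MiMa checks.

```scala
object PipeApiSketch {
  // Hypothetical stand-in for an RDD-like class; not Spark's actual API.
  final class Records(val lines: Seq[String]) {
    // bufferSize is chosen per call rather than through one global setting.
    // The default value means callers that never pass it keep compiling.
    def pipe(command: Seq[String], bufferSize: Int = 8192): String =
      s"${command.mkString(" ")} over ${lines.size} records, buffer=$bufferSize"
  }

  def main(args: Array[String]): Unit = {
    val records = new Records(Seq("a", "b", "c"))
    // Two pipe() stages in one job can use different buffer sizes.
    println(records.pipe(Seq("cat")))
    println(records.pipe(Seq("wc", "-l"), bufferSize = 1024 * 1024))
  }
}
```

With this shape, a job that pipes a small stream and a large stream can size each writer independently.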




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-217556199
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r60100537
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala ---
    @@ -144,7 +142,8 @@ private[spark] class PipedRDD[T: ClassTag](
         new Thread(s"stdin writer for $command") {
           override def run(): Unit = {
             TaskContext.setTaskContext(context)
    -        val out = new PrintWriter(proc.getOutputStream)
    +        val out = new PrintWriter(new BufferedWriter(
    --- End diff --
    
    @srowen - Thanks for taking a look. In our testing we found that using a large buffer (1 MB) gives us CPU savings of around 15%. It makes sense to be able to increase the buffer size when we are piping a large amount of data. If changing a public API is not too much trouble, it would be pretty useful for us to have a configurable buffer size.
    
    Regarding your second point, I am not sure I understand. My change is not going to change the behavior of the PrintWriter at all. Do you mean that the issue with UTF-8 encoding already exists and I should fix it in this diff?
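
For readers following along, a minimal sketch of wrapping a stream in a `PrintWriter` with an explicit buffer size is shown below. It is not the PR's code: `BufferedPipeWriterSketch` and `writerWithBuffer` are hypothetical names, the `ByteArrayOutputStream` merely stands in for `proc.getOutputStream`, and the explicit UTF-8 charset is an assumption made for the example.

```scala
import java.io.{BufferedWriter, ByteArrayOutputStream, OutputStream, OutputStreamWriter, PrintWriter}
import java.nio.charset.StandardCharsets

object BufferedPipeWriterSketch {
  // Hypothetical helper: a PrintWriter whose BufferedWriter holds up to
  // `bufferSize` characters before handing them to the underlying stream.
  def writerWithBuffer(out: OutputStream, bufferSize: Int): PrintWriter =
    new PrintWriter(
      new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8), bufferSize))

  def main(args: Array[String]): Unit = {
    val sink = new ByteArrayOutputStream() // stand-in for the child process's stdin
    val out = writerWithBuffer(sink, 1024 * 1024) // 1 MB, the size mentioned above
    (1 to 3).foreach(i => out.println(s"record-$i"))
    println(s"bytes in sink before close: ${sink.size()}") // still buffered
    out.close() // close() flushes the buffered characters
    println(s"bytes in sink after close: ${sink.size()}")
  }
}
```

The sketch only shows where the size parameter goes; any CPU savings would have to be measured on a real workload, as described in the comment.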




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-216693238
  
    retest this please




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-217416672
  
    **[Test build #57979 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57979/consoleFull)** for PR 12309 at commit [`9efb2cf`](https://github.com/apache/spark/commit/9efb2cfbbc48a7929819172d22ec18b7de9e67b2).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r62015242
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala ---
    @@ -45,7 +47,8 @@ private[spark] class PipedRDD[T: ClassTag](
         envVars: Map[String, String],
         printPipeContext: (String => Unit) => Unit,
         printRDDElement: (T, String => Unit) => Unit,
    -    separateWorkingDir: Boolean)
    +    separateWorkingDir: Boolean,
    +    bufferSize: Int)
    --- End diff --
    
    This causes a MiMa failure. This could be resolved with a default value for this arg; normally that would be essential although we could also just exclude the failure on the missing old method signature. I don't have a strong feeling but suppose it makes sense to have a default value?
    
    ```
    [error]  * method pipe(scala.collection.Seq,scala.collection.Map,scala.Function1,scala.Function2,Boolean)org.apache.spark.rdd.RDD in class org.apache.spark.rdd.RDD does not have a correspondent in current version
    [error]    filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.rdd.RDD.pipe")
    [error]  * method pipe(java.util.List,java.util.Map,Boolean,Int)org.apache.spark.api.java.JavaRDD in trait org.apache.spark.api.java.JavaRDDLike is present only in current version
    [error]    filter with: ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.api.java.JavaRDDLike.pipe")
    [info] spark-mllib: found 0 potential binary incompatibilities while checking against org.apache.spark:spark-mllib_2.11:1.6.0  (filtered 498)
    ```
    
    The other failure in JavaRDDLike can be excluded safely.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-218175299
  
    Merged to master/2.0




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-217114483
  
    **[Test build #57874 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57874/consoleFull)** for PR 12309 at commit [`bd252b7`](https://github.com/apache/spark/commit/bd252b70294f4e53259ff2c568f43d99114ff2d8).




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-216271026
  
    Oops, right, that got collapsed. Hm, so it's actually a `VoidFunction[VoidFunction[String]]` then? That's a little bit trickier, but it should still be fairly easy to support. Give it a shot and see if it works out easily.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-217513410
  
    Ah, this time it's not the MiMa failure; it seems some flaky test failed - https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57979/




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-217113939
  
    Jenkins retest this please




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-217115871
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r60806867
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala ---
    @@ -144,7 +142,8 @@ private[spark] class PipedRDD[T: ClassTag](
         new Thread(s"stdin writer for $command") {
           override def run(): Unit = {
             TaskContext.setTaskContext(context)
    -        val out = new PrintWriter(proc.getOutputStream)
    +        val out = new PrintWriter(new BufferedWriter(
    --- End diff --
    
    That makes sense. I will add the overloaded method to the Java API as well. 




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-217402183
  
    **[Test build #57979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57979/consoleFull)** for PR 12309 at commit [`9efb2cf`](https://github.com/apache/spark/commit/9efb2cfbbc48a7929819172d22ec18b7de9e67b2).




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-216274532
  
    So I tried the following, but it does not work. 
    
    ```
      def pipe(command: JList[String],
               env: JMap[String, String],
               printPipeContext: VoidFunction[VoidFunction[String]],
               separateWorkingDir: Boolean,
               bufferSize: Int): JavaRDD[String] =
        rdd.pipe(command.asScala, env.asScala, s => printPipeContext.call(s),
          null, separateWorkingDir, bufferSize)
    ```
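
The comment above does not say how the attempt fails. One plausible cause, stated here as an assumption, is that the Scala `pipe` expects `printPipeContext` to be a `(String => Unit) => Unit`, so the `s` in `s => printPipeContext.call(s)` is a Scala function rather than a `VoidFunction[String]`. The sketch below shows one way such a callback could be bridged. The `VoidFunction` trait is a hypothetical stand-in declared locally so the example compiles without Spark, and `PipeContextAdapter` is an illustrative name, not part of the PR.

```scala
// Hypothetical stand-in for Spark's Java-facing VoidFunction interface,
// declared here only so the sketch compiles without Spark on the classpath.
trait VoidFunction[T] {
  def call(t: T): Unit
}

object PipeContextAdapter {
  // Converts a Java-style callback into the Scala shape
  // (String => Unit) => Unit by wrapping the Scala printer
  // in a VoidFunction before handing it to the Java callback.
  def toScala(javaFn: VoidFunction[VoidFunction[String]]): (String => Unit) => Unit =
    (printLine: String => Unit) =>
      javaFn.call(new VoidFunction[String] {
        override def call(s: String): Unit = printLine(s)
      })
}
```

With an adapter of this shape, the inner printer is wrapped explicitly instead of relying on the compiler to convert a Scala function into the Java interface.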





[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-216695311
  
    **[Test build #57687 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57687/consoleFull)** for PR 12309 at commit [`a674b3c`](https://github.com/apache/spark/commit/a674b3c53165379643f1edc99e5a45afa56ed107).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-217530027
  
    Jenkins retest this please




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r61010207
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala ---
    @@ -144,7 +142,8 @@ private[spark] class PipedRDD[T: ClassTag](
         new Thread(s"stdin writer for $command") {
           override def run(): Unit = {
             TaskContext.setTaskContext(context)
    -        val out = new PrintWriter(proc.getOutputStream)
    +        val out = new PrintWriter(new BufferedWriter(
    --- End diff --
    
    @srowen - I added a Java API to specify `separateWorkingDir` and `bufferSize`. `printPipeContext` and `printRDDElement` are functions that take another function as an argument, and there is no straightforward way to specify these in Java (which is probably why they were left out in the first place). Let me know what you think about the change.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-216506990
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r61973091
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala ---
    @@ -144,7 +142,8 @@ private[spark] class PipedRDD[T: ClassTag](
         new Thread(s"stdin writer for $command") {
           override def run(): Unit = {
             TaskContext.setTaskContext(context)
    -        val out = new PrintWriter(proc.getOutputStream)
    +        val out = new PrintWriter(new BufferedWriter(
    --- End diff --
    
    It's probably OK to expose the buffer size, though in other places the buffer size is usually a config. If there's a real use case (and it seems like there is) then maybe it's fine.
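
To illustrate why the buffer size matters, here is a minimal sketch of a `PrintWriter` backed by a `BufferedWriter` of a chosen size. The diff context cuts off after `new BufferedWriter(`, so the wrapped writer here is an assumption: `CountingWriter` is a hypothetical stand-in sink that counts how often buffered characters are handed to it. It approximates flush frequency only and does not measure real system calls on a process's stdin.

```scala
import java.io.{BufferedWriter, PrintWriter, Writer}

// Stand-in sink (hypothetical) that counts how often the buffer is
// flushed into it, instead of writing to a real process.
class CountingWriter extends Writer {
  var writeCalls = 0
  var chars = 0L
  override def write(cbuf: Array[Char], off: Int, len: Int): Unit = {
    writeCalls += 1
    chars += len
  }
  override def flush(): Unit = ()
  override def close(): Unit = ()
}

object BufferSizeSketch {
  // Writes `lines` through a PrintWriter backed by a BufferedWriter of the
  // given size and returns the number of write calls that reached the sink.
  def countSinkWrites(bufferSize: Int, lines: Seq[String]): Int = {
    val sink = new CountingWriter
    val out = new PrintWriter(new BufferedWriter(sink, bufferSize))
    try lines.foreach(l => out.println(l)) finally out.close()
    sink.writeCalls
  }
}
```

Calling `BufferSizeSketch.countSinkWrites(8192, lines)` and then `countSinkWrites(65536, lines)` on the same input should show fewer writes reaching the sink with the larger buffer, which is the effect the PR aims for.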




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-217438883
  
    Sure, just run `./dev/mima`.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r60610117
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala ---
    @@ -17,10 +17,7 @@
     
     package org.apache.spark.rdd
     
    -import java.io.File
    -import java.io.FilenameFilter
    -import java.io.IOException
    -import java.io.PrintWriter
    +import java.io._
    --- End diff --
    
    fixed. 




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r60732084
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala ---
    @@ -144,7 +142,8 @@ private[spark] class PipedRDD[T: ClassTag](
         new Thread(s"stdin writer for $command") {
           override def run(): Unit = {
             TaskContext.setTaskContext(context)
    -        val out = new PrintWriter(proc.getOutputStream)
    +        val out = new PrintWriter(new BufferedWriter(
    --- End diff --
    
    OK, I personally think this is good to merge if we also update the Java API to include an overload of this method that exposes all of these args, including the new one.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r62083038
  
    --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
    @@ -267,10 +267,19 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
       /**
        * Return an RDD created by piping elements to a forked external process.
        */
    -  def pipe(command: JList[String], env: java.util.Map[String, String]): JavaRDD[String] =
    +  def pipe(command: JList[String], env: JMap[String, String]): JavaRDD[String] =
         rdd.pipe(command.asScala, env.asScala)
     
       /**
    +   * Return an RDD created by piping elements to a forked external process.
    +   */
    +  def pipe(command: JList[String],
    +           env: JMap[String, String],
    +           separateWorkingDir: Boolean,
    +           bufferSize: Int): JavaRDD[String] =
    +    rdd.pipe(command.asScala, env.asScala, null, null, separateWorkingDir, bufferSize)
    --- End diff --
    
    changed, thanks!




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r62166810
  
    --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
    @@ -596,7 +610,8 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
       /**
        * Returns the maximum element from this RDD as defined by the specified
        * Comparator[T].
    -   * @param comp the comparator that defines ordering
    +    *
    --- End diff --
    
    Tiny nit: this has accidentally indented some unrelated docs.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-217435879
  
    @srowen - Do you know how I can run the MiMa test locally?




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-216506984
  
    **[Test build #57634 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57634/consoleFull)** for PR 12309 at commit [`a193a47`](https://github.com/apache/spark/commit/a193a4760698e1d9425eba07e0de9ce2e2a64835).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12309#issuecomment-217556001
  
    **[Test build #58016 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58016/consoleFull)** for PR 12309 at commit [`9efb2cf`](https://github.com/apache/spark/commit/9efb2cfbbc48a7929819172d22ec18b7de9e67b2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.

