You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by shzhng <gi...@git.apache.org> on 2016/04/18 19:19:44 UTC

[GitHub] spark pull request: [SPARK-14702] [Core] Expose SparkLauncher's Pr...

GitHub user shzhng opened a pull request:

    https://github.com/apache/spark/pull/12474

    [SPARK-14702] [Core] Expose SparkLauncher's ProcessBuilder for user flexibility

    ## What changes were proposed in this pull request?
    
    making public the ProcessBuilder that SparkLauncher uses to construct the process used in SparkLauncher.launch()
    
    
    ## How was this patch tested?
    
    tests not needed, just exposed/rename a private method


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shzhng/spark SPARK-14702

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12474.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12474
    
----
commit 6c96600bcd7bb8e845d03da758cdeab306e82768
Author: Shuo Zheng <sz...@palantir.com>
Date:   2016-04-18T15:37:15Z

    SPARK-14702: Expose SparkLauncher's ProcessBuilder for user flexibility

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14702] [Core] Expose SparkLauncher's Pr...

Posted by shzhng <gi...@git.apache.org>.
Github user shzhng commented on the pull request:

    https://github.com/apache/spark/pull/12474#issuecomment-211519971
  
    @rxin yep, we need the ability to set the working directory of the `SparkLauncher.launch` call, as well as piping output and error streams to various files. Simply giving the user the Process created from the ProcessBuilder grants them none of this flexibility.
    
    There are hacky ways to do both, such as manually grabbing the InputStreams and ErrorStreams, getting the byte arrays of the output, then byte syncing it out to a file with Guava's ByteStreams (example here: http://stackoverflow.com/a/17095886). Setting the working directory is even hackier as it involves messing with `System.properties("user.dir")`, something you should generally avoid messing with. All of this is simplified with the ProcessBuilder.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14702] [Core] Expose SparkLauncher's Pr...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/12474#issuecomment-217981329
  
    I sort of like the idea of exposing the `ProcessBuilder`. While particularly for the working directory we could add a setting, it's cleaner to just use the java API; and for the streams, Java 7 has added some nice APIs that would be more cumbersome to wrap around configs.
    
    The thing that's missing from this PR is to add the same treatment for `startApplication`. I'm a little wary of suggesting an overloaded method that takes a `ProcessBuilder` argument (aside from the list of listeners), because that means you could pass in any instance, not necessarily one that will start a Spark application. But that's the easiest way to do it, and maybe the case I'm thinking can be filed under "user error".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14702] [Core] Expose SparkLauncher's Pr...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/12474#issuecomment-218470923
  
    Yeah I would be a bit wary of just exposing ProcessBuilder without having a way to do the startApplication.  You don't want the user to have to do the LaunchServer and ChildProc stuff.  startApplication also sets redirectErrorStream, which as mentioned above he seems to want to be able to control also.
    
    On the other hand  I was just looking at better integration with oozie and allowing passing processBuilder into startApplication would allow them to do what they want.  oozie doesn't want to have to rely on the spark-submit script so if they could set any ProcessBuilder they could simply create one that does SparkSubmit.main.  But I would rather just see us add official support for that use case.
    
    I think it comes down to giving the user full control vs us having to support funky use cases and them doing weird things.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #12474: [SPARK-14702] [Core] Expose SparkLauncher's ProcessBuild...

Posted by shzhng <gi...@git.apache.org>.
Github user shzhng commented on the issue:

    https://github.com/apache/spark/pull/12474
  
    closing, fixed by https://github.com/apache/spark/pull/14201


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #12474: [SPARK-14702] [Core] Expose SparkLauncher's Proce...

Posted by shzhng <gi...@git.apache.org>.
Github user shzhng closed the pull request at:

    https://github.com/apache/spark/pull/12474


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14702] [Core] Expose SparkLauncher's Pr...

Posted by shzhng <gi...@git.apache.org>.
Github user shzhng commented on the pull request:

    https://github.com/apache/spark/pull/12474#issuecomment-212516505
  
    @rxin any thoughts on opening this up as part of the API?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14702] [Core] Expose SparkLauncher's Pr...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/12474#issuecomment-211507927
  
    Thanks for the pull request. We can't just change a private to public like this, because it can make the API more difficult to maintain in the long run. Can you justify more why this is needed, and why you can't work around it from the user side?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14702] [Core] Expose SparkLauncher's Pr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12474#issuecomment-211486661
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14702] [Core] Expose SparkLauncher's Pr...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/12474#issuecomment-217956015
  
    If that's the issue then I think the right way is to make the working directory configurable or something. We shouldn't just expose this internal API. Also cc @vanzin


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org