Posted to reviews@spark.apache.org by liancheng <gi...@git.apache.org> on 2014/08/01 12:34:47 UTC

[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/1715

    [SPARK-2678][Core] Added "--" to prevent spark-submit from shadowing application options

    JIRA issue: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678)
    
    This PR aims to fix SPARK-2678 in a downward compatible way, and replaces PR #1699.
    
    All arguments that follow a `--` are passed to the user application. Before this change, `spark-submit` shadowed application options:
    
    ```bash
    # The "--help" option is captured by spark-submit
    bin/spark-submit --class Foo userapp.jar --help
    ```
    
    With the help of `--`, we can now pass arbitrary options to the user application:
    
    ```bash
    # The "--help" option is now passed to userapp.jar
    bin/spark-submit --class Foo userapp.jar -- --help
    ```
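
The splitting rule described above can be sketched in a few lines (an illustrative Python sketch of the separator semantics, not Spark's actual implementation):

```python
def split_at_separator(args):
    """Split a command line at the first standalone "--": everything before
    it belongs to the launcher, everything after it is forwarded verbatim
    to the user application. Illustrative sketch only, not SparkSubmit code."""
    if "--" in args:
        i = args.index("--")
        return args[:i], args[i + 1:]
    return args, []

launcher_args, app_args = split_at_separator(
    ["--class", "Foo", "userapp.jar", "--", "--help"])
print(launcher_args)  # ['--class', 'Foo', 'userapp.jar']
print(app_args)       # ['--help']
```

Note that only the first `--` is special; any later `--` is forwarded to the application untouched.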

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark spark-2678

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1715.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1715
    
----
commit 27741380631e4e293d040d8fc206343814467b37
Author: Cheng Lian <li...@gmail.com>
Date:   2014-08-01T10:27:41Z

    Added "--" to prevent spark-submit from shadowing application options

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50975678
  
    I personally find putting the application jar under either `--primary` or `--jars` pretty confusing. If users pass it to `--jars`, they now have to worry about the ordering of the jars, because the first one will be parsed as the application jar. I think it should just remain a standalone argument, as it is today.
    
    Instead, I propose that we rename `--` to something more specific to Spark, such as `--spark-application-args`, so that everything that follows is interpreted as application args. This essentially reduces the collision surface to the single option introduced in this PR (`--`, `--spark-application-args`, or some better name).
    
    ```
    bin/spark-submit
      spark-internal
      --class [...].HiveThriftServer2
      --master local[*]
      --spark-application-args --master something --hiveconf k=v
    ```



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-51000564
  
    @liancheng @andrewor14 by the way - I also had one other thought. Another option is that we could add a configuration property to support the old parsing behavior where `--` is not treated specially.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1715




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1715#discussion_r15727020
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
    @@ -311,6 +311,15 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
             verbose = true
             parse(tail)
     
    +      case "--" :: tail =>
    +        if (inSparkOpts) {
    +          SparkSubmit.printErrorAndExit(
    +            "Application option separator \"--\" must be after the primary resource " +
    +              "(i.e., application jar file or Python file).")
    +        } else {
    +          childArgs ++= tail.filter(_.nonEmpty)
    --- End diff --
    
    This behavior is actually inherited from [the current master code](https://github.com/apache/spark/blob/e8e0fd691a06a2887fdcffb2217b96805ace0cb0/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L347-L349). But I agree with you: since `--` is a newly introduced argument-passing mode, we should allow empty arguments.
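
To make the concern concrete, `filter(_.nonEmpty)` silently drops empty strings that the application may legitimately expect. A minimal Python sketch of the two behaviors (illustrative only, not the Scala code under review):

```python
# Arguments following "--", one of which is an intentionally empty string.
args_after_separator = ["--conf", "", "value"]

# Behavior inherited from the current master: empty strings are dropped.
filtered = [a for a in args_after_separator if a]

# Proposed behavior for "--": forward everything verbatim, empties included.
verbatim = list(args_after_separator)

print(filtered)  # ['--conf', 'value']
print(verbatim)  # ['--conf', '', 'value']
```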



[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50972190
  
    I agree with @andrewor14 that we should rename the `--primary` option if there are any better candidates, though.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50968172
  
    QA tests have started for PR 1715. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17771/consoleFull




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1715#discussion_r15727021
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
    @@ -311,6 +311,15 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
             verbose = true
             parse(tail)
     
    +      case "--" :: tail =>
    +        if (inSparkOpts) {
    +          SparkSubmit.printErrorAndExit(
    --- End diff --
    
    Fair, thanks.



[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1715#discussion_r15731536
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
    @@ -322,6 +343,14 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
                   val errMessage = s"Unrecognized option '$value'."
                   SparkSubmit.printErrorAndExit(errMessage)
    --- End diff --
    
    And they only get here if they do `spark-submit --class Foo --hiveconf /some/path app.jar`. Then it's fine to leave this as is.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-51093637
  
    Let's say we're using the `--jars` approach, and I run the following two commands. (Correct me if I'm understanding your proposal incorrectly; I am basing this off @liancheng's pseudocode in an earlier comment.) Here I am assuming that `--arg1` and `--arg2` are also spark-submit arguments:
    ```
    bin/spark-submit --jars app.jar --arg1 -- --arg2
    ```
    Here we treat `--arg1` as a spark-submit argument, but pass `--arg2` to the application. The user may also specify the primary jar explicitly as follows:
    ```
    bin/spark-submit app.jar --arg1 -- --arg2
    ```
    Here we treat both `--arg1` and `--arg2` as spark-submit arguments, but we may pass `--` to the application depending on what `--arg1` is. For instance, if `--arg1` is `--name` and takes in a value, then `--` will become the app name. Otherwise, if `--arg1` is `--supervise`, which does not take in a value, then `--` will be passed to the application.
    
    From the user's perspective, the ways we specify the primary resource in these two commands are nearly equivalent. However, the arguments are actually parsed very differently. On the other hand, if we simply add a new Spark-specific option (`--spark-application-args` or something) and keep the way we specify the primary resource the same, we get backwards compatibility for free while providing the new functionality. This latter approach just seems simpler to me.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-51089001
  
    @pwendell How about Python files? What if I have one.py and two.py that reference each other, and I want spark-submit to run the main method of one.py but not two.py? Since we don't specify a class, I'm not sure how we can distinguish between the two main methods unless we impose the requirement that the primary Python file must be the first one.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1715#discussion_r15730602
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
    @@ -322,6 +343,14 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
                   val errMessage = s"Unrecognized option '$value'."
                   SparkSubmit.printErrorAndExit(errMessage)
    --- End diff --
    
    Confirmed this locally with the current master code. `spark-submit` passes `--*`-like arguments to the application correctly, as long as the user puts them after the primary resource.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50952515
  
    I mean that they specify it not as a required argument, but as part of the `--jars` option we already expose:
    
    ```
    ./spark-submit --master local --jars myJar.jar -- --userOpt a --userOpt b
    
    ```



[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1715#discussion_r15730463
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
    @@ -322,6 +343,14 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
                   val errMessage = s"Unrecognized option '$value'."
                   SparkSubmit.printErrorAndExit(errMessage)
    --- End diff --
    
    Are you referring to cases like `spark-submit --class Foo app.jar --arg`? Actually, `--arg` is passed to `app.jar` correctly, since `inSparkOpts` is set to `false` by the time `--arg` is processed.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50963026
  
    QA tests have started for PR 1715. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17769/consoleFull




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50982344
  
    @andrewor14 Interesting, this is actually almost exactly the solution I came up with at the very beginning :)
    
    The only difference is that we chose `--` rather than a more intuitive option name like `--spark-application-args`. We chose `--` because it's the idiomatic way on UNIX-like systems to pass this kind of "user application option".
    
    The reason we (@pwendell and I) gave it up after discussion is that this solution is not fully downward compatible: it breaks existing user applications that already recognize `--` as a valid option. Turning `--` into something more specific like `--spark-application-args` does reduce the probability of a name collision. In particular, after this change we won't run into similar compatibility issues whenever we add new options to `spark-submit` in the future. @pwendell Maybe this is acceptable?
    
    And I agree with your arguments about the drawbacks of putting the application jar into `--jars`. A similar argument applies to Python applications. That is also an important reason I introduced `--primary` in the first place.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-51146747
  
    Sure, that sounds good. The only thing is, if you use the first entry in `--jars`, I wouldn't automatically look for a main class in it (there's the part that checks the JAR manifest). Instead, make them use `--main-class` in that case. Otherwise it's a bit confusing that you give only JARs and it starts running some program.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1715#discussion_r15710856
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
    @@ -311,6 +311,15 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
             verbose = true
             parse(tail)
     
    +      case "--" :: tail =>
    +        if (inSparkOpts) {
    +          SparkSubmit.printErrorAndExit(
    --- End diff --
    
    Couldn't you just omit this and just delegate to the existing check in `checkRequiredArguments()`?



[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1715#discussion_r15710924
  
    --- Diff: sbin/start-thriftserver.sh ---
    @@ -26,11 +26,16 @@ set -o posix
     # Figure out where Spark is installed
     FWDIR="$(cd `dirname $0`/..; pwd)"
     
    -if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
    -  echo "Usage: ./sbin/start-thriftserver [options]"
    -  $FWDIR/bin/spark-submit --help 2>&1 | grep -v Usage 1>&2
    +CLASS="org.apache.spark.sql.hive.thriftserver.HiveThriftServer2"
    +
    +if [[ "$@" = --help ]] || [[ "$@" = -h ]]; then
    +  echo "Usage: ./sbin/start-thriftserver.sh [options] [--] [thrift server options]"
    +  exec "$FWDIR"/bin/spark-submit --help 2>&1 | grep -v Usage 1>&2
    --- End diff --
    
    Same deal with `exec` here.



[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-51017797
  
    @pwendell As for the difference between application jars (or secondary jars) and the primary jar, what matters is how `SparkSubmit` handles them: the primary jar is guaranteed to be the first jar on the CLASSPATH, so the main class specified by `--class` is never shadowed by a class of the same name in some secondary jar. So removing the primary resource from the user interface (command line options) means:
    
    1. Users must add the primary resource (either jar file or Python file) in front of all other jars/Python files
    2. Internally, the first jar or first Python file should be considered primary
    
    IMHO, these requirements are too implicit...
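
The shadowing argument can be illustrated with a toy first-match-wins lookup (illustrative Python; the names and the `provides` map are made up, and this is only an analogy for JVM classpath resolution):

```python
# Toy model: a class is loaded from the first CLASSPATH entry that defines it,
# so putting the primary jar first prevents secondary jars from shadowing it.
classpath = ["primary.jar", "secondary.jar"]
provides = {"primary.jar": {"Foo", "Util"}, "secondary.jar": {"Foo"}}

def resolve(cls):
    for jar in classpath:
        if cls in provides[jar]:
            return jar
    raise ImportError(cls)

print(resolve("Foo"))  # primary.jar -- secondary.jar's Foo is shadowed
```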




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50955875
  
    Discussed with @pwendell offline; recording a summary here:
    
    1. We leave the current option passing mode as is to keep downward compatibility
    1. Introduce `--` as a new application option passing mode, to be used mutually exclusively with the positional primary resource (either a jar file or a Python file).
    
       Users need to use `--jars` and/or `--py-files` to specify the primary resource if they'd like to use `--` as the application option separator.
    
    So basically, we'll have this:
    
    ```
    if primary resource exists:
      resort to the old option passing mode
    else if -- exists:
      pass everything after -- to the application
      if --py-files exists:
        isPython = true
      else:
        isPython = false
    else:
      report error
    ```
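
The pseudocode can also be written as a runnable sketch (hypothetical helper name and return values; an illustration of the proposal, not actual `SparkSubmit` logic):

```python
def choose_mode(args, primary_resource=None):
    """Dispatch sketch for the summarized proposal. `args` is the raw
    argument list; `primary_resource` is the positional jar/Python file,
    if one was given. Returns (mode, application_args)."""
    if primary_resource is not None:
        return "legacy", []  # resort to the old option passing mode
    if "--" in args:
        app_args = args[args.index("--") + 1:]
        is_python = "--py-files" in args  # primary comes from --py-files
        return ("python" if is_python else "jvm"), app_args
    raise ValueError("no primary resource and no '--' separator")
```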




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50968039
  
    test this please




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-51055752
  
    @pwendell OK, the `java -firstCpElement` example really convinced me :) I used to think that asking users to care about the order of the jars was a little too much, but every sane programmer on the JVM should care about this anyway.
    
    I'll try to deliver a version as you described soon.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50938511
  
    I don't think it's unavoidable if we make specifying a main jar mutually exclusive with using `--` 



[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-51089761
  
    @andrewor14 I believe Patrick means:
    
    1. For Scala/Java applications, the primary jar should appear as the 1st entry of `--jars`
    1. For Python applications, the primary Python file should appear as the 1st entry of `--py-files`




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-51145806
  
    Thanks Matei, Patrick's last few comments have already convinced me to remove the "primary" notion from a user's perspective. And yes, `spark-internal` can be removed this way; `spark-shell` and `pyspark-shell` can also be removed by checking `--class` in the Spark scripts. Internally, to keep code that relies on `primaryResource` intact (one example is cluster deploy mode in a standalone cluster), we can still pick the first entry in `--jars` for Java/Scala apps and `--main-file` for Python apps as primary.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50959924
  
    `SparkSubmit` uses `primaryResource` in various places, omitting primary resource makes the change hard to be correct in every corner case. Especially, in Python applications, primary resource is used to located the primary Python script. If we simply put it in `--py-files`, we'll have to add more constraints like "Python primary resource put in `--py-files` must appear as the first file", which can be rather unintuitive for users.
    
    Instead of omitting it, I'd suggest adding a new option, `--primary`, to specify the primary resource when `--` is used. The new option-passing style then looks something like:
    
    ```
    bin/spark-submit --class Foo --primary user.jar --master local -- --arg1 --arg2
    ```
    
    To be more precise, the new mode consists of both `--primary` and `--`, and is mutually exclusive with the positional primary-resource mode. This way we needn't change `SparkSubmit`, and we keep full downward compatibility.
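The proposed split can be sketched as a toy shell loop (hypothetical; the real parser lives in the Scala class `SparkSubmitArguments`, this only illustrates where the boundary would fall):

```shell
#!/usr/bin/env bash
# Toy illustration, not the real parser: split a spark-submit invocation at
# the first "--". Everything before the separator is for spark-submit;
# everything after it is handed verbatim to the user application.
submit_args=()
app_args=()
seen_sep=false
for arg in --class Foo --primary user.jar --master local -- --arg1 --arg2; do
  if [ "$seen_sep" = false ] && [ "$arg" = "--" ]; then
    seen_sep=true        # consume the separator itself
    continue
  fi
  if [ "$seen_sep" = true ]; then
    app_args+=("$arg")
  else
    submit_args+=("$arg")
  fi
done
echo "spark-submit args: ${submit_args[*]}"
echo "application args:  ${app_args[*]}"
```

With the example invocation above, `--class Foo --primary user.jar --master local` goes to `spark-submit` and `--arg1 --arg2` goes to the application, even though both could otherwise be mistaken for `spark-submit` options.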




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1715#discussion_r15727018
  
    --- Diff: bin/spark-sql ---
    @@ -26,11 +26,16 @@ set -o posix
     # Figure out where Spark is installed
     FWDIR="$(cd `dirname $0`/..; pwd)"
     
    -if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
    -  echo "Usage: ./sbin/spark-sql [options]"
    -  $FWDIR/bin/spark-submit --help 2>&1 | grep -v Usage 1>&2
    +CLASS="org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver"
    +
    +if [[ "$@" = --help ]] || [[ "$@" = -h ]]; then
    +  echo "Usage: ./sbin/spark-sql [options] [--] [CLI options]"
    +  exec "$FWDIR"/bin/spark-submit --help 2>&1 | grep -v Usage 1>&2
    --- End diff --
    
    Actually, because `exec` here sits in a pipeline, it runs in a subshell, so it does return and the following code gets executed ([reference](http://wiki.bash-hackers.org/commands/builtin/exec)). In my local test it works as expected.
    
    But indeed, I didn't notice this subtlety while making the change. I should use `exec` more carefully. Thanks!
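A minimal sketch of the behavior under discussion: `exec` replaces the shell only when it is the sole command, whereas each stage of a pipeline runs in a subshell, so an `exec` there does not terminate the script.

```shell
#!/usr/bin/env bash
# `exec` inside a pipeline runs in a subshell, so it does NOT replace the
# current shell; the line after it still executes.
exec echo "inside the pipeline" | grep -v "no-match"
echo "still alive after exec-in-pipeline"

# By contrast, a bare `exec CMD` (no pipeline, no redirection-only form)
# would replace the shell, and nothing after it would run.
```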



[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1715#discussion_r15731522
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
    @@ -322,6 +343,14 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
                   val errMessage = s"Unrecognized option '$value'."
                   SparkSubmit.printErrorAndExit(errMessage)
    --- End diff --
    
    Ah I see. The whole point of this PR is not to make arguments like `--hiveconf` work, but to make application-specific versions of options like `--master` work. Say the user ran `spark-submit --class Foo app.jar --master`; this would fail.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-51137420
  
    Hey Cheng, a couple of comments on this:
    - In Java, we should be able to run without a `--primary` if only `--jars` and `--main-class` are given. I'd rather support that than add the primary argument here.
    - If we do this, then the `--primary` argument becomes relevant only for Python, and we can give it a Python-specific name there, e.g. `--main-file`. Another option is to add `--module` and run a specific module as `__main__` via `python -m module-name`. That is the more Pythonic way to do it AFAIK, but it is also more work, so I'd just stick to `--main-file` in 1.1 and add `--module` later.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50970982
  
    The main point is that we need a new mode to pass user arguments "correctly", while retaining the old style for downward compatibility. At first @pwendell suggested we could simply omit the primary resource in the new mode and ask users to put it in `--jars` and/or `--py-files`. But that means `SparkSubmitArguments.primaryResource` could be `null`, which may break many corner cases that rely on it being set. That is why I added `--primary`: it makes the new mode mutually exclusive with the old one while minimizing both the changes and the risk of breaking corner cases.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50938810
  
    Patrick, I'm not sure what you mean by that. How would someone not specify a main jar?



[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50964497
  
    QA results for PR 1715:<br>- This patch FAILED unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test output:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17768/consoleFull




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1715#discussion_r15710486
  
    --- Diff: bin/beeline ---
    @@ -17,29 +17,14 @@
     # limitations under the License.
     #
     
    -# Figure out where Spark is installed
    --- End diff --
    
    These changes seem unrelated. Is there a bug you can mention? Otherwise, could you call them out explicitly in the PR description?



[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1715#discussion_r15730030
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
    @@ -349,7 +378,10 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
           outStream.println("Unknown/unsupported param " + unknownParam)
         }
         outStream.println(
    -      """Usage: spark-submit [options] <app jar | python file> [app options]
    +      """Usage:
    +        |  spark-submit [options] <app jar | python file> [app options]
    +        |  spark-submit [options] --primary <app jar | python file> -- [app options]
    +        |
    --- End diff --
    
    This changes the length of the help output message. Have you verified that all the other scripts (spark-class, spark-submit, pyspark etc.) report the entirety of the help message?




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50918375
  
    A few nits. Also, I just wanted to point out one aspect of backwards compatibility, raised by Patrick in #1699: this will break things for people who are passing "--" to their apps today. I think that's very unlikely and pretty much unavoidable, since spark-submit's current argument parsing is a little questionable to start with.



[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1715#discussion_r15730133
  
    --- Diff: docs/submitting-applications.md ---
    @@ -35,17 +35,23 @@ dependencies, and can support different cluster managers and deploy modes that S
       --deploy-mode <deploy-mode> \
       --conf <key>=<value> \
       ... # other options
    -  <application-jar> \
    +  --primary <application-jar> \
    +  -- \
       [application-arguments]
     {% endhighlight %}
     
    +(**NOTE** Prior to Spark 1.1, `--primary` and `--` are not required. The old application option
    +passing mode is still supported. Start from Spark 1.1, `--` is used as user application option
    +separator, to let user application receive arbitrary command line option, including those are once
    +swallowed by `spark-submit`, like `--master`.)
    --- End diff --
    
    "Since Spark 1.1, `--` is used as a separator for user application options, such that anything that follows will be passed as command line arguments to the application. This includes arguments that were once swallowed by `spark-submit`, e.g. `--hiveconf`"




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1715#discussion_r15727017
  
    --- Diff: bin/beeline ---
    @@ -17,29 +17,14 @@
     # limitations under the License.
     #
     
    -# Figure out where Spark is installed
    --- End diff --
    
    My bad, you're right. I should update the PR description.
    
    While working on #1620, the `beeline` script was at one point implemented on top of `spark-submit`, to avoid duplicating the `java` check and classpath computation, but it was then reverted because of the very issue this PR is trying to fix (`beeline --help` showed the `spark-submit` usage message).
    
    And while working on this PR, I realized that `beeline` is only a JDBC client, unrelated to Spark, so I can just start it with `spark-class`. That's why this change appears here.



[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50968956
  
    @liancheng @pwendell Is the argument for adding a `--primary` that the application might accept `--` as an argument? For example, if the user does the following:
    ```
    bin/spark-submit app.jar apparg1 -- apparg2
    ```
    It's impossible to know whether the arguments are `apparg1 -- apparg2` or just `apparg1 apparg2`. Then it seems that `--` should always be used with `--primary`. Maybe we should make this point more explicit in the docs and error messages (i.e. don't only mention one or the other).
    
    Have you considered other alternatives to the name "primary"? It makes sense to us because we refer to the user application jar / Python file as the primary resource, but it may not make as much sense to users. At the same time I'm not sure there is a better alternative. (I'm thinking of `--application-jar`, but then we'd need something like `--application-file` for Python. Maybe this is not worth the hassle.)




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-51090902
  
    Hm, I see. Even then we still need some kind of separator, right? I thought the whole point of handling primary resources differently here (whether under `--primary`, `--jars`, or `--py-files`) is to provide backwards compatibility in case the user application itself uses `--`. If we pick a sufficiently Spark-specific separator, doesn't this issue go away?




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50971219
  
    @liancheng can you explain which corner case you are thinking of? I actually don't like `--primary` because, as I said before, I was hoping to avoid the whole concept of a "primary" resource. It doesn't really make sense in the Java context, where users might have either (a) multiple jars or (b) no jars; in both of these cases there is really no semantic meaning to a primary resource, so it's slightly awkward.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50875821
  
    QA results for PR 1715:<br>- This patch PASSES unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test output:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17671/consoleFull



[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1715#discussion_r15730471
  
    --- Diff: docs/submitting-applications.md ---
    @@ -35,17 +35,23 @@ dependencies, and can support different cluster managers and deploy modes that S
       --deploy-mode <deploy-mode> \
       --conf <key>=<value> \
       ... # other options
    -  <application-jar> \
    +  --primary <application-jar> \
    +  -- \
       [application-arguments]
     {% endhighlight %}
     
    +(**NOTE** Prior to Spark 1.1, `--primary` and `--` are not required. The old application option
    +passing mode is still supported. Start from Spark 1.1, `--` is used as user application option
    +separator, to let user application receive arbitrary command line option, including those are once
    +swallowed by `spark-submit`, like `--master`.)
    --- End diff --
    
    Thanks for the rewording, except for one thing: `--hiveconf` can't be swallowed. Only options that have the same names as those recognized by `spark-submit` are swallowed, like `--master`, `--conf`, etc.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-51015286
  
    @pwendell I agree with @andrewor14: adding a configuration may make things harder to follow.
    
    At least `--spark-application-args` looks like a satisfactory solution for now. It needs more eyes, though.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50962830
  
    QA tests have started for PR 1715. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17768/consoleFull




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50996861
  
    @andrewor14 is there any functional difference between the application jars and the primary jar? This is why I support removing the concept of a primary jar: from what I can tell there is no difference.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-51024718
  
    @liancheng why not just have the jars listed on the classpath in the order they are given to us? This is also how classpaths work in general, when I run a java command, I don't give a special flag for the first element in the classpath, I just put it first.
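The "order indicates precedence" point can be illustrated without a JVM. The sketch below mimics first-match-wins resolution across ordered classpath entries; the directories and the `Foo.class` file name are made-up placeholders for the demo:

```shell
#!/usr/bin/env bash
# First-match-wins lookup, mimicking how the JVM resolves a class from an
# ordered classpath: both demo directories provide "Foo.class", and the
# earlier entry shadows the later one.
demo=$(mktemp -d)
mkdir -p "$demo/first" "$demo/second"
echo "Foo from first entry"  > "$demo/first/Foo.class"
echo "Foo from second entry" > "$demo/second/Foo.class"

classpath="$demo/first:$demo/second"
IFS=':' read -ra entries <<< "$classpath"
for entry in "${entries[@]}"; do
  if [ -f "$entry/Foo.class" ]; then
    cat "$entry/Foo.class"   # first entry wins; the second is never consulted
    break
  fi
done
rm -rf "$demo"
```

Under these semantics, putting a jar first on the classpath is all the "primary" designation the JVM itself ever needs.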




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-51024798
  
    @andrewor14 I still don't understand how this is different. Basically, the JVM works such that you put a set of jars in order (indicating precedence) and then you can refer to a class defined in any of them. Why should we differ from the normal JVM semantics and have a special name for the first jar on the classpath?




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-51006757
  
    Well my understanding is that `--class` refers to a class that's in the primary jar, and this class may refer to other classes in other secondary jars.
    
    I thought about the config idea too, but that means spark-submit's behavior would differ depending on some hidden configuration, which is also not super intuitive in my opinion. Imagine you intended to pass `--` to your application, but because you forgot to remove this config, your application behaves differently (and does not fail immediately).




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50972156
  
    Some examples:
    
    1. For Python applications, we need a primary Python file to construct the arguments for `PythonRunner`.
    2. Logic related to `spark-shell`, `pyspark-shell`, and `spark-internal`.
    3. In standalone cluster mode, `org.apache.spark.deploy.Client` requires a primary user jar to pass to `DriverRunner` ([1](https://github.com/apache/spark/blob/87738bfa4051771ddfb8c4a4c1eb142fd77e3a46/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L253), [2](https://github.com/apache/spark/blob/87738bfa4051771ddfb8c4a4c1eb142fd77e3a46/core/src/main/scala/org/apache/spark/deploy/ClientArguments.scala#L71), [3](https://github.com/apache/spark/blob/87738bfa4051771ddfb8c4a4c1eb142fd77e3a46/core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala#L145))
    
    I understand that "primary" is more of an internal term, but it's still well defined: for Java applications, conceptually the jar file containing the main class that starts the Spark application is the primary one.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-51137532
  
    BTW if we do this then there's no need to special-case `spark-internal` and such, right? We can just pass a class name. Or let me know if that's not true. In that case I'd say we should have a "for Spark use only" parameter here but not expose `--primary`.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1715#discussion_r15710567
  
    --- Diff: bin/spark-sql ---
    @@ -26,11 +26,16 @@ set -o posix
     # Figure out where Spark is installed
     FWDIR="$(cd `dirname $0`/..; pwd)"
     
    -if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
    -  echo "Usage: ./sbin/spark-sql [options]"
    -  $FWDIR/bin/spark-submit --help 2>&1 | grep -v Usage 1>&2
    +CLASS="org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver"
    +
    +if [[ "$@" = --help ]] || [[ "$@" = -h ]]; then
    +  echo "Usage: ./sbin/spark-sql [options] [--] [CLI options]"
    +  exec "$FWDIR"/bin/spark-submit --help 2>&1 | grep -v Usage 1>&2
    --- End diff --
    
    `exec` here means anything past this line won't execute.



[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50997486
  
    @liancheng @andrewor14 I do like `--spark-application-args` since I think it's safer to assume no other application has used this exact flag (whereas `--` is a more generic mechanism I could imagine other people using).




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1715#discussion_r15730624
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
    @@ -377,8 +409,13 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
             |
             |  --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).
             |
    -        |  --help, -h                  Show this help message and exit
    -        |  --verbose, -v               Print additional debug output
    +        |  --help, -h                  Show this help message and exit.
    +        |  --verbose, -v               Print additional debug output.
    +        |
    +        |  --primary                   The primary jar file or Python file of the application.
    --- End diff --
    
    Thanks, will reword the option description.
    
    Updated the PR description to explain why `--primary` is needed.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1715#discussion_r15730083
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
    @@ -322,6 +343,14 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
                   val errMessage = s"Unrecognized option '$value'."
                   SparkSubmit.printErrorAndExit(errMessage)
    --- End diff --
    
    We should add something here that directs the user to `--primary` and `--` as a way to pass `$value` as an application option. Otherwise, the only way for them to find out about these two options is to read through the full help message.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50959981
  
    Also, adding `--primary` helps maintain support for specifying `spark-internal`, `spark-shell`, and `pyspark-shell` as the primary resource.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50956063
  
    Yeah - that was my suggestion. Btw, I don't think this is particularly elegant, but it does allow us to retain compatibility. I think in our docs we could suggest that users use the new mechanism, so the main reason to preserve both options is for 1.0 deployments.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50871094
  
    QA tests have started for PR 1715. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17671/consoleFull



[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1715#discussion_r15710642
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
    @@ -311,6 +311,15 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
             verbose = true
             parse(tail)
     
    +      case "--" :: tail =>
    +        if (inSparkOpts) {
    +          SparkSubmit.printErrorAndExit(
    +            "Application option separator \"--\" must be after the primary resource " +
    +              "(i.e., application jar file or Python file).")
    +        } else {
    +          childArgs ++= tail.filter(_.nonEmpty)
    --- End diff --
    
    I wouldn't filter. If a user has gone through the trouble of quoting things to explicitly include an empty argument, it should probably stay there.
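    A hypothetical standalone sketch (not Spark code) of why the filter is lossy: a deliberately quoted empty string survives `"$@"` expansion as a distinct positional argument, and a `nonEmpty`-style filter would silently drop it.

    ```bash
    #!/usr/bin/env bash
    # Simulate an application invoked as:
    #   myapp --separator "" --file data.txt
    set -- --separator "" --file data.txt
    echo "argc=$#"            # the quoted empty string still counts as an argument
    for a in "$@"; do
      printf '[%s]\n' "$a"    # brackets make the empty argument visible
    done
    ```

    This prints `argc=4` with `[]` as the second argument; filtering out empty strings would change the application's argument vector behind the user's back.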



[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1715#discussion_r15730025
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
    @@ -377,8 +409,13 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
             |
             |  --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).
             |
    -        |  --help, -h                  Show this help message and exit
    -        |  --verbose, -v               Print additional debug output
    +        |  --help, -h                  Show this help message and exit.
    +        |  --verbose, -v               Print additional debug output.
    +        |
    +        |  --primary                   The primary jar file or Python file of the application.
    --- End diff --
    
    You may need to explicitly mention that this is only needed if we need to pass `--*` arguments to the application. Out of context, it's not clear why the user needs to do `--primary` when they can just pass it directly.
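    For readers out of context, the two invocation styles at issue look like this (flag names as proposed in this PR, not necessarily final):

    ```bash
    # Current behavior: spark-submit captures --help itself.
    bin/spark-submit --class Foo userapp.jar --help

    # Proposed: --primary marks the app jar explicitly, and everything
    # after "--" is passed to the application, so --help reaches userapp.jar.
    bin/spark-submit --class Foo --primary userapp.jar -- --help
    ```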




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1715#discussion_r15730497
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
    @@ -349,7 +378,10 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
           outStream.println("Unknown/unsupported param " + unknownParam)
         }
         outStream.println(
    -      """Usage: spark-submit [options] <app jar | python file> [app options]
    +      """Usage:
    +        |  spark-submit [options] <app jar | python file> [app options]
    +        |  spark-submit [options] --primary <app jar | python file> -- [app options]
    +        |
    --- End diff --
    
    Thanks! `spark-shell` and `pyspark` do need to be updated.




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50972358
  
    QA tests have started for PR 1715. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17774/consoleFull




[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/1715#issuecomment-50966823
  
    The build failures seem unrelated to this PR's changes. I guess the Jenkins server ran out of file descriptors?

