Posted to dev@spark.apache.org by Marcelo Vanzin <va...@cloudera.com> on 2014/04/30 20:29:03 UTC

SparkSubmit and --driver-java-options

Hello all,

Maybe my brain is not evolved enough to be able to trace through what
happens with command-line arguments as they're parsed through all the
shell scripts... but I really can't figure out how to pass more than a
single JVM option on the command line.

Unless someone has an obvious workaround that I'm missing, I'd like to
propose something that is actually pretty standard in JVM tools: using
-J. From javac:

  -J<flag>                   Pass <flag> directly to the runtime system

So "javac -J-Xmx1g" would pass "-Xmx1g" to the underlying JVM. You can
use several of those to pass multiple options (unlike
--driver-java-options), so it helps that it's a short syntax.

Unless someone has some issue with that I'll work on a patch for it...
(well, I'm going to do it locally for me anyway because I really can't
figure out how to do what I want to otherwise.)
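As a sketch of how such a flag might be handled in a launcher script (the function and variable names here are illustrative, not Spark's actual ones):

```shell
#!/bin/bash
# Hypothetical sketch: split -J-prefixed flags from the rest of the
# command line, the way javac does.
split_args() {
  local jvm_opts=() pass_through=() arg
  for arg in "$@"; do
    case "$arg" in
      -J?*) jvm_opts+=("${arg#-J}") ;;   # strip the -J prefix, keep the option
      *)    pass_through+=("$arg") ;;    # everything else goes through untouched
    esac
  done
  echo "jvm opts: ${jvm_opts[*]}"
  echo "remaining: ${pass_through[*]}"
}

split_args "$@"
```

So `-J-Xmx1g -J-Dfoo=bar MyApp` would yield `jvm opts: -Xmx1g -Dfoo=bar` and `remaining: MyApp` — each JVM option is its own argument, so nothing depends on quoting surviving the script layers.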


-- 
Marcelo

Re: SparkSubmit and --driver-java-options

Posted by Dean Wampler <de...@gmail.com>.
Try this:

#!/bin/bash
for x in "$@"; do
  echo "arg: $x"
done
ARGS_COPY=("$@")     # Copy the positional parameters into an array, one element per argument

for x in "${ARGS_COPY[@]}"; do                # quoted expansion preserves each element
  echo "arg_copy: $x"
done
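Running that copy loop with the same arguments as the reproduction quoted below shows each element surviving intact (wrapped in a function here so the call and its output can be shown together):

```shell
#!/bin/bash
# Same logic as the script above, in a function for demonstration.
show() {
  ARGS_COPY=("$@")                  # array copy: one element per argument
  for x in "${ARGS_COPY[@]}"; do
    echo "arg_copy: $x"
  done
}

show a b "c d e" f
# arg_copy: a
# arg_copy: b
# arg_copy: c d e
# arg_copy: f
```

The quoted argument "c d e" stays a single element through the copy, unlike the flattened output in the quoted reproduction.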



On Wed, Apr 30, 2014 at 3:51 PM, Patrick Wendell <pw...@gmail.com> wrote:

> So I reproduced the problem here:
>
> == test.sh ==
> #!/bin/bash
> for x in "$@"; do
>   echo "arg: $x"
> done
> ARGS_COPY=$@
> for x in "$ARGS_COPY"; do
>   echo "arg_copy: $x"
> done
> ==
>
> ./test.sh a b "c d e" f
> arg: a
> arg: b
> arg: c d e
> arg: f
> arg_copy: a b c d e f
>
> I'll dig around a bit more and see if we can fix it. Pretty sure we
> aren't passing these argument arrays around correctly in bash.
>
> On Wed, Apr 30, 2014 at 1:48 PM, Marcelo Vanzin <va...@cloudera.com>
> wrote:
> > On Wed, Apr 30, 2014 at 1:41 PM, Patrick Wendell <pw...@gmail.com>
> wrote:
> >> Yeah I think the problem is that the spark-submit script doesn't pass
> >> the argument array to spark-class in the right way, so any quoted
> >> strings get flattened.
> >>
> >> I think we'll need to figure out how to do this correctly in the bash
> >> script so that quoted strings get passed in the right way.
> >
> > I tried a few different approaches but finally ended up giving up; my
> > bash-fu is apparently not strong enough. If you can make it work
> > great, but I have "-J" working locally in case you give up like me.
> > :-)
> >
> > --
> > Marcelo
>



-- 
Dean Wampler, Ph.D.
Typesafe
@deanwampler
http://typesafe.com
http://polyglotprogramming.com

Re: SparkSubmit and --driver-java-options

Posted by Patrick Wendell <pw...@gmail.com>.
Patch here:
https://github.com/apache/spark/pull/609

On Wed, Apr 30, 2014 at 2:26 PM, Patrick Wendell <pw...@gmail.com> wrote:
> Dean - our e-mails crossed, but thanks for the tip. Was independently
> arriving at your solution :)
>
> Okay I'll submit something.
>
> - Patrick
>
> On Wed, Apr 30, 2014 at 2:14 PM, Marcelo Vanzin <va...@cloudera.com> wrote:
>> Cool, that seems to work. Thanks!
>>
>> On Wed, Apr 30, 2014 at 2:09 PM, Patrick Wendell <pw...@gmail.com> wrote:
>>> Marcelo - Mind trying the following diff locally? If it works I can
>>> send a patch:
>>>
>>> patrick@patrick-t430s:~/Documents/spark$ git diff bin/spark-submit
>>> diff --git a/bin/spark-submit b/bin/spark-submit
>>> index dd0d95d..49bc262 100755
>>> --- a/bin/spark-submit
>>> +++ b/bin/spark-submit
>>> @@ -18,7 +18,7 @@
>>>  #
>>>
>>>  export SPARK_HOME="$(cd `dirname $0`/..; pwd)"
>>> -ORIG_ARGS=$@
>>> +ORIG_ARGS=("$@")
>>>
>>>  while (($#)); do
>>>    if [ "$1" = "--deploy-mode" ]; then
>>> @@ -39,5 +39,5 @@ if [ ! -z $DRIVER_MEMORY ] && [ ! -z $DEPLOY_MODE ]
>>> && [ $DEPLOY_MODE = "client"
>>>    export SPARK_MEM=$DRIVER_MEMORY
>>>  fi
>>>
>>> -$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit $ORIG_ARGS
>>> +$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit
>>> "${ORIG_ARGS[@]}"
>>>
>>> On Wed, Apr 30, 2014 at 1:51 PM, Patrick Wendell <pw...@gmail.com> wrote:
>>>> So I reproduced the problem here:
>>>>
>>>> == test.sh ==
>>>> #!/bin/bash
>>>> for x in "$@"; do
>>>>   echo "arg: $x"
>>>> done
>>>> ARGS_COPY=$@
>>>> for x in "$ARGS_COPY"; do
>>>>   echo "arg_copy: $x"
>>>> done
>>>> ==
>>>>
>>>> ./test.sh a b "c d e" f
>>>> arg: a
>>>> arg: b
>>>> arg: c d e
>>>> arg: f
>>>> arg_copy: a b c d e f
>>>>
>>>> I'll dig around a bit more and see if we can fix it. Pretty sure we
>>>> aren't passing these argument arrays around correctly in bash.
>>>>
>>>> On Wed, Apr 30, 2014 at 1:48 PM, Marcelo Vanzin <va...@cloudera.com> wrote:
>>>>> On Wed, Apr 30, 2014 at 1:41 PM, Patrick Wendell <pw...@gmail.com> wrote:
>>>>>> Yeah I think the problem is that the spark-submit script doesn't pass
>>>>>> the argument array to spark-class in the right way, so any quoted
>>>>>> strings get flattened.
>>>>>>
>>>>>> I think we'll need to figure out how to do this correctly in the bash
>>>>>> script so that quoted strings get passed in the right way.
>>>>>
>>>>> I tried a few different approaches but finally ended up giving up; my
>>>>> bash-fu is apparently not strong enough. If you can make it work
>>>>> great, but I have "-J" working locally in case you give up like me.
>>>>> :-)
>>>>>
>>>>> --
>>>>> Marcelo
>>
>>
>>
>> --
>> Marcelo

Re: SparkSubmit and --driver-java-options

Posted by Patrick Wendell <pw...@gmail.com>.
Dean - our e-mails crossed, but thanks for the tip. Was independently
arriving at your solution :)

Okay I'll submit something.

- Patrick

On Wed, Apr 30, 2014 at 2:14 PM, Marcelo Vanzin <va...@cloudera.com> wrote:
> Cool, that seems to work. Thanks!
>
> On Wed, Apr 30, 2014 at 2:09 PM, Patrick Wendell <pw...@gmail.com> wrote:
>> Marcelo - Mind trying the following diff locally? If it works I can
>> send a patch:
>>
>> patrick@patrick-t430s:~/Documents/spark$ git diff bin/spark-submit
>> diff --git a/bin/spark-submit b/bin/spark-submit
>> index dd0d95d..49bc262 100755
>> --- a/bin/spark-submit
>> +++ b/bin/spark-submit
>> @@ -18,7 +18,7 @@
>>  #
>>
>>  export SPARK_HOME="$(cd `dirname $0`/..; pwd)"
>> -ORIG_ARGS=$@
>> +ORIG_ARGS=("$@")
>>
>>  while (($#)); do
>>    if [ "$1" = "--deploy-mode" ]; then
>> @@ -39,5 +39,5 @@ if [ ! -z $DRIVER_MEMORY ] && [ ! -z $DEPLOY_MODE ]
>> && [ $DEPLOY_MODE = "client"
>>    export SPARK_MEM=$DRIVER_MEMORY
>>  fi
>>
>> -$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit $ORIG_ARGS
>> +$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit
>> "${ORIG_ARGS[@]}"
>>
>> On Wed, Apr 30, 2014 at 1:51 PM, Patrick Wendell <pw...@gmail.com> wrote:
>>> So I reproduced the problem here:
>>>
>>> == test.sh ==
>>> #!/bin/bash
>>> for x in "$@"; do
>>>   echo "arg: $x"
>>> done
>>> ARGS_COPY=$@
>>> for x in "$ARGS_COPY"; do
>>>   echo "arg_copy: $x"
>>> done
>>> ==
>>>
>>> ./test.sh a b "c d e" f
>>> arg: a
>>> arg: b
>>> arg: c d e
>>> arg: f
>>> arg_copy: a b c d e f
>>>
>>> I'll dig around a bit more and see if we can fix it. Pretty sure we
>>> aren't passing these argument arrays around correctly in bash.
>>>
>>> On Wed, Apr 30, 2014 at 1:48 PM, Marcelo Vanzin <va...@cloudera.com> wrote:
>>>> On Wed, Apr 30, 2014 at 1:41 PM, Patrick Wendell <pw...@gmail.com> wrote:
>>>>> Yeah I think the problem is that the spark-submit script doesn't pass
>>>>> the argument array to spark-class in the right way, so any quoted
>>>>> strings get flattened.
>>>>>
>>>>> I think we'll need to figure out how to do this correctly in the bash
>>>>> script so that quoted strings get passed in the right way.
>>>>
>>>> I tried a few different approaches but finally ended up giving up; my
>>>> bash-fu is apparently not strong enough. If you can make it work
>>>> great, but I have "-J" working locally in case you give up like me.
>>>> :-)
>>>>
>>>> --
>>>> Marcelo
>
>
>
> --
> Marcelo

Re: SparkSubmit and --driver-java-options

Posted by Marcelo Vanzin <va...@cloudera.com>.
Cool, that seems to work. Thanks!

On Wed, Apr 30, 2014 at 2:09 PM, Patrick Wendell <pw...@gmail.com> wrote:
> Marcelo - Mind trying the following diff locally? If it works I can
> send a patch:
>
> patrick@patrick-t430s:~/Documents/spark$ git diff bin/spark-submit
> diff --git a/bin/spark-submit b/bin/spark-submit
> index dd0d95d..49bc262 100755
> --- a/bin/spark-submit
> +++ b/bin/spark-submit
> @@ -18,7 +18,7 @@
>  #
>
>  export SPARK_HOME="$(cd `dirname $0`/..; pwd)"
> -ORIG_ARGS=$@
> +ORIG_ARGS=("$@")
>
>  while (($#)); do
>    if [ "$1" = "--deploy-mode" ]; then
> @@ -39,5 +39,5 @@ if [ ! -z $DRIVER_MEMORY ] && [ ! -z $DEPLOY_MODE ]
> && [ $DEPLOY_MODE = "client"
>    export SPARK_MEM=$DRIVER_MEMORY
>  fi
>
> -$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit $ORIG_ARGS
> +$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit
> "${ORIG_ARGS[@]}"
>
> On Wed, Apr 30, 2014 at 1:51 PM, Patrick Wendell <pw...@gmail.com> wrote:
>> So I reproduced the problem here:
>>
>> == test.sh ==
>> #!/bin/bash
>> for x in "$@"; do
>>   echo "arg: $x"
>> done
>> ARGS_COPY=$@
>> for x in "$ARGS_COPY"; do
>>   echo "arg_copy: $x"
>> done
>> ==
>>
>> ./test.sh a b "c d e" f
>> arg: a
>> arg: b
>> arg: c d e
>> arg: f
>> arg_copy: a b c d e f
>>
>> I'll dig around a bit more and see if we can fix it. Pretty sure we
>> aren't passing these argument arrays around correctly in bash.
>>
>> On Wed, Apr 30, 2014 at 1:48 PM, Marcelo Vanzin <va...@cloudera.com> wrote:
>>> On Wed, Apr 30, 2014 at 1:41 PM, Patrick Wendell <pw...@gmail.com> wrote:
>>>> Yeah I think the problem is that the spark-submit script doesn't pass
>>>> the argument array to spark-class in the right way, so any quoted
>>>> strings get flattened.
>>>>
>>>> I think we'll need to figure out how to do this correctly in the bash
>>>> script so that quoted strings get passed in the right way.
>>>
>>> I tried a few different approaches but finally ended up giving up; my
>>> bash-fu is apparently not strong enough. If you can make it work
>>> great, but I have "-J" working locally in case you give up like me.
>>> :-)
>>>
>>> --
>>> Marcelo



-- 
Marcelo

Re: SparkSubmit and --driver-java-options

Posted by Patrick Wendell <pw...@gmail.com>.
Marcelo - Mind trying the following diff locally? If it works I can
send a patch:

patrick@patrick-t430s:~/Documents/spark$ git diff bin/spark-submit
diff --git a/bin/spark-submit b/bin/spark-submit
index dd0d95d..49bc262 100755
--- a/bin/spark-submit
+++ b/bin/spark-submit
@@ -18,7 +18,7 @@
 #

 export SPARK_HOME="$(cd `dirname $0`/..; pwd)"
-ORIG_ARGS=$@
+ORIG_ARGS=("$@")

 while (($#)); do
   if [ "$1" = "--deploy-mode" ]; then
@@ -39,5 +39,5 @@ if [ ! -z $DRIVER_MEMORY ] && [ ! -z $DEPLOY_MODE ] && [ $DEPLOY_MODE = "client"
   export SPARK_MEM=$DRIVER_MEMORY
 fi

-$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit $ORIG_ARGS
+$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit "${ORIG_ARGS[@]}"

On Wed, Apr 30, 2014 at 1:51 PM, Patrick Wendell <pw...@gmail.com> wrote:
> So I reproduced the problem here:
>
> == test.sh ==
> #!/bin/bash
> for x in "$@"; do
>   echo "arg: $x"
> done
> ARGS_COPY=$@
> for x in "$ARGS_COPY"; do
>   echo "arg_copy: $x"
> done
> ==
>
> ./test.sh a b "c d e" f
> arg: a
> arg: b
> arg: c d e
> arg: f
> arg_copy: a b c d e f
>
> I'll dig around a bit more and see if we can fix it. Pretty sure we
> aren't passing these argument arrays around correctly in bash.
>
> On Wed, Apr 30, 2014 at 1:48 PM, Marcelo Vanzin <va...@cloudera.com> wrote:
>> On Wed, Apr 30, 2014 at 1:41 PM, Patrick Wendell <pw...@gmail.com> wrote:
>>> Yeah I think the problem is that the spark-submit script doesn't pass
>>> the argument array to spark-class in the right way, so any quoted
>>> strings get flattened.
>>>
>>> I think we'll need to figure out how to do this correctly in the bash
>>> script so that quoted strings get passed in the right way.
>>
>> I tried a few different approaches but finally ended up giving up; my
>> bash-fu is apparently not strong enough. If you can make it work
>> great, but I have "-J" working locally in case you give up like me.
>> :-)
>>
>> --
>> Marcelo

Re: SparkSubmit and --driver-java-options

Posted by Patrick Wendell <pw...@gmail.com>.
So I reproduced the problem here:

== test.sh ==
#!/bin/bash
for x in "$@"; do
  echo "arg: $x"
done
ARGS_COPY=$@
for x in "$ARGS_COPY"; do
  echo "arg_copy: $x"
done
==

./test.sh a b "c d e" f
arg: a
arg: b
arg: c d e
arg: f
arg_copy: a b c d e f

I'll dig around a bit more and see if we can fix it. Pretty sure we
aren't passing these argument arrays around correctly in bash.

On Wed, Apr 30, 2014 at 1:48 PM, Marcelo Vanzin <va...@cloudera.com> wrote:
> On Wed, Apr 30, 2014 at 1:41 PM, Patrick Wendell <pw...@gmail.com> wrote:
>> Yeah I think the problem is that the spark-submit script doesn't pass
>> the argument array to spark-class in the right way, so any quoted
>> strings get flattened.
>>
>> I think we'll need to figure out how to do this correctly in the bash
>> script so that quoted strings get passed in the right way.
>
> I tried a few different approaches but finally ended up giving up; my
> bash-fu is apparently not strong enough. If you can make it work
> great, but I have "-J" working locally in case you give up like me.
> :-)
>
> --
> Marcelo

Re: SparkSubmit and --driver-java-options

Posted by Marcelo Vanzin <va...@cloudera.com>.
On Wed, Apr 30, 2014 at 1:41 PM, Patrick Wendell <pw...@gmail.com> wrote:
> Yeah I think the problem is that the spark-submit script doesn't pass
> the argument array to spark-class in the right way, so any quoted
> strings get flattened.
>
> I think we'll need to figure out how to do this correctly in the bash
> script so that quoted strings get passed in the right way.

I tried a few different approaches but finally ended up giving up; my
bash-fu is apparently not strong enough. If you can make it work
great, but I have "-J" working locally in case you give up like me.
:-)

-- 
Marcelo

Re: SparkSubmit and --driver-java-options

Posted by Patrick Wendell <pw...@gmail.com>.
Yeah I think the problem is that the spark-submit script doesn't pass
the argument array to spark-class in the right way, so any quoted
strings get flattened.

We do:
ORIG_ARGS=$@
$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit $ORIG_ARGS

This works:
# remove all the code relating to `shift`ing the arguments
$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"

I think the issue is that when you assign $@ to a plain variable in
bash, the copy is flattened into a single space-separated string
rather than staying an array.

My patch fixes this for spark-shell but I didn't realize that
spark-submit does the same thing.
https://github.com/apache/spark/pull/576/files#diff-bc287993dfd11fd18794041e169ffd72L23

I think we'll need to figure out how to do this correctly in the bash
script so that quoted strings get passed in the right way.
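The flattening can be seen directly in bash: a scalar assignment of $@ joins the arguments into one space-separated string, while ("$@") copies them element by element (the --driver-java-options value below simulates the failing spark-submit call):

```shell
#!/bin/bash
# Minimal illustration: a scalar copy of "$@" is one flattened string;
# an array copy keeps one element per argument, quoting preserved.
set -- --driver-java-options "-Dfoo -Dbar" app.jar   # simulate 3 arguments

FLAT=$@            # scalar: "--driver-java-options -Dfoo -Dbar app.jar"
COPY=("$@")        # array: 3 elements

echo "as array:  ${#COPY[@]} args"   # prints: as array:  3 args
set -- $FLAT                          # re-split the flattened string
echo "flattened: $# args"             # prints: flattened: 4 args
```

The re-split turns "-Dfoo -Dbar" into two separate arguments, which is exactly why SparkSubmit sees an unrecognized '-Dbar'.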

On Wed, Apr 30, 2014 at 1:06 PM, Marcelo Vanzin <va...@cloudera.com> wrote:
> Just pulled again just in case. Verified your fix is there.
>
> $ ./bin/spark-submit --master yarn --deploy-mode client
> --driver-java-options "-Dfoo -Dbar" blah blah blah
> error: Unrecognized option '-Dbar'.
> run with --help for more information or --verbose for debugging output
>
>
> On Wed, Apr 30, 2014 at 12:49 PM, Patrick Wendell <pw...@gmail.com> wrote:
>> I added a fix for this recently and it didn't require adding -J
>> notation - are you trying it with this patch?
>>
>> https://issues.apache.org/jira/browse/SPARK-1654
>>
>>  ./bin/spark-shell --driver-java-options "-Dfoo=a -Dbar=b"
>> scala> sys.props.get("foo")
>> res0: Option[String] = Some(a)
>> scala> sys.props.get("bar")
>> res1: Option[String] = Some(b)
>>
>> - Patrick
>>
>> On Wed, Apr 30, 2014 at 11:29 AM, Marcelo Vanzin <va...@cloudera.com> wrote:
>>> Hello all,
>>>
>>> Maybe my brain is not evolved enough to be able to trace through what
>>> happens with command-line arguments as they're parsed through all the
>>> shell scripts... but I really can't figure out how to pass more than a
>>> single JVM option on the command line.
>>>
>>> Unless someone has an obvious workaround that I'm missing, I'd like to
>>> propose something that is actually pretty standard in JVM tools: using
>>> -J. From javac:
>>>
>>>   -J<flag>                   Pass <flag> directly to the runtime system
>>>
>>> So "javac -J-Xmx1g" would pass "-Xmx1g" to the underlying JVM. You can
>>> use several of those to pass multiple options (unlike
>>> --driver-java-options), so it helps that it's a short syntax.
>>>
>>> Unless someone has some issue with that I'll work on a patch for it...
>>> (well, I'm going to do it locally for me anyway because I really can't
>>> figure out how to do what I want to otherwise.)
>>>
>>>
>>> --
>>> Marcelo
>
>
>
> --
> Marcelo

Re: SparkSubmit and --driver-java-options

Posted by Marcelo Vanzin <va...@cloudera.com>.
Just pulled again just in case. Verified your fix is there.

$ ./bin/spark-submit --master yarn --deploy-mode client
--driver-java-options "-Dfoo -Dbar" blah blah blah
error: Unrecognized option '-Dbar'.
run with --help for more information or --verbose for debugging output


On Wed, Apr 30, 2014 at 12:49 PM, Patrick Wendell <pw...@gmail.com> wrote:
> I added a fix for this recently and it didn't require adding -J
> notation - are you trying it with this patch?
>
> https://issues.apache.org/jira/browse/SPARK-1654
>
>  ./bin/spark-shell --driver-java-options "-Dfoo=a -Dbar=b"
> scala> sys.props.get("foo")
> res0: Option[String] = Some(a)
> scala> sys.props.get("bar")
> res1: Option[String] = Some(b)
>
> - Patrick
>
> On Wed, Apr 30, 2014 at 11:29 AM, Marcelo Vanzin <va...@cloudera.com> wrote:
>> Hello all,
>>
>> Maybe my brain is not evolved enough to be able to trace through what
>> happens with command-line arguments as they're parsed through all the
>> shell scripts... but I really can't figure out how to pass more than a
>> single JVM option on the command line.
>>
>> Unless someone has an obvious workaround that I'm missing, I'd like to
>> propose something that is actually pretty standard in JVM tools: using
>> -J. From javac:
>>
>>   -J<flag>                   Pass <flag> directly to the runtime system
>>
>> So "javac -J-Xmx1g" would pass "-Xmx1g" to the underlying JVM. You can
>> use several of those to pass multiple options (unlike
>> --driver-java-options), so it helps that it's a short syntax.
>>
>> Unless someone has some issue with that I'll work on a patch for it...
>> (well, I'm going to do it locally for me anyway because I really can't
>> figure out how to do what I want to otherwise.)
>>
>>
>> --
>> Marcelo



-- 
Marcelo

Re: SparkSubmit and --driver-java-options

Posted by Patrick Wendell <pw...@gmail.com>.
I added a fix for this recently and it didn't require adding -J
notation - are you trying it with this patch?

https://issues.apache.org/jira/browse/SPARK-1654

 ./bin/spark-shell --driver-java-options "-Dfoo=a -Dbar=b"
scala> sys.props.get("foo")
res0: Option[String] = Some(a)
scala> sys.props.get("bar")
res1: Option[String] = Some(b)

- Patrick

On Wed, Apr 30, 2014 at 11:29 AM, Marcelo Vanzin <va...@cloudera.com> wrote:
> Hello all,
>
> Maybe my brain is not evolved enough to be able to trace through what
> happens with command-line arguments as they're parsed through all the
> shell scripts... but I really can't figure out how to pass more than a
> single JVM option on the command line.
>
> Unless someone has an obvious workaround that I'm missing, I'd like to
> propose something that is actually pretty standard in JVM tools: using
> -J. From javac:
>
>   -J<flag>                   Pass <flag> directly to the runtime system
>
> So "javac -J-Xmx1g" would pass "-Xmx1g" to the underlying JVM. You can
> use several of those to pass multiple options (unlike
> --driver-java-options), so it helps that it's a short syntax.
>
> Unless someone has some issue with that I'll work on a patch for it...
> (well, I'm going to do it locally for me anyway because I really can't
> figure out how to do what I want to otherwise.)
>
>
> --
> Marcelo