Posted to dev@spark.apache.org by Marcelo Vanzin <va...@cloudera.com> on 2014/04/30 20:29:03 UTC
SparkSubmit and --driver-java-options
Hello all,
Maybe my brain is not evolved enough to be able to trace through what
happens with command-line arguments as they're parsed through all the
shell scripts... but I really can't figure out how to pass more than a
single JVM option on the command line.
Unless someone has an obvious workaround that I'm missing, I'd like to
propose something that is actually pretty standard in JVM tools: using
-J. From javac:
-J<flag> Pass <flag> directly to the runtime system
So "javac -J-Xmx1g" would pass "-Xmx1g" to the underlying JVM. You can
use several of those to pass multiple options (unlike
--driver-java-options), so it helps that it's a short syntax.
Unless someone has some issue with that I'll work on a patch for it...
(well, I'm going to do it locally for me anyway because I really can't
figure out how to do what I want to otherwise.)
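To make the proposal concrete, here is a hypothetical sketch (not Spark's actual code, and `parse_args` is just an illustrative name) of how a launcher script could collect -J flags for the JVM and pass everything else through:

```shell
#!/bin/bash
# Hypothetical sketch: split argv into JVM options (-J<flag>) and app arguments.
parse_args() {
  JVM_OPTS=()
  APP_ARGS=()
  local arg
  for arg in "$@"; do
    case "$arg" in
      -J*) JVM_OPTS+=("${arg#-J}") ;;  # -J-Xmx1g becomes -Xmx1g
      *)   APP_ARGS+=("$arg") ;;
    esac
  done
}

parse_args -J-Xmx1g -J-Dfoo=bar --class Main app.jar
echo "jvm: ${JVM_OPTS[*]}"   # jvm: -Xmx1g -Dfoo=bar
echo "app: ${APP_ARGS[*]}"   # app: --class Main app.jar
```

Each -J option stays a separate array element, so several of them can be passed without any quoting gymnastics.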
--
Marcelo
Re: SparkSubmit and --driver-java-options
Posted by Dean Wampler <de...@gmail.com>.
Try this:
#!/bin/bash
for x in "$@"; do
  echo "arg: $x"
done

ARGS_COPY=("$@")   # make ARGS_COPY an array with the same elements as "$@"
for x in "${ARGS_COPY[@]}"; do   # quoted array expansion preserves each argument
  echo "arg_copy: $x"
done
On Wed, Apr 30, 2014 at 3:51 PM, Patrick Wendell <pw...@gmail.com> wrote:
> [...]
--
Dean Wampler, Ph.D.
Typesafe
@deanwampler
http://typesafe.com
http://polyglotprogramming.com
Re: SparkSubmit and --driver-java-options
Posted by Patrick Wendell <pw...@gmail.com>.
Patch here:
https://github.com/apache/spark/pull/609
On Wed, Apr 30, 2014 at 2:26 PM, Patrick Wendell <pw...@gmail.com> wrote:
> [...]
Re: SparkSubmit and --driver-java-options
Posted by Patrick Wendell <pw...@gmail.com>.
Dean - our e-mails crossed, but thanks for the tip. Was independently
arriving at your solution :)
Okay I'll submit something.
- Patrick
On Wed, Apr 30, 2014 at 2:14 PM, Marcelo Vanzin <va...@cloudera.com> wrote:
> [...]
Re: SparkSubmit and --driver-java-options
Posted by Marcelo Vanzin <va...@cloudera.com>.
Cool, that seems to work. Thanks!
On Wed, Apr 30, 2014 at 2:09 PM, Patrick Wendell <pw...@gmail.com> wrote:
> [...]
Re: SparkSubmit and --driver-java-options
Posted by Patrick Wendell <pw...@gmail.com>.
Marcelo - Mind trying the following diff locally? If it works I can
send a patch:
patrick@patrick-t430s:~/Documents/spark$ git diff bin/spark-submit
diff --git a/bin/spark-submit b/bin/spark-submit
index dd0d95d..49bc262 100755
--- a/bin/spark-submit
+++ b/bin/spark-submit
@@ -18,7 +18,7 @@
 #

 export SPARK_HOME="$(cd `dirname $0`/..; pwd)"
-ORIG_ARGS=$@
+ORIG_ARGS=("$@")

 while (($#)); do
   if [ "$1" = "--deploy-mode" ]; then
@@ -39,5 +39,5 @@ if [ ! -z $DRIVER_MEMORY ] && [ ! -z $DEPLOY_MODE ] && [ $DEPLOY_MODE = "client"
 export SPARK_MEM=$DRIVER_MEMORY
 fi

-$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit $ORIG_ARGS
+$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit "${ORIG_ARGS[@]}"
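The effect of the diff can be sketched end to end; `spark_class_stub` below stands in for spark-class (it is not part of Spark, just an argument counter for this demo):

```shell
#!/bin/bash
# Compare the old scalar copy with the fixed array copy when forwarding argv.
spark_class_stub() { echo "received $# args"; }

set -- --driver-java-options "-Dfoo -Dbar" app.jar   # simulate spark-submit's argv

FLAT=$@                              # old behavior: one space-joined string
spark_class_stub $FLAT               # received 4 args ("-Dfoo -Dbar" split apart)

ORIG_ARGS=("$@")                     # fixed behavior: a real array copy
spark_class_stub "${ORIG_ARGS[@]}"   # received 3 args (quoting preserved)
```

The unquoted scalar re-splits the option string into separate words, which is exactly the "Unrecognized option '-Dbar'" failure Marcelo hit.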
On Wed, Apr 30, 2014 at 1:51 PM, Patrick Wendell <pw...@gmail.com> wrote:
> [...]
Re: SparkSubmit and --driver-java-options
Posted by Patrick Wendell <pw...@gmail.com>.
So I reproduced the problem here:
== test.sh ==
#!/bin/bash
for x in "$@"; do
  echo "arg: $x"
done
ARGS_COPY=$@
for x in "$ARGS_COPY"; do
  echo "arg_copy: $x"
done
==
./test.sh a b "c d e" f
arg: a
arg: b
arg: c d e
arg: f
arg_copy: a b c d e f
I'll dig around a bit more and see if we can fix it. Pretty sure we
aren't passing these argument arrays around correctly in bash.
On Wed, Apr 30, 2014 at 1:48 PM, Marcelo Vanzin <va...@cloudera.com> wrote:
> [...]
Re: SparkSubmit and --driver-java-options
Posted by Marcelo Vanzin <va...@cloudera.com>.
On Wed, Apr 30, 2014 at 1:41 PM, Patrick Wendell <pw...@gmail.com> wrote:
> [...]
I tried a few different approaches but finally ended up giving up; my
bash-fu is apparently not strong enough. If you can make it work
great, but I have "-J" working locally in case you give up like me.
:-)
--
Marcelo
Re: SparkSubmit and --driver-java-options
Posted by Patrick Wendell <pw...@gmail.com>.
Yeah I think the problem is that the spark-submit script doesn't pass
the argument array to spark-class in the right way, so any quoted
strings get flattened.
We do:
ORIG_ARGS=$@
$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit $ORIG_ARGS
This works:
# (with all the code relating to `shift`ing the arguments removed)
$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"
I think the issue is that when you copy $@ in bash with a plain
assignment, it stops being an array: the arguments are joined into a
single space-separated string, so the original word boundaries are lost.
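That behavior can be checked directly (a small illustration, not from the thread):

```shell
#!/bin/bash
# A plain copy of $@ collapses the argument array into one string;
# an array copy keeps the elements separate.
set -- a b "c d e" f         # four positional parameters

COPY=$@                       # scalar: elements joined with spaces
ARR=("$@")                    # array: four separate elements

echo "scalar: <$COPY>"              # scalar: <a b c d e f>
echo "array length: ${#ARR[@]}"     # array length: 4
echo "third element: <${ARR[2]}>"   # third element: <c d e>
```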
My patch fixes this for spark-shell but I didn't realize that
spark-submit does the same thing.
https://github.com/apache/spark/pull/576/files#diff-bc287993dfd11fd18794041e169ffd72L23
I think we'll need to figure out how to do this correctly in the bash
script so that quoted strings get passed in the right way.
On Wed, Apr 30, 2014 at 1:06 PM, Marcelo Vanzin <va...@cloudera.com> wrote:
> [...]
Re: SparkSubmit and --driver-java-options
Posted by Marcelo Vanzin <va...@cloudera.com>.
Just pulled again just in case. Verified your fix is there.
$ ./bin/spark-submit --master yarn --deploy-mode client
--driver-java-options "-Dfoo -Dbar" blah blah blah
error: Unrecognized option '-Dbar'.
run with --help for more information or --verbose for debugging output
On Wed, Apr 30, 2014 at 12:49 PM, Patrick Wendell <pw...@gmail.com> wrote:
> [...]
--
Marcelo
Re: SparkSubmit and --driver-java-options
Posted by Patrick Wendell <pw...@gmail.com>.
I added a fix for this recently and it didn't require adding -J
notation - are you trying it with this patch?
https://issues.apache.org/jira/browse/SPARK-1654
./bin/spark-shell --driver-java-options "-Dfoo=a -Dbar=b"
scala> sys.props.get("foo")
res0: Option[String] = Some(a)
scala> sys.props.get("bar")
res1: Option[String] = Some(b)
- Patrick
On Wed, Apr 30, 2014 at 11:29 AM, Marcelo Vanzin <va...@cloudera.com> wrote:
> [...]