Posted to user@spark.apache.org by Koert Kuipers <ko...@tresata.com> on 2014/03/12 17:10:18 UTC

spark config params conventions

i am reading the spark configuration params from another configuration
object (typesafe config) before setting them as system properties.

i noticed typesafe config has trouble with settings like:
spark.speculation=true
spark.speculation.interval=0.5

the issue seems to be that if spark.speculation is a "container" that has
more values inside, then it cannot also be a value itself, i think. so this
would work fine:
spark.speculation.enabled=true
spark.speculation.interval=0.5

just a heads up. i would probably suggest we avoid this situation.
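
for concreteness, here is a minimal sketch of the failure (my illustration,
not from the thread; it assumes HOCON's standard duplicate-key rule: when the
two values are not both objects, the later one wins, so the boolean is
silently replaced by the object):

    import com.typesafe.config.{Config, ConfigException, ConfigFactory}

    object SpeculationConflict {
      def main(args: Array[String]): Unit = {
        val conf: Config = ConfigFactory.parseString(
          """spark.speculation = true
            |spark.speculation.interval = 0.5
            |""".stripMargin)

        // the nested key reads back as expected
        println(conf.getDouble("spark.speculation.interval")) // 0.5

        // but spark.speculation has become an object, so the boolean read fails
        try conf.getBoolean("spark.speculation")
        catch {
          case e: ConfigException.WrongType =>
            println("spark.speculation is no longer a boolean: " + e.getMessage)
        }
      }
    }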

Re: spark config params conventions

Posted by Chester Chen <ch...@yahoo.com>.
Based on the Typesafe Config maintainer's response, with the latest version of Typesafe Config the double quotes are no longer needed for keys like spark.speculation, so you don't need code to strip the quotes.



Chester
Alpine data labs

Sent from my iPhone

On Mar 12, 2014, at 2:50 PM, Aaron Davidson <il...@gmail.com> wrote:

> One solution for typesafe config is to use
> "spark.speculation" = true
> 
> Typesafe Config will recognize the key as a string rather than a path, so the name will actually be "\"spark.speculation\"", and you need to handle this contingency when passing the config options to spark (stripping the quotes from the key).
> 
> Solving this in Spark itself is a little tricky because there are ~5 such conflicts (spark.serializer, spark.speculation, spark.locality.wait, spark.shuffle.spill, and spark.cleaner.ttl), some of which are used pretty frequently. We could provide aliases for all of these in Spark, but actually deprecating the old ones would affect many users, so we could only do that if enough users would benefit from fully hierarchical config options.
> 
> 
> 
> On Wed, Mar 12, 2014 at 9:24 AM, Mark Hamstra <ma...@clearstorydata.com> wrote:
>> That's the whole reason why some of the intended configuration changes were backed out just before the 0.9.0 release.  It's a well-known issue, even if a completely satisfactory solution isn't as well-known and is probably something which we should do another iteration on.
>> 
>> 
>> On Wed, Mar 12, 2014 at 9:10 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>> i am reading the spark configuration params from another configuration object (typesafe config) before setting them as system properties.
>>> 
>>> i noticed typesafe config has trouble with settings like:
>>> spark.speculation=true
>>> spark.speculation.interval=0.5
>>> 
>>> the issue seems to be that if spark.speculation is a "container" that has more values inside, then it cannot also be a value itself, i think. so this would work fine:
>>> spark.speculation.enabled=true
>>> spark.speculation.interval=0.5
>>> 
>>> just a heads up. i would probably suggest we avoid this situation.
>> 
> 

Re: spark config params conventions

Posted by Aaron Davidson <il...@gmail.com>.
One solution for typesafe config is to use
"spark.speculation" = true

Typesafe Config will recognize the key as a string rather than a path, so the
name will actually be "\"spark.speculation\"", and you need to handle this
contingency when passing the config options to spark (stripping the
quotes from the key).

Solving this in Spark itself is a little tricky because there are ~5 such
conflicts (spark.serializer, spark.speculation, spark.locality.wait,
spark.shuffle.spill, and spark.cleaner.ttl), some of which are used pretty
frequently. We could provide aliases for all of these in Spark, but
actually deprecating the old ones would affect many users, so we could only
do that if enough users would benefit from fully hierarchical config
options.
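
As a rough sketch of this workaround (mine, not from the thread; it assumes
entrySet renders the quoted key with its surrounding quotes intact, as
described above):

    import com.typesafe.config.ConfigFactory
    import scala.jdk.CollectionConverters._

    object QuotedKeyWorkaround {
      def main(args: Array[String]): Unit = {
        // quoting the whole key makes it one path segment, not a nested path
        val conf = ConfigFactory.parseString(
          """"spark.speculation" = true
            |spark.speculation.interval = 0.5
            |""".stripMargin)

        // copy entries into system properties, stripping the quotes that the
        // rendered path keeps around the quoted key
        for (entry <- conf.entrySet.asScala) {
          val key = entry.getKey.replace("\"", "")
          System.setProperty(key, entry.getValue.unwrapped.toString)
        }

        println(System.getProperty("spark.speculation"))          // true
        println(System.getProperty("spark.speculation.interval")) // 0.5
      }
    }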



On Wed, Mar 12, 2014 at 9:24 AM, Mark Hamstra <ma...@clearstorydata.com> wrote:

> That's the whole reason why some of the intended configuration changes
> were backed out just before the 0.9.0 release.  It's a well-known issue,
> even if a completely satisfactory solution isn't as well-known and is
> probably something which we should do another iteration on.
>
>
> On Wed, Mar 12, 2014 at 9:10 AM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> i am reading the spark configuration params from another configuration
>> object (typesafe config) before setting them as system properties.
>>
>> i noticed typesafe config has trouble with settings like:
>> spark.speculation=true
>> spark.speculation.interval=0.5
>>
>> the issue seems to be that if spark.speculation is a "container" that has
>> more values inside, then it cannot also be a value itself, i think. so this
>> would work fine:
>> spark.speculation.enabled=true
>> spark.speculation.interval=0.5
>>
>> just a heads up. i would probably suggest we avoid this situation.
>>
>
>

Re: spark config params conventions

Posted by yao <ya...@gmail.com>.
+1. I agree to keep the old ones only for backward compatibility purposes.


On Wed, Mar 12, 2014 at 12:38 PM, Evan Chan <ev...@ooyala.com> wrote:

> +1.
>
> Not just for Typesafe Config, but if we want to consider hierarchical
> configs like JSON rather than flat key mappings, it is necessary.  It
> is also clearer.
>
> On Wed, Mar 12, 2014 at 9:58 AM, Aaron Davidson <il...@gmail.com>
> wrote:
> > Should we try to deprecate these types of configs for 1.0.0? We can start
> > by accepting both and giving a warning if you use the old one, and then
> > actually remove them in the next minor release. I think
> > "spark.speculation.enabled=true" is better than "spark.speculation=true",
> > and if we decide to use typesafe configs again ourselves, this change is
> > necessary.
> >
> > We actually don't have to ever complete the deprecation - we can always
> > accept both spark.speculation and spark.speculation.enabled, and people
> > just have to use the latter if they want to use typesafe config.
> >
> >
> > On Wed, Mar 12, 2014 at 9:24 AM, Mark Hamstra <mark@clearstorydata.com> wrote:
> >
> >> That's the whole reason why some of the intended configuration changes
> >> were backed out just before the 0.9.0 release.  It's a well-known issue,
> >> even if a completely satisfactory solution isn't as well-known and is
> >> probably something which we should do another iteration on.
> >>
> >>
> >> On Wed, Mar 12, 2014 at 9:10 AM, Koert Kuipers <ko...@tresata.com>
> wrote:
> >>
> >>> i am reading the spark configuration params from another configuration
> >>> object (typesafe config) before setting them as system properties.
> >>>
> >>> i noticed typesafe config has trouble with settings like:
> >>> spark.speculation=true
> >>> spark.speculation.interval=0.5
> >>>
> >>> the issue seems to be that if spark.speculation is a "container" that has
> >>> more values inside, then it cannot also be a value itself, i think. so this
> >>> would work fine:
> >>> spark.speculation.enabled=true
> >>> spark.speculation.interval=0.5
> >>>
> >>> just a heads up. i would probably suggest we avoid this situation.
> >>>
> >>
> >>
>
>
>
> --
> Evan Chan
> Staff Engineer
> ev@ooyala.com
>

Re: spark config params conventions

Posted by Evan Chan <ev...@ooyala.com>.
+1.

Not just for Typesafe Config, but if we want to consider hierarchical
configs like JSON rather than flat key mappings, it is necessary.  It
is also clearer.
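
For illustration, the same settings written in the hierarchical form this
would enable (a sketch in HOCON, using the spark.speculation.enabled name
proposed in this thread):

    spark {
      speculation {
        enabled  = true
        interval = 0.5
      }
    }

This only works if no key is both a value and a prefix, which is exactly the
rename being discussed.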

On Wed, Mar 12, 2014 at 9:58 AM, Aaron Davidson <il...@gmail.com> wrote:
> Should we try to deprecate these types of configs for 1.0.0? We can start
> by accepting both and giving a warning if you use the old one, and then
> actually remove them in the next minor release. I think
> "spark.speculation.enabled=true" is better than "spark.speculation=true",
> and if we decide to use typesafe configs again ourselves, this change is
> necessary.
>
> We actually don't have to ever complete the deprecation - we can always
> accept both spark.speculation and spark.speculation.enabled, and people
> just have to use the latter if they want to use typesafe config.
>
>
> On Wed, Mar 12, 2014 at 9:24 AM, Mark Hamstra <ma...@clearstorydata.com> wrote:
>
>> That's the whole reason why some of the intended configuration changes
>> were backed out just before the 0.9.0 release.  It's a well-known issue,
>> even if a completely satisfactory solution isn't as well-known and is
>> probably something which we should do another iteration on.
>>
>>
>> On Wed, Mar 12, 2014 at 9:10 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> i am reading the spark configuration params from another configuration
>>> object (typesafe config) before setting them as system properties.
>>>
>>> i noticed typesafe config has trouble with settings like:
>>> spark.speculation=true
>>> spark.speculation.interval=0.5
>>>
>>> the issue seems to be that if spark.speculation is a "container" that has
>>> more values inside, then it cannot also be a value itself, i think. so this
>>> would work fine:
>>> spark.speculation.enabled=true
>>> spark.speculation.interval=0.5
>>>
>>> just a heads up. i would probably suggest we avoid this situation.
>>>
>>
>>



--
Evan Chan
Staff Engineer
ev@ooyala.com

Re: spark config params conventions

Posted by Aaron Davidson <il...@gmail.com>.
Should we try to deprecate these types of configs for 1.0.0? We can start
by accepting both and giving a warning if you use the old one, and then
actually remove them in the next minor release. I think
"spark.speculation.enabled=true" is better than "spark.speculation=true",
and if we decide to use typesafe configs again ourselves, this change is
necessary.

We actually don't have to ever complete the deprecation - we can always
accept both spark.speculation and spark.speculation.enabled, and people
just have to use the latter if they want to use typesafe config.
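
A rough sketch of what accepting both spellings with a deprecation warning
might look like (my illustration; the names here are hypothetical, not
Spark's actual code):

    // hypothetical alias table: old flat key -> proposed hierarchical key
    val deprecatedAliases: Map[String, String] = Map(
      "spark.speculation" -> "spark.speculation.enabled")

    // normalize a user-supplied key, warning when the deprecated spelling is used
    def normalize(key: String): String =
      deprecatedAliases.get(key) match {
        case Some(replacement) =>
          System.err.println(s"Warning: '$key' is deprecated; use '$replacement' instead.")
          replacement
        case None => key
      }

Keeping the table around indefinitely is what makes the "never complete the
deprecation" option cheap: both spellings resolve to the same internal key.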


On Wed, Mar 12, 2014 at 9:24 AM, Mark Hamstra <ma...@clearstorydata.com> wrote:

> That's the whole reason why some of the intended configuration changes
> were backed out just before the 0.9.0 release.  It's a well-known issue,
> even if a completely satisfactory solution isn't as well-known and is
> probably something which we should do another iteration on.
>
>
> On Wed, Mar 12, 2014 at 9:10 AM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> i am reading the spark configuration params from another configuration
>> object (typesafe config) before setting them as system properties.
>>
>> i noticed typesafe config has trouble with settings like:
>> spark.speculation=true
>> spark.speculation.interval=0.5
>>
>> the issue seems to be that if spark.speculation is a "container" that has
>> more values inside, then it cannot also be a value itself, i think. so this
>> would work fine:
>> spark.speculation.enabled=true
>> spark.speculation.interval=0.5
>>
>> just a heads up. i would probably suggest we avoid this situation.
>>
>
>

Re: spark config params conventions

Posted by Mark Hamstra <ma...@clearstorydata.com>.
That's the whole reason why some of the intended configuration changes were
backed out just before the 0.9.0 release.  It's a well-known issue, even if
a completely satisfactory solution isn't as well-known and is probably
something which we should do another iteration on.


On Wed, Mar 12, 2014 at 9:10 AM, Koert Kuipers <ko...@tresata.com> wrote:

> i am reading the spark configuration params from another configuration
> object (typesafe config) before setting them as system properties.
>
> i noticed typesafe config has trouble with settings like:
> spark.speculation=true
> spark.speculation.interval=0.5
>
> the issue seems to be that if spark.speculation is a "container" that has
> more values inside, then it cannot also be a value itself, i think. so this
> would work fine:
> spark.speculation.enabled=true
> spark.speculation.interval=0.5
>
> just a heads up. i would probably suggest we avoid this situation.
>