Posted to dev@spark.apache.org by Mridul Muralidharan <mr...@gmail.com> on 2022/08/11 17:15:24 UTC

Re: How to set platform-level defaults for array-like configs?

Hi,

  Wenchen, it would be great if you could chime in with your thoughts, given
the feedback you originally had on the PR.
It would also be great to hear from others on this, particularly folks
managing Spark deployments: how is this mitigated/avoided in your case, and
what other pain points do you have with configs in this context?


Regards,
Mridul

On Wed, Jul 27, 2022 at 12:28 PM Erik Krogen <xk...@apache.org> wrote:

> I find there's substantial value in being able to set defaults, and I
> think we can see that the community finds value in it as well, given the
> handful of "default"-like configs that exist today as mentioned in
> Shardul's email. The mismatch of conventions used today (suffix with
> ".defaultList", change "extra" to "default", ...) is confusing and
> inconsistent, plus requires one-off additions for each config.
>
> My proposal here would be:
>
>    - Define a clear convention, e.g. a suffix of ".default" that enables
>    a default to be set and merged
>    - Document this convention in configuration.md so that we can avoid
>    separately documenting each default-config, and instead just add a note in
>    the docs for the normal config.
>    - Adjust the withPrepended method
>    <https://github.com/apache/spark/blob/c7c51bcab5cb067d36bccf789e0e4ad7f37ffb7c/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala#L219>
>    added in #24804 <https://github.com/apache/spark/pull/24804> to
>    leverage this convention instead of each usage instance re-defining the
>    additional config name (a rough sketch follows this list)
>    - Do a comprehensive review of applicable configs and enable them all
>    to use the newly updated withPrepended method
>
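> To make the convention concrete, here is a rough, self-contained sketch of
> the resolution logic (this is not Spark's actual ConfigBuilder code, and the
> object and class names below are made up; it only shows the ".default" key
> being derived from the primary key in one place):
>
>     object DefaultSuffixSketch {
>       val DefaultSuffix = ".default"
>
>       // Resolve the effective value of an array-like conf: admin-set entries on
>       // "<key>.default" are prepended to whatever the user set on "<key>".
>       def effectiveValue(
>           settings: Map[String, String],
>           key: String,
>           separator: String = ","): Option[String] = {
>         (settings.get(key + DefaultSuffix), settings.get(key)) match {
>           case (Some(d), Some(u)) => Some(s"$d$separator$u") // defaults first, then user
>           case (Some(d), None)    => Some(d)
>           case (None, u)          => u
>         }
>       }
>
>       def main(args: Array[String]): Unit = {
>         val conf = Map(
>           "spark.sql.extensions.default" -> "com.example.platform.PlatformSqlExtensions", // platform
>           "spark.sql.extensions"         -> "com.example.user.UserSqlExtensions")         // user
>         println(effectiveValue(conf, "spark.sql.extensions"))
>         // Some(com.example.platform.PlatformSqlExtensions,com.example.user.UserSqlExtensions)
>       }
>     }
>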
> Wenchen, you expressed some concerns with adding more default configs in
> #34856 <https://github.com/apache/spark/pull/34856>, would this proposal
> address those concerns?
>
> Thanks,
> Erik
>
> On Wed, Jul 13, 2022 at 11:54 PM Shardul Mahadik <
> shardulsmahadik@gmail.com> wrote:
>
>> Hi Spark devs,
>>
>> Spark contains a bunch of array-like configs (comma-separated lists).
>> Some examples include `spark.sql.extensions`,
>> `spark.sql.queryExecutionListeners`, `spark.jars.repositories`,
>> `spark.extraListeners`, `spark.driver.extraClassPath`, and so on (there are
>> a dozen or so more). As owners of the Spark platform in our organization,
>> we would like to set platform-level defaults, e.g. custom SQL extensions and
>> listeners, and we use some of the above-mentioned properties to do so. At
>> the same time, we have power users writing their own listeners, setting the
>> same Spark confs, and thus unintentionally overriding our platform defaults.
>> This leads to a loss of functionality within our platform.
>>
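>> To make the clobbering concrete, here is a minimal sketch (the extension
>> class names are made up): these confs are plain strings with
>> last-write-wins semantics, so the user's set() simply replaces the
>> platform value.
>>
>>     import org.apache.spark.SparkConf
>>
>>     object OverrideExample {
>>       def main(args: Array[String]): Unit = {
>>         // Platform default, e.g. shipped by the platform team in spark-defaults.conf
>>         // (the extension class names here are hypothetical).
>>         val conf = new SparkConf(loadDefaults = false)
>>           .set("spark.sql.extensions", "com.example.platform.PlatformSqlExtensions")
>>
>>         // A power user sets the same conf for their own extension...
>>         conf.set("spark.sql.extensions", "com.example.user.UserSqlExtensions")
>>
>>         // ...and the platform extension silently disappears: last write wins.
>>         println(conf.get("spark.sql.extensions"))  // com.example.user.UserSqlExtensions
>>       }
>>     }
>>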
>> Previously, Spark has introduced "default" confs for a few of these
>> array-like configs, e.g. `spark.plugins.defaultList` for `spark.plugins` and
>> `spark.driver.defaultJavaOptions` for `spark.driver.extraJavaOptions`.
>> These properties are meant to be set only by cluster admins, thus allowing
>> separation between platform defaults and user configs. However, as discussed
>> in https://github.com/apache/spark/pull/34856, these configs are still
>> client-side and can still be overridden, and they are also not a scalable
>> solution, as we cannot introduce one new "default" config for every
>> array-like config.
>>
>> I wanted to know if others have experienced this issue and what systems
>> have been implemented to tackle it. Are there any existing solutions for
>> this, either client-side or server-side (e.g. at a job submission server)?
>> Even though we cannot easily enforce this at the client side, the
>> simplicity of a solution may make it more appealing.
>>
>> Thanks,
>> Shardul
>>
>

Re: How to set platform-level defaults for array-like configs?

Posted by Shrikant Prasad <sh...@gmail.com>.
Hi Mridul,

If you are using Spark on Kubernetes, you can make use of an admission
controller to validate or mutate the confs set in the spark-defaults
ConfigMap. But this approach works only for cluster deploy mode, not for
client mode.
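
As a rough illustration of the validating side (the class and package names
below are made up, and a real admission webhook would of course operate on
the submitted pod or ConfigMap JSON rather than an in-memory map), the check
itself is small:

    object RequiredConfValidator {
      // Entries the platform requires to be present in the submitted Spark conf.
      val required: Map[String, String] = Map(
        "spark.sql.extensions" -> "com.example.platform.PlatformSqlExtensions")

      // Returns violations; an admission webhook would deny the submission if non-empty.
      def violations(submitted: Map[String, String]): Seq[String] =
        required.collect {
          case (key, entry)
              if !submitted.get(key).exists(_.split(",").map(_.trim).contains(entry)) =>
            s"$key must contain $entry"
        }.toSeq

      def main(args: Array[String]): Unit =
        println(violations(Map("spark.sql.extensions" -> "com.example.user.UserSqlExtensions")))
        // List(spark.sql.extensions must contain com.example.platform.PlatformSqlExtensions)
    }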

Regards,
Shrikant


Re: How to set platform-level defaults for array-like configs?

Posted by Tom Graves <tg...@yahoo.com.INVALID>.
A few years ago, when I was doing more deployment management, I kicked around
the idea of having different types of configs, or different ways to specify
the configs. One of the problems at the time, though, was actually with users
specifying a properties file and not picking up spark-defaults.conf, so I was
thinking about creating a spark-admin.conf or something of that nature.

I think there is benefit in it; it just comes down to how to implement it
best. The other thing I don't think I saw addressed was the ability to
prevent users from overriding configs. If you just do the defaults, I presume
users could still override them. That gets a bit trickier, especially if they
can override the entire spark-defaults.conf file.
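
There is no spark-admin.conf today, so purely as a sketch of the layering
idea (the file name, keys, and class names below are all hypothetical): user
settings are applied first and admin settings on top, so an admin entry wins
even when the user set the same key, while array-like keys get the admin
entries prepended rather than replaced.

    import java.io.FileInputStream
    import java.util.Properties
    import scala.jdk.CollectionConverters._

    object AdminConfLayering {
      // Keys treated as comma-separated lists: admin entries are prepended, not replaced.
      val arrayLikeKeys = Set("spark.sql.extensions", "spark.extraListeners", "spark.plugins")

      def loadProps(path: String): Map[String, String] = {
        val props = new Properties()
        val in = new FileInputStream(path)
        try props.load(in) finally in.close()
        props.asScala.toMap
      }

      // Hypothetical precedence: entries from a spark-admin.conf are applied after
      // (and therefore over) whatever the user configured.
      def effectiveConf(userConf: Map[String, String],
                        adminConf: Map[String, String]): Map[String, String] =
        adminConf.foldLeft(userConf) { case (conf, (key, adminValue)) =>
          val merged =
            if (arrayLikeKeys.contains(key) && conf.contains(key)) s"$adminValue,${conf(key)}"
            else adminValue // scalar keys: the admin value is final
          conf.updated(key, merged)
        }

      def main(args: Array[String]): Unit = {
        // In a real deployment these maps would come from something like
        // loadProps("conf/spark-defaults.conf") and loadProps("conf/spark-admin.conf").
        val user  = Map("spark.sql.extensions" -> "com.example.user.UserSqlExtensions")
        val admin = Map("spark.sql.extensions" -> "com.example.platform.PlatformSqlExtensions")
        println(effectiveConf(user, admin))
      }
    }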

Tom