Posted to dev@spark.apache.org by Marcelo Vanzin <va...@cloudera.com> on 2014/06/24 00:57:40 UTC

RFC: [SPARK-529] Create constants for known config variables.

I started with some code to implement an idea I had for SPARK-529, and
before going much further (since it's a large and kinda boring change)
I'd like to get some feedback from people.

Current code is at:
https://github.com/vanzin/spark/tree/SPARK-529

There are still some parts I haven't fully fleshed out yet (see TODO
list in the commit message), but that's the basic idea. Let me know if
you have any feedback or different ideas.

Thanks!


-- 
Marcelo

Re: RFC: [SPARK-529] Create constants for known config variables.

Posted by Marcelo Vanzin <va...@cloudera.com>.
Hi Matei, thanks for the comments.

On Mon, Jun 23, 2014 at 7:58 PM, Matei Zaharia <ma...@gmail.com> wrote:
> When we did the configuration pull request, we actually avoided having a big list of defaults in one class file, because this creates a file that all the components in the project depend on. For example, since we have some settings specific to streaming and the REPL, do we want those settings to appear in a file that’s in “core”? It might be better to just make sure we use constants for defaults in the code, or maybe have a separate class per project.

I'm actually following the style used in Hadoop here, but expanding it
a little bit to add type-safety too. (e.g., see DFSConfigKeys.java in
HDFS.) I sort of see your point of "everybody needs to depend on this
file", but then all the affected code already depends on SparkConf.

> The other problem with this kind of change is that it’s disruptive to all the other ongoing patches, so I wouldn’t consider it high-priority right now. We haven’t had a ton of problems with settings being mistyped.

Yes, large changes like this are always disruptive. But since it
mostly touches setup code, not the innards of algorithms and the like,
it shouldn't be hard to merge things.

> If you do want to do something like this though, apart from the comment above about modules, please make sure this is not a public API. As soon as we add it to the API, it means we can’t change or remove those config settings. I’d also suggest giving each config setting a single name instead of having “ui”, “shuffle”, etc objects, since the chained calls to conf.ui.port.value look somewhat confusing.

Making it not public is easy; but then how do we want to expose this
to users? It's normal right now, as you say, to hardcode the settings
in the code (or config files - which this change would not help with).
In the process of writing the p.o.c., I found cases of the same
setting with different casing in different source files, and the same
setting with different defaults. So maybe it has not affected any
users yet, but it's easy to miss things like that with the current
approach.
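
To make that concrete, the failure mode looks roughly like this
(hypothetical keys and defaults, not the actual occurrences I found):

  import org.apache.spark.SparkConf

  object Drift {
    // One file reads the setting with one spelling and one default...
    def portFromFileA(conf: SparkConf): Int =
      conf.getInt("spark.ui.port", 4040)

    // ...another uses different casing and a different default, so the
    // two call sites silently disagree.
    def portFromFileB(conf: SparkConf): Int =
      conf.getInt("spark.UI.port", 3030)
  }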

I don't have a good suggestion for separating the module-specific
configs; I kinda liked the "everything under SparkConf" approach and
tried to group settings in a way that made sense. Creating objects in
each module to hold the configs for those is possible, but would
probably look a bit uglier.
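
Just to illustrate the two shapes being discussed (again only a sketch,
nothing here matches the branch):

  // (a) Module-specific objects holding their own settings, which keeps
  // streaming/REPL keys out of core but leads to chained reads.
  object UiConf {
    val PORT = ("spark.ui.port", 4040)
  }

  // (b) A single flat name per setting, easier to read at call sites but
  // concentrated in one file that everybody depends on.
  object FlatConf {
    val UI_PORT = ("spark.ui.port", 4040)
    val SHUFFLE_COMPRESS = ("spark.shuffle.compress", true)
  }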

But in any case, this is not urgent, more of a nice thing to have in my opinion.

-- 
Marcelo

Re: RFC: [SPARK-529] Create constants for known config variables.

Posted by Matei Zaharia <ma...@gmail.com>.
Hey Marcelo,

When we did the configuration pull request, we actually avoided having a big list of defaults in one class file, because this creates a file that all the components in the project depend on. For example, since we have some settings specific to streaming and the REPL, do we want those settings to appear in a file that’s in “core”? It might be better to just make sure we use constants for defaults in the code, or maybe have a separate class per project.

The other problem with this kind of change is that it’s disruptive to all the other ongoing patches, so I wouldn’t consider it high-priority right now. We haven’t had a ton of problems with settings being mistyped.

If you do want to do something like this though, apart from the comment above about modules, please make sure this is not a public API. As soon as we add it to the API, it means we can’t change or remove those config settings. I’d also suggest giving each config setting a single name instead of having “ui”, “shuffle”, etc objects, since the chained calls to conf.ui.port.value look somewhat confusing.

Matei

On Jun 23, 2014, at 3:57 PM, Marcelo Vanzin <va...@cloudera.com> wrote:

> I started with some code to implement an idea I had for SPARK-529, and
> before going much further (since it's a large and kinda boring change)
> I'd like to get some feedback from people.
> 
> Current code is at:
> https://github.com/vanzin/spark/tree/SPARK-529
> 
> There are still some parts I haven't fully fleshed out yet (see TODO
> list in the commit message), but that's the basic idea. Let me know if
> you have any feedback or different ideas.
> 
> Thanks!
> 
> 
> -- 
> Marcelo