Posted to dev@zeppelin.apache.org by Amos Elberg <am...@gmail.com> on 2016/03/28 02:21:47 UTC

[discuss] PR 789

An issue has come up regarding PR 789 that I feel should be a community
discussion.

The PR takes configuration preferences and converts them into environment
variables, in particular for VMs launched as independent processes.  It is
not requested functionality and nothing else depends on it.

I don't think we should merge this until there's an interface to support
it, because I think it's going to lead to additional user misconfiguration
issues in an area where we already have too many.

Right now, Zeppelin allows many configuration options to be set in 2, 3, 4,
or 5 places (depending on the option), without any indication to the user
of what's being used, any check for conflicts, etc.

The problem arises when a user inputs conflicting configuration choices in
different places.  This is actually very easy to do because the same
options are set in so many places, and I've seen it more than a few times.
If there's a config conflict, Zeppelin will behave in an unexpected manner
or fail, and the problem becomes difficult to diagnose because the user is
*sure* they've set that configuration option correctly (which, of course,
they have).

This is a design issue:  We haven't specified an order of precedence.  The
example I've given on multiple occasions is what happens if a user
specifies one SPARK_HOME and a different spark.home?  Right now (unless it's
changed recently) the Spark interpreter will use SPARK_HOME, but the
PySpark Interpreter -- which wants to connect through the Spark Interpreter
-- will use spark.home.  The range of failure modes is obvious.
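
To illustrate the kind of conflict check that is missing, here is a minimal
sketch in Python. It is not Zeppelin code: the function name find_conflicts
and the PAIRED_SETTINGS table are hypothetical, and only the SPARK_HOME /
spark.home pair is taken from the example above.

```python
# Hypothetical sketch, not Zeppelin code: detect when an environment
# variable and its matching property are both set to different values.

# Pairs of (environment variable, property) meant to carry the same setting.
# Only this one pair comes from the discussion; the table is an assumption.
PAIRED_SETTINGS = [("SPARK_HOME", "spark.home")]


def find_conflicts(env, props, pairs=PAIRED_SETTINGS):
    """Return (env_name, env_value, prop_name, prop_value) for each mismatch."""
    conflicts = []
    for env_name, prop_name in pairs:
        env_value = env.get(env_name)
        prop_value = props.get(prop_name)
        # It is only a conflict when both are set and they disagree.
        if env_value and prop_value and env_value != prop_value:
            conflicts.append((env_name, env_value, prop_name, prop_value))
    return conflicts
```

The comparison is a plain string comparison, so two spellings of the same
path would still be reported; a real check would probably normalize paths
first.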

This PR is likely to make that worse, because it converts some
configuration options, but not others, into environment variables that,
depending on Zeppelin's configuration, will or won't get propagated to some
but not all launched interpreters.  That's terribly complex, and it's sure
to make a confusing situation even more confusing.

I therefore think we should wait on this until we have an interface to
support it -- one that clearly indicates to the user what configuration
settings are being taken from where, and why.
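
As a rough idea of what such an interface would need underneath, here is a
minimal sketch in Python of resolving a setting from several sources in a
fixed order while recording where the value came from. It is not Zeppelin
code: the function name resolve, the source labels, the paths, and the order
shown are all assumptions made for illustration, not an order the project has
agreed on.

```python
# Hypothetical sketch, not Zeppelin code: resolve one setting from several
# sources in a fixed order and record where the value came from.

def resolve(name, sources):
    """Look up name in sources, an ordered list of (label, mapping) pairs
    with the highest-precedence source first.

    Returns (value, label, overridden), where overridden lists the
    (label, value) pairs from lower-precedence sources that were ignored.
    """
    chosen = None
    overridden = []
    for label, mapping in sources:
        if name not in mapping:
            continue
        if chosen is None:
            # First source that defines the setting wins.
            chosen = (mapping[name], label)
        else:
            # Later sources are recorded so they can be shown to the user.
            overridden.append((label, mapping[name]))
    if chosen is None:
        return None, None, []
    return chosen[0], chosen[1], overridden


# Illustrative labels and values only; the order here is an assumption.
value, source, ignored = resolve("spark.home", [
    ("interpreter setting", {"spark.home": "/opt/spark-b"}),
    ("environment file", {"spark.home": "/opt/spark-a"}),
])
```

With the winning source and the ignored values in hand, an interface could
show the user both what is in effect and what was overridden.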

Re: [discuss] PR 789

Posted by moon soo Lee <mo...@apache.org>.
Hi,

Thanks for your feedback about PR-789.
It's a bit difficult to catch your point, but I tried my best to summarize
your feedback.

1. The PR should have a separate GUI for configuring environment variables.
2. Configuration is set in multiple places, which causes conflicts.
3. The order of precedence between environment variables and JVM properties
is not clear, for example SPARK_HOME and spark.home.
4. The PR sets some properties as environment variables and others as JVM
properties, which is complex and confusing.

Is my understanding correct? Is there anything missing?

Thanks,
moon

