You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spark.apache.org by Marcin Tustin <mt...@handybook.com> on 2016/04/13 15:15:31 UTC

Should localProperties be inheritable? Should we change that or document it?

*Tl;dr: *SparkContext.setLocalProperty is implemented with
InheritableThreadLocal.
This has unexpected consequences, not least because the method
documentation doesn't say anything about it:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L605

I'd like to propose that we do one of: (1) document explicitly that these
properties are inheritable; (2) stop them being inheritable; or (3)
introduce the option to set these in a non-inheritable way.

*Motivation: *This started with me investigating a last vestige of the
leaking spark.sql.execution.id issue in Spark 1.5.2 (it's not reproducible
under controlled conditions, and given the many and excellent fixes on this
issue it's completely mysterious that this hangs around; the bug itself is
largely beside the point).

The specific contribution that inheritable localProperties makes to this
problem is that if a localProperty like spark.sql.execution.id leaks (i.e.
remains set when it shouldn't) because those properties are inherited by
spawned threads, that pollution affects all subsequently spawned threads.

This doesn't sound like a big deal - why would worker threads be spawning
other threads? It turns out that Java's ThreadPoolExecutor has worker
threads spawn other worker threads (it has no master dispatcher thread; the
workers themselves run all the housekeeping). JavaDoc here:
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html
and source code here:
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/util/concurrent/ThreadPoolExecutor.java#ThreadPoolExecutor

Accordingly, if using Scala Futures and any kind of thread pool that comes
built-in with Java, it's impossible to avoid localproperties propagating
haphazardly to different threads. For localProperties explicitly set by
user code this isn't nice, and requires work arounds like explicitly
clearing known properties at the start of every future, or in a
beforeExecute hook on the threadpool. For leaky properties the work around
is pretty much the same: defensively clear them in the threadpool.

*Options:*
(0) Do nothing at all. Unattractive, because documenting this would still
be better;
(1) Update the scaladoc to explicitly say that localProperties are
inherited by spawned threads and note that caution should be exercised with
thread pools.
(2) Switch to using ordinary, non-inheritable thread locals. I assume this
would break something for somebody, but if not, this would be my preferred
option. Also a very simple change to implement if no-one is relying on
property inheritance.
(3) Introduce a second localProperty facility which is not inherited. This
would not break any existing code, and should not be too hard to implement.
localProperties which need cleanup could be migrated to using this
non-inheritable facility, helping to limit the impact of failing to clean
up.
The way I envisage this working is that non-inheritable localProperties
would be checked first, then inheritable, then global properties.

*Actions:*
I'm happy to do the coding and open such Jira tickets as desirable or
necessary. Before I do any of that, I'd like to know if there's any support
for this, and ideally secure a committer who can help shepherd this change
through.

Marcin Tustin

-- 
Want to work at Handy? Check out our culture deck and open roles 
<http://www.handy.com/careers>
Latest news <http://www.handy.com/press> at Handy
Handy just raised $50m 
<http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/> led 
by Fidelity


Re: Should localProperties be inheritable? Should we change that or document it?

Posted by Marcin Tustin <mt...@handybook.com>.
It would be a pleasure. That said, what do you think about adding the
non-inheritable feature? I think that would be a big win for everything
that doesn't specifically need Inheritability.

On Friday, April 15, 2016, Reynold Xin <rx...@databricks.com> wrote:

> I think this was added a long time ago by me in order to make certain
> things work for Shark (good old times ...). You are probably right that by
> now some apps depend on the fact that this is inheritable, and changing
> that could break them in weird ways.
>
> Do you mind documenting this, and also add a test case?
>
>
> On Wed, Apr 13, 2016 at 6:15 AM, Marcin Tustin <mtustin@handybook.com
> <javascript:_e(%7B%7D,'cvml','mtustin@handybook.com');>> wrote:
>
>> *Tl;dr: *SparkContext.setLocalProperty is implemented with
>> InheritableThreadLocal.
>> This has unexpected consequences, not least because the method
>> documentation doesn't say anything about it:
>>
>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L605
>>
>> I'd like to propose that we do one of: (1) document explicitly that these
>> properties are inheritable; (2) stop them being inheritable; or (3)
>> introduce the option to set these in a non-inheritable way.
>>
>> *Motivation: *This started with me investigating a last vestige of the
>> leaking spark.sql.execution.id issue in Spark 1.5.2 (it's not
>> reproducible under controlled conditions, and given the many and excellent
>> fixes on this issue it's completely mysterious that this hangs around; the
>> bug itself is largely beside the point).
>>
>> The specific contribution that inheritable localProperties makes to this
>> problem is that if a localProperty like spark.sql.execution.id leaks
>> (i.e. remains set when it shouldn't) because those properties are inherited
>> by spawned threads, that pollution affects all subsequently spawned threads.
>>
>> This doesn't sound like a big deal - why would worker threads be spawning
>> other threads? It turns out that Java's ThreadPoolExecutor has worker
>> threads spawn other worker threads (it has no master dispatcher thread; the
>> workers themselves run all the housekeeping). JavaDoc here:
>> https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html
>> and source code here:
>> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/util/concurrent/ThreadPoolExecutor.java#ThreadPoolExecutor
>>
>> Accordingly, if using Scala Futures and any kind of thread pool that
>> comes built-in with Java, it's impossible to avoid localproperties
>> propagating haphazardly to different threads. For localProperties
>> explicitly set by user code this isn't nice, and requires work arounds like
>> explicitly clearing known properties at the start of every future, or in a
>> beforeExecute hook on the threadpool. For leaky properties the work around
>> is pretty much the same: defensively clear them in the threadpool.
>>
>> *Options:*
>> (0) Do nothing at all. Unattractive, because documenting this would still
>> be better;
>> (1) Update the scaladoc to explicitly say that localProperties are
>> inherited by spawned threads and note that caution should be exercised with
>> thread pools.
>> (2) Switch to using ordinary, non-inheritable thread locals. I assume
>> this would break something for somebody, but if not, this would be my
>> preferred option. Also a very simple change to implement if no-one is
>> relying on property inheritance.
>> (3) Introduce a second localProperty facility which is not inherited.
>> This would not break any existing code, and should not be too hard to
>> implement. localProperties which need cleanup could be migrated to using
>> this non-inheritable facility, helping to limit the impact of failing to
>> clean up.
>> The way I envisage this working is that non-inheritable localProperties
>> would be checked first, then inheritable, then global properties.
>>
>> *Actions:*
>> I'm happy to do the coding and open such Jira tickets as desirable or
>> necessary. Before I do any of that, I'd like to know if there's any support
>> for this, and ideally secure a committer who can help shepherd this change
>> through.
>>
>> Marcin Tustin
>>
>> Want to work at Handy? Check out our culture deck and open roles
>> <http://www.handy.com/careers>
>> Latest news <http://www.handy.com/press> at Handy
>> Handy just raised $50m
>> <http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/> led
>> by Fidelity
>>
>>
>

-- 
Want to work at Handy? Check out our culture deck and open roles 
<http://www.handy.com/careers>
Latest news <http://www.handy.com/press> at Handy
Handy just raised $50m 
<http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/> led 
by Fidelity


Re: Should localProperties be inheritable? Should we change that or document it?

Posted by Reynold Xin <rx...@databricks.com>.
I think this was added a long time ago by me in order to make certain
things work for Shark (good old times ...). You are probably right that by
now some apps depend on the fact that this is inheritable, and changing
that could break them in weird ways.

Do you mind documenting this, and also add a test case?


On Wed, Apr 13, 2016 at 6:15 AM, Marcin Tustin <mt...@handybook.com>
wrote:

> *Tl;dr: *SparkContext.setLocalProperty is implemented with
> InheritableThreadLocal.
> This has unexpected consequences, not least because the method
> documentation doesn't say anything about it:
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L605
>
> I'd like to propose that we do one of: (1) document explicitly that these
> properties are inheritable; (2) stop them being inheritable; or (3)
> introduce the option to set these in a non-inheritable way.
>
> *Motivation: *This started with me investigating a last vestige of the
> leaking spark.sql.execution.id issue in Spark 1.5.2 (it's not
> reproducible under controlled conditions, and given the many and excellent
> fixes on this issue it's completely mysterious that this hangs around; the
> bug itself is largely beside the point).
>
> The specific contribution that inheritable localProperties makes to this
> problem is that if a localProperty like spark.sql.execution.id leaks
> (i.e. remains set when it shouldn't) because those properties are inherited
> by spawned threads, that pollution affects all subsequently spawned threads.
>
> This doesn't sound like a big deal - why would worker threads be spawning
> other threads? It turns out that Java's ThreadPoolExecutor has worker
> threads spawn other worker threads (it has no master dispatcher thread; the
> workers themselves run all the housekeeping). JavaDoc here:
> https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html
> and source code here:
> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/util/concurrent/ThreadPoolExecutor.java#ThreadPoolExecutor
>
> Accordingly, if using Scala Futures and any kind of thread pool that comes
> built-in with Java, it's impossible to avoid localproperties propagating
> haphazardly to different threads. For localProperties explicitly set by
> user code this isn't nice, and requires work arounds like explicitly
> clearing known properties at the start of every future, or in a
> beforeExecute hook on the threadpool. For leaky properties the work around
> is pretty much the same: defensively clear them in the threadpool.
>
> *Options:*
> (0) Do nothing at all. Unattractive, because documenting this would still
> be better;
> (1) Update the scaladoc to explicitly say that localProperties are
> inherited by spawned threads and note that caution should be exercised with
> thread pools.
> (2) Switch to using ordinary, non-inheritable thread locals. I assume this
> would break something for somebody, but if not, this would be my preferred
> option. Also a very simple change to implement if no-one is relying on
> property inheritance.
> (3) Introduce a second localProperty facility which is not inherited. This
> would not break any existing code, and should not be too hard to implement.
> localProperties which need cleanup could be migrated to using this
> non-inheritable facility, helping to limit the impact of failing to clean
> up.
> The way I envisage this working is that non-inheritable localProperties
> would be checked first, then inheritable, then global properties.
>
> *Actions:*
> I'm happy to do the coding and open such Jira tickets as desirable or
> necessary. Before I do any of that, I'd like to know if there's any support
> for this, and ideally secure a committer who can help shepherd this change
> through.
>
> Marcin Tustin
>
> Want to work at Handy? Check out our culture deck and open roles
> <http://www.handy.com/careers>
> Latest news <http://www.handy.com/press> at Handy
> Handy just raised $50m
> <http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/> led
> by Fidelity
>
>