Posted to dev@spark.apache.org by Marcelo Vanzin <va...@cloudera.com> on 2014/06/02 19:05:06 UTC

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

Hi Patrick,

Thanks for all the explanations, that makes sense. @DeveloperApi
worries me a little bit, especially because of the things Colin
mentions - it's sort of hard to make people move off of APIs, or to
support different versions of the same API. But maybe if expectations
(or the lack thereof) are set up front, there will be fewer issues.

Something you mentioned in your shading argument reminded me of a
related issue. Spark currently depends on slf4j implementations and
log4j with "compile" scope. I'd argue that's the wrong approach if
we're talking about Spark being embedded inside applications; Spark
should depend only on the slf4j API package, and let the application
provide the underlying implementation.
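
Roughly, in pom terms, something like this (versions and scopes here
are only illustrative, not what's actually in Spark's build):

    <!-- Compile against the SLF4J API only. -->
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>1.7.5</version>
    </dependency>

    <!-- Keep the concrete binding out of the compile scope that apps
         inherit, e.g. by marking it optional (or declaring it only in
         the assembly build), so embedders pick their own. -->
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
      <version>1.7.5</version>
      <optional>true</optional>
    </dependency>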

The assembly jars could include an implementation (since I assume
those are currently targeted at cluster deployment and not embedding).

That way there are fewer sources of conflict at runtime (i.e. the
"multiple implementation jars" messages you can see when running some
Spark programs).

On Fri, May 30, 2014 at 10:54 PM, Patrick Wendell <pw...@gmail.com> wrote:
> 2. Many libraries like logging subsystems, configuration systems, etc
> rely on static state and initialization. I'm not totally sure how e.g.
> slf4j initializes itself if you have both a shaded and non-shaded copy
> of slf4j present.

-- 
Marcelo

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

Posted by Sean Owen <so...@cloudera.com>.
On Mon, Jun 2, 2014 at 6:05 PM, Marcelo Vanzin <va...@cloudera.com> wrote:
> Something you mentioned in your shading argument reminded me of a
> related issue. Spark currently depends on slf4j implementations and
> log4j with "compile" scope. I'd argue that's the wrong approach if
> we're talking about Spark being embedded inside applications; Spark
> should depend only on the slf4j API package, and let the application
> provide the underlying implementation.

Good idea in general; in practice, the drawback is that you can't do
things like set log levels if you only depend on the SLF4J API. There
are a few cases where that's nice to control, and that's only possible
if you bind to a particular logger as well.

You typically bundle an SLF4J binding anyway, to give a default, or
else the end user has to know to also bind some SLF4J logger to get
any output at all. Of course it does make for a bit more surgery if
you later want to override the bundled binding.
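
For example, an application embedding Spark that wants its own binding
ends up doing something along these lines in its pom (coordinates are
illustrative):

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.0.0</version>
      <exclusions>
        <!-- Drop the binding that comes in transitively... -->
        <exclusion>
          <groupId>org.slf4j</groupId>
          <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
          <groupId>log4j</groupId>
          <artifactId>log4j</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <!-- ...and bind to whatever the app prefers instead. -->
    <dependency>
      <groupId>ch.qos.logback</groupId>
      <artifactId>logback-classic</artifactId>
      <version>1.1.2</version>
    </dependency>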

Shading can bring a whole new level of confusion; I myself would only
use it where essential, as a workaround. Same with trying to make more
elaborate custom classloading schemes -- never in my darkest
nightmares have I imagined the failure modes that probably pop up when
that goes wrong. I think the library collisions will get better over
time as, for example, only later versions of Hadoop are in scope
and/or only one build system is in play. I like tackling complexity
along those lines first.
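
(By shading I mean package relocation with something like the
maven-shade-plugin, roughly as below -- just a sketch, not anything
from Spark's actual build:)

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <configuration>
        <relocations>
          <!-- Rewrite a library's packages into a private namespace so
               they can't collide with the application's own copy. -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>org.spark-project.guava</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </plugin>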