Posted to dev@spark.apache.org by Paul Brown <pr...@mult.ifario.us> on 2014/02/07 08:41:02 UTC

0.9.0 forces log4j usage

We have a few applications that embed Spark, and in 0.8.0 and 0.8.1, we
were able to use slf4j, but 0.9.0 broke that and unintentionally forces
direct use of log4j as the logging backend.

The issue is here in the org.apache.spark.Logging trait:

https://github.com/apache/incubator-spark/blame/master/core/src/main/scala/org/apache/spark/Logging.scala#L107

log4j-over-slf4j *always* returns an empty enumeration for appenders to the
ROOT logger:

https://github.com/qos-ch/slf4j/blob/master/log4j-over-slf4j/src/main/java/org/apache/log4j/Category.java?source=c#L81

And this causes an infinite loop and an eventual stack overflow.
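
For anyone who wants to see the shape of the failure without digging
through the trait, here is a minimal, self-contained Scala sketch (not
Spark's actual code; the names are illustrative) of how "initialize
logging on first log call" recurses when the appender check can never
succeed:

  object LoggingLoopSketch {
    // Stand-in for log4j-over-slf4j, whose Category.getAllAppenders
    // always returns an empty enumeration, so this never sees appenders.
    def rootHasAppenders: Boolean = false

    private var initialized = false

    def log(msg: String): Unit = {
      if (!initialized) initializeLogging() // re-entered before completion
      println(msg)
    }

    def initializeLogging(): Unit = {
      if (!rootHasAppenders) {
        // Logging about the missing appenders re-enters log(), which
        // still sees initialized == false: infinite recursion and an
        // eventual StackOverflowError.
        log("No appenders found for root logger; using defaults")
      }
      initialized = true
    }

    def main(args: Array[String]): Unit = log("hello")
  }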

I'm happy to submit a Jira and a patch, but it would be a significant enough
reversal of recent changes that it's probably worth discussing before I
sink a half hour into it.  My suggestion would be that initialization (or
not) should be left to the user, with reasonable default behavior supplied
by the Spark command-line tooling and not forced on applications that
incorporate Spark.

Thoughts/opinions?

-- Paul
--
prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/

Re: 0.9.0 forces log4j usage

Posted by Patrick Wendell <pw...@gmail.com>.
The fix for this was just merged into branch 0.9 (will be in 0.9.1+) and master.

On Sun, Feb 9, 2014 at 11:44 PM, Patrick Wendell <pw...@gmail.com> wrote:
> Thanks Paul - it isn't meant to be a "full solution" but just a fix for
> the 0.9 branch - for the full solution there is another PR by Sean
> Owen.
>
> On Sun, Feb 9, 2014 at 11:35 PM, Paul Brown <pr...@mult.ifario.us> wrote:
>> Hi, Patrick --
>>
>> I gave that a go locally, and it works as desired.
>>
>> Best.
>> -- Paul
>>
>> --
>> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>>
>>
>> On Fri, Feb 7, 2014 at 6:10 PM, Patrick Wendell <pw...@gmail.com> wrote:
>>
>>> Ah okay sounds good. This is what I meant earlier by "You have
>>> some other application that directly calls log4j."... i.e. you have
>>> for historical reasons installed log4j-over-slf4j.
>>>
>>> Would you mind trying out this fix and seeing if it works? This is
>>> designed to be a hotfix for 0.9, not a general solution where we rip
>>> out log4j from our published dependencies:
>>>
>>> https://github.com/apache/incubator-spark/pull/560/files
>>>
>>> - Patrick
>>>
>>> On Fri, Feb 7, 2014 at 5:57 PM, Paul Brown <pr...@mult.ifario.us> wrote:
>>> > Hi, Patrick --
>>> >
>>> > I forget which other component is responsible, but we're using the
>>> > log4j-over-slf4j as part of an overall requirement to centralize logging,
>>> > i.e., *someone* else is logging over log4j and we're pulling that in.
>>> >  (There's also some jul logging from Jersey, etc.)
>>> >
>>> > Goals:
>>> >
>>> > - Fully control/capture all possible logging.  (God forbid we have to
>>> > grab System.out/err, but we'd do it if needed.)
>>> > - Use the backend we like best at the moment.  (Happens to be logback.)
>>> >
>>> > Possible cases:
>>> >
>>> > - If Spark used Log4j at all, we would pull in that logging via
>>> > log4j-over-slf4j.
>>> > - If Spark used only slf4j and referenced no backend, we would use it
>>> > as-is, although we'd still have the log4j-over-slf4j because of other
>>> > libraries.
>>> > - If Spark used only slf4j and referenced the slf4j-log4j12 backend, we
>>> > would exclude that one dependency (via our POM).
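>>> >
>>> > (For that third case, the exclusion is something like the following --
>>> > coordinates are from memory for the 0.9.x artifacts:
>>> >
>>> >   <dependency>
>>> >     <groupId>org.apache.spark</groupId>
>>> >     <artifactId>spark-core_2.10</artifactId>
>>> >     <version>0.9.0-incubating</version>
>>> >     <exclusions>
>>> >       <exclusion>
>>> >         <groupId>org.slf4j</groupId>
>>> >         <artifactId>slf4j-log4j12</artifactId>
>>> >       </exclusion>
>>> >     </exclusions>
>>> >   </dependency>
>>> >
>>> > and the same exclusion on anything else that drags it in.)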
>>> >
>>> > Best.
>>> > -- Paul
>>> >
>>> >
>>> > --
>>> > prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>>> >
>>> >
>>> > On Fri, Feb 7, 2014 at 5:38 PM, Patrick Wendell <pw...@gmail.com> wrote:
>>> >
>>> >> Hey Paul,
>>> >>
>>> >> So if your goal is ultimately to output to logback, then why don't you
>>> >> just use slf4j and logback-classic.jar as described here [1]? Why
>>> >> involve log4j-over-slf4j at all?
>>> >>
>>> >> Let's say we refactored the Spark build so it didn't advertise
>>> >> slf4j-log4j12 as a dependency. Would you still be using
>>> >> log4j-over-slf4j... or is this just a "fix" to deal with the fact that
>>> >> Spark is somewhat log4j-dependent at this point?
>>> >>
>>> >> [1] http://www.slf4j.org/manual.html
>>> >>
>>> >> - Patrick
>>> >>
>>> >> On Fri, Feb 7, 2014 at 5:14 PM, Paul Brown <pr...@mult.ifario.us> wrote:
>>> >> > Hi, Patrick --
>>> >> >
>>> >> > That's close but not quite it.
>>> >> >
>>> >> > The issue that occurs is not the delegation loop mentioned in slf4j
>>> >> > documentation.  The stack overflow is entirely within the code in the
>>> >> > Spark trait:
>>> >> >
>>> >> > at org.apache.spark.Logging$class.initializeLogging(Logging.scala:112)
>>> >> > at org.apache.spark.Logging$class.initializeIfNecessary(Logging.scala:97)
>>> >> > at org.apache.spark.Logging$class.log(Logging.scala:36)
>>> >> > at org.apache.spark.SparkEnv$.log(SparkEnv.scala:94)
>>> >> >
>>> >> >
>>> >> > And then that repeats.
>>> >> >
>>> >> > As for our situation, we exclude the slf4j-log4j12 dependency when we
>>> >> > import the Spark library (because we don't want to use log4j) and have
>>> >> > log4j-over-slf4j already in place to ensure that all of the logging in
>>> >> > the overall application runs through slf4j and then out through logback.
>>> >> > (We also, as another poster already mentioned, force jcl and jul through
>>> >> > slf4j.)
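>>> >> >
>>> >> > A sketch of the jul half of that wiring, run once at application
>>> >> > startup via jul-to-slf4j's bridge (the jcl half needs no code, just
>>> >> > the jcl-over-slf4j jar in place of commons-logging):
>>> >> >
>>> >> >   import org.slf4j.bridge.SLF4JBridgeHandler
>>> >> >   // Drop the default JUL console handlers, then route JUL records
>>> >> >   // through slf4j (and on to logback in our setup).
>>> >> >   SLF4JBridgeHandler.removeHandlersForRootLogger()
>>> >> >   SLF4JBridgeHandler.install()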
>>> >> >
>>> >> > The zen of slf4j for libraries is that the library uses the slf4j API
>>> >> > and then the enclosing application can route logging as it sees fit.
>>> >> > Spark master CLI would log via slf4j and include the slf4j-log4j12
>>> >> > backend; same for Spark worker CLI.  Spark as a library (versus as a
>>> >> > container) would not include any backend to the slf4j API and leave
>>> >> > this up to the application.  (FWIW, this would also avoid your log4j
>>> >> > warning message.)
>>> >> >
>>> >> > But as I was saying before, I'd be happy with a situation where I can
>>> >> > avoid log4j being enabled or configured, and I think you'll find an
>>> >> > existing choice of logging framework to be a common scenario for those
>>> >> > embedding Spark in other systems.
>>> >> >
>>> >> > Best.
>>> >> > -- Paul
>>> >> >
>>> >> > --
>>> >> > prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>>> >> >
>>> >> >
>>> >> > On Fri, Feb 7, 2014 at 3:01 PM, Patrick Wendell <pw...@gmail.com> wrote:
>>> >> >
>>> >> >> Paul,
>>> >> >>
>>> >> >> Looking back at your problem. I think it's the one here:
>>> >> >> http://www.slf4j.org/codes.html#log4jDelegationLoop
>>> >> >>
>>> >> >> So let me just be clear about what you are doing so I understand. You
>>> >> >> have some other application that directly calls log4j. So you have to
>>> >> >> include log4j-over-slf4j to route those logs through slf4j to logback.
>>> >> >>
>>> >> >> At the same time you embed Spark in this application. In the past it
>>> >> >> was fine, but now that Spark programmatically initializes log4j, it
>>> >> >> screws up your application because log4j-over-slf4j doesn't work with
>>> >> >> applications that do this explicitly, as discussed here:
>>> >> >> http://www.slf4j.org/legacy.html
>>> >> >>
>>> >> >> Correct?
>>> >> >>
>>> >> >> - Patrick
>>> >> >>
>>> >> >> On Fri, Feb 7, 2014 at 2:02 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>> >> >> > got it. that sounds reasonable
>>> >> >> >
>>> >> >> >
>>> >> >> > On Fri, Feb 7, 2014 at 2:31 PM, Patrick Wendell <pwendell@gmail.com> wrote:
>>> >> >> >
>>> >> >> >> Koert - my suggestion was this. We let users use any slf4j backend
>>> >> >> >> they want. If we detect that they are using the log4j backend and
>>> >> >> >> *also* they didn't configure any log4j appenders, we set up some
>>> >> >> >> nice defaults for them. If they are using another backend, Spark
>>> >> >> >> doesn't try to modify the configuration at all.
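>>> >> >> >>
>>> >> >> >> Roughly this (a sketch, not a patch -- the defaults resource name
>>> >> >> >> is made up):
>>> >> >> >>
>>> >> >> >>   import org.slf4j.LoggerFactory
>>> >> >> >>   import org.apache.log4j.{LogManager, PropertyConfigurator}
>>> >> >> >>
>>> >> >> >>   // slf4j-log4j12 binds slf4j to org.slf4j.impl.Log4jLoggerFactory.
>>> >> >> >>   val usingLog4j = LoggerFactory.getILoggerFactory.getClass.getName
>>> >> >> >>     .contains("Log4jLoggerFactory")
>>> >> >> >>   // log4j can report whether any root appender was configured.
>>> >> >> >>   val noAppenders =
>>> >> >> >>     !LogManager.getRootLogger.getAllAppenders.hasMoreElements
>>> >> >> >>   if (usingLog4j && noAppenders) {
>>> >> >> >>     PropertyConfigurator.configure(
>>> >> >> >>       getClass.getResource("/spark-log4j-defaults.properties"))
>>> >> >> >>   }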
>>> >> >> >>
>>> >> >> >> On Fri, Feb 7, 2014 at 11:14 AM, Koert Kuipers <koert@tresata.com> wrote:
>>> >> >> >> > well, "static binding" is probably the wrong terminology, but you
>>> >> >> >> > get the idea. multiple backends are not allowed and cause an even
>>> >> >> >> > uglier warning...
>>> >> >> >> >
>>> >> >> >> > see also here:
>>> >> >> >> > https://github.com/twitter/scalding/pull/636
>>> >> >> >> > and here:
>>> >> >> >> > https://groups.google.com/forum/#!topic/cascading-user/vYvnnN_15ls
>>> >> >> >> > all of that is me being annoying and complaining about slf4j-log4j12
>>> >> >> >> > dependencies (which did get removed).
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> > On Fri, Feb 7, 2014 at 2:09 PM, Koert Kuipers <koert@tresata.com> wrote:
>>> >> >> >> >
>>> >> >> >> >> the issue is that slf4j uses static binding. you can put only one
>>> >> >> >> >> slf4j backend on the classpath, and that's what it uses. more than
>>> >> >> >> >> one is not allowed.
>>> >> >> >> >>
>>> >> >> >> >> so you either keep the slf4j-log4j12 dependency for spark, and then
>>> >> >> >> >> you take away people's choice of slf4j backend, which is considered
>>> >> >> >> >> bad form for a library, or you do not include it and then people
>>> >> >> >> >> will always get the big fat ugly warning and slf4j logging will not
>>> >> >> >> >> flow to log4j.
>>> >> >> >> >>
>>> >> >> >> >> including log4j itself is not necessarily a problem, i think?
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >> On Fri, Feb 7, 2014 at 1:11 PM, Patrick Wendell <pwendell@gmail.com> wrote:
>>> >> >> >> >>
>>> >> >> >> >>> This also seems relevant - but not my area of expertise (whether
>>> >> >> >> >>> this is a valid way to check this).
>>> >> >> >> >>>
>>> >> >> >> >>> http://stackoverflow.com/questions/10505418/how-to-find-which-library-slf4j-has-bound-itself-to
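>>> >> >> >> >>>
>>> >> >> >> >>> (The gist, if it is a valid check:
>>> >> >> >> >>> LoggerFactory.getILoggerFactory.getClass.getName, which should
>>> >> >> >> >>> come back as "org.slf4j.impl.Log4jLoggerFactory" when
>>> >> >> >> >>> slf4j-log4j12 is the bound backend.)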
>>> >> >> >> >>>
>>> >> >> >> >>> On Fri, Feb 7, 2014 at 10:08 AM, Patrick Wendell <pwendell@gmail.com> wrote:
>>> >> >> >>> > Hey Guys,
>>> >> >> >>> >
>>> >> >> >>> > Thanks for explaining. Ya, this is a problem - we didn't really
>>> >> >> >>> > know that people are using other slf4j backends. slf4j is in there
>>> >> >> >>> > for historical reasons, but I think we may assume in a few places
>>> >> >> >>> > that log4j is being used, and we should minimize those.
>>> >> >> >>> >
>>> >> >> >>> > We should patch this and get a fix into 0.9.1. So some solutions I
>>> >> >> >>> > see are:
>>> >> >> >>> >
>>> >> >> >>> > (a) Add a SparkConf option to disable this. I'm fine with this one.
>>> >> >> >>> >
>>> >> >> >>> > (b) Ask slf4j which backend is active and only try to enforce this
>>> >> >> >>> > default if we know slf4j is using log4j. Do either of you know if
>>> >> >> >>> > this is possible? Not sure if slf4j exposes this.
>>> >> >> >>> >
>>> >> >> >>> > (c) Just remove this default stuff. We'd rather not do this. The
>>> >> >> >>> > goal of this thing is to provide good usability for people who have
>>> >> >> >>> > linked against Spark and haven't done anything to configure logging.
>>> >> >> >>> > For beginners we try to minimize the assumptions about what else
>>> >> >> >>> > they know about, and I've found log4j configuration is a huge
>>> >> >> >>> > mental barrier for people who are getting started.
>>> >> >> >>> >
>>> >> >> >>> > Paul, if you submit a patch doing (a) we can merge it in. If you
>>> >> >> >>> > have any idea if (b) is possible I prefer that one, but it may not
>>> >> >> >>> > be possible or might be brittle.
>>> >> >> >> >>> >
>>> >> >> >> >>> > - Patrick
>>> >> >> >> >>> >
>>> >> >> >>> > On Fri, Feb 7, 2014 at 6:36 AM, Koert Kuipers <koert@tresata.com> wrote:
>>> >> >> >>> >> Totally agree with Paul: a library should not pick the slf4j
>>> >> >> >>> >> backend. It defeats the purpose of slf4j. That big ugly warning is
>>> >> >> >>> >> there to alert people that it's their responsibility to pick the
>>> >> >> >>> >> backend...
>>> >> >> >>> >> On Feb 7, 2014 3:55 AM, "Paul Brown" <pr...@mult.ifario.us> wrote:
>>> >> >> >> >>> >>
>>> >> >> >>> >>> Hi, Patrick --
>>> >> >> >>> >>>
>>> >> >> >>> >>> From slf4j, you can either backend it into log4j (which is the
>>> >> >> >>> >>> way that Spark is shipped) or you can route log4j through slf4j
>>> >> >> >>> >>> and then on to a different backend (e.g., logback).  We're doing
>>> >> >> >>> >>> the latter and manipulating the dependencies in the build because
>>> >> >> >>> >>> that's the way the enclosing application is set up.
>>> >> >> >>> >>>
>>> >> >> >>> >>> The issue with the current situation is that there's no way for
>>> >> >> >>> >>> an end user to choose to *not* use the log4j backend.  (My
>>> >> >> >>> >>> short-term solution was to use the Maven shade plugin to swap in
>>> >> >> >>> >>> a version of the Logging trait with the body of that method
>>> >> >> >>> >>> commented out.)  In addition to the situation with
>>> >> >> >>> >>> log4j-over-slf4j and the empty enumeration of ROOT appenders, you
>>> >> >> >>> >>> might also run afoul of someone who intentionally configured
>>> >> >> >>> >>> log4j with an empty set of appenders at the time that Spark is
>>> >> >> >>> >>> initializing.
>>> >> >> >>> >>>
>>> >> >> >>> >>> I'd be happy with any implementation that lets me choose my
>>> >> >> >>> >>> logging backend: override default behavior via system property,
>>> >> >> >>> >>> plug-in architecture, etc.  I do think it's reasonable to expect
>>> >> >> >>> >>> someone digesting a substantial JDK-based system like Spark to
>>> >> >> >>> >>> understand how to initialize logging -- surely they're using
>>> >> >> >>> >>> logging of some kind elsewhere in their application -- but if you
>>> >> >> >>> >>> want the default behavior there as a courtesy, it might be worth
>>> >> >> >>> >>> putting an INFO (versus the glaring log4j WARN) message on the
>>> >> >> >>> >>> output that says something like "Initialized default logging via
>>> >> >> >>> >>> Log4J; pass -Dspark.logging.loadDefaultLogger=false to disable
>>> >> >> >>> >>> this behavior." so that it's both convenient and explicit.
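>>> >> >> >>> >>>
>>> >> >> >>> >>> A sketch of that gate (the property name is just the suggestion
>>> >> >> >>> >>> above, not an existing Spark flag, and installDefaultLog4j/log
>>> >> >> >>> >>> are placeholders):
>>> >> >> >>> >>>
>>> >> >> >>> >>>   val loadDefaults = sys.props
>>> >> >> >>> >>>     .getOrElse("spark.logging.loadDefaultLogger", "true").toBoolean
>>> >> >> >>> >>>   if (loadDefaults) {
>>> >> >> >>> >>>     installDefaultLog4j() // today's default-config behavior
>>> >> >> >>> >>>     log.info("Initialized default logging via Log4J; pass " +
>>> >> >> >>> >>>       "-Dspark.logging.loadDefaultLogger=false to disable this " +
>>> >> >> >>> >>>       "behavior.")
>>> >> >> >>> >>>   }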
>>> >> >> >> >>> >>>
>>> >> >> >> >>> >>> Cheers.
>>> >> >> >> >>> >>> -- Paul
>>> >> >> >>> >>>
>>> >> >> >>> >>> --
>>> >> >> >>> >>> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>>> >> >> >> >>> >>>
>>> >> >> >> >>> >>>
>>> >> >> >>> >>> On Fri, Feb 7, 2014 at 12:05 AM, Patrick Wendell <pwendell@gmail.com> wrote:
>>> >> >> >> >>> >>>
>>> >> >> >>> >>> > A config option, e.g., could just be to add:
>>> >> >> >>> >>> >
>>> >> >> >>> >>> > spark.logging.loadDefaultLogger (default true)
>>> >> >> >>> >>> > If set to true, Spark will try to initialize a log4j logger if
>>> >> >> >>> >>> > none is detected. Otherwise Spark will not modify logging
>>> >> >> >>> >>> > behavior.
>>> >> >> >>> >>> >
>>> >> >> >>> >>> > Then users could just set this to false if they have a logging
>>> >> >> >>> >>> > set-up that conflicts with this.
>>> >> >> >>> >>> >
>>> >> >> >>> >>> > Maybe there is a nicer fix...
>>> >> >> >> >>> >>> >
>>> >> >> >>> >>> > On Fri, Feb 7, 2014 at 12:03 AM, Patrick Wendell <pwendell@gmail.com> wrote:
>>> >> >> >>> >>> > > Hey Paul,
>>> >> >> >>> >>> > >
>>> >> >> >>> >>> > > Thanks for digging this up. I worked on this feature, and the
>>> >> >> >>> >>> > > intent was to give users good default behavior if they didn't
>>> >> >> >>> >>> > > include any logging configuration on the classpath.
>>> >> >> >>> >>> > >
>>> >> >> >>> >>> > > The problem with assuming that CL tooling is going to do the
>>> >> >> >>> >>> > > job is that many people link against Spark as a library and
>>> >> >> >>> >>> > > run their application using their own scripts. In this case
>>> >> >> >>> >>> > > the first thing people see when they run an application that
>>> >> >> >>> >>> > > links against Spark was a big ugly logging warning.
>>> >> >> >>> >>> > >
>>> >> >> >>> >>> > > I'm not super familiar with log4j-over-slf4j, but this
>>> >> >> >>> >>> > > behavior of returning null for the appenders seems a little
>>> >> >> >>> >>> > > weird. What is the use case for using it and not just directly
>>> >> >> >>> >>> > > using slf4j-log4j12 like Spark itself does?
>>> >> >> >>> >>> > >
>>> >> >> >>> >>> > > Did you have a more general fix for this in mind? Or was your
>>> >> >> >>> >>> > > plan to just revert the existing behavior... We might be able
>>> >> >> >>> >>> > > to add a configuration option to disable this logging default
>>> >> >> >>> >>> > > stuff. Or we could just rip it out - but I'd like to avoid
>>> >> >> >>> >>> > > that if possible.
>>> >> >> >>> >>> > >
>>> >> >> >>> >>> > > - Patrick

Re: 0.9.0 forces log4j usage

Posted by Patrick Wendell <pw...@gmail.com>.
Thanks Paul - it isn't mean to be a "full solution" but just a fix for
the 0.9 branch - for the full solution there is another PR by Sean
Owen.

On Sun, Feb 9, 2014 at 11:35 PM, Paul Brown <pr...@mult.ifario.us> wrote:
> Hi, Patrick --
>
> I gave that a go locally, and it works as desired.
>
> Best.
> -- Paul
>
> --
> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>
>
> On Fri, Feb 7, 2014 at 6:10 PM, Patrick Wendell <pw...@gmail.com> wrote:
>
>> Ah okay sounds good. This is what I meant earlier by "You have
>> some other application that directly calls log4j."... i.e. you have
>> for historical reasons installed the log4j-over-slf4j.
>>
>> Would you mind trying out this fix and seeing if it works? This is
>> designed to be a hotfix for 0.9, not a general solution where we rip
>> out log4j from our published dependencies:
>>
>> https://github.com/apache/incubator-spark/pull/560/files
>>
>> - Patrick
>>
>> On Fri, Feb 7, 2014 at 5:57 PM, Paul Brown <pr...@mult.ifario.us> wrote:
>> > Hi, Patrick --
>> >
>> > I forget which other component is responsible, but we're using the
>> > log4j-over-slf4j as part of an overall requirement to centralize logging,
>> > i.e., *someone* else is logging over log4j and we're pulling that in.
>> >  (There's also some jul logging from Jersey, etc.)
>> >
>> > Goals:
>> >
>> > - Fully control/capture all possible logging.  (God forbid we have to
>> grab
>> > System.out/err, but we'd do it if needed.)
>> > - Use the backend we like best at the moment.  (Happens to be logback.)
>> >
>> > Possible cases:
>> >
>> > - If Spark used Log4j at all, we would pull in that logging via
>> > log4j-over-slf4j.
>> > - If Spark used only slf4j and referenced no backend, we would use it
>> as-is
>> > although we'd still have the log4j-over-slf4j because of other libraries.
>> > - If Spark used only slf4j and referenced the slf4j-log4j12 backend, we
>> > would exclude that one dependency (via our POM).
>> >
>> > Best.
>> > -- Paul
>> >
>> >
>> > --
>> > prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>> >
>> >
>> > On Fri, Feb 7, 2014 at 5:38 PM, Patrick Wendell <pw...@gmail.com>
>> wrote:
>> >
>> >> Hey Paul,
>> >>
>> >> So if your goal is ultimately to output to logback. Then why don't you
>> >> just use slf4j and logback-classic.jar as described here [1]. Why
>> >> involve log4j-over-slf4j at all?
>> >>
>> >> Let's say we refactored the spark build so it didn't advertise
>> >> slf4j-log4j12 as a dependency. Would you still be using
>> >> log4j-over-slf4j... or is this just a "fix" to deal with the fact that
>> >> Spark is somewhat log4j dependent at this point.
>> >>
>> >> [1] http://www.slf4j.org/manual.html
>> >>
>> >> - Patrick
>> >>
>> >> On Fri, Feb 7, 2014 at 5:14 PM, Paul Brown <pr...@mult.ifario.us> wrote:
>> >> > Hi, Patrick --
>> >> >
>> >> > That's close but not quite it.
>> >> >
>> >> > The issue that occurs is not the delegation loop mentioned in slf4j
>> >> > documentation.  The stack overflow is entirely within the code in the
>> >> Spark
>> >> > trait:
>> >> >
>> >> > at org.apache.spark.Logging$class.initializeLogging(Logging.scala:112)
>> >> > at
>> org.apache.spark.Logging$class.initializeIfNecessary(Logging.scala:97)
>> >> > at org.apache.spark.Logging$class.log(Logging.scala:36)
>> >> > at org.apache.spark.SparkEnv$.log(SparkEnv.scala:94)
>> >> >
>> >> >
>> >> > And then that repeats.
>> >> >
>> >> > As for our situation, we exclude the slf4j-log4j12 dependency when we
>> >> > import the Spark library (because we don't want to use log4j) and have
>> >> > log4j-over-slf4j already in place to ensure that all of the logging in
>> >> the
>> >> > overall application runs through slf4j and then out through logback.
>>  (We
>> >> > also, as another poster already mentioned, also force jcl and jul
>> through
>> >> > slf4j.)
>> >> >
>> >> > The zen of slf4j for libraries is that the library uses the slf4j API
>> and
>> >> > then the enclosing application can route logging as it sees fit.
>>  Spark
>> >> > master CLI would log via slf4j and include the slf4j-log4j12 backend;
>> >> same
>> >> > for Spark worker CLI.  Spark as a library (versus as a container)
>> would
>> >> not
>> >> > include any backend to the slf4j API and leave this up to the
>> >> application.
>> >> >  (FWIW, this would also avoid your log4j warning message.)
>> >> >
>> >> > But as I was saying before, I'd be happy with a situation where I can
>> >> avoid
>> >> > log4j being enabled or configured, and I think you'll find an existing
>> >> > choice of logging framework to be a common scenario for those
>> embedding
>> >> > Spark in other systems.
>> >> >
>> >> > Best.
>> >> > -- Paul
>> >> >
>> >> > --
>> >> > prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>> >> >
>> >> >
>> >> > On Fri, Feb 7, 2014 at 3:01 PM, Patrick Wendell <pw...@gmail.com>
>> >> wrote:
>> >> >
>> >> >> Paul,
>> >> >>
>> >> >> Looking back at your problem. I think it's the one here:
>> >> >> http://www.slf4j.org/codes.html#log4jDelegationLoop
>> >> >>
>> >> >> So let me just be clear what you are doing so I understand. You have
>> >> >> some other application that directly calls log4j. So you have to
>> >> >> include log4j-over-slf4j to route those logs through slf4j to
>> logback.
>> >> >>
>> >> >> At the same time you embed Spark in this application. In the past it
>> >> >> was fine, but now that Spark programmatic ally initializes log4j, it
>> >> >> screws up your application because log4j-over-slf4j doesn't work with
>> >> >> applications that do this explicilty as discussed here:
>> >> >> http://www.slf4j.org/legacy.html
>> >> >>
>> >> >> Correct?
>> >> >>
>> >> >> - Patrick
>> >> >>
>> >> >> On Fri, Feb 7, 2014 at 2:02 PM, Koert Kuipers <ko...@tresata.com>
>> >> wrote:
>> >> >> > got it. that sounds reasonable
>> >> >> >
>> >> >> >
>> >> >> > On Fri, Feb 7, 2014 at 2:31 PM, Patrick Wendell <
>> pwendell@gmail.com>
>> >> >> wrote:
>> >> >> >
>> >> >> >> Koert - my suggestion was this. We let users use any slf4j backend
>> >> >> >> they want. If we detect that they are using the log4j backend and
>> >> >> >> *also* they didn't configure any log4j appenders, we set up some
>> nice
>> >> >> >> defaults for them. If they are using another backend, Spark
>> doesn't
>> >> >> >> try to modify the configuration at all.
>> >> >> >>
>> >> >> >> On Fri, Feb 7, 2014 at 11:14 AM, Koert Kuipers <koert@tresata.com
>> >
>> >> >> wrote:
>> >> >> >> > well "static binding" is probably the wrong terminology but you
>> get
>> >> >> the
>> >> >> >> > idea. multiple backends are not allowed and cause an even uglier
>> >> >> >> warning...
>> >> >> >> >
>> >> >> >> > see also here:
>> >> >> >> > https://github.com/twitter/scalding/pull/636
>> >> >> >> > and here:
>> >> >> >> >
>> https://groups.google.com/forum/#!topic/cascading-user/vYvnnN_15ls
>> >> >> >> > all me being annoying and complaining about slf4j-log4j12
>> >> dependencies
>> >> >> >> > (which did get removed).
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > On Fri, Feb 7, 2014 at 2:09 PM, Koert Kuipers <
>> koert@tresata.com>
>> >> >> wrote:
>> >> >> >> >
>> >> >> >> >> the issue is that slf4j uses static binding. you can put only
>> one
>> >> >> slf4j
>> >> >> >> >> backend on the classpath, and that's what it uses. more than
>> one
>> >> is
>> >> >> not
>> >> >> >> >> allowed.
>> >> >> >> >>
>> >> >> >> >> so you either keep the slf4j-log4j12 dependency for spark, and
>> >> then
>> >> >> you
>> >> >> >> >> took away people's choice of slf4j backend which is considered
>> bad
>> >> >> form
>> >> >> >> for
>> >> >> >> >> a library, or you do not include it and then people will always
>> >> get
>> >> >> the
>> >> >> >> big
>> >> >> >> >> fat ugly warning and slf4j logging will not flow to log4j.
>> >> >> >> >>
>> >> >> >> >> including log4j itself is not necessary a problem i think?
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> On Fri, Feb 7, 2014 at 1:11 PM, Patrick Wendell <
>> >> pwendell@gmail.com
>> >> >> >> >wrote:
>> >> >> >> >>
>> >> >> >> >>> This also seems relevant - but not my area of expertise
>> (whether
>> >> >> this
>> >> >> >> >>> is a valid way to check this).
>> >> >> >> >>>
>> >> >> >> >>>
>> >> >> >> >>>
>> >> >> >>
>> >> >>
>> >>
>> http://stackoverflow.com/questions/10505418/how-to-find-which-library-slf4j-has-bound-itself-to
>> >> >> >> >>>
>> >> >> >> >>> On Fri, Feb 7, 2014 at 10:08 AM, Patrick Wendell <
>> >> >> pwendell@gmail.com>
>> >> >> >> >>> wrote:
>> >> >> >> >>> > Hey Guys,
>> >> >> >> >>> >
>> >> >> >> >>> > Thanks for explainning. Ya this is a problem - we didn't
>> really
>> >> >> know
>> >> >> >> >>> > that people are using other slf4j backends, slf4j is in
>> there
>> >> for
>> >> >> >> >>> > historical reasons but I think we may assume in a few places
>> >> that
>> >> >> >> >>> > log4j is being used and we should minimize those.
>> >> >> >> >>> >
>> >> >> >> >>> > We should patch this and get a fix into 0.9.1. So some
>> >> solutions I
>> >> >> >> see
>> >> >> >> >>> are:
>> >> >> >> >>> >
>> >> >> >> >>> > (a) Add SparkConf option to disable this. I'm fine with this
>> >> one.
>> >> >> >> >>> >
>> >> >> >> >>> > (b) Ask slf4j which backend is active and only try to
>> enforce
>> >> this
>> >> >> >> >>> > default if we know slf4j is using log4j. Do either of you
>> know
>> >> if
>> >> >> >> this
>> >> >> >> >>> > is possible? Not sure if slf4j exposes this.
>> >> >> >> >>> >
>> >> >> >> >>> > (c) Just remove this default stuff. We'd rather not do this.
>> >> The
>> >> >> goal
>> >> >> >> >>> > of this thing is to provide good usability for people who
>> have
>> >> >> linked
>> >> >> >> >>> > against Spark and haven't done anything to configure
>> logging.
>> >> For
>> >> >> >> >>> > beginners we try to minimize the assumptions about what else
>> >> they
>> >> >> >> know
>> >> >> >> >>> > about, and I've found log4j configuration is a huge mental
>> >> barrier
>> >> >> >> for
>> >> >> >> >>> > people who are getting started.
>> >> >> >> >>> >
>> >> >> >> >>> > Paul if you submit a patch doing (a) we can merge it in. If
>> you
>> >> >> have
>> >> >> >> >>> > any idea if (b) is possible I prefer that one, but it may
>> not
>> >> be
>> >> >> >> >>> > possible or might be brittle.
>> >> >> >> >>> >
>> >> >> >> >>> > - Patrick
>> >> >> >> >>> >
>> >> >> >> >>> > On Fri, Feb 7, 2014 at 6:36 AM, Koert Kuipers <
>> >> koert@tresata.com>
>> >> >> >> >>> wrote:
>> >> >> >> >>> >> Totally agree with Paul: a library should not pick the
>> slf4j
>> >> >> >> backend.
>> >> >> >> >>> It
>> >> >> >> >>> >> defeats the purpose of slf4j. That big ugly warning is
>> there
>> >> to
>> >> >> >> alert
>> >> >> >> >>> >> people that its their responsibility to pick the back
>> end...
>> >> >> >> >>> >> On Feb 7, 2014 3:55 AM, "Paul Brown" <pr...@mult.ifario.us>
>> >> wrote:
>> >> >> >> >>> >>
>> >> >> >> >>> >>> Hi, Patrick --
>> >> >> >> >>> >>>
>> >> >> >> >>> >>> From slf4j, you can either backend it into log4j (which is
>> >> the
>> >> >> way
>> >> >> >> >>> that
>> >> >> >> >>> >>> Spark is shipped) or you can route log4j through slf4j and
>> >> then
>> >> >> on
>> >> >> >> to
>> >> >> >> >>> a
>> >> >> >> >>> >>> different backend (e.g., logback).  We're doing the latter
>> >> and
>> >> >> >> >>> manipulating
>> >> >> >> >>> >>> the dependencies in the build because that's the way the
>> >> >> enclosing
>> >> >> >> >>> >>> application is set up.
>> >> >> >> >>> >>>
>> >> >> >> >>> >>> The issue with the current situation is that there's no
>> way
>> >> for
>> >> >> an
>> >> >> >> >>> end user
>> >> >> >> >>> >>> to choose to *not* use the log4j backend.  (My short-term
>> >> >> solution
>> >> >> >> >>> was to
>> >> >> >> >>> >>> use the Maven shade plugin to swap in a version of the
>> >> Logging
>> >> >> >> trait
>> >> >> >> >>> with
>> >> >> >> >>> >>> the body of that method commented out.)  In addition to
>> the
>> >> >> >> situation
>> >> >> >> >>> with
>> >> >> >> >>> >>> log4j-over-slf4j and the empty enumeration of ROOT
>> appenders,
>> >> >> you
>> >> >> >> >>> might
>> >> >> >> >>> >>> also run afoul of someone who intentionally configured
>> log4j
>> >> >> with
>> >> >> >> an
>> >> >> >> >>> empty
>> >> >> >> >>> >>> set of appenders at the time that Spark is initializing.
>> >> >> >> >>> >>>
>> >> >> >> >>> >>> I'd be happy with any implementation that lets me choose
>> my
>> >> >> logging
>> >> >> >> >>> >>> backend: override default behavior via system property,
>> >> plug-in
>> >> >> >> >>> >>> architecture, etc.  I do think it's reasonable to expect
>> >> someone
>> >> >> >> >>> digesting
>> >> >> >> >>> >>> a substantial JDK-based system like Spark to understand
>> how
>> >> to
>> >> >> >> >>> initialize
>> >> >> >> >>> >>> logging -- surely they're using logging of some kind
>> >> elsewhere
>> >> >> in
>> >> >> >> >>> their
>> >> >> >> >>> >>> application -- but if you want the default behavior there
>> as
>> >> a
>> >> >> >> >>> courtesy, it
>> >> >> >> >>> >>> might be worth putting an INFO (versus a the glaring log4j
>> >> WARN)
>> >> >> >> >>> message on
>> >> >> >> >>> >>> the output that says something like "Initialized default
>> >> logging
>> >> >> >> via
>> >> >> >> >>> Log4J;
>> >> >> >> >>> >>> pass -Dspark.logging.loadDefaultLogger=false to disable
>> this
>> >> >> >> >>> behavior." so
>> >> >> >> >>> >>> that it's both convenient and explicit.
>> >> >> >> >>> >>>
>> >> >> >> >>> >>> Cheers.
>> >> >> >> >>> >>> -- Paul
>> >> >> >> >>> >>>
>> >> >> >> >>> >>>
>> >> >> >> >>> >>>
>> >> >> >> >>> >>>
>> >> >> >> >>> >>>
>> >> >> >> >>> >>>
>> >> >> >> >>> >>> --
>> >> >> >> >>> >>> prb@mult.ifario.us | Multifarious, Inc. |
>> >> >> http://mult.ifario.us/
>> >> >> >> >>> >>>
>> >> >> >> >>> >>>
>> >> >> >> >>> >>> On Fri, Feb 7, 2014 at 12:05 AM, Patrick Wendell <
>> >> >> >> pwendell@gmail.com>
>> >> >> >> >>> >>> wrote:
>> >> >> >> >>> >>>
>> >> >> >> >>> >>> > A config option e.g. could just be to add:
>> >> >> >> >>> >>> >
>> >> >> >> >>> >>> > spark.logging.loadDefaultLogger (default true)
>> >> >> >> >>> >>> > If set to true, Spark will try to initialize a log4j
>> >> logger if
>> >> >> >> none
>> >> >> >> >>> is
>> >> >> >> >>> >>> > detected. Otherwise Spark will not modify logging
>> behavior.
>> >> >> >> >>> >>> >
>> >> >> >> >>> >>> > Then users could just set this to false if they have a
>> >> logging
>> >> >> >> >>> set-up
>> >> >> >> >>> >>> > that conflicts with this.
>> >> >> >> >>> >>> >
>> >> >> >> >>> >>> > Maybe there is a nicer fix...
>> >> >> >> >>> >>> >
>> >> >> >> >>> >>> > On Fri, Feb 7, 2014 at 12:03 AM, Patrick Wendell <
>> >> >> >> >>> pwendell@gmail.com>
>> >> >> >> >>> >>> > wrote:
>> >> >> >> >>> >>> > > Hey Paul,
>> >> >> >> >>> >>> > >
>> >> >> >> >>> >>> > > Thanks for digging this up. I worked on this feature
>> and
>> >> the
>> >> >> >> >>> intent
>> >> >> >> >>> >>> > > was to give users good default behavior if they didn't
>> >> >> include
>> >> >> >> any
>> >> >> >> >>> >>> > > logging configuration on the classpath.
>> >> >> >> >>> >>> > >
>> >> >> >> >>> >>> > > The problem with assuming that CL tooling is going to
>> fix
>> >> >> the
>> >> >> >> job
>> >> >> >> >>> is
>> >> >> >> >>> >>> > > that many people link against spark as a library and
>> run
>> >> >> their
>> >> >> >> >>> >>> > > application using their own scripts. In this case the
>> >> first
>> >> >> >> thing
>> >> >> >> >>> >>> > > people see when they run an application that links
>> >> against
>> >> >> >> Spark
>> >> >> >> >>> was a
>> >> >> >> >>> >>> > > big ugly logging warning.
>> >> >> >> >>> >>> > >
>> >> >> >> >>> >>> > > I'm not super familiar with log4j-over-slf4j, but this
>> >> >> >> behavior of
>> >> >> >> >>> >>> > > returning null for the appenders seems a little weird.
>> >> What
>> >> >> is
>> >> >> >> >>> the use
>> >> >> >> >>> >>> > > case for using this and not just directly use
>> >> slf4j-log4j12
>> >> >> >> like
>> >> >> >> >>> Spark
>> >> >> >> >>> >>> > > itself does?
>> >> >> >> >>> >>> > >
>> >> >> >> >>> >>> > > Did you have a more general fix for this in mind? Or
>> was
>> >> >> your
>> >> >> >> >>> plan to
>> >> >> >> >>> >>> > > just revert the existing behavior... We might be able
>> to
>> >> >> add a
>> >> >> >> >>> >>> > > configuration option to disable this logging default
>> >> stuff.
>> >> >> Or
>> >> >> >> we
>> >> >> >> >>> >>> > > could just rip it out - but I'd like to avoid that if
>> >> >> possible.
>> >> >> >> >>> >>> > >
>> >> >> >> >>> >>> > > - Patrick
>> >> >> >> >>> >>> > >
>> >> >> >> >>> >>> > > On Thu, Feb 6, 2014 at 11:41 PM, Paul Brown <
>> >> >> >> prb@mult.ifario.us>
>> >> >> >> >>> >>> wrote:
>> >> >> >> >>> >>> > >> We have a few applications that embed Spark, and in
>> >> 0.8.0
>> >> >> and
>> >> >> >> >>> 0.8.1,
>> >> >> >> >>> >>> we
>> >> >> >> >>> >>> > >> were able to use slf4j, but 0.9.0 broke that and
>> >> >> >> unintentionally
>> >> >> >> >>> >>> forces
>> >> >> >> >>> >>> > >> direct use of log4j as the logging backend.
>> >> >> >> >>> >>> > >>
>> >> >> >> >>> >>> > >> The issue is here in the org.apache.spark.Logging
>> trait:
>> >> >> >> >>> >>> > >>
>> >> >> >> >>> >>> > >>
>> >> >> >> >>> >>> >
>> >> >> >> >>> >>>
>> >> >> >> >>>
>> >> >> >>
>> >> >>
>> >>
>> https://github.com/apache/incubator-spark/blame/master/core/src/main/scala/org/apache/spark/Logging.scala#L107
>> >> >> >> >>> >>> > >>
>> >> >> >> >>> >>> > >> log4j-over-slf4j *always* returns an empty
>> enumeration
>> >> for
>> >> >> >> >>> appenders
>> >> >> >> >>> >>> to
>> >> >> >> >>> >>> > the
>> >> >> >> >>> >>> > >> ROOT logger:
>> >> >> >> >>> >>> > >>
>> >> >> >> >>> >>> > >>
>> >> >> >> >>> >>> >
>> >> >> >> >>> >>>
>> >> >> >> >>>
>> >> >> >>
>> >> >>
>> >>
>> https://github.com/qos-ch/slf4j/blob/master/log4j-over-slf4j/src/main/java/org/apache/log4j/Category.java?source=c#L81
>> >> >> >> >>> >>> > >>
>> >> >> >> >>> >>> > >> And this causes an infinite loop and an eventual
>> stack
>> >> >> >> overflow.
>> >> >> >> >>> >>> > >>
>> >> >> >> >>> >>> > >> I'm happy to submit a Jira and a patch, but it would
>> be
>> >> >> >> >>> significant
>> >> >> >> >>> >>> > enough
>> >> >> >> >>> >>> > >> reversal of recent changes that it's probably worth
>> >> >> discussing
>> >> >> >> >>> before
>> >> >> >> >>> >>> I
>> >> >> >> >>> >>> > >> sink a half hour into it.  My suggestion would be
>> that
>> >> >> >> >>> initialization
>> >> >> >> >>> >>> > (or
>> >> >> >> >>> >>> > >> not) should be left to the user with reasonable
>> default
>> >> >> >> behavior
>> >> >> >> >>> >>> > supplied
>> >> >> >> >>> >>> > >> by the spark commandline tooling and not forced on
>> >> >> >> applications
>> >> >> >> >>> that
>> >> >> >> >>> >>> > >> incorporate Spark.
>> >> >> >> >>> >>> > >>
>> >> >> >> >>> >>> > >> Thoughts/opinions?
>> >> >> >> >>> >>> > >>
>> >> >> >> >>> >>> > >> -- Paul
>> >> >> >> >>> >>> > >> --
>> >> >> >> >>> >>> > >> prb@mult.ifario.us | Multifarious, Inc. |
>> >> >> >> http://mult.ifario.us/
>> >> >> >> >>> >>> >
>> >> >> >> >>> >>>
>> >> >> >> >>>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >>
>> >> >>
>> >>
>>

Re: 0.9.0 forces log4j usage

Posted by Paul Brown <pr...@mult.ifario.us>.
Hi, Patrick --

I gave that a go locally, and it works as desired.

Best.
-- Paul

—
prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/


On Fri, Feb 7, 2014 at 6:10 PM, Patrick Wendell <pw...@gmail.com> wrote:

> Ah okay sounds good. This is what I meant earlier by "You have
> some other application that directly calls log4j."... i.e. you have
> for historical reasons installed the log4j-over-slf4j.
>
> Would you mind trying out this fix and seeing if it works? This is
> designed to be a hotfix for 0.9, not a general solution where we rip
> out log4j from our published dependencies:
>
> https://github.com/apache/incubator-spark/pull/560/files
>
> - Patrick
>
> On Fri, Feb 7, 2014 at 5:57 PM, Paul Brown <pr...@mult.ifario.us> wrote:
> > Hi, Patrick --
> >
> > I forget which other component is responsible, but we're using the
> > log4j-over-slf4j as part of an overall requirement to centralize logging,
> > i.e., *someone* else is logging over log4j and we're pulling that in.
> >  (There's also some jul logging from Jersey, etc.)
> >
> > Goals:
> >
> > - Fully control/capture all possible logging.  (God forbid we have to
> grab
> > System.out/err, but we'd do it if needed.)
> > - Use the backend we like best at the moment.  (Happens to be logback.)
> >
> > Possible cases:
> >
> > - If Spark used Log4j at all, we would pull in that logging via
> > log4j-over-slf4j.
> > - If Spark used only slf4j and referenced no backend, we would use it
> as-is
> > although we'd still have the log4j-over-slf4j because of other libraries.
> > - If Spark used only slf4j and referenced the slf4j-log4j12 backend, we
> > would exclude that one dependency (via our POM).
> >
> > Best.
> > -- Paul
> >
> >
> > --
> > prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
> >
> >
> > On Fri, Feb 7, 2014 at 5:38 PM, Patrick Wendell <pw...@gmail.com>
> wrote:
> >
> >> Hey Paul,
> >>
> >> So if your goal is ultimately to output to logback. Then why don't you
> >> just use slf4j and logback-classic.jar as described here [1]. Why
> >> involve log4j-over-slf4j at all?
> >>
> >> Let's say we refactored the spark build so it didn't advertise
> >> slf4j-log4j12 as a dependency. Would you still be using
> >> log4j-over-slf4j... or is this just a "fix" to deal with the fact that
> >> Spark is somewhat log4j dependent at this point.
> >>
> >> [1] http://www.slf4j.org/manual.html
> >>
> >> - Patrick
> >>
> >> On Fri, Feb 7, 2014 at 5:14 PM, Paul Brown <pr...@mult.ifario.us> wrote:
> >> > Hi, Patrick --
> >> >
> >> > That's close but not quite it.
> >> >
> >> > The issue that occurs is not the delegation loop mentioned in slf4j
> >> > documentation.  The stack overflow is entirely within the code in the
> >> Spark
> >> > trait:
> >> >
> >> > at org.apache.spark.Logging$class.initializeLogging(Logging.scala:112)
> >> > at
> org.apache.spark.Logging$class.initializeIfNecessary(Logging.scala:97)
> >> > at org.apache.spark.Logging$class.log(Logging.scala:36)
> >> > at org.apache.spark.SparkEnv$.log(SparkEnv.scala:94)
> >> >
> >> >
> >> > And then that repeats.
> >> >
> >> > As for our situation, we exclude the slf4j-log4j12 dependency when we
> >> > import the Spark library (because we don't want to use log4j) and have
> >> > log4j-over-slf4j already in place to ensure that all of the logging in
> >> the
> >> > overall application runs through slf4j and then out through logback.
>  (We
> >> > also, as another poster already mentioned, also force jcl and jul
> through
> >> > slf4j.)
> >> >
> >> > The zen of slf4j for libraries is that the library uses the slf4j API
> and
> >> > then the enclosing application can route logging as it sees fit.
>  Spark
> >> > master CLI would log via slf4j and include the slf4j-log4j12 backend;
> >> same
> >> > for Spark worker CLI.  Spark as a library (versus as a container)
> would
> >> not
> >> > include any backend to the slf4j API and leave this up to the
> >> application.
> >> >  (FWIW, this would also avoid your log4j warning message.)
> >> >
> >> > But as I was saying before, I'd be happy with a situation where I can
> >> avoid
> >> > log4j being enabled or configured, and I think you'll find an existing
> >> > choice of logging framework to be a common scenario for those
> embedding
> >> > Spark in other systems.
> >> >
> >> > Best.
> >> > -- Paul
> >> >
> >> > --
> >> > prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
> >> >
> >> >
> >> > On Fri, Feb 7, 2014 at 3:01 PM, Patrick Wendell <pw...@gmail.com>
> >> wrote:
> >> >
> >> >> Paul,
> >> >>
> >> >> Looking back at your problem. I think it's the one here:
> >> >> http://www.slf4j.org/codes.html#log4jDelegationLoop
> >> >>
> >> >> So let me just be clear what you are doing so I understand. You have
> >> >> some other application that directly calls log4j. So you have to
> >> >> include log4j-over-slf4j to route those logs through slf4j to
> logback.
> >> >>
> >> >> At the same time you embed Spark in this application. In the past it
> >> >> was fine, but now that Spark programmatic ally initializes log4j, it
> >> >> screws up your application because log4j-over-slf4j doesn't work with
> >> >> applications that do this explicilty as discussed here:
> >> >> http://www.slf4j.org/legacy.html
> >> >>
> >> >> Correct?
> >> >>
> >> >> - Patrick
> >> >>
> >> >> On Fri, Feb 7, 2014 at 2:02 PM, Koert Kuipers <ko...@tresata.com>
> >> wrote:
> >> >> > got it. that sounds reasonable
> >> >> >
> >> >> >
> >> >> > On Fri, Feb 7, 2014 at 2:31 PM, Patrick Wendell <
> pwendell@gmail.com>
> >> >> wrote:
> >> >> >
> >> >> >> Koert - my suggestion was this. We let users use any slf4j backend
> >> >> >> they want. If we detect that they are using the log4j backend and
> >> >> >> *also* they didn't configure any log4j appenders, we set up some
> nice
> >> >> >> defaults for them. If they are using another backend, Spark
> doesn't
> >> >> >> try to modify the configuration at all.
> >> >> >>
> >> >> >> On Fri, Feb 7, 2014 at 11:14 AM, Koert Kuipers <koert@tresata.com
> >
> >> >> wrote:
> >> >> >> > well "static binding" is probably the wrong terminology but you
> get
> >> >> the
> >> >> >> > idea. multiple backends are not allowed and cause an even uglier
> >> >> >> warning...
> >> >> >> >
> >> >> >> > see also here:
> >> >> >> > https://github.com/twitter/scalding/pull/636
> >> >> >> > and here:
> >> >> >> >
> https://groups.google.com/forum/#!topic/cascading-user/vYvnnN_15ls
> >> >> >> > all me being annoying and complaining about slf4j-log4j12
> >> dependencies
> >> >> >> > (which did get removed).
> >> >> >> >
> >> >> >> >
> >> >> >> > On Fri, Feb 7, 2014 at 2:09 PM, Koert Kuipers <
> koert@tresata.com>
> >> >> wrote:
> >> >> >> >
> >> >> >> >> the issue is that slf4j uses static binding. you can put only
> one
> >> >> slf4j
> >> >> >> >> backend on the classpath, and that's what it uses. more than
> one
> >> is
> >> >> not
> >> >> >> >> allowed.
> >> >> >> >>
> >> >> >> >> so you either keep the slf4j-log4j12 dependency for spark, and
> >> then
> >> >> you
> >> >> >> >> took away people's choice of slf4j backend which is considered
> bad
> >> >> form
> >> >> >> for
> >> >> >> >> a library, or you do not include it and then people will always
> >> get
> >> >> the
> >> >> >> big
> >> >> >> >> fat ugly warning and slf4j logging will not flow to log4j.
> >> >> >> >>
> >> >> >> >> including log4j itself is not necessary a problem i think?
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> On Fri, Feb 7, 2014 at 1:11 PM, Patrick Wendell <
> >> pwendell@gmail.com
> >> >> >> >wrote:
> >> >> >> >>
> >> >> >> >>> This also seems relevant - but not my area of expertise
> (whether
> >> >> this
> >> >> >> >>> is a valid way to check this).
> >> >> >> >>>
> >> >> >> >>>
> >> >> >> >>>
> >> >> >>
> >> >>
> >>
> http://stackoverflow.com/questions/10505418/how-to-find-which-library-slf4j-has-bound-itself-to
> >> >> >> >>>
> >> >> >> >>> On Fri, Feb 7, 2014 at 10:08 AM, Patrick Wendell <
> >> >> pwendell@gmail.com>
> >> >> >> >>> wrote:
> >> >> >> >>> > Hey Guys,
> >> >> >> >>> >
> >> >> >> >>> > Thanks for explainning. Ya this is a problem - we didn't
> really
> >> >> know
> >> >> >> >>> > that people are using other slf4j backends, slf4j is in
> there
> >> for
> >> >> >> >>> > historical reasons but I think we may assume in a few places
> >> that
> >> >> >> >>> > log4j is being used and we should minimize those.
> >> >> >> >>> >
> >> >> >> >>> > We should patch this and get a fix into 0.9.1. So some
> >> solutions I
> >> >> >> see
> >> >> >> >>> are:
> >> >> >> >>> >
> >> >> >> >>> > (a) Add SparkConf option to disable this. I'm fine with this
> >> one.
> >> >> >> >>> >
> >> >> >> >>> > (b) Ask slf4j which backend is active and only try to
> enforce
> >> this
> >> >> >> >>> > default if we know slf4j is using log4j. Do either of you
> know
> >> if
> >> >> >> this
> >> >> >> >>> > is possible? Not sure if slf4j exposes this.
> >> >> >> >>> >
> >> >> >> >>> > (c) Just remove this default stuff. We'd rather not do this.
> >> The
> >> >> goal
> >> >> >> >>> > of this thing is to provide good usability for people who
> have
> >> >> linked
> >> >> >> >>> > against Spark and haven't done anything to configure
> logging.
> >> For
> >> >> >> >>> > beginners we try to minimize the assumptions about what else
> >> they
> >> >> >> know
> >> >> >> >>> > about, and I've found log4j configuration is a huge mental
> >> barrier
> >> >> >> for
> >> >> >> >>> > people who are getting started.
> >> >> >> >>> >
> >> >> >> >>> > Paul if you submit a patch doing (a) we can merge it in. If
> you
> >> >> have
> >> >> >> >>> > any idea if (b) is possible I prefer that one, but it may
> not
> >> be
> >> >> >> >>> > possible or might be brittle.
> >> >> >> >>> >
> >> >> >> >>> > - Patrick
> >> >> >> >>> >
> >> >> >> >>> > On Fri, Feb 7, 2014 at 6:36 AM, Koert Kuipers <
> >> koert@tresata.com>
> >> >> >> >>> wrote:
> >> >> >> >>> >> Totally agree with Paul: a library should not pick the
> slf4j
> >> >> >> backend.
> >> >> >> >>> It
> >> >> >> >>> >> defeats the purpose of slf4j. That big ugly warning is
> there
> >> to
> >> >> >> alert
> >> >> >> >>> >> people that its their responsibility to pick the back
> end...
> >> >> >> >>> >> On Feb 7, 2014 3:55 AM, "Paul Brown" <pr...@mult.ifario.us>
> >> wrote:
> >> >> >> >>> >>
> >> >> >> >>> >>> Hi, Patrick --
> >> >> >> >>> >>>
> >> >> >> >>> >>> From slf4j, you can either backend it into log4j (which is
> >> the
> >> >> way
> >> >> >> >>> that
> >> >> >> >>> >>> Spark is shipped) or you can route log4j through slf4j and
> >> then
> >> >> on
> >> >> >> to
> >> >> >> >>> a
> >> >> >> >>> >>> different backend (e.g., logback).  We're doing the latter
> >> and
> >> >> >> >>> manipulating
> >> >> >> >>> >>> the dependencies in the build because that's the way the
> >> >> enclosing
> >> >> >> >>> >>> application is set up.
> >> >> >> >>> >>>
> >> >> >> >>> >>> The issue with the current situation is that there's no
> way
> >> for
> >> >> an
> >> >> >> >>> end user
> >> >> >> >>> >>> to choose to *not* use the log4j backend.  (My short-term
> >> >> solution
> >> >> >> >>> was to
> >> >> >> >>> >>> use the Maven shade plugin to swap in a version of the
> >> Logging
> >> >> >> trait
> >> >> >> >>> with
> >> >> >> >>> >>> the body of that method commented out.)  In addition to
> the
> >> >> >> situation
> >> >> >> >>> with
> >> >> >> >>> >>> log4j-over-slf4j and the empty enumeration of ROOT
> appenders,
> >> >> you
> >> >> >> >>> might
> >> >> >> >>> >>> also run afoul of someone who intentionally configured
> log4j
> >> >> with
> >> >> >> an
> >> >> >> >>> empty
> >> >> >> >>> >>> set of appenders at the time that Spark is initializing.
> >> >> >> >>> >>>
> >> >> >> >>> >>> I'd be happy with any implementation that lets me choose
> my
> >> >> logging
> >> >> >> >>> >>> backend: override default behavior via system property,
> >> plug-in
> >> >> >> >>> >>> architecture, etc.  I do think it's reasonable to expect
> >> someone
> >> >> >> >>> digesting
> >> >> >> >>> >>> a substantial JDK-based system like Spark to understand
> how
> >> to
> >> >> >> >>> initialize
> >> >> >> >>> >>> logging -- surely they're using logging of some kind
> >> elsewhere
> >> >> in
> >> >> >> >>> their
> >> >> >> >>> >>> application -- but if you want the default behavior there
> as
> >> a
> >> >> >> >>> courtesy, it
> >> >> >> >>> >>> might be worth putting an INFO (versus a the glaring log4j
> >> WARN)
> >> >> >> >>> message on
> >> >> >> >>> >>> the output that says something like "Initialized default
> >> logging
> >> >> >> via
> >> >> >> >>> Log4J;
> >> >> >> >>> >>> pass -Dspark.logging.loadDefaultLogger=false to disable
> this
> >> >> >> >>> behavior." so
> >> >> >> >>> >>> that it's both convenient and explicit.
> >> >> >> >>> >>>
> >> >> >> >>> >>> Cheers.
> >> >> >> >>> >>> -- Paul
> >> >> >> >>> >>>
> >> >> >> >>> >>>
> >> >> >> >>> >>>
> >> >> >> >>> >>>
> >> >> >> >>> >>>
> >> >> >> >>> >>>
> >> >> >> >>> >>> --
> >> >> >> >>> >>> prb@mult.ifario.us | Multifarious, Inc. |
> >> >> http://mult.ifario.us/
> >> >> >> >>> >>>
> >> >> >> >>> >>>

Re: 0.9.0 forces log4j usage

Posted by Patrick Wendell <pw...@gmail.com>.
Ah okay, sounds good. This is what I meant earlier by "You have
some other application that directly calls log4j," i.e., you have
installed log4j-over-slf4j for historical reasons.

Would you mind trying out this fix and seeing if it works? This is
designed to be a hotfix for 0.9, not a general solution where we rip
out log4j from our published dependencies:

https://github.com/apache/incubator-spark/pull/560/files

- Patrick
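
The gist of such a hotfix is to leave logging alone unless slf4j is actually bound to log4j and nothing has configured any appenders yet. A rough sketch of that kind of check against the slf4j 1.x and log4j 1.2 APIs (an illustration, not the exact patch in the pull request):

    import org.apache.log4j.{BasicConfigurator, LogManager}
    import org.slf4j.impl.StaticLoggerBinder

    object DefaultLogInit {
      def maybeInitialize(): Unit = {
        // Ask slf4j which backend it bound to at startup.
        val factory = StaticLoggerBinder.getSingleton.getLoggerFactoryClassStr
        val boundToLog4j = factory.contains("Log4jLoggerFactory")
        if (boundToLog4j) {
          // Only touch log4j classes once we know log4j is the real backend,
          // and only install defaults if the root logger has no appenders.
          val root = LogManager.getRootLogger
          if (!root.getAllAppenders.hasMoreElements) {
            BasicConfigurator.configure()
          }
        }
      }
    }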

On Fri, Feb 7, 2014 at 5:57 PM, Paul Brown <pr...@mult.ifario.us> wrote:
> Hi, Patrick --
>
> I forget which other component is responsible, but we're using the
> log4j-over-slf4j as part of an overall requirement to centralize logging,
> i.e., *someone* else is logging over log4j and we're pulling that in.
>  (There's also some jul logging from Jersey, etc.)
>
> Goals:
>
> - Fully control/capture all possible logging.  (God forbid we have to grab
> System.out/err, but we'd do it if needed.)
> - Use the backend we like best at the moment.  (Happens to be logback.)
>
> Possible cases:
>
> - If Spark used Log4j at all, we would pull in that logging via
> log4j-over-slf4j.
> - If Spark used only slf4j and referenced no backend, we would use it as-is
> although we'd still have the log4j-over-slf4j because of other libraries.
> - If Spark used only slf4j and referenced the slf4j-log4j12 backend, we
> would exclude that one dependency (via our POM).
>
> Best.
> -- Paul
>
>
> --
> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/

Re: 0.9.0 forces log4j usage

Posted by Paul Brown <pr...@mult.ifario.us>.
Hi, Patrick --

I forget which other component is responsible, but we're using
log4j-over-slf4j as part of an overall requirement to centralize logging,
i.e., *someone* else is logging over log4j and we're pulling that in.
(There's also some jul logging from Jersey, etc.)

Goals:

- Fully control/capture all possible logging.  (God forbid we have to grab
System.out/err, but we'd do it if needed.)
- Use the backend we like best at the moment.  (Happens to be logback.)

Possible cases:

- If Spark used Log4j at all, we would pull in that logging via
log4j-over-slf4j.
- If Spark used only slf4j and referenced no backend, we would use it as-is,
although we'd still have log4j-over-slf4j because of other libraries.
- If Spark used only slf4j and referenced the slf4j-log4j12 backend, we
would exclude that one dependency (via our POM).

Best.
-- Paul


—
prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
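
One way the last of those cases might look in an sbt build (the artifact names are real; the version strings here are only placeholders):

    // build.sbt sketch: depend on Spark without its log4j backend, bridge
    // legacy log4j callers into slf4j, and let logback produce the output.
    libraryDependencies ++= Seq(
      ("org.apache.spark" %% "spark-core" % "0.9.0")
        .exclude("org.slf4j", "slf4j-log4j12")
        .exclude("log4j", "log4j"),
      "org.slf4j" % "log4j-over-slf4j" % "1.7.5",
      "ch.qos.logback" % "logback-classic" % "1.0.13"
    )

A Maven build would express the same thing with an <exclusions> element on the spark-core dependency, which is the POM route Paul mentions.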


On Fri, Feb 7, 2014 at 5:38 PM, Patrick Wendell <pw...@gmail.com> wrote:

> Hey Paul,
>
> So if your goal is ultimately to output to logback, then why don't you
> just use slf4j and logback-classic.jar as described here [1]? Why
> involve log4j-over-slf4j at all?
>
> Let's say we refactored the Spark build so it didn't advertise
> slf4j-log4j12 as a dependency. Would you still be using
> log4j-over-slf4j... or is this just a "fix" to deal with the fact that
> Spark is somewhat log4j-dependent at this point?
>
> [1] http://www.slf4j.org/manual.html
>
> - Patrick

Re: 0.9.0 forces log4j usage

Posted by Patrick Wendell <pw...@gmail.com>.
Hey Paul,

So if your goal is ultimately to output to logback, then why don't you
just use slf4j and logback-classic.jar as described here [1]? Why
involve log4j-over-slf4j at all?

Let's say we refactored the Spark build so it didn't advertise
slf4j-log4j12 as a dependency. Would you still be using
log4j-over-slf4j... or is this just a "fix" to deal with the fact that
Spark is somewhat log4j-dependent at this point?

[1] http://www.slf4j.org/manual.html

- Patrick
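
In code, the arrangement Patrick describes comes down to this, with slf4j-api as the only compile-time logging dependency (hypothetical class, real slf4j calls):

    import org.slf4j.{Logger, LoggerFactory}

    object Example {
      // Code only against the slf4j API; whichever backend the application
      // puts on the runtime classpath (logback-classic here) does the output.
      private val log: Logger = LoggerFactory.getLogger(getClass)

      def main(args: Array[String]): Unit =
        log.info("routed through whatever backend the application chose")
    }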

On Fri, Feb 7, 2014 at 5:14 PM, Paul Brown <pr...@mult.ifario.us> wrote:
> Hi, Patrick --
>
> That's close but not quite it.
>
> The issue that occurs is not the delegation loop mentioned in slf4j
> documentation.  The stack overflow is entirely within the code in the Spark
> trait:
>
> at org.apache.spark.Logging$class.initializeLogging(Logging.scala:112)
> at org.apache.spark.Logging$class.initializeIfNecessary(Logging.scala:97)
> at org.apache.spark.Logging$class.log(Logging.scala:36)
> at org.apache.spark.SparkEnv$.log(SparkEnv.scala:94)
>
>
> And then that repeats.
>
> As for our situation, we exclude the slf4j-log4j12 dependency when we
> import the Spark library (because we don't want to use log4j) and have
> log4j-over-slf4j already in place to ensure that all of the logging in the
> overall application runs through slf4j and then out through logback.  (We
> also, as another poster already mentioned, force jcl and jul through
> slf4j.)
>
> The zen of slf4j for libraries is that the library uses the slf4j API and
> then the enclosing application can route logging as it sees fit.  Spark
> master CLI would log via slf4j and include the slf4j-log4j12 backend; same
> for Spark worker CLI.  Spark as a library (versus as a container) would not
> include any backend to the slf4j API and leave this up to the application.
>  (FWIW, this would also avoid your log4j warning message.)
>
> But as I was saying before, I'd be happy with a situation where I can avoid
> log4j being enabled or configured, and I think you'll find an existing
> choice of logging framework to be a common scenario for those embedding
> Spark in other systems.
>
> Best.
> -- Paul
>
> --
> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/

Re: 0.9.0 forces log4j usage

Posted by Paul Brown <pr...@mult.ifario.us>.
Hi, Patrick --

That's close but not quite it.

The issue that occurs is not the delegation loop mentioned in the slf4j
documentation.  The stack overflow is entirely within the Spark Logging
trait:

at org.apache.spark.Logging$class.initializeLogging(Logging.scala:112)
at org.apache.spark.Logging$class.initializeIfNecessary(Logging.scala:97)
at org.apache.spark.Logging$class.log(Logging.scala:36)
at org.apache.spark.SparkEnv$.log(SparkEnv.scala:94)


And then that repeats.
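
An illustrative reconstruction of the cycle (not Spark's actual code): the log accessor triggers initialization, initialization logs a message before it can finish, and under log4j-over-slf4j the appender check can never succeed, so the calls never bottom out:

    import org.apache.log4j.LogManager
    import org.slf4j.LoggerFactory

    trait RecursiveLogging {
      private var initialized = false

      def log: org.slf4j.Logger = {
        if (!initialized) initializeLogging()  // re-entered on every call
        LoggerFactory.getLogger(getClass)
      }

      private def initializeLogging(): Unit = {
        // With log4j-over-slf4j this enumeration is always empty...
        if (!LogManager.getRootLogger.getAllAppenders.hasMoreElements) {
          // ...so we report it, which calls log, which calls
          // initializeLogging again: stack overflow.
          log.info("No appenders found, installing defaults")
        }
        initialized = true  // never reached before the recursion blows up
      }
    }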

As for our situation, we exclude the slf4j-log4j12 dependency when we
import the Spark library (because we don't want to use log4j) and have
log4j-over-slf4j already in place to ensure that all of the logging in the
overall application runs through slf4j and then out through logback.  (We
also, as another poster already mentioned, force jcl and jul through
slf4j.)
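
For the jul half of that, the standard slf4j bridge is installed once at startup; jcl needs no code, just jcl-over-slf4j on the classpath. A minimal sketch using the real slf4j bridge API:

    import org.slf4j.bridge.SLF4JBridgeHandler

    object JulBridge {
      // Requires jul-to-slf4j on the classpath; call once at startup.
      def install(): Unit = {
        SLF4JBridgeHandler.removeHandlersForRootLogger()
        SLF4JBridgeHandler.install()
      }
    }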

The zen of slf4j for libraries is that the library uses the slf4j API and
then the enclosing application can route logging as it sees fit.  Spark
master CLI would log via slf4j and include the slf4j-log4j12 backend; same
for Spark worker CLI.  Spark as a library (versus as a container) would not
include any backend to the slf4j API and leave this up to the application.
 (FWIW, this would also avoid your log4j warning message.)

But as I was saying before, I'd be happy with a situation where I can avoid
log4j being enabled or configured, and I think you'll find an existing
choice of logging framework to be a common scenario for those embedding
Spark in other systems.

Best.
-- Paul

—
prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/


On Fri, Feb 7, 2014 at 3:01 PM, Patrick Wendell <pw...@gmail.com> wrote:

> Paul,
>
> Looking back at your problem. I think it's the one here:
> http://www.slf4j.org/codes.html#log4jDelegationLoop
>
> So let me just be clear what you are doing so I understand. You have
> some other application that directly calls log4j. So you have to
> include log4j-over-slf4j to route those logs through slf4j to logback.
>
> At the same time you embed Spark in this application. In the past it
> was fine, but now that Spark programmatically initializes log4j, it
> screws up your application because log4j-over-slf4j doesn't work with
> applications that do this explicitly, as discussed here:
> http://www.slf4j.org/legacy.html
>
> Correct?
>
> - Patrick
> >> prb@mult.ifario.us>
> >> >>> >>> wrote:
> >> >>> >>> > >> We have a few applications that embed Spark, and in 0.8.0
> and
> >> >>> 0.8.1,
> >> >>> >>> we
> >> >>> >>> > >> were able to use slf4j, but 0.9.0 broke that and
> >> unintentionally
> >> >>> >>> forces
> >> >>> >>> > >> direct use of log4j as the logging backend.
> >> >>> >>> > >>
> >> >>> >>> > >> The issue is here in the org.apache.spark.Logging trait:
> >> >>> >>> > >>
> >> >>> >>> > >>
> >> >>> >>> >
> >> >>> >>>
> >> >>>
> >>
> https://github.com/apache/incubator-spark/blame/master/core/src/main/scala/org/apache/spark/Logging.scala#L107
> >> >>> >>> > >>
> >> >>> >>> > >> log4j-over-slf4j *always* returns an empty enumeration for
> >> >>> appenders
> >> >>> >>> to
> >> >>> >>> > the
> >> >>> >>> > >> ROOT logger:
> >> >>> >>> > >>
> >> >>> >>> > >>
> >> >>> >>> >
> >> >>> >>>
> >> >>>
> >>
> https://github.com/qos-ch/slf4j/blob/master/log4j-over-slf4j/src/main/java/org/apache/log4j/Category.java?source=c#L81
> >> >>> >>> > >>
> >> >>> >>> > >> And this causes an infinite loop and an eventual stack
> >> overflow.
> >> >>> >>> > >>
> >> >>> >>> > >> I'm happy to submit a Jira and a patch, but it would be
> >> >>> significant
> >> >>> >>> > enough
> >> >>> >>> > >> reversal of recent changes that it's probably worth
> discussing
> >> >>> before
> >> >>> >>> I
> >> >>> >>> > >> sink a half hour into it.  My suggestion would be that
> >> >>> initialization
> >> >>> >>> > (or
> >> >>> >>> > >> not) should be left to the user with reasonable default
> >> behavior
> >> >>> >>> > supplied
> >> >>> >>> > >> by the spark commandline tooling and not forced on
> >> applications
> >> >>> that
> >> >>> >>> > >> incorporate Spark.
> >> >>> >>> > >>
> >> >>> >>> > >> Thoughts/opinions?
> >> >>> >>> > >>
> >> >>> >>> > >> -- Paul
> >> >>> >>> > >> --
> >> >>> >>> > >> prb@mult.ifario.us | Multifarious, Inc. |
> >> http://mult.ifario.us/
> >> >>> >>> >
> >> >>> >>>
> >> >>>
> >> >>
> >> >>
> >>
>

Re: 0.9.0 forces log4j usage

Posted by Patrick Wendell <pw...@gmail.com>.
Paul,

Looking back at your problem, I think it's the one described here:
http://www.slf4j.org/codes.html#log4jDelegationLoop

So let me just make sure I understand what you are doing. You have
some other application that directly calls log4j. So you have to
include log4j-over-slf4j to route those logs through slf4j to logback.

At the same time, you embed Spark in this application. In the past it
was fine, but now that Spark programmatically initializes log4j, it
screws up your application, because log4j-over-slf4j doesn't work with
applications that initialize log4j explicitly, as discussed here:
http://www.slf4j.org/legacy.html
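
(For concreteness, the initialization in question looks roughly like this --
paraphrased from the Logging.scala line linked earlier in the thread, not
quoted verbatim:)

    import org.apache.log4j.LogManager

    // "is log4j initialized?" == "does the ROOT logger have any appenders?"
    val log4jInitialized = LogManager.getRootLogger.getAllAppenders.hasMoreElements
    if (!log4jInitialized) {
      // load Spark's bundled log4j defaults via PropertyConfigurator, then log a note
    }
    // Under log4j-over-slf4j, getAllAppenders always returns an empty
    // enumeration, so this branch always fires -- which is what blows up
    // in Paul's setup.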

Correct?

- Patrick

On Fri, Feb 7, 2014 at 2:02 PM, Koert Kuipers <ko...@tresata.com> wrote:
> got it. that sounds reasonable
>
>
> On Fri, Feb 7, 2014 at 2:31 PM, Patrick Wendell <pw...@gmail.com> wrote:
>
>> Koert - my suggestion was this. We let users use any slf4j backend
>> they want. If we detect that they are using the log4j backend and
>> *also* they didn't configure any log4j appenders, we set up some nice
>> defaults for them. If they are using another backend, Spark doesn't
>> try to modify the configuration at all.
>>
>> On Fri, Feb 7, 2014 at 11:14 AM, Koert Kuipers <ko...@tresata.com> wrote:
>> > well "static binding" is probably the wrong terminology but you get the
>> > idea. multiple backends are not allowed and cause an even uglier
>> warning...
>> >
>> > see also here:
>> > https://github.com/twitter/scalding/pull/636
>> > and here:
>> > https://groups.google.com/forum/#!topic/cascading-user/vYvnnN_15ls
>> > all me being annoying and complaining about slf4j-log4j12 dependencies
>> > (which did get removed).
>> >
>> >
>> > On Fri, Feb 7, 2014 at 2:09 PM, Koert Kuipers <ko...@tresata.com> wrote:
>> >
>> >> the issue is that slf4j uses static binding. you can put only one slf4j
>> >> backend on the classpath, and that's what it uses. more than one is not
>> >> allowed.
>> >>
>> >> so you either keep the slf4j-log4j12 dependency for spark, and then you
>> >> took away people's choice of slf4j backend which is considered bad form
>> for
>> >> a library, or you do not include it and then people will always get the
>> big
>> >> fat ugly warning and slf4j logging will not flow to log4j.
>> >>
>> >> including log4j itself is not necessary a problem i think?
>> >>
>> >>
>> >> On Fri, Feb 7, 2014 at 1:11 PM, Patrick Wendell <pwendell@gmail.com
>> >wrote:
>> >>
>> >>> This also seems relevant - but not my area of expertise (whether this
>> >>> is a valid way to check this).
>> >>>
>> >>>
>> >>>
>> http://stackoverflow.com/questions/10505418/how-to-find-which-library-slf4j-has-bound-itself-to
>> >>>
>> >>> On Fri, Feb 7, 2014 at 10:08 AM, Patrick Wendell <pw...@gmail.com>
>> >>> wrote:
>> >>> > Hey Guys,
>> >>> >
>> >>> > Thanks for explainning. Ya this is a problem - we didn't really know
>> >>> > that people are using other slf4j backends, slf4j is in there for
>> >>> > historical reasons but I think we may assume in a few places that
>> >>> > log4j is being used and we should minimize those.
>> >>> >
>> >>> > We should patch this and get a fix into 0.9.1. So some solutions I
>> see
>> >>> are:
>> >>> >
>> >>> > (a) Add SparkConf option to disable this. I'm fine with this one.
>> >>> >
>> >>> > (b) Ask slf4j which backend is active and only try to enforce this
>> >>> > default if we know slf4j is using log4j. Do either of you know if
>> this
>> >>> > is possible? Not sure if slf4j exposes this.
>> >>> >
>> >>> > (c) Just remove this default stuff. We'd rather not do this. The goal
>> >>> > of this thing is to provide good usability for people who have linked
>> >>> > against Spark and haven't done anything to configure logging. For
>> >>> > beginners we try to minimize the assumptions about what else they
>> know
>> >>> > about, and I've found log4j configuration is a huge mental barrier
>> for
>> >>> > people who are getting started.
>> >>> >
>> >>> > Paul if you submit a patch doing (a) we can merge it in. If you have
>> >>> > any idea if (b) is possible I prefer that one, but it may not be
>> >>> > possible or might be brittle.
>> >>> >
>> >>> > - Patrick
>> >>> >
>> >>> > On Fri, Feb 7, 2014 at 6:36 AM, Koert Kuipers <ko...@tresata.com>
>> >>> wrote:
>> >>> >> Totally agree with Paul: a library should not pick the slf4j
>> backend.
>> >>> It
>> >>> >> defeats the purpose of slf4j. That big ugly warning is there to
>> alert
>> >>> >> people that its their responsibility to pick the back end...
>> >>> >> On Feb 7, 2014 3:55 AM, "Paul Brown" <pr...@mult.ifario.us> wrote:
>> >>> >>
>> >>> >>> Hi, Patrick --
>> >>> >>>
>> >>> >>> From slf4j, you can either backend it into log4j (which is the way
>> >>> that
>> >>> >>> Spark is shipped) or you can route log4j through slf4j and then on
>> to
>> >>> a
>> >>> >>> different backend (e.g., logback).  We're doing the latter and
>> >>> manipulating
>> >>> >>> the dependencies in the build because that's the way the enclosing
>> >>> >>> application is set up.
>> >>> >>>
>> >>> >>> The issue with the current situation is that there's no way for an
>> >>> end user
>> >>> >>> to choose to *not* use the log4j backend.  (My short-term solution
>> >>> was to
>> >>> >>> use the Maven shade plugin to swap in a version of the Logging
>> trait
>> >>> with
>> >>> >>> the body of that method commented out.)  In addition to the
>> situation
>> >>> with
>> >>> >>> log4j-over-slf4j and the empty enumeration of ROOT appenders, you
>> >>> might
>> >>> >>> also run afoul of someone who intentionally configured log4j with
>> an
>> >>> empty
>> >>> >>> set of appenders at the time that Spark is initializing.
>> >>> >>>
>> >>> >>> I'd be happy with any implementation that lets me choose my logging
>> >>> >>> backend: override default behavior via system property, plug-in
>> >>> >>> architecture, etc.  I do think it's reasonable to expect someone
>> >>> digesting
>> >>> >>> a substantial JDK-based system like Spark to understand how to
>> >>> initialize
>> >>> >>> logging -- surely they're using logging of some kind elsewhere in
>> >>> their
>> >>> >>> application -- but if you want the default behavior there as a
>> >>> courtesy, it
>> >>> >>> might be worth putting an INFO (versus a the glaring log4j WARN)
>> >>> message on
>> >>> >>> the output that says something like "Initialized default logging
>> via
>> >>> Log4J;
>> >>> >>> pass -Dspark.logging.loadDefaultLogger=false to disable this
>> >>> behavior." so
>> >>> >>> that it's both convenient and explicit.
>> >>> >>>
>> >>> >>> Cheers.
>> >>> >>> -- Paul
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>> --
>> >>> >>> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>> >>> >>>
>> >>> >>>
>> >>> >>> On Fri, Feb 7, 2014 at 12:05 AM, Patrick Wendell <
>> pwendell@gmail.com>
>> >>> >>> wrote:
>> >>> >>>
>> >>> >>> > A config option e.g. could just be to add:
>> >>> >>> >
>> >>> >>> > spark.logging.loadDefaultLogger (default true)
>> >>> >>> > If set to true, Spark will try to initialize a log4j logger if
>> none
>> >>> is
>> >>> >>> > detected. Otherwise Spark will not modify logging behavior.
>> >>> >>> >
>> >>> >>> > Then users could just set this to false if they have a logging
>> >>> set-up
>> >>> >>> > that conflicts with this.
>> >>> >>> >
>> >>> >>> > Maybe there is a nicer fix...
>> >>> >>> >
>> >>> >>> > On Fri, Feb 7, 2014 at 12:03 AM, Patrick Wendell <
>> >>> pwendell@gmail.com>
>> >>> >>> > wrote:
>> >>> >>> > > Hey Paul,
>> >>> >>> > >
>> >>> >>> > > Thanks for digging this up. I worked on this feature and the
>> >>> intent
>> >>> >>> > > was to give users good default behavior if they didn't include
>> any
>> >>> >>> > > logging configuration on the classpath.
>> >>> >>> > >
>> >>> >>> > > The problem with assuming that CL tooling is going to fix the
>> job
>> >>> is
>> >>> >>> > > that many people link against spark as a library and run their
>> >>> >>> > > application using their own scripts. In this case the first
>> thing
>> >>> >>> > > people see when they run an application that links against
>> Spark
>> >>> was a
>> >>> >>> > > big ugly logging warning.
>> >>> >>> > >
>> >>> >>> > > I'm not super familiar with log4j-over-slf4j, but this
>> behavior of
>> >>> >>> > > returning null for the appenders seems a little weird. What is
>> >>> the use
>> >>> >>> > > case for using this and not just directly use slf4j-log4j12
>> like
>> >>> Spark
>> >>> >>> > > itself does?
>> >>> >>> > >
>> >>> >>> > > Did you have a more general fix for this in mind? Or was your
>> >>> plan to
>> >>> >>> > > just revert the existing behavior... We might be able to add a
>> >>> >>> > > configuration option to disable this logging default stuff. Or
>> we
>> >>> >>> > > could just rip it out - but I'd like to avoid that if possible.
>> >>> >>> > >
>> >>> >>> > > - Patrick
>> >>> >>> > >
>> >>> >>> > > On Thu, Feb 6, 2014 at 11:41 PM, Paul Brown <
>> prb@mult.ifario.us>
>> >>> >>> wrote:
>> >>> >>> > >> We have a few applications that embed Spark, and in 0.8.0 and
>> >>> 0.8.1,
>> >>> >>> we
>> >>> >>> > >> were able to use slf4j, but 0.9.0 broke that and
>> unintentionally
>> >>> >>> forces
>> >>> >>> > >> direct use of log4j as the logging backend.
>> >>> >>> > >>
>> >>> >>> > >> The issue is here in the org.apache.spark.Logging trait:
>> >>> >>> > >>
>> >>> >>> > >>
>> >>> >>> >
>> >>> >>>
>> >>>
>> https://github.com/apache/incubator-spark/blame/master/core/src/main/scala/org/apache/spark/Logging.scala#L107
>> >>> >>> > >>
>> >>> >>> > >> log4j-over-slf4j *always* returns an empty enumeration for
>> >>> appenders
>> >>> >>> to
>> >>> >>> > the
>> >>> >>> > >> ROOT logger:
>> >>> >>> > >>
>> >>> >>> > >>
>> >>> >>> >
>> >>> >>>
>> >>>
>> https://github.com/qos-ch/slf4j/blob/master/log4j-over-slf4j/src/main/java/org/apache/log4j/Category.java?source=c#L81
>> >>> >>> > >>
>> >>> >>> > >> And this causes an infinite loop and an eventual stack
>> overflow.
>> >>> >>> > >>
>> >>> >>> > >> I'm happy to submit a Jira and a patch, but it would be
>> >>> significant
>> >>> >>> > enough
>> >>> >>> > >> reversal of recent changes that it's probably worth discussing
>> >>> before
>> >>> >>> I
>> >>> >>> > >> sink a half hour into it.  My suggestion would be that
>> >>> initialization
>> >>> >>> > (or
>> >>> >>> > >> not) should be left to the user with reasonable default
>> behavior
>> >>> >>> > supplied
>> >>> >>> > >> by the spark commandline tooling and not forced on
>> applications
>> >>> that
>> >>> >>> > >> incorporate Spark.
>> >>> >>> > >>
>> >>> >>> > >> Thoughts/opinions?
>> >>> >>> > >>
>> >>> >>> > >> -- Paul
>> >>> >>> > >> --
>> >>> >>> > >> prb@mult.ifario.us | Multifarious, Inc. |
>> http://mult.ifario.us/
>> >>> >>> >
>> >>> >>>
>> >>>
>> >>
>> >>
>>

Re: 0.9.0 forces log4j usage

Posted by Koert Kuipers <ko...@tresata.com>.
got it. that sounds reasonable


On Fri, Feb 7, 2014 at 2:31 PM, Patrick Wendell <pw...@gmail.com> wrote:

> Koert - my suggestion was this. We let users use any slf4j backend
> they want. If we detect that they are using the log4j backend and
> *also* they didn't configure any log4j appenders, we set up some nice
> defaults for them. If they are using another backend, Spark doesn't
> try to modify the configuration at all.
>
> On Fri, Feb 7, 2014 at 11:14 AM, Koert Kuipers <ko...@tresata.com> wrote:
> > well "static binding" is probably the wrong terminology but you get the
> > idea. multiple backends are not allowed and cause an even uglier
> warning...
> >
> > see also here:
> > https://github.com/twitter/scalding/pull/636
> > and here:
> > https://groups.google.com/forum/#!topic/cascading-user/vYvnnN_15ls
> > all me being annoying and complaining about slf4j-log4j12 dependencies
> > (which did get removed).
> >
> >
> > On Fri, Feb 7, 2014 at 2:09 PM, Koert Kuipers <ko...@tresata.com> wrote:
> >
> >> the issue is that slf4j uses static binding. you can put only one slf4j
> >> backend on the classpath, and that's what it uses. more than one is not
> >> allowed.
> >>
> >> so you either keep the slf4j-log4j12 dependency for spark, and then you
> >> took away people's choice of slf4j backend which is considered bad form
> for
> >> a library, or you do not include it and then people will always get the
> big
> >> fat ugly warning and slf4j logging will not flow to log4j.
> >>
> >> including log4j itself is not necessary a problem i think?
> >>
> >>
> >> On Fri, Feb 7, 2014 at 1:11 PM, Patrick Wendell <pwendell@gmail.com
> >wrote:
> >>
> >>> This also seems relevant - but not my area of expertise (whether this
> >>> is a valid way to check this).
> >>>
> >>>
> >>>
> http://stackoverflow.com/questions/10505418/how-to-find-which-library-slf4j-has-bound-itself-to
> >>>
> >>> On Fri, Feb 7, 2014 at 10:08 AM, Patrick Wendell <pw...@gmail.com>
> >>> wrote:
> >>> > Hey Guys,
> >>> >
> >>> > Thanks for explainning. Ya this is a problem - we didn't really know
> >>> > that people are using other slf4j backends, slf4j is in there for
> >>> > historical reasons but I think we may assume in a few places that
> >>> > log4j is being used and we should minimize those.
> >>> >
> >>> > We should patch this and get a fix into 0.9.1. So some solutions I
> see
> >>> are:
> >>> >
> >>> > (a) Add SparkConf option to disable this. I'm fine with this one.
> >>> >
> >>> > (b) Ask slf4j which backend is active and only try to enforce this
> >>> > default if we know slf4j is using log4j. Do either of you know if
> this
> >>> > is possible? Not sure if slf4j exposes this.
> >>> >
> >>> > (c) Just remove this default stuff. We'd rather not do this. The goal
> >>> > of this thing is to provide good usability for people who have linked
> >>> > against Spark and haven't done anything to configure logging. For
> >>> > beginners we try to minimize the assumptions about what else they
> know
> >>> > about, and I've found log4j configuration is a huge mental barrier
> for
> >>> > people who are getting started.
> >>> >
> >>> > Paul if you submit a patch doing (a) we can merge it in. If you have
> >>> > any idea if (b) is possible I prefer that one, but it may not be
> >>> > possible or might be brittle.
> >>> >
> >>> > - Patrick
> >>> >
> >>> > On Fri, Feb 7, 2014 at 6:36 AM, Koert Kuipers <ko...@tresata.com>
> >>> wrote:
> >>> >> Totally agree with Paul: a library should not pick the slf4j
> backend.
> >>> It
> >>> >> defeats the purpose of slf4j. That big ugly warning is there to
> alert
> >>> >> people that its their responsibility to pick the back end...
> >>> >> On Feb 7, 2014 3:55 AM, "Paul Brown" <pr...@mult.ifario.us> wrote:
> >>> >>
> >>> >>> Hi, Patrick --
> >>> >>>
> >>> >>> From slf4j, you can either backend it into log4j (which is the way
> >>> that
> >>> >>> Spark is shipped) or you can route log4j through slf4j and then on
> to
> >>> a
> >>> >>> different backend (e.g., logback).  We're doing the latter and
> >>> manipulating
> >>> >>> the dependencies in the build because that's the way the enclosing
> >>> >>> application is set up.
> >>> >>>
> >>> >>> The issue with the current situation is that there's no way for an
> >>> end user
> >>> >>> to choose to *not* use the log4j backend.  (My short-term solution
> >>> was to
> >>> >>> use the Maven shade plugin to swap in a version of the Logging
> trait
> >>> with
> >>> >>> the body of that method commented out.)  In addition to the
> situation
> >>> with
> >>> >>> log4j-over-slf4j and the empty enumeration of ROOT appenders, you
> >>> might
> >>> >>> also run afoul of someone who intentionally configured log4j with
> an
> >>> empty
> >>> >>> set of appenders at the time that Spark is initializing.
> >>> >>>
> >>> >>> I'd be happy with any implementation that lets me choose my logging
> >>> >>> backend: override default behavior via system property, plug-in
> >>> >>> architecture, etc.  I do think it's reasonable to expect someone
> >>> digesting
> >>> >>> a substantial JDK-based system like Spark to understand how to
> >>> initialize
> >>> >>> logging -- surely they're using logging of some kind elsewhere in
> >>> their
> >>> >>> application -- but if you want the default behavior there as a
> >>> courtesy, it
> >>> >>> might be worth putting an INFO (versus a the glaring log4j WARN)
> >>> message on
> >>> >>> the output that says something like "Initialized default logging
> via
> >>> Log4J;
> >>> >>> pass -Dspark.logging.loadDefaultLogger=false to disable this
> >>> behavior." so
> >>> >>> that it's both convenient and explicit.
> >>> >>>
> >>> >>> Cheers.
> >>> >>> -- Paul
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>> --
> >>> >>> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
> >>> >>>
> >>> >>>
> >>> >>> On Fri, Feb 7, 2014 at 12:05 AM, Patrick Wendell <
> pwendell@gmail.com>
> >>> >>> wrote:
> >>> >>>
> >>> >>> > A config option e.g. could just be to add:
> >>> >>> >
> >>> >>> > spark.logging.loadDefaultLogger (default true)
> >>> >>> > If set to true, Spark will try to initialize a log4j logger if
> none
> >>> is
> >>> >>> > detected. Otherwise Spark will not modify logging behavior.
> >>> >>> >
> >>> >>> > Then users could just set this to false if they have a logging
> >>> set-up
> >>> >>> > that conflicts with this.
> >>> >>> >
> >>> >>> > Maybe there is a nicer fix...
> >>> >>> >
> >>> >>> > On Fri, Feb 7, 2014 at 12:03 AM, Patrick Wendell <
> >>> pwendell@gmail.com>
> >>> >>> > wrote:
> >>> >>> > > Hey Paul,
> >>> >>> > >
> >>> >>> > > Thanks for digging this up. I worked on this feature and the
> >>> intent
> >>> >>> > > was to give users good default behavior if they didn't include
> any
> >>> >>> > > logging configuration on the classpath.
> >>> >>> > >
> >>> >>> > > The problem with assuming that CL tooling is going to fix the
> job
> >>> is
> >>> >>> > > that many people link against spark as a library and run their
> >>> >>> > > application using their own scripts. In this case the first
> thing
> >>> >>> > > people see when they run an application that links against
> Spark
> >>> was a
> >>> >>> > > big ugly logging warning.
> >>> >>> > >
> >>> >>> > > I'm not super familiar with log4j-over-slf4j, but this
> behavior of
> >>> >>> > > returning null for the appenders seems a little weird. What is
> >>> the use
> >>> >>> > > case for using this and not just directly use slf4j-log4j12
> like
> >>> Spark
> >>> >>> > > itself does?
> >>> >>> > >
> >>> >>> > > Did you have a more general fix for this in mind? Or was your
> >>> plan to
> >>> >>> > > just revert the existing behavior... We might be able to add a
> >>> >>> > > configuration option to disable this logging default stuff. Or
> we
> >>> >>> > > could just rip it out - but I'd like to avoid that if possible.
> >>> >>> > >
> >>> >>> > > - Patrick
> >>> >>> > >
> >>> >>> > > On Thu, Feb 6, 2014 at 11:41 PM, Paul Brown <
> prb@mult.ifario.us>
> >>> >>> wrote:
> >>> >>> > >> We have a few applications that embed Spark, and in 0.8.0 and
> >>> 0.8.1,
> >>> >>> we
> >>> >>> > >> were able to use slf4j, but 0.9.0 broke that and
> unintentionally
> >>> >>> forces
> >>> >>> > >> direct use of log4j as the logging backend.
> >>> >>> > >>
> >>> >>> > >> The issue is here in the org.apache.spark.Logging trait:
> >>> >>> > >>
> >>> >>> > >>
> >>> >>> >
> >>> >>>
> >>>
> https://github.com/apache/incubator-spark/blame/master/core/src/main/scala/org/apache/spark/Logging.scala#L107
> >>> >>> > >>
> >>> >>> > >> log4j-over-slf4j *always* returns an empty enumeration for
> >>> appenders
> >>> >>> to
> >>> >>> > the
> >>> >>> > >> ROOT logger:
> >>> >>> > >>
> >>> >>> > >>
> >>> >>> >
> >>> >>>
> >>>
> https://github.com/qos-ch/slf4j/blob/master/log4j-over-slf4j/src/main/java/org/apache/log4j/Category.java?source=c#L81
> >>> >>> > >>
> >>> >>> > >> And this causes an infinite loop and an eventual stack
> overflow.
> >>> >>> > >>
> >>> >>> > >> I'm happy to submit a Jira and a patch, but it would be
> >>> significant
> >>> >>> > enough
> >>> >>> > >> reversal of recent changes that it's probably worth discussing
> >>> before
> >>> >>> I
> >>> >>> > >> sink a half hour into it.  My suggestion would be that
> >>> initialization
> >>> >>> > (or
> >>> >>> > >> not) should be left to the user with reasonable default
> behavior
> >>> >>> > supplied
> >>> >>> > >> by the spark commandline tooling and not forced on
> applications
> >>> that
> >>> >>> > >> incorporate Spark.
> >>> >>> > >>
> >>> >>> > >> Thoughts/opinions?
> >>> >>> > >>
> >>> >>> > >> -- Paul
> >>> >>> > >> --
> >>> >>> > >> prb@mult.ifario.us | Multifarious, Inc. |
> http://mult.ifario.us/
> >>> >>> >
> >>> >>>
> >>>
> >>
> >>
>

Re: 0.9.0 forces log4j usage

Posted by Sean Owen <so...@cloudera.com>.
As an slf4j user, FWIW, I think this approach is fine. Just note that
you will have to handle the log4j classes via reflection if they are
not always going to be on the user classpath.
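
(A minimal sketch of that reflective guard -- plain JDK reflection, nothing
Spark-specific assumed:)

    // Probe for log4j before touching it, so the class never loads otherwise.
    val log4jPresent =
      try { Class.forName("org.apache.log4j.LogManager"); true }
      catch { case _: ClassNotFoundException => false }
    if (log4jPresent) {
      // safe to reference org.apache.log4j.* from here (or keep using reflection)
    }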

Is it sufficient to bundle log4j.properties? No programmatic config
mess then. I don't know log4j well enough to say whether that
accomplishes the goal.

While we're on the topic, I see that additional slf4j shims are in
place to route java.util.logging calls through slf4j and therefore on
to log4j or whatever for centralized config. The same ought to be done
for commons-logging, no? And we should also exclude log4j and include
log4j-over-slf4j if we really want people to be able to use something
besides log4j.

I can easily whip up a PR if anyone thinks that's a good idea.


On Fri, Feb 7, 2014 at 2:31 PM, Patrick Wendell <pw...@gmail.com> wrote:
> Koert - my suggestion was this. We let users use any slf4j backend
> they want. If we detect that they are using the log4j backend and
> *also* they didn't configure any log4j appenders, we set up some nice
> defaults for them. If they are using another backend, Spark doesn't
> try to modify the configuration at all.
>
> On Fri, Feb 7, 2014 at 11:14 AM, Koert Kuipers <ko...@tresata.com> wrote:
>> well "static binding" is probably the wrong terminology but you get the
>> idea. multiple backends are not allowed and cause an even uglier warning...
>>
>> see also here:
>> https://github.com/twitter/scalding/pull/636
>> and here:
>> https://groups.google.com/forum/#!topic/cascading-user/vYvnnN_15ls
>> all me being annoying and complaining about slf4j-log4j12 dependencies
>> (which did get removed).
>>

Re: 0.9.0 forces log4j usage

Posted by Patrick Wendell <pw...@gmail.com>.
Koert - my suggestion was this. We let users use any slf4j backend
they want. If we detect that they are using the log4j backend and
*also* they didn't configure any log4j appenders, we set up some nice
defaults for them. If they are using another backend, Spark doesn't
try to modify the configuration at all.
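
(A rough sketch of that logic, for discussion -- not the actual patch. The
binding probe is the StaticLoggerBinder trick from the StackOverflow link
elsewhere in the thread, and defaultsUrl is a placeholder for Spark's
bundled config:)

    import org.slf4j.impl.StaticLoggerBinder
    import org.apache.log4j.{LogManager, PropertyConfigurator}

    val usingLog4j = StaticLoggerBinder.getSingleton.getLoggerFactoryClassStr
      .contains("Log4jLoggerFactory")  // slf4j-log4j12's factory class
    val noAppenders = !LogManager.getRootLogger.getAllAppenders.hasMoreElements
    if (usingLog4j && noAppenders) {
      PropertyConfigurator.configure(defaultsUrl)  // defaultsUrl: placeholder
    }
    // Any other binding (logback, log4j-over-slf4j routing, ...) is left alone.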

On Fri, Feb 7, 2014 at 11:14 AM, Koert Kuipers <ko...@tresata.com> wrote:
> well "static binding" is probably the wrong terminology but you get the
> idea. multiple backends are not allowed and cause an even uglier warning...
>
> see also here:
> https://github.com/twitter/scalding/pull/636
> and here:
> https://groups.google.com/forum/#!topic/cascading-user/vYvnnN_15ls
> all me being annoying and complaining about slf4j-log4j12 dependencies
> (which did get removed).
>
>
> On Fri, Feb 7, 2014 at 2:09 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> the issue is that slf4j uses static binding. you can put only one slf4j
>> backend on the classpath, and that's what it uses. more than one is not
>> allowed.
>>
>> so you either keep the slf4j-log4j12 dependency for spark, and then you
>> took away people's choice of slf4j backend which is considered bad form for
>> a library, or you do not include it and then people will always get the big
>> fat ugly warning and slf4j logging will not flow to log4j.
>>
>> including log4j itself is not necessary a problem i think?
>>
>>
>> On Fri, Feb 7, 2014 at 1:11 PM, Patrick Wendell <pw...@gmail.com>wrote:
>>
>>> This also seems relevant - but not my area of expertise (whether this
>>> is a valid way to check this).
>>>
>>>
>>> http://stackoverflow.com/questions/10505418/how-to-find-which-library-slf4j-has-bound-itself-to
>>>
>>> On Fri, Feb 7, 2014 at 10:08 AM, Patrick Wendell <pw...@gmail.com>
>>> wrote:
>>> > Hey Guys,
>>> >
>>> > Thanks for explainning. Ya this is a problem - we didn't really know
>>> > that people are using other slf4j backends, slf4j is in there for
>>> > historical reasons but I think we may assume in a few places that
>>> > log4j is being used and we should minimize those.
>>> >
>>> > We should patch this and get a fix into 0.9.1. So some solutions I see
>>> are:
>>> >
>>> > (a) Add SparkConf option to disable this. I'm fine with this one.
>>> >
>>> > (b) Ask slf4j which backend is active and only try to enforce this
>>> > default if we know slf4j is using log4j. Do either of you know if this
>>> > is possible? Not sure if slf4j exposes this.
>>> >
>>> > (c) Just remove this default stuff. We'd rather not do this. The goal
>>> > of this thing is to provide good usability for people who have linked
>>> > against Spark and haven't done anything to configure logging. For
>>> > beginners we try to minimize the assumptions about what else they know
>>> > about, and I've found log4j configuration is a huge mental barrier for
>>> > people who are getting started.
>>> >
>>> > Paul if you submit a patch doing (a) we can merge it in. If you have
>>> > any idea if (b) is possible I prefer that one, but it may not be
>>> > possible or might be brittle.
>>> >
>>> > - Patrick
>>> >
>>> > On Fri, Feb 7, 2014 at 6:36 AM, Koert Kuipers <ko...@tresata.com>
>>> wrote:
>>> >> Totally agree with Paul: a library should not pick the slf4j backend.
>>> It
>>> >> defeats the purpose of slf4j. That big ugly warning is there to alert
>>> >> people that its their responsibility to pick the back end...
>>> >> On Feb 7, 2014 3:55 AM, "Paul Brown" <pr...@mult.ifario.us> wrote:
>>> >>
>>> >>> Hi, Patrick --
>>> >>>
>>> >>> From slf4j, you can either backend it into log4j (which is the way
>>> that
>>> >>> Spark is shipped) or you can route log4j through slf4j and then on to
>>> a
>>> >>> different backend (e.g., logback).  We're doing the latter and
>>> manipulating
>>> >>> the dependencies in the build because that's the way the enclosing
>>> >>> application is set up.
>>> >>>
>>> >>> The issue with the current situation is that there's no way for an
>>> end user
>>> >>> to choose to *not* use the log4j backend.  (My short-term solution
>>> was to
>>> >>> use the Maven shade plugin to swap in a version of the Logging trait
>>> with
>>> >>> the body of that method commented out.)  In addition to the situation
>>> with
>>> >>> log4j-over-slf4j and the empty enumeration of ROOT appenders, you
>>> might
>>> >>> also run afoul of someone who intentionally configured log4j with an
>>> empty
>>> >>> set of appenders at the time that Spark is initializing.
>>> >>>
>>> >>> I'd be happy with any implementation that lets me choose my logging
>>> >>> backend: override default behavior via system property, plug-in
>>> >>> architecture, etc.  I do think it's reasonable to expect someone
>>> digesting
>>> >>> a substantial JDK-based system like Spark to understand how to
>>> initialize
>>> >>> logging -- surely they're using logging of some kind elsewhere in
>>> their
>>> >>> application -- but if you want the default behavior there as a
>>> courtesy, it
>>> >>> might be worth putting an INFO (versus a the glaring log4j WARN)
>>> message on
>>> >>> the output that says something like "Initialized default logging via
>>> Log4J;
>>> >>> pass -Dspark.logging.loadDefaultLogger=false to disable this
>>> behavior." so
>>> >>> that it's both convenient and explicit.
>>> >>>
>>> >>> Cheers.
>>> >>> -- Paul
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>>> >>>
>>> >>>
>>> >>> On Fri, Feb 7, 2014 at 12:05 AM, Patrick Wendell <pw...@gmail.com>
>>> >>> wrote:
>>> >>>
>>> >>> > A config option e.g. could just be to add:
>>> >>> >
>>> >>> > spark.logging.loadDefaultLogger (default true)
>>> >>> > If set to true, Spark will try to initialize a log4j logger if none
>>> is
>>> >>> > detected. Otherwise Spark will not modify logging behavior.
>>> >>> >
>>> >>> > Then users could just set this to false if they have a logging
>>> set-up
>>> >>> > that conflicts with this.
>>> >>> >
>>> >>> > Maybe there is a nicer fix...
>>> >>> >
>>> >>> > On Fri, Feb 7, 2014 at 12:03 AM, Patrick Wendell <
>>> pwendell@gmail.com>
>>> >>> > wrote:
>>> >>> > > Hey Paul,
>>> >>> > >
>>> >>> > > Thanks for digging this up. I worked on this feature and the
>>> intent
>>> >>> > > was to give users good default behavior if they didn't include any
>>> >>> > > logging configuration on the classpath.
>>> >>> > >
>>> >>> > > The problem with assuming that CL tooling is going to fix the job
>>> is
>>> >>> > > that many people link against spark as a library and run their
>>> >>> > > application using their own scripts. In this case the first thing
>>> >>> > > people see when they run an application that links against Spark
>>> was a
>>> >>> > > big ugly logging warning.
>>> >>> > >
>>> >>> > > I'm not super familiar with log4j-over-slf4j, but this behavior of
>>> >>> > > returning null for the appenders seems a little weird. What is
>>> the use
>>> >>> > > case for using this and not just directly use slf4j-log4j12 like
>>> Spark
>>> >>> > > itself does?
>>> >>> > >
>>> >>> > > Did you have a more general fix for this in mind? Or was your
>>> plan to
>>> >>> > > just revert the existing behavior... We might be able to add a
>>> >>> > > configuration option to disable this logging default stuff. Or we
>>> >>> > > could just rip it out - but I'd like to avoid that if possible.
>>> >>> > >
>>> >>> > > - Patrick
>>> >>> > >
>>> >>> > > On Thu, Feb 6, 2014 at 11:41 PM, Paul Brown <pr...@mult.ifario.us>
>>> >>> wrote:
>>> >>> > >> We have a few applications that embed Spark, and in 0.8.0 and
>>> 0.8.1,
>>> >>> we
>>> >>> > >> were able to use slf4j, but 0.9.0 broke that and unintentionally
>>> >>> forces
>>> >>> > >> direct use of log4j as the logging backend.
>>> >>> > >>
>>> >>> > >> The issue is here in the org.apache.spark.Logging trait:
>>> >>> > >>
>>> >>> > >>
>>> >>> >
>>> >>>
>>> https://github.com/apache/incubator-spark/blame/master/core/src/main/scala/org/apache/spark/Logging.scala#L107
>>> >>> > >>
>>> >>> > >> log4j-over-slf4j *always* returns an empty enumeration for
>>> appenders
>>> >>> to
>>> >>> > the
>>> >>> > >> ROOT logger:
>>> >>> > >>
>>> >>> > >>
>>> >>> >
>>> >>>
>>> https://github.com/qos-ch/slf4j/blob/master/log4j-over-slf4j/src/main/java/org/apache/log4j/Category.java?source=c#L81
>>> >>> > >>
>>> >>> > >> And this causes an infinite loop and an eventual stack overflow.
>>> >>> > >>
>>> >>> > >> I'm happy to submit a Jira and a patch, but it would be
>>> significant
>>> >>> > enough
>>> >>> > >> reversal of recent changes that it's probably worth discussing
>>> before
>>> >>> I
>>> >>> > >> sink a half hour into it.  My suggestion would be that
>>> initialization
>>> >>> > (or
>>> >>> > >> not) should be left to the user with reasonable default behavior
>>> >>> > supplied
>>> >>> > >> by the spark commandline tooling and not forced on applications
>>> that
>>> >>> > >> incorporate Spark.
>>> >>> > >>
>>> >>> > >> Thoughts/opinions?
>>> >>> > >>
>>> >>> > >> -- Paul
>>> >>> > >> --
>>> >>> > >> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>>> >>> >
>>> >>>
>>>
>>
>>

Re: 0.9.0 forces log4j usage

Posted by Koert Kuipers <ko...@tresata.com>.
well "static binding" is probably the wrong terminology but you get the
idea. multiple backends are not allowed and cause an even uglier warning...

see also here:
https://github.com/twitter/scalding/pull/636
and here:
https://groups.google.com/forum/#!topic/cascading-user/vYvnnN_15ls
that's all me being annoying and complaining about slf4j-log4j12 dependencies
(which did get removed).


On Fri, Feb 7, 2014 at 2:09 PM, Koert Kuipers <ko...@tresata.com> wrote:

> the issue is that slf4j uses static binding. you can put only one slf4j
> backend on the classpath, and that's what it uses. more than one is not
> allowed.
>
> so you either keep the slf4j-log4j12 dependency for spark, and then you
> took away people's choice of slf4j backend which is considered bad form for
> a library, or you do not include it and then people will always get the big
> fat ugly warning and slf4j logging will not flow to log4j.
>
> including log4j itself is not necessary a problem i think?
>
>
> On Fri, Feb 7, 2014 at 1:11 PM, Patrick Wendell <pw...@gmail.com>wrote:
>
>> This also seems relevant - but not my area of expertise (whether this
>> is a valid way to check this).
>>
>>
>> http://stackoverflow.com/questions/10505418/how-to-find-which-library-slf4j-has-bound-itself-to
>>
>> On Fri, Feb 7, 2014 at 10:08 AM, Patrick Wendell <pw...@gmail.com>
>> wrote:
>> > Hey Guys,
>> >
>> > Thanks for explainning. Ya this is a problem - we didn't really know
>> > that people are using other slf4j backends, slf4j is in there for
>> > historical reasons but I think we may assume in a few places that
>> > log4j is being used and we should minimize those.
>> >
>> > We should patch this and get a fix into 0.9.1. So some solutions I see
>> are:
>> >
>> > (a) Add SparkConf option to disable this. I'm fine with this one.
>> >
>> > (b) Ask slf4j which backend is active and only try to enforce this
>> > default if we know slf4j is using log4j. Do either of you know if this
>> > is possible? Not sure if slf4j exposes this.
>> >
>> > (c) Just remove this default stuff. We'd rather not do this. The goal
>> > of this thing is to provide good usability for people who have linked
>> > against Spark and haven't done anything to configure logging. For
>> > beginners we try to minimize the assumptions about what else they know
>> > about, and I've found log4j configuration is a huge mental barrier for
>> > people who are getting started.
>> >
>> > Paul if you submit a patch doing (a) we can merge it in. If you have
>> > any idea if (b) is possible I prefer that one, but it may not be
>> > possible or might be brittle.
>> >
>> > - Patrick
>> >
>> > On Fri, Feb 7, 2014 at 6:36 AM, Koert Kuipers <ko...@tresata.com>
>> wrote:
>> >> Totally agree with Paul: a library should not pick the slf4j backend.
>> It
>> >> defeats the purpose of slf4j. That big ugly warning is there to alert
>> >> people that its their responsibility to pick the back end...
>> >> On Feb 7, 2014 3:55 AM, "Paul Brown" <pr...@mult.ifario.us> wrote:
>> >>
>> >>> Hi, Patrick --
>> >>>
>> >>> From slf4j, you can either backend it into log4j (which is the way
>> that
>> >>> Spark is shipped) or you can route log4j through slf4j and then on to
>> a
>> >>> different backend (e.g., logback).  We're doing the latter and
>> manipulating
>> >>> the dependencies in the build because that's the way the enclosing
>> >>> application is set up.
>> >>>
>> >>> The issue with the current situation is that there's no way for an
>> end user
>> >>> to choose to *not* use the log4j backend.  (My short-term solution
>> was to
>> >>> use the Maven shade plugin to swap in a version of the Logging trait
>> with
>> >>> the body of that method commented out.)  In addition to the situation
>> with
>> >>> log4j-over-slf4j and the empty enumeration of ROOT appenders, you
>> might
>> >>> also run afoul of someone who intentionally configured log4j with an
>> empty
>> >>> set of appenders at the time that Spark is initializing.
>> >>>
>> >>> I'd be happy with any implementation that lets me choose my logging
>> >>> backend: override default behavior via system property, plug-in
>> >>> architecture, etc.  I do think it's reasonable to expect someone
>> digesting
>> >>> a substantial JDK-based system like Spark to understand how to
>> initialize
>> >>> logging -- surely they're using logging of some kind elsewhere in
>> their
>> >>> application -- but if you want the default behavior there as a
>> courtesy, it
>> >>> might be worth putting an INFO (versus a the glaring log4j WARN)
>> message on
>> >>> the output that says something like "Initialized default logging via
>> Log4J;
>> >>> pass -Dspark.logging.loadDefaultLogger=false to disable this
>> behavior." so
>> >>> that it's both convenient and explicit.
>> >>>
>> >>> Cheers.
>> >>> -- Paul
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>> >>>
>> >>>
>> >>> On Fri, Feb 7, 2014 at 12:05 AM, Patrick Wendell <pw...@gmail.com>
>> >>> wrote:
>> >>>
>> >>> > A config option e.g. could just be to add:
>> >>> >
>> >>> > spark.logging.loadDefaultLogger (default true)
>> >>> > If set to true, Spark will try to initialize a log4j logger if none
>> is
>> >>> > detected. Otherwise Spark will not modify logging behavior.
>> >>> >
>> >>> > Then users could just set this to false if they have a logging
>> set-up
>> >>> > that conflicts with this.
>> >>> >
>> >>> > Maybe there is a nicer fix...
>> >>> >
>> >>> > On Fri, Feb 7, 2014 at 12:03 AM, Patrick Wendell <
>> pwendell@gmail.com>
>> >>> > wrote:
>> >>> > > Hey Paul,
>> >>> > >
>> >>> > > Thanks for digging this up. I worked on this feature and the
>> intent
>> >>> > > was to give users good default behavior if they didn't include any
>> >>> > > logging configuration on the classpath.
>> >>> > >
>> >>> > > The problem with assuming that CL tooling is going to fix the job
>> is
>> >>> > > that many people link against spark as a library and run their
>> >>> > > application using their own scripts. In this case the first thing
>> >>> > > people see when they run an application that links against Spark
>> was a
>> >>> > > big ugly logging warning.
>> >>> > >
>> >>> > > I'm not super familiar with log4j-over-slf4j, but this behavior of
>> >>> > > returning null for the appenders seems a little weird. What is
>> the use
>> >>> > > case for using this and not just directly use slf4j-log4j12 like
>> Spark
>> >>> > > itself does?
>> >>> > >
>> >>> > > Did you have a more general fix for this in mind? Or was your
>> plan to
>> >>> > > just revert the existing behavior... We might be able to add a
>> >>> > > configuration option to disable this logging default stuff. Or we
>> >>> > > could just rip it out - but I'd like to avoid that if possible.
>> >>> > >
>> >>> > > - Patrick
>> >>> > >
>> >>> > > On Thu, Feb 6, 2014 at 11:41 PM, Paul Brown <pr...@mult.ifario.us>
>> >>> wrote:
>> >>> > >> We have a few applications that embed Spark, and in 0.8.0 and
>> 0.8.1,
>> >>> we
>> >>> > >> were able to use slf4j, but 0.9.0 broke that and unintentionally
>> >>> forces
>> >>> > >> direct use of log4j as the logging backend.
>> >>> > >>
>> >>> > >> The issue is here in the org.apache.spark.Logging trait:
>> >>> > >>
>> >>> > >>
>> >>> >
>> >>>
>> https://github.com/apache/incubator-spark/blame/master/core/src/main/scala/org/apache/spark/Logging.scala#L107
>> >>> > >>
>> >>> > >> log4j-over-slf4j *always* returns an empty enumeration for
>> appenders
>> >>> to
>> >>> > the
>> >>> > >> ROOT logger:
>> >>> > >>
>> >>> > >>
>> >>> >
>> >>>
>> https://github.com/qos-ch/slf4j/blob/master/log4j-over-slf4j/src/main/java/org/apache/log4j/Category.java?source=c#L81
>> >>> > >>
>> >>> > >> And this causes an infinite loop and an eventual stack overflow.
>> >>> > >>
>> >>> > >> I'm happy to submit a Jira and a patch, but it would be
>> significant
>> >>> > enough
>> >>> > >> reversal of recent changes that it's probably worth discussing
>> before
>> >>> I
>> >>> > >> sink a half hour into it.  My suggestion would be that
>> initialization
>> >>> > (or
>> >>> > >> not) should be left to the user with reasonable default behavior
>> >>> > supplied
>> >>> > >> by the spark commandline tooling and not forced on applications
>> that
>> >>> > >> incorporate Spark.
>> >>> > >>
>> >>> > >> Thoughts/opinions?
>> >>> > >>
>> >>> > >> -- Paul
>> >>> > >> --
>> >>> > >> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>> >>> >
>> >>>
>>
>
>

Re: 0.9.0 forces log4j usage

Posted by Koert Kuipers <ko...@tresata.com>.
the issue is that slf4j uses static binding. you can put only one slf4j
backend on the classpath, and that's what it uses. more than one is not
allowed.

so you either keep the slf4j-log4j12 dependency for spark, and then you
take away people's choice of slf4j backend, which is considered bad form
for a library; or you do not include it, and then people will always get
the big fat ugly warning and slf4j logging will not flow to log4j.

including log4j itself is not necessarily a problem, i think?
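
(fwiw, the exclusion route paul mentioned is a one-liner in sbt; maven gets
the same effect with an <exclusions> block. coordinates below are
illustrative:)

    // exclude spark's bundled slf4j-log4j12 so your own slf4j backend wins
    libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating" exclude("org.slf4j", "slf4j-log4j12")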


On Fri, Feb 7, 2014 at 1:11 PM, Patrick Wendell <pw...@gmail.com> wrote:

> This also seems relevant - but not my area of expertise (whether this
> is a valid way to check this).
>
>
> http://stackoverflow.com/questions/10505418/how-to-find-which-library-slf4j-has-bound-itself-to
>
> On Fri, Feb 7, 2014 at 10:08 AM, Patrick Wendell <pw...@gmail.com>
> wrote:
> > Hey Guys,
> >
> > Thanks for explainning. Ya this is a problem - we didn't really know
> > that people are using other slf4j backends, slf4j is in there for
> > historical reasons but I think we may assume in a few places that
> > log4j is being used and we should minimize those.
> >
> > We should patch this and get a fix into 0.9.1. So some solutions I see
> are:
> >
> > (a) Add SparkConf option to disable this. I'm fine with this one.
> >
> > (b) Ask slf4j which backend is active and only try to enforce this
> > default if we know slf4j is using log4j. Do either of you know if this
> > is possible? Not sure if slf4j exposes this.
> >
> > (c) Just remove this default stuff. We'd rather not do this. The goal
> > of this thing is to provide good usability for people who have linked
> > against Spark and haven't done anything to configure logging. For
> > beginners we try to minimize the assumptions about what else they know
> > about, and I've found log4j configuration is a huge mental barrier for
> > people who are getting started.
> >
> > Paul if you submit a patch doing (a) we can merge it in. If you have
> > any idea if (b) is possible I prefer that one, but it may not be
> > possible or might be brittle.
> >
> > - Patrick
> >
> > On Fri, Feb 7, 2014 at 6:36 AM, Koert Kuipers <ko...@tresata.com> wrote:
> >> Totally agree with Paul: a library should not pick the slf4j backend. It
> >> defeats the purpose of slf4j. That big ugly warning is there to alert
> >> people that its their responsibility to pick the back end...
> >> On Feb 7, 2014 3:55 AM, "Paul Brown" <pr...@mult.ifario.us> wrote:
> >>
> >>> Hi, Patrick --
> >>>
> >>> From slf4j, you can either backend it into log4j (which is the way that
> >>> Spark is shipped) or you can route log4j through slf4j and then on to a
> >>> different backend (e.g., logback).  We're doing the latter and
> manipulating
> >>> the dependencies in the build because that's the way the enclosing
> >>> application is set up.
> >>>
> >>> The issue with the current situation is that there's no way for an end
> user
> >>> to choose to *not* use the log4j backend.  (My short-term solution was
> to
> >>> use the Maven shade plugin to swap in a version of the Logging trait
> with
> >>> the body of that method commented out.)  In addition to the situation
> with
> >>> log4j-over-slf4j and the empty enumeration of ROOT appenders, you might
> >>> also run afoul of someone who intentionally configured log4j with an
> empty
> >>> set of appenders at the time that Spark is initializing.
> >>>
> >>> I'd be happy with any implementation that lets me choose my logging
> >>> backend: override default behavior via system property, plug-in
> >>> architecture, etc.  I do think it's reasonable to expect someone
> digesting
> >>> a substantial JDK-based system like Spark to understand how to
> initialize
> >>> logging -- surely they're using logging of some kind elsewhere in their
> >>> application -- but if you want the default behavior there as a
> courtesy, it
> >>> might be worth putting an INFO (versus a the glaring log4j WARN)
> message on
> >>> the output that says something like "Initialized default logging via
> Log4J;
> >>> pass -Dspark.logging.loadDefaultLogger=false to disable this
> behavior." so
> >>> that it's both convenient and explicit.
> >>>
> >>> Cheers.
> >>> -- Paul
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
> >>>
> >>>
> >>> On Fri, Feb 7, 2014 at 12:05 AM, Patrick Wendell <pw...@gmail.com>
> >>> wrote:
> >>>
> >>> > A config option e.g. could just be to add:
> >>> >
> >>> > spark.logging.loadDefaultLogger (default true)
> >>> > If set to true, Spark will try to initialize a log4j logger if none
> is
> >>> > detected. Otherwise Spark will not modify logging behavior.
> >>> >
> >>> > Then users could just set this to false if they have a logging set-up
> >>> > that conflicts with this.
> >>> >
> >>> > Maybe there is a nicer fix...
> >>> >
> >>> > On Fri, Feb 7, 2014 at 12:03 AM, Patrick Wendell <pwendell@gmail.com
> >
> >>> > wrote:
> >>> > > Hey Paul,
> >>> > >
> >>> > > Thanks for digging this up. I worked on this feature and the intent
> >>> > > was to give users good default behavior if they didn't include any
> >>> > > logging configuration on the classpath.
> >>> > >
> >>> > > The problem with assuming that CL tooling is going to fix the job
> is
> >>> > > that many people link against spark as a library and run their
> >>> > > application using their own scripts. In this case the first thing
> >>> > > people see when they run an application that links against Spark
> was a
> >>> > > big ugly logging warning.
> >>> > >
> >>> > > I'm not super familiar with log4j-over-slf4j, but this behavior of
> >>> > > returning null for the appenders seems a little weird. What is the
> use
> >>> > > case for using this and not just directly use slf4j-log4j12 like
> Spark
> >>> > > itself does?
> >>> > >
> >>> > > Did you have a more general fix for this in mind? Or was your plan
> to
> >>> > > just revert the existing behavior... We might be able to add a
> >>> > > configuration option to disable this logging default stuff. Or we
> >>> > > could just rip it out - but I'd like to avoid that if possible.
> >>> > >
> >>> > > - Patrick
> >>> > >
> >>> > > On Thu, Feb 6, 2014 at 11:41 PM, Paul Brown <pr...@mult.ifario.us>
> >>> wrote:
> >>> > >> We have a few applications that embed Spark, and in 0.8.0 and
> 0.8.1,
> >>> we
> >>> > >> were able to use slf4j, but 0.9.0 broke that and unintentionally
> >>> forces
> >>> > >> direct use of log4j as the logging backend.
> >>> > >>
> >>> > >> The issue is here in the org.apache.spark.Logging trait:
> >>> > >>
> >>> > >>
> >>> >
> >>>
> https://github.com/apache/incubator-spark/blame/master/core/src/main/scala/org/apache/spark/Logging.scala#L107
> >>> > >>
> >>> > >> log4j-over-slf4j *always* returns an empty enumeration for
> appenders
> >>> to
> >>> > the
> >>> > >> ROOT logger:
> >>> > >>
> >>> > >>
> >>> >
> >>>
> https://github.com/qos-ch/slf4j/blob/master/log4j-over-slf4j/src/main/java/org/apache/log4j/Category.java?source=c#L81
> >>> > >>
> >>> > >> And this causes an infinite loop and an eventual stack overflow.
> >>> > >>
> >>> > >> I'm happy to submit a Jira and a patch, but it would be
> significant
> >>> > enough
> >>> > >> reversal of recent changes that it's probably worth discussing
> before
> >>> I
> >>> > >> sink a half hour into it.  My suggestion would be that
> initialization
> >>> > (or
> >>> > >> not) should be left to the user with reasonable default behavior
> >>> > supplied
> >>> > >> by the spark commandline tooling and not forced on applications
> that
> >>> > >> incorporate Spark.
> >>> > >>
> >>> > >> Thoughts/opinions?
> >>> > >>
> >>> > >> -- Paul
> >>> > >> --
> >>> > >> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
> >>> >
> >>>
>

Re: 0.9.0 forces log4j usage

Posted by Patrick Wendell <pw...@gmail.com>.
This also seems relevant - but not my area of expertise (i.e., whether
this is a valid way to check which binding is active).

http://stackoverflow.com/questions/10505418/how-to-find-which-library-slf4j-has-bound-itself-to
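
(For reference, two probes people use for this, both plain slf4j 1.x API --
a sketch, not tested against every binding:)

    // 1. Ask the compile-time binding which factory it ships:
    val viaBinder = org.slf4j.impl.StaticLoggerBinder.getSingleton.getLoggerFactoryClassStr

    // 2. Or look at the live factory instance:
    val viaFactory = org.slf4j.LoggerFactory.getILoggerFactory.getClass.getName
    // e.g. "org.slf4j.impl.Log4jLoggerFactory" when slf4j-log4j12 is bound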

On Fri, Feb 7, 2014 at 10:08 AM, Patrick Wendell <pw...@gmail.com> wrote:
> Hey Guys,
>
> Thanks for explainning. Ya this is a problem - we didn't really know
> that people are using other slf4j backends, slf4j is in there for
> historical reasons but I think we may assume in a few places that
> log4j is being used and we should minimize those.
>
> We should patch this and get a fix into 0.9.1. So some solutions I see are:
>
> (a) Add SparkConf option to disable this. I'm fine with this one.
>
> (b) Ask slf4j which backend is active and only try to enforce this
> default if we know slf4j is using log4j. Do either of you know if this
> is possible? Not sure if slf4j exposes this.
>
> (c) Just remove this default stuff. We'd rather not do this. The goal
> of this thing is to provide good usability for people who have linked
> against Spark and haven't done anything to configure logging. For
> beginners we try to minimize the assumptions about what else they know
> about, and I've found log4j configuration is a huge mental barrier for
> people who are getting started.
>
> Paul if you submit a patch doing (a) we can merge it in. If you have
> any idea if (b) is possible I prefer that one, but it may not be
> possible or might be brittle.
>
> - Patrick
>
> On Fri, Feb 7, 2014 at 6:36 AM, Koert Kuipers <ko...@tresata.com> wrote:
>> Totally agree with Paul: a library should not pick the slf4j backend. It
>> defeats the purpose of slf4j. That big ugly warning is there to alert
>> people that its their responsibility to pick the back end...
>> On Feb 7, 2014 3:55 AM, "Paul Brown" <pr...@mult.ifario.us> wrote:
>>
>>> Hi, Patrick --
>>>
>>> From slf4j, you can either backend it into log4j (which is the way that
>>> Spark is shipped) or you can route log4j through slf4j and then on to a
>>> different backend (e.g., logback).  We're doing the latter and manipulating
>>> the dependencies in the build because that's the way the enclosing
>>> application is set up.
>>>
>>> The issue with the current situation is that there's no way for an end user
>>> to choose to *not* use the log4j backend.  (My short-term solution was to
>>> use the Maven shade plugin to swap in a version of the Logging trait with
>>> the body of that method commented out.)  In addition to the situation with
>>> log4j-over-slf4j and the empty enumeration of ROOT appenders, you might
>>> also run afoul of someone who intentionally configured log4j with an empty
>>> set of appenders at the time that Spark is initializing.
>>>
>>> I'd be happy with any implementation that lets me choose my logging
>>> backend: override default behavior via system property, plug-in
>>> architecture, etc.  I do think it's reasonable to expect someone digesting
>>> a substantial JDK-based system like Spark to understand how to initialize
>>> logging -- surely they're using logging of some kind elsewhere in their
>>> application -- but if you want the default behavior there as a courtesy, it
>>> might be worth putting an INFO (versus the glaring log4j WARN) message on
>>> the output that says something like "Initialized default logging via Log4J;
>>> pass -Dspark.logging.loadDefaultLogger=false to disable this behavior." so
>>> that it's both convenient and explicit.
>>>
>>> Cheers.
>>> -- Paul
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>>>
>>>
>>> On Fri, Feb 7, 2014 at 12:05 AM, Patrick Wendell <pw...@gmail.com>
>>> wrote:
>>>
>>> > A config option, e.g., could just be to add:
>>> >
>>> > spark.logging.loadDefaultLogger (default true)
>>> > If set to true, Spark will try to initialize a log4j logger if none is
>>> > detected. Otherwise Spark will not modify logging behavior.
>>> >
>>> > Then users could just set this to false if they have a logging set-up
>>> > that conflicts with this.
>>> >
>>> > Maybe there is a nicer fix...
>>> >
>>> > On Fri, Feb 7, 2014 at 12:03 AM, Patrick Wendell <pw...@gmail.com>
>>> > wrote:
>>> > > Hey Paul,
>>> > >
>>> > > Thanks for digging this up. I worked on this feature and the intent
>>> > > was to give users good default behavior if they didn't include any
>>> > > logging configuration on the classpath.
>>> > >
>>> > > The problem with assuming that CL tooling is going to do the job is
>>> > > that many people link against spark as a library and run their
>>> > > application using their own scripts. In this case the first thing
>>> > > people see when they run an application that links against Spark is a
>>> > > big ugly logging warning.
>>> > >
>>> > > I'm not super familiar with log4j-over-slf4j, but this behavior of
>>> > > returning an empty enumeration for the appenders seems a little weird. What is the use
>>> > > case for using this and not just directly using slf4j-log4j12 like Spark
>>> > > itself does?
>>> > >
>>> > > Did you have a more general fix for this in mind? Or was your plan to
>>> > > just revert the existing behavior... We might be able to add a
>>> > > configuration option to disable this logging default stuff. Or we
>>> > > could just rip it out - but I'd like to avoid that if possible.
>>> > >
>>> > > - Patrick
>>> > >
>>> > > On Thu, Feb 6, 2014 at 11:41 PM, Paul Brown <pr...@mult.ifario.us>
>>> wrote:
>>> > >> We have a few applications that embed Spark, and in 0.8.0 and 0.8.1,
>>> we
>>> > >> were able to use slf4j, but 0.9.0 broke that and unintentionally
>>> forces
>>> > >> direct use of log4j as the logging backend.
>>> > >>
>>> > >> The issue is here in the org.apache.spark.Logging trait:
>>> > >>
>>> > >>
>>> >
>>> https://github.com/apache/incubator-spark/blame/master/core/src/main/scala/org/apache/spark/Logging.scala#L107
>>> > >>
>>> > >> log4j-over-slf4j *always* returns an empty enumeration for appenders
>>> to
>>> > the
>>> > >> ROOT logger:
>>> > >>
>>> > >>
>>> >
>>> https://github.com/qos-ch/slf4j/blob/master/log4j-over-slf4j/src/main/java/org/apache/log4j/Category.java?source=c#L81
>>> > >>
>>> > >> And this causes an infinite loop and an eventual stack overflow.
>>> > >>
>>> > >> I'm happy to submit a Jira and a patch, but it would be a significant
>>> > enough
>>> > >> reversal of recent changes that it's probably worth discussing before
>>> I
>>> > >> sink a half hour into it.  My suggestion would be that initialization
>>> > (or
>>> > >> not) should be left to the user with reasonable default behavior
>>> > supplied
>>> > >> by the spark commandline tooling and not forced on applications that
>>> > >> incorporate Spark.
>>> > >>
>>> > >> Thoughts/opinions?
>>> > >>
>>> > >> -- Paul
>>> > >> --
>>> > >> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>>> >
>>>

Re: 0.9.0 forces log4j usage

Posted by Patrick Wendell <pw...@gmail.com>.
Hey Guys,

Thanks for explaining. Ya, this is a problem - we didn't really know
that people were using other slf4j backends. slf4j is in there for
historical reasons, but I think we may assume in a few places that
log4j is being used, and we should minimize those assumptions.

We should patch this and get a fix into 0.9.1. So some solutions I see are:

(a) Add SparkConf option to disable this. I'm fine with this one.

(b) Ask slf4j which backend is active and only try to enforce this
default if we know slf4j is using log4j. Do either of you know if this
is possible? Not sure if slf4j exposes this.

(c) Just remove this default stuff. We'd rather not do this. The goal
of this thing is to provide good usability for people who have linked
against Spark and haven't done anything to configure logging. For
beginners we try to minimize the assumptions about what else they know
about, and I've found log4j configuration is a huge mental barrier for
people who are getting started.

Paul, if you submit a patch doing (a), we can merge it in. If you have
any idea whether (b) is possible, I'd prefer that one, but it may not be
possible, or it might be brittle.

- Patrick

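A sketch of how (a) and (b) might compose, for discussion only (the flag
name is the one proposed elsewhere in the thread, read here from a system
property; this is an illustration, not the patch that was merged):

    import org.slf4j.impl.StaticLoggerBinder

    object LoggingGuard {
      // (b): only treat the JVM as log4j-backed if slf4j says it is.
      private def slf4jBoundToLog4j: Boolean =
        StaticLoggerBinder.getSingleton.getLoggerFactoryClassStr
          .endsWith("Log4jLoggerFactory")

      // (a): honor an explicit user opt-out.
      private def optedOut: Boolean =
        sys.props.get("spark.logging.loadDefaultLogger").contains("false")

      // Install Spark's default log4j config only when both checks pass.
      def shouldInstallDefaults: Boolean = slf4jBoundToLog4j && !optedOut
    }
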
On Fri, Feb 7, 2014 at 6:36 AM, Koert Kuipers <ko...@tresata.com> wrote:
> Totally agree with Paul: a library should not pick the slf4j backend. It
> defeats the purpose of slf4j. That big ugly warning is there to alert
> people that it's their responsibility to pick the backend...
> On Feb 7, 2014 3:55 AM, "Paul Brown" <pr...@mult.ifario.us> wrote:
>
>> Hi, Patrick --
>>
>> From slf4j, you can either backend it into log4j (which is the way that
>> Spark is shipped) or you can route log4j through slf4j and then on to a
>> different backend (e.g., logback).  We're doing the latter and manipulating
>> the dependencies in the build because that's the way the enclosing
>> application is set up.
>>
>> The issue with the current situation is that there's no way for an end user
>> to choose to *not* use the log4j backend.  (My short-term solution was to
>> use the Maven shade plugin to swap in a version of the Logging trait with
>> the body of that method commented out.)  In addition to the situation with
>> log4j-over-slf4j and the empty enumeration of ROOT appenders, you might
>> also run afoul of someone who intentionally configured log4j with an empty
>> set of appenders at the time that Spark is initializing.
>>
>> I'd be happy with any implementation that lets me choose my logging
>> backend: override default behavior via system property, plug-in
>> architecture, etc.  I do think it's reasonable to expect someone digesting
>> a substantial JDK-based system like Spark to understand how to initialize
>> logging -- surely they're using logging of some kind elsewhere in their
>> application -- but if you want the default behavior there as a courtesy, it
>> might be worth putting an INFO (versus the glaring log4j WARN) message on
>> the output that says something like "Initialized default logging via Log4J;
>> pass -Dspark.logging.loadDefaultLogger=false to disable this behavior." so
>> that it's both convenient and explicit.
>>
>> Cheers.
>> -- Paul
>>
>>
>>
>>
>>
>>
>> --
>> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>>
>>
>> On Fri, Feb 7, 2014 at 12:05 AM, Patrick Wendell <pw...@gmail.com>
>> wrote:
>>
>> > A config option, e.g., could just be to add:
>> >
>> > spark.logging.loadDefaultLogger (default true)
>> > If set to true, Spark will try to initialize a log4j logger if none is
>> > detected. Otherwise Spark will not modify logging behavior.
>> >
>> > Then users could just set this to false if they have a logging set-up
>> > that conflicts with this.
>> >
>> > Maybe there is a nicer fix...
>> >
>> > On Fri, Feb 7, 2014 at 12:03 AM, Patrick Wendell <pw...@gmail.com>
>> > wrote:
>> > > Hey Paul,
>> > >
>> > > Thanks for digging this up. I worked on this feature and the intent
>> > > was to give users good default behavior if they didn't include any
>> > > logging configuration on the classpath.
>> > >
>> > > The problem with assuming that CL tooling is going to do the job is
>> > > that many people link against spark as a library and run their
>> > > application using their own scripts. In this case the first thing
>> > > people see when they run an application that links against Spark is a
>> > > big ugly logging warning.
>> > >
>> > > I'm not super familiar with log4j-over-slf4j, but this behavior of
>> > > returning an empty enumeration for the appenders seems a little weird. What is the use
>> > > case for using this and not just directly using slf4j-log4j12 like Spark
>> > > itself does?
>> > >
>> > > Did you have a more general fix for this in mind? Or was your plan to
>> > > just revert the existing behavior... We might be able to add a
>> > > configuration option to disable this logging default stuff. Or we
>> > > could just rip it out - but I'd like to avoid that if possible.
>> > >
>> > > - Patrick
>> > >
>> > > On Thu, Feb 6, 2014 at 11:41 PM, Paul Brown <pr...@mult.ifario.us>
>> wrote:
>> > >> We have a few applications that embed Spark, and in 0.8.0 and 0.8.1,
>> we
>> > >> were able to use slf4j, but 0.9.0 broke that and unintentionally
>> forces
>> > >> direct use of log4j as the logging backend.
>> > >>
>> > >> The issue is here in the org.apache.spark.Logging trait:
>> > >>
>> > >>
>> >
>> https://github.com/apache/incubator-spark/blame/master/core/src/main/scala/org/apache/spark/Logging.scala#L107
>> > >>
>> > >> log4j-over-slf4j *always* returns an empty enumeration for appenders
>> to
>> > the
>> > >> ROOT logger:
>> > >>
>> > >>
>> >
>> https://github.com/qos-ch/slf4j/blob/master/log4j-over-slf4j/src/main/java/org/apache/log4j/Category.java?source=c#L81
>> > >>
>> > >> And this causes an infinite loop and an eventual stack overflow.
>> > >>
>> > >> I'm happy to submit a Jira and a patch, but it would be a significant
>> > enough
>> > >> reversal of recent changes that it's probably worth discussing before
>> I
>> > >> sink a half hour into it.  My suggestion would be that initialization
>> > (or
>> > >> not) should be left to the user with reasonable default behavior
>> > supplied
>> > >> by the spark commandline tooling and not forced on applications that
>> > >> incorporate Spark.
>> > >>
>> > >> Thoughts/opinions?
>> > >>
>> > >> -- Paul
>> > >> --
>> > >> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>> >
>>

Re: 0.9.0 forces log4j usage

Posted by Koert Kuipers <ko...@tresata.com>.
Totally agree with Paul: a library should not pick the slf4j backend. It
defeats the purpose of slf4j. That big ugly warning is there to alert
people that it's their responsibility to pick the backend...
On Feb 7, 2014 3:55 AM, "Paul Brown" <pr...@mult.ifario.us> wrote:

> Hi, Patrick --
>
> From slf4j, you can either backend it into log4j (which is the way that
> Spark is shipped) or you can route log4j through slf4j and then on to a
> different backend (e.g., logback).  We're doing the latter and manipulating
> the dependencies in the build because that's the way the enclosing
> application is set up.
>
> The issue with the current situation is that there's no way for an end user
> to choose to *not* use the log4j backend.  (My short-term solution was to
> use the Maven shade plugin to swap in a version of the Logging trait with
> the body of that method commented out.)  In addition to the situation with
> log4j-over-slf4j and the empty enumeration of ROOT appenders, you might
> also run afoul of someone who intentionally configured log4j with an empty
> set of appenders at the time that Spark is initializing.
>
> I'd be happy with any implementation that lets me choose my logging
> backend: override default behavior via system property, plug-in
> architecture, etc.  I do think it's reasonable to expect someone digesting
> a substantial JDK-based system like Spark to understand how to initialize
> logging -- surely they're using logging of some kind elsewhere in their
> application -- but if you want the default behavior there as a courtesy, it
> might be worth putting an INFO (versus the glaring log4j WARN) message on
> the output that says something like "Initialized default logging via Log4J;
> pass -Dspark.logging.loadDefaultLogger=false to disable this behavior." so
> that it's both convenient and explicit.
>
> Cheers.
> -- Paul
>
>
>
>
>
>
> --
> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>
>
> On Fri, Feb 7, 2014 at 12:05 AM, Patrick Wendell <pw...@gmail.com>
> wrote:
>
> > A config option, e.g., could just be to add:
> >
> > spark.logging.loadDefaultLogger (default true)
> > If set to true, Spark will try to initialize a log4j logger if none is
> > detected. Otherwise Spark will not modify logging behavior.
> >
> > Then users could just set this to false if they have a logging set-up
> > that conflicts with this.
> >
> > Maybe there is a nicer fix...
> >
> > On Fri, Feb 7, 2014 at 12:03 AM, Patrick Wendell <pw...@gmail.com>
> > wrote:
> > > Hey Paul,
> > >
> > > Thanks for digging this up. I worked on this feature and the intent
> > > was to give users good default behavior if they didn't include any
> > > logging configuration on the classpath.
> > >
> > > The problem with assuming that CL tooling is going to do the job is
> > > that many people link against spark as a library and run their
> > > application using their own scripts. In this case the first thing
> > > people see when they run an application that links against Spark is a
> > > big ugly logging warning.
> > >
> > > I'm not super familiar with log4j-over-slf4j, but this behavior of
> > > returning an empty enumeration for the appenders seems a little weird. What is the use
> > > case for using this and not just directly using slf4j-log4j12 like Spark
> > > itself does?
> > >
> > > Did you have a more general fix for this in mind? Or was your plan to
> > > just revert the existing behavior... We might be able to add a
> > > configuration option to disable this logging default stuff. Or we
> > > could just rip it out - but I'd like to avoid that if possible.
> > >
> > > - Patrick
> > >
> > > On Thu, Feb 6, 2014 at 11:41 PM, Paul Brown <pr...@mult.ifario.us>
> wrote:
> > >> We have a few applications that embed Spark, and in 0.8.0 and 0.8.1,
> we
> > >> were able to use slf4j, but 0.9.0 broke that and unintentionally
> forces
> > >> direct use of log4j as the logging backend.
> > >>
> > >> The issue is here in the org.apache.spark.Logging trait:
> > >>
> > >>
> >
> https://github.com/apache/incubator-spark/blame/master/core/src/main/scala/org/apache/spark/Logging.scala#L107
> > >>
> > >> log4j-over-slf4j *always* returns an empty enumeration for appenders
> to
> > the
> > >> ROOT logger:
> > >>
> > >>
> >
> https://github.com/qos-ch/slf4j/blob/master/log4j-over-slf4j/src/main/java/org/apache/log4j/Category.java?source=c#L81
> > >>
> > >> And this causes an infinite loop and an eventual stack overflow.
> > >>
> > >> I'm happy to submit a Jira and a patch, but it would be a significant
> > enough
> > >> reversal of recent changes that it's probably worth discussing before
> I
> > >> sink a half hour into it.  My suggestion would be that initialization
> > (or
> > >> not) should be left to the user with reasonable default behavior
> > supplied
> > >> by the spark commandline tooling and not forced on applications that
> > >> incorporate Spark.
> > >>
> > >> Thoughts/opinions?
> > >>
> > >> -- Paul
> > >> --
> > >> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
> >
>

Re: 0.9.0 forces log4j usage

Posted by Paul Brown <pr...@mult.ifario.us>.
Hi, Patrick --

From slf4j, you can either backend it into log4j (which is the way that
Spark is shipped) or you can route log4j through slf4j and then on to a
different backend (e.g., logback).  We're doing the latter and manipulating
the dependencies in the build because that's the way the enclosing
application is set up.

The issue with the current situation is that there's no way for an end user
to choose to *not* use the log4j backend.  (My short-term solution was to
use the Maven shade plugin to swap in a version of the Logging trait with
the body of that method commented out.)  In addition to the situation with
log4j-over-slf4j and the empty enumeration of ROOT appenders, you might
also run afoul of someone who intentionally configured log4j with an empty
set of appenders at the time that Spark is initializing.

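As a concrete illustration of the build manipulation Paul describes, an
sbt (Scala) sketch of the equivalent exclusion and rerouting; the version
strings are illustrative only:

    // Drop Spark's slf4j-log4j12 binding, then route legacy log4j
    // calls through slf4j and let logback be the real backend.
    libraryDependencies ++= Seq(
      ("org.apache.spark" %% "spark-core" % "0.9.0")
        .exclude("org.slf4j", "slf4j-log4j12"),
      "org.slf4j"      % "log4j-over-slf4j" % "1.7.5",
      "ch.qos.logback" % "logback-classic"  % "1.0.13"
    )
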
I'd be happy with any implementation that lets me choose my logging
backend: override default behavior via system property, plug-in
architecture, etc.  I do think it's reasonable to expect someone digesting
a substantial JDK-based system like Spark to understand how to initialize
logging — surely they're using logging of some kind elsewhere in their
application — but if you want the default behavior there as a courtesy, it
might be worth putting an INFO (versus the glaring log4j WARN) message on
the output that says something like "Initialized default logging via Log4J;
pass -Dspark.logging.loadDefaultLogger=false to disable this behavior." so
that it's both convenient and explicit.

Cheers.
-- Paul






—
prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/


On Fri, Feb 7, 2014 at 12:05 AM, Patrick Wendell <pw...@gmail.com> wrote:

> A config option, e.g., could just be to add:
>
> spark.logging.loadDefaultLogger (default true)
> If set to true, Spark will try to initialize a log4j logger if none is
> detected. Otherwise Spark will not modify logging behavior.
>
> Then users could just set this to false if they have a logging set-up
> that conflicts with this.
>
> Maybe there is a nicer fix...
>
> On Fri, Feb 7, 2014 at 12:03 AM, Patrick Wendell <pw...@gmail.com>
> wrote:
> > Hey Paul,
> >
> > Thanks for digging this up. I worked on this feature and the intent
> > was to give users good default behavior if they didn't include any
> > logging configuration on the classpath.
> >
> > The problem with assuming that CL tooling is going to do the job is
> > that many people link against spark as a library and run their
> > application using their own scripts. In this case the first thing
> > people see when they run an application that links against Spark is a
> > big ugly logging warning.
> >
> > I'm not super familiar with log4j-over-slf4j, but this behavior of
> > returning an empty enumeration for the appenders seems a little weird. What is the use
> > case for using this and not just directly using slf4j-log4j12 like Spark
> > itself does?
> >
> > Did you have a more general fix for this in mind? Or was your plan to
> > just revert the existing behavior... We might be able to add a
> > configuration option to disable this logging default stuff. Or we
> > could just rip it out - but I'd like to avoid that if possible.
> >
> > - Patrick
> >
> > On Thu, Feb 6, 2014 at 11:41 PM, Paul Brown <pr...@mult.ifario.us> wrote:
> >> We have a few applications that embed Spark, and in 0.8.0 and 0.8.1, we
> >> were able to use slf4j, but 0.9.0 broke that and unintentionally forces
> >> direct use of log4j as the logging backend.
> >>
> >> The issue is here in the org.apache.spark.Logging trait:
> >>
> >>
> https://github.com/apache/incubator-spark/blame/master/core/src/main/scala/org/apache/spark/Logging.scala#L107
> >>
> >> log4j-over-slf4j *always* returns an empty enumeration for appenders to
> the
> >> ROOT logger:
> >>
> >>
> https://github.com/qos-ch/slf4j/blob/master/log4j-over-slf4j/src/main/java/org/apache/log4j/Category.java?source=c#L81
> >>
> >> And this causes an infinite loop and an eventual stack overflow.
> >>
> >> I'm happy to submit a Jira and a patch, but it would be a significant
> enough
> >> reversal of recent changes that it's probably worth discussing before I
> >> sink a half hour into it.  My suggestion would be that initialization
> (or
> >> not) should be left to the user with reasonable default behavior
> supplied
> >> by the spark commandline tooling and not forced on applications that
> >> incorporate Spark.
> >>
> >> Thoughts/opinions?
> >>
> >> -- Paul
> >> --
> >> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>

Re: 0.9.0 forces log4j usage

Posted by Patrick Wendell <pw...@gmail.com>.
A config option, e.g., could just be to add:

spark.logging.loadDefaultLogger (default true)
If set to true, Spark will try to initialize a log4j logger if none is
detected. Otherwise Spark will not modify logging behavior.

Then users could just set this to false if they have a logging set-up
that conflicts with this.

Maybe there is a nicer fix...

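To make the proposal concrete, a minimal sketch of reading such a flag
(the key exists only as a proposal in this thread; SparkConf's getBoolean
accessor is assumed):

    import org.apache.spark.SparkConf

    // A user who brings their own slf4j backend opts out explicitly;
    // the flag would default to true for everyone else.
    val conf = new SparkConf()
      .set("spark.logging.loadDefaultLogger", "false")

    val loadDefaultLogger =
      conf.getBoolean("spark.logging.loadDefaultLogger", true)
    // loadDefaultLogger == false here, so Spark would leave logging alone.
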
On Fri, Feb 7, 2014 at 12:03 AM, Patrick Wendell <pw...@gmail.com> wrote:
> Hey Paul,
>
> Thanks for digging this up. I worked on this feature and the intent
> was to give users good default behavior if they didn't include any
> logging configuration on the classpath.
>
> The problem with assuming that CL tooling is going to do the job is
> that many people link against spark as a library and run their
> application using their own scripts. In this case the first thing
> people see when they run an application that links against Spark is a
> big ugly logging warning.
>
> I'm not super familiar with log4j-over-slf4j, but this behavior of
> returning an empty enumeration for the appenders seems a little weird. What is the use
> case for using this and not just directly using slf4j-log4j12 like Spark
> itself does?
>
> Did you have a more general fix for this in mind? Or was your plan to
> just revert the existing behavior... We might be able to add a
> configuration option to disable this logging default stuff. Or we
> could just rip it out - but I'd like to avoid that if possible.
>
> - Patrick
>
> On Thu, Feb 6, 2014 at 11:41 PM, Paul Brown <pr...@mult.ifario.us> wrote:
>> We have a few applications that embed Spark, and in 0.8.0 and 0.8.1, we
>> were able to use slf4j, but 0.9.0 broke that and unintentionally forces
>> direct use of log4j as the logging backend.
>>
>> The issue is here in the org.apache.spark.Logging trait:
>>
>> https://github.com/apache/incubator-spark/blame/master/core/src/main/scala/org/apache/spark/Logging.scala#L107
>>
>> log4j-over-slf4j *always* returns an empty enumeration for appenders to the
>> ROOT logger:
>>
>> https://github.com/qos-ch/slf4j/blob/master/log4j-over-slf4j/src/main/java/org/apache/log4j/Category.java?source=c#L81
>>
>> And this causes an infinite loop and an eventual stack overflow.
>>
>> I'm happy to submit a Jira and a patch, but it would be a significant enough
>> reversal of recent changes that it's probably worth discussing before I
>> sink a half hour into it.  My suggestion would be that initialization (or
>> not) should be left to the user with reasonable default behavior supplied
>> by the spark commandline tooling and not forced on applications that
>> incorporate Spark.
>>
>> Thoughts/opinions?
>>
>> -- Paul
>> --
>> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/

Re: 0.9.0 forces log4j usage

Posted by Patrick Wendell <pw...@gmail.com>.
Hey Paul,

Thanks for digging this up. I worked on this feature and the intent
was to give users good default behavior if they didn't include any
logging configuration on the classpath.

The problem with assuming that CL tooling is going to do the job is
that many people link against spark as a library and run their
application using their own scripts. In this case the first thing
people see when they run an application that links against Spark is a
big ugly logging warning.

I'm not super familiar with log4j-over-slf4j, but this behavior of
returning an empty enumeration for the appenders seems a little weird. What is the use
case for using this and not just directly using slf4j-log4j12 like Spark
itself does?

Did you have a more general fix for this in mind? Or was your plan to
just revert the existing behavior... We might be able to add a
configuration option to disable this logging default stuff. Or we
could just rip it out - but I'd like to avoid that if possible.

- Patrick

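For anyone trying to reproduce Paul's report, a schematic of the loop in
Scala (this mirrors the shape of the check in the Logging trait, not its
exact code):

    import org.apache.log4j.LogManager

    // Under log4j-over-slf4j the ROOT logger's appender enumeration is
    // always empty, so this probe can never come back true...
    def log4jLooksInitialized: Boolean =
      LogManager.getRootLogger.getAllAppenders.hasMoreElements

    // ...so any init path shaped like "if uninitialized, configure and
    // log a message" recurses: the log call re-runs the probe, which
    // still reports uninitialized, and so on until StackOverflowError.
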
On Thu, Feb 6, 2014 at 11:41 PM, Paul Brown <pr...@mult.ifario.us> wrote:
> We have a few applications that embed Spark, and in 0.8.0 and 0.8.1, we
> were able to use slf4j, but 0.9.0 broke that and unintentionally forces
> direct use of log4j as the logging backend.
>
> The issue is here in the org.apache.spark.Logging trait:
>
> https://github.com/apache/incubator-spark/blame/master/core/src/main/scala/org/apache/spark/Logging.scala#L107
>
> log4j-over-slf4j *always* returns an empty enumeration for appenders to the
> ROOT logger:
>
> https://github.com/qos-ch/slf4j/blob/master/log4j-over-slf4j/src/main/java/org/apache/log4j/Category.java?source=c#L81
>
> And this causes an infinite loop and an eventual stack overflow.
>
> I'm happy to submit a Jira and a patch, but it would be a significant enough
> reversal of recent changes that it's probably worth discussing before I
> sink a half hour into it.  My suggestion would be that initialization (or
> not) should be left to the user with reasonable default behavior supplied
> by the spark commandline tooling and not forced on applications that
> incorporate Spark.
>
> Thoughts/opinions?
>
> -- Paul
> --
> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/