You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@felix.apache.org by Niclas Hedhman <ni...@hedhman.org> on 2008/01/29 05:02:44 UTC
Deadlock in Felix
Hi,
I have been away for a while and don't really know if this has been looked at.
The story can be found at [1], but goes roughly like this;
The guy is starting a bunch of bundles on Felix, incl Pax Logging and Felix
Configuration Admin.
The Felix internals will find the LogService implementation and use it for its
own logging needs.
At the same time, the Configuration Admin gives Pax Logging a new
Configuration, which triggers loading of classes.
Now, the StartLevel thread locks a o.a.f.module.ModuleFactoryImpl and blocks
on the o.a.log4j.spi.RootLogger. The Configuration thread will do the
opposite...
The immediate question; Any good ideas on how to get around this?
The only one I can think of is to start a new thread and delay the
Configuration update() X milliseconds, but that sounds fairly britle.
The larger question is more like; Isn't it necessary for the framework to be
lock-free when calling out to code outside its control??
(that was something I concluded with my own similar framework back in 1999)
[1] http://issues.ops4j.org/jira/browse/PAXLOGGING-20
Cheers
--
Niclas Hedhman, Software Developer
I live here; http://tinyurl.com/2qq9er
I work here; http://tinyurl.com/2ymelc
I relax here; http://tinyurl.com/2cgsug
Re: Deadlock in Felix
Posted by Karl Pauls <ka...@gmail.com>.
> On Monday 11 February 2008 16:17, Karl Pauls wrote:
> > However, do you (and others) think
> > it is acceptable to deliver log events asynchronously or at least
> > later then the actual point they occur?
>
> If you are only worried about Pax Logging case, then you need to do nothing.
> The locking in Log4J is severe enough for us to consider it "our problem" and
> I will probably later in the week do the "buffering" for you, which will be
> in proper order.
>
> I am not sure if there are any other Log Services out there, which will have
> similar problems.
>
> The broader concern probably still remains; Should the framework have any
> locks in place when calling outside itself, whether being log message, events
> or callbacks? The spec is pretty silent about it, and I have no clue how
> Felix measure up against the others on this point. But I think it is a valid
> question to ask both here and to the spec guys.
Well, we only hold locks in the case of log messages. For the other calls we
should be lock free .
regards,
Karl
> Cheers
> --
> Niclas Hedhman, Software Developer
>
> I live here; http://tinyurl.com/2qq9er
> I work here; http://tinyurl.com/2ymelc
> I relax here; http://tinyurl.com/2cgsug
>
--
Karl Pauls
karlpauls@gmail.com
Re: Deadlock in Felix
Posted by "Richard S. Hall" <he...@ungoverned.org>.
Niclas Hedhman wrote:
> The broader concern probably still remains; Should the framework have any
> locks in place when calling outside itself, whether being log message, events
> or callbacks? The spec is pretty silent about it, and I have no clue how
> Felix measure up against the others on this point. But I think it is a valid
> question to ask both here and to the spec guys.
No, it shouldn't. And I do think the spec mentions this somewhere.
However, it might not always be possible to achieve. Still, we should
review each callback point and verify its correctness.
-> richard
Re: Deadlock in Felix
Posted by Niclas Hedhman <ni...@hedhman.org>.
On Monday 11 February 2008 16:17, Karl Pauls wrote:
> However, do you (and others) think
> it is acceptable to deliver log events asynchronously or at least
> later then the actual point they occur?
If you are only worried about Pax Logging case, then you need to do nothing.
The locking in Log4J is severe enough for us to consider it "our problem" and
I will probably later in the week do the "buffering" for you, which will be
in proper order.
I am not sure if there are any other Log Services out there, which will have
similar problems.
The broader concern probably still remains; Should the framework have any
locks in place when calling outside itself, whether being log message, events
or callbacks? The spec is pretty silent about it, and I have no clue how
Felix measure up against the others on this point. But I think it is a valid
question to ask both here and to the spec guys.
Cheers
--
Niclas Hedhman, Software Developer
I live here; http://tinyurl.com/2qq9er
I work here; http://tinyurl.com/2ymelc
I relax here; http://tinyurl.com/2cgsug
Re: Deadlock in Felix
Posted by Karl Pauls <ka...@gmail.com>.
On Feb 11, 2008 9:29 AM, Felix Meschberger <fm...@gmail.com> wrote:
> Hi,
>
> Am Montag, den 11.02.2008, 09:17 +0100 schrieb Karl Pauls:
> > > Later on, I think I'll change Pax Logging to move all incoming requests (log
> > > and config) to its own thread to prevent the whole issue all together.
> >
> > Good - so we don't have to hurry. However, do you (and others) think
> > it is acceptable to deliver log events asynchronously or at least
> > later then the actual point they occur? We have the eventdispatching
> > thread around anyways and I think it should be doable to use it to
> > deliver events. But I will only spend time looking into this if the
> > general opinion is that it is worthwhile ...
>
> I am not sure, whether this would be a good solution, because you would
> then have log messages from the application and log messages from the
> framework being out of think with regard to actuall order of processing.
> This could cause some headaches when trying to interpret the log files.
>
> In addition, I tend to think, that the way Log4J locks itself and by
> loading classes inside that lock potentially the complete system is kind
> of problematic. Not sure how NLog4J and Logback act in this respect.
>
> On the Felix side, we might come up with a solution where we may
> actively delay logging when inside some synchronized/locked code.
> Probably the only really problematic location for logging is inside the
> ClassLoader support. Could we gather messages there to be logged after
> releasing any locks ?
Sure, we could do that. Let's see whether there are other opinions
else I'll give it a shot.
regards,
Karl
> Regards
> Felix
>
>
--
Karl Pauls
karlpauls@gmail.com
Re: Deadlock in Felix
Posted by Felix Meschberger <fm...@gmail.com>.
Hi,
Am Montag, den 11.02.2008, 09:17 +0100 schrieb Karl Pauls:
> > Later on, I think I'll change Pax Logging to move all incoming requests (log
> > and config) to its own thread to prevent the whole issue all together.
>
> Good - so we don't have to hurry. However, do you (and others) think
> it is acceptable to deliver log events asynchronously or at least
> later then the actual point they occur? We have the eventdispatching
> thread around anyways and I think it should be doable to use it to
> deliver events. But I will only spend time looking into this if the
> general opinion is that it is worthwhile ...
I am not sure, whether this would be a good solution, because you would
then have log messages from the application and log messages from the
framework being out of think with regard to actuall order of processing.
This could cause some headaches when trying to interpret the log files.
In addition, I tend to think, that the way Log4J locks itself and by
loading classes inside that lock potentially the complete system is kind
of problematic. Not sure how NLog4J and Logback act in this respect.
On the Felix side, we might come up with a solution where we may
actively delay logging when inside some synchronized/locked code.
Probably the only really problematic location for logging is inside the
ClassLoader support. Could we gather messages there to be logged after
releasing any locks ?
Regards
Felix
Re: Deadlock in Felix
Posted by Karl Pauls <ka...@gmail.com>.
> On Saturday 02 February 2008 08:20, Karl Pauls wrote:
>
> > So would it be acceptable to deliver log calls asynchronously? If so I
> > can probably make that change quickly ...
>
> For now, the immediate problem solved by changing framework loglevel to INFO.
Ok.
> Later on, I think I'll change Pax Logging to move all incoming requests (log
> and config) to its own thread to prevent the whole issue all together.
Good - so we don't have to hurry. However, do you (and others) think
it is acceptable to deliver log events asynchronously or at least
later then the actual point they occur? We have the eventdispatching
thread around anyways and I think it should be doable to use it to
deliver events. But I will only spend time looking into this if the
general opinion is that it is worthwhile ...
regards,
Karl
> Cheers
> --
> Niclas Hedhman, Software Developer
>
> I live here; http://tinyurl.com/2qq9er
> I work here; http://tinyurl.com/2ymelc
> I relax here; http://tinyurl.com/2cgsug
>
--
Karl Pauls
karlpauls@gmail.com
Re: Deadlock in Felix
Posted by Niclas Hedhman <ni...@hedhman.org>.
On Saturday 02 February 2008 08:20, Karl Pauls wrote:
> So would it be acceptable to deliver log calls asynchronously? If so I
> can probably make that change quickly ...
For now, the immediate problem solved by changing framework loglevel to INFO.
Later on, I think I'll change Pax Logging to move all incoming requests (log
and config) to its own thread to prevent the whole issue all together.
Cheers
--
Niclas Hedhman, Software Developer
I live here; http://tinyurl.com/2qq9er
I work here; http://tinyurl.com/2ymelc
I relax here; http://tinyurl.com/2cgsug
Re: Deadlock in Felix
Posted by Karl Pauls <ka...@gmail.com>.
On Jan 30, 2008 11:31 AM, Stuart McCulloch <st...@jayway.net> wrote:
> On 30/01/2008, Felix Meschberger <fm...@gmail.com> wrote:
> >
> > Hi Niclas,
> >
> > The problem is (a) the generous synchronisation of Log4J and (b) locking
> > used by stuff used for class loading. In our projects we regularly face
> > issues between Log4J and our ClassLoader implementations synchronizing
> > on ClassLoader.loadClass().
> >
> > The deadlock occurrs because both parties - framework and Log4J - lock
> > "big" parts of their code and call to code outside of their scope while
> > being locked: the framework calls the LogService outside of the
> > framework and Log4J calls into class loading outside of Log4J.
> >
> > On solution, I could imagine, is not using Log4J, which may or may not
> > be an option. Maybe SLF4J or Logback could be an option here ? [ In
> > Sling we actually use Logback as a logging backend for our LogService
> > implementation ]
> >
> > Another solution would be to enhance the framework Logger to diable the
> > use of a LogService. E.g. by defining a framework property, which when
> > set, causes the Logger to never use the LogService.
> >
> > Both solutions don't sound right ...
>
>
> other possible solutions:
>
> a) have a separate thread make the LogService call (fed from a queue)
> although you'd have to be careful not to introduce other deadlocks
So would it be acceptable to deliver log calls asynchronously? If so I
can probably make that change quickly ...
regards,
Karl
> b) delay sending log messages from critical sections of the framework
> ie. log to a buffer, then send the messages when it's safe to do so
>
>
> Regards
> > Felix
> >
> > Am Mittwoch, den 30.01.2008, 13:51 +0800 schrieb Niclas Hedhman:
> > > On Tuesday 29 January 2008 16:55, Karl Pauls wrote:
> > > > Could you have them retry using Felix 1.0.3? This might be related to
> > > > some of the concurrency things we fixed.
> > > >
> > > > In case they can not be bothered retrying on Felix 1.0.3 then maybe
> > > > they can provide a minimal config file that only uses publicly
> > > > available bundles and has this issue (then I can look into it).
> > >
> > > I am looking at the code in the trunk, and it appears that the locking
> > that
> > > triggers this is in place.
> > >
> > > As always, it is a bit tricky to setup threading problems. SO, I would
> > like to
> > > run a "head exercise" first.
> > >
> > > 1. The Starter thread "FelixStartLevel" locks the ModuleFactoryImpl
> > instance
> > > in R4SearchPolicyCore.resolve().
> > >
> > > 2. The Configuration Admin thread "Configuration Updater" calls the
> > LogService
> > > with the Configuration instance, which leads to Log4J locks on its own
> > > RootLogger and calls ClassLoader.loadClass() on something found in the
> > > configuration. This leads to trying to acquire the ModuleFactoryImpl
> > lock
> > > either in R4SearchPolicyCore.resolve() or in the provided stack trace
> > > R4SearchPolicyCore.getInUseCandidates() due to a ClassNotFound in the
> > > previous step.
> > >
> > > 3. The "FelixStartLevel" thread reaches
> > > m_logger.log(Logger.LOG_DEBUG, "WIRE: " + wires[wireIdx]);
> > > in R4SearchPolicyCore.createWires() and he log() method will try to
> > acquire
> > > the RootLogger lock.
> > >
> > > 4. DEADLOCK.
> > >
> > >
> > > I agree this is very special to the LogService, since Felix binds to it
> > and
> > > uses it for its internal use, and the responsibility is across two
> > different
> > > systems. Suggestions are welcome.
> > >
> > >
> > > Cheers
> >
> >
>
>
> --
> Cheers, Stuart
>
--
Karl Pauls
karlpauls@gmail.com
Re: Deadlock in Felix
Posted by Stuart McCulloch <st...@jayway.net>.
On 30/01/2008, Felix Meschberger <fm...@gmail.com> wrote:
>
> Hi Niclas,
>
> The problem is (a) the generous synchronisation of Log4J and (b) locking
> used by stuff used for class loading. In our projects we regularly face
> issues between Log4J and our ClassLoader implementations synchronizing
> on ClassLoader.loadClass().
>
> The deadlock occurrs because both parties - framework and Log4J - lock
> "big" parts of their code and call to code outside of their scope while
> being locked: the framework calls the LogService outside of the
> framework and Log4J calls into class loading outside of Log4J.
>
> On solution, I could imagine, is not using Log4J, which may or may not
> be an option. Maybe SLF4J or Logback could be an option here ? [ In
> Sling we actually use Logback as a logging backend for our LogService
> implementation ]
>
> Another solution would be to enhance the framework Logger to diable the
> use of a LogService. E.g. by defining a framework property, which when
> set, causes the Logger to never use the LogService.
>
> Both solutions don't sound right ...
other possible solutions:
a) have a separate thread make the LogService call (fed from a queue)
although you'd have to be careful not to introduce other deadlocks
b) delay sending log messages from critical sections of the framework
ie. log to a buffer, then send the messages when it's safe to do so
Regards
> Felix
>
> Am Mittwoch, den 30.01.2008, 13:51 +0800 schrieb Niclas Hedhman:
> > On Tuesday 29 January 2008 16:55, Karl Pauls wrote:
> > > Could you have them retry using Felix 1.0.3? This might be related to
> > > some of the concurrency things we fixed.
> > >
> > > In case they can not be bothered retrying on Felix 1.0.3 then maybe
> > > they can provide a minimal config file that only uses publicly
> > > available bundles and has this issue (then I can look into it).
> >
> > I am looking at the code in the trunk, and it appears that the locking
> that
> > triggers this is in place.
> >
> > As always, it is a bit tricky to setup threading problems. SO, I would
> like to
> > run a "head exercise" first.
> >
> > 1. The Starter thread "FelixStartLevel" locks the ModuleFactoryImpl
> instance
> > in R4SearchPolicyCore.resolve().
> >
> > 2. The Configuration Admin thread "Configuration Updater" calls the
> LogService
> > with the Configuration instance, which leads to Log4J locks on its own
> > RootLogger and calls ClassLoader.loadClass() on something found in the
> > configuration. This leads to trying to acquire the ModuleFactoryImpl
> lock
> > either in R4SearchPolicyCore.resolve() or in the provided stack trace
> > R4SearchPolicyCore.getInUseCandidates() due to a ClassNotFound in the
> > previous step.
> >
> > 3. The "FelixStartLevel" thread reaches
> > m_logger.log(Logger.LOG_DEBUG, "WIRE: " + wires[wireIdx]);
> > in R4SearchPolicyCore.createWires() and he log() method will try to
> acquire
> > the RootLogger lock.
> >
> > 4. DEADLOCK.
> >
> >
> > I agree this is very special to the LogService, since Felix binds to it
> and
> > uses it for its internal use, and the responsibility is across two
> different
> > systems. Suggestions are welcome.
> >
> >
> > Cheers
>
>
--
Cheers, Stuart
Re: Deadlock in Felix
Posted by Carsten Ziegeler <cz...@apache.org>.
Felix Meschberger wrote:
> Hi Niclas,
>
> The problem is (a) the generous synchronisation of Log4J and (b) locking
> used by stuff used for class loading. In our projects we regularly face
> issues between Log4J and our ClassLoader implementations synchronizing
> on ClassLoader.loadClass().
>
> The deadlock occurrs because both parties - framework and Log4J - lock
> "big" parts of their code and call to code outside of their scope while
> being locked: the framework calls the LogService outside of the
> framework and Log4J calls into class loading outside of Log4J.
>
> On solution, I could imagine, is not using Log4J, which may or may not
> be an option. Maybe SLF4J or Logback could be an option here ? [ In
> Sling we actually use Logback as a logging backend for our LogService
> implementation ]
>
After I read Niclas problems, I think that I had this problem once or twice
in Sling as well - a very rare deadlock during a logging statement. I
never investigated further as a restart solved the problem :) and it
occured only for one day two weeks ago.
So I fear switching to a different logging implementation makes things
better but not optimal :)
Carsten
--
Carsten Ziegeler
cziegeler@apache.org
Re: Deadlock in Felix
Posted by Felix Meschberger <fm...@gmail.com>.
Hi Niclas,
The problem is (a) the generous synchronisation of Log4J and (b) locking
used by stuff used for class loading. In our projects we regularly face
issues between Log4J and our ClassLoader implementations synchronizing
on ClassLoader.loadClass().
The deadlock occurrs because both parties - framework and Log4J - lock
"big" parts of their code and call to code outside of their scope while
being locked: the framework calls the LogService outside of the
framework and Log4J calls into class loading outside of Log4J.
On solution, I could imagine, is not using Log4J, which may or may not
be an option. Maybe SLF4J or Logback could be an option here ? [ In
Sling we actually use Logback as a logging backend for our LogService
implementation ]
Another solution would be to enhance the framework Logger to diable the
use of a LogService. E.g. by defining a framework property, which when
set, causes the Logger to never use the LogService.
Both solutions don't sound right ...
Regards
Felix
Am Mittwoch, den 30.01.2008, 13:51 +0800 schrieb Niclas Hedhman:
> On Tuesday 29 January 2008 16:55, Karl Pauls wrote:
> > Could you have them retry using Felix 1.0.3? This might be related to
> > some of the concurrency things we fixed.
> >
> > In case they can not be bothered retrying on Felix 1.0.3 then maybe
> > they can provide a minimal config file that only uses publicly
> > available bundles and has this issue (then I can look into it).
>
> I am looking at the code in the trunk, and it appears that the locking that
> triggers this is in place.
>
> As always, it is a bit tricky to setup threading problems. SO, I would like to
> run a "head exercise" first.
>
> 1. The Starter thread "FelixStartLevel" locks the ModuleFactoryImpl instance
> in R4SearchPolicyCore.resolve().
>
> 2. The Configuration Admin thread "Configuration Updater" calls the LogService
> with the Configuration instance, which leads to Log4J locks on its own
> RootLogger and calls ClassLoader.loadClass() on something found in the
> configuration. This leads to trying to acquire the ModuleFactoryImpl lock
> either in R4SearchPolicyCore.resolve() or in the provided stack trace
> R4SearchPolicyCore.getInUseCandidates() due to a ClassNotFound in the
> previous step.
>
> 3. The "FelixStartLevel" thread reaches
> m_logger.log(Logger.LOG_DEBUG, "WIRE: " + wires[wireIdx]);
> in R4SearchPolicyCore.createWires() and he log() method will try to acquire
> the RootLogger lock.
>
> 4. DEADLOCK.
>
>
> I agree this is very special to the LogService, since Felix binds to it and
> uses it for its internal use, and the responsibility is across two different
> systems. Suggestions are welcome.
>
>
> Cheers
Re: Deadlock in Felix
Posted by Niclas Hedhman <ni...@hedhman.org>.
On Tuesday 29 January 2008 16:55, Karl Pauls wrote:
> Could you have them retry using Felix 1.0.3? This might be related to
> some of the concurrency things we fixed.
>
> In case they can not be bothered retrying on Felix 1.0.3 then maybe
> they can provide a minimal config file that only uses publicly
> available bundles and has this issue (then I can look into it).
I am looking at the code in the trunk, and it appears that the locking that
triggers this is in place.
As always, it is a bit tricky to setup threading problems. SO, I would like to
run a "head exercise" first.
1. The Starter thread "FelixStartLevel" locks the ModuleFactoryImpl instance
in R4SearchPolicyCore.resolve().
2. The Configuration Admin thread "Configuration Updater" calls the LogService
with the Configuration instance, which leads to Log4J locks on its own
RootLogger and calls ClassLoader.loadClass() on something found in the
configuration. This leads to trying to acquire the ModuleFactoryImpl lock
either in R4SearchPolicyCore.resolve() or in the provided stack trace
R4SearchPolicyCore.getInUseCandidates() due to a ClassNotFound in the
previous step.
3. The "FelixStartLevel" thread reaches
m_logger.log(Logger.LOG_DEBUG, "WIRE: " + wires[wireIdx]);
in R4SearchPolicyCore.createWires() and he log() method will try to acquire
the RootLogger lock.
4. DEADLOCK.
I agree this is very special to the LogService, since Felix binds to it and
uses it for its internal use, and the responsibility is across two different
systems. Suggestions are welcome.
Cheers
--
Niclas Hedhman, Software Developer
I live here; http://tinyurl.com/2qq9er
I work here; http://tinyurl.com/2ymelc
I relax here; http://tinyurl.com/2cgsug
Re: Deadlock in Felix
Posted by Karl Pauls <ka...@gmail.com>.
Could you have them retry using Felix 1.0.3? This might be related to
some of the concurrency things we fixed.
In case they can not be bothered retrying on Felix 1.0.3 then maybe
they can provide a minimal config file that only uses publicly
available bundles and has this issue (then I can look into it).
regards,
Karl
> Hi,
>
> I have been away for a while and don't really know if this has been looked at.
>
> The story can be found at [1], but goes roughly like this;
>
> The guy is starting a bunch of bundles on Felix, incl Pax Logging and Felix
> Configuration Admin.
> The Felix internals will find the LogService implementation and use it for its
> own logging needs.
> At the same time, the Configuration Admin gives Pax Logging a new
> Configuration, which triggers loading of classes.
>
> Now, the StartLevel thread locks a o.a.f.module.ModuleFactoryImpl and blocks
> on the o.a.log4j.spi.RootLogger. The Configuration thread will do the
> opposite...
>
> The immediate question; Any good ideas on how to get around this?
> The only one I can think of is to start a new thread and delay the
> Configuration update() X milliseconds, but that sounds fairly britle.
>
> The larger question is more like; Isn't it necessary for the framework to be
> lock-free when calling out to code outside its control??
> (that was something I concluded with my own similar framework back in 1999)
>
>
> [1] http://issues.ops4j.org/jira/browse/PAXLOGGING-20
>
>
> Cheers
> --
> Niclas Hedhman, Software Developer
>
> I live here; http://tinyurl.com/2qq9er
> I work here; http://tinyurl.com/2ymelc
> I relax here; http://tinyurl.com/2cgsug
>
--
Karl Pauls
karlpauls@gmail.com