You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@karaf.apache.org by Maurice Betzel <m....@gaston-schul.com> on 2020/05/01 08:46:52 UTC

Karaf Cellar Osgi Event Admin timeout

Hi community,

the Karaf cluster i am running with Karaf Cellar uses also the OSGi Event
Broadcasting support (eventadmin) and i have noticed that it sometimes stops
broadcasting without logging any errors. I can simulate the behavior if i
add a break-point on the receiving side of the event and do not release it
for a time if it gets hit.
Then i do get a lot of exceptions in the logs, but the point is that any
subsequent events do not get propagated any more, in production nor in my
test.
In order to restart eventing without a node restart i have to change the 
org.apache.felix.eventadmin.impl.EventAdmin.cfg and save it so the Felix
event-admin is reinitialized.
Because i do not see any exceptions logged in production i was wondering if
anybody else has experienced or even resolved the issue.

My current config:

org.apache.felix.eventadmin.AddTimestamp=true
org.apache.felix.eventadmin.AddSubject=true
org.apache.felix.eventadmin.Timeout=30000

i think it must timeout fast enough so i do not block to many threads.



--
Sent from: http://karaf.922171.n3.nabble.com/Karaf-User-f930749.html

Re: Karaf Cellar Osgi Event Admin timeout

Posted by Maurice Betzel <m....@gaston-schul.com>.
org.apache.karaf.cellar.node.cfg

# OSGi event handler
handler.org.apache.karaf.cellar.event.ClusterEventHandler = true

Yep!



-----
if ( you want ) { you can } else { you can't }
--
Sent from: http://karaf.922171.n3.nabble.com/Karaf-User-f930749.html

Re: Karaf Cellar Osgi Event Admin timeout

Posted by Maurice Betzel <m....@gaston-schul.com>.
Missed your the status request:

  │ Node           │ Status │ Event Handler
──┼────────────────┼────────┼─────────────────────────────────────────────────────────
  │ 10.1.7.16:5701 │ ON     │ org.apache.karaf.cellar.kar.KarEventHandler
  │ 10.1.7.16:5701 │ OFF    │
org.apache.karaf.cellar.bundle.BundleEventHandler
  │ 10.1.7.16:5701 │ OFF    │
org.apache.karaf.cellar.config.ConfigurationEventHandler
  │ 10.1.7.16:5701 │ OFF    │
org.apache.karaf.cellar.features.FeaturesEventHandler
  │ 10.1.7.16:5701 │ ON     │
org.apache.karaf.cellar.features.RepositoryEventHandler
  │ 10.1.7.16:5701 │ ON     │
org.apache.karaf.cellar.dosgi.RemoteServiceCallHandler
  │ 10.1.7.16:5701 │ ON     │
org.apache.karaf.cellar.event.ClusterEventHandler
  │ 10.2.1.23:5701 │ ON     │ org.apache.karaf.cellar.kar.KarEventHandler
  │ 10.2.1.23:5701 │ OFF    │
org.apache.karaf.cellar.config.ConfigurationEventHandler
  │ 10.2.1.23:5701 │ OFF    │
org.apache.karaf.cellar.bundle.BundleEventHandler
  │ 10.2.1.23:5701 │ ON     │
org.apache.karaf.cellar.features.RepositoryEventHandler
  │ 10.2.1.23:5701 │ OFF    │
org.apache.karaf.cellar.features.FeaturesEventHandler
  │ 10.2.1.23:5701 │ ON     │
org.apache.karaf.cellar.dosgi.RemoteServiceCallHandler
  │ 10.2.1.23:5701 │ ON     │
org.apache.karaf.cellar.event.ClusterEventHandler
  │ 10.1.7.61:5701 │ OFF    │
org.apache.karaf.cellar.bundle.BundleEventHandler
  │ 10.1.7.61:5701 │ ON     │ org.apache.karaf.cellar.kar.KarEventHandler
  │ 10.1.7.61:5701 │ OFF    │
org.apache.karaf.cellar.config.ConfigurationEventHandler
  │ 10.1.7.61:5701 │ OFF    │
org.apache.karaf.cellar.features.FeaturesEventHandler
  │ 10.1.7.61:5701 │ ON     │
org.apache.karaf.cellar.features.RepositoryEventHandler
  │ 10.1.7.61:5701 │ ON     │
org.apache.karaf.cellar.dosgi.RemoteServiceCallHandler
  │ 10.1.7.61:5701 │ ON     │
org.apache.karaf.cellar.event.ClusterEventHandler
x │ 10.2.1.18:5701 │ ON     │ org.apache.karaf.cellar.kar.KarEventHandler
x │ 10.2.1.18:5701 │ OFF    │
org.apache.karaf.cellar.bundle.BundleEventHandler
x │ 10.2.1.18:5701 │ OFF    │
org.apache.karaf.cellar.config.ConfigurationEventHandler
x │ 10.2.1.18:5701 │ ON     │
org.apache.karaf.cellar.features.RepositoryEventHandler
x │ 10.2.1.18:5701 │ OFF    │
org.apache.karaf.cellar.features.FeaturesEventHandler
x │ 10.2.1.18:5701 │ ON     │
org.apache.karaf.cellar.dosgi.RemoteServiceCallHandler
x │ 10.2.1.18:5701 │ ON     │
org.apache.karaf.cellar.event.ClusterEventHandler
  │ 10.1.7.48:5701 │ OFF    │
org.apache.karaf.cellar.bundle.BundleEventHandler
  │ 10.1.7.48:5701 │ OFF    │
org.apache.karaf.cellar.config.ConfigurationEventHandler
  │ 10.1.7.48:5701 │ ON     │ org.apache.karaf.cellar.kar.KarEventHandler
  │ 10.1.7.48:5701 │ ON     │
org.apache.karaf.cellar.features.RepositoryEventHandler
  │ 10.1.7.48:5701 │ OFF    │
org.apache.karaf.cellar.features.FeaturesEventHandler
  │ 10.1.7.48:5701 │ ON     │
org.apache.karaf.cellar.dosgi.RemoteServiceCallHandler
  │ 10.1.7.48:5701 │ ON     │
org.apache.karaf.cellar.event.ClusterEventHandler
  │ 10.1.7.59:5701 │ ON     │ org.apache.karaf.cellar.kar.KarEventHandler
  │ 10.1.7.59:5701 │ OFF    │
org.apache.karaf.cellar.config.ConfigurationEventHandler
  │ 10.1.7.59:5701 │ OFF    │
org.apache.karaf.cellar.bundle.BundleEventHandler
  │ 10.1.7.59:5701 │ ON     │
org.apache.karaf.cellar.features.RepositoryEventHandler
  │ 10.1.7.59:5701 │ OFF    │
org.apache.karaf.cellar.features.FeaturesEventHandler
  │ 10.1.7.59:5701 │ ON     │
org.apache.karaf.cellar.dosgi.RemoteServiceCallHandler
  │ 10.1.7.59:5701 │ ON     │
org.apache.karaf.cellar.event.ClusterEventHandler
  │ 10.1.7.62:5701 │ OFF    │
org.apache.karaf.cellar.bundle.BundleEventHandler
  │ 10.1.7.62:5701 │ OFF    │
org.apache.karaf.cellar.config.ConfigurationEventHandler
  │ 10.1.7.62:5701 │ ON     │ org.apache.karaf.cellar.kar.KarEventHandler
  │ 10.1.7.62:5701 │ ON     │
org.apache.karaf.cellar.features.RepositoryEventHandler
  │ 10.1.7.62:5701 │ OFF    │
org.apache.karaf.cellar.features.FeaturesEventHandler
  │ 10.1.7.62:5701 │ ON     │
org.apache.karaf.cellar.dosgi.RemoteServiceCallHandler
  │ 10.1.7.62:5701 │ ON     │
org.apache.karaf.cellar.event.ClusterEventHandler




-----
if ( you want ) { you can } else { you can't }
--
Sent from: http://karaf.922171.n3.nabble.com/Karaf-User-f930749.html

Re: Karaf Cellar Osgi Event Admin timeout

Posted by Jean-Baptiste Onofre <jb...@nanthrax.net>.
Hi,

Just to be sure: you are using the Cellar ClusterEventHandler right ?

Can you check the state of the EventHandler service in cellar-event bundle ?

Regards
JB

> Le 1 mai 2020 à 16:56, Maurice Betzel <m....@gaston-schul.com> a écrit :
> 
> I guess async uses sockets and timeouts as well. I must dive into that code
> though to confirm my assumption.
> 
> 
> 
> -----
> if ( you want ) { you can } else { you can't }
> --
> Sent from: http://karaf.922171.n3.nabble.com/Karaf-User-f930749.html


Re: Karaf Cellar Osgi Event Admin timeout

Posted by Maurice Betzel <m....@gaston-schul.com>.
I guess async uses sockets and timeouts as well. I must dive into that code
though to confirm my assumption.



-----
if ( you want ) { you can } else { you can't }
--
Sent from: http://karaf.922171.n3.nabble.com/Karaf-User-f930749.html

Re: Karaf Cellar Osgi Event Admin timeout

Posted by Erwin Hogeweg <er...@me.com>.
Hi Maurice,

I had that happen once when an event handler thread was ‘stuck’ on reading from a socket. That only happend once in a blue moon though, so that was pretty hard to find. 
That was with synchronous event delivery though. I don’t now if that happens with async delivery as well.

Just FYI.


Cheers,

Erwin


> On May 1, 2020, at 10:08, Maurice Betzel <m....@gaston-schul.com> wrote:
> 
> So the questions remains, when does blacklisting happen? Only on resource
> starvation or has someone else discovered something else regarding this
> topic?
> 
> 
> 
> -----
> if ( you want ) { you can } else { you can't }
> --
> Sent from: http://karaf.922171.n3.nabble.com/Karaf-User-f930749.html


Re: Karaf Cellar Osgi Event Admin timeout

Posted by Maurice Betzel <m....@gaston-schul.com>.
Yeah, the stack-traces where me fooling around. On production i only see the
blacklisting warning on the producer. See post Nr. 3.



-----
if ( you want ) { you can } else { you can't }
--
Sent from: http://karaf.922171.n3.nabble.com/Karaf-User-f930749.html

Re: Karaf Cellar Osgi Event Admin timeout

Posted by Tim Ward <ti...@paremus.com>.
Blacklisting happens when an Event Admin implementation determines that an Event Handler is “stuck” or “too slow” <https://docs.osgi.org/specification/osgi.cmpn/7.0.0/service.event.html#d0e47607>. If you put a breakpoint into your event handler then this will almost certainly trigger blacklisting because your human-driven debug stepping will take “too long” and cause Event Admin to think that your handler is broken.

All the best,

Tim

> On 1 May 2020, at 15:08, Maurice Betzel <m....@gaston-schul.com> wrote:
> 
> So the questions remains, when does blacklisting happen? Only on resource
> starvation or has someone else discovered something else regarding this
> topic?
> 
> 
> 
> -----
> if ( you want ) { you can } else { you can't }
> --
> Sent from: http://karaf.922171.n3.nabble.com/Karaf-User-f930749.html


Re: Karaf Cellar Osgi Event Admin timeout

Posted by Maurice Betzel <m....@gaston-schul.com>.
So the questions remains, when does blacklisting happen? Only on resource
starvation or has someone else discovered something else regarding this
topic?



-----
if ( you want ) { you can } else { you can't }
--
Sent from: http://karaf.922171.n3.nabble.com/Karaf-User-f930749.html

Re: Karaf Cellar Osgi Event Admin timeout

Posted by Maurice Betzel <m....@gaston-schul.com>.
Boggers its a warning and is comming from Karaf and not from Felix :(, So i
should have seen this always in the logs, strange thing about the Hazelcast
exceptions turning up on the same breakpoint as the post before.



-----
if ( you want ) { you can } else { you can't }
--
Sent from: http://karaf.922171.n3.nabble.com/Karaf-User-f930749.html

Re: Karaf Cellar Osgi Event Admin timeout

Posted by Maurice Betzel <m....@gaston-schul.com>.
Hold your horses, i did not have org.apache.felix logging on DEBUG 

After a complete reinstall an restart i get the following if i block the
consuming side, no hazelcast exceptions anymore:

2020-05-01T14:58:48,148 | DEBUG | Camel (platform-facade-stream-service)
thread #1 - JmsConsumer[streamFacadeQueue] |
DatabaseChangeNotificationListener | 158 -
eu.abeel.platform.facade.mariadb.stream.persistence - 3.2.0 | ServiceEvent:
org.osgi.service.event.Event
[topic=eu/abeel/platform/facade/mariadb/stream/declaration]
{eu.abeel.platform.proxy.knowledgevalues.stream.exchange.service=eu.abeel.platform.facade.mariadb.api.model.DatabaseChangeNotification@393450b9}
2020-05-01T14:58:48,149 | DEBUG | EventAdminThread #15 |
DatabaseChangeNotificationHandler | 229 -
eu.abeel.platform.proxy.knowledgevalues.stream.exchange.service.outbound -
1.0.0 | Received event on topic
eu/abeel/platform/facade/mariadb/stream/declaration
2020-05-01T15:03:34,913 | WARN  | EventAdminAsyncThread #3 | eventadmin                      
| 3 - org.apache.karaf.services.eventadmin - 4.2.6 | EventAdmin:
Blacklisting ServiceReference [[org.osgi.service.event.EventHandler] |
Bundle(eu.abeel.platform.proxy.knowledgevalues.stream.exchange.service.outbound
[229])] due to timeout!
2020-05-01T15:03:34,914 | WARN  | EventAdminThread #15 | eventadmin                      
| 3 - org.apache.karaf.services.eventadmin - 4.2.6 | EventAdmin:
Blacklisting ServiceReference [[org.osgi.service.event.EventHandler] |
Bundle(eu.abeel.platform.proxy.knowledgevalues.stream.exchange.service.outbound
[229])] due to timeout!
2020-05-01T15:03:34,919 | DEBUG | Camel (platform-facade-stream-service)
thread #1 - JmsConsumer[streamFacadeQueue] | service                         
| 186 - eu.abeel.platform.facade.stream.service - 3.2.0 | Registered XML
file : 6c248c72-64c8-49bd-a908-a5d269f9ca0a.xml
2020-05-01T15:03:34,919 | DEBUG | Camel (platform-facade-stream-service)
thread #1 - JmsConsumer[streamFacadeQueue] | service                         
| 186 - eu.abeel.platform.facade.stream.service - 3.2.0 | Writing Stream XML
file :
/data/CUSTOMS/Interface/AGS/Incoming/6c248c72-64c8-49bd-a908-a5d269f9ca0a.xml





-----
if ( you want ) { you can } else { you can't }
--
Sent from: http://karaf.922171.n3.nabble.com/Karaf-User-f930749.html

Re: Karaf Cellar Osgi Event Admin timeout

Posted by Maurice Betzel <m....@gaston-schul.com>.
The producer uses post (async) to not block the Oracle DCN listener thread
comming from the JDBC driver so it can return asap.
This does not happen very often, like once a month where timely distances
may vary (no pattern), and indeed i see blacklist logs from Felix if i
remember correctly. The only solution on the net i found was setting the
timeout to 0, e.g. no timeout at all. This i find very troublesome, blocking
event-threads for a long time or even permanent, destabilizing the whole
node? So my timeout is set at 30 seconds which should be sufficient as the
JVM params i monitored are seldom at max levels.
On the consuming side the event is handed over ASAP to a ExcecuterService to
free up the event producing thread, which is comming from Hazelcast. If this
is blocked you will see the following without blacklisting logs, but also
stopping further eventing after release of the breakpoint:



Side question: is this consumer call synchronous comming from Hazelcast into
the OSGi EventAdmin?

So this seems to come out of nowhere from the producing node as i cannot
find any exceptions not system specific limitations, speculating on network
glitches? But then i should see more exceptions right?




-----
if ( you want ) { you can } else { you can't }
--
Sent from: http://karaf.922171.n3.nabble.com/Karaf-User-f930749.html

Re: Karaf Cellar Osgi Event Admin timeout

Posted by Jean-Baptiste Onofre <jb...@nanthrax.net>.
Hi Maurice,

Are you posting or sending event in EventAdmin (sync or async) ?

If you switch the root logger to debug, don’t you see blacklisting of event handlers ?

Regards
JB

> Le 1 mai 2020 à 10:46, Maurice Betzel <m....@gaston-schul.com> a écrit :
> 
> Hi community,
> 
> the Karaf cluster i am running with Karaf Cellar uses also the OSGi Event
> Broadcasting support (eventadmin) and i have noticed that it sometimes stops
> broadcasting without logging any errors. I can simulate the behavior if i
> add a break-point on the receiving side of the event and do not release it
> for a time if it gets hit.
> Then i do get a lot of exceptions in the logs, but the point is that any
> subsequent events do not get propagated any more, in production nor in my
> test.
> In order to restart eventing without a node restart i have to change the 
> org.apache.felix.eventadmin.impl.EventAdmin.cfg and save it so the Felix
> event-admin is reinitialized.
> Because i do not see any exceptions logged in production i was wondering if
> anybody else has experienced or even resolved the issue.
> 
> My current config:
> 
> org.apache.felix.eventadmin.AddTimestamp=true
> org.apache.felix.eventadmin.AddSubject=true
> org.apache.felix.eventadmin.Timeout=30000
> 
> i think it must timeout fast enough so i do not block to many threads.
> 
> 
> 
> --
> Sent from: http://karaf.922171.n3.nabble.com/Karaf-User-f930749.html