You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@celix.apache.org by Pepijn Noltes <pe...@gmail.com> on 2022/02/08 15:23:41 UTC

C++17 headers and next release

Hi All,

It has been a while since the last Celix release and I think it is time to
prepare for the next release.

One of the recent changes is that Celix now also has C++ support.
The C++ support is done by a header-only C++17 implementation that is built
on top of the Celix C framework and utils libraries.
A benefit of a header-only C++ implementation is that the Celix
framework and utils libraries can be kept pure C libraries and thus no
unwanted C++ dependency is introduced for C only users.

The reason why I did not propose a release earlier, also has to do with C++
support and specifically which version of C++ (C++11, C++14, C++17 or maybe
even C++20) to support.
Releasing a new Celix version earlier could have meant we were "stuck" with
C++11 or we needed to release a major (backwards incompatible) update.

Another change is that celix now handles all service registration, service
un-registration, service tracker start and service tracker stop events on a
single Celix event thread.
This was done to be able to prevent multithreading programming for users in
certain situations. Please note that this is a deviation from the OSGi
spec.

Additionally to new features in the framework and utils libraries, there
are also other new features:
 - Celix::Promise library. This is an implementation and adaptation of the
OSGi Promises specification [1]. Note this lib is still experimental.
 - Celix::PushStreams library. This is an implementation and adaptation of
the OSGi Push Stream specification [2]. Note this lib is still experimental.
 - Celix::ShellCxx bundle. A C++ implementation of the Celix::shell which
also supports a C++ Shell Command interface.
 - C++ Remote Service Admin bundles. A set of bundles which is an
implementing and adaptation of the OSGi Remote Service Admin specification
[3].
 - Support for multiple cmake build types (Debug, RelWithDebInfo)

[1] https://docs.osgi.org/specification/osgi.cmpn/7.0.0/util.promise.html
[2] https://docs.osgi.org/specification/osgi.cmpn/7.0.0/util.pushstream.html
[3]
https://docs.osgi.org/specification/osgi.cmpn/7.0.0/service.remoteserviceadmin.html

Lastly, a lot of stability improvements were added and a lot of bugs were
fixed.

Before releasing I would like to update some of the documentation and I am
currently busy with preparing a pull request for that.
Are there any other requests, remarks or things that need attention
concerning a new Celix release?

Greetings,
Pepijn

Re: ABBA deadlock

Posted by Pepijn Noltes <pe...@gmail.com>.
On Fri, Feb 25, 2022 at 12:00 PM Peng Zheng <zh...@qq.com.invalid>
wrote:

> On 2/25/22 17:54, Pepijn Noltes wrote:
> > Hi Peng,
> >
> > To start off, I see that some emails are directly mailed to me. I
> > think for future mails these discussions can go to the celix dev
> > mailing list.
>
> Oops, I kept ctrl+shift+Ring without noticing this ;p
>
> >
> > Maybe it is good to explain some history about the Celix event thread
> > and async api.
> >
> > Celix started with implementing the OSGi api from the specification as
> > directly as possible.
> > But this gave some issues:
> > 1) No real protected of services going away and their memory being
> > deallocated or even their bundle libraries being "dlclosed".
> > This led to the introduction of the "use services" approach instead of
> > the Java BundleContext.getServiceReference and the hiding of service
> > references for the user.
> >
> >
> > 2) C (pthread) mutexes are not the same as java synchronized
> > Essentially (if I am correct) Java synchronized works as a recursive
> > mutex. For Celix we did not want to promote the use of recursive
> > mutexes (the whole recursive mutexes are evil discussion).
> > But using normal mutexes in combination with service trackers which
> > triggers service usage in their "add" callbacks quickly lead to
> deadlocks.
> > This is especially true for something like the remote services. e.g.
> > in the same thread:  remote service found (RSA mutex) -> tracker add
> > callback called for component -> component registered remote services
> > -> new remote service added in RSA tracker callback (RSA mutex) ->
> > deadlock.
> >
> > Note that for Java OSGi it is rarer to have issues with this, but they
> > still occur (murphy's law). Mostly as a current modification
> > exception, because synchronized (recursive mutexes) do not protect the
> > expected invariants.
> >
> > This led to the introduction of handling service registration,
> > creating service trackers and using services one Celix event thread
> > and an async api next to the sync api.
> > In my experience this solved quite some issues we had, but is also not
> > perfect.
> > Especially because a combination of sync/async api is now possible,
> > this can lead to "solving" a deadlock issue in one spot only to have
> > it pop up again a bit further.
> >
> > And to be honest I do not know how we can tackle this in a better way.
> > The sync api is sometimes needed, for example if a component loses 1
> > of its required dependencies it needs to stop (remove its own
> > services) before the required dependency service can finish removing,
> > this requires a sync unregister service call. But to prevent deadlock
> > issues for the user, ideally all service registration, service
> > unregistration events are done async.
> >
> > I do have 1 though: Create a Celix event thread pool (instead of a
> > single thread), which can spin up new threads if needed to prevent
> > events being blocked by other events.
> > But I am not sure if this really solves the issue and whether it is
> > feasible to detect if an event needs to be handled on another thread.
> > It also has an impact on the footprint of Celix.
> >
> > In short I think the whole 1 Celix event thread, combination of
> > sync/async should be kept as-is and released for now. But I think it
> > would be wise to rethink the future of Celix and maybe try thinking
> > about a Celix 3 release with a major update/refactoring of how MT
> > issues are handled.
>
> The event loop design is elegant, and solves a lot of problems.
>

Check, good to hear :)


> What I'm really against is the behavior of
> celix_bundleContext_useServiceWithOptions.
> IMHO, the event loop is a very precious resource, which should only run
> non-blocking fast administrative tasks, i.e. the event loop should be
> for framework usage only.
> I also notices bundle install/uninstall is already offload to separate
> thread, which is really nice. This is quite similar to libuv's treatment
> of file operations (throwing them into thread pool).
> For high level application development, services often do things like
> performing network/file IO, querying database, CPU-intensive tasks, to
> name a few.
> These rarely have to mess with the event loop. Calling them on the event
> loop should be avoided at all costs IMO.
> However, as an easy-to-use API, the current default behavior of
> celix_bundleContext_useServiceWithOptions does encourage its users to do
> such things.
>

Yes, I think you are correct that the current setup for
celix_bundleContext_useServiceWithOptions does too much on the event thread.
IMO the main use case for the celix_bundleContext_useServiceWithOptions to
work on the event thread (and thus trigger the "service on demand"
pattern), is for testing purposes.
It is often easier to set up a use service call and see if you get an
expected service and that its expected behaviour.

So I think the addition of a "use directly" option is very welcome, does
not have to trigger the "service on demand" pattern and ideally should be
the default.
But to prevent breaking the current "use service" behaviour, I think it
wiser for now to keep the "use directly" option opt-in.



>
> >
> >
> > On Wed, Feb 23, 2022 at 2:00 PM Peng Zheng <zh...@qq.com> wrote:
> >
> >     On 2/23/22 19:13, Peng Zheng wrote:
> >>     On 2/23/22 18:41, Peng Zheng wrote:
> >>>>
> >>>>     psa->topicSenders.mutex is another example of external tracker
> >>>>     lock, which is locked by pubsub_zmqAdmin_removeProtocolSvc.
> >>>>     Noting that bundle is always stopped by a non-event-loop
> >>>>     thread, the following call chain will also lead to dead lock by
> >>>>     the observed pattern.
> >>>>     psa_zmq_stop->pubsub_zmqAdmin_destroy(holing
> >>>>
>  psa->topicSenders.mutex)->pubsub_zmqTopicSender_destroy->pubsubInterceptorsHandler_destroy->celix_bundleContext_stopTracker
> >>>>
> >>     Sorry, this example is wrong. psa_zmq_stop already stops all
> >>     trackers.
> >>
> >     Replacing sync api with async version solve the first deadlock.
> >     Then comes a second ABBA deadlock:
> >
> >     futex_abstimed_wait_cancelable 0x00007fd5efc6e7b1
> >     __pthread_cond_wait_common 0x00007fd5efc6e7b1
> >     __pthread_cond_timedwait 0x00007fd5efc6e7b1
> >     celixThreadCondition_timedwaitRelative celix_threads.c:196
> >     celix_framework_waitForGenericEvent framework.c:2729
> >     celix_bundleContext_useServiceWithOptions bundle_context.c:1228
> >     pubsub_serializerHandler_createForMarkerService
> pubsub_serializer_handler.c:169
> >     pubsub_zmqAdmin_getSerializationHandler pubsub_zmq_admin.c:759
> >     pubsub_zmqAdmin_setupTopicSender pubsub_zmq_admin.c:390
> >     pstm_setupTopicSenderCallback pubsub_topology_manager.c:903
> >     celix_serviceTracker_useHighestRankingService service_tracker.c:760
> >     celix_bundleContext_useServiceWithOptions_2_UseServiceTracker
> bundle_context.c:1187
> >     celix_bundleContext_useServiceWithOptions bundle_context.c:1222
> >     celix_bundleContext_useServiceWithId bundle_context.c:1132
> >     pstm_setupTopicSenders pubsub_topology_manager.c:977
> >     pstm_psaHandlingThread pubsub_topology_manager.c:1135
> >     start_thread 0x00007fd5efc67609
> >     clone 0x00007fd5ef840293
> >
> >     futex_wait_cancelable 0x00007fd5efc6e376
> >     __pthread_cond_wait_common 0x00007fd5efc6e376
> >     __pthread_cond_wait 0x00007fd5efc6e376
> >     celixThreadCondition_wait celix_threads.c:173
> >     tracked_waitAndDestroy service_tracker.c:82
> >     serviceTracker_untrackTracked service_tracker.c:576
> >     serviceTracker_untrack service_tracker.c:541
> >     serviceTracker_serviceChanged service_tracker.c:354
> >     celix_serviceRegistry_serviceChanged service_registry.c:1131
> >     serviceRegistry_unregisterService service_registry.c:272
> >     serviceRegistration_unregister service_registration.c:147
> >     celix_serviceRegistry_unregisterService service_registry.c:1212
> >     fw_handleEventRequest framework.c:1545
> >     fw_handleEvents framework.c:1595
> >     fw_eventDispatcher framework.c:1621
> >     start_thread 0x00007fd5efc67609
> >     clone 0x00007fd5ef840293
> >
> >     The first task, which is using "pubsub_admin", prevents the celix
> >     event loop (now waiting for pubsub_admin users vanishing)
> >     processing other event, which in turn prevents the first task
> >     making any progress.
> >
> >     I didn't expect the provided service do such nontrivial thing as
> >     calling celix_bundleContext_useServiceWithOptions.
> >
> >     Calling service on caller's thread should work if the service
> >     itself does not mess with the event loop. But obviously this
> >     assumption does not hold.
> >     I'd better leave celix_bundleContext_useServiceWithOptions
> >     untouched. So I'll revert the last commit.
> >     The PR should be considered complete and ready for review.
> >
> >
> > Yeah I recognise this. Especially the whole PubSubAdmin and
> > RemoteServiceAdmin approach tends to trigger deadlock quite quickly.
> > Maybe this can be solved by redesign and not allowing the "use
> > services" call in these situations (responsibility of the bundle
> > programmers).
> >
> > WDYT?
> The usage of celix_bundleContext_useServiceWithOptions by
> PubSubAdmin/RemoteServiceAdmin does fall into framework administrative
> usage, and they are not slow blocking operations.
> So I'm OK with them either.
>
> To avoid breaking existing users, I choose to add additional behavior
> without affecting the default.
>
> >     --
> >     Peng Zheng
> >
>
> --
> Peng Zheng
>

Greetings,
Pepijn

Re: ABBA deadlock

Posted by Peng Zheng <zh...@qq.com.INVALID>.
On 2/25/22 17:54, Pepijn Noltes wrote:
> Hi Peng,
>
> To start off, I see that some emails are directly mailed to me. I 
> think for future mails these discussions can go to the celix dev 
> mailing list.

Oops, I kept ctrl+shift+Ring without noticing this ;p

>
> Maybe it is good to explain some history about the Celix event thread 
> and async api.
>
> Celix started with implementing the OSGi api from the specification as 
> directly as possible.
> But this gave some issues:
> 1) No real protected of services going away and their memory being 
> deallocated or even their bundle libraries being "dlclosed".
> This led to the introduction of the "use services" approach instead of 
> the Java BundleContext.getServiceReference and the hiding of service 
> references for the user.
>
>
> 2) C (pthread) mutexes are not the same as java synchronized
> Essentially (if I am correct) Java synchronized works as a recursive 
> mutex. For Celix we did not want to promote the use of recursive 
> mutexes (the whole recursive mutexes are evil discussion).
> But using normal mutexes in combination with service trackers which 
> triggers service usage in their "add" callbacks quickly lead to deadlocks.
> This is especially true for something like the remote services. e.g. 
> in the same thread:  remote service found (RSA mutex) -> tracker add 
> callback called for component -> component registered remote services 
> -> new remote service added in RSA tracker callback (RSA mutex) -> 
> deadlock.
>
> Note that for Java OSGi it is rarer to have issues with this, but they 
> still occur (murphy's law). Mostly as a current modification 
> exception, because synchronized (recursive mutexes) do not protect the 
> expected invariants.
>
> This led to the introduction of handling service registration, 
> creating service trackers and using services one Celix event thread 
> and an async api next to the sync api.
> In my experience this solved quite some issues we had, but is also not 
> perfect.
> Especially because a combination of sync/async api is now possible, 
> this can lead to "solving" a deadlock issue in one spot only to have 
> it pop up again a bit further.
>
> And to be honest I do not know how we can tackle this in a better way.
> The sync api is sometimes needed, for example if a component loses 1 
> of its required dependencies it needs to stop (remove its own 
> services) before the required dependency service can finish removing, 
> this requires a sync unregister service call. But to prevent deadlock 
> issues for the user, ideally all service registration, service 
> unregistration events are done async.
>
> I do have 1 though: Create a Celix event thread pool (instead of a 
> single thread), which can spin up new threads if needed to prevent 
> events being blocked by other events.
> But I am not sure if this really solves the issue and whether it is 
> feasible to detect if an event needs to be handled on another thread.  
> It also has an impact on the footprint of Celix.
>
> In short I think the whole 1 Celix event thread, combination of 
> sync/async should be kept as-is and released for now. But I think it 
> would be wise to rethink the future of Celix and maybe try thinking 
> about a Celix 3 release with a major update/refactoring of how MT 
> issues are handled.

The event loop design is elegant, and solves a lot of problems.
What I'm really against is the behavior of 
celix_bundleContext_useServiceWithOptions.
IMHO, the event loop is a very precious resource, which should only run 
non-blocking fast administrative tasks, i.e. the event loop should be 
for framework usage only.
I also notices bundle install/uninstall is already offload to separate 
thread, which is really nice. This is quite similar to libuv's treatment 
of file operations (throwing them into thread pool).
For high level application development, services often do things like 
performing network/file IO, querying database, CPU-intensive tasks, to 
name a few.
These rarely have to mess with the event loop. Calling them on the event 
loop should be avoided at all costs IMO.
However, as an easy-to-use API, the current default behavior of 
celix_bundleContext_useServiceWithOptions does encourage its users to do 
such things.

>
>
> On Wed, Feb 23, 2022 at 2:00 PM Peng Zheng <zh...@qq.com> wrote:
>
>     On 2/23/22 19:13, Peng Zheng wrote:
>>     On 2/23/22 18:41, Peng Zheng wrote:
>>>>
>>>>     psa->topicSenders.mutex is another example of external tracker
>>>>     lock, which is locked by pubsub_zmqAdmin_removeProtocolSvc.
>>>>     Noting that bundle is always stopped by a non-event-loop
>>>>     thread, the following call chain will also lead to dead lock by
>>>>     the observed pattern.
>>>>     psa_zmq_stop->pubsub_zmqAdmin_destroy(holing
>>>>     psa->topicSenders.mutex)->pubsub_zmqTopicSender_destroy->pubsubInterceptorsHandler_destroy->celix_bundleContext_stopTracker
>>>>
>>     Sorry, this example is wrong. psa_zmq_stop already stops all
>>     trackers.
>>
>     Replacing sync api with async version solve the first deadlock.
>     Then comes a second ABBA deadlock:
>
>     futex_abstimed_wait_cancelable 0x00007fd5efc6e7b1
>     __pthread_cond_wait_common 0x00007fd5efc6e7b1
>     __pthread_cond_timedwait 0x00007fd5efc6e7b1
>     celixThreadCondition_timedwaitRelative celix_threads.c:196
>     celix_framework_waitForGenericEvent framework.c:2729
>     celix_bundleContext_useServiceWithOptions bundle_context.c:1228
>     pubsub_serializerHandler_createForMarkerService pubsub_serializer_handler.c:169
>     pubsub_zmqAdmin_getSerializationHandler pubsub_zmq_admin.c:759
>     pubsub_zmqAdmin_setupTopicSender pubsub_zmq_admin.c:390
>     pstm_setupTopicSenderCallback pubsub_topology_manager.c:903
>     celix_serviceTracker_useHighestRankingService service_tracker.c:760
>     celix_bundleContext_useServiceWithOptions_2_UseServiceTracker bundle_context.c:1187
>     celix_bundleContext_useServiceWithOptions bundle_context.c:1222
>     celix_bundleContext_useServiceWithId bundle_context.c:1132
>     pstm_setupTopicSenders pubsub_topology_manager.c:977
>     pstm_psaHandlingThread pubsub_topology_manager.c:1135
>     start_thread 0x00007fd5efc67609
>     clone 0x00007fd5ef840293
>
>     futex_wait_cancelable 0x00007fd5efc6e376
>     __pthread_cond_wait_common 0x00007fd5efc6e376
>     __pthread_cond_wait 0x00007fd5efc6e376
>     celixThreadCondition_wait celix_threads.c:173
>     tracked_waitAndDestroy service_tracker.c:82
>     serviceTracker_untrackTracked service_tracker.c:576
>     serviceTracker_untrack service_tracker.c:541
>     serviceTracker_serviceChanged service_tracker.c:354
>     celix_serviceRegistry_serviceChanged service_registry.c:1131
>     serviceRegistry_unregisterService service_registry.c:272
>     serviceRegistration_unregister service_registration.c:147
>     celix_serviceRegistry_unregisterService service_registry.c:1212
>     fw_handleEventRequest framework.c:1545
>     fw_handleEvents framework.c:1595
>     fw_eventDispatcher framework.c:1621
>     start_thread 0x00007fd5efc67609
>     clone 0x00007fd5ef840293
>
>     The first task, which is using "pubsub_admin", prevents the celix
>     event loop (now waiting for pubsub_admin users vanishing)
>     processing other event, which in turn prevents the first task
>     making any progress.
>
>     I didn't expect the provided service do such nontrivial thing as
>     calling celix_bundleContext_useServiceWithOptions.
>
>     Calling service on caller's thread should work if the service
>     itself does not mess with the event loop. But obviously this
>     assumption does not hold.
>     I'd better leave celix_bundleContext_useServiceWithOptions
>     untouched. So I'll revert the last commit.
>     The PR should be considered complete and ready for review.
>
>
> Yeah I recognise this. Especially the whole PubSubAdmin and 
> RemoteServiceAdmin approach tends to trigger deadlock quite quickly.
> Maybe this can be solved by redesign and not allowing the "use 
> services" call in these situations (responsibility of the bundle 
> programmers).
>
> WDYT?
The usage of celix_bundleContext_useServiceWithOptions by 
PubSubAdmin/RemoteServiceAdmin does fall into framework administrative 
usage, and they are not slow blocking operations.
So I'm OK with them either.

To avoid breaking existing users, I choose to add additional behavior 
without affecting the default.

>     -- 
>     Peng Zheng
>

-- 
Peng Zheng

Re: C++17 headers and next release

Posted by Peng Zheng <zh...@qq.com.INVALID>.
On 2/8/22 23:23, Pepijn Noltes wrote:
> Hi All,
>
> It has been a while since the last Celix release and I think it is time to
> prepare for the next release.

Thanks for making a more stable Celix release happen.

> The reason why I did not propose a release earlier, also has to do with C++
> support and specifically which version of C++ (C++11, C++14, C++17 or maybe
> even C++20) to support.
> Releasing a new Celix version earlier could have meant we were "stuck" with
> C++11 or we needed to release a major (backwards incompatible) update.

What's the minimum required version of gcc to support this?
In embedded linux world, toolchain is largely a SOC vendor thing,
i.e., we use whatever they provide. None of the seven toolchains I use
in my daytime job has gcc 7 support. That's unfortunate while I see no 
chance
of change in the coming 2 years.

Is the nhohmann/json approach of support multiple C++ versions acceptable?
If so, I consider making future PRs after the release under discussion 
happens.

> Another change is that celix now handles all service registration, service
> un-registration, service tracker start and service tracker stop events on a
> single Celix event thread.
> This was done to be able to prevent multithreading programming for users in
> certain situations. Please note that this is a deviation from the OSGi
> spec.
This solution is elegant. However, invoking service on event loop is bad.
Any blocking service is blocking the whole world.  We cannot require 
every service
to be non-blocking and users can call celix_bundleContext_useService for 
whatever
service they want to use.

I'll make a PR before this weekend.

> Before releasing I would like to update some of the documentation and I am
> currently busy with preparing a pull request for that.
> Are there any other requests, remarks or things that need attention
> concerning a new Celix release?

Two things come to my mind:

1. return value of remote services ([1]):
Should we employ error code scheme like HRESULT ([2]) to fix this?

2. Celix package ([3])
The main pain of using Celix is its cross-compilation (including all its 
dependency).
C/C++ package/dependency manager like conan/vcpkg solve this kind of 
problem like npm does for js.

I use conan for my celix build. Having upstream support will be way better.

>
> Greetings,
> Pepijn
>
[1] https://github.com/apache/celix/issues/396
[2] https://en.wikipedia.org/wiki/HRESULT
[3] https://github.com/apache/celix/issues/204

-- 
Peng Zheng