You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@nuttx.apache.org by 奇诺 <zy...@126.com> on 2023/03/30 07:57:41 UTC

[Breaking change] Move nxmutex to sched

Hi, all:
I submitted a PR regarding mutex, the changes are as follows: 1. Move some mutex operations to sched;2. Add mutex_clocklock, mutex_set_protocol, and mutex_get_protocol interfaces;3. Remove cancellation point operations in mutex because it is not a cancellation point.

In addition, there will be a series of modifications based on this PR in the future.
1. pthread mutex is changed to depend on nxmutex instead of nxsem;
2. nxmutex no longer depends on nxsem and is implemented independently;
3. Move priority inheritance from nxsem to nxmutex, and optimize the performance of priority inheritance;


If there are any issues with this modification, please inform me, thanks!


PR: https://github.com/apache/nuttx/pull/8645


BR

Re: Development priorities (was: [Breaking change] Move nxmutex to sched)

Posted by Fotis Panagiotopoulos <f....@gmail.com>.

> >> I have mixed feelings myself and hope that we get some consensus
through
> >> dialog.  One one hand, it is important to stay faithful to documented
> >> standard and undocumented conventions for the use the a POSIX/Unix
> >> systems.  But on the other hand, unlike other OSs that strive toward
> >> standard conformance, we are an RTOS and must satisfy certain
> >> requirements for deterministic, real time behavior.
> >>
> >> What do you all think?

The most important aspect of NuttX is that it is a real-time OS.
It is its "special" feature.

If I wanted just a POSIX system, then I would simply use Linux.
Linux MPUs and SOMs are very cheap nowadays, and are not lacking in any way
compared to MCUs.

Being real-time is crucial for many applications.
Applications that only NuttX can satisfy, and not Linux (or anything
similar).
Ignoring this aspect of the OS, simply destroys it. It loses its purpose.
And breaks all systems developed till now.

Standards compliance is very important, for reasons we all understand and
agree on, but it should not
interfere with the actual purpose of the OS. Some small deviations were
always there, needed to make
such a system work, but they were always a "characteristic" of NuttX. Not a
"limitation".

Please do not get fanatic over the standards. Some things may be defined in
standards but they may
be totally useless or "stupid" for the actual real-life applications that
NuttX is used for. Similarly, some
"custom" features may be very well needed. Always be realistic.

>  1. Real time, deterministic behavior,
>  2. Standards compliance, and
>  3. OS Footprint

I couldn't agree more.
I propose a change in INVIOLABLES.md making the above totally clear.


On Fri, Mar 31, 2023 at 7:40 PM Tomek CEDRO <to...@cedro.info> wrote:

> On Fri, Mar 31, 2023 at 6:31 PM Nathan Hartman wrote:
> > In "[Breaking change] Move nxmutex to sched" there was a more general
> > discussion about our development priorities: What is most important to
> > us about NuttX?
> >
> > I'm pulling out this part of the discussion to a new thread, to avoid
> > clogging up the nxmutex discussion...
> > (..)
> > On Thu, Mar 30, 2023 at 5:44 PM Gregory Nutt wrote:
> > > We are creating something uncommon; we are creating an RTOS that let's
> > > you run POSIX (read  Linux ) code while retaining the real time,
> > > deterministic performance of an RTOS  If we sacrifice either the real
> > > time nature or POSIX compatibility, then we have failed.
> > >
> > > We are not building another Linux.  We already have a very nice one,
> > > thank you.
> > >
> > > We have had other discussions recently about tradeoffs between POSIX
> > > compatibility and code size.  I don't think that was resolved to
> > > everyone's satisfaction.
> > >
> > > It seems to me that when we have to make trade-offs , we tend to do so
> > > according to the following three values:
> > >
> > >  1. Real time, deterministic behavior,
> > >  2. Standards compliance, and
> > >  3. OS Footprint
> > >
> > > Based on recent decisions and tradeoffs, I list those in what seems to
> > > be their decreasing order of importance to the project. Do you agree
> > > with those values and their importance?  If so, should they be
> enshrined
> > > somehow?
> > >
> > > Some of this is in INVIOLABLES.md.  But INVIOLABLES.md  mostly
> addresses
> > > a lower level set of design values:  Modularity, coding style, etc.
> >
> > This is a very good summary. I think it describes very well what our
> > priorities have been until now, I think we should keep the same list
> > of priorities moving forward, and it might be useful to codify it
> > somewhere that NuttX is developed with this order of importance
> > (copying Greg's summary):
> >
> > [[[
> >  1. Real time, deterministic behavior,
> >  2. Standards compliance, and
> >  3. OS Footprint
> > ]]]
> >
> > Regarding (1), as has been said by Greg, myself, and probably others,
> > the real time deterministic behavior is critical. Without that, I
> > can't really use NuttX for anything significant.
> >
> > Regarding (2), the standards compliance is very helpful because it
> > makes it possible to write and test most of the non-real-time code on
> > a PC, where the edit-compile-debug cycle is much faster and more
> > convenient, and then move working code to embedded.
> >
> > Regarding (3), being careful not to grow the OS Flash footprint too
> > much means that we can make long-lived products and upgrade their
> > firmware well into the future. This is important for things used in
> > industry and infrastructure, where product life cycles are measured in
> > years to decades.
>
> +1 +1 +1 :-) :-) :-)
>
> --
> CeDeROM, SQ7MHZ, http://www.tomek.cedro.info
>

Re: Development priorities (was: [Breaking change] Move nxmutex to sched)

Posted by Tomek CEDRO <to...@cedro.info>.

On Fri, Mar 31, 2023 at 6:31 PM Nathan Hartman wrote:
> In "[Breaking change] Move nxmutex to sched" there was a more general
> discussion about our development priorities: What is most important to
> us about NuttX?
>
> I'm pulling out this part of the discussion to a new thread, to avoid
> clogging up the nxmutex discussion...
> (..)
> On Thu, Mar 30, 2023 at 5:44 PM Gregory Nutt wrote:
> > We are creating something uncommon; we are creating an RTOS that let's
> > you run POSIX (read  Linux ) code while retaining the real time,
> > deterministic performance of an RTOS  If we sacrifice either the real
> > time nature or POSIX compatibility, then we have failed.
> >
> > We are not building another Linux.  We already have a very nice one,
> > thank you.
> >
> > We have had other discussions recently about tradeoffs between POSIX
> > compatibility and code size.  I don't think that was resolved to
> > everyone's satisfaction.
> >
> > It seems to me that when we have to make trade-offs , we tend to do so
> > according to the following three values:
> >
> >  1. Real time, deterministic behavior,
> >  2. Standards compliance, and
> >  3. OS Footprint
> >
> > Based on recent decisions and tradeoffs, I list those in what seems to
> > be their decreasing order of importance to the project. Do you agree
> > with those values and their importance?  If so, should they be enshrined
> > somehow?
> >
> > Some of this is in INVIOLABLES.md.  But INVIOLABLES.md  mostly addresses
> > a lower level set of design values:  Modularity, coding style, etc.
>
> This is a very good summary. I think it describes very well what our
> priorities have been until now, I think we should keep the same list
> of priorities moving forward, and it might be useful to codify it
> somewhere that NuttX is developed with this order of importance
> (copying Greg's summary):
>
> [[[
>  1. Real time, deterministic behavior,
>  2. Standards compliance, and
>  3. OS Footprint
> ]]]
>
> Regarding (1), as has been said by Greg, myself, and probably others,
> the real time deterministic behavior is critical. Without that, I
> can't really use NuttX for anything significant.
>
> Regarding (2), the standards compliance is very helpful because it
> makes it possible to write and test most of the non-real-time code on
> a PC, where the edit-compile-debug cycle is much faster and more
> convenient, and then move working code to embedded.
>
> Regarding (3), being careful not to grow the OS Flash footprint too
> much means that we can make long-lived products and upgrade their
> firmware well into the future. This is important for things used in
> industry and infrastructure, where product life cycles are measured in
> years to decades.

+1 +1 +1 :-) :-) :-)

-- 
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info

Development priorities (was: [Breaking change] Move nxmutex to sched)

Posted by Nathan Hartman <ha...@gmail.com>.

In "[Breaking change] Move nxmutex to sched" there was a more general
discussion about our development priorities: What is most important to
us about NuttX?

I'm pulling out this part of the discussion to a new thread, to avoid
clogging up the nxmutex discussion...

For context, with my replies below:

On Thu, Mar 30, 2023 at 5:44 PM Gregory Nutt <sp...@gmail.com> wrote:
>
>
> >> I have mixed feelings myself and hope that we get some consensus through
> >> dialog.  One one hand, it is important to stay faithful to documented
> >> standard and undocumented conventions for the use the a POSIX/Unix
> >> systems.  But on the other hand, unlike other OSs that strive toward
> >> standard conformance, we are an RTOS and must satisfy certain
> >> requirements for deterministic, real time behavior.
> >>
> >> What do you all think?
> >
> > My opinion is that we have to respect the requirements for deterministic
> > real-time behavior, even though that implies the addition of certain
> > non-standard interfaces. Otherwise we lose our identity as a real time
> > operating system and the applications I am doing with NuttX (and I'm sure
> > many other people) will not be possible.
> >
> > That said, I also very much like that NuttX strives for standards
> > conformance. For me, this means that most non-real-time code can be
> > developed and tested on a PC with a faster code-compile-debug cycle than
> > embedded and then moved over to embedded when it's ready. This has been a
> > huge productivity boost for me (and I'm sure, once again, for many other
> > people).
> >
> > How, then, do we satisfy both needs?
> >
> > I think the answer is that as long as standard functions behave like the
> > standards and practices expect, and deviations from the standards use
> > identifiers that do not collide with the standards, both needs are
> > satisfied well. Applications that do not utilize our real time "extensions"
> > will not notice the difference, and applications that do utilize them will
> > meet real time requirements as needed.
> >
> > I think that in large part we are already doing exactly that, so there
> > isn't really a problem that needs fixing here.
> >
> > I don't know the details of this specific PR yet, so I am just giving my
> > opinion about the premise of NuttX in general.
>
> Well said.
>
> We are creating something uncommon; we are creating an RTOS that let's
> you run POSIX (read  Linux ) code while retaining the real time,
> deterministic performance of an RTOS  If we sacrifice either the real
> time nature or POSIX compatibility, then we have failed.
>
> We are not building another Linux.  We already have a very nice one,
> thank you.
>
> We have had other discussions recently about tradeoffs between POSIX
> compatibility and code size.  I don't think that was resolved to
> everyone's satisfaction.
>
> It seems to me that when we have to make trade-offs , we tend to do so
> according to the following three values:
>
>  1. Real time, deterministic behavior,
>  2. Standards compliance, and
>  3. OS Footprint
>
> Based on recent decisions and tradeoffs, I list those in what seems to
> be their decreasing order of importance to the project. Do you agree
> with those values and their importance?  If so, should they be enshrined
> somehow?
>
> Some of this is in INVIOLABLES.md.  But INVIOLABLES.md  mostly addresses
> a lower level set of design values:  Modularity, coding style, etc.

This is a very good summary. I think it describes very well what our
priorities have been until now, I think we should keep the same list
of priorities moving forward, and it might be useful to codify it
somewhere that NuttX is developed with this order of importance
(copying Greg's summary):

[[[
 1. Real time, deterministic behavior,
 2. Standards compliance, and
 3. OS Footprint
]]]

Regarding (1), as has been said by Greg, myself, and probably others,
the real time deterministic behavior is critical. Without that, I
can't really use NuttX for anything significant.

Regarding (2), the standards compliance is very helpful because it
makes it possible to write and test most of the non-real-time code on
a PC, where the edit-compile-debug cycle is much faster and more
convenient, and then move working code to embedded.

Regarding (3), being careful not to grow the OS Flash footprint too
much means that we can make long-lived products and upgrade their
firmware well into the future. This is important for things used in
industry and infrastructure, where product life cycles are measured in
years to decades.

Cheers,
Nathan

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

>> I have mixed feelings myself and hope that we get some consensus through
>> dialog.  One one hand, it is important to stay faithful to documented
>> standard and undocumented conventions for the use the a POSIX/Unix
>> systems.  But on the other hand, unlike other OSs that strive toward
>> standard conformance, we are an RTOS and must satisfy certain
>> requirements for deterministic, real time behavior.
>>
>> What do you all think?
>
> My opinion is that we have to respect the requirements for deterministic
> real-time behavior, even though that implies the addition of certain
> non-standard interfaces. Otherwise we lose our identity as a real time
> operating system and the applications I am doing with NuttX (and I'm sure
> many other people) will not be possible.
>
> That said, I also very much like that NuttX strives for standards
> conformance. For me, this means that most non-real-time code can be
> developed and tested on a PC with a faster code-compile-debug cycle than
> embedded and then moved over to embedded when it's ready. This has been a
> huge productivity boost for me (and I'm sure, once again, for many other
> people).
>
> How, then, do we satisfy both needs?
>
> I think the answer is that as long as standard functions behave like the
> standards and practices expect, and deviations from the standards use
> identifiers that do not collide with the standards, both needs are
> satisfied well. Applications that do not utilize our real time "extensions"
> will not notice the difference, and applications that do utilize them will
> meet real time requirements as needed.
>
> I think that in large part we are already doing exactly that, so there
> isn't really a problem that needs fixing here.
>
> I don't know the details of this specific PR yet, so I am just giving my
> opinion about the premise of NuttX in general.

Well said.

We are creating something uncommon; we are creating an RTOS that let's 
you run POSIX (read  Linux ) code while retaining the real time, 
deterministic performance of an RTOS  If we sacrifice either the real 
time nature or POSIX compatibility, then we have failed.

We are not building another Linux.  We already have a very nice one, 
thank you.

We have had other discussions recently about tradeoffs between POSIX 
compatibility and code size.  I don't think that was resolved to 
everyone's satisfaction.

It seems to me that when we have to make trade-offs , we tend to do so 
according to the following three values:

 1. Real time, deterministic behavior,
 2. Standards compliance, and
 3. OS Footprint

Based on recent decisions and tradeoffs, I list those in what seems to 
be their decreasing order of importance to the project. Do you agree 
with those values and their importance?  If so, should they be enshrined 
somehow?

Some of this is in INVIOLABLES.md.  But INVIOLABLES.md  mostly addresses 
a lower level set of design values:  Modularity, coding style, etc.

Re:Re: [Breaking change] Move nxmutex to sched

Posted by 奇诺 <zy...@126.com>.



>My opinion is that we have to respect the requirements for deterministic

>real-time behavior, even though that implies the addition of certain
>non-standard interfaces. Otherwise we lose our identity as a real time
>operating system and the applications I am doing with NuttX (and I'm sure
>many other people) will not be possible.
>
>That said, I also very much like that NuttX strives for standards
>conformance. For me, this means that most non-real-time code can be
>developed and tested on a PC with a faster code-compile-debug cycle than
>embedded and then moved over to embedded when it's ready. This has been a
>huge productivity boost for me (and I'm sure, once again, for many other
>people).
>
>How, then, do we satisfy both needs?
>
>I think the answer is that as long as standard functions behave like the
>standards and practices expect, and deviations from the standards use
>identifiers that do not collide with the standards, both needs are
>satisfied well. Applications that do not utilize our real time "extensions"
>will not notice the difference, and applications that do utilize them will
>meet real time requirements as needed.
>
>I think that in large part we are already doing exactly that, so there
>isn't really a problem that needs fixing here.
>
>I don't know the details of this specific PR yet, so I am just giving my
>opinion about the premise of NuttX in general.



Hi, All:


I have also submitted the code that will be optimized next. The PR is as follows:


https://github.com/apache/nuttx/pull/8743

The main reasons for removing priority inheritance from the semaphore are as follows:

1. "Signaling semaphore" does not need to support priority inheritance. 
     After modification, nxsem only handles signaling semaphore.
     Some RTOS I have seen, such as Zephyr, also handle it this way.

2. It is beneficial to optimize the processes related to priority inheritance.
    From the results of my optimization, the optimization effect is very obvious.


Thanks.
Yuan

Re: [Breaking change] Move nxmutex to sched

Posted by Nathan Hartman <ha...@gmail.com>.

On Thu, Mar 30, 2023 at 9:49 AM Gregory Nutt <sp...@gmail.com> wrote:

>
> >> 3. Move priority inheritance from nxsem to nxmutex, and optimize the
> >> performance of priority inheritance;
> >
> > That will break every a lot of people's code.  There is probably a lot
> > of application logic that depends on priority inheritance working on
> > counting semaphores.  Removing that will problems for a lot of people.
> >
> > This is, however, consistent with how Linux works.
>
> AFAIK there is nothing that prohibits semaphores from supporting
> priority inheritance -- other than the fact that there is no "owner" of
> a semaphore as there is for a mutex.  Priority inheritance is very
> important for the correct behavior of some real time systems so we were
> able to use priority inheritance with counting semaphores by adding some
> non-standard interfaces to control whether a semaphore supports priority
> inheritance or not by its usage:
>
> https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance
>
> I have mixed feelings myself and hope that we get some consensus through
> dialog.  One one hand, it is important to stay faithful to documented
> standard and undocumented conventions for the use the a POSIX/Unix
> systems.  But on the other hand, unlike other OSs that strive toward
> standard conformance, we are an RTOS and must satisfy certain
> requirements for deterministic, real time behavior.
>
> What do you all think?

My opinion is that we have to respect the requirements for deterministic
real-time behavior, even though that implies the addition of certain
non-standard interfaces. Otherwise we lose our identity as a real time
operating system and the applications I am doing with NuttX (and I'm sure
many other people) will not be possible.

That said, I also very much like that NuttX strives for standards
conformance. For me, this means that most non-real-time code can be
developed and tested on a PC with a faster code-compile-debug cycle than
embedded and then moved over to embedded when it's ready. This has been a
huge productivity boost for me (and I'm sure, once again, for many other
people).

How, then, do we satisfy both needs?

I think the answer is that as long as standard functions behave like the
standards and practices expect, and deviations from the standards use
identifiers that do not collide with the standards, both needs are
satisfied well. Applications that do not utilize our real time "extensions"
will not notice the difference, and applications that do utilize them will
meet real time requirements as needed.

I think that in large part we are already doing exactly that, so there
isn't really a problem that needs fixing here.

I don't know the details of this specific PR yet, so I am just giving my
opinion about the premise of NuttX in general.

Nathan

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

Same problem as 
https://nuttx.yahoogroups.narkive.com/3hggphCi/problem-related-semaphore-and-priority-inheritance

On 3/31/2023 2:19 PM, Xiang Xiao wrote:
> When the priority inheritance is enabled on a semaphore, sem_wait will add
> the current thread to the semaphore's holder table and expect that the same
> thread will call sem_post later to remove it from the holder table.
> If we mess this fundamental assumption by waiting/posting from different
> threads, many strange things will happen. For example, let's consider
> what's happen when a program send a TCP packet:
>
>     1. The send task call sem_wait to become a holder and get IOB
>     2. Network subsystem copy the user buffer into IOB and add IOB to the
>     queue
>     3. The send task exit and then semphare contain a dangling pointer to
>     the sending tcb
>     4. After network subsystem send IOB to the wire and return it  the pool,
>     sem_post is called and will touch the dangling pointer
>
> Zeng Zhaoxiu provides a patch(https://github.com/apache/nuttx/pull/5171) to
> workaround this issue.
> But, the semaphore holder tracking can't work as we expect anymore.
>
> On Sat, Apr 1, 2023 at 3:52 AM Petro Karashchenko <
> petro.karashchenko@gmail.com> wrote:
>
>> Xiang Xiao, is that still true for the latest code in master branch?
>> And by "system will
>> crash if  the priority inheritance enabled semaphore is waited or posted
>> from different threads" do you mean at the point of sem_post/sem_wait or
>> some system instability in general?
>>
>> Best regards,
>> Petro
>>
>> On Fri, Mar 31, 2023, 10:39 PM Xiang Xiao <xi...@gmail.com>
>> wrote:
>>
>>> BTW, https://github.com/apache/nuttx/pull/5070 report that the system
>> will
>>> crash if  the priority inheritance enabled semaphore is waited or posted
>>> from different threads.
>>>
>>> On Sat, Apr 1, 2023 at 3:20 AM Xiang Xiao <xi...@gmail.com>
>>> wrote:
>>>
>>>>
>>>> On Sat, Apr 1, 2023 at 12:34 AM Gregory Nutt <sp...@gmail.com>
>>> wrote:
>>>>> On 3/31/2023 8:56 AM, Gregory Nutt wrote:
>>>>>>> Even more. In my previous example if semaphore is posted from the
>>>>>>> interrupt
>>>>>>> we do not know which of TaskA or TaskB is no longer a "holder l"
>> of a
>>>>>>> semaphore.
>>>>>>>
>>>>>> You are right.  In this usage case, the counting semaphore is not
>>>>>> being used for locking; it is being used for signalling an event per
>>>>>>
>> https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance
>>>>>> In that case, priority inheritance must be turned off.
>>>>>>
>>>>> You example is really confusing because you are mixing two different
>>>>> concepts, just subtly enough to be really confusing.  If a counting
>>>>> semaphore is used for locking a set of multiple resources, the posting
>>>>> the semaphore does not release the resource.  That is not the way that
>>>>> it is done.  Here is a more believable example of how this would work:
>>>>>
>>>>>   1. TaskA with priority 10 allocates DMA0 channel by calling a DMA
>>>>>      channel allocation function.  If a DMA channel is available, it is
>>>>>      allocated and the allocation function takes the semaphore. TaskA
>>>>>      then starts DMA activity.
>>>>>   2. TaskA waits on another signaling semaphore for the DMA to
>> complete.
>>>>>   3. TaskB with priority 20 does the same.
>>>>>   4. TaskC with priority 30 wants to allocate a DMA channel.  It calls
>>>>>      the channel allocation function which waits on the sempahore for a
>>>>>      count to be available.  This boost the priority of TaskA and TaskB
>>>>>      to 30.
>>>>>   5. When the DMA started by TaskA completes, it signals TaskA which
>>>>>      calls the resource deallocation function which increments the
>>>>>      counting semaphore, giving the count to TaskC and storing the base
>>>>>      priorities.
>>>>>
>>>>>
>>>> Normally, the resource(dma channel here) is allocated from one
>>>> thread/task, but may be freed in another thread/task. Please consider
>> how
>>>> we malloc and free memory.
>>>>
>>>>
>>>>> The confusion arises because you are mixing the signaling logic with
>> the
>>>>> resource deallocation logic.
>>>>>
>>>>> The mm/iob logic works basically this way.  The logic more complex
>> then
>>>>> you would think from above.  IOBs is an example of a critical system
>>>>> resource that has multiple copies and utilizes a counting semaphore
>> with
>>>>> priority inheritance to achieve good real time performance.   IOB
>>>>> handling is key logic for the correct real time operation of the
>> overall
>>>>> system.  Nothing we do must risk this.
>>>>>
>>>>>
>>>> IOB is a very good example to demonstrate why it's a bad and dangerous
>>>> idea to enable priority inheritance for the counting semaphore. IOB is
>>>> normally allocated in the send thread but free in the work thread. If
>> we
>>>> want the priority inheritance to work as expected instead of crashing
>> the
>>>> system, sem_wait/sem_post must come from the same thread, which is a
>> kind
>>>> of lock.
>>>>
>>>>
>>>>> Other places where this logic is (probably) used:
>>>>>
>>>>>      arch/arm/src/rp2040/rp2040_i2s.c: nxsem_init(&priv->bufsem, 0,
>>>>>      CONFIG_RP2040_I2S_MAXINFLIGHT);
>>>>>      arch/arm/src/rtl8720c/amebaz_depend.c:  if (sem_init(_sema, 0,
>>>>>      init_val))
>>>>>      arch/arm/src/sama5/sam_ssc.c: nxsem_init(&priv->bufsem, 0,
>>>>>      CONFIG_SAMA5_SSC_MAXINFLIGHT);
>>>>>      arch/arm/src/samv7/sam_ssc.c: nxsem_init(&priv->bufsem, 0,
>>>>>      CONFIG_SAMV7_SSC_MAXINFLIGHT);
>>>>>      arch/arm/src/stm32/stm32_i2s.c: nxsem_init(&priv->bufsem, 0,
>>>>>      CONFIG_STM32_I2S_MAXINFLIGHT);
>>>>>      drivers/can/mcp2515.c: nxsem_init(&priv->txfsem, 0,
>>>>>      MCP2515_NUM_TX_BUFFERS);
>>>>>      drivers/video/vnc/vnc_server.c: nxsem_init(&session->freesem, 0,
>>>>>      CONFIG_VNCSERVER_NUPDATES);
>>>>>      sched/pthread/pthread_completejoin.c: nxsem_init(&pjoin->data_sem,
>>>>>      0, (ntasks_waiting + 1));
>>>>>      wireless/bluetooth/bt_hcicore.c: nxsem_init(&g_btdev.le_pkts_sem,
>> 0,
>>>>>      g_btdev.le_pkts);
>>>>>      wireless/bluetooth/bt_hcicore.c: nxsem_init(&g_btdev.ncmd_sem, 0,
>>> 1);
>>>>>      wireless/ieee802154/mac802154.c: nxsem_init(&priv->txdesc_sem, 0,
>>>>>      CONFIG_MAC802154_NTXDESC);
>>>>>      wireless/ieee802154/mac802154.c: nxsem_init(&mac->opsem, 0, 1);
>>>>>
>>>>> Maybe:
>>>>>
>>>>>      arch/risc-v/src/bl602/bl602_os_hal.c: ret = nxsem_init(sem, 0,
>>> init);
>>>>>      arch/risc-v/src/esp32c3/esp32c3_ble_adapter.c: ret =
>>>>>      sem_init(&bt_sem->sem, 0, init);
>>>>>      arch/risc-v/src/esp32c3/esp32c3_wifi_adapter.c: ret =
>>>>>      nxsem_init(sem, 0, init);
>>>>>      arch/xtensa/src/esp32/esp32_ble_adapter.c: ret = sem_init(sem, 0,
>>>>> init);
>>>>>      arch/xtensa/src/esp32/esp32_wifi_adapter.c: ret = nxsem_init(sem,
>> 0,
>>>>>      init);
>>>>>      arch/xtensa/src/esp32s3/esp32s3_wifi_adapter.c: ret =
>>>>>      nxsem_init(sem, 0, init);
>>>>>
>>>>>
>>>>
>>>>

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

>
>> BTW, https://github.com/apache/nuttx/pull/5070 report that the system 
>> will
>> crash if  the priority inheritance enabled semaphore is waited or posted
>> from different threads.
> True.  sem_post should fail if priority inheritance is enabled and the 
> caller is not a holder of a semaphore count.  That check should be 
> added.  Certainly it is not a justification for eliminating core 
> functionality.
>
PR https://github.com/apache/nuttx/pull/8938 adds that check.

If you merge this PR and enable CONFIG_DEBUG_ASSERTIONS, then the 
condition that causes the error mentioned  in the PR will always be 
detected and will cause an assertion.  This error condition has been 
known and was documented six years ago in the confluence Wiki: 
https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance

It is not a new or recent issue.

Re: [Breaking change] Move nxmutex to sched

Posted by Fotis Panagiotopoulos <f....@gmail.com>.

There are two issues that I would like to bring to your attention.

#7393 - https://github.com/apache/nuttx/issues/7393
#5973 - https://github.com/apache/nuttx/issues/5973

On Sat, Apr 1, 2023 at 12:07 AM Gregory Nutt <sp...@gmail.com> wrote:

>
> > BTW, https://github.com/apache/nuttx/pull/5070 report that the system
> will
> > crash if  the priority inheritance enabled semaphore is waited or posted
> > from different threads.
> True.  sem_post should fail if priority inheritance is enabled and the
> caller is not a holder of a semaphore count.  That check should be
> added.  Certainly it is not a justification for eliminating core
> functionality.
>
>

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

> BTW, https://github.com/apache/nuttx/pull/5070 report that the system will
> crash if  the priority inheritance enabled semaphore is waited or posted
> from different threads.
True.  sem_post should fail if priority inheritance is enabled and the 
caller is not a holder of a semaphore count.  That check should be 
added.  Certainly it is not a justification for eliminating core 
functionality.

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

On 4/6/2023 9:09 AM, Petro Karashchenko wrote:
> Also pthread_mutex is a solution for user space, but would be good to 
> have something similar for kernel space.
> During adding nxmutex wrapper to the code many misuse issues were 
> identified, so it is adding safety during coding and does not allow 
> misuse. Of course a binary semaphore is a solid building block, but 
> leaving it alone just increases the probability of errors in code. 
> Otherwise POSIX would not add pthread_mutex and would extend 
> semaphores. I mean that the need of it seems to be obvious since 
> pthread_mutex are a part of POSIX side by side to semaphores and it 
> would be good to have something like pthread_mutex for kernel.

That is not a very compelling argument.

All of the pthread interfaces exist only because they operate on 
pthreads and require pthread_t to identify the pthread.  The pthread IDs 
only exist within a "process."

semaphores have no inherent identification associated with them and have 
a global scope that can work across different "processes."

There are no "processes" in NuttX; we emulate them in the FLAT build 
with tasks.  A task is just a thread.

In NuttX, we use "heavy weight" threads:  A pthread is really the same 
as a task; it bound to the task group only via the shared group_s 
structure.  It is all smoke and mirrors to get POSIX compatibility.  
Underneath, a pthread and a task is the same; a mutex and a semaphore is 
the same.  Trying to make distinctions internally when there are none 
cannot lead to good design decisions.

[Linux, uses light weight threads which are something very different]

Trying to force them to be different is very artificial.  It is similar 
to the artificial designation of some task as threads. But at least that 
was done to meet the functional interface specification.  Mutexes are, 
similarly, just semaphores.  Again the exist only to meet POSIX 
requirements for pthread interfaces to go with the fake pthreads.  There 
is absolutely no reason to make some implementation distinction between 
them other than to meet the letter of the POSIX requirement.

Building another fake internal infrastructure does not seem like 
something that is in our best interest.

Re: [Breaking change] Move nxmutex to sched

Posted by Tomek CEDRO <to...@cedro.info>.

On Thu, Apr 6, 2023 at 2:58 PM Gregory Nutt  wrote:
>
> Oh my God!  That sounds terrible!  Does this change actually do
> /anything /positive.
>
> Duplicating the internal implementation of Linux is /NOT /positive.  It
> is irrelevant.
>
> Adopting GNU/Linux interfaces as a functional specification made since.
> But duplicating the ibnternal implementation of Linux is the dumbest
> thing I have ever heard.
>
> Losing any real time performance is /catastrophic /for an RTOS.
>
> Significantly increasing code size is /catastrophic /for an embedded OS.
>
> This is a very bad change that should never come into the repository.
> My recommendation is that you quietly close the PR without merge and be
> done with it.
>
> The solution that we want is:
>
>   * One that conforms to interface standards
>   * Results in the smallest footprint possible
>   * Gives the best real time behavior possible.
>
> Nothing else should be accepted.

Very strong +1 :-)

No long stories here, but I know Linux is usually a first contact with
the Open-Source world, on the other hand its also known to be a
self-incompatible long term maintenance nightmare mainly because of
"enforced changes ideology" (that comes from Microsoft) or "new is
better" approach, so following Linux is like never ending chasing the
rabbit story (i.e. look at changing kernel API every minor release
that forces you to constantly update something that is already out
there working fine, or lots of Linuxisms in the open source software
that makes it hardly portable along other POSIX OS while you will
always hear "works for me" or "not my fault" or "switch to Linux"
viral mantras). Its like JavaScript frameworks that have lifespan
shorter than a yogurt, would you build on  top of that amazingly
popular "solutions" yourself?

Lets stick to good old Unix fashion that works for for decades already
and will hopefully fork for decades ahead providing simple coherent
long term maintainable building blocks. Lets keep NuttX Unix in the
RTOS world :-)

-- 
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info

Re: [Breaking change] Move nxmutex to sched

Posted by Petro Karashchenko <pe...@gmail.com>.

Also pthread_mutex is a solution for user space, but would be good to have
something similar for kernel space.
During adding nxmutex wrapper to the code many misuse issues were
identified, so it is adding safety during coding and does not allow misuse.
Of course a binary semaphore is a solid building block, but leaving it
alone just increases the probability of errors in code. Otherwise POSIX
would not add pthread_mutex and would extend semaphores. I mean that the
need of it seems to be obvious since pthread_mutex are a part of POSIX side
by side to semaphores and it would be good to have something like
pthread_mutex for kernel.

чт, 6 квіт. 2023 р. о 18:03 Petro Karashchenko <pe...@gmail.com>
пише:

> Yes. The main "mutex" attribute differentiating it from a binary semaphore
> is that it can be released only by a holder thread (even if priority
> inheritance is not enabled). And that is for the basic mutex, but recursive
> mutex also allows nested "obtain". So "recursive mutex" is not 1-to-1 the
> same as binary semaphore.
>
> чт, 6 квіт. 2023 р. о 17:36 Tomek CEDRO <to...@cedro.info> пише:
>
>> On Thu, Apr 6, 2023 at 2:58 PM Gregory Nutt wrote:
>> > Oh my God!  That sounds terrible!  Does this change actually do
>> > /anything /positive.
>>
>> Look Zyfeier, its not that we oppose development, we want the development
>> to done the right way that will bring elegant coherent standard compliant
>> solution as a result :-)
>>
>> Aside from my previous remark on Linux (along with other commercial OS)
>> "enforced changes", lets think about Greg's "does this change actually do
>> /anything /positive" question with another example.
>>
>> Take a looks at WS2812 RGB Smart LED. They decided to introduce "an
>> innovation" by changing the Pin1 marking on the casing and put that mark on
>> pin 3 instead. Whole world use Pin1 marking to quickly align a component
>> pinout, so at first glance you can see where is the pin 1 of the component,
>> also most chips use VCC there so you can quickly measure things, nothing
>> fancy, everyone knows that. Now take a look at the pcb design footprint
>> (bottom layer mirrored) and the led datasheet.
>>
>> [image: shot-2023-04-06_16-09-06.jpg]
>> [image: shot-2023-04-06_16-09-23.jpg]
>> You can clearly see that putting Pin1 casing mark on pin 3 is a terrible
>> idea, even more that chip is symmetrical, so it will lead to bad placing
>> and reversed power supply. Sure, this is some innovation, but world does
>> not work that way and everyone just gets confused. When you make such
>> changes to other components a design becomes incoherent and no one will
>> then know anything, but look how many (fake) "innovations" just showed up.
>>
>> This is why solid coherent standardized fundamentals / foundations of
>> technology is so important. So we "just know" things intuitively, and we
>> can work together to improve things worldwide in a systematic fashion,
>> solid brick after solid brick, evolution not revolution. You cannot just
>> delete / replace what is already out there working fine.
>>
>> Example above is about electronic component, but with the software is
>> exactly the same, it is good to stick to well adapted standards, add your
>> own brick on top of solid inviolable fundamentals / fundation, not
>> necessarily following the quickly changing fashions and trends with a
>> lifespan of a yogurt, not spreading bad habits from other environments,
>> that will result in far better solution that is coherent and long term
>> maintainable. That results in a solid foundation for a good system / device
>> / solution / product.
>>
>> --
>> CeDeROM, SQ7MHZ, http://www.tomek.cedro.info
>>
>

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

On 4/6/2023 9:03 AM, Petro Karashchenko wrote:
> Yes. The main "mutex" attribute differentiating it from a binary 
> semaphore is that it can be released only by a holder thread (even if 
> priority inheritance is not enabled). And that is for the basic mutex, 
> but recursive mutex also allows nested "obtain". So "recursive mutex" 
> is not 1-to-1 the same as binary semaphore.

It is implemented some very simple, trivial logic on top of a binary 
semaphore.  I see that more as an add-on behavior to the basic locking 
operation.  I don't think it generates a different class of lock.

Re: [Breaking change] Move nxmutex to sched

Posted by Petro Karashchenko <pe...@gmail.com>.

Yes. The main "mutex" attribute differentiating it from a binary semaphore
is that it can be released only by a holder thread (even if priority
inheritance is not enabled). And that is for the basic mutex, but recursive
mutex also allows nested "obtain". So "recursive mutex" is not 1-to-1 the
same as binary semaphore.

чт, 6 квіт. 2023 р. о 17:36 Tomek CEDRO <to...@cedro.info> пише:

> On Thu, Apr 6, 2023 at 2:58 PM Gregory Nutt wrote:
> > Oh my God!  That sounds terrible!  Does this change actually do
> > /anything /positive.
>
> Look Zyfeier, its not that we oppose development, we want the development
> to done the right way that will bring elegant coherent standard compliant
> solution as a result :-)
>
> Aside from my previous remark on Linux (along with other commercial OS)
> "enforced changes", lets think about Greg's "does this change actually do
> /anything /positive" question with another example.
>
> Take a looks at WS2812 RGB Smart LED. They decided to introduce "an
> innovation" by changing the Pin1 marking on the casing and put that mark on
> pin 3 instead. Whole world use Pin1 marking to quickly align a component
> pinout, so at first glance you can see where is the pin 1 of the component,
> also most chips use VCC there so you can quickly measure things, nothing
> fancy, everyone knows that. Now take a look at the pcb design footprint
> (bottom layer mirrored) and the led datasheet.
>
> [image: shot-2023-04-06_16-09-06.jpg]
> [image: shot-2023-04-06_16-09-23.jpg]
> You can clearly see that putting Pin1 casing mark on pin 3 is a terrible
> idea, even more that chip is symmetrical, so it will lead to bad placing
> and reversed power supply. Sure, this is some innovation, but world does
> not work that way and everyone just gets confused. When you make such
> changes to other components a design becomes incoherent and no one will
> then know anything, but look how many (fake) "innovations" just showed up.
>
> This is why solid coherent standardized fundamentals / foundations of
> technology is so important. So we "just know" things intuitively, and we
> can work together to improve things worldwide in a systematic fashion,
> solid brick after solid brick, evolution not revolution. You cannot just
> delete / replace what is already out there working fine.
>
> Example above is about electronic component, but with the software is
> exactly the same, it is good to stick to well adapted standards, add your
> own brick on top of solid inviolable fundamentals / fundation, not
> necessarily following the quickly changing fashions and trends with a
> lifespan of a yogurt, not spreading bad habits from other environments,
> that will result in far better solution that is coherent and long term
> maintainable. That results in a solid foundation for a good system / device
> / solution / product.
>
> --
> CeDeROM, SQ7MHZ, http://www.tomek.cedro.info
>

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

>> I believe that all of the lists and structures that you are concerned
>> with are already allocated from  kernel space memory, but I have not
> The lists do exist in user memory, although I agree that they should not.

Then that is a bug.  It is a serious security error if OS internal data 
structures are exposed and accessible in user space.  That must be fixed 
in my opinion.

People who only work in the FLAT address environment violate basic OS 
security like this all over the code.  And reviews who only use FLAT 
addressing cannot spot the security errors.

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

> Hi, just to add that in the normal case obviously the correct process *is*
> running when the semaphore is accessed (sem_wait(), sem_post()), but there
> is one exception: sem_waitirq() which is called either by the watchdog ISR
> or by the signal dispatch logic. In this case most likely the correct
> process is *not* running, meaning access to the sem fields needed by the
> kernel either go to the wrong address environment, or cause a page fault.
>
> So the control fields in sem_t are tied to the process that instantiates
> the semaphore, and if it messes the fields up only the process itself is
> affected (the process can be killed for being naughty but the kernel
> survives). But in those two exception cases, the kernel needs access to the
> semaphore.

This is the best argument that I have heard for splitting the semaphore 
data.

Another option would be keep the kernel addressing within the sem_t.

Re: [Breaking change] Move nxmutex to sched

Posted by Ville Juven <vi...@gmail.com>.

Hi, just to add that in the normal case obviously the correct process *is*
running when the semaphore is accessed (sem_wait(), sem_post()), but there
is one exception: sem_waitirq() which is called either by the watchdog ISR
or by the signal dispatch logic. In this case most likely the correct
process is *not* running, meaning access to the sem fields needed by the
kernel either go to the wrong address environment, or cause a page fault.

So the control fields in sem_t are tied to the process that instantiates
the semaphore, and if it messes the fields up only the process itself is
affected (the process can be killed for being naughty but the kernel
survives). But in those two exception cases, the kernel needs access to the
semaphore.

Br,
Ville Juven

On Wed, Apr 19, 2023 at 5:42 PM Ville Juven <vi...@gmail.com> wrote:

> Hi, thanks again for the responses and bearing with me
>
> > I believe that all of the lists and structures that you are concerned
> > with are already allocated from  kernel space memory, but I have not
>
> The lists do exist in user memory, although I agree that they should not.
>
> > verified that it all cases.  Certainly, they are things that would
> > never be accessed from user space
>
> Absolutely correct, the user does not need them nor should they be
> accessed/accessible by the user. However, they can be accessed, either
> intentionally, or by accident (e.g. via memcpy, memset, etc). Which means
> the user really can mess the lists up, causing the kernel to crash.
>
> > I would say that, as a rule, nothing in user space should have any
> > knowledge or access to the internals of a sem_t.  Looking at
> > libs/libc/semaphore, I don't believe that there is any,.
>
> You are correct, the API does not access or provide access to those kernel
> fields, but when a user instantiates sem_t, those kernel control fields
> also go to user space. They are also accessible to the user process via the
> named struct fields e.g. sem->waitlist.
>
> The following stupid user space example will compile and run just fine as
> the full composition of sem_t is visible to the user. The worker thread can
> access sem->waitlist and completely mess up the doubly linked queue, from
> user space.
>
> #include <semaphore.h>
>
> sem_t sem;
>
> void worker(void *arg)
> {
>   sem.waitlist = (void *)0xDEADBEEF;
>   sem_post(&sem);
> }
>
> int main(int argc, char *argv[])
> {
>   sem_init(&sem, 0, 0);
>   ...
>   spawn the worker thread here!
>   ...
>   sem_wait(&sem);
>   return 0;
> }
>
> The hhead objects are an exception, they are allocated from kernel space.
> But struct semholder_s holder[2]; is defined inside sem_t, thus if
> CONFIG_SEM_PREALLOCHOLDERS == 0, whenever sem_t is instantiated those go to
> user space too.
> hhead is not safe either, the user can mess up the priority inheritance
> list as well, by setting sem.hhead = (void *)0x592959;
>
> If there is something I'm missing I apologize for the spam, but I have
> indeed verified that the lists are not accessible by the kernel unless the
> correct process is running, which means the correct user mappings are
> active and the user memory is visible to the kernel.
>
> Br,
> Ville Juven
>
> On Wed, Apr 19, 2023 at 4:53 PM Gregory Nutt <sp...@gmail.com> wrote:
>
>>
>> >> However, as I said, there are some things in the sem_t structure whose
>> >> usage is ubiquitous and I don't know if the scalar access macros will
>> be
>> >> enough, i.e. I don't fully understand how they work / are supposed to
>> >> work.
>> >> ...
>> >
>> > I believe that all of the lists and structures that you are concerned
>> > with are already allocated from  kernel space memory, but I have not
>> > verified that it all cases.  Certainly, they are things that would
>> > never be accessed from user space.  The user memory sem_t "container"
>> > just holds references to kernel space allocations.
>> >
>> > If I am correct, then you should not have any problems.  But, as I
>> > said, I did not verify all of that.
>> I would say that, as a rule, nothing in user space should have any
>> knowledge or access to the internals of a sem_t.  Looking at
>> libs/libc/semaphore, I don't believe that there is any,.
>>
>>
>>

Re: [Breaking change] Move nxmutex to sched

Posted by Ville Juven <vi...@gmail.com>.

Hi, thanks again for the responses and bearing with me

> I believe that all of the lists and structures that you are concerned
> with are already allocated from  kernel space memory, but I have not

The lists do exist in user memory, although I agree that they should not.

> verified that it all cases.  Certainly, they are things that would
> never be accessed from user space

Absolutely correct, the user does not need them nor should they be
accessed/accessible by the user. However, they can be accessed, either
intentionally, or by accident (e.g. via memcpy, memset, etc). Which means
the user really can mess the lists up, causing the kernel to crash.

> I would say that, as a rule, nothing in user space should have any
> knowledge or access to the internals of a sem_t.  Looking at
> libs/libc/semaphore, I don't believe that there is any,.

You are correct, the API does not access or provide access to those kernel
fields, but when a user instantiates sem_t, those kernel control fields
also go to user space. They are also accessible to the user process via the
named struct fields e.g. sem->waitlist.

The following stupid user space example will compile and run just fine as
the full composition of sem_t is visible to the user. The worker thread can
access sem->waitlist and completely mess up the doubly linked queue, from
user space.

#include <semaphore.h>

sem_t sem;

void worker(void *arg)
{
  sem.waitlist = (void *)0xDEADBEEF;
  sem_post(&sem);
}

int main(int argc, char *argv[])
{
  sem_init(&sem, 0, 0);
  ...
  spawn the worker thread here!
  ...
  sem_wait(&sem);
  return 0;
}

The hhead objects are an exception, they are allocated from kernel space.
But struct semholder_s holder[2]; is defined inside sem_t, thus if
CONFIG_SEM_PREALLOCHOLDERS == 0, whenever sem_t is instantiated those go to
user space too.
hhead is not safe either, the user can mess up the priority inheritance
list as well, by setting sem.hhead = (void *)0x592959;

If there is something I'm missing I apologize for the spam, but I have
indeed verified that the lists are not accessible by the kernel unless the
correct process is running, which means the correct user mappings are
active and the user memory is visible to the kernel.

Br,
Ville Juven

On Wed, Apr 19, 2023 at 4:53 PM Gregory Nutt <sp...@gmail.com> wrote:

>
> >> However, as I said, there are some things in the sem_t structure whose
> >> usage is ubiquitous and I don't know if the scalar access macros will be
> >> enough, i.e. I don't fully understand how they work / are supposed to
> >> work.
> >> ...
> >
> > I believe that all of the lists and structures that you are concerned
> > with are already allocated from  kernel space memory, but I have not
> > verified that it all cases.  Certainly, they are things that would
> > never be accessed from user space.  The user memory sem_t "container"
> > just holds references to kernel space allocations.
> >
> > If I am correct, then you should not have any problems.  But, as I
> > said, I did not verify all of that.
> I would say that, as a rule, nothing in user space should have any
> knowledge or access to the internals of a sem_t.  Looking at
> libs/libc/semaphore, I don't believe that there is any,.
>
>
>

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

>> However, as I said, there are some things in the sem_t structure whose
>> usage is ubiquitous and I don't know if the scalar access macros will be
>> enough, i.e. I don't fully understand how they work / are supposed to 
>> work.
>> ...
>
> I believe that all of the lists and structures that you are concerned 
> with are already allocated from  kernel space memory, but I have not 
> verified that it all cases.  Certainly, they are things that would 
> never be accessed from user space.  The user memory sem_t "container" 
> just holds references to kernel space allocations.
>
> If I am correct, then you should not have any problems.  But, as I 
> said, I did not verify all of that.
I would say that, as a rule, nothing in user space should have any 
knowledge or access to the internals of a sem_t.  Looking at 
libs/libc/semaphore, I don't believe that there is any,.

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

On 4/19/2023 3:33 AM, Ville Juven wrote:
> However, as I said, there are some things in the sem_t structure whose
> usage is ubiquitous and I don't know if the scalar access macros will be
> enough, i.e. I don't fully understand how they work / are supposed to work.
> ...

I believe that all of the lists and structures that you are concerned 
with are already allocated from  kernel space memory, but I have not 
verified that it all cases.  Certainly, they are things that would never 
be accessed from user space.  The user memory sem_t "container" just 
holds references to kernel space allocations.

If I am correct, then you should not have any problems.  But, as I said, 
I did not verify all of that.

Re: [Breaking change] Move nxmutex to sched

Posted by Ville Juven <vi...@gmail.com>.

Hi,

Thanks for the responses. The scalar read / write functions will work when
accessing singular fields from the semaphore, and such an access will (or
at least should be) always be correctly aligned so there is no risk in the
scalar value being split by a page boundary -> no extra work is required,
simply obtaining the kernel virtual address for the page + offset will
suffice.

However, as I said, there are some things in the sem_t structure whose
usage is ubiquitous and I don't know if the scalar access macros will be
enough, i.e. I don't fully understand how they work / are supposed to work.

struct sem_s
{
  volatile int16_t semcount;
  uint8_t flags;
  dq_queue_t waitlist; // This is accessed by the system scheduler when
traversing TSTATE_WAIT_SEM-list

// These are also accessed from all over
#ifdef CONFIG_PRIORITY_INHERITANCE
# if CONFIG_SEM_PREALLOCHOLDERS > 0
  FAR struct semholder_s *hhead;
# else
  struct semholder_s holder[2];
# endif
#endif
};

The waitlist doubly linked queue is used by the scheduler to implement the
TSTATE_WAIT_SEM list. This means the semaphore object is sporadically
accessed when the scheduler accesses tasks in that list. This means that
ANY user semaphore from any process must be accessible by the kernel to
traverse the list. Ok, we can access dqueue->head via pointer aligned
access, so that can be patched.

The next is the semholder list. I'm not even sure how much patching that
will need to work with MMU. The static holder[] allocations come from user
memory, while the "hhead" allocations come from a kernel pool. Ok, maybe
the test for CONFIG_SEM_PREALLOCHOLDERS can also test for MMU, if MMU is in
use, do not use the static holder[] list.

Finally, we have a reference to a (I don't know which, yet) semaphore in
struct semholder_s too. Maybe someone can help me understand what this
reference is for ?

A more detailed description of my findings is in the github issue I created
for this: https://github.com/apache/nuttx/issues/8917

Also, what Jukka said above about the security aspect is a perfectly valid
point. Now the dq waitlist and the semaphore holder list are in user
memory, which means the user process can mess those lists up, crashing the
kernel.

Br,
Ville Juven

PS. Whatever solution is implemented shall in no way increase code size or
modify functionality in the flat addressing mode(s). This is a requirement
for us as well, we have flat targets that have very limited memory as well.

On Tue, Apr 18, 2023 at 4:23 PM Gregory Nutt <sp...@gmail.com> wrote:

> On 4/18/2023 6:58 AM, Nathan Hartman wrote:
> > On Tue, Apr 18, 2023 at 8:52 AM spudaneco <sp...@gmail.com> wrote:
> >
> >> Perhaps you could use accessor functions to read/write 8, 16 32, 64 bit
> >> values.  Each scalar value will be properly aligned and, hence, will lie
> >> within a page.
> >
> >
> > And those accessor functions could be macros in FLAT builds where
> functions
> > are not required, so no memory or code size increase.
> >
> > Cheers
> > Nathan
> >
> The inefficiency is that each access would require a virtual to physical
> address look-up.  That could be alleviated by providing the base
> physical address of the the page and the offset of the sem_t iin the
> page as a parameter, maybe like:
>
>      struct paginfo_t pageinfo;
>
>      get_pageinfo(sem, &pageinfo);
>
>      value = get_pagevalue16(&pageinfo, &sem->field);
>
> If the field like in the following page, then get_pagevalue16 would have
> to work harder.  But the usual case, where the sem_t lies wholly in the
> page would be relatively fast.
>
> This, however, would not macro-ize as well.
>
> Not that no special alignment of the sem_t is required if you access by
> scalar field.
>
>
>

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

On 4/18/2023 6:58 AM, Nathan Hartman wrote:
> On Tue, Apr 18, 2023 at 8:52 AM spudaneco <sp...@gmail.com> wrote:
>
>> Perhaps you could use accessor functions to read/write 8, 16 32, 64 bit
>> values.  Each scalar value will be properly aligned and, hence, will lie
>> within a page.
>
>
> And those accessor functions could be macros in FLAT builds where functions
> are not required, so no memory or code size increase.
>
> Cheers
> Nathan
>
The inefficiency is that each access would require a virtual to physical 
address look-up.  That could be alleviated by providing the base 
physical address of the the page and the offset of the sem_t iin the 
page as a parameter, maybe like:

     struct paginfo_t pageinfo;

     get_pageinfo(sem, &pageinfo);

     value = get_pagevalue16(&pageinfo, &sem->field);

If the field like in the following page, then get_pagevalue16 would have 
to work harder.  But the usual case, where the sem_t lies wholly in the 
page would be relatively fast.

This, however, would not macro-ize as well.

Not that no special alignment of the sem_t is required if you access by 
scalar field.

Re: [Breaking change] Move nxmutex to sched

Posted by Nathan Hartman <ha...@gmail.com>.

On Tue, Apr 18, 2023 at 8:52 AM spudaneco <sp...@gmail.com> wrote:

> Perhaps you could use accessor functions to read/write 8, 16 32, 64 bit
> values.  Each scalar value will be properly aligned and, hence, will lie
> within a page.

And those accessor functions could be macros in FLAT builds where functions
are not required, so no memory or code size increase.

Cheers
Nathan

Re: [Breaking change] Move nxmutex to sched

Posted by spudaneco <sp...@gmail.com>.

Perhaps you could use accessor functions to read/write 8, 16 32, 64 bit values.  Each scalar value will be properly aligned and, hence, will lie within a page.Sent from my Galaxy
-------- Original message --------From: Ville Juven <vi...@gmail.com> Date: 4/18/23  3:48 AM  (GMT-06:00) To: dev@nuttx.apache.org Subject: Re: [Breaking change] Move nxmutex to sched Hi,Sorry for the spamfest. Forget what I said about the alignment being asolution. It is not. Any object allocated from heap that contains a sem_tstructure cannot guarantee that the alignment of sem_t is correct.So I was just being dumb..Br,Ville JuvenOn Tue, Apr 18, 2023 at 11:39 AM Jukka Laitinen <ju...@iki.fi>wrote:> Hi,>> I like the alignment idea, thanks for bringing it up!>> I think forcing the alignment for the semaphore, and accessing it> directly via page pool from the kernel is the simplest and most trivial> thing. It implements what also Greg suggested + fixes the issue of> semaphore being on the page boundary.>> Since the semaphores in CONFIG_BUILD_KERNEL just don't work at all> currently, the best option for now, IMHO, is to implement this solution> and always align the semaphores as suggested in (and only in)> CONFIG_BUILD_KERNEL case. This just makes the semaphores work with> minimal code changes.>> I still wouldn't bury the idea of splitting the semaphore into user and> kernel parts entirely. To me it would make perfect sense to cleanly> separate things which belong to kernel from the things that belong to> user space. That solution just needs to maintain the real time> properties, as discussed before, and not break existing functionality or> increase memory consumption.>> Current semaphore solution, even with the page-pool access, still has> the problem that user side code has got access to structures belonging> to the kernel (lists which kernel loops through while scheduling &> managing priority inheritance). So a user side process corrupting it's> own semaphore structure can at least crash or jam the kernel.>> But, first things first, the solution which Ville suggested below would> fix the most burning issue for us for now.>> Thanks,>> Jukka>>> On 18.4.2023 10.26, Sebastien Lorquet wrote:> > Hi all,> >> > As Tomek said, whatever you choose to do, please make sure everything> > of this is absolutely kept optional, at least during the merge and> > community validation phase.> >> > I dont think connecting these proposals to CONFIG_BUILD_KERNEL is> > granular enough.> >> > Thanks,> >> > Sebastien> >> > Le 18/04/2023 à 09:18, Ville Juven a écrit :> >> Hi all,> >>> >> Sorry, this is going a bit off topic.> >>> >> One wild solution specifically to solve my/our problem> >> (CONFIG_BUILD_KERNEL) might be to force a "next power-of-two> >> alignment" for> >> the semaphore. This would ensure that the semaphore ALWAYS fits within a> >> single page and remove the need for cross page checks / mappings in this> >> specific case.> >>> >> Although I will still implement a more generic map function (it's almost> >> done anyway), in the case of semaphores this would simply mean that no> >> semaphore will ever need to be explicitly mapped into kernel memory,> >> fetching the page pool mapping will be enough.> >>> >> For our case the size of sem_t is 32 or 40 bytes (depending on> >> configuration), this would be aligned to 32 ot 64-bytes which ensures> >> that> >> the entire structure fits in a single page. The alignment requirement> >> can> >> also be flagged behind CONFIG_BUILD_KERNEL, as such a requirement is not> >> necessary for the flat addressing modes.> >>> >> What do you think? Too wild, or something worth considering /> >> acceptable ?> >>> >> Br,> >> Ville Juven / pussuw on github> >>> >> On Mon, Apr 17, 2023 at 8:58 PM Tomek CEDRO <to...@cedro.info> wrote:> >>> >>> if it possible to add new functionality as optional feature?> >>>> >>> --> >>> CeDeROM, SQ7MHZ, http://www.tomek.cedro.info> >>>>

Re: [Breaking change] Move nxmutex to sched

Posted by Ville Juven <vi...@gmail.com>.

Hi,

Sorry for the spamfest. Forget what I said about the alignment being a
solution. It is not. Any object allocated from heap that contains a sem_t
structure cannot guarantee that the alignment of sem_t is correct.

So I was just being dumb..

Br,
Ville Juven

On Tue, Apr 18, 2023 at 11:39 AM Jukka Laitinen <ju...@iki.fi>
wrote:

> Hi,
>
> I like the alignment idea, thanks for bringing it up!
>
> I think forcing the alignment for the semaphore, and accessing it
> directly via page pool from the kernel is the simplest and most trivial
> thing. It implements what also Greg suggested + fixes the issue of
> semaphore being on the page boundary.
>
> Since the semaphores in CONFIG_BUILD_KERNEL just don't work at all
> currently, the best option for now, IMHO, is to implement this solution
> and always align the semaphores as suggested in (and only in)
> CONFIG_BUILD_KERNEL case. This just makes the semaphores work with
> minimal code changes.
>
> I still wouldn't bury the idea of splitting the semaphore into user and
> kernel parts entirely. To me it would make perfect sense to cleanly
> separate things which belong to kernel from the things that belong to
> user space. That solution just needs to maintain the real time
> properties, as discussed before, and not break existing functionality or
> increase memory consumption.
>
> Current semaphore solution, even with the page-pool access, still has
> the problem that user side code has got access to structures belonging
> to the kernel (lists which kernel loops through while scheduling &
> managing priority inheritance). So a user side process corrupting it's
> own semaphore structure can at least crash or jam the kernel.
>
> But, first things first, the solution which Ville suggested below would
> fix the most burning issue for us for now.
>
> Thanks,
>
> Jukka
>
>
> On 18.4.2023 10.26, Sebastien Lorquet wrote:
> > Hi all,
> >
> > As Tomek said, whatever you choose to do, please make sure everything
> > of this is absolutely kept optional, at least during the merge and
> > community validation phase.
> >
> > I dont think connecting these proposals to CONFIG_BUILD_KERNEL is
> > granular enough.
> >
> > Thanks,
> >
> > Sebastien
> >
> > Le 18/04/2023 à 09:18, Ville Juven a écrit :
> >> Hi all,
> >>
> >> Sorry, this is going a bit off topic.
> >>
> >> One wild solution specifically to solve my/our problem
> >> (CONFIG_BUILD_KERNEL) might be to force a "next power-of-two
> >> alignment" for
> >> the semaphore. This would ensure that the semaphore ALWAYS fits within a
> >> single page and remove the need for cross page checks / mappings in this
> >> specific case.
> >>
> >> Although I will still implement a more generic map function (it's almost
> >> done anyway), in the case of semaphores this would simply mean that no
> >> semaphore will ever need to be explicitly mapped into kernel memory,
> >> fetching the page pool mapping will be enough.
> >>
> >> For our case the size of sem_t is 32 or 40 bytes (depending on
> >> configuration), this would be aligned to 32 ot 64-bytes which ensures
> >> that
> >> the entire structure fits in a single page. The alignment requirement
> >> can
> >> also be flagged behind CONFIG_BUILD_KERNEL, as such a requirement is not
> >> necessary for the flat addressing modes.
> >>
> >> What do you think? Too wild, or something worth considering /
> >> acceptable ?
> >>
> >> Br,
> >> Ville Juven / pussuw on github
> >>
> >> On Mon, Apr 17, 2023 at 8:58 PM Tomek CEDRO <to...@cedro.info> wrote:
> >>
> >>> if it possible to add new functionality as optional feature?
> >>>
> >>> --
> >>> CeDeROM, SQ7MHZ, http://www.tomek.cedro.info
> >>>
>

Re: [Breaking change] Move nxmutex to sched

Posted by Jukka Laitinen <ju...@iki.fi>.

Hi,

I like the alignment idea, thanks for bringing it up!

I think forcing the alignment for the semaphore, and accessing it 
directly via page pool from the kernel is the simplest and most trivial 
thing. It implements what also Greg suggested + fixes the issue of 
semaphore being on the page boundary.

Since the semaphores in CONFIG_BUILD_KERNEL just don't work at all 
currently, the best option for now, IMHO, is to implement this solution 
and always align the semaphores as suggested in (and only in) 
CONFIG_BUILD_KERNEL case. This just makes the semaphores work with 
minimal code changes.

I still wouldn't bury the idea of splitting the semaphore into user and 
kernel parts entirely. To me it would make perfect sense to cleanly 
separate things which belong to kernel from the things that belong to 
user space. That solution just needs to maintain the real time 
properties, as discussed before, and not break existing functionality or 
increase memory consumption.

Current semaphore solution, even with the page-pool access, still has 
the problem that user side code has got access to structures belonging 
to the kernel (lists which kernel loops through while scheduling & 
managing priority inheritance). So a user side process corrupting it's 
own semaphore structure can at least crash or jam the kernel.

But, first things first, the solution which Ville suggested below would 
fix the most burning issue for us for now.

Thanks,

Jukka

On 18.4.2023 10.26, Sebastien Lorquet wrote:
> Hi all,
>
> As Tomek said, whatever you choose to do, please make sure everything 
> of this is absolutely kept optional, at least during the merge and 
> community validation phase.
>
> I dont think connecting these proposals to CONFIG_BUILD_KERNEL is 
> granular enough.
>
> Thanks,
>
> Sebastien
>
> Le 18/04/2023 à 09:18, Ville Juven a écrit :
>> Hi all,
>>
>> Sorry, this is going a bit off topic.
>>
>> One wild solution specifically to solve my/our problem
>> (CONFIG_BUILD_KERNEL) might be to force a "next power-of-two 
>> alignment" for
>> the semaphore. This would ensure that the semaphore ALWAYS fits within a
>> single page and remove the need for cross page checks / mappings in this
>> specific case.
>>
>> Although I will still implement a more generic map function (it's almost
>> done anyway), in the case of semaphores this would simply mean that no
>> semaphore will ever need to be explicitly mapped into kernel memory,
>> fetching the page pool mapping will be enough.
>>
>> For our case the size of sem_t is 32 or 40 bytes (depending on
>> configuration), this would be aligned to 32 ot 64-bytes which ensures 
>> that
>> the entire structure fits in a single page. The alignment requirement 
>> can
>> also be flagged behind CONFIG_BUILD_KERNEL, as such a requirement is not
>> necessary for the flat addressing modes.
>>
>> What do you think? Too wild, or something worth considering / 
>> acceptable ?
>>
>> Br,
>> Ville Juven / pussuw on github
>>
>> On Mon, Apr 17, 2023 at 8:58 PM Tomek CEDRO <to...@cedro.info> wrote:
>>
>>> if it possible to add new functionality as optional feature?
>>>
>>> -- 
>>> CeDeROM, SQ7MHZ, http://www.tomek.cedro.info
>>>

Re: [Breaking change] Move nxmutex to sched

Posted by Sebastien Lorquet <se...@lorquet.fr>.

Hi all,

As Tomek said, whatever you choose to do, please make sure everything of 
this is absolutely kept optional, at least during the merge and 
community validation phase.

I dont think connecting these proposals to CONFIG_BUILD_KERNEL is 
granular enough.

Thanks,

Sebastien

Le 18/04/2023 à 09:18, Ville Juven a écrit :
> Hi all,
>
> Sorry, this is going a bit off topic.
>
> One wild solution specifically to solve my/our problem
> (CONFIG_BUILD_KERNEL) might be to force a "next power-of-two alignment" for
> the semaphore. This would ensure that the semaphore ALWAYS fits within a
> single page and remove the need for cross page checks / mappings in this
> specific case.
>
> Although I will still implement a more generic map function (it's almost
> done anyway), in the case of semaphores this would simply mean that no
> semaphore will ever need to be explicitly mapped into kernel memory,
> fetching the page pool mapping will be enough.
>
> For our case the size of sem_t is 32 or 40 bytes (depending on
> configuration), this would be aligned to 32 ot 64-bytes which ensures that
> the entire structure fits in a single page. The alignment requirement can
> also be flagged behind CONFIG_BUILD_KERNEL, as such a requirement is not
> necessary for the flat addressing modes.
>
> What do you think? Too wild, or something worth considering / acceptable ?
>
> Br,
> Ville Juven / pussuw on github
>
> On Mon, Apr 17, 2023 at 8:58 PM Tomek CEDRO <to...@cedro.info> wrote:
>
>> if it possible to add new functionality as optional feature?
>>
>> --
>> CeDeROM, SQ7MHZ, http://www.tomek.cedro.info
>>

Re: [Breaking change] Move nxmutex to sched

Posted by Ville Juven <vi...@gmail.com>.

Hi all,

Sorry, this is going a bit off topic.

One wild solution specifically to solve my/our problem
(CONFIG_BUILD_KERNEL) might be to force a "next power-of-two alignment" for
the semaphore. This would ensure that the semaphore ALWAYS fits within a
single page and remove the need for cross page checks / mappings in this
specific case.

Although I will still implement a more generic map function (it's almost
done anyway), in the case of semaphores this would simply mean that no
semaphore will ever need to be explicitly mapped into kernel memory,
fetching the page pool mapping will be enough.

For our case the size of sem_t is 32 or 40 bytes (depending on
configuration), this would be aligned to 32 ot 64-bytes which ensures that
the entire structure fits in a single page. The alignment requirement can
also be flagged behind CONFIG_BUILD_KERNEL, as such a requirement is not
necessary for the flat addressing modes.

What do you think? Too wild, or something worth considering / acceptable ?

Br,
Ville Juven / pussuw on github

On Mon, Apr 17, 2023 at 8:58 PM Tomek CEDRO <to...@cedro.info> wrote:

> if it possible to add new functionality as optional feature?
>
> --
> CeDeROM, SQ7MHZ, http://www.tomek.cedro.info
>

Re: [Breaking change] Move nxmutex to sched

Posted by Tomek CEDRO <to...@cedro.info>.

if it possible to add new functionality as optional feature?

--
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info

Re: [Breaking change] Move nxmutex to sched

Posted by Jukka Laitinen <jl...@gmail.com>.

Hi, thanks for your contribution to the discussion!

Gregory Nutt kirjoitti maanantai 17. huhtikuuta 2023:
> Linux uses functions like copytouser() and copyfromuser() to get 
> information to/from user space.
>

I'm afraid that using this for semaphores would become quite heavy...
 
> But I think there is an easier way in NuttX.  All user memory comes from 
> a poll of pages that are also mapped in kernel space.  I think that is 
> true for all architectures.  And there should be a function to convert a 
> user virtual address to a user virtual address to physical address.  I 
> am not sure what all is in place.
> 
> Couldn't accessing user memory from the kernel address alias avoid the 
> problem you describe.  Of course, you would have to be careful at page 
> boundaries because contiguous virtual pages may not be physically 
> contiguous.

Yes, this is exactly the workaround which we have used. And as you stated, it breaks when the needed structure happens to be on the page boundary. Another way around would be mapping the pages for kernel. This is not complex, but still requires finding free virtual memory area for the purpose. And, mapping two full pages for just one semaphore. So it is not so nice.

> 
> On 4/14/2023 6:18 AM, Jukka Laitinen wrote:
> > Hi,
> >
> > I am not sure whether it is necessary to separate mutex and semaphore 
> > (although I do see the performance gain that it would give for mutex), 
> > but there is another related topic I would like to raise.
> >
> > Currently, the semaphores don't work (at all) for CONFIG_BUILD_KERNEL. 
> > The simple reason is that the semaphores are allocated from the 
> > user-mapped memory, which is not always available for the kernel while 
> > scheduling or in interrupts. At the time when it is needed, there may 
> > be another memory map active for mmu.
> >
> > There is also an issue with performance; every semaphore access needs 
> > to go to the kernel through syscall, although in principle the 
> > semaphore counter handling alone doesn't need that if the compiler & 
> > hw has the necessary atomic support.
> >
> > We are especially interested in having real-time behaviour (priority 
> > based scheduling, priority inheritance...) AND running 
> > CONFIG_BUILD_KERNEL. We have used some methods to circumvent the 
> > issue, but for those I am not going into details as we don't have a 
> > publishable implementation ready.
> >
> > A tempting way to fix the problem (which we didn't try out yet) would 
> > be separating the semaphores in two parts, kernel side structure and 
> > the user side structure. Something that zyfeier also did with the 
> > "futex" linux-like implementation. But, also this kind of 
> > implementation should be real-time - so when there is access to the 
> > semaphore via syscall (e.g. when the semaphore blocks), or when 
> > scheduling, the kernel must have O(1) access to the kernel side 
> > structure - no hashing / allocating etc. at runtime.
> >
> > So to summarize, for CONFIG_BUILD_KERNEL the semaphores could 
> > *perhaps* work like this (this is not yet tried out, so please forgive 
> > me if something is forgotten):
> > - User-side semaphore handle would have the counter and a direct 
> > pointer (handle) to the kernel side structure (which can be passed to 
> > kernel in syscall).
> > - Kernel side structure would have the needed wait queue and sem 
> > holder structures (and flags?)
> > - Kernel side structure would be allocated at sem_init (AND if it was 
> > not initialized, allocate it at the time when it is needed?). To 
> > achieve real-time behaviour one should just call sem_init properly at 
> > startup of the application.
> > - Kernel side structures would be listed in tcb and cleaned up at 
> > task_group exit. Also some hard limit/management for how much kernel 
> > memory can one process eat from kernel heap is needed.
> > - Counter manipulation can be handled directly in libc in case 
> > compiler supports proper atomic operations, or syscall to kernel when 
> > there is no support available (this would be just performance 
> > optimization - next phase)
> >
> > Whether it is feasible to do it only for CONFIG_BUILD_KERNEL, or as a 
> > common implementation for all  build modes, I didn't think of yet. I 
> > am also not sure whether the re-design of semaphore could also lead to 
> > better wrapping of it for mutex use, but this is also possible. In 
> > that case it could *maybe* solve the performance issue zyfeier tried 
> > to tackle.
> >
> > This is just one idea, but somehow the problem of not working 
> > semaphores in CONFIG_BUILD_KERNEL should be tackled. I wonder if this 
> > is something we should experiment with? If someone is interested in 
> > such an experiment, please let me know. Or if someone is interested in 
> > doing this experiment, please let me know as well, so we don't end up 
> > doing duplicate work :)
> >
> > Br,
> > Jukka
> >
> > Ps. I think that in the current implementation the nxmutex code is 
> > inlined everywhere, increasing code size. Not a huge issue for me, but 
> > increasing code size should be managed....
> >
> > On 7.4.2023 5.18, zyfeier wrote:
> >>
> >> Thank you very much for the example you provided. What I want to 
> >> point out is that this is not just about " just delete / replace what 
> >> is already out there working fine ". Due to the multi-holder of the 
> >> count semaphore, the performance of the mutex is much worse than 
> >> other RTOS (with a performance gap of 10%), but these operations are 
> >> not necessary for the mutex. That's why there is an idea to separate 
> >> the mutex and semaphore.
> >>
> >> However, if everyone thinks that separating the mutex and semaphore 
> >> is a bad idea, then we need to think of other methods. Do you have 
> >> any better methods to offer?
> >>
> >> 从Windows 版邮件 <https://go.microsoft.com/fwlink/?LinkId=550986>发送
> >>
> >> *发件人: *Tomek CEDRO <ma...@cedro.info>
> >> *发送时间: *2023年4月6日22:36
> >> *收件人: *dev@nuttx.apache.org
> >> *主题: *Re: [Breaking change] Move nxmutex to sched
> >>
> >> On Thu, Apr 6, 2023 at 2:58 PM Gregory Nutt wrote:
> >> > Oh my God!  That sounds terrible!  Does this change actually do
> >> > /anything /positive.
> >>
> >> Look Zyfeier, its not that we oppose development, we want the 
> >> development to done the right way that will bring elegant coherent 
> >> standard compliant solution as a result :-)
> >>
> >> Aside from my previous remark on Linux (along with other commercial 
> >> OS) "enforced changes", lets think about Greg's "does this change 
> >> actually do /anything /positive" question with another example.
> >>
> >> Take a looks at WS2812 RGB Smart LED. They decided to introduce "an 
> >> innovation" by changing the Pin1 marking on the casing and put that 
> >> mark on pin 3 instead. Whole world use Pin1 marking to quickly align 
> >> a component pinout, so at first glance you can see where is the pin 1 
> >> of the component, also most chips use VCC there so you can quickly 
> >> measure things, nothing fancy, everyone knows that. Now take a look 
> >> at the pcb design footprint (bottom layer mirrored) and the led 
> >> datasheet.
> >>
> >>
> >> You can clearly see that putting Pin1 casing mark on pin 3 is a 
> >> terrible idea, even more that chip is symmetrical, so it will lead to 
> >> bad placing and reversed power supply. Sure, this is some innovation, 
> >> but world does not work that way and everyone just gets confused. 
> >> When you make such changes to other components a design becomes 
> >> incoherent and no one will then know anything, but look how many 
> >> (fake) "innovations" just showed up.
> >>
> >> This is why solid coherent standardized fundamentals / foundations of 
> >> technology is so important. So we "just know" things intuitively, and 
> >> we can work together to improve things worldwide in a systematic 
> >> fashion, solid brick after solid brick, evolution not revolution. You 
> >> cannot just delete / replace what is already out there working fine.
> >>
> >> Example above is about electronic component, but with the software is 
> >> exactly the same, it is good to stick to well adapted standards, add 
> >> your own brick on top of solid inviolable fundamentals / fundation, 
> >> not necessarily following the quickly changing fashions and trends 
> >> with a lifespan of a yogurt, not spreading bad habits from other 
> >> environments, that will result in far better solution that is 
> >> coherent and long term maintainable. That results in a solid 
> >> foundation for a good system / device / solution / product.
> >>
> >>
> >> -- 
> >> CeDeROM, SQ7MHZ, http://www.tomek.cedro.info
> >>
> 
>

Re: [Breaking change] Move nxmutex to sched

Posted by Ville Juven <vi...@gmail.com>.

Hi all.

Greg, yes the page pool is virtually addressable by the kernel, but as you
said, the page boundaries are an issue. I am in fact exploring this route
right now, but Jukka's suggestion is also a valid topic to discuss, what
I'm doing does not exclude what Jukka and Zyfeier are proposing.

In order to get cross page access I need to implement a dynamic vma area
for the kernel. This requires a bit of infrastructure but I already have
most of it up and running. What I have now is a mechanism to obtain the
kernel addressable pointer for the semaphore and this really does fix the
semaphore issue. But as you mentioned, the page boundaries are a problem,
and getting that to work is not trivial, but doable.

This "dynamic kernel mapping"-API is generally useful e.g. for I/O remap,
so I will implement it regardless.

However I would still like to discuss the option to split the semaphore
structure, as it avoids doing a page directory walk every time the
semaphore is accessed.

Btw, the semaphore memory must be mapped when calling sem_wait(), because
tcb->waitobj is used by the scheduler and using dynamic mappings or
put/get_user() (because they are what they say, the COPY data, not refer to
it) for that will just simply destroy the scheduling performance.
I also thought that mqueue:s would need a similar fix but I think I'm wrong
on that one, the mqueue memory is kernel memory and the user accesses it
via a file descriptor handle only.

Br,
Ville Juven / pussuw on github

On Mon, Apr 17, 2023 at 6:50 PM Gregory Nutt <sp...@gmail.com> wrote:

> Linux uses functions like copytouser() and copyfromuser() to get
> information to/from user space.
>
> But I think there is an easier way in NuttX.  All user memory comes from
> a poll of pages that are also mapped in kernel space.  I think that is
> true for all architectures.  And there should be a function to convert a
> user virtual address to a user virtual address to physical address.  I
> am not sure what all is in place.
>
> Couldn't accessing user memory from the kernel address alias avoid the
> problem you describe.  Of course, you would have to be careful at page
> boundaries because contiguous virtual pages may not be physically
> contiguous.
>
> On 4/14/2023 6:18 AM, Jukka Laitinen wrote:
> > Hi,
> >
> > I am not sure whether it is necessary to separate mutex and semaphore
> > (although I do see the performance gain that it would give for mutex),
> > but there is another related topic I would like to raise.
> >
> > Currently, the semaphores don't work (at all) for CONFIG_BUILD_KERNEL.
> > The simple reason is that the semaphores are allocated from the
> > user-mapped memory, which is not always available for the kernel while
> > scheduling or in interrupts. At the time when it is needed, there may
> > be another memory map active for mmu.
> >
> > There is also an issue with performance; every semaphore access needs
> > to go to the kernel through syscall, although in principle the
> > semaphore counter handling alone doesn't need that if the compiler &
> > hw has the necessary atomic support.
> >
> > We are especially interested in having real-time behaviour (priority
> > based scheduling, priority inheritance...) AND running
> > CONFIG_BUILD_KERNEL. We have used some methods to circumvent the
> > issue, but for those I am not going into details as we don't have a
> > publishable implementation ready.
> >
> > A tempting way to fix the problem (which we didn't try out yet) would
> > be separating the semaphores in two parts, kernel side structure and
> > the user side structure. Something that zyfeier also did with the
> > "futex" linux-like implementation. But, also this kind of
> > implementation should be real-time - so when there is access to the
> > semaphore via syscall (e.g. when the semaphore blocks), or when
> > scheduling, the kernel must have O(1) access to the kernel side
> > structure - no hashing / allocating etc. at runtime.
> >
> > So to summarize, for CONFIG_BUILD_KERNEL the semaphores could
> > *perhaps* work like this (this is not yet tried out, so please forgive
> > me if something is forgotten):
> > - User-side semaphore handle would have the counter and a direct
> > pointer (handle) to the kernel side structure (which can be passed to
> > kernel in syscall).
> > - Kernel side structure would have the needed wait queue and sem
> > holder structures (and flags?)
> > - Kernel side structure would be allocated at sem_init (AND if it was
> > not initialized, allocate it at the time when it is needed?). To
> > achieve real-time behaviour one should just call sem_init properly at
> > startup of the application.
> > - Kernel side structures would be listed in tcb and cleaned up at
> > task_group exit. Also some hard limit/management for how much kernel
> > memory can one process eat from kernel heap is needed.
> > - Counter manipulation can be handled directly in libc in case
> > compiler supports proper atomic operations, or syscall to kernel when
> > there is no support available (this would be just performance
> > optimization - next phase)
> >
> > Whether it is feasible to do it only for CONFIG_BUILD_KERNEL, or as a
> > common implementation for all  build modes, I didn't think of yet. I
> > am also not sure whether the re-design of semaphore could also lead to
> > better wrapping of it for mutex use, but this is also possible. In
> > that case it could *maybe* solve the performance issue zyfeier tried
> > to tackle.
> >
> > This is just one idea, but somehow the problem of not working
> > semaphores in CONFIG_BUILD_KERNEL should be tackled. I wonder if this
> > is something we should experiment with? If someone is interested in
> > such an experiment, please let me know. Or if someone is interested in
> > doing this experiment, please let me know as well, so we don't end up
> > doing duplicate work :)
> >
> > Br,
> > Jukka
> >
> > Ps. I think that in the current implementation the nxmutex code is
> > inlined everywhere, increasing code size. Not a huge issue for me, but
> > increasing code size should be managed....
> >
> > On 7.4.2023 5.18, zyfeier wrote:
> >>
> >> Thank you very much for the example you provided. What I want to
> >> point out is that this is not just about " just delete / replace what
> >> is already out there working fine ". Due to the multi-holder of the
> >> count semaphore, the performance of the mutex is much worse than
> >> other RTOS (with a performance gap of 10%), but these operations are
> >> not necessary for the mutex. That's why there is an idea to separate
> >> the mutex and semaphore.
> >>
> >> However, if everyone thinks that separating the mutex and semaphore
> >> is a bad idea, then we need to think of other methods. Do you have
> >> any better methods to offer?
> >>
> >> 从Windows 版邮件 <https://go.microsoft.com/fwlink/?LinkId=550986>发送
> >>
> >> *发件人: *Tomek CEDRO <ma...@cedro.info>
> >> *发送时间: *2023年4月6日22:36
> >> *收件人: *dev@nuttx.apache.org
> >> *主题: *Re: [Breaking change] Move nxmutex to sched
> >>
> >> On Thu, Apr 6, 2023 at 2:58 PM Gregory Nutt wrote:
> >> > Oh my God!  That sounds terrible!  Does this change actually do
> >> > /anything /positive.
> >>
> >> Look Zyfeier, its not that we oppose development, we want the
> >> development to done the right way that will bring elegant coherent
> >> standard compliant solution as a result :-)
> >>
> >> Aside from my previous remark on Linux (along with other commercial
> >> OS) "enforced changes", lets think about Greg's "does this change
> >> actually do /anything /positive" question with another example.
> >>
> >> Take a looks at WS2812 RGB Smart LED. They decided to introduce "an
> >> innovation" by changing the Pin1 marking on the casing and put that
> >> mark on pin 3 instead. Whole world use Pin1 marking to quickly align
> >> a component pinout, so at first glance you can see where is the pin 1
> >> of the component, also most chips use VCC there so you can quickly
> >> measure things, nothing fancy, everyone knows that. Now take a look
> >> at the pcb design footprint (bottom layer mirrored) and the led
> >> datasheet.
> >>
> >>
> >> You can clearly see that putting Pin1 casing mark on pin 3 is a
> >> terrible idea, even more that chip is symmetrical, so it will lead to
> >> bad placing and reversed power supply. Sure, this is some innovation,
> >> but world does not work that way and everyone just gets confused.
> >> When you make such changes to other components a design becomes
> >> incoherent and no one will then know anything, but look how many
> >> (fake) "innovations" just showed up.
> >>
> >> This is why solid coherent standardized fundamentals / foundations of
> >> technology is so important. So we "just know" things intuitively, and
> >> we can work together to improve things worldwide in a systematic
> >> fashion, solid brick after solid brick, evolution not revolution. You
> >> cannot just delete / replace what is already out there working fine.
> >>
> >> Example above is about electronic component, but with the software is
> >> exactly the same, it is good to stick to well adapted standards, add
> >> your own brick on top of solid inviolable fundamentals / fundation,
> >> not necessarily following the quickly changing fashions and trends
> >> with a lifespan of a yogurt, not spreading bad habits from other
> >> environments, that will result in far better solution that is
> >> coherent and long term maintainable. That results in a solid
> >> foundation for a good system / device / solution / product.
> >>
> >>
> >> --
> >> CeDeROM, SQ7MHZ, http://www.tomek.cedro.info
> >>
>
>

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

Linux uses functions like copytouser() and copyfromuser() to get 
information to/from user space.

But I think there is an easier way in NuttX.  All user memory comes from 
a poll of pages that are also mapped in kernel space.  I think that is 
true for all architectures.  And there should be a function to convert a 
user virtual address to a user virtual address to physical address.  I 
am not sure what all is in place.

Couldn't accessing user memory from the kernel address alias avoid the 
problem you describe.  Of course, you would have to be careful at page 
boundaries because contiguous virtual pages may not be physically 
contiguous.

On 4/14/2023 6:18 AM, Jukka Laitinen wrote:
> Hi,
>
> I am not sure whether it is necessary to separate mutex and semaphore 
> (although I do see the performance gain that it would give for mutex), 
> but there is another related topic I would like to raise.
>
> Currently, the semaphores don't work (at all) for CONFIG_BUILD_KERNEL. 
> The simple reason is that the semaphores are allocated from the 
> user-mapped memory, which is not always available for the kernel while 
> scheduling or in interrupts. At the time when it is needed, there may 
> be another memory map active for mmu.
>
> There is also an issue with performance; every semaphore access needs 
> to go to the kernel through syscall, although in principle the 
> semaphore counter handling alone doesn't need that if the compiler & 
> hw has the necessary atomic support.
>
> We are especially interested in having real-time behaviour (priority 
> based scheduling, priority inheritance...) AND running 
> CONFIG_BUILD_KERNEL. We have used some methods to circumvent the 
> issue, but for those I am not going into details as we don't have a 
> publishable implementation ready.
>
> A tempting way to fix the problem (which we didn't try out yet) would 
> be separating the semaphores in two parts, kernel side structure and 
> the user side structure. Something that zyfeier also did with the 
> "futex" linux-like implementation. But, also this kind of 
> implementation should be real-time - so when there is access to the 
> semaphore via syscall (e.g. when the semaphore blocks), or when 
> scheduling, the kernel must have O(1) access to the kernel side 
> structure - no hashing / allocating etc. at runtime.
>
> So to summarize, for CONFIG_BUILD_KERNEL the semaphores could 
> *perhaps* work like this (this is not yet tried out, so please forgive 
> me if something is forgotten):
> - User-side semaphore handle would have the counter and a direct 
> pointer (handle) to the kernel side structure (which can be passed to 
> kernel in syscall).
> - Kernel side structure would have the needed wait queue and sem 
> holder structures (and flags?)
> - Kernel side structure would be allocated at sem_init (AND if it was 
> not initialized, allocate it at the time when it is needed?). To 
> achieve real-time behaviour one should just call sem_init properly at 
> startup of the application.
> - Kernel side structures would be listed in tcb and cleaned up at 
> task_group exit. Also some hard limit/management for how much kernel 
> memory can one process eat from kernel heap is needed.
> - Counter manipulation can be handled directly in libc in case 
> compiler supports proper atomic operations, or syscall to kernel when 
> there is no support available (this would be just performance 
> optimization - next phase)
>
> Whether it is feasible to do it only for CONFIG_BUILD_KERNEL, or as a 
> common implementation for all  build modes, I didn't think of yet. I 
> am also not sure whether the re-design of semaphore could also lead to 
> better wrapping of it for mutex use, but this is also possible. In 
> that case it could *maybe* solve the performance issue zyfeier tried 
> to tackle.
>
> This is just one idea, but somehow the problem of not working 
> semaphores in CONFIG_BUILD_KERNEL should be tackled. I wonder if this 
> is something we should experiment with? If someone is interested in 
> such an experiment, please let me know. Or if someone is interested in 
> doing this experiment, please let me know as well, so we don't end up 
> doing duplicate work :)
>
> Br,
> Jukka
>
> Ps. I think that in the current implementation the nxmutex code is 
> inlined everywhere, increasing code size. Not a huge issue for me, but 
> increasing code size should be managed....
>
> On 7.4.2023 5.18, zyfeier wrote:
>>
>> Thank you very much for the example you provided. What I want to 
>> point out is that this is not just about " just delete / replace what 
>> is already out there working fine ". Due to the multi-holder of the 
>> count semaphore, the performance of the mutex is much worse than 
>> other RTOS (with a performance gap of 10%), but these operations are 
>> not necessary for the mutex. That's why there is an idea to separate 
>> the mutex and semaphore.
>>
>> However, if everyone thinks that separating the mutex and semaphore 
>> is a bad idea, then we need to think of other methods. Do you have 
>> any better methods to offer?
>>
>> 从Windows 版邮件 <https://go.microsoft.com/fwlink/?LinkId=550986>发送
>>
>> *发件人: *Tomek CEDRO <ma...@cedro.info>
>> *发送时间: *2023年4月6日22:36
>> *收件人: *dev@nuttx.apache.org
>> *主题: *Re: [Breaking change] Move nxmutex to sched
>>
>> On Thu, Apr 6, 2023 at 2:58 PM Gregory Nutt wrote:
>> > Oh my God!  That sounds terrible!  Does this change actually do
>> > /anything /positive.
>>
>> Look Zyfeier, its not that we oppose development, we want the 
>> development to done the right way that will bring elegant coherent 
>> standard compliant solution as a result :-)
>>
>> Aside from my previous remark on Linux (along with other commercial 
>> OS) "enforced changes", lets think about Greg's "does this change 
>> actually do /anything /positive" question with another example.
>>
>> Take a looks at WS2812 RGB Smart LED. They decided to introduce "an 
>> innovation" by changing the Pin1 marking on the casing and put that 
>> mark on pin 3 instead. Whole world use Pin1 marking to quickly align 
>> a component pinout, so at first glance you can see where is the pin 1 
>> of the component, also most chips use VCC there so you can quickly 
>> measure things, nothing fancy, everyone knows that. Now take a look 
>> at the pcb design footprint (bottom layer mirrored) and the led 
>> datasheet.
>>
>>
>> You can clearly see that putting Pin1 casing mark on pin 3 is a 
>> terrible idea, even more that chip is symmetrical, so it will lead to 
>> bad placing and reversed power supply. Sure, this is some innovation, 
>> but world does not work that way and everyone just gets confused. 
>> When you make such changes to other components a design becomes 
>> incoherent and no one will then know anything, but look how many 
>> (fake) "innovations" just showed up.
>>
>> This is why solid coherent standardized fundamentals / foundations of 
>> technology is so important. So we "just know" things intuitively, and 
>> we can work together to improve things worldwide in a systematic 
>> fashion, solid brick after solid brick, evolution not revolution. You 
>> cannot just delete / replace what is already out there working fine.
>>
>> Example above is about electronic component, but with the software is 
>> exactly the same, it is good to stick to well adapted standards, add 
>> your own brick on top of solid inviolable fundamentals / fundation, 
>> not necessarily following the quickly changing fashions and trends 
>> with a lifespan of a yogurt, not spreading bad habits from other 
>> environments, that will result in far better solution that is 
>> coherent and long term maintainable. That results in a solid 
>> foundation for a good system / device / solution / product.
>>
>>
>> -- 
>> CeDeROM, SQ7MHZ, http://www.tomek.cedro.info
>>

Re: [Breaking change] Move nxmutex to sched

Posted by Jukka Laitinen <ju...@iki.fi>.

Hi,

I am not sure whether it is necessary to separate mutex and semaphore 
(although I do see the performance gain that it would give for mutex), 
but there is another related topic I would like to raise.

Currently, the semaphores don't work (at all) for CONFIG_BUILD_KERNEL. 
The simple reason is that the semaphores are allocated from the 
user-mapped memory, which is not always available for the kernel while 
scheduling or in interrupts. At the time when it is needed, there may be 
another memory map active for mmu.

There is also an issue with performance; every semaphore access needs to 
go to the kernel through syscall, although in principle the semaphore 
counter handling alone doesn't need that if the compiler & hw has the 
necessary atomic support.

We are especially interested in having real-time behaviour (priority 
based scheduling, priority inheritance...) AND running 
CONFIG_BUILD_KERNEL. We have used some methods to circumvent the issue, 
but for those I am not going into details as we don't have a publishable 
implementation ready.

A tempting way to fix the problem (which we didn't try out yet) would be 
separating the semaphores in two parts, kernel side structure and the 
user side structure. Something that zyfeier also did with the "futex" 
linux-like implementation. But, also this kind of implementation should 
be real-time - so when there is access to the semaphore via syscall 
(e.g. when the semaphore blocks), or when scheduling, the kernel must 
have O(1) access to the kernel side structure - no hashing / allocating 
etc. at runtime.

So to summarize, for CONFIG_BUILD_KERNEL the semaphores could *perhaps* 
work like this (this is not yet tried out, so please forgive me if 
something is forgotten):
- User-side semaphore handle would have the counter and a direct pointer 
(handle) to the kernel side structure (which can be passed to kernel in 
syscall).
- Kernel side structure would have the needed wait queue and sem holder 
structures (and flags?)
- Kernel side structure would be allocated at sem_init (AND if it was 
not initialized, allocate it at the time when it is needed?). To achieve 
real-time behaviour one should just call sem_init properly at startup of 
the application.
- Kernel side structures would be listed in tcb and cleaned up at 
task_group exit. Also some hard limit/management for how much kernel 
memory can one process eat from kernel heap is needed.
- Counter manipulation can be handled directly in libc in case compiler 
supports proper atomic operations, or syscall to kernel when there is no 
support available (this would be just performance optimization - next phase)

Whether it is feasible to do it only for CONFIG_BUILD_KERNEL, or as a 
common implementation for all  build modes, I didn't think of yet. I am 
also not sure whether the re-design of semaphore could also lead to 
better wrapping of it for mutex use, but this is also possible. In that 
case it could *maybe* solve the performance issue zyfeier tried to tackle.

This is just one idea, but somehow the problem of not working semaphores 
in CONFIG_BUILD_KERNEL should be tackled. I wonder if this is something 
we should experiment with? If someone is interested in such an 
experiment, please let me know. Or if someone is interested in doing 
this experiment, please let me know as well, so we don't end up doing 
duplicate work :)

Br,
Jukka

Ps. I think that in the current implementation the nxmutex code is 
inlined everywhere, increasing code size. Not a huge issue for me, but 
increasing code size should be managed....

On 7.4.2023 5.18, zyfeier wrote:
>
> Thank you very much for the example you provided. What I want to point 
> out is that this is not just about " just delete / replace what is 
> already out there working fine ". Due to the multi-holder of the count 
> semaphore, the performance of the mutex is much worse than other RTOS 
> (with a performance gap of 10%), but these operations are not 
> necessary for the mutex. That's why there is an idea to separate the 
> mutex and semaphore.
>
> However, if everyone thinks that separating the mutex and semaphore is 
> a bad idea, then we need to think of other methods. Do you have any 
> better methods to offer?
>
> 从Windows 版邮件 <https://go.microsoft.com/fwlink/?LinkId=550986>发送
>
> *发件人: *Tomek CEDRO <ma...@cedro.info>
> *发送时间: *2023年4月6日22:36
> *收件人: *dev@nuttx.apache.org
> *主题: *Re: [Breaking change] Move nxmutex to sched
>
> On Thu, Apr 6, 2023 at 2:58 PM Gregory Nutt wrote:
> > Oh my God!  That sounds terrible!  Does this change actually do
> > /anything /positive.
>
> Look Zyfeier, its not that we oppose development, we want the 
> development to done the right way that will bring elegant coherent 
> standard compliant solution as a result :-)
>
> Aside from my previous remark on Linux (along with other commercial 
> OS) "enforced changes", lets think about Greg's "does this change 
> actually do /anything /positive" question with another example.
>
> Take a looks at WS2812 RGB Smart LED. They decided to introduce "an 
> innovation" by changing the Pin1 marking on the casing and put that 
> mark on pin 3 instead. Whole world use Pin1 marking to quickly align a 
> component pinout, so at first glance you can see where is the pin 1 of 
> the component, also most chips use VCC there so you can quickly 
> measure things, nothing fancy, everyone knows that. Now take a look at 
> the pcb design footprint (bottom layer mirrored) and the led datasheet.
>
>
> You can clearly see that putting Pin1 casing mark on pin 3 is a 
> terrible idea, even more that chip is symmetrical, so it will lead to 
> bad placing and reversed power supply. Sure, this is some innovation, 
> but world does not work that way and everyone just gets confused. When 
> you make such changes to other components a design becomes incoherent 
> and no one will then know anything, but look how many (fake) 
> "innovations" just showed up.
>
> This is why solid coherent standardized fundamentals / foundations of 
> technology is so important. So we "just know" things intuitively, and 
> we can work together to improve things worldwide in a systematic 
> fashion, solid brick after solid brick, evolution not revolution. You 
> cannot just delete / replace what is already out there working fine.
>
> Example above is about electronic component, but with the software is 
> exactly the same, it is good to stick to well adapted standards, add 
> your own brick on top of solid inviolable fundamentals / fundation, 
> not necessarily following the quickly changing fashions and trends 
> with a lifespan of a yogurt, not spreading bad habits from other 
> environments, that will result in far better solution that is coherent 
> and long term maintainable. That results in a solid foundation for a 
> good system / device / solution / product.
>
>
> --
> CeDeROM, SQ7MHZ, http://www.tomek.cedro.info
>

Re: [Breaking change] Move nxmutex to sched

Posted by zyfeier <zy...@126.com>.

Thank you very much for the example you provided. What I want to point out is that this is not just about " just delete / replace what is already out there working fine ". Due to the multi-holder of the count semaphore, the performance of the mutex is much worse than other RTOS (with a performance gap of 10%), but these operations are not necessary for the mutex. That's why there is an idea to separate the mutex and semaphore.

However, if everyone thinks that separating the mutex and semaphore is a bad idea, then we need to think of other methods. Do you have any better methods to offer?

从 Windows 版邮件发送

发件人: Tomek CEDRO
发送时间: 2023年4月6日 22:36
收件人: dev@nuttx.apache.org
主题: Re: [Breaking change] Move nxmutex to sched

On Thu, Apr 6, 2023 at 2:58 PM Gregory Nutt wrote:
> Oh my God! That sounds terrible! Does this change actually do
> /anything /positive.

Look Zyfeier, its not that we oppose development, we want the development to done the right way that will bring elegant coherent standard compliant solution as a result :-)
Aside from my previous remark on Linux (along with other commercial OS) "enforced changes", lets think about Greg's "does this change actually do /anything /positive" question with another example.

Take a looks at WS2812 RGB Smart LED. They decided to introduce "an innovation" by changing the Pin1 marking on the casing and put that mark on pin 3 instead. Whole world use Pin1 marking to quickly align a component pinout, so at first glance you can see where is the pin 1 of the component, also most chips use VCC there so you can quickly measure things, nothing fancy, everyone knows that. Now take a look at the pcb design footprint (bottom layer mirrored) and the led datasheet.

You can clearly see that putting Pin1 casing mark on pin 3 is a terrible idea, even more that chip is symmetrical, so it will lead to bad placing and reversed power supply. Sure, this is some innovation, but world does not work that way and everyone just gets confused. When you make such changes to other components a design becomes incoherent and no one will then know anything, but look how many (fake) "innovations" just showed up.

This is why solid coherent standardized fundamentals / foundations of technology is so important. So we "just know" things intuitively, and we can work together to improve things worldwide in a systematic fashion, solid brick after solid brick, evolution not revolution. You cannot just delete / replace what is already out there working fine.

Example above is about electronic component, but with the software is exactly the same, it is good to stick to well adapted standards, add your own brick on top of solid inviolable fundamentals / fundation, not necessarily following the quickly changing fashions and trends with a lifespan of a yogurt, not spreading bad habits from other environments, that will result in far better solution that is coherent and long term maintainable. That results in a solid foundation for a good system / device / solution / product.

--
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info

Re: [Breaking change] Move nxmutex to sched

Posted by Tomek CEDRO <to...@cedro.info>.

On Thu, Apr 6, 2023 at 2:58 PM Gregory Nutt wrote:
> Oh my God!  That sounds terrible!  Does this change actually do
> /anything /positive.

Look Zyfeier, its not that we oppose development, we want the development
to done the right way that will bring elegant coherent standard compliant
solution as a result :-)

Aside from my previous remark on Linux (along with other commercial OS)
"enforced changes", lets think about Greg's "does this change actually do
/anything /positive" question with another example.

Take a looks at WS2812 RGB Smart LED. They decided to introduce "an
innovation" by changing the Pin1 marking on the casing and put that mark on
pin 3 instead. Whole world use Pin1 marking to quickly align a component
pinout, so at first glance you can see where is the pin 1 of the component,
also most chips use VCC there so you can quickly measure things, nothing
fancy, everyone knows that. Now take a look at the pcb design footprint
(bottom layer mirrored) and the led datasheet.

[image: shot-2023-04-06_16-09-06.jpg]
[image: shot-2023-04-06_16-09-23.jpg]
You can clearly see that putting Pin1 casing mark on pin 3 is a terrible
idea, even more that chip is symmetrical, so it will lead to bad placing
and reversed power supply. Sure, this is some innovation, but world does
not work that way and everyone just gets confused. When you make such
changes to other components a design becomes incoherent and no one will
then know anything, but look how many (fake) "innovations" just showed up.

This is why solid coherent standardized fundamentals / foundations of
technology is so important. So we "just know" things intuitively, and we
can work together to improve things worldwide in a systematic fashion,
solid brick after solid brick, evolution not revolution. You cannot just
delete / replace what is already out there working fine.

Example above is about electronic component, but with the software is
exactly the same, it is good to stick to well adapted standards, add your
own brick on top of solid inviolable fundamentals / fundation, not
necessarily following the quickly changing fashions and trends with a
lifespan of a yogurt, not spreading bad habits from other environments,
that will result in far better solution that is coherent and long term
maintainable. That results in a solid foundation for a good system / device
/ solution / product.

--
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

> If we can have stable operation of priority inheritance with semaphores
> then we can build mutex as a macro based wrapper (mostly as it is now).
>
> One thing that I can think of is adding a "mutex" attribute to a semaphore
> and creating a separate task list for faster scheduling, but even that is
> up to discussion with the community.

I don't see "mutex-ness" as being fundamental, discriminating 
difference.  A mutex is, after all, just a binary counting semaphore:  
It only counts 0 and 1, or true and false.

I do see the difference between locking behavior and the IPC signaling 
behavior as being fundamental.  A mutex or binary semaphore protecting a 
resource or a counting semaphore protecting multiple resources are 
really logically the same; the range of the count does not make them 
inherently different.  The common locking functionality is more fundamental.

A semaphore can be used to restrict access across different tasks;  a 
pthread mutex can only used within a signle task.  That is a bigger 
difference than the range of the count.

semaphores when used as IPCs to signal events between tasks is, however, 
a very different usage.

Re: [Breaking change] Move nxmutex to sched

Posted by Petro Karashchenko <pe...@gmail.com>.

I agree with Greg!

If we can have stable operation of priority inheritance with semaphores
then we can build mutex as a macro based wrapper (mostly as it is now).

One thing that I can think of is adding a "mutex" attribute to a semaphore
and creating a separate task list for faster scheduling, but even that is
up to discussion with the community.

Best regards,
Petro

чт, 6 квіт. 2023 р. о 15:58 Gregory Nutt <sp...@gmail.com> пише:

> Oh my God!  That sounds terrible!  Does this change actually do
> /anything /positive.
>
> Duplicating the internal implementation of Linux is /NOT /positive.  It
> is irrelevant.
>
> Adopting GNU/Linux interfaces as a functional specification made since.
> But duplicating the ibnternal implementation of Linux is the dumbest
> thing I have ever heard.
>
> Losing any real time performance is /catastrophic /for an RTOS.
>
> Significantly increasing code size is /catastrophic /for an embedded OS.
>
> This is a very bad change that should never come into the repository.
> My recommendation is that you quietly close the PR without merge and be
> done with it.
>
> The solution that we want is:
>
>   * One that conforms to interface standards
>   * Results in the smallest footprint possible
>   * Gives the best real time behavior possible.
>
> Nothing else should be accepted.
>
> On 4/6/2023 5:27 AM, zyfeier wrote:
> > Hi, All:
> >
> > The main purpose of this PR is to separate mutex and semaphore and to
> improve the performance of mutex.
> > Regarding the issues mentioned in the email , here is a summary:
> >
> > 1. The number of system calls will increase.
> >
> > After separating mutex and semaphore, futex support will be added in the
> next step, which can reduce the number of system calls.
> >
> > 2. Code size will increase.
> >
> > The increase in code size is a drawback of this modification, but the
> mutex performance has improved by 200 cycles. I  think code size can be
> considered as a potential optimization item in the future.
> >
> > 3. Removing semaphore priority inheritance will affect real-time
> performance.
> >
> > Semaphore priority inheritance will be retained. After separating mutex
> and semaphore, they will have their own priority inheritance handling.
> Choose which one to enable according to your needs.
> >
> >
> > Thanks.
> > Yuan.
> >
>

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

Oh my God!  That sounds terrible!  Does this change actually do 
/anything /positive.

Duplicating the internal implementation of Linux is /NOT /positive.  It 
is irrelevant.

Adopting GNU/Linux interfaces as a functional specification made since.  
But duplicating the ibnternal implementation of Linux is the dumbest 
thing I have ever heard.

Losing any real time performance is /catastrophic /for an RTOS.

Significantly increasing code size is /catastrophic /for an embedded OS.

This is a very bad change that should never come into the repository.  
My recommendation is that you quietly close the PR without merge and be 
done with it.

The solution that we want is:

  * One that conforms to interface standards
  * Results in the smallest footprint possible
  * Gives the best real time behavior possible.

Nothing else should be accepted.

On 4/6/2023 5:27 AM, zyfeier wrote:
> Hi, All:
>
> The main purpose of this PR is to separate mutex and semaphore and to  improve the performance of mutex.
> Regarding the issues mentioned in the email , here is a summary:
>
> 1. The number of system calls will increase.
>
> After separating mutex and semaphore, futex support will be added in the next step, which can reduce the number of system calls.
>
> 2. Code size will increase.
>
> The increase in code size is a drawback of this modification, but the mutex performance has improved by 200 cycles. I  think code size can be considered as a potential optimization item in the future.
>
> 3. Removing semaphore priority inheritance will affect real-time performance.
>
> Semaphore priority inheritance will be retained. After separating mutex and semaphore, they will have their own priority inheritance handling. Choose which one to enable according to your needs.
>
>
> Thanks.
> Yuan.
>

Re: [Breaking change] Move nxmutex to sched

Posted by zyfeier <zy...@126.com>.

Hi, All:

The main purpose of this PR is to separate mutex and semaphore and to  improve the performance of mutex.
Regarding the issues mentioned in the email , here is a summary:

1. The number of system calls will increase.

After separating mutex and semaphore, futex support will be added in the next step, which can reduce the number of system calls.

2. Code size will increase.

The increase in code size is a drawback of this modification, but the mutex performance has improved by 200 cycles. I  think code size can be considered as a potential optimization item in the future.

3. Removing semaphore priority inheritance will affect real-time performance.

Semaphore priority inheritance will be retained. After separating mutex and semaphore, they will have their own priority inheritance handling. Choose which one to enable according to your needs.


Thanks.
Yuan.

Re: [Breaking change] Move nxmutex to sched

Posted by Petro Karashchenko <pe...@gmail.com>.

I drafted some optimization in https://github.com/apache/nuttx/pull/8951
I would appreciate some support with that if possible as these days I'm a
bit overloaded.

Best regards,
Petro

сб, 1 квіт. 2023 р. о 23:35 David S. Alessio <da...@gmail.com>
пише:

>
>
> > On Apr 1, 2023, at 12:54 PM, David S. Alessio <da...@gmail.com>
> wrote:
> >
> >
> >
> >> On Mar 31, 2023, at 9:34 AM, Gregory Nutt <spudaneco@gmail.com <mailto:
> spudaneco@gmail.com>> wrote:
> >>
> >> On 3/31/2023 8:56 AM, Gregory Nutt wrote:
> >>>
> >>>> Even more. In my previous example if semaphore is posted from the
> interrupt
> >>>> we do not know which of TaskA or TaskB is no longer a "holder l" of a
> >>>> semaphore.
> >>>>
> >>> You are right.  In this usage case, the counting semaphore is not
> being used for locking; it is being used for signalling an event per
> https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance
> <
> https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance
> >
> >>>
> >>> In that case, priority inheritance must be turned off.
> >>>
> >> You example is really confusing because you are mixing two different
> concepts, just subtly enough to be really confusing.  If a counting
> semaphore is used for locking a set of multiple resources, the posting the
> semaphore does not release the resource.  That is not the way that it is
> done.  Here is a more believable example of how this would work:
> >>
> >> 1. TaskA with priority 10 allocates DMA0 channel by calling a DMA
> >>   channel allocation function.  If a DMA channel is available, it is
> >>   allocated and the allocation function takes the semaphore. TaskA
> >>   then starts DMA activity.
> >> 2. TaskA waits on another signaling semaphore for the DMA to complete.
> >> 3. TaskB with priority 20 does the same.
> >> 4. TaskC with priority 30 wants to allocate a DMA channel.  It calls
> >>   the channel allocation function which waits on the sempahore for a
> >>   count to be available.  This boost the priority of TaskA and TaskB
> >>   to 30.
> >> 5. When the DMA started by TaskA completes, it signals TaskA which
> >>   calls the resource deallocation function which increments the
> >>   counting semaphore, giving the count to TaskC and storing the base
> >>   priorities.
> >>
> >> The confusion arises because you are mixing the signaling logic with
> the resource deallocation logic.
> >
> >
> > Greg, quite right.  But Petro himself pointed out:
> >> TaskC with priority 30 wants to allocate a DMA channel, so boosts
> priority
> >> of TaskA and TaskB to 30 (even if that will not lead to fasted DMA
> >> operation completion).
> >
> >
> > So what is the problem here for which Priority Inheritance is the
> solution?
> >
> > Technically, there is priority inversion here, but 1) it’s not
> unbounded, the channel will be free as soon as DMA completes; and 2)
> boosting task priority will not reduce the latency of freeing a DMA channel.
> >
> > This scenario requires a counting semaphore that’s used as for signaling
> HW resource availability; and if there are any blocked callers, the highest
> priority caller is unblocked first.
>
> I should add: the DMA channel should be freed either 1) in the ISR
> handling DMA interrupts after cleaning up HW, or if there’s a bit more work
> to do in cleanup up after DMA, in a HP worker thread.
>
> In this way the handing of the DMA channel(s), counting semaphore; and
> their blockers is completely independent of the priority of the caller.
>
>
>
>
>

Re: [Breaking change] Move nxmutex to sched

Posted by "David S. Alessio" <da...@gmail.com>.


> On Apr 1, 2023, at 12:54 PM, David S. Alessio <da...@gmail.com> wrote:
> 
> 
> 
>> On Mar 31, 2023, at 9:34 AM, Gregory Nutt <spudaneco@gmail.com <ma...@gmail.com>> wrote:
>> 
>> On 3/31/2023 8:56 AM, Gregory Nutt wrote:
>>> 
>>>> Even more. In my previous example if semaphore is posted from the interrupt
>>>> we do not know which of TaskA or TaskB is no longer a "holder l" of a
>>>> semaphore.
>>>> 
>>> You are right.  In this usage case, the counting semaphore is not being used for locking; it is being used for signalling an event per https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance <https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance>
>>> 
>>> In that case, priority inheritance must be turned off.
>>> 
>> You example is really confusing because you are mixing two different concepts, just subtly enough to be really confusing.  If a counting semaphore is used for locking a set of multiple resources, the posting the semaphore does not release the resource.  That is not the way that it is done.  Here is a more believable example of how this would work:
>> 
>> 1. TaskA with priority 10 allocates DMA0 channel by calling a DMA
>>   channel allocation function.  If a DMA channel is available, it is
>>   allocated and the allocation function takes the semaphore. TaskA
>>   then starts DMA activity.
>> 2. TaskA waits on another signaling semaphore for the DMA to complete.
>> 3. TaskB with priority 20 does the same.
>> 4. TaskC with priority 30 wants to allocate a DMA channel.  It calls
>>   the channel allocation function which waits on the sempahore for a
>>   count to be available.  This boost the priority of TaskA and TaskB
>>   to 30.
>> 5. When the DMA started by TaskA completes, it signals TaskA which
>>   calls the resource deallocation function which increments the
>>   counting semaphore, giving the count to TaskC and storing the base
>>   priorities.
>> 
>> The confusion arises because you are mixing the signaling logic with the resource deallocation logic.
> 
> 
> Greg, quite right.  But Petro himself pointed out:
>> TaskC with priority 30 wants to allocate a DMA channel, so boosts priority
>> of TaskA and TaskB to 30 (even if that will not lead to fasted DMA
>> operation completion).
> 
> 
> So what is the problem here for which Priority Inheritance is the solution?
> 
> Technically, there is priority inversion here, but 1) it’s not unbounded, the channel will be free as soon as DMA completes; and 2) boosting task priority will not reduce the latency of freeing a DMA channel.
> 
> This scenario requires a counting semaphore that’s used as for signaling HW resource availability; and if there are any blocked callers, the highest priority caller is unblocked first.

I should add: the DMA channel should be freed either 1) in the ISR handling DMA interrupts after cleaning up HW, or if there’s a bit more work to do in cleanup up after DMA, in a HP worker thread.

In this way the handing of the DMA channel(s), counting semaphore; and their blockers is completely independent of the priority of the caller.

Re: [Breaking change] Move nxmutex to sched

Posted by "David S. Alessio" <da...@gmail.com>.


> On Mar 31, 2023, at 9:34 AM, Gregory Nutt <sp...@gmail.com> wrote:
> 
> On 3/31/2023 8:56 AM, Gregory Nutt wrote:
>> 
>>> Even more. In my previous example if semaphore is posted from the interrupt
>>> we do not know which of TaskA or TaskB is no longer a "holder l" of a
>>> semaphore.
>>> 
>> You are right.  In this usage case, the counting semaphore is not being used for locking; it is being used for signalling an event per https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance
>> 
>> In that case, priority inheritance must be turned off.
>> 
> You example is really confusing because you are mixing two different concepts, just subtly enough to be really confusing.  If a counting semaphore is used for locking a set of multiple resources, the posting the semaphore does not release the resource.  That is not the way that it is done.  Here is a more believable example of how this would work:
> 
> 1. TaskA with priority 10 allocates DMA0 channel by calling a DMA
>   channel allocation function.  If a DMA channel is available, it is
>   allocated and the allocation function takes the semaphore. TaskA
>   then starts DMA activity.
> 2. TaskA waits on another signaling semaphore for the DMA to complete.
> 3. TaskB with priority 20 does the same.
> 4. TaskC with priority 30 wants to allocate a DMA channel.  It calls
>   the channel allocation function which waits on the sempahore for a
>   count to be available.  This boost the priority of TaskA and TaskB
>   to 30.
> 5. When the DMA started by TaskA completes, it signals TaskA which
>   calls the resource deallocation function which increments the
>   counting semaphore, giving the count to TaskC and storing the base
>   priorities.
> 
> The confusion arises because you are mixing the signaling logic with the resource deallocation logic.


Greg, quite right.  But Petro himself pointed out:
> TaskC with priority 30 wants to allocate a DMA channel, so boosts priority
> of TaskA and TaskB to 30 (even if that will not lead to fasted DMA
> operation completion).


So what is the problem here for which Priority Inheritance is the solution?

Technically, there is priority inversion here, but 1) it’s not unbounded, the channel will be free as soon as DMA completes; and 2) boosting task priority will not reduce the latency of freeing a DMA channel.

This scenario requires a counting semaphore that’s used as for signaling HW resource availability; and if there are any blocked callers, the highest priority caller is unblocked first.

Re: [Breaking change] Move nxmutex to sched

Posted by Xiang Xiao <xi...@gmail.com>.

When the priority inheritance is enabled on a semaphore, sem_wait will add
the current thread to the semaphore's holder table and expect that the same
thread will call sem_post later to remove it from the holder table.
If we mess this fundamental assumption by waiting/posting from different
threads, many strange things will happen. For example, let's consider
what's happen when a program send a TCP packet:

   1. The send task call sem_wait to become a holder and get IOB
   2. Network subsystem copy the user buffer into IOB and add IOB to the
   queue
   3. The send task exit and then semphare contain a dangling pointer to
   the sending tcb
   4. After network subsystem send IOB to the wire and return it  the pool,
   sem_post is called and will touch the dangling pointer

Zeng Zhaoxiu provides a patch(https://github.com/apache/nuttx/pull/5171) to
workaround this issue.
But, the semaphore holder tracking can't work as we expect anymore.

On Sat, Apr 1, 2023 at 3:52 AM Petro Karashchenko <
petro.karashchenko@gmail.com> wrote:

> Xiang Xiao, is that still true for the latest code in master branch?
> And by "system will
> crash if  the priority inheritance enabled semaphore is waited or posted
> from different threads" do you mean at the point of sem_post/sem_wait or
> some system instability in general?
>
> Best regards,
> Petro
>
> On Fri, Mar 31, 2023, 10:39 PM Xiang Xiao <xi...@gmail.com>
> wrote:
>
> > BTW, https://github.com/apache/nuttx/pull/5070 report that the system
> will
> > crash if  the priority inheritance enabled semaphore is waited or posted
> > from different threads.
> >
> > On Sat, Apr 1, 2023 at 3:20 AM Xiang Xiao <xi...@gmail.com>
> > wrote:
> >
> > >
> > >
> > > On Sat, Apr 1, 2023 at 12:34 AM Gregory Nutt <sp...@gmail.com>
> > wrote:
> > >
> > >> On 3/31/2023 8:56 AM, Gregory Nutt wrote:
> > >> >
> > >> >> Even more. In my previous example if semaphore is posted from the
> > >> >> interrupt
> > >> >> we do not know which of TaskA or TaskB is no longer a "holder l"
> of a
> > >> >> semaphore.
> > >> >>
> > >> > You are right.  In this usage case, the counting semaphore is not
> > >> > being used for locking; it is being used for signalling an event per
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance
> > >> >
> > >> > In that case, priority inheritance must be turned off.
> > >> >
> > >> You example is really confusing because you are mixing two different
> > >> concepts, just subtly enough to be really confusing.  If a counting
> > >> semaphore is used for locking a set of multiple resources, the posting
> > >> the semaphore does not release the resource.  That is not the way that
> > >> it is done.  Here is a more believable example of how this would work:
> > >>
> > >>  1. TaskA with priority 10 allocates DMA0 channel by calling a DMA
> > >>     channel allocation function.  If a DMA channel is available, it is
> > >>     allocated and the allocation function takes the semaphore. TaskA
> > >>     then starts DMA activity.
> > >>  2. TaskA waits on another signaling semaphore for the DMA to
> complete.
> > >>  3. TaskB with priority 20 does the same.
> > >>  4. TaskC with priority 30 wants to allocate a DMA channel.  It calls
> > >>     the channel allocation function which waits on the sempahore for a
> > >>     count to be available.  This boost the priority of TaskA and TaskB
> > >>     to 30.
> > >>  5. When the DMA started by TaskA completes, it signals TaskA which
> > >>     calls the resource deallocation function which increments the
> > >>     counting semaphore, giving the count to TaskC and storing the base
> > >>     priorities.
> > >>
> > >>
> > >
> > > Normally, the resource(dma channel here) is allocated from one
> > > thread/task, but may be freed in another thread/task. Please consider
> how
> > > we malloc and free memory.
> > >
> > >
> > >> The confusion arises because you are mixing the signaling logic with
> the
> > >> resource deallocation logic.
> > >>
> > >> The mm/iob logic works basically this way.  The logic more complex
> then
> > >> you would think from above.  IOBs is an example of a critical system
> > >> resource that has multiple copies and utilizes a counting semaphore
> with
> > >> priority inheritance to achieve good real time performance.   IOB
> > >> handling is key logic for the correct real time operation of the
> overall
> > >> system.  Nothing we do must risk this.
> > >>
> > >>
> > > IOB is a very good example to demonstrate why it's a bad and dangerous
> > > idea to enable priority inheritance for the counting semaphore. IOB is
> > > normally allocated in the send thread but free in the work thread. If
> we
> > > want the priority inheritance to work as expected instead of crashing
> the
> > > system, sem_wait/sem_post must come from the same thread, which is a
> kind
> > > of lock.
> > >
> > >
> > >> Other places where this logic is (probably) used:
> > >>
> > >>     arch/arm/src/rp2040/rp2040_i2s.c: nxsem_init(&priv->bufsem, 0,
> > >>     CONFIG_RP2040_I2S_MAXINFLIGHT);
> > >>     arch/arm/src/rtl8720c/amebaz_depend.c:  if (sem_init(_sema, 0,
> > >>     init_val))
> > >>     arch/arm/src/sama5/sam_ssc.c: nxsem_init(&priv->bufsem, 0,
> > >>     CONFIG_SAMA5_SSC_MAXINFLIGHT);
> > >>     arch/arm/src/samv7/sam_ssc.c: nxsem_init(&priv->bufsem, 0,
> > >>     CONFIG_SAMV7_SSC_MAXINFLIGHT);
> > >>     arch/arm/src/stm32/stm32_i2s.c: nxsem_init(&priv->bufsem, 0,
> > >>     CONFIG_STM32_I2S_MAXINFLIGHT);
> > >>     drivers/can/mcp2515.c: nxsem_init(&priv->txfsem, 0,
> > >>     MCP2515_NUM_TX_BUFFERS);
> > >>     drivers/video/vnc/vnc_server.c: nxsem_init(&session->freesem, 0,
> > >>     CONFIG_VNCSERVER_NUPDATES);
> > >>     sched/pthread/pthread_completejoin.c: nxsem_init(&pjoin->data_sem,
> > >>     0, (ntasks_waiting + 1));
> > >>     wireless/bluetooth/bt_hcicore.c: nxsem_init(&g_btdev.le_pkts_sem,
> 0,
> > >>     g_btdev.le_pkts);
> > >>     wireless/bluetooth/bt_hcicore.c: nxsem_init(&g_btdev.ncmd_sem, 0,
> > 1);
> > >>     wireless/ieee802154/mac802154.c: nxsem_init(&priv->txdesc_sem, 0,
> > >>     CONFIG_MAC802154_NTXDESC);
> > >>     wireless/ieee802154/mac802154.c: nxsem_init(&mac->opsem, 0, 1);
> > >>
> > >> Maybe:
> > >>
> > >>     arch/risc-v/src/bl602/bl602_os_hal.c: ret = nxsem_init(sem, 0,
> > init);
> > >>     arch/risc-v/src/esp32c3/esp32c3_ble_adapter.c: ret =
> > >>     sem_init(&bt_sem->sem, 0, init);
> > >>     arch/risc-v/src/esp32c3/esp32c3_wifi_adapter.c: ret =
> > >>     nxsem_init(sem, 0, init);
> > >>     arch/xtensa/src/esp32/esp32_ble_adapter.c: ret = sem_init(sem, 0,
> > >> init);
> > >>     arch/xtensa/src/esp32/esp32_wifi_adapter.c: ret = nxsem_init(sem,
> 0,
> > >>     init);
> > >>     arch/xtensa/src/esp32s3/esp32s3_wifi_adapter.c: ret =
> > >>     nxsem_init(sem, 0, init);
> > >>
> > >>
> > >
> > >
> > >
> >
>

Re: [Breaking change] Move nxmutex to sched

Posted by Petro Karashchenko <pe...@gmail.com>.

Xiang Xiao, is that still true for the latest code in master branch?
And by "system will
crash if  the priority inheritance enabled semaphore is waited or posted
from different threads" do you mean at the point of sem_post/sem_wait or
some system instability in general?

Best regards,
Petro

On Fri, Mar 31, 2023, 10:39 PM Xiang Xiao <xi...@gmail.com> wrote:

> BTW, https://github.com/apache/nuttx/pull/5070 report that the system will
> crash if  the priority inheritance enabled semaphore is waited or posted
> from different threads.
>
> On Sat, Apr 1, 2023 at 3:20 AM Xiang Xiao <xi...@gmail.com>
> wrote:
>
> >
> >
> > On Sat, Apr 1, 2023 at 12:34 AM Gregory Nutt <sp...@gmail.com>
> wrote:
> >
> >> On 3/31/2023 8:56 AM, Gregory Nutt wrote:
> >> >
> >> >> Even more. In my previous example if semaphore is posted from the
> >> >> interrupt
> >> >> we do not know which of TaskA or TaskB is no longer a "holder l" of a
> >> >> semaphore.
> >> >>
> >> > You are right.  In this usage case, the counting semaphore is not
> >> > being used for locking; it is being used for signalling an event per
> >> >
> >>
> https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance
> >> >
> >> > In that case, priority inheritance must be turned off.
> >> >
> >> You example is really confusing because you are mixing two different
> >> concepts, just subtly enough to be really confusing.  If a counting
> >> semaphore is used for locking a set of multiple resources, the posting
> >> the semaphore does not release the resource.  That is not the way that
> >> it is done.  Here is a more believable example of how this would work:
> >>
> >>  1. TaskA with priority 10 allocates DMA0 channel by calling a DMA
> >>     channel allocation function.  If a DMA channel is available, it is
> >>     allocated and the allocation function takes the semaphore. TaskA
> >>     then starts DMA activity.
> >>  2. TaskA waits on another signaling semaphore for the DMA to complete.
> >>  3. TaskB with priority 20 does the same.
> >>  4. TaskC with priority 30 wants to allocate a DMA channel.  It calls
> >>     the channel allocation function which waits on the sempahore for a
> >>     count to be available.  This boost the priority of TaskA and TaskB
> >>     to 30.
> >>  5. When the DMA started by TaskA completes, it signals TaskA which
> >>     calls the resource deallocation function which increments the
> >>     counting semaphore, giving the count to TaskC and storing the base
> >>     priorities.
> >>
> >>
> >
> > Normally, the resource(dma channel here) is allocated from one
> > thread/task, but may be freed in another thread/task. Please consider how
> > we malloc and free memory.
> >
> >
> >> The confusion arises because you are mixing the signaling logic with the
> >> resource deallocation logic.
> >>
> >> The mm/iob logic works basically this way.  The logic more complex then
> >> you would think from above.  IOBs is an example of a critical system
> >> resource that has multiple copies and utilizes a counting semaphore with
> >> priority inheritance to achieve good real time performance.   IOB
> >> handling is key logic for the correct real time operation of the overall
> >> system.  Nothing we do must risk this.
> >>
> >>
> > IOB is a very good example to demonstrate why it's a bad and dangerous
> > idea to enable priority inheritance for the counting semaphore. IOB is
> > normally allocated in the send thread but free in the work thread. If we
> > want the priority inheritance to work as expected instead of crashing the
> > system, sem_wait/sem_post must come from the same thread, which is a kind
> > of lock.
> >
> >
> >> Other places where this logic is (probably) used:
> >>
> >>     arch/arm/src/rp2040/rp2040_i2s.c: nxsem_init(&priv->bufsem, 0,
> >>     CONFIG_RP2040_I2S_MAXINFLIGHT);
> >>     arch/arm/src/rtl8720c/amebaz_depend.c:  if (sem_init(_sema, 0,
> >>     init_val))
> >>     arch/arm/src/sama5/sam_ssc.c: nxsem_init(&priv->bufsem, 0,
> >>     CONFIG_SAMA5_SSC_MAXINFLIGHT);
> >>     arch/arm/src/samv7/sam_ssc.c: nxsem_init(&priv->bufsem, 0,
> >>     CONFIG_SAMV7_SSC_MAXINFLIGHT);
> >>     arch/arm/src/stm32/stm32_i2s.c: nxsem_init(&priv->bufsem, 0,
> >>     CONFIG_STM32_I2S_MAXINFLIGHT);
> >>     drivers/can/mcp2515.c: nxsem_init(&priv->txfsem, 0,
> >>     MCP2515_NUM_TX_BUFFERS);
> >>     drivers/video/vnc/vnc_server.c: nxsem_init(&session->freesem, 0,
> >>     CONFIG_VNCSERVER_NUPDATES);
> >>     sched/pthread/pthread_completejoin.c: nxsem_init(&pjoin->data_sem,
> >>     0, (ntasks_waiting + 1));
> >>     wireless/bluetooth/bt_hcicore.c: nxsem_init(&g_btdev.le_pkts_sem, 0,
> >>     g_btdev.le_pkts);
> >>     wireless/bluetooth/bt_hcicore.c: nxsem_init(&g_btdev.ncmd_sem, 0,
> 1);
> >>     wireless/ieee802154/mac802154.c: nxsem_init(&priv->txdesc_sem, 0,
> >>     CONFIG_MAC802154_NTXDESC);
> >>     wireless/ieee802154/mac802154.c: nxsem_init(&mac->opsem, 0, 1);
> >>
> >> Maybe:
> >>
> >>     arch/risc-v/src/bl602/bl602_os_hal.c: ret = nxsem_init(sem, 0,
> init);
> >>     arch/risc-v/src/esp32c3/esp32c3_ble_adapter.c: ret =
> >>     sem_init(&bt_sem->sem, 0, init);
> >>     arch/risc-v/src/esp32c3/esp32c3_wifi_adapter.c: ret =
> >>     nxsem_init(sem, 0, init);
> >>     arch/xtensa/src/esp32/esp32_ble_adapter.c: ret = sem_init(sem, 0,
> >> init);
> >>     arch/xtensa/src/esp32/esp32_wifi_adapter.c: ret = nxsem_init(sem, 0,
> >>     init);
> >>     arch/xtensa/src/esp32s3/esp32s3_wifi_adapter.c: ret =
> >>     nxsem_init(sem, 0, init);
> >>
> >>
> >
> >
> >
>

Re: [Breaking change] Move nxmutex to sched

Posted by Xiang Xiao <xi...@gmail.com>.

BTW, https://github.com/apache/nuttx/pull/5070 report that the system will
crash if  the priority inheritance enabled semaphore is waited or posted
from different threads.

On Sat, Apr 1, 2023 at 3:20 AM Xiang Xiao <xi...@gmail.com> wrote:

>
>
> On Sat, Apr 1, 2023 at 12:34 AM Gregory Nutt <sp...@gmail.com> wrote:
>
>> On 3/31/2023 8:56 AM, Gregory Nutt wrote:
>> >
>> >> Even more. In my previous example if semaphore is posted from the
>> >> interrupt
>> >> we do not know which of TaskA or TaskB is no longer a "holder l" of a
>> >> semaphore.
>> >>
>> > You are right.  In this usage case, the counting semaphore is not
>> > being used for locking; it is being used for signalling an event per
>> >
>> https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance
>> >
>> > In that case, priority inheritance must be turned off.
>> >
>> You example is really confusing because you are mixing two different
>> concepts, just subtly enough to be really confusing.  If a counting
>> semaphore is used for locking a set of multiple resources, the posting
>> the semaphore does not release the resource.  That is not the way that
>> it is done.  Here is a more believable example of how this would work:
>>
>>  1. TaskA with priority 10 allocates DMA0 channel by calling a DMA
>>     channel allocation function.  If a DMA channel is available, it is
>>     allocated and the allocation function takes the semaphore. TaskA
>>     then starts DMA activity.
>>  2. TaskA waits on another signaling semaphore for the DMA to complete.
>>  3. TaskB with priority 20 does the same.
>>  4. TaskC with priority 30 wants to allocate a DMA channel.  It calls
>>     the channel allocation function which waits on the sempahore for a
>>     count to be available.  This boost the priority of TaskA and TaskB
>>     to 30.
>>  5. When the DMA started by TaskA completes, it signals TaskA which
>>     calls the resource deallocation function which increments the
>>     counting semaphore, giving the count to TaskC and storing the base
>>     priorities.
>>
>>
>
> Normally, the resource(dma channel here) is allocated from one
> thread/task, but may be freed in another thread/task. Please consider how
> we malloc and free memory.
>
>
>> The confusion arises because you are mixing the signaling logic with the
>> resource deallocation logic.
>>
>> The mm/iob logic works basically this way.  The logic more complex then
>> you would think from above.  IOBs is an example of a critical system
>> resource that has multiple copies and utilizes a counting semaphore with
>> priority inheritance to achieve good real time performance.   IOB
>> handling is key logic for the correct real time operation of the overall
>> system.  Nothing we do must risk this.
>>
>>
> IOB is a very good example to demonstrate why it's a bad and dangerous
> idea to enable priority inheritance for the counting semaphore. IOB is
> normally allocated in the send thread but free in the work thread. If we
> want the priority inheritance to work as expected instead of crashing the
> system, sem_wait/sem_post must come from the same thread, which is a kind
> of lock.
>
>
>> Other places where this logic is (probably) used:
>>
>>     arch/arm/src/rp2040/rp2040_i2s.c: nxsem_init(&priv->bufsem, 0,
>>     CONFIG_RP2040_I2S_MAXINFLIGHT);
>>     arch/arm/src/rtl8720c/amebaz_depend.c:  if (sem_init(_sema, 0,
>>     init_val))
>>     arch/arm/src/sama5/sam_ssc.c: nxsem_init(&priv->bufsem, 0,
>>     CONFIG_SAMA5_SSC_MAXINFLIGHT);
>>     arch/arm/src/samv7/sam_ssc.c: nxsem_init(&priv->bufsem, 0,
>>     CONFIG_SAMV7_SSC_MAXINFLIGHT);
>>     arch/arm/src/stm32/stm32_i2s.c: nxsem_init(&priv->bufsem, 0,
>>     CONFIG_STM32_I2S_MAXINFLIGHT);
>>     drivers/can/mcp2515.c: nxsem_init(&priv->txfsem, 0,
>>     MCP2515_NUM_TX_BUFFERS);
>>     drivers/video/vnc/vnc_server.c: nxsem_init(&session->freesem, 0,
>>     CONFIG_VNCSERVER_NUPDATES);
>>     sched/pthread/pthread_completejoin.c: nxsem_init(&pjoin->data_sem,
>>     0, (ntasks_waiting + 1));
>>     wireless/bluetooth/bt_hcicore.c: nxsem_init(&g_btdev.le_pkts_sem, 0,
>>     g_btdev.le_pkts);
>>     wireless/bluetooth/bt_hcicore.c: nxsem_init(&g_btdev.ncmd_sem, 0, 1);
>>     wireless/ieee802154/mac802154.c: nxsem_init(&priv->txdesc_sem, 0,
>>     CONFIG_MAC802154_NTXDESC);
>>     wireless/ieee802154/mac802154.c: nxsem_init(&mac->opsem, 0, 1);
>>
>> Maybe:
>>
>>     arch/risc-v/src/bl602/bl602_os_hal.c: ret = nxsem_init(sem, 0, init);
>>     arch/risc-v/src/esp32c3/esp32c3_ble_adapter.c: ret =
>>     sem_init(&bt_sem->sem, 0, init);
>>     arch/risc-v/src/esp32c3/esp32c3_wifi_adapter.c: ret =
>>     nxsem_init(sem, 0, init);
>>     arch/xtensa/src/esp32/esp32_ble_adapter.c: ret = sem_init(sem, 0,
>> init);
>>     arch/xtensa/src/esp32/esp32_wifi_adapter.c: ret = nxsem_init(sem, 0,
>>     init);
>>     arch/xtensa/src/esp32s3/esp32s3_wifi_adapter.c: ret =
>>     nxsem_init(sem, 0, init);
>>
>>
>
>
>

Re: [Breaking change] Move nxmutex to sched

Posted by Xiang Xiao <xi...@gmail.com>.

On Sat, Apr 1, 2023 at 12:34 AM Gregory Nutt <sp...@gmail.com> wrote:

> On 3/31/2023 8:56 AM, Gregory Nutt wrote:
> >
> >> Even more. In my previous example if semaphore is posted from the
> >> interrupt
> >> we do not know which of TaskA or TaskB is no longer a "holder l" of a
> >> semaphore.
> >>
> > You are right.  In this usage case, the counting semaphore is not
> > being used for locking; it is being used for signalling an event per
> >
> https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance
> >
> > In that case, priority inheritance must be turned off.
> >
> You example is really confusing because you are mixing two different
> concepts, just subtly enough to be really confusing.  If a counting
> semaphore is used for locking a set of multiple resources, the posting
> the semaphore does not release the resource.  That is not the way that
> it is done.  Here is a more believable example of how this would work:
>
>  1. TaskA with priority 10 allocates DMA0 channel by calling a DMA
>     channel allocation function.  If a DMA channel is available, it is
>     allocated and the allocation function takes the semaphore. TaskA
>     then starts DMA activity.
>  2. TaskA waits on another signaling semaphore for the DMA to complete.
>  3. TaskB with priority 20 does the same.
>  4. TaskC with priority 30 wants to allocate a DMA channel.  It calls
>     the channel allocation function which waits on the sempahore for a
>     count to be available.  This boost the priority of TaskA and TaskB
>     to 30.
>  5. When the DMA started by TaskA completes, it signals TaskA which
>     calls the resource deallocation function which increments the
>     counting semaphore, giving the count to TaskC and storing the base
>     priorities.
>
>

Normally, the resource(dma channel here) is allocated from one thread/task,
but may be freed in another thread/task. Please consider how we malloc and
free memory.


> The confusion arises because you are mixing the signaling logic with the
> resource deallocation logic.
>
> The mm/iob logic works basically this way.  The logic more complex then
> you would think from above.  IOBs is an example of a critical system
> resource that has multiple copies and utilizes a counting semaphore with
> priority inheritance to achieve good real time performance.   IOB
> handling is key logic for the correct real time operation of the overall
> system.  Nothing we do must risk this.
>
>
IOB is a very good example to demonstrate why it's a bad and dangerous idea
to enable priority inheritance for the counting semaphore. IOB is normally
allocated in the send thread but free in the work thread. If we want the
priority inheritance to work as expected instead of crashing the system,
sem_wait/sem_post must come from the same thread, which is a kind of lock.


> Other places where this logic is (probably) used:
>
>     arch/arm/src/rp2040/rp2040_i2s.c: nxsem_init(&priv->bufsem, 0,
>     CONFIG_RP2040_I2S_MAXINFLIGHT);
>     arch/arm/src/rtl8720c/amebaz_depend.c:  if (sem_init(_sema, 0,
>     init_val))
>     arch/arm/src/sama5/sam_ssc.c: nxsem_init(&priv->bufsem, 0,
>     CONFIG_SAMA5_SSC_MAXINFLIGHT);
>     arch/arm/src/samv7/sam_ssc.c: nxsem_init(&priv->bufsem, 0,
>     CONFIG_SAMV7_SSC_MAXINFLIGHT);
>     arch/arm/src/stm32/stm32_i2s.c: nxsem_init(&priv->bufsem, 0,
>     CONFIG_STM32_I2S_MAXINFLIGHT);
>     drivers/can/mcp2515.c: nxsem_init(&priv->txfsem, 0,
>     MCP2515_NUM_TX_BUFFERS);
>     drivers/video/vnc/vnc_server.c: nxsem_init(&session->freesem, 0,
>     CONFIG_VNCSERVER_NUPDATES);
>     sched/pthread/pthread_completejoin.c: nxsem_init(&pjoin->data_sem,
>     0, (ntasks_waiting + 1));
>     wireless/bluetooth/bt_hcicore.c: nxsem_init(&g_btdev.le_pkts_sem, 0,
>     g_btdev.le_pkts);
>     wireless/bluetooth/bt_hcicore.c: nxsem_init(&g_btdev.ncmd_sem, 0, 1);
>     wireless/ieee802154/mac802154.c: nxsem_init(&priv->txdesc_sem, 0,
>     CONFIG_MAC802154_NTXDESC);
>     wireless/ieee802154/mac802154.c: nxsem_init(&mac->opsem, 0, 1);
>
> Maybe:
>
>     arch/risc-v/src/bl602/bl602_os_hal.c: ret = nxsem_init(sem, 0, init);
>     arch/risc-v/src/esp32c3/esp32c3_ble_adapter.c: ret =
>     sem_init(&bt_sem->sem, 0, init);
>     arch/risc-v/src/esp32c3/esp32c3_wifi_adapter.c: ret =
>     nxsem_init(sem, 0, init);
>     arch/xtensa/src/esp32/esp32_ble_adapter.c: ret = sem_init(sem, 0,
> init);
>     arch/xtensa/src/esp32/esp32_wifi_adapter.c: ret = nxsem_init(sem, 0,
>     init);
>     arch/xtensa/src/esp32s3/esp32s3_wifi_adapter.c: ret =
>     nxsem_init(sem, 0, init);
>
>

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

On 3/31/2023 8:56 AM, Gregory Nutt wrote:
>
>> Even more. In my previous example if semaphore is posted from the 
>> interrupt
>> we do not know which of TaskA or TaskB is no longer a "holder l" of a
>> semaphore.
>>
> You are right.  In this usage case, the counting semaphore is not 
> being used for locking; it is being used for signalling an event per 
> https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance
>
> In that case, priority inheritance must be turned off.
>
You example is really confusing because you are mixing two different 
concepts, just subtly enough to be really confusing.  If a counting 
semaphore is used for locking a set of multiple resources, the posting 
the semaphore does not release the resource.  That is not the way that 
it is done.  Here is a more believable example of how this would work:

 1. TaskA with priority 10 allocates DMA0 channel by calling a DMA
    channel allocation function.  If a DMA channel is available, it is
    allocated and the allocation function takes the semaphore. TaskA
    then starts DMA activity.
 2. TaskA waits on another signaling semaphore for the DMA to complete.
 3. TaskB with priority 20 does the same.
 4. TaskC with priority 30 wants to allocate a DMA channel.  It calls
    the channel allocation function which waits on the sempahore for a
    count to be available.  This boost the priority of TaskA and TaskB
    to 30.
 5. When the DMA started by TaskA completes, it signals TaskA which
    calls the resource deallocation function which increments the
    counting semaphore, giving the count to TaskC and storing the base
    priorities.

The confusion arises because you are mixing the signaling logic with the 
resource deallocation logic.

The mm/iob logic works basically this way.  The logic more complex then 
you would think from above.  IOBs is an example of a critical system 
resource that has multiple copies and utilizes a counting semaphore with 
priority inheritance to achieve good real time performance.   IOB 
handling is key logic for the correct real time operation of the overall 
system.  Nothing we do must risk this.

Other places where this logic is (probably) used:

    arch/arm/src/rp2040/rp2040_i2s.c: nxsem_init(&priv->bufsem, 0,
    CONFIG_RP2040_I2S_MAXINFLIGHT);
    arch/arm/src/rtl8720c/amebaz_depend.c:  if (sem_init(_sema, 0,
    init_val))
    arch/arm/src/sama5/sam_ssc.c: nxsem_init(&priv->bufsem, 0,
    CONFIG_SAMA5_SSC_MAXINFLIGHT);
    arch/arm/src/samv7/sam_ssc.c: nxsem_init(&priv->bufsem, 0,
    CONFIG_SAMV7_SSC_MAXINFLIGHT);
    arch/arm/src/stm32/stm32_i2s.c: nxsem_init(&priv->bufsem, 0,
    CONFIG_STM32_I2S_MAXINFLIGHT);
    drivers/can/mcp2515.c: nxsem_init(&priv->txfsem, 0,
    MCP2515_NUM_TX_BUFFERS);
    drivers/video/vnc/vnc_server.c: nxsem_init(&session->freesem, 0,
    CONFIG_VNCSERVER_NUPDATES);
    sched/pthread/pthread_completejoin.c: nxsem_init(&pjoin->data_sem,
    0, (ntasks_waiting + 1));
    wireless/bluetooth/bt_hcicore.c: nxsem_init(&g_btdev.le_pkts_sem, 0,
    g_btdev.le_pkts);
    wireless/bluetooth/bt_hcicore.c: nxsem_init(&g_btdev.ncmd_sem, 0, 1);
    wireless/ieee802154/mac802154.c: nxsem_init(&priv->txdesc_sem, 0,
    CONFIG_MAC802154_NTXDESC);
    wireless/ieee802154/mac802154.c: nxsem_init(&mac->opsem, 0, 1);

Maybe:

    arch/risc-v/src/bl602/bl602_os_hal.c: ret = nxsem_init(sem, 0, init);
    arch/risc-v/src/esp32c3/esp32c3_ble_adapter.c: ret =
    sem_init(&bt_sem->sem, 0, init);
    arch/risc-v/src/esp32c3/esp32c3_wifi_adapter.c: ret =
    nxsem_init(sem, 0, init);
    arch/xtensa/src/esp32/esp32_ble_adapter.c: ret = sem_init(sem, 0, init);
    arch/xtensa/src/esp32/esp32_wifi_adapter.c: ret = nxsem_init(sem, 0,
    init);
    arch/xtensa/src/esp32s3/esp32s3_wifi_adapter.c: ret =
    nxsem_init(sem, 0, init);

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

> Even more. In my previous example if semaphore is posted from the interrupt
> we do not know which of TaskA or TaskB is no longer a "holder l" of a
> semaphore.
>
You are right.  In this usage case, the counting semaphore is not being 
used for locking; it is being used for signalling an event per 
https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance

In that case, priority inheritance must be turned off.

Re: [Breaking change] Move nxmutex to sched

Posted by Petro Karashchenko <pe...@gmail.com>.

Even more. In my previous example if semaphore is posted from the interrupt
we do not know which of TaskA or TaskB is no longer a "holder l" of a
semaphore.

Best regards,
Petro

On Fri, Mar 31, 2023, 5:39 PM Petro Karashchenko <
petro.karashchenko@gmail.com> wrote:

> Sorry for been not clear. Here is a better description:
> 2 DMA channels accountable by a counting semaphore.
> The semaphore is posted by DMA completion interrupt.
> TaskA with priority 10 allocates DMA0 channel and starts DMA activity.
> TaskB with priority 20 allocates DMA1 channel and starts DMA activity.
> TaskC with priority 30 wants to allocate a DMA channel, so boosts priority
> of TaskA and TaskB to 30 (even if that will not lead to fasted DMA
> operation completion).
> DMA1 completes and posts semaphore, so TaskC gets it and TaskA and TaskB
> priorities are restored.
>
> Best regards,
> Petro
>
> On Fri, Mar 31, 2023, 5:26 PM Gregory Nutt <sp...@gmail.com> wrote:
>
>>
>> > I still see more questions than answers. As semaphores can be posted
>> from
>> > the interrupt level. Let's take next example:
>> > The counting semaphore manages DMA channels.
>> > Task allocates a DMA channels and takes counting semaphore (becomes a
>> > holder), but posting a semaphore is done from DMA completion can back as
>> > channel is freed there. The holder task may still do some activities on
>> the
>> > background while DMA is working. But current priority boost schema will
>> > rise it's priority (even if boost will not lead to faster posting of a
>> > semaphore). This is more theoretical description, but describes the
>> state
>> > of problem.
>> >
>> > I think we can task about inheritance only if take/post are done from
>> task
>> > level and currently only mutex ensure that.
>>
>> That is not true.  Posting from an interrupt never boosts priority and,
>> hence, never causes inheritance of priority.  It can only cause a drop /
>> restoration in priority.  That may result in context switches which can
>> be done from the interrupt level with no problem.  I don't see any
>> issue.  Certainly this works, it is done often and works very well.
>>
>> This is an important feature of the real time behavior.  We can't lose
>> this behavior.
>>
>>
>>

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

> Sorry for been not clear. Here is a better description:
> 2 DMA channels accountable by a counting semaphore.
> The semaphore is posted by DMA completion interrupt.
> TaskA with priority 10 allocates DMA0 channel and starts DMA activity.
> TaskB with priority 20 allocates DMA1 channel and starts DMA activity.
> TaskC with priority 30 wants to allocate a DMA channel, so boosts priority
> of TaskA and TaskB to 30 (even if that will not lead to fasted DMA
> operation completion).

No, but it will result in a more real time, deterministic to the 
completion of a DMA which is a critical event to the healthy behavior of 
the system.  That is the gold of an RTOS -- NOT faster response, but a 
deterministic response.  That is the meaing of "real time"

This is EXTREMELY important to the viability of NuttX as an RTOS.  If 
the OS cannot respond deterministically in cases like this then the RTOS 
is a total failure as an RTOS.  Might as well remove the RT from the 
beginning.

This is key.  This is absolutely critical to the existence of NuttX as 
an RTOS.  If we remove this capability then the OS is a pile of shit and 
never be used by anyone.

> DMA1 completes and posts semaphore, so TaskC gets it and TaskA and TaskB
> priorities are restored.
Yes, that sounds correct.

Re: [Breaking change] Move nxmutex to sched

Posted by Petro Karashchenko <pe...@gmail.com>.

Sorry for been not clear. Here is a better description:
2 DMA channels accountable by a counting semaphore.
The semaphore is posted by DMA completion interrupt.
TaskA with priority 10 allocates DMA0 channel and starts DMA activity.
TaskB with priority 20 allocates DMA1 channel and starts DMA activity.
TaskC with priority 30 wants to allocate a DMA channel, so boosts priority
of TaskA and TaskB to 30 (even if that will not lead to fasted DMA
operation completion).
DMA1 completes and posts semaphore, so TaskC gets it and TaskA and TaskB
priorities are restored.

Best regards,
Petro

On Fri, Mar 31, 2023, 5:26 PM Gregory Nutt <sp...@gmail.com> wrote:

>
> > I still see more questions than answers. As semaphores can be posted from
> > the interrupt level. Let's take next example:
> > The counting semaphore manages DMA channels.
> > Task allocates a DMA channels and takes counting semaphore (becomes a
> > holder), but posting a semaphore is done from DMA completion can back as
> > channel is freed there. The holder task may still do some activities on
> the
> > background while DMA is working. But current priority boost schema will
> > rise it's priority (even if boost will not lead to faster posting of a
> > semaphore). This is more theoretical description, but describes the state
> > of problem.
> >
> > I think we can task about inheritance only if take/post are done from
> task
> > level and currently only mutex ensure that.
>
> That is not true.  Posting from an interrupt never boosts priority and,
> hence, never causes inheritance of priority.  It can only cause a drop /
> restoration in priority.  That may result in context switches which can
> be done from the interrupt level with no problem.  I don't see any
> issue.  Certainly this works, it is done often and works very well.
>
> This is an important feature of the real time behavior.  We can't lose
> this behavior.
>
>
>

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

> I still see more questions than answers. As semaphores can be posted from
> the interrupt level. Let's take next example:
> The counting semaphore manages DMA channels.
> Task allocates a DMA channels and takes counting semaphore (becomes a
> holder), but posting a semaphore is done from DMA completion can back as
> channel is freed there. The holder task may still do some activities on the
> background while DMA is working. But current priority boost schema will
> rise it's priority (even if boost will not lead to faster posting of a
> semaphore). This is more theoretical description, but describes the state
> of problem.
>
> I think we can task about inheritance only if take/post are done from task
> level and currently only mutex ensure that.

That is not true.  Posting from an interrupt never boosts priority and, 
hence, never causes inheritance of priority.  It can only cause a drop / 
restoration in priority.  That may result in context switches which can 
be done from the interrupt level with no problem.  I don't see any 
issue.  Certainly this works, it is done often and works very well.

This is an important feature of the real time behavior.  We can't lose 
this behavior.

Re: [Breaking change] Move nxmutex to sched

Posted by Petro Karashchenko <pe...@gmail.com>.

I still see more questions than answers. As semaphores can be posted from
the interrupt level. Let's take next example:
The counting semaphore manages DMA channels.
Task allocates a DMA channels and takes counting semaphore (becomes a
holder), but posting a semaphore is done from DMA completion can back as
channel is freed there. The holder task may still do some activities on the
background while DMA is working. But current priority boost schema will
rise it's priority (even if boost will not lead to faster posting of a
semaphore). This is more theoretical description, but describes the state
of problem.

I think we can task about inheritance only if take/post are done from task
level and currently only mutex ensure that.

Best regards,
Petro

On Fri, Mar 31, 2023, 4:30 PM Gregory Nutt <sp...@gmail.com> wrote:

>
> > Yes and No. For counting semaphores we have multiple holders for example
> > with priorities 10, 50 and 90. Now a task with priority 100 comes and
> wants
> > to take a semaphore. Priority which of the holders should be increased?
> The
> > lowest or the highest holder? With a real-time point of view it should be
> > 90 boosted to 100, but is the current implementation doing that?
> Currently ALL holders are boosted to the priority of the highest
> priority waiter.
>
>

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

> Yes and No. For counting semaphores we have multiple holders for example
> with priorities 10, 50 and 90. Now a task with priority 100 comes and wants
> to take a semaphore. Priority which of the holders should be increased? The
> lowest or the highest holder? With a real-time point of view it should be
> 90 boosted to 100, but is the current implementation doing that?
Currently ALL holders are boosted to the priority of the highest 
priority waiter.

Re: [Breaking change] Move nxmutex to sched

Posted by Petro Karashchenko <pe...@gmail.com>.

Yes and No. For counting semaphores we have multiple holders for example
with priorities 10, 50 and 90. Now a task with priority 100 comes and wants
to take a semaphore. Priority which of the holders should be increased? The
lowest or the highest holder? With a real-time point of view it should be
90 boosted to 100, but is the current implementation doing that?

Let's see how the other RTOS-s are doing that. I do not think that we are
the first who meet this design question.

Best regards,
Petro

пт, 31 бер. 2023 р. о 16:04 Gregory Nutt <sp...@gmail.com> пише:

>
> > Migration to nxmutex is quite a big effort and unfortunately recently I
> > didn't have much time to deep dive into this. In general I support an
> > initiative and do not see a use case for priority inheritance for regular
> > semaphores, so I think we should clean-up priority inheritance code for
> the
> > regular signalling semaphores and introduce a new kernel object (mutex)
> > instead. This is surely valid for the kernel, but not for the user space
> > that already has pthread_mutex with priority inheritance option, so I do
> > not see anything is needed for user space.
>
> Not all locking is binary.  The are cases in the OS where there a
> multiple instances of a resource that are protected with a counting
> semaphore.  If there are N things available, N attempts will return a
> thing, but the N+1th attempt blocks.  If the requester of the N+1th
> thing is high priority then priority inversion can occur.
>
> This is a large change and can affect many people.  This appears to be
> controversial.  Controversial change require a vote of the PMC to
> continue.  Different rules for "Votes on Code Modification" appear
> here:  https://www.apache.org/foundation/voting.html
>

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

> Migration to nxmutex is quite a big effort and unfortunately recently I
> didn't have much time to deep dive into this. In general I support an
> initiative and do not see a use case for priority inheritance for regular
> semaphores, so I think we should clean-up priority inheritance code for the
> regular signalling semaphores and introduce a new kernel object (mutex)
> instead. This is surely valid for the kernel, but not for the user space
> that already has pthread_mutex with priority inheritance option, so I do
> not see anything is needed for user space.

Not all locking is binary.  The are cases in the OS where there a 
multiple instances of a resource that are protected with a counting 
semaphore.  If there are N things available, N attempts will return a 
thing, but the N+1th attempt blocks.  If the requester of the N+1th 
thing is high priority then priority inversion can occur.

This is a large change and can affect many people.  This appears to be 
controversial.  Controversial change require a vote of the PMC to 
continue.  Different rules for "Votes on Code Modification" appear 
here:  https://www.apache.org/foundation/voting.html

Re: [Breaking change] Move nxmutex to sched

Posted by Petro Karashchenko <pe...@gmail.com>.

Hello,

Migration to nxmutex is quite a big effort and unfortunately recently I
didn't have much time to deep dive into this. In general I support an
initiative and do not see a use case for priority inheritance for regular
semaphores, so I think we should clean-up priority inheritance code for the
regular signalling semaphores and introduce a new kernel object (mutex)
instead. This is surely valid for the kernel, but not for the user space
that already has pthread_mutex with priority inheritance option, so I do
not see anything is needed for user space.

I see a few areas where the problems may appear and just want to make sure
that next areas are analyzed and tested with each migration step:
1. Cancellation points and FLAT mode. Since we are having only one copy of
a LIBC here we need to decide what object to use, so all interfaces that
are a cancellation point will still work as before
2. If we start to base pthread objects on nxmutex then we need to make sure
that cancellation point operation is not broken, For example rw_lock API
are cancellation points.

The optimization of priority inheritance operation is welcomed.

Best regards,
Petro

пт, 31 бер. 2023 р. о 07:10 Tomek CEDRO <to...@cedro.info> пише:

> On Fri, Mar 31, 2023 at 12:23 AM Gregory Nutt  wrote:
> >
> >  > In his Confluence paper on "Signaling Semaphores and Priority
> > Inheritance”, Brennan Ashton’s analysis is both thorough and accurate;
> ...
> >
> > Minor fix.  I wrote the paper, Brennan converted the Confluence page
> > from an older DocuWiki page
>
> Respect :-)
>
>
> >  > The solution Brennan suggests is to initialize semaphores used as
> > signaling events as follows:
> >  > sem_init(&sem, 0, 0);
> >  > sem_setprotocol(&sem, SEM_PRIO_NONE);
> >  >
> >  > this is, of course, correct, but retains the dual purpose of
> > sem_wait() and sem_post().  I believe this can be confusing and will
> > continue to be a source of subtle errors.  I suggest going a step
> > further and isolate the two use cases.  Let the current sem_init,
> > sem_wait, sem_post be used only for the resource locking use case.
> >  >
> >  > For the signaling use case, create a new API for event signaling
> > within the kernel: nxev_init, nxev_wait, nxev_post where: nxev_init is
> > simply:
> >  > sem_init(&nxev, 0, 0);
> >  > sem_setprotocol(&nxev, SEM_PRIO_NONE);
> >  >
> >  > and:
> >  >      #define nxev_wait    sem_wait
> >  >      #define nxev_post    sem_post
> >  >
> >  > In the case were PRIORITY_INHERITANCE is not configured,
> > sem_setprotocol() does nothing and the nxev_*() API is still used for
> > event notification.
> >  >
> >  > This may seem a trivial change, but having specific API function
> > names for the two specific use cases would, I believe, all but eliminate
> > future confusion; especially given that most people look to existing
> > drivers to use as a template.  Finally, enabling or disabling
> > PRIORITY_INHERITANCE would not introduce the subtle error Brennan
> > documented.
> >
> > Your suggestion would clarify the usage.
> >
> > I was thinking out a conceptually simple solution that should also
> > resolve the risk in usage:  Just change the default state to
> > SEM_PRIO_NONE for all semaphores.  That would make the default protocol
> > for semaphore be no priority inheritance.
> >
> > This would be a lot of work, however.  All occurrences of sem_init()
> > would have to be examined:
> >
> >      For the signaling use case, you would do nothing.  We would have to
> > remove all sem_setprotocol(&nxev, SEM_PRIO_NONE);
> >      For the resource locking use case, you would have to add
> > sem_setprotocol(&nxev, SEM_PRIO_INHERIT); assuming priority inheritance
> > is enabled.
> >
> > The eliminates the risk of inappropriately using priority inheritance on
> > a locking semaphore.  The consequences of doing that are very bad:
> >
> https://nuttx.yahoogroups.narkive.com/3hggphCi/problem-related-semaphore-and-priority-inheritance
> >
> > Then the only error that the user can make is to fail to select priority
> > inheritance when it would do good.  That is a lesser error and, as you
> > note, usually OK.
>
> +1 :-)
>
> --
> CeDeROM, SQ7MHZ, http://www.tomek.cedro.info
>

Re: [Breaking change] Move nxmutex to sched

Posted by Tomek CEDRO <to...@cedro.info>.

On Fri, Mar 31, 2023 at 12:23 AM Gregory Nutt  wrote:
>
>  > In his Confluence paper on "Signaling Semaphores and Priority
> Inheritance”, Brennan Ashton’s analysis is both thorough and accurate; ...
>
> Minor fix.  I wrote the paper, Brennan converted the Confluence page
> from an older DocuWiki page

Respect :-)


>  > The solution Brennan suggests is to initialize semaphores used as
> signaling events as follows:
>  > sem_init(&sem, 0, 0);
>  > sem_setprotocol(&sem, SEM_PRIO_NONE);
>  >
>  > this is, of course, correct, but retains the dual purpose of
> sem_wait() and sem_post().  I believe this can be confusing and will
> continue to be a source of subtle errors.  I suggest going a step
> further and isolate the two use cases.  Let the current sem_init,
> sem_wait, sem_post be used only for the resource locking use case.
>  >
>  > For the signaling use case, create a new API for event signaling
> within the kernel: nxev_init, nxev_wait, nxev_post where: nxev_init is
> simply:
>  > sem_init(&nxev, 0, 0);
>  > sem_setprotocol(&nxev, SEM_PRIO_NONE);
>  >
>  > and:
>  >      #define nxev_wait    sem_wait
>  >      #define nxev_post    sem_post
>  >
>  > In the case were PRIORITY_INHERITANCE is not configured,
> sem_setprotocol() does nothing and the nxev_*() API is still used for
> event notification.
>  >
>  > This may seem a trivial change, but having specific API function
> names for the two specific use cases would, I believe, all but eliminate
> future confusion; especially given that most people look to existing
> drivers to use as a template.  Finally, enabling or disabling
> PRIORITY_INHERITANCE would not introduce the subtle error Brennan
> documented.
>
> Your suggestion would clarify the usage.
>
> I was thinking out a conceptually simple solution that should also
> resolve the risk in usage:  Just change the default state to
> SEM_PRIO_NONE for all semaphores.  That would make the default protocol
> for semaphore be no priority inheritance.
>
> This would be a lot of work, however.  All occurrences of sem_init()
> would have to be examined:
>
>      For the signaling use case, you would do nothing.  We would have to
> remove all sem_setprotocol(&nxev, SEM_PRIO_NONE);
>      For the resource locking use case, you would have to add
> sem_setprotocol(&nxev, SEM_PRIO_INHERIT); assuming priority inheritance
> is enabled.
>
> The eliminates the risk of inappropriately using priority inheritance on
> a locking semaphore.  The consequences of doing that are very bad:
> https://nuttx.yahoogroups.narkive.com/3hggphCi/problem-related-semaphore-and-priority-inheritance
>
> Then the only error that the user can make is to fail to select priority
> inheritance when it would do good.  That is a lesser error and, as you
> note, usually OK.

+1 :-)

-- 
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

On 3/31/2023 12:53 PM, Petro Karashchenko wrote:
> Hello Greg,
>
> I already wrote that my example is more theoretical and may be a "bad
> design", but it illustrates the issue in the current code.
> Please understand me right, I'm not against having priority inheritance
> available for semaphores. I just want to have things well defined and
> possibly prohibit / catch a bad usage. Like mutex usually have a holder
> field and only a holder of a mutex can release it. That is basically not
> true for semaphores. I would be happy if:
> 1. sem_post() would assert on an attempt to call it from interrupt with a
> semaphore that has priority inheritance enabled.
> 2. sem_post() would assert if the caller task is not in a holder list of
> a semaphore that has priority inheritance enabled.
> But unfortunately that is not true now and currently "nxsem_release_holder"
> just decrements a holder count of a random holder if caller TCB is not in a
> holder list.
>
> What I'm trying to say is that calling sem_wait/sem_post from task context
> is essential for priority inheritance and there is a hole here currently.
> I'm ok with building a kernel mutex as a wrapper on top of the "fixed"
> semaphores with priority inheritance that will do some additional checks if
> needed.
>
> Regarding a "pile of shit" and real-time behavior I agree that determinism
> is a key, but want to use some terms here. The RTOS vs OS difference is
> that RTOS implements scheduling using rate monotonic scheduling (RMS) and
> that is mathematically proven by RMA. That is how determinism is achieved.
> If we get back to the example with counting semaphore and the case of "ALL
> holders are boosted to the priority of the highest priority waiter". What
> is the impact on RMS here? The priority inheritance for mutex (a binary
> semaphore case) was intended to minimize impact on RMS when an exceptional
> situation happens. So in the case of a few holders, how does creating a
> pool of concurrent tasks with the same priority improve determinism? Or
> maybe exactly this makes RTOS a "pile of shit"? Personally I do not have a
> clear answer and this is more expressing my thoughts here. I really think
> that at the end of this discussion we will have some good results that will
> lead to an improvement.
>
> Best regards,
> Petro
>
> пт, 31 бер. 2023 р. о 20:56 David S. Alessio <da...@gmail.com>
> пише:

And my concern is that we don't "throw the baby out with the bathwater", 
as they say

I would rather have to be careful, then lose core functionality. Or 
perhaps just re-design or re-name interfaces to avoid the problem.  The 
root of the probably was mucking around with the semantics of the 
semaphore, not with any technical issue.

Re: [Breaking change] Move nxmutex to sched

Posted by Petro Karashchenko <pe...@gmail.com>.

Hello Greg,

I already wrote that my example is more theoretical and may be a "bad
design", but it illustrates the issue in the current code.
Please understand me right, I'm not against having priority inheritance
available for semaphores. I just want to have things well defined and
possibly prohibit / catch a bad usage. Like mutex usually have a holder
field and only a holder of a mutex can release it. That is basically not
true for semaphores. I would be happy if:
1. sem_post() would assert on an attempt to call it from interrupt with a
semaphore that has priority inheritance enabled.
2. sem_post() would assert if the caller task is not in a holder list of
a semaphore that has priority inheritance enabled.
But unfortunately that is not true now and currently "nxsem_release_holder"
just decrements a holder count of a random holder if caller TCB is not in a
holder list.

What I'm trying to say is that calling sem_wait/sem_post from task context
is essential for priority inheritance and there is a hole here currently.
I'm ok with building a kernel mutex as a wrapper on top of the "fixed"
semaphores with priority inheritance that will do some additional checks if
needed.

Regarding a "pile of shit" and real-time behavior I agree that determinism
is a key, but want to use some terms here. The RTOS vs OS difference is
that RTOS implements scheduling using rate monotonic scheduling (RMS) and
that is mathematically proven by RMA. That is how determinism is achieved.
If we get back to the example with counting semaphore and the case of "ALL
holders are boosted to the priority of the highest priority waiter". What
is the impact on RMS here? The priority inheritance for mutex (a binary
semaphore case) was intended to minimize impact on RMS when an exceptional
situation happens. So in the case of a few holders, how does creating a
pool of concurrent tasks with the same priority improve determinism? Or
maybe exactly this makes RTOS a "pile of shit"? Personally I do not have a
clear answer and this is more expressing my thoughts here. I really think
that at the end of this discussion we will have some good results that will
lead to an improvement.

Best regards,
Petro

пт, 31 бер. 2023 р. о 20:56 David S. Alessio <da...@gmail.com>
пише:

>
>
> > On Mar 30, 2023, at 3:23 PM, Gregory Nutt <sp...@gmail.com> wrote:
> >
> > > In his Confluence paper on "Signaling Semaphores and Priority
> Inheritance”, Brennan Ashton’s analysis is both thorough and accurate; ...
> >
> > Minor fix.  I wrote the paper, Brennan converted the Confluence page
> from an older DocuWiki page
>
> Hi, Greg, I should have known!
> Cheers,
> -david
>
>

Re: [Breaking change] Move nxmutex to sched

Posted by "David S. Alessio" <da...@gmail.com>.


> On Mar 30, 2023, at 3:23 PM, Gregory Nutt <sp...@gmail.com> wrote:
> 
> > In his Confluence paper on "Signaling Semaphores and Priority Inheritance”, Brennan Ashton’s analysis is both thorough and accurate; ...
> 
> Minor fix.  I wrote the paper, Brennan converted the Confluence page from an older DocuWiki page

Hi, Greg, I should have known!
Cheers,
-david

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

 > In his Confluence paper on "Signaling Semaphores and Priority 
Inheritance”, Brennan Ashton’s analysis is both thorough and accurate; ...

Minor fix.  I wrote the paper, Brennan converted the Confluence page 
from an older DocuWiki page

 > In any RTOS, priority inversion happens, and it’s OK, and a 
well-designed RTOS can be configured to meet its timing constraints 
given CPU performance is adequate for the application.  Point: priority 
inversion is not intrinsically a “bad thing,” it’s going to happen.

I read that priority inheritance does reduce the variability in response 
times.  I am not ready to defend that statement, but it is a common belief.

 > It seems to me that “Signaling Semaphores” are only used within the 
NuttX kernel (is this correct?).  For intertask communication POSIX 
pthread mechanisms could and should be used.

That is probably true.  Application designers, of course, can do as they 
wish and could use a counting semaphore for signaling between 
processes.  But I haven't seen that done.

 > The solution Brennan suggests is to initialize semaphores used as 
signaling events as follows:
 > sem_init(&sem, 0, 0);
 > sem_setprotocol(&sem, SEM_PRIO_NONE);
 >
 > this is, of course, correct, but retains the dual purpose of 
sem_wait() and sem_post().  I believe this can be confusing and will 
continue to be a source of subtle errors.  I suggest going a step 
further and isolate the two use cases.  Let the current sem_init, 
sem_wait, sem_post be used only for the resource locking use case.
 >
 > For the signaling use case, create a new API for event signaling 
within the kernel: nxev_init, nxev_wait, nxev_post where: nxev_init is 
simply:
 > sem_init(&nxev, 0, 0);
 > sem_setprotocol(&nxev, SEM_PRIO_NONE);
 >
 > and:
 >      #define nxev_wait    sem_wait
 >      #define nxev_post    sem_post
 >
 > In the case were PRIORITY_INHERITANCE is not configured, 
sem_setprotocol() does nothing and the nxev_*() API is still used for 
event notification.
 >
 > This may seem a trivial change, but having specific API function 
names for the two specific use cases would, I believe, all but eliminate 
future confusion; especially given that most people look to existing 
drivers to use as a template.  Finally, enabling or disabling 
PRIORITY_INHERITANCE would not introduce the subtle error Brennan 
documented.

Your suggestion would clarify the usage.

I was thinking out a conceptually simple solution that should also 
resolve the risk in usage:  Just change the default state to 
SEM_PRIO_NONE for all semaphores.  That would make the default protocol 
for semaphore be no priority inheritance.

This would be a lot of work, however.  All occurrences of sem_init() 
would have to be examined:

     For the signaling use case, you would do nothing.  We would have to 
remove all sem_setprotocol(&nxev, SEM_PRIO_NONE);
     For the resource locking use case, you would have to add 
sem_setprotocol(&nxev, SEM_PRIO_INHERIT); assuming priority inheritance 
is enabled.

The eliminates the risk of inappropriately using priority inheritance on 
a locking semaphore.  The consequences of doing that are very bad: 
https://nuttx.yahoogroups.narkive.com/3hggphCi/problem-related-semaphore-and-priority-inheritance

Then the only error that the user can make is to fail to select priority 
inheritance when it would do good.  That is a lesser error and, as you 
note, usually OK.

Re: [Breaking change] Move nxmutex to sched

Posted by "David S. Alessio" <da...@gmail.com>.

All,

In his Confluence paper on "Signaling Semaphores and Priority Inheritance”, Brennan Ashton’s analysis is both thorough and accurate; and his conclusion that
There should then be no priority inheritance operations on this semaphore that is used for signalling.  
is unassailable.

It’s important to understand a few points here:

In any RTOS, priority inversion happens, and it’s OK, and a well-designed RTOS can be configured to meet its timing constraints given CPU performance is adequate for the application.  Point: priority inversion is not intrinsically a “bad thing,” it’s going to happen.

Point: The “bad thing” about priority inversion happens when it becomes unbounded.  There are plenty of papers on this, but summarily, unbounded priority inversion happens when a thread blocks on a resource held by another thread of a lower priority (so far this is normal priority inversion); and then a third, unrelated thread, with an intermediate priority becomes ready (imagine it decides to calculate pi).  Now we have unbounded priority inversion (UPI) and we no longer have a deterministic realtime system.

There are a number of solutions to the problem of UPI.  In the case of NuttX, a fixed-priority preemptive RTOS, one solution that both solves the problem and minimizes performance penalty is priority inheritance.  Point: priority inheritance is a solution to the problem of unbounded priority inversion.

In the use case described by Brennan as “Signaling Semaphores”, the semaphore is used as a signal to indicate some event has just occurred, i.e a serial driver signaling an incoming character or keystroke.  Since it’s not used to protect a shared resource, there is no contention for a shared resource and “priority inversion” is meaningless in this case (how would you boost the priority of the device or person typing on the far end of a serial cable?).  Point: in the use case of a signaling semaphore, unbounded priority inversion doesn’t happen because there’s no priority inversion to begin with.

Final Point: priority inheritance, in the case of Signaling Semaphores, is the application of a solution to a problem that doesn’t exist; and in fact becomes the problem.

My thoughts:

It seems to me that “Signaling Semaphores” are only used within the NuttX kernel (is this correct?).  For intertask communication POSIX pthread mechanisms could and should be used.

The solution Brennan suggests is to initialize semaphores used as signaling events as follows:
sem_init(&sem, 0, 0);
sem_setprotocol(&sem, SEM_PRIO_NONE);

this is, of course, correct, but retains the dual purpose of sem_wait() and sem_post().  I believe this can be confusing and will continue to be a source of subtle errors.  I suggest going a step further and isolate the two use cases.  Let the current sem_init, sem_wait, sem_post be used only for the resource locking use case.

For the signaling use case, create a new API for event signaling within the kernel: nxev_init, nxev_wait, nxev_post where: nxev_init is simply:
sem_init(&nxev, 0, 0);
sem_setprotocol(&nxev, SEM_PRIO_NONE);

and:
	 #define nxev_wait	sem_wait
	 #define nxev_post	sem_post

In the case were PRIORITY_INHERITANCE is not configured, sem_setprotocol() does nothing and the nxev_*() API is still used for event notification.

This may seem a trivial change, but having specific API function names for the two specific use cases would, I believe, all but eliminate future confusion; especially given that most people look to existing drivers to use as a template.  Finally, enabling or disabling PRIORITY_INHERITANCE would not introduce the subtle error Brennan documented.

Cheers,
-david

> On Mar 30, 2023, at 6:49 AM, Gregory Nutt <sp...@gmail.com> wrote:
> 
> 
>>> 3. Move priority inheritance from nxsem to nxmutex, and optimize the performance of priority inheritance;
>> 
>> That will break every a lot of people's code.  There is probably a lot of application logic that depends on priority inheritance working on counting semaphores.  Removing that will problems for a lot of people.
>> 
>> This is, however, consistent with how Linux works. 
> 
> AFAIK there is nothing that prohibits semaphores from supporting priority inheritance -- other than the fact that there is no "owner" of a semaphore as there is for a mutex.  Priority inheritance is very important for the correct behavior of some real time systems so we were able to use priority inheritance with counting semaphores by adding some non-standard interfaces to control whether a semaphore supports priority inheritance or not by its usage: https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance
> 
> I have mixed feelings myself and hope that we get some consensus through dialog.  One one hand, it is important to stay faithful to documented standard and undocumented conventions for the use the a POSIX/Unix systems.  But on the other hand, unlike other OSs that strive toward standard conformance, we are an RTOS and must satisfy certain requirements for deterministic, real time behavior.
> 
> What do you all think?
>

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

>> 3. Move priority inheritance from nxsem to nxmutex, and optimize the 
>> performance of priority inheritance;
>
> That will break every a lot of people's code.  There is probably a lot 
> of application logic that depends on priority inheritance working on 
> counting semaphores.  Removing that will problems for a lot of people.
>
> This is, however, consistent with how Linux works. 

AFAIK there is nothing that prohibits semaphores from supporting 
priority inheritance -- other than the fact that there is no "owner" of 
a semaphore as there is for a mutex.  Priority inheritance is very 
important for the correct behavior of some real time systems so we were 
able to use priority inheritance with counting semaphores by adding some 
non-standard interfaces to control whether a semaphore supports priority 
inheritance or not by its usage: 
https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance

I have mixed feelings myself and hope that we get some consensus through 
dialog.  One one hand, it is important to stay faithful to documented 
standard and undocumented conventions for the use the a POSIX/Unix 
systems.  But on the other hand, unlike other OSs that strive toward 
standard conformance, we are an RTOS and must satisfy certain 
requirements for deterministic, real time behavior.

What do you all think?

Re: [Breaking change] Move nxmutex to sched

Posted by Gregory Nutt <sp...@gmail.com>.

On 3/30/2023 1:57 AM, 奇诺 wrote:
> Hi, all:
> I submitted a PR regarding mutex, the changes are as follows: 1. Move some mutex operations to sched;2. Add mutex_clocklock, mutex_set_protocol, and mutex_get_protocol interfaces;3. Remove cancellation point operations in mutex because it is not a cancellation point.
As I recall in Linux, mutex logic is mostly implemented user space using 
atomic test-and-set logic to take the mutex.  The OS is only involved if 
there is a failure to take the mutex.  That greatly reduces the number 
of system calls which are very expensive in KERNEL and PROTECTED modes.
> In addition, there will be a series of modifications based on this PR in the future.
> 1. pthread mutex is changed to depend on nxmutex instead of nxsem;
> 2. nxmutex no longer depends on nxsem and is implemented independently;
The mutex was originally based on counting semaphores to keep the size 
down, rather than duplicating the logic.
> 3. Move priority inheritance from nxsem to nxmutex, and optimize the performance of priority inheritance;

That will break every a lot of people's code.  There is probably a lot 
of application logic that depends on priority inheritance working on 
counting semaphores.  Removing that will problems for a lot of people.

This is, however, consistent with how Linux works.

> If there are any issues with this modification, please inform me, thanks!
>
>
> PR: https://github.com/apache/nuttx/pull/8645
>
>
> BR