You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@nuttx.apache.org by "pussuw (via GitHub)" <gi...@apache.org> on 2023/03/28 10:54:27 UTC

[GitHub] [nuttx] pussuw opened a new issue, #8917: nxsem_wait_irq with MMU

pussuw opened a new issue, #8917:
URL: https://github.com/apache/nuttx/issues/8917

   I have encountered a little issue with semaphores, and nxsem_wait_irq which is used to wake the process waiting for a sempahore and do other cleanup i.e. handle priority inheritance etc. 
   
   The issue arises when nxsem_wait_irq is called from ISR or from the signal dispatch logic; while it works like a charm with flat addressing, it fails when virtual memory is in use.
   
   The issue obviously is that sem_t is userspace data, and if the current task is not the one holding / waiting for the semaphore, the clean up fails due to wrong mappings.
   
   Now, the solution might be as simple as changing the mappings temporarily to the semaphore holder's mappings, but this to me seems like quite a heavy operation. At least the TLB is lost, maybe with other side effects as well.
   
   So I created this issue to open discussion about the subject. Is this a more generic problem (affecting more than just the semaphore API), needing a generic solution ? Has anyone ever thought of this issue and how to fix it ? Maybe coming up with a solution but no time to implement it ? If so I'll be more than happy to do the fix and test it.
   
   Regardless, I will present a patch (and reference this issue there) where the mappings are just swapped because that works and is a way to fix the issue. The issue is easier to grasp this way I think.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nuttx] pussuw commented on issue #8917: nxsem_wait_irq with MMU

Posted by "pussuw (via GitHub)" <gi...@apache.org>.
pussuw commented on issue #8917:
URL: https://github.com/apache/nuttx/issues/8917#issuecomment-1513086554

   I'll move the discussion back here to avoid spamming the mailing list. 
   
   My understanding of the semaphore handling logic increases bit by bit and thus far I have found the following:
   - `tcb->waitobj` is user by the system scheduler to implement the `TSTATE_WAIT_SEM`-list, it accesses `dq_queue_t waitlist;`
   - This basically means that whenever a semaphore is set into `tcb->waitobj` -> **the memory must be accessible to the kernel, even from other processes** and this access has to be deterministic (preferably also fast).
   - Adding temporary mappings to (or addrenv switch) into `nxsem_wait_irq` does fix the watchdog interrupt and signal dispatch cases, but not the case mentioned above -> **doing any temp mappings in `nxsem_wait_irq` is simply futile**.
   - `nxsem_canceled` is called from `nxsem_wait_irq`, this accesses the priority inheritance list `tcb->holdsem`, which also holds a reference to the semaphore via `holdsem->sem`, and this points to user memory also.
   - This means that not only does the semaphore access not work, priority inheritance is also broken with CONFIG_BUILD_KERNEL. -> **the pointer in `tcb->holdsem->sem` must also be accessible to the kernel in a deterministic (preferably also fast) manner**.
   
   Implementing dynamic mappings for the semaphore works wonderfully when the semaphore structure fits inside a single page. When it crosses a page boundary the problems start. In this case we lose all measures of deterministic behavior, because we need to:
   - Do two table walks to find the physical pages -> this is O(1)
   - Find a virtual memory area where the mapping fits -> this is O(n) orO(log(n)) at best
   - Map the pages into kernel addressable memory -> this is O(1)
   - Mark the mappings into a list, where they can be found later -> this is O(1)
   
   When the semaphore is taken, the process marks itself as the holder. 
   - The mapping needs to exist until the the semaphore is released, because only at this point the holder list can be updated. This means the mapping can live for a LONG time.
   - When the semaphore is released, the virtual memory must be unmapped.
   - This means the mapping list must be traversed to find whether a mapping really exists -> this is O(n)
   - Why do we need a list? Because semaphores are used by the kernel itself too, these don't need to be dynamically mapped.
   
   So to me it seems like mapping the semaphore to kernel memory won't fix anything. Unless there is some obvious solution I'm missing.
   
   Please correct me if I'm typing nonsense, as I'm still trying to grasp the big picture.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nuttx] pussuw commented on issue #8917: nxsem_wait_irq with MMU

Posted by "pussuw (via GitHub)" <gi...@apache.org>.
pussuw commented on issue #8917:
URL: https://github.com/apache/nuttx/issues/8917#issuecomment-1505177124

   I finally have time to dig into this issue. 
   
   I will fix the waitobj issue by mapping any such object into kernel addressable memory. I think semaphores and mqueues qualify as such (more?).
   
    This achieves two things:
   - It once and for all fixes accessing (user) synchronization structures from the kernel
   - It avoids the unnecessary TLB flush / performance penalty when swapping mappings
   
   Patches related to this will start coming soonish.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nuttx] pussuw commented on issue #8917: nxsem_wait_irq with MMU

Posted by "pussuw (via GitHub)" <gi...@apache.org>.
pussuw commented on issue #8917:
URL: https://github.com/apache/nuttx/issues/8917#issuecomment-1490218854

   Unfortunately I found another location where tcb->waitobj is accessed from the wrong context (happens when a user process is running and `ps` is executed from the terminal): 
   https://github.com/apache/nuttx/blob/f4683713327aaeab0656e3ea83814b44267bdc4a/sched/sched/sched_get_stateinfo.c#L81-L103
   
   This could be kludged together by creating a getter function for the sem_t object, which would make a copy of it into kernel addressable memory (e.g. a stack variable inside that function). Read only access could then be performed on that. Sound like a dirty workaround to me...
   
   I did one kludge fix, I'm wondering if this is going to be a recursive set of kludges.. Resolving the kernel addressable virtual address for waitobj might be a safer option still. I'll need to think for a moment how this can be achieved. 
   
   Walking the page directory in one option, and this support needs to be implemented for each platform that supports MMU.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nuttx] patacongo commented on issue #8917: nxsem_wait_irq with MMU

Posted by "patacongo (via GitHub)" <gi...@apache.org>.
patacongo commented on issue #8917:
URL: https://github.com/apache/nuttx/issues/8917#issuecomment-1488747711

   Another option would be logic that finds the kernel space virtual address associated with the user space virtual address.  That would assume that the entire address space is addressable in kernel mode.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nuttx] pussuw commented on issue #8917: nxsem_wait_irq with MMU

Posted by "pussuw (via GitHub)" <gi...@apache.org>.
pussuw commented on issue #8917:
URL: https://github.com/apache/nuttx/issues/8917#issuecomment-1486666385

   Another option that comes to mind would be to temporarily map the user semaphore data to a kernel scratch area but this is quite a lot of work, as the physical page is needed and this is not easily obtainable. A significant amount of new mm related bookkeeping would be needed for this solution.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nuttx] pussuw commented on issue #8917: nxsem_wait_irq with MMU

Posted by "pussuw (via GitHub)" <gi...@apache.org>.
pussuw commented on issue #8917:
URL: https://github.com/apache/nuttx/issues/8917#issuecomment-1488805778

   Thanks for the response!
   
   Those APIs are known to me but as far as I can tell they either rely on the correct mappings already being active or on mm_struct and vm_area_struct (which we don't quite have, yet), which contain the necessary info to find the physical page.
   
   The solution I posted is already merged and does of course work, there is just a tiny performance penalty via losing the entire TLB instead of just losing 1 page. 
   
   I was trying to think of alternate solutions to the problem, like releasing the semaphore from the user process instead (in sem_wait() after the context switch when the correct mappings are present) but the problem is that updating the task status LUT becomes impossible, as `dq_rem((FAR dq_entry_t *)wtcb, SEM_WAITLIST(sem));` needs access to sem->waitlist to move the task into the ready-to-run list.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nuttx] pussuw commented on issue #8917: nxsem_wait_irq with MMU

Posted by "pussuw (via GitHub)" <gi...@apache.org>.
pussuw commented on issue #8917:
URL: https://github.com/apache/nuttx/issues/8917#issuecomment-1488812379

   The inner workings of those copy_from/to_user are basically handling the access rights from/to user space. Normally the kernel cannot access the user's memory and those functions are used to wrap the operation:
   
   - open access to user
   - read/write to user
   - close access to user
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org