You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Benjamin Hindman <be...@berkeley.edu> on 2017/07/22 01:16:57 UTC

Review Request 61058: Added a lock-free event queue.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61058/
-----------------------------------------------------------

Review request for mesos and Benjamin Mahler.


Repository: mesos


Description
-------

Added a lock-free event queue.


Diffs
-----

  3rdparty/libprocess/configure.ac cb2cf4f32be5cbdf9df1e32f9aaf2bbba0a5ae03 
  3rdparty/libprocess/src/event_queue.hpp PRE-CREATION 


Diff: https://reviews.apache.org/r/61058/diff/1/


Testing
-------

make check


Thanks,

Benjamin Hindman


Re: Review Request 61058: Added a lock-free event queue.

Posted by Benjamin Hindman <be...@berkeley.edu>.

> On July 26, 2017, 2:08 a.m., Benjamin Mahler wrote:
> > 3rdparty/libprocess/src/event_queue.hpp
> > Lines 127-136 (patched)
> > <https://reviews.apache.org/r/61058/diff/2/?file=1781185#file1781185line127>
> >
> >     Hm.. if a thread is checking `empty()` after the sequence is incremented and before its decremented back in the decomissioned case, it will think incorrectly think it's not empty?
> >     
> >     Shouldn't `empty()` check `decomissioned` first, since once set the queue is empty or being actively emptied by `decomission()`?

This falls into the semantics of the queue being single consumer. The consumer SHOULD NOT be checking if the queue is empty after it has called `decomission()`, hence there shouldn't be a race here. I added quite a few comments to cover this.


> On July 26, 2017, 2:08 a.m., Benjamin Mahler wrote:
> > 3rdparty/libprocess/src/event_queue.hpp
> > Lines 138-145 (patched)
> > <https://reviews.apache.org/r/61058/diff/2/?file=1781185#file1781185line138>
> >
> >     Interesting that you took a different strategy here compared to the other queue, might be worth a comment saying we have to spin due to the nature of `try_dequeue`, and that's ok since the caller *must* check empty before calling this?

Implemented the TODO I had in the locking implementation and also added more comments to the top of the `EventQueue` class capturing this semantic.


> On July 26, 2017, 2:08 a.m., Benjamin Mahler wrote:
> > 3rdparty/libprocess/src/event_queue.hpp
> > Lines 147-150 (patched)
> > <https://reviews.apache.org/r/61058/diff/2/?file=1781185#file1781185line147>
> >
> >     There seems to be a requirement that only the single consumer is the one that calls decomission and that once the consumer calls decomission, no more empty->dequeue looping will occur. Otherwise, the consumer can loop:
> >     
> >     (1) decomission, but not drained yet
> >     (2) consumer checks empty, returns true since not yet drained by decomission
> >     (3) consumer then calls dequeue
> >     (4) decomission completes the drain
> >     (5) consumer loops in dequeue
> >     
> >     Or:
> >     
> >     (1) decomission, drained
> >     (2) producer calls enqueue, sequence is incremented
> >     (3) consumer checks empty, returns true since sequence was incremented
> >     (5 optional) producer then finishes call to enqueue by decrementing sequence
> >     (4) consumer calls dequeue, loops

You got it!


> On July 26, 2017, 2:08 a.m., Benjamin Mahler wrote:
> > 3rdparty/libprocess/src/event_queue.hpp
> > Lines 186-187 (patched)
> > <https://reviews.apache.org/r/61058/diff/2/?file=1781185#file1781185line186>
> >
> >     Why 256 instead of unbounded dequeue?

I don't know how to do that, I have to pass a value into `try_dequeue_bulk`.


> On July 26, 2017, 2:08 a.m., Benjamin Mahler wrote:
> > 3rdparty/libprocess/src/event_queue.hpp
> > Lines 209-211 (patched)
> > <https://reviews.apache.org/r/61058/diff/2/?file=1781185#file1781185line209>
> >
> >     Shouldn't an item smaller than the next entry we expect be something we CHECK against rather than silently drop?

I added a handful of comments to explain what's going on here!


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61058/#review181297
-----------------------------------------------------------


On July 22, 2017, 1:16 a.m., Benjamin Hindman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61058/
> -----------------------------------------------------------
> 
> (Updated July 22, 2017, 1:16 a.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-7798
>     https://issues.apache.org/jira/browse/MESOS-7798
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added a lock-free event queue.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/configure.ac cb2cf4f32be5cbdf9df1e32f9aaf2bbba0a5ae03 
>   3rdparty/libprocess/src/event_queue.hpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/61058/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>


Re: Review Request 61058: Added a lock-free event queue.

Posted by Benjamin Mahler <bm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61058/#review181297
-----------------------------------------------------------




3rdparty/libprocess/configure.ac
Lines 72-75 (original), 72-80 (patched)
<https://reviews.apache.org/r/61058/#comment256823>

    Some TODOs to make the lock free queues the default configuration would be great.
    
    Also, be sure to add this to `docs/configuration.md`.



3rdparty/libprocess/src/event_queue.hpp
Lines 127-136 (patched)
<https://reviews.apache.org/r/61058/#comment256980>

    Hm.. if a thread is checking `empty()` after the sequence is incremented and before its decremented back in the decomissioned case, it will think incorrectly think it's not empty?
    
    Shouldn't `empty()` check `decomissioned` first, since once set the queue is empty or being actively emptied by `decomission()`?



3rdparty/libprocess/src/event_queue.hpp
Lines 138-145 (patched)
<https://reviews.apache.org/r/61058/#comment256982>

    Interesting that you took a different strategy here compared to the other queue, might be worth a comment saying we have to spin due to the nature of `try_dequeue`, and that's ok since the caller *must* check empty before calling this?



3rdparty/libprocess/src/event_queue.hpp
Lines 147-150 (patched)
<https://reviews.apache.org/r/61058/#comment256986>

    There seems to be a requirement that only the single consumer is the one that calls decomission and that once the consumer calls decomission, no more empty->dequeue looping will occur. Otherwise, the consumer can loop:
    
    (1) decomission, but not drained yet
    (2) consumer checks empty, returns true since not yet drained by decomission
    (3) consumer then calls dequeue
    (4) decomission completes the drain
    (5) consumer loops in dequeue
    
    Or:
    
    (1) decomission, drained
    (2) producer calls enqueue, sequence is incremented
    (3) consumer checks empty, returns true since sequence was incremented
    (5 optional) producer then finishes call to enqueue by decrementing sequence
    (4) consumer calls dequeue, loops



3rdparty/libprocess/src/event_queue.hpp
Lines 149 (patched)
<https://reviews.apache.org/r/61058/#comment256985>

    The minus 1 here and plus 1s below seem odd, can we remove the off by one logic if sequence starts at 0 instead of 1? It looks like it to me.



3rdparty/libprocess/src/event_queue.hpp
Lines 170-182 (patched)
<https://reviews.apache.org/r/61058/#comment256987>

    Shouldn't this trigger a drain into items before we count? As is done with the array operator below.
    
    Also, is the caller of count necessarily the consumer? It seems also that count is only safe to call by a single consumer, which I guess requires that count be called within the context of its owning Process. I seem to have dejavu here, did you already document that requirement somewhere in an earlier change?



3rdparty/libprocess/src/event_queue.hpp
Lines 184-198 (patched)
<https://reviews.apache.org/r/61058/#comment256990>

    Array seems to require that it be called by the single consumer? Otherwise there is concurrent access to items.
    
    It would be great to clarify this, or maybe offer EventQueue::Consumer and EventQueue::Producer to make the consumer interface part of the type system.



3rdparty/libprocess/src/event_queue.hpp
Lines 186-187 (patched)
<https://reviews.apache.org/r/61058/#comment256988>

    Why 256 instead of unbounded dequeue?



3rdparty/libprocess/src/event_queue.hpp
Lines 193 (patched)
<https://reviews.apache.org/r/61058/#comment256989>

    std::move?



3rdparty/libprocess/src/event_queue.hpp
Lines 209-211 (patched)
<https://reviews.apache.org/r/61058/#comment256991>

    Shouldn't an item smaller than the next entry we expect be something we CHECK against rather than silently drop?



3rdparty/libprocess/src/event_queue.hpp
Lines 209-229 (patched)
<https://reviews.apache.org/r/61058/#comment256994>

    It took me a long time to understand this code, I suspect it will be easier to read if we were to store 'next' instead of 'last', since the intent here is to find the next entry and the off by ones go away AFAICT.



3rdparty/libprocess/src/event_queue.hpp
Lines 220-229 (patched)
<https://reviews.apache.org/r/61058/#comment256995>

    I assume the linear search here is fine because the item is likely near the front? Might want to clarify that for the reader?



3rdparty/libprocess/src/event_queue.hpp
Lines 239 (patched)
<https://reviews.apache.org/r/61058/#comment256976>

    



3rdparty/libprocess/src/event_queue.hpp
Lines 239-240 (patched)
<https://reviews.apache.org/r/61058/#comment256978>

    Hm.. what do these represent and why do we need them? (I'm about to find out from reading the code but would be nice to explain with a comment here).



3rdparty/libprocess/src/event_queue.hpp
Lines 240 (patched)
<https://reviews.apache.org/r/61058/#comment256981>

    last is not atomic because there is a single consumer and it's only modified by the consumer? This seems to imply that only the single consumer is allowed to call `empty()`? Seems worth clarifying, since I would expect others to be able to call empty



3rdparty/libprocess/src/event_queue.hpp
Lines 241 (patched)
<https://reviews.apache.org/r/61058/#comment256979>

    Hm.. seeing a deque was pretty surprising, can you explain here what the implementation approach is in general with sequence, last, and items (in addition to just the concurrent queue)?


- Benjamin Mahler


On July 22, 2017, 1:16 a.m., Benjamin Hindman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61058/
> -----------------------------------------------------------
> 
> (Updated July 22, 2017, 1:16 a.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-7798
>     https://issues.apache.org/jira/browse/MESOS-7798
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added a lock-free event queue.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/configure.ac cb2cf4f32be5cbdf9df1e32f9aaf2bbba0a5ae03 
>   3rdparty/libprocess/src/event_queue.hpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/61058/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>


Re: Review Request 61058: Added a lock-free event queue.

Posted by Benjamin Mahler <bm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61058/#review181758
-----------------------------------------------------------


Fix it, then Ship it!





3rdparty/libprocess/configure.ac
Lines 72-75 (patched)
<https://reviews.apache.org/r/61058/#comment257430>

    A TODO related to the long term plan for these would be great (i.e. make default and eventually have as the only option?)
    
    Also, looks like you still need to update `docs/configuration.md` per my existing comment?



3rdparty/libprocess/src/event_queue.hpp
Lines 39-44 (patched)
<https://reviews.apache.org/r/61058/#comment257437>

    Thanks!



3rdparty/libprocess/src/event_queue.hpp
Lines 74 (patched)
<https://reviews.apache.org/r/61058/#comment257439>

    I like that there is a producer and consumer directly accessible, should we hide the Producer/Consumer constructors?



3rdparty/libprocess/src/event_queue.hpp
Lines 100 (patched)
<https://reviews.apache.org/r/61058/#comment257436>

    Might be a bit easier to read if you define an EventQueue in each ifdef case rather than split the body of EventQueue across the ifdef?



3rdparty/libprocess/src/event_queue.hpp
Lines 225 (patched)
<https://reviews.apache.org/r/61058/#comment257435>

    s/is/are/



3rdparty/libprocess/src/event_queue.hpp
Lines 256-257 (patched)
<https://reviews.apache.org/r/61058/#comment257434>

    Ditto my comment here and in count() about the number, and should we be consistent about the bulk dequeue size?



3rdparty/libprocess/src/event_queue.hpp
Lines 322 (patched)
<https://reviews.apache.org/r/61058/#comment257433>

    Just curious why we wouldn't just pass the largest number possible here, e.g. SIZE_MAX. Is there a tradeoff here where you're trying to bound the amount of work the consumer has to do upon dequeue? Would be good to clarify in a comment if you have some thoughts on it.



3rdparty/libprocess/src/event_queue.hpp
Lines 333 (patched)
<https://reviews.apache.org/r/61058/#comment257432>

    If you want to be a little more succinct, ~585 years is enough, don't think people need to see the exact number of nanoseconds :)



3rdparty/libprocess/src/event_queue.hpp
Lines 337-338 (patched)
<https://reviews.apache.org/r/61058/#comment257431>

    I read this as "reading or writing" to the queue, did you mean reading or writing to this variable or just that there's only a single consumer?


- Benjamin Mahler


On July 29, 2017, 8:52 p.m., Benjamin Hindman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61058/
> -----------------------------------------------------------
> 
> (Updated July 29, 2017, 8:52 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-7798
>     https://issues.apache.org/jira/browse/MESOS-7798
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added a lock-free event queue.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/configure.ac cb2cf4f32be5cbdf9df1e32f9aaf2bbba0a5ae03 
>   3rdparty/libprocess/src/event_queue.hpp PRE-CREATION 
>   3rdparty/libprocess/src/process.cpp b268cdad776a3ca2a87cbe60eb098bde2a70667c 
> 
> 
> Diff: https://reviews.apache.org/r/61058/diff/3/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>


Re: Review Request 61058: Added a lock-free event queue.

Posted by Dario Rexin <da...@mesosphere.io>.

> On July 31, 2017, 9:05 p.m., Dario Rexin wrote:
> > 3rdparty/libprocess/src/event_queue.hpp
> > Lines 328 (patched)
> > <https://reviews.apache.org/r/61058/diff/3/?file=1785564#file1785564line331>
> >
> >     I don't think this queue matches the requirements for an actor system. This queue uses sub-queues as a performance optimization and the ordering between the different queues is not guaranteed. The assumption here is, that causal ordering only needs to be maintained per thread. In an actor system however, there is no guarantee on which thread an actor will run. Think of following scenario:
> >     
> >     Actor A receives a burst of messages and takes some time to process them.
> >     
> >     Actor B receives a message gets dispatched to Thread 1 and sends a message to Actor A. The queue of Actor B is now empty, so it donates the thread back to the pool. 
> >     
> >     Actor A is still busy processing the messages it received earlier and did not process the message that Actor B sent.
> >     
> >     Actor B receives another message and gets dispatched to Thread 2. It now sends another message to Actor A. 
> >     
> >     Now Actor A received 2 messages from Actor B from 2 different threads. The moodycamel queue does not guarantee the order of those messages. This violates the arrival ordering defined in the actor model and makes it impossible to reason about message ordering between two actors.
> 
> Benjamin Hindman wrote:
>     For posterity, to account for this "deficiency" in the concurrent queue we've forced the _single_ consumer semantics on the queue and let the consumer reorder any events when the read them out. See the lock-free implementation of `EventQueue::try_dequeue` for more details.
> 
> Dario Rexin wrote:
>     Clarified with benh and benm. There's additional code that restores ordering.

-.- I forgot to click publish


- Dario


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61058/#review181840
-----------------------------------------------------------


On July 29, 2017, 8:52 p.m., Benjamin Hindman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61058/
> -----------------------------------------------------------
> 
> (Updated July 29, 2017, 8:52 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-7798
>     https://issues.apache.org/jira/browse/MESOS-7798
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added a lock-free event queue.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/configure.ac cb2cf4f32be5cbdf9df1e32f9aaf2bbba0a5ae03 
>   3rdparty/libprocess/src/event_queue.hpp PRE-CREATION 
>   3rdparty/libprocess/src/process.cpp b268cdad776a3ca2a87cbe60eb098bde2a70667c 
> 
> 
> Diff: https://reviews.apache.org/r/61058/diff/3/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>


Re: Review Request 61058: Added a lock-free event queue.

Posted by Dario Rexin <da...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61058/#review181840
-----------------------------------------------------------




3rdparty/libprocess/src/event_queue.hpp
Lines 328 (patched)
<https://reviews.apache.org/r/61058/#comment257610>

    I don't think this queue matches the requirements for an actor system. This queue uses sub-queues as a performance optimization and the ordering between the different queues is not guaranteed. The assumption here is, that causal ordering only needs to be maintained per thread. In an actor system however, there is no guarantee on which thread an actor will run. Think of following scenario:
    
    Actor A receives a burst of messages and takes some time to process them.
    
    Actor B receives a message gets dispatched to Thread 1 and sends a message to Actor A. The queue of Actor B is now empty, so it donates the thread back to the pool. 
    
    Actor A is still busy processing the messages it received earlier and did not process the message that Actor B sent.
    
    Actor B receives another message and gets dispatched to Thread 2. It now sends another message to Actor A. 
    
    Now Actor A received 2 messages from Actor B from 2 different threads. The moodycamel queue does not guarantee the order of those messages. This violates the arrival ordering defined in the actor model and makes it impossible to reason about message ordering between two actors.


- Dario Rexin


On July 29, 2017, 8:52 p.m., Benjamin Hindman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61058/
> -----------------------------------------------------------
> 
> (Updated July 29, 2017, 8:52 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-7798
>     https://issues.apache.org/jira/browse/MESOS-7798
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added a lock-free event queue.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/configure.ac cb2cf4f32be5cbdf9df1e32f9aaf2bbba0a5ae03 
>   3rdparty/libprocess/src/event_queue.hpp PRE-CREATION 
>   3rdparty/libprocess/src/process.cpp b268cdad776a3ca2a87cbe60eb098bde2a70667c 
> 
> 
> Diff: https://reviews.apache.org/r/61058/diff/3/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>


Re: Review Request 61058: Added a lock-free event queue.

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61058/
-----------------------------------------------------------

(Updated July 29, 2017, 8:52 p.m.)


Review request for mesos and Benjamin Mahler.


Bugs: MESOS-7798
    https://issues.apache.org/jira/browse/MESOS-7798


Repository: mesos


Description
-------

Added a lock-free event queue.


Diffs (updated)
-----

  3rdparty/libprocess/configure.ac cb2cf4f32be5cbdf9df1e32f9aaf2bbba0a5ae03 
  3rdparty/libprocess/src/event_queue.hpp PRE-CREATION 
  3rdparty/libprocess/src/process.cpp b268cdad776a3ca2a87cbe60eb098bde2a70667c 


Diff: https://reviews.apache.org/r/61058/diff/3/

Changes: https://reviews.apache.org/r/61058/diff/2-3/


Testing
-------

make check


Thanks,

Benjamin Hindman