Posted to dev@daffodil.apache.org by Olabusayo Kilo <ok...@tresys.com> on 2020/08/06 07:32:45 UTC

Re: Coroutines - was Re: Daffodil SAX API Proposal

Hey Mike, can you send a copy for the Thread-based coroutines library 
referenced below?

On 4/24/20 9:28 AM, Beckerle, Mike wrote:
> A further thought on this. The overhead difference between continuations and threads was 1 to 4 (roughly).
> 
> If you add real workload to what happens on either side of that producer-consumer relationship, I bet this difference disappears into the noise, not because it becomes more efficient due to less contention, but because it's such a tiny fraction of the actual work being done.
> 
> I have a copy of the Thread-based coroutines library in a separate sandbox, so if you want to grab it I'll get it over to you so you don't have to dig for it.
> ________________________________
> From: Beckerle, Mike <mb...@tresys.com>
> Sent: Friday, April 24, 2020 8:53 AM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: Coroutines - was Re: Daffodil SAX API Proposal
> 
> That's really informative and confirms the intuition that using threads really hurts performance when all you need is a stack switch.
> 
> In this case reducing contention should reduce total work, but that depends on how carefully the queue is implemented. If it is a single lock it may not matter.
> 
> We actually don't care about speedups through parallelism, because we should assume the machine is already saturated with work. We want to reduce the total amount of work done.
> 
> 
> 
> 
> ________________________________
> From: Steve Lawrence <sl...@apache.org>
> Sent: Friday, April 24, 2020 8:02:37 AM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: Coroutines - was Re: Daffodil SAX API Proposal
> 
> I decided to look at performance of three potential options to see if
> that would rule anything out. I looked at 1) coroutines 2) continuations
> 3) threads with BlockingQueue. For each of these, I modified the gist to
> remove printlns and use a different producer-consumer model (which
> should make it straightforward to test other alternatives if we come
> across any). So everything is the same except for how the SAX content handler
> interacts with the custom InfosetInputter. For the performance numbers
> below, I created enough "events" in a loop so that the rate of events
> remained roughly the same as I increased the number of events.
> 
> 1) coroutines
> 
> It turns out the coroutines library has a limitation where the
> yieldval() call must be directly inside the coroutine{} block. This is
> basically a non-starter for us, since the entire unparse call needs to
> be a coroutine, and the yieldval call happens way down the stack. So not
> only does this not have any active development, it functionally won't
> even work for us.
> 
> 2) continuations
> 
> 16.50 million events per second
> 
> 3) thread with BlockingQueue
> 
> I think this is similar to the Coroutine library you wrote for Daffodil
> (though it looks like it's been removed, we can probably find it in
> the git history if we want). This runs the unparse method in a thread and
> has a blocking queue that the producer pushes to and the consumer takes
> from. I tested with different queue sizes to see how that affects
> performance:
> 
>    size  rate
>       1  0.14 million events per second
>      10  1.36 million events per second
>     100  3.18 million events per second
>    1000  3.16 million events per second
> 100000  3.09 million events per second
> 
> So this BlockingQueue approach is quite a bit slower, and definitely
> requires batching events to be somewhat performant. I guess this
> slowness makes sense, as this approach creates a thread for the unparse,
> has different threads blocking on this queue, and also creates a bunch
> of event objects to put in the queue (the continuation approach just
> mutates state so no extra objects are needed). It's possible this
> isn't an accurate test, since the producer is extremely fast: it just
> increments a Long in each loop iteration. In the real world, the
> producer is going to be parsing XML or something, so it won't be as fast.
> Perhaps if the producer were slower there would be less thread
> contention, allowing for more parallel work?
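For reference, the thread-plus-BlockingQueue shape being benchmarked can be sketched roughly like this. Names, the queue size, and the event count are illustrative stand-ins, not the actual gist code:

```scala
import java.util.concurrent.ArrayBlockingQueue

object QueueBenchSketch {
  final case class Event(value: Long)
  // Sentinel instance marking the end of the event stream
  val EndOfStream = Event(-1L)

  def main(args: Array[String]): Unit = {
    val numEvents = 1000000L
    val queue = new ArrayBlockingQueue[Event](100)

    // Producer thread: stands in for the SAX ContentHandler pushing events
    val producer = new Thread(() => {
      var i = 0L
      while (i < numEvents) {
        queue.put(Event(i)) // blocks when the queue is full
        i += 1
      }
      queue.put(EndOfStream)
    })
    producer.start()

    // Consumer: stands in for unparse() pulling events
    var count = 0L
    var ev = queue.take()
    while (ev ne EndOfStream) { // reference compare against the sentinel
      count += 1
      ev = queue.take() // blocks when the queue is empty
    }
    producer.join()
    println(count)
  }
}
```

Every put/take pair here is a potential thread hand-off, which is where the overhead in the numbers above comes from.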
> 
> 
> On 4/23/20 5:41 PM, Beckerle, Mike wrote:
>> I am pretty worried about the @suspendable annotation. The way this shift/reset stuff works is it modifies the scala compiler to do something called continuation passing style. aka CPS.
>>
>> I'd be ok if that was isolated to just a segment of the code. Maybe there is some natural way to do that?
>>
>> But it seems to me that all code on the pathway from where a reset block is entered to where a shift is called has to propagate this @suspendable behavior and be compiled by way of this CPS plugin. That looks ok for tiny toy examples, but for a giant code base like the Daffodil runtime1 unparser that seems fragile; it potentially impacts debugging, memory allocation, and performance, and, given the lack of enthusiastic support for shift/reset, I think it is risky.
>>
>> The only other option I can think of is to spawn a separate thread, allowing true concurrency in a producer-consumer model.
>>
>> We already have a Coroutines library, you may recall. We're not using it in the code base now, and it's fairly high-overhead as it uses a depth-1 queue, so it is constantly switching threads. It might have better performance characteristics if the switching were reduced to once every 100 events or so. Streaming behavior does not have to convert events to pulls at a granularity of 1 event per pull; it can be much coarser than that, to push overhead down.
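The batching idea can be sketched by queueing batches of events instead of single events, so a thread hand-off happens only once per batch. This is an illustrative stand-in, not the actual Coroutines library; the batch size and event count are arbitrary:

```scala
import java.util.concurrent.ArrayBlockingQueue
import scala.collection.mutable.ArrayBuffer

object BatchedQueueSketch {
  def main(args: Array[String]): Unit = {
    val batchSize = 100
    val numEvents = 10000
    val queue = new ArrayBlockingQueue[Vector[Int]](10)

    // Producer buffers events locally and hands off one batch at a time,
    // so put() runs once per 100 events instead of once per event.
    val producer = new Thread(() => {
      val buf = new ArrayBuffer[Int](batchSize)
      for (i <- 0 until numEvents) {
        buf += i
        if (buf.size == batchSize) {
          queue.put(buf.toVector)
          buf.clear()
        }
      }
      if (buf.nonEmpty) queue.put(buf.toVector)
      queue.put(Vector.empty) // empty batch signals end of stream
    })
    producer.start()

    // Consumer drains a whole batch per take()
    var total = 0L
    var batch = queue.take()
    while (batch.nonEmpty) {
      batch.foreach(total += _)
      batch = queue.take()
    }
    producer.join()
    println(total) // sum of 0..9999
  }
}
```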
>>
>> The limiting thing here really seems to be the JVM. Java virtual machines simply don't support the concept of co-routines in any sensible manner.
>>
>> There are also some coroutine-style libraries for Java that depend on byte-code modification. I suspect those have an issue similar to the CPS transformation, i.e., all the code on the way to a suspension requires the byte-code modification, but I may be wrong.
>>
>> ________________________________
>> From: Steve Lawrence <sl...@apache.org>
>> Sent: Thursday, April 23, 2020 11:21 AM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: Re: Coroutines - was Re: Daffodil SAX API Proposal
>>
>> Thanks Mike! Continuations seems like a better alternative, at least
>> from a support point of view. Though, it's a little concerning that no
>> one is really stepping up to port it to 2.13, but I don't think we're in
>> any rush to get to 2.13. And I personally find the reset/shift concept a
>> bit harder to wrap my head around than the co-routine resume/yield, but
>> ultimately it's not too bad.
>>
>> To see how it would work with our DataProcessor/InfosetInputter, I
>> forked and updated your gist to include things like InfosetInputters,
>> DataProcessor, ContentHandler, etc. and added a bunch of println's and
>> comments to make sure things were behaving the way I thought they should.
>>
>> https://gist.github.com/stevedlawrence/5e16081f4690448de6131af02daacea9
>>
>> I think it came out pretty straightforward. I also modified this so that
>> there isn't as much back and forth between hasNext/next like I have in
>> the current proposal. The only time we go back to the
>> ContentHandler/producer is when next() is called, and we only go back to
>> the InfosetInputter/consumer when a complete event is found, including
>> hasNext.
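That control flow, where the consumer resumes only once a complete event is available, can also be approximated without the CPS plugin by pairing a producer thread with a SynchronousQueue. The class and method names below are hypothetical stand-ins, not Daffodil's actual InfosetInputter API:

```scala
import java.util.concurrent.SynchronousQueue

object PullOverPushSketch {
  sealed trait Msg
  final case class SaxEvent(name: String) extends Msg
  case object Done extends Msg

  // Pull-style inputter over a push-style producer: hasNext/next block
  // until the producer has handed off a complete event (or Done).
  final class QueueInputter(queue: SynchronousQueue[Msg]) {
    private var pending: Msg = _
    private def fill(): Unit = if (pending == null) pending = queue.take()
    def hasNext: Boolean = { fill(); pending != Done }
    def next(): SaxEvent = {
      fill()
      val ev = pending.asInstanceOf[SaxEvent]
      pending = null // consume the pending event
      ev
    }
  }

  def main(args: Array[String]): Unit = {
    val queue = new SynchronousQueue[Msg]()
    // Producer thread stands in for the SAX ContentHandler callbacks
    new Thread(() => {
      Seq("startDocument", "startElement", "endElement", "endDocument")
        .foreach(n => queue.put(SaxEvent(n)))
      queue.put(Done)
    }).start()

    val inputter = new QueueInputter(queue)
    while (inputter.hasNext) println(inputter.next().name)
  }
}
```

A SynchronousQueue has no capacity, so this is the depth-1, switch-per-event shape; the CPS version in the gist achieves the same hand-off with a stack switch instead of a thread switch.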
>>
>> I do have one concern with this approach. Scala required the
>> @suspendable annotation on the unparse() method of the DataProcessor and
>> on the next() method of the InfosetInputter for both the abstract class
>> and concrete SAX implementation. I'm not sure if that annotation causes
>> any problems when not used inside a reset block (i.e. old API style), or
>> if that annotation will end up cascading throughout the codebase. Seems
>> like there's a possibility for that to happen. Maybe I just need to
>> reorganize the code a bit, but it's not clear to me how.
>>
>>
>> On 4/22/20 7:18 PM, Beckerle, Mike wrote:
>>> scala continuations is supported on 2.11 and 2.12, but is a work in progress for 2.13. The main web page for it says it is looking for a lead developer, and without one Typesafe/Lightbend is doing only bare-minimum maintenance.
>>>
>>> A producer/consumer idiom like what we need is easily expressed using this shift/reset thing.
>>>
>>> Here's a gist that does a control turnaround from a handler to a pull-oriented while loop. It took me a bit of research to get the build.sbt right so this would "just work":
>>>
>>> https://gist.github.com/mbeckerle/4c1d8f8c365958ef7d01bf770fa6317c
>>>
>>>
>>> ________________________________
>>> From: Beckerle, Mike <mb...@tresys.com>
>>> Sent: Wednesday, April 22, 2020 5:01 PM
>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>> Subject: Re: Daffodil SAX API Proposal
>>>
>>> Another possibility is scala-async, which I think can do what we want.
>>> ________________________________
>>> From: Beckerle, Mike <mb...@tresys.com>
>>> Sent: Wednesday, April 22, 2020 4:34 PM
>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>> Subject: Re: Daffodil SAX API Proposal
>>>
>>> The alternative is probably scala.util.continuations aka "shift and reset".
>>>
>>> It's much harder to understand and use, but at least it's in the standard library, so it is supported. (I think.)
>>>
>>> ________________________________
>>> From: Steve Lawrence <sl...@apache.org>
>>> Sent: Wednesday, April 22, 2020 3:40 PM
>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>> Subject: Re: Daffodil SAX API Proposal
>>>
>>> I responded.
>>>
>>> I checked the license to make sure it's compatible (BSD-3), but I didn't
>>> actually check what versions of Scala it works with.
>>>
>>> Looks like it is only published for 2.11, and the repo hasn't been
>>> updated for at least 3 years. There is a 2.12.x branch in their repo,
>>> but it too hasn't been updated in a long time. We might have to see how
>>> much effort it would take to update that library, or perhaps find
>>> another library.
>>>
>>>
>>> On 4/22/20 3:28 PM, Beckerle, Mike wrote:
>>>> I reviewed this and added a comment about the only significant issue, which I think just boils down to trying to keep the coroutining back and forth as simple as possible.
>>>>
>>>> Another thought: Is the scala coroutines library supported in 2.11 and 2.12 (and 2.13, to be future-safe)?
>>>>
>>>>
>>>> ________________________________
>>>> From: Steve Lawrence <sl...@apache.org>
>>>> Sent: Wednesday, April 22, 2020 1:06 PM
>>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>>> Subject: Daffodil SAX API Proposal
>>>>
>>>> I've added a proposal to add a SAX API support to Daffodil.
>>>>
>>>> https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+SAX+API
>>>>
>>>> Many libraries and applications already support SAX, so this should
>>>> provide a means for more seamless integration into different toolsuites,
>>>> opening up the places where Daffodil could be easily integrated.
>>>>
>>>> SAX is also generally viewed as having a lower memory overhead, though
>>>> this does not attempt to solve the memory issues related to Daffodil and
>>>> the internal infoset representation. This essentially just adds a SAX
>>>> compatible API around our existing API. Other changes are needed to
>>>> reduce our memory overhead and truly support a streaming model.
>>>>
>>>> - Steve
>>>>
>>>>
>>>
>>>
>>
>>
> 
> 

-- 
Best Regards
Lola K.