You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@druid.apache.org by Will Lauer <wl...@oath.com.INVALID> on 2018/07/19 18:20:44 UTC

synchronization question about datasketches aggregator

A colleague recently pointed out to me that all the sketch operations that
take place in SketchAggregator (in the datasketches module) use a
SychronizedUnion class that basically wraps a normal sketch Union and
synchronizes all operations. From what I can tell with other aggregators in
the Druid code base, there doesn't appear to be a need to synchronize. It
looks like Aggregators are always processed from within a single thread. Is
it reasonable to remove all the syncrhonizations from the SketchAggregator
and avoid the performance hit that they impose at runtime?

Will

Will Lauer
Senior Principal Architect

Progress requires pain

m: 508.561.6427

o: 217.255.4262

Re: synchronization question about datasketches aggregator

Posted by Roman Leventov <le...@gmail.com>.
There is a race between aggregators and ingestion updates. Actually, many
aggregators are vulnerable now. See this issue:
https://github.com/apache/incubator-druid/pull/3956 and a conversation
starting from this message:
https://github.com/apache/incubator-druid/pull/5148#discussion_r170906998.

However, you could replace a simple synchronized with ReadWriteLock or
Striped<ReadWriteLock> (see ArrayOfDoublesSketchMergeBufferAggregator for
example), that would be a useful contribution to Druid.

On Thu, 19 Jul 2018 at 13:21, Will Lauer <wl...@oath.com.invalid> wrote:

> A colleague recently pointed out to me that all the sketch operations that
> take place in SketchAggregator (in the datasketches module) use a
> SychronizedUnion class that basically wraps a normal sketch Union and
> synchronizes all operations. From what I can tell with other aggregators in
> the Druid code base, there doesn't appear to be a need to synchronize. It
> looks like Aggregators are always processed from within a single thread. Is
> it reasonable to remove all the syncrhonizations from the SketchAggregator
> and avoid the performance hit that they impose at runtime?
>
> Will
>
> Will Lauer
> Senior Principal Architect
>
> Progress requires pain
>
> m: 508.561.6427
>
> o: 217.255.4262
>

Re: synchronization question about datasketches aggregator

Posted by Anastasia Braginsky <an...@oath.com.INVALID>.
 Hi Guys,
Just wanted to pay your attention that once OakIncrementalIndex will be in place there is no need to manage the issue of synchronization between aggregators and ingestions. Part of Oak benefits is the synchronization for the simultaneous writes and reads of the same key in the map.

    On Thursday, July 19, 2018, 10:17:18 PM GMT+3, Gian Merlino <gi...@apache.org> wrote:  
 
 Hi Will,

Check out also this thread for related discussion:

https://lists.apache.org/thread.html/9899aa790a7eb561ab66f47b35c8f66ffe695432719251351339521a@%3Cdev.druid.apache.org%3E

On Thu, Jul 19, 2018 at 11:21 AM Will Lauer <wl...@oath.com.invalid> wrote:

> A colleague recently pointed out to me that all the sketch operations that
> take place in SketchAggregator (in the datasketches module) use a
> SychronizedUnion class that basically wraps a normal sketch Union and
> synchronizes all operations. From what I can tell with other aggregators in
> the Druid code base, there doesn't appear to be a need to synchronize. It
> looks like Aggregators are always processed from within a single thread. Is
> it reasonable to remove all the syncrhonizations from the SketchAggregator
> and avoid the performance hit that they impose at runtime?
>
> Will
>
> Will Lauer
> Senior Principal Architect
>
> Progress requires pain
>
> m: 508.561.6427
>
> o: 217.255.4262
>
  

Re: synchronization question about datasketches aggregator

Posted by Gian Merlino <gi...@apache.org>.
Hi Will,

Check out also this thread for related discussion:

https://lists.apache.org/thread.html/9899aa790a7eb561ab66f47b35c8f66ffe695432719251351339521a@%3Cdev.druid.apache.org%3E

On Thu, Jul 19, 2018 at 11:21 AM Will Lauer <wl...@oath.com.invalid> wrote:

> A colleague recently pointed out to me that all the sketch operations that
> take place in SketchAggregator (in the datasketches module) use a
> SychronizedUnion class that basically wraps a normal sketch Union and
> synchronizes all operations. From what I can tell with other aggregators in
> the Druid code base, there doesn't appear to be a need to synchronize. It
> looks like Aggregators are always processed from within a single thread. Is
> it reasonable to remove all the syncrhonizations from the SketchAggregator
> and avoid the performance hit that they impose at runtime?
>
> Will
>
> Will Lauer
> Senior Principal Architect
>
> Progress requires pain
>
> m: 508.561.6427
>
> o: 217.255.4262
>