Posted to dev@druid.apache.org by Gian Merlino <gi...@apache.org> on 2020/02/05 06:12:05 UTC

ByteBuffer / Memory / Unsafe et al

Hey Druids,

There has generally been a lot of talk about moving away from ByteBuffer
and towards the DataSketches Memory package (
https://datasketches.apache.org/docs/Memory/MemoryPackage.html) or even
using Unsafe directly. Much of that discussion happened on
https://github.com/apache/druid/issues/3892.

Recently a patch was merged that added datasketches-memory as a dependency
of druid-processing: https://github.com/apache/druid/pull/9308. The reasons
were partly better performance and partly a nicer API (both reasons are
mentioned in #3892 as well).

JEP 370 is a potential long-term solution, but it seems a while away from
being ready: https://openjdk.java.net/jeps/370

I wanted to bring the larger discussion back up and see what people think
is a good path forward.

My suggestion is that we migrate the VectorAggregator interface to use
Memory, but keep BufferAggregator the way it is. That way, as we build out
support for vectorization (right now, only timeseries/groupby support it,
and only a few aggregators, but we should be building this out) we'll be
doing it with a nicer and potentially faster API. But we won't need to go
back and redo a bunch of old code, since we'll keep the non-vectorized code
paths the way they are. (And hopefully, one day, delete them all outright.)
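
To make that concrete, here is a rough sketch of what a Memory-based variant
could look like. This is purely illustrative (the names and method set are
made up, not the actual interface, and it assumes the
org.apache.datasketches.memory package from the new dependency):

    import org.apache.datasketches.memory.WritableMemory;

    // Illustrative only -- not the real VectorAggregator interface.
    public interface MemoryVectorAggregator
    {
      // Initialize aggregation state at the given byte position.
      void init(WritableMemory memory, int position);

      // Aggregate rows [startRow, endRow) of the current vector into the
      // state at the given byte position.
      void aggregate(WritableMemory memory, int position, int startRow, int endRow);

      // Read the current value out of the state at the given byte position.
      Object get(WritableMemory memory, int position);
    }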

Gian

Re: ByteBuffer / Memory / Unsafe et al

Posted by Gian Merlino <gi...@apache.org>.
> Another concern I have is that this current proposal appears quite
cumbersome.  Creating a Memory Segment is easy enough, but to read and
write into that segment appears to require declarations of every different
object type and their offsets using VarHandles.

The way I interpreted it is that if you just wanted to deal with primitive
types, you could get VarHandles for int, long, etc., and use those with plain
offsets. So you wouldn't _need_ to declare everything in a super fancy way.
(But the MemoryLayout stuff is trying to make that possible, if it's what
you want.)
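
For example, writing ints at plain offsets might look roughly like this
(untested sketch based on my reading of the incubator API in the JEP draft;
exact names could change before it ships):

    import java.lang.invoke.VarHandle;
    import java.nio.ByteOrder;
    import jdk.incubator.foreign.MemoryAddress;
    import jdk.incubator.foreign.MemoryHandles;
    import jdk.incubator.foreign.MemorySegment;

    public class ForeignMemoryExample
    {
      public static void main(String[] args)
      {
        // One VarHandle per primitive carrier type; offsets are plain longs.
        VarHandle intHandle =
            MemoryHandles.varHandle(int.class, ByteOrder.nativeOrder());

        // Allocate 100 bytes off-heap; the segment is freed when closed.
        try (MemorySegment segment = MemorySegment.allocateNative(100)) {
          MemoryAddress base = segment.baseAddress();
          for (long i = 0; i < 25; i++) {
            intHandle.set(base.addOffset(i * 4), (int) i);
          }
        }
      }
    }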

There's some example code at
http://cr.openjdk.java.net/~mcimadamore/8235837_javadoc/jdk/incubator/foreign/package-summary.html
.

It looks like it'd be pretty pleasant to use for the simple off-heap data
structures that Druid's query engines need, like hash tables, heaps,
arrays, etc.

But it will be a while before we can depend on it, so I think we need a
shorter term solution too.


Re: ByteBuffer / Memory / Unsafe et al

Posted by leerho <le...@gmail.com>.
I have been studying JEP 370 and indeed it looks promising. However, it has only just completed the JEP stage and has not yet been reviewed by the JCP for possible creation of a JSR. Even assuming it flows smoothly through the JCP and JSR process, it is at least 2-3 years away, if not longer.

Another concern I have is that the current proposal appears quite cumbersome. Creating a MemorySegment is easy enough, but reading and writing into that segment appears to require declaring every different primitive type and its offset using VarHandles. Suppose you need to read and write a C-like "struct" consisting of tightly packed ints, longs, bytes, shorts, doubles, etc. Just the declarations for that structure would take pages of boilerplate code. Now, suppose you have a dynamic struct that changes over time and evolves (this is exactly the situation we have with sketches)! Yikes! It is possible that the proposed "MemoryLayout" API may address this, but the current JEP is rather vague about how flexible it will be.
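
To make the concern concrete, here is roughly what I am picturing for a tiny
packed struct, based on my reading of the draft javadoc (hypothetical code;
exact names and alignment rules may differ): one VarHandle per carrier type,
plus hand-maintained byte offsets for every field.

    import java.lang.invoke.VarHandle;
    import java.nio.ByteOrder;
    import jdk.incubator.foreign.MemoryAddress;
    import jdk.incubator.foreign.MemoryHandles;

    // Hypothetical packed "struct": { long sum; double max; int count }
    class StructAccess
    {
      static final VarHandle LONG_H =
          MemoryHandles.varHandle(long.class, ByteOrder.nativeOrder());
      static final VarHandle DOUBLE_H =
          MemoryHandles.varHandle(double.class, ByteOrder.nativeOrder());
      static final VarHandle INT_H =
          MemoryHandles.varHandle(int.class, ByteOrder.nativeOrder());

      static final long SUM_OFFSET   = 0;   // long
      static final long MAX_OFFSET   = 8;   // double
      static final long COUNT_OFFSET = 16;  // int

      // Reading a single field means going through the right VarHandle
      // with the right hand-computed offset.
      static double readMax(MemoryAddress base)
      {
        return (double) DOUBLE_H.get(base.addOffset(MAX_OFFSET));
      }
    }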

I think there is also a real risk that even if we wait for JEP 370 to become reality, it still may not provide the flexibility that we need. We just don't know yet what form it will actually take.

If JEP 370 does actually appear and is workable, we will be strongly motivated to replace our current use of Unsafe inside Memory with the newer API, and all of that could happen behind the current Memory API.






Re: ByteBuffer / Memory / Unsafe et al

Posted by Gian Merlino <gi...@apache.org>.
By the way, I just did a quick test of using Memory instead of ByteBuffer
in VectorAggregator with groupBy (hash grouping, 8-byte key, two
aggregators) and measured a 14% speedup. I think the speedup would be
higher with more aggregators or with array-based rather than hash-based
grouping (since in those cases, relatively more time is spent doing
aggregations).
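
For context on what "using Memory instead of ByteBuffer" means at the call
site, here is the flavor of the change (not the actual benchmark or aggregator
code; just an illustrative long-sum style update, assuming the
org.apache.datasketches.memory package):

    import java.nio.ByteBuffer;
    import org.apache.datasketches.memory.WritableMemory;

    class LongSumExample
    {
      // Roughly what ByteBuffer-based aggregator code does today.
      static void add(ByteBuffer buf, int position, long value)
      {
        buf.putLong(position, buf.getLong(position) + value);
      }

      // The Memory-based equivalent; offsets and capacities are longs.
      static void add(WritableMemory mem, long position, long value)
      {
        mem.putLong(position, mem.getLong(position) + value);
      }
    }

The call sites look nearly identical; the difference is in the implementation
underneath and in the long-based addressing.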


Re: ByteBuffer / Memory / Unsafe et al

Posted by Gian Merlino <gi...@apache.org>.
We could make an interface that is a subset of Memory and use that. But I
think once the new JEP 370 stuff becomes mainstream it would be best to use
it directly, since it offers a richer API and will probably become
considered idiomatic Java over time. So we may still want to drop the
abstraction later, once we drop support for Java pre-14.

Separately, I think if we do build an abstraction layer here, we need to
make sure the performance overhead is zero — it's important that the JVM be
able to inline the underlying calls.
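
A minimal sketch of the kind of "subset of Memory" interface I mean (names
made up, just to illustrate the shape):

    // Hypothetical subset of the Memory read/write surface that our
    // aggregators actually use.
    public interface MemoryView
    {
      long getLong(long offsetBytes);

      void putLong(long offsetBytes, long value);

      double getDouble(long offsetBytes);

      void putDouble(long offsetBytes, double value);
    }

If there is effectively one final implementation (backed by Memory today),
call sites stay monomorphic and the JIT should be able to inline through it.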

> @Gian Merlino I think i am not 100% sure about the scope you are talking
about, thought you are talking about
https://github.com/apache/druid/issues/3892 and doing the work from bottom
up (storage to processing ...)

For now I'm only proposing using a non-BB API (whether it's Memory or our
own abstraction) for the VectorAggregator interface and the query engines
that use it. I don't see a strong motivation to tackle the storage layer at
this time. The reason is that when I profile Druid queries I usually see
the lion's share of time being spent in query engines and aggregators, so I
think that's what we need to tackle first from a performance perspective.

Btw, a side note, this talk is great:
https://www.youtube.com/watch?v=iwSCtxMbBLI


Re: ByteBuffer / Memory / Unsafe et al

Posted by Slim Bouguerra <sl...@gmail.com>.
@Jad Sounds like a good approach; I was thinking the same thing about how we
can expose those through a higher-level API.
@Gian Merlino <gi...@apache.org> I am not 100% sure about the scope you are
talking about; I thought you were talking about
https://github.com/apache/druid/issues/3892 and doing the work from the
bottom up (storage to processing ...).
Can you please clarify the scope? I would love to help if you think adding
such an abstraction will add extra work overhead that you want to avoid.




-- 

B-Slim
_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______

Re: ByteBuffer / Memory / Unsafe et al

Posted by Jad Naous <ja...@imply.io>.
But then you could never get rid of it... Ideally, abstraction layers should
be constrained to the actual uses within the system using them, instead of
being something so general that it would be hard to replace.


Re: ByteBuffer / Memory / Unsafe et al

Posted by Gian Merlino <gi...@apache.org>.
I wonder if Memory itself could be that layer?


Re: ByteBuffer / Memory / Unsafe et al

Posted by Jad Naous <ja...@imply.io>.
We could build an abstraction layer on top of the memory interface provided
by DataSketches. When the JDK gets the new stuff, we can just change the
implementation of the abstraction.


Re: ByteBuffer / Memory / Unsafe et al

Posted by Gian Merlino <gi...@apache.org>.
The thing that worries me about JEP 370 is that if historical Java user
migration patterns hold up, we will need to support Java 11 for a while
(probably another 2–3 years), and we would therefore need to wait that long
to use JEP 370. It seems like a long time and until then we would be stuck
with a pretty inferior API.

I also would prefer not having to rewrite code a bunch of times, but that's
why I suggested starting by using Memory for the VectorAggregator interface
and stuff that interacts with it. There isn't that much code there yet
(only a few aggregators implement VectorAggregator). So we will need to
write most of it for the first time, and since it is fresh code, I think
it'd be nice to use the best API currently available in Java 8 / 11. From
what I see that is Memory.


Re: ByteBuffer / Memory / Unsafe et al

Posted by Slim Bouguerra <bs...@apache.org>.
Hi Gian,
Thanks for bringing this up.
IMO, for the long run and looking at how much code will have to change, it
makes more sense to rely on the JDK-based API (JEP 370) and have this work
done ONCE as opposed to multiple iterations. FYI, I do not think it is far
away; there seems to be good momentum around it.
This does not exclude or mean we should not use the Memory API for other
stuff like sketches et al; in fact, I think even for a project like Sketches
it makes more sense to move to the newer API offered by the JDK rather than
do it yourself.

