Posted to dev@datasketches.apache.org by Alexander Saydakov <sa...@verizonmedia.com.INVALID> on 2020/04/14 00:11:25 UTC

Cross-platform discrepancies

Hi everyone!
I would like to discuss some discrepancies we have accumulated as a result
of decisions around prioritizing our development efforts in Java and C++,
and around integration with data processing systems.

Druid is one of the most important systems in which our library is used. It
has a strong requirement for off-heap memory support and for operating
within a single contiguous block of memory. This was the main motivation
behind what we call Direct memory support in our Java library.

Currently some sketches do not support Direct memory, in particular: KLL
quantiles sketch, Frequent Items sketch, CPC distinct counting sketch.
Therefore they are not integrated with Druid yet.

Quantiles sketch:
In Java we have ItemsSketch<T> for any type and a specialized numeric
DoublesSketch with Direct memory support - this is the one integrated with
Druid. We consider this algorithm superseded by the KLL quantiles sketch,
but for historical reasons in Java we have implemented KllFloatsSketch only -
no generic implementation and no Direct memory support. The contiguous
memory block mode is so limiting that we did not think the KLL algorithm
could be implemented efficiently in that mode. We might need to reconsider
this. Development in C++ happened later, so we did not implement the older
algorithm. We may want to do so if a strong use case arises to support
reading data prepared using the Java library. On the other hand, in C++
kll_sketch is a template, and therefore it can be a container of any user
type. In Java we felt the need to have both: a generic implementation and a
specialized implementation for some numeric type to avoid object overhead.

To summarize the situation with quantiles sketches:
Java: KllFloatsSketch, older quantiles ItemsSketch<T> and DoublesSketch
C++: kll_sketch<T> (kll_sketch<float> is compatible with Java), no older
quantiles sketch.
Druid: quantiles DoublesSketch
PostgreSQL: kll_sketch<float>

Regarding distinct counting sketches:
Druid: Theta, HLL
PostgreSQL: Theta, HLL, CPC

Regarding frequent items:
Druid: none
PostgreSQL: frequent_items_sketch<std::string>

It seems to me that we need to do something to improve the situation.
Possible tasks:
- Discuss alternatives to Direct memory mode in Druid with Druid developers
(again). They were not quite happy with the current approach either. It
leads to gross overallocation of BufferAggregator memory. There was an open
issue in the Druid repo about this, I believe.
- Find some way to have KLL sketch in Druid
- Find some way to have Frequent Items (frequent strings?) sketch in Druid
- Consider KllItemsSketch<T> in Java
- Consider legacy quantiles_sketch<T> in C++ (and then in PostgreSQL - with
double values, compatible with Druid)

I would appreciate your thoughts about this.
Thank you.
Alexander Saydakov

Re: Cross-platform discrepancies

Posted by Gian Merlino <gi...@apache.org>.
The way postgres does it sounds pretty reasonable.

I'd like to be able to do the reallocations out of the pre-allocated
processing buffer, so that way freeing them all at the end of the query is,
well, free. (We can just reuse the buffer for the next query without
actually doing anything.) Maybe we carve out some space in the processing
buffer for reallocations. Perhaps each aggregator type can give us a hint
about the expected amount of additional memory needed beyond the starting
size (so, zero for fixed-size aggregators, maybe 10% overhead for ones that
need to grow?) and we can use that to size the carve-out.
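
To make that hint concrete, it could be a method on the aggregator factory,
roughly like the sketch below. This is only an illustration of the idea, not
an existing Druid API: HypotheticalAggregatorFactory and
expectedGrowthFraction() are made-up names here (getMaxIntermediateSize()
mirrors the worst-case sizing method Druid factories already have).

    // Hypothetical sketch of the "growth hint" idea; not part of Druid.
    public abstract class HypotheticalAggregatorFactory
    {
      // Mirrors the existing worst-case sizing method: bytes reserved for
      // the aggregator's initial slot in the processing buffer.
      public abstract int getMaxIntermediateSize();

      // Hypothetical addition: expected extra memory beyond the starting
      // size, as a fraction of it (0.0 for fixed-size aggregators, maybe
      // 0.10 for ones that occasionally need to grow). Druid could sum
      // these hints across aggregators to size the reallocation carve-out.
      public double expectedGrowthFraction()
      {
        return 0.0;
      }
    }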

If aggregators need more space beyond the carve-out, we could allocate
memory outside of the processing buffer, perhaps on the Java heap or
perhaps by grabbing additional off-heap memory beyond the processing
buffer. (DoublesSketchBuildBufferAggregator already does this sneakily: the
DirectUpdateDoublesSketch it is based on might bail completely out of the
off-heap buffer and allocate new memory on-heap, without Druid knowing
about it.)

I don't think we'd be able to reuse the original allocation. I'm not sure
if it'd be valuable to allow the aggregator to continue to use it: if so we
could provide an array of positions instead of a single position in the
BufferAggregator aggregate method, and an array of arrays of positions in
the VectorAggregator one. If not then we can just leave it there unused
until the query is over, when it'll get released.
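
A hypothetical shape for that, just to make the idea concrete (the
single-position form reflects the current aggregate signature; the
array-taking overload is invented for illustration and is not a real Druid
interface change):

    import java.nio.ByteBuffer;

    // Hypothetical sketch only; not the current Druid interface.
    public interface BufferAggregatorWithPositions
    {
      // Current-style single-position call, for reference.
      void aggregate(ByteBuffer buf, int position);

      // Hypothetical multi-position call: every slot granted to this
      // aggregator, in the order it was granted. A default implementation
      // just uses the first (original) slot.
      default void aggregate(ByteBuffer buf, int[] positions)
      {
        aggregate(buf, positions[0]);
      }
    }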

Does this seem like a useful way of doing things?

Himanshu, does this line up with what you were thinking about in
https://github.com/apache/druid/issues/8126? What are your thoughts?

Gian

On Thu, Apr 16, 2020 at 1:30 PM Alexander Saydakov
<sa...@verizonmedia.com.invalid> wrote:

>
> PostgreSQL provides its own memory allocation and deallocation functions
> palloc and pfree (replacements for malloc and free). Documentation for
> user-defined functions in C is here:
> https://www.postgresql.org/docs/current/xfunc-c.html
> I am not sure you need such a broad context about writing UDFs in general.
> The main point about aggregating functions is that they consist of two
> parts: a state transition function and a finalizer. State transition
> functions are called to process the next value in an aggregation (such as
> group-by). One of the parameters is a pointer to the current state. It is
> null on the first call, and the function allocates an arbitrary data
> structure using palloc, updates it with the given value and returns the
> pointer. It can choose to reallocate and return a different pointer or
> return the same pointer. A finalizer is called at the end to convert this
> arbitrary state object into whatever return type is specified for this
> aggregation. When the query (or transaction) is finished, all allocations
> in this context are freed.
>
>
> On Wed, Apr 15, 2020 at 11:03 PM Himanshu <g....@gmail.com> wrote:
>
>> Alex, Thanks for posting it here.
>>
>> We have definitely thought about ability to start with a small off-heap
>> buffer and grow to bigger off-heap buffers if needed and  that is what
>> https://github.com/apache/druid/issues/8126 is trying to address.
>> However, we have never considered one sketch being spread across several
>> different contiguous memory blocks. As you pointed out, That would likely
>> require BufferAggregator interface having access to Druid supplied
>> MemoryAllocator.
>>
>> > ... but we would need a  "maloc / free"-like API similar to how
>> PostgreSQL has implemented their aggregations API. ...
>> Can you point to some documentation/code describing what kind of API does
>> Postgres provide for implementing custom aggregators.
>>
>> -- Himanshu
>>
>> On Wed, Apr 15, 2020 at 10:11 AM Alexander Saydakov
>> <sa...@verizonmedia.com.invalid> wrote:
>>
>>> Let's involve Druid developers as Gian suggested.
>>>
>>>
>>> On Tue, Apr 14, 2020 at 4:44 PM leerho <le...@gmail.com> wrote:
>>>
>>> > Hi,
>>> > It is pretty easy to graphically view the various cross-platform
>>> > discrepancies that Alex is talking about in the Sketch Features Matrix
>>> > <
>>> https://datasketches.apache.org/docs/Architecture/SketchFeaturesMatrix.html
>>> >.
>>> > We still need to add the PostgreSQL System Integration column.
>>> > Nevertheless, it is easy to see that where we have implemented "Off
>>> Java
>>> > Heap" capability is directly correlated to our Druid integrations.
>>> >
>>> > To underscore Alex's point, many of the other sketches in this matrix
>>> that
>>> > are not currently integrated with Druid have very dynamic memory
>>> > requirements and may be impractical to implement in a contiguous-memory
>>> > model.  They could still be implemented "off-heap", but we would need a
>>> > "maloc / free"-like API similar to how PostgreSQL has implemented their
>>> > aggregations API.
>>> >
>>> > One of the conclusions from this matrix is that Druid would have a much
>>> > richer and more powerful sketching analytics capability to offer its
>>> > customers if it had a more flexible aggregations API.
>>> >
>>> >
>>> >
>>> > On Tue, Apr 14, 2020 at 1:47 PM Alexander Saydakov
>>> > <sa...@verizonmedia.com.invalid> wrote:
>>> >
>>> >> Gian,
>>> >> Thanks for your reply. I did not mean to suggest moving away from
>>> >> off-heap memory. As I see it, the problem mostly is with preallocating
>>> >> fixed blocks for each sketch of some maximum size, which, on the one
>>> hand,
>>> >> is wasteful because most sketches will be small, and on the other
>>> hand,
>>> >> requires operating in a "contiguous block" mode, which is very
>>> limiting.
>>> >> Perhaps you might consider something like PostgreSQL does: provide an
>>> >> allocator, so that the memory allocation is controlled and can be
>>> done in a
>>> >> context of a query or transaction, but it does not impose a single
>>> >> contiguous block for a data structure. So a custom aggregation
>>> function
>>> >> allocates memory for its own state on the first call, and PostgreSQL
>>> passes
>>> >> this state to the next call as a pointer and so on. At any moment the
>>> >> aggregation can choose to reallocate and return a new pointer. And
>>> that
>>> >> state can be a complex data structure with pointers to other blocks
>>> >> allocated using the same mechanism, so it does not have to be a single
>>> >> contiguous block.
>>> >>
>>> >> This will still require some changes to our library to support memory
>>> >> allocation like this, but it seems to be less challenging than the
>>> current
>>> >> Direct memory mode we have.
>>> >>
>>> >> There are some trade-offs here, as usual. For instance, currently
>>> (if
>>> >> implemented) wrapping a chunk of bytes as a fast alternative to
>>> creating an
>>> >> on-heap object during deserialization is the same operation as
>>> interpreting
>>> >> this "contiguous block" aggregation state. But if the aggregation
>>> state is
>>> >> fragmented, it is not going to be equivalent to a serialized blob
>>> anymore.
>>> >> This is how it is in the current C++ implementation - there is no
>>> >> equivalent of wrap() as in Direct mode in Java. This is a potential
>>> >> performance improvement that can be discussed separately. Some
>>> sketches can
>>> >> choose to operate in this contiguous block mode if it does not go
>>> against
>>> >> the algorithm.
>>> >>
>>> >> On Mon, Apr 13, 2020 at 10:43 PM Gian Merlino <gi...@apache.org>
>>> wrote:
>>> >>
>>> >>> Hi Alexander,
>>> >>>
>>> >>> It sounds like integrating with Druid is an important concern here.
>>> It
>>> >>> might be nice to cross post the discussion to dev@druid.
>>> >>>
>>> >>> Personally, I don't think it's likely the Druid community will move
>>> away
>>> >>> from requiring that aggregators be able to work with raw memory. The
>>> >>> requirement exists so it can allocate an arena for aggregation
>>> space,
>>> >>> minimizing GC load. But there have been some proposals floating
>>> around for
>>> >>> adding ability to dynamically allocate new memory (perhaps out of
>>> the same
>>> >>> arena or perhaps some other way — the proposals varied). I think it
>>> would
>>> >>> be useful for Druid devs to understand more deeply what effect the
>>> >>> allocator API has on the DataSketches implementation.
>>> >>>
>>> >>> (By the way, the Druid issue you're referring to might be
>>> >>> https://github.com/apache/druid/issues/8126.)
>>> >>>
>>> >>> On Mon, Apr 13, 2020 at 5:11 PM Alexander Saydakov
>>> >>> <sa...@verizonmedia.com.invalid> wrote:
>>> >>>
>>> >>>> Hi everyone!
>>> >>>> I would like to discuss some discrepancies we accumulated as
>>> results of
>>> >>>> some decisions around prioritizing our development efforts in Java
>>> and C++,
>>> >>>> and also around integration with data processing systems.
>>> >>>>
>>> >>>> Druid is one of the most important systems in which our library is
>>> >>>> used. It has a strong requirement to have off-heap memory support
>>> and also
>>> >>>> to operate in a single contiguous block. This was the main
>>> motivation
>>> >>>> behind what we call Direct memory support in our Java library.
>>> >>>>
>>> >>>> Currently some sketches do not support Direct memory, in particular:
>>> >>>> KLL quantiles sketch, Frequent Items sketch, CPC distinct counting
>>> sketch.
>>> >>>> Therefore they are not integrated with Druid yet.
>>> >>>>
>>> >>>> Quantiles sketch:
>>> >>>> In Java we have ItemsSketch<T> for any type and a specialized
>>> numeric
>>> >>>> DoublesSketch with Direct memory support - this is the one
>>> integrated with
>>> >>>> Druid. We consider this algorithm replaced by KLL quantiles sketch,
>>> but for
>>> >>>> historical reasons in Java we have implemented KllFloatsSketch only
>>> - no
>>> >>>> generic implementation and no Direct memory support. This
>>> contiguous memory
>>> >>>> block mode is so limiting that we did not think that KLL algorithm
>>> can be
>>> >>>> implemented efficiently in that mode. We might need to reconsider
>>> this.
>>> >>>> Development is C++ happened later, so we did not implement the older
>>> >>>> algorithm. We may want to do it if a strong use case arises to
>>> support
>>> >>>> reading data prepared using Java library. On the other hand, in C++
>>> >>>> kll_sketch is a template, and therefore it can be a container of
>>> any user
>>> >>>> type. In Java we felt the need to have both: a generic
>>> implementation and a
>>> >>>> specialized implementation for some numeric type to avoid object
>>> overhead.
>>> >>>>
>>> >>>> To summarize the situation with quantiles sketches:
>>> >>>> Java: KllFloatsSketch, older quantiles ItemsSketch<T> and
>>> DoublesSketch
>>> >>>> C++: kll_sketch<T> (kll_sketch<float> is compatible with Java), no
>>> >>>> older quantiles sketch.
>>> >>>> Druid: quantiles DoublesSketch
>>> >>>> PostgreSQL: kll_sketch<float>
>>> >>>>
>>> >>>> Regarding distinct counting sketches:
>>> >>>> Druid: Theta, HLL
>>> >>>> PostgreSQL: Theta, HLL, CPC
>>> >>>>
>>> >>>> Regarding frequent items:
>>> >>>> Druid: none
>>> >>>> PostgreSQL: frequent_items_sketch<std::string>
>>> >>>>
>>> >>>> It seems to me that we need to do something to improve the
>>> situation.
>>> >>>> Possible tasks:
>>> >>>> - Discuss alternatives to Direct memory mode in Druid with Druid
>>> >>>> developers (again). They were not quite happy with the current
>>> approach
>>> >>>> either. It leads to gross overallocation of BufferAggredator
>>> memory. There
>>> >>>> was an open issue in Druid repo about this I believe.
>>> >>>> - Find some way to have KLL sketch in Druid
>>> >>>> - Find some way to have Frequent Items (frequent strings?) sketch in
>>> >>>> Druid
>>> >>>> - Consider KllItemsSketch<T> in Java
>>> >>>> - Consider legacy quantiles_sketch<T> in C++ (and then in
>>> PostgreSQL -
>>> >>>> with double values, compatible with Druid)
>>> >>>>
>>> >>>> I would appreciate your thoughts about this.
>>> >>>> Thank you.
>>> >>>> Alexander Saydakov
>>> >>>>
>>> >>>
>>>
>>

Re: Cross-platform discrepancies

Posted by Alexander Saydakov <sa...@verizonmedia.com.INVALID>.
PostgreSQL provides its own memory allocation and deallocation functions
palloc and pfree (replacements for malloc and free). Documentation for
user-defined functions in C is here:
https://www.postgresql.org/docs/current/xfunc-c.html
I am not sure you need such a broad context about writing UDFs in general.
The main point about aggregating functions is that they consist of two
parts: a state transition function and a finalizer. State transition
functions are called to process the next value in an aggregation (such as
group-by). One of the parameters is a pointer to the current state. It is
null on the first call, and the function allocates an arbitrary data
structure using palloc, updates it with the given value and returns the
pointer. It can choose to reallocate and return a different pointer or
return the same pointer. A finalizer is called at the end to convert this
arbitrary state object into whatever return type is specified for this
aggregation. When the query (or transaction) is finished, all allocations
in this context are freed.
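
In case it helps to see the shape of that contract without the C details,
here it is restated as a generic Java-style interface. This is purely
illustrative (the names are made up); the real PostgreSQL API is C, using
palloc/pfree and the fmgr calling conventions, not anything like this:

    // Illustrative restatement of the two-part aggregate contract above.
    public interface AggregateFunction<STATE, IN, OUT>
    {
      // State transition: called once per input value. On the first call
      // the state is null and the function allocates it (palloc in
      // PostgreSQL); it may return the same state or a reallocated one.
      STATE transition(STATE stateOrNull, IN value);

      // Finalizer: called once at the end to turn the opaque state into
      // the declared return type. All allocations made in the query's
      // memory context are freed when the query (or transaction) ends.
      OUT finish(STATE state);
    }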


On Wed, Apr 15, 2020 at 11:03 PM Himanshu <g....@gmail.com> wrote:

> Alex, Thanks for posting it here.
>
> We have definitely thought about ability to start with a small off-heap
> buffer and grow to bigger off-heap buffers if needed and  that is what
> https://github.com/apache/druid/issues/8126 is trying to address.
> However, we have never considered one sketch being spread across several
> different contiguous memory blocks. As you pointed out, That would likely
> require BufferAggregator interface having access to Druid supplied
> MemoryAllocator.
>
> > ... but we would need a  "maloc / free"-like API similar to how
> PostgreSQL has implemented their aggregations API. ...
> Can you point to some documentation/code describing what kind of API does
> Postgres provide for implementing custom aggregators.
>
> -- Himanshu
>
> On Wed, Apr 15, 2020 at 10:11 AM Alexander Saydakov
> <sa...@verizonmedia.com.invalid> wrote:
>
>> Let's involve Druid developers as Gian suggested.
>>
>>
>> On Tue, Apr 14, 2020 at 4:44 PM leerho <le...@gmail.com> wrote:
>>
>> > Hi,
>> > It is pretty easy to graphically view the various cross-platform
>> > discrepancies that Alex is talking about in the Sketch Features Matrix
>> > <
>> https://datasketches.apache.org/docs/Architecture/SketchFeaturesMatrix.html
>> >.
>> > We still need to add the PostgreSQL System Integration column.
>> > Nevertheless, it is easy to see that where we have implemented "Off Java
>> > Heap" capability is directly correlated to our Druid integrations.
>> >
>> > To underscore Alex's point, many of the other sketches in this matrix
>> that
>> > are not currently integrated with Druid have very dynamic memory
>> > requirements and may be impractical to implement in a contiguous-memory
>> > model.  They could still be implemented "off-heap", but we would need a
>> > "maloc / free"-like API similar to how PostgreSQL has implemented their
>> > aggregations API.
>> >
>> > One of the conclusions from this matrix is that Druid would have a much
>> > richer and more powerful sketching analytics capability to offer its
>> > customers if it had a more flexible aggregations API.
>> >
>> >
>> >
>> > On Tue, Apr 14, 2020 at 1:47 PM Alexander Saydakov
>> > <sa...@verizonmedia.com.invalid> wrote:
>> >
>> >> Gian,
>> >> Thanks for your reply. I did not mean to suggest moving away from
>> >> off-heap memory. As I see it, the problem mostly is with preallocating
>> >> fixed blocks for each sketch of some maximum size, which, on the one
>> hand,
>> >> is wasteful because most sketches will be small, and on the other hand,
>> >> requires operating in a "contiguous block" mode, which is very
>> limiting.
>> >> Perhaps you might consider something like PostgreSQL does: provide an
>> >> allocator, so that the memory allocation is controlled and can be done
>> in a
>> >> context of a query or transaction, but it does not impose a single
>> >> contiguous block for a data structure. So a custom aggregation function
>> >> allocates memory for its own state on the first call, and PostgreSQL
>> passes
>> >> this state to the next call as a pointer and so on. At any moment the
>> >> aggregation can choose to reallocate and return a new pointer. And that
>> >> state can be a complex data structure with pointers to other blocks
>> >> allocated using the same mechanism, so it does not have to be a single
>> >> contiguous block.
>> >>
>> >> This will still require some changes to our library to support memory
>> >> allocation like this, but it seems to be less challenging then the
>> current
>> >> Direct memory mode we have.
>> >>
>> >> There are some trade offs here, as usually. For instance, currently (if
>> >> implemented) wrapping a chunk of bytes as a fast alternative to
>> creating an
>> >> on-heap object during deserialization is the same operation as
>> interpreting
>> >> this "contiguous block" aggregation state. But if the aggregation
>> state is
>> >> fragmented, it is not going to be equivalent to a serialized blob
>> anymore.
>> >> This is how it is in the current C++ implementation - there is no
>> >> equivalent of wrap() as in Direct mode in Java. This is a potential
>> >> performance improvement that can be discussed separately. Some
>> sketches can
>> >> choose to operate in this contiguous block mode if it does not go
>> against
>> >> the algorithm.
>> >>
>> >> On Mon, Apr 13, 2020 at 10:43 PM Gian Merlino <gi...@apache.org> wrote:
>> >>
>> >>> Hi Alexander,
>> >>>
>> >>> It sounds like integrating with Druid is an important concern here. It
>> >>> might be nice to cross post the discussion to dev@druid.
>> >>>
>> >>> Personally, I don't think it's likely the Druid community will move
>> away
>> >>> from requiring that aggregators be able to work with raw memory. The
>> >>> requirement exists so it can use allocate an arena for aggregation
>> space,
>> >>> minimizing GC load. But there have been some proposals floating
>> around for
>> >>> adding ability to dynamically allocate new memory (perhaps out of the
>> same
>> >>> arena or perhaps some other way — the proposals varied). I think it
>> would
>> >>> be useful for Druid devs to understand more deeply what effect the
>> >>> allocator API has on the DataSketches implementation.
>> >>>
>> >>> (By the way, the Druid issue you're referring to might be
>> >>> https://github.com/apache/druid/issues/8126.)
>> >>>
>> >>> On Mon, Apr 13, 2020 at 5:11 PM Alexander Saydakov
>> >>> <sa...@verizonmedia.com.invalid> wrote:
>> >>>
>> >>>> Hi everyone!
>> >>>> I would like to discuss some discrepancies we accumulated as results
>> of
>> >>>> some decisions around prioritizing our development efforts in Java
>> and C++,
>> >>>> and also around integration with data processing systems.
>> >>>>
>> >>>> Druid is one of the most important systems in which our library is
>> >>>> used. It has a strong requirement to have off-heap memory support
>> and also
>> >>>> to operate in a single contiguous block. This was the main motivation
>> >>>> behind what we call Direct memory support in our Java library.
>> >>>>
>> >>>> Currently some sketches do not support Direct memory, in particular:
>> >>>> KLL quantiles sketch, Frequent Items sketch, CPC distinct counting
>> sketch.
>> >>>> Therefore they are not integrated with Druid yet.
>> >>>>
>> >>>> Quantiles sketch:
>> >>>> In Java we have ItemsSketch<T> for any type and a specialized numeric
>> >>>> DoublesSketch with Direct memory support - this is the one
>> integrated with
>> >>>> Druid. We consider this algorithm replaced by KLL quantiles sketch,
>> but for
>> >>>> historical reasons in Java we have implemented KllFloatsSketch only
>> - no
>> >>>> generic implementation and no Direct memory support. This contiguous
>> memory
>> >>>> block mode is so limiting that we did not think that KLL algorithm
>> can be
>> >>>> implemented efficiently in that mode. We might need to reconsider
>> this.
>> >>>> Development is C++ happened later, so we did not implement the older
>> >>>> algorithm. We may want to do it if a strong use case arises to
>> support
>> >>>> reading data prepared using Java library. On the other hand, in C++
>> >>>> kll_sketch is a template, and therefore it can be a container of any
>> user
>> >>>> type. In Java we felt the need to have both: a generic
>> implementation and a
>> >>>> specialized implementation for some numeric type to avoid object
>> overhead.
>> >>>>
>> >>>> To summarize the situation with quantiles sketches:
>> >>>> Java: KllFloatsSketch, older quantiles ItemsSketch<T> and
>> DoublesSketch
>> >>>> C++: kll_sketch<T> (kll_sketch<float> is compatible with Java), no
>> >>>> older quantiles sketch.
>> >>>> Druid: quantiles DoublesSketch
>> >>>> PostgreSQL: kll_sketch<float>
>> >>>>
>> >>>> Regarding distinct counting sketches:
>> >>>> Druid: Theta, HLL
>> >>>> PostgreSQL: Theta, HLL, CPC
>> >>>>
>> >>>> Regarding frequent items:
>> >>>> Druid: none
>> >>>> PostgreSQL: frequent_items_sketch<std::string>
>> >>>>
>> >>>> It seems to me that we need to do something to improve the situation.
>> >>>> Possible tasks:
>> >>>> - Discuss alternatives to Direct memory mode in Druid with Druid
>> >>>> developers (again). They were not quite happy with the current
>> approach
>> >>>> either. It leads to gross overallocation of BufferAggredator memory.
>> There
>> >>>> was an open issue in Druid repo about this I believe.
>> >>>> - Find some way to have KLL sketch in Druid
>> >>>> - Find some way to have Frequent Items (frequent strings?) sketch in
>> >>>> Druid
>> >>>> - Consider KllItemsSketch<T> in Java
>> >>>> - Consider legacy quantiles_sketch<T> in C++ (and then in PostgreSQL
>> -
>> >>>> with double values, compatible with Druid)
>> >>>>
>> >>>> I would appreciate your thoughts about this.
>> >>>> Thank you.
>> >>>> Alexander Saydakov
>> >>>>
>> >>>
>>
>

Re: Cross-platform discrepancies

Posted by Himanshu <g....@gmail.com>.
Alex, Thanks for posting it here.

We have definitely thought about the ability to start with a small off-heap
buffer and grow to bigger off-heap buffers if needed, and that is what
https://github.com/apache/druid/issues/8126 is trying to address.
However, we have never considered one sketch being spread across several
different contiguous memory blocks. As you pointed out, that would likely
require the BufferAggregator interface having access to a Druid-supplied
MemoryAllocator.
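
(A minimal shape for such an allocator might look like the sketch below;
MemoryAllocator here is just an illustrative name for the idea, not an
existing Druid or DataSketches class.)

    import java.nio.ByteBuffer;

    // Purely illustrative: a Druid-supplied allocator a BufferAggregator
    // could use to obtain additional off-heap blocks beyond its slot.
    public interface MemoryAllocator
    {
      // Allocate a new off-heap region of the given size, owned by the
      // current query.
      ByteBuffer allocate(int bytes);

      // Return a region early; anything still outstanding would be
      // reclaimed when the query completes.
      void free(ByteBuffer region);
    }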

> ... but we would need a "malloc / free"-like API similar to how
PostgreSQL has implemented their aggregations API. ...
Can you point to some documentation/code describing what kind of API
Postgres provides for implementing custom aggregators?

-- Himanshu

On Wed, Apr 15, 2020 at 10:11 AM Alexander Saydakov
<sa...@verizonmedia.com.invalid> wrote:

> Let's involve Druid developers as Gian suggested.
>
>
> On Tue, Apr 14, 2020 at 4:44 PM leerho <le...@gmail.com> wrote:
>
> > Hi,
> > It is pretty easy to graphically view the various cross-platform
> > discrepancies that Alex is talking about in the Sketch Features Matrix
> > <
> https://datasketches.apache.org/docs/Architecture/SketchFeaturesMatrix.html
> >.
> > We still need to add the PostgreSQL System Integration column.
> > Nevertheless, it is easy to see that where we have implemented "Off Java
> > Heap" capability is directly correlated to our Druid integrations.
> >
> > To underscore Alex's point, many of the other sketches in this matrix
> that
> > are not currently integrated with Druid have very dynamic memory
> > requirements and may be impractical to implement in a contiguous-memory
> > model.  They could still be implemented "off-heap", but we would need a
> > "maloc / free"-like API similar to how PostgreSQL has implemented their
> > aggregations API.
> >
> > One of the conclusions from this matrix is that Druid would have a much
> > richer and more powerful sketching analytics capability to offer its
> > customers if it had a more flexible aggregations API.
> >
> >
> >
> > On Tue, Apr 14, 2020 at 1:47 PM Alexander Saydakov
> > <sa...@verizonmedia.com.invalid> wrote:
> >
> >> Gian,
> >> Thanks for your reply. I did not mean to suggest moving away from
> >> off-heap memory. As I see it, the problem mostly is with preallocating
> >> fixed blocks for each sketch of some maximum size, which, on the one
> hand,
> >> is wasteful because most sketches will be small, and on the other hand,
> >> requires operating in a "contiguous block" mode, which is very limiting.
> >> Perhaps you might consider something like PostgreSQL does: provide an
> >> allocator, so that the memory allocation is controlled and can be done
> in a
> >> context of a query or transaction, but it does not impose a single
> >> contiguous block for a data structure. So a custom aggregation function
> >> allocates memory for its own state on the first call, and PostgreSQL
> passes
> >> this state to the next call as a pointer and so on. At any moment the
> >> aggregation can choose to reallocate and return a new pointer. And that
> >> state can be a complex data structure with pointers to other blocks
> >> allocated using the same mechanism, so it does not have to be a single
> >> contiguous block.
> >>
> >> This will still require some changes to our library to support memory
> >> allocation like this, but it seems to be less challenging then the
> current
> >> Direct memory mode we have.
> >>
> >> There are some trade offs here, as usually. For instance, currently (if
> >> implemented) wrapping a chunk of bytes as a fast alternative to
> creating an
> >> on-heap object during deserialization is the same operation as
> interpreting
> >> this "contiguous block" aggregation state. But if the aggregation state
> is
> >> fragmented, it is not going to be equivalent to a serialized blob
> anymore.
> >> This is how it is in the current C++ implementation - there is no
> >> equivalent of wrap() as in Direct mode in Java. This is a potential
> >> performance improvement that can be discussed separately. Some sketches
> can
> >> choose to operate in this contiguous block mode if it does not go
> against
> >> the algorithm.
> >>
> >> On Mon, Apr 13, 2020 at 10:43 PM Gian Merlino <gi...@apache.org> wrote:
> >>
> >>> Hi Alexander,
> >>>
> >>> It sounds like integrating with Druid is an important concern here. It
> >>> might be nice to cross post the discussion to dev@druid.
> >>>
> >>> Personally, I don't think it's likely the Druid community will move
> away
> >>> from requiring that aggregators be able to work with raw memory. The
> >>> requirement exists so it can use allocate an arena for aggregation
> space,
> >>> minimizing GC load. But there have been some proposals floating around
> for
> >>> adding ability to dynamically allocate new memory (perhaps out of the
> same
> >>> arena or perhaps some other way — the proposals varied). I think it
> would
> >>> be useful for Druid devs to understand more deeply what effect the
> >>> allocator API has on the DataSketches implementation.
> >>>
> >>> (By the way, the Druid issue you're referring to might be
> >>> https://github.com/apache/druid/issues/8126.)
> >>>
> >>> On Mon, Apr 13, 2020 at 5:11 PM Alexander Saydakov
> >>> <sa...@verizonmedia.com.invalid> wrote:
> >>>
> >>>> Hi everyone!
> >>>> I would like to discuss some discrepancies we accumulated as results
> of
> >>>> some decisions around prioritizing our development efforts in Java
> and C++,
> >>>> and also around integration with data processing systems.
> >>>>
> >>>> Druid is one of the most important systems in which our library is
> >>>> used. It has a strong requirement to have off-heap memory support and
> also
> >>>> to operate in a single contiguous block. This was the main motivation
> >>>> behind what we call Direct memory support in our Java library.
> >>>>
> >>>> Currently some sketches do not support Direct memory, in particular:
> >>>> KLL quantiles sketch, Frequent Items sketch, CPC distinct counting
> sketch.
> >>>> Therefore they are not integrated with Druid yet.
> >>>>
> >>>> Quantiles sketch:
> >>>> In Java we have ItemsSketch<T> for any type and a specialized numeric
> >>>> DoublesSketch with Direct memory support - this is the one integrated
> with
> >>>> Druid. We consider this algorithm replaced by KLL quantiles sketch,
> but for
> >>>> historical reasons in Java we have implemented KllFloatsSketch only -
> no
> >>>> generic implementation and no Direct memory support. This contiguous
> memory
> >>>> block mode is so limiting that we did not think that KLL algorithm
> can be
> >>>> implemented efficiently in that mode. We might need to reconsider
> this.
> >>>> Development is C++ happened later, so we did not implement the older
> >>>> algorithm. We may want to do it if a strong use case arises to support
> >>>> reading data prepared using Java library. On the other hand, in C++
> >>>> kll_sketch is a template, and therefore it can be a container of any
> user
> >>>> type. In Java we felt the need to have both: a generic implementation
> and a
> >>>> specialized implementation for some numeric type to avoid object
> overhead.
> >>>>
> >>>> To summarize the situation with quantiles sketches:
> >>>> Java: KllFloatsSketch, older quantiles ItemsSketch<T> and
> DoublesSketch
> >>>> C++: kll_sketch<T> (kll_sketch<float> is compatible with Java), no
> >>>> older quantiles sketch.
> >>>> Druid: quantiles DoublesSketch
> >>>> PostgreSQL: kll_sketch<float>
> >>>>
> >>>> Regarding distinct counting sketches:
> >>>> Druid: Theta, HLL
> >>>> PostgreSQL: Theta, HLL, CPC
> >>>>
> >>>> Regarding frequent items:
> >>>> Druid: none
> >>>> PostgreSQL: frequent_items_sketch<std::string>
> >>>>
> >>>> It seems to me that we need to do something to improve the situation.
> >>>> Possible tasks:
> >>>> - Discuss alternatives to Direct memory mode in Druid with Druid
> >>>> developers (again). They were not quite happy with the current
> approach
> >>>> either. It leads to gross overallocation of BufferAggredator memory.
> There
> >>>> was an open issue in Druid repo about this I believe.
> >>>> - Find some way to have KLL sketch in Druid
> >>>> - Find some way to have Frequent Items (frequent strings?) sketch in
> >>>> Druid
> >>>> - Consider KllItemsSketch<T> in Java
> >>>> - Consider legacy quantiles_sketch<T> in C++ (and then in PostgreSQL -
> >>>> with double values, compatible with Druid)
> >>>>
> >>>> I would appreciate your thoughts about this.
> >>>> Thank you.
> >>>> Alexander Saydakov
> >>>>
> >>>
>

Re: Cross-platform discrepancies

Posted by Himanshu <g....@gmail.com>.
Alex, Thanks for posting it here.

We have definitely thought about ability to start with a small off-heap
buffer and grow to bigger off-heap buffers if needed and  that is what
https://github.com/apache/druid/issues/8126 is trying to address.
However, we have never considered one sketch being spread across several
different contiguous memory blocks. As you pointed out, That would likely
require BufferAggregator interface having access to Druid supplied
MemoryAllocator.

> ... but we would need a  "maloc / free"-like API similar to how
PostgreSQL has implemented their aggregations API. ...
Can you point to some documentation/code describing what kind of API does
Postgres provide for implementing custom aggregators.

-- Himanshu

On Wed, Apr 15, 2020 at 10:11 AM Alexander Saydakov
<sa...@verizonmedia.com.invalid> wrote:

> Let's involve Druid developers as Gian suggested.
>
>
> On Tue, Apr 14, 2020 at 4:44 PM leerho <le...@gmail.com> wrote:
>
> > Hi,
> > It is pretty easy to graphically view the various cross-platform
> > discrepancies that Alex is talking about in the Sketch Features Matrix
> > <
> https://datasketches.apache.org/docs/Architecture/SketchFeaturesMatrix.html
> >.
> > We still need to add the PostgreSQL System Integration column.
> > Nevertheless, it is easy to see that where we have implemented "Off Java
> > Heap" capability is directly correlated to our Druid integrations.
> >
> > To underscore Alex's point, many of the other sketches in this matrix
> that
> > are not currently integrated with Druid have very dynamic memory
> > requirements and may be impractical to implement in a contiguous-memory
> > model.  They could still be implemented "off-heap", but we would need a
> > "maloc / free"-like API similar to how PostgreSQL has implemented their
> > aggregations API.
> >
> > One of the conclusions from this matrix is that Druid would have a much
> > richer and more powerful sketching analytics capability to offer its
> > customers if it had a more flexible aggregations API.
> >
> >
> >
> > On Tue, Apr 14, 2020 at 1:47 PM Alexander Saydakov
> > <sa...@verizonmedia.com.invalid> wrote:
> >
> >> Gian,
> >> Thanks for your reply. I did not mean to suggest moving away from
> >> off-heap memory. As I see it, the problem mostly is with preallocating
> >> fixed blocks for each sketch of some maximum size, which, on the one
> hand,
> >> is wasteful because most sketches will be small, and on the other hand,
> >> requires operating in a "contiguous block" mode, which is very limiting.
> >> Perhaps you might consider something like PostgreSQL does: provide an
> >> allocator, so that the memory allocation is controlled and can be done
> in a
> >> context of a query or transaction, but it does not impose a single
> >> contiguous block for a data structure. So a custom aggregation function
> >> allocates memory for its own state on the first call, and PostgreSQL
> passes
> >> this state to the next call as a pointer and so on. At any moment the
> >> aggregation can choose to reallocate and return a new pointer. And that
> >> state can be a complex data structure with pointers to other blocks
> >> allocated using the same mechanism, so it does not have to be a single
> >> contiguous block.
> >>
> >> This will still require some changes to our library to support memory
> >> allocation like this, but it seems to be less challenging than the
> current
> >> Direct memory mode we have.
> >>
> >> There are some trade-offs here, as usual. For instance, currently (if
> >> implemented) wrapping a chunk of bytes as a fast alternative to
> creating an
> >> on-heap object during deserialization is the same operation as
> interpreting
> >> this "contiguous block" aggregation state. But if the aggregation state
> is
> >> fragmented, it is not going to be equivalent to a serialized blob
> anymore.
> >> This is how it is in the current C++ implementation - there is no
> >> equivalent of wrap() as in Direct mode in Java. This is a potential
> >> performance improvement that can be discussed separately. Some sketches
> can
> >> choose to operate in this contiguous block mode if it does not go
> against
> >> the algorithm.
> >>
> >> On Mon, Apr 13, 2020 at 10:43 PM Gian Merlino <gi...@apache.org> wrote:
> >>
> >>> Hi Alexander,
> >>>
> >>> It sounds like integrating with Druid is an important concern here. It
> >>> might be nice to cross post the discussion to dev@druid.
> >>>
> >>> Personally, I don't think it's likely the Druid community will move
> away
> >>> from requiring that aggregators be able to work with raw memory. The
> >>> requirement exists so it can allocate an arena for aggregation
> space,
> >>> minimizing GC load. But there have been some proposals floating around
> for
> >>> adding ability to dynamically allocate new memory (perhaps out of the
> same
> >>> arena or perhaps some other way — the proposals varied). I think it
> would
> >>> be useful for Druid devs to understand more deeply what effect the
> >>> allocator API has on the DataSketches implementation.
> >>>
> >>> (By the way, the Druid issue you're referring to might be
> >>> https://github.com/apache/druid/issues/8126.)
> >>>
> >>> On Mon, Apr 13, 2020 at 5:11 PM Alexander Saydakov
> >>> <sa...@verizonmedia.com.invalid> wrote:
> >>>
> >>>> Hi everyone!
> >>>> I would like to discuss some discrepancies we accumulated as results
> of
> >>>> some decisions around prioritizing our development efforts in Java
> and C++,
> >>>> and also around integration with data processing systems.
> >>>>
> >>>> Druid is one of the most important systems in which our library is
> >>>> used. It has a strong requirement to have off-heap memory support and
> also
> >>>> to operate in a single contiguous block. This was the main motivation
> >>>> behind what we call Direct memory support in our Java library.
> >>>>
> >>>> Currently some sketches do not support Direct memory, in particular:
> >>>> KLL quantiles sketch, Frequent Items sketch, CPC distinct counting
> sketch.
> >>>> Therefore they are not integrated with Druid yet.
> >>>>
> >>>> Quantiles sketch:
> >>>> In Java we have ItemsSketch<T> for any type and a specialized numeric
> >>>> DoublesSketch with Direct memory support - this is the one integrated
> with
> >>>> Druid. We consider this algorithm replaced by KLL quantiles sketch,
> but for
> >>>> historical reasons in Java we have implemented KllFloatsSketch only -
> no
> >>>> generic implementation and no Direct memory support. This contiguous
> memory
> >>>> block mode is so limiting that we did not think that KLL algorithm
> can be
> >>>> implemented efficiently in that mode. We might need to reconsider
> this.
> >>>> Development in C++ happened later, so we did not implement the older
> >>>> algorithm. We may want to do it if a strong use case arises to support
> >>>> reading data prepared using Java library. On the other hand, in C++
> >>>> kll_sketch is a template, and therefore it can be a container of any
> user
> >>>> type. In Java we felt the need to have both: a generic implementation
> and a
> >>>> specialized implementation for some numeric type to avoid object
> overhead.
> >>>>
> >>>> To summarize the situation with quantiles sketches:
> >>>> Java: KllFloatsSketch, older quantiles ItemsSketch<T> and
> DoublesSketch
> >>>> C++: kll_sketch<T> (kll_sketch<float> is compatible with Java), no
> >>>> older quantiles sketch.
> >>>> Druid: quantiles DoublesSketch
> >>>> PostgreSQL: kll_sketch<float>
> >>>>
> >>>> Regarding distinct counting sketches:
> >>>> Druid: Theta, HLL
> >>>> PostgreSQL: Theta, HLL, CPC
> >>>>
> >>>> Regarding frequent items:
> >>>> Druid: none
> >>>> PostgreSQL: frequent_items_sketch<std::string>
> >>>>
> >>>> It seems to me that we need to do something to improve the situation.
> >>>> Possible tasks:
> >>>> - Discuss alternatives to Direct memory mode in Druid with Druid
> >>>> developers (again). They were not quite happy with the current
> approach
> >>>> either. It leads to gross overallocation of BufferAggregator memory.
> There
> >>>> was an open issue in Druid repo about this I believe.
> >>>> - Find some way to have KLL sketch in Druid
> >>>> - Find some way to have Frequent Items (frequent strings?) sketch in
> >>>> Druid
> >>>> - Consider KllItemsSketch<T> in Java
> >>>> - Consider legacy quantiles_sketch<T> in C++ (and then in PostgreSQL -
> >>>> with double values, compatible with Druid)
> >>>>
> >>>> I would appreciate your thoughts about this.
> >>>> Thank you.
> >>>> Alexander Saydakov
> >>>>
> >>>
>

Re: Cross-platform discrepancies

Posted by Alexander Saydakov <sa...@verizonmedia.com.INVALID>.
Let's involve Druid developers as Gian suggested.


On Tue, Apr 14, 2020 at 4:44 PM leerho <le...@gmail.com> wrote:

> Hi,
> It is pretty easy to graphically view the various cross-platform
> discrepancies that Alex is talking about in the Sketch Features Matrix
> <https://datasketches.apache.org/docs/Architecture/SketchFeaturesMatrix.html>.
> We still need to add the PostgreSQL System Integration column.
> Nevertheless, it is easy to see that where we have implemented "Off Java
> Heap" capability is directly correlated to our Druid integrations.
>
> To underscore Alex's point, many of the other sketches in this matrix that
> are not currently integrated with Druid have very dynamic memory
> requirements and may be impractical to implement in a contiguous-memory
> model.  They could still be implemented "off-heap", but we would need a
> "maloc / free"-like API similar to how PostgreSQL has implemented their
> aggregations API.
>
> One of the conclusions from this matrix is that Druid would have a much
> richer and more powerful sketching analytics capability to offer its
> customers if it had a more flexible aggregations API.
>
>
>
> On Tue, Apr 14, 2020 at 1:47 PM Alexander Saydakov
> <sa...@verizonmedia.com.invalid> wrote:
>
>> Gian,
>> Thanks for your reply. I did not mean to suggest moving away from
>> off-heap memory. As I see it, the problem mostly is with preallocating
>> fixed blocks for each sketch of some maximum size, which, on the one hand,
>> is wasteful because most sketches will be small, and on the other hand,
>> requires operating in a "contiguous block" mode, which is very limiting.
>> Perhaps you might consider something like PostgreSQL does: provide an
>> allocator, so that the memory allocation is controlled and can be done in a
>> context of a query or transaction, but it does not impose a single
>> contiguous block for a data structure. So a custom aggregation function
>> allocates memory for its own state on the first call, and PostgreSQL passes
>> this state to the next call as a pointer and so on. At any moment the
>> aggregation can choose to reallocate and return a new pointer. And that
>> state can be a complex data structure with pointers to other blocks
>> allocated using the same mechanism, so it does not have to be a single
>> contiguous block.
>>
>> This will still require some changes to our library to support memory
>> allocation like this, but it seems to be less challenging than the current
>> Direct memory mode we have.
>>
>> There are some trade-offs here, as usual. For instance, currently (if
>> implemented) wrapping a chunk of bytes as a fast alternative to creating an
>> on-heap object during deserialization is the same operation as interpreting
>> this "contiguous block" aggregation state. But if the aggregation state is
>> fragmented, it is not going to be equivalent to a serialized blob anymore.
>> This is how it is in the current C++ implementation - there is no
>> equivalent of wrap() as in Direct mode in Java. This is a potential
>> performance improvement that can be discussed separately. Some sketches can
>> choose to operate in this contiguous block mode if it does not go against
>> the algorithm.
>>
>> On Mon, Apr 13, 2020 at 10:43 PM Gian Merlino <gi...@apache.org> wrote:
>>
>>> Hi Alexander,
>>>
>>> It sounds like integrating with Druid is an important concern here. It
>>> might be nice to cross post the discussion to dev@druid.
>>>
>>> Personally, I don't think it's likely the Druid community will move away
>>> from requiring that aggregators be able to work with raw memory. The
>>> requirement exists so it can allocate an arena for aggregation space,
>>> minimizing GC load. But there have been some proposals floating around for
>>> adding ability to dynamically allocate new memory (perhaps out of the same
>>> arena or perhaps some other way — the proposals varied). I think it would
>>> be useful for Druid devs to understand more deeply what effect the
>>> allocator API has on the DataSketches implementation.
>>>
>>> (By the way, the Druid issue you're referring to might be
>>> https://github.com/apache/druid/issues/8126.)
>>>
>>> On Mon, Apr 13, 2020 at 5:11 PM Alexander Saydakov
>>> <sa...@verizonmedia.com.invalid> wrote:
>>>
>>>> Hi everyone!
>>>> I would like to discuss some discrepancies we accumulated as results of
>>>> some decisions around prioritizing our development efforts in Java and C++,
>>>> and also around integration with data processing systems.
>>>>
>>>> Druid is one of the most important systems in which our library is
>>>> used. It has a strong requirement to have off-heap memory support and also
>>>> to operate in a single contiguous block. This was the main motivation
>>>> behind what we call Direct memory support in our Java library.
>>>>
>>>> Currently some sketches do not support Direct memory, in particular:
>>>> KLL quantiles sketch, Frequent Items sketch, CPC distinct counting sketch.
>>>> Therefore they are not integrated with Druid yet.
>>>>
>>>> Quantiles sketch:
>>>> In Java we have ItemsSketch<T> for any type and a specialized numeric
>>>> DoublesSketch with Direct memory support - this is the one integrated with
>>>> Druid. We consider this algorithm replaced by KLL quantiles sketch, but for
>>>> historical reasons in Java we have implemented KllFloatsSketch only - no
>>>> generic implementation and no Direct memory support. This contiguous memory
>>>> block mode is so limiting that we did not think that KLL algorithm can be
>>>> implemented efficiently in that mode. We might need to reconsider this.
>>>> Development in C++ happened later, so we did not implement the older
>>>> algorithm. We may want to do it if a strong use case arises to support
>>>> reading data prepared using Java library. On the other hand, in C++
>>>> kll_sketch is a template, and therefore it can be a container of any user
>>>> type. In Java we felt the need to have both: a generic implementation and a
>>>> specialized implementation for some numeric type to avoid object overhead.
>>>>
>>>> To summarize the situation with quantiles sketches:
>>>> Java: KllFloatsSketch, older quantiles ItemsSketch<T> and DoublesSketch
>>>> C++: kll_sketch<T> (kll_sketch<float> is compatible with Java), no
>>>> older quantiles sketch.
>>>> Druid: quantiles DoublesSketch
>>>> PostgreSQL: kll_sketch<float>
>>>>
>>>> Regarding distinct counting sketches:
>>>> Druid: Theta, HLL
>>>> PostgreSQL: Theta, HLL, CPC
>>>>
>>>> Regarding frequent items:
>>>> Druid: none
>>>> PostgreSQL: frequent_items_sketch<std::string>
>>>>
>>>> It seems to me that we need to do something to improve the situation.
>>>> Possible tasks:
>>>> - Discuss alternatives to Direct memory mode in Druid with Druid
>>>> developers (again). They were not quite happy with the current approach
>>>> either. It leads to gross overallocation of BufferAggregator memory. There
>>>> was an open issue in Druid repo about this I believe.
>>>> - Find some way to have KLL sketch in Druid
>>>> - Find some way to have Frequent Items (frequent strings?) sketch in
>>>> Druid
>>>> - Consider KllItemsSketch<T> in Java
>>>> - Consider legacy quantiles_sketch<T> in C++ (and then in PostgreSQL -
>>>> with double values, compatible with Druid)
>>>>
>>>> I would appreciate your thoughts about this.
>>>> Thank you.
>>>> Alexander Saydakov
>>>>
>>>

Re: Cross-platform discrepancies

Posted by leerho <le...@gmail.com>.
Hi,
It is pretty easy to graphically view the various cross-platform
discrepancies that Alex is talking about in the Sketch Features Matrix
<https://datasketches.apache.org/docs/Architecture/SketchFeaturesMatrix.html>.
We still need to add the PostgreSQL System Integration column.
Nevertheless, it is easy to see that where we have implemented "Off Java
Heap" capability correlates directly with our Druid integrations.

To underscore Alex's point, many of the other sketches in this matrix that
are not currently integrated with Druid have very dynamic memory
requirements and may be impractical to implement in a contiguous-memory
model.  They could still be implemented "off-heap", but we would need a
"maloc / free"-like API similar to how PostgreSQL has implemented their
aggregations API.

One of the conclusions from this matrix is that Druid would have a much
richer and more powerful sketching analytics capability to offer its
customers if it had a more flexible aggregations API.



On Tue, Apr 14, 2020 at 1:47 PM Alexander Saydakov
<sa...@verizonmedia.com.invalid> wrote:

> Gian,
> Thanks for your reply. I did not mean to suggest moving away from off-heap
> memory. As I see it, the problem mostly is with preallocating fixed blocks
> for each sketch of some maximum size, which, on the one hand, is wasteful
> because most sketches will be small, and on the other hand, requires
> operating in a "contiguous block" mode, which is very limiting.
> Perhaps you might consider something like PostgreSQL does: provide an
> allocator, so that the memory allocation is controlled and can be done in a
> context of a query or transaction, but it does not impose a single
> contiguous block for a data structure. So a custom aggregation function
> allocates memory for its own state on the first call, and PostgreSQL passes
> this state to the next call as a pointer and so on. At any moment the
> aggregation can choose to reallocate and return a new pointer. And that
> state can be a complex data structure with pointers to other blocks
> allocated using the same mechanism, so it does not have to be a single
> contiguous block.
>
> This will still require some changes to our library to support memory
> allocation like this, but it seems to be less challenging than the current
> Direct memory mode we have.
>
> There are some trade-offs here, as usual. For instance, currently (if
> implemented) wrapping a chunk of bytes as a fast alternative to creating an
> on-heap object during deserialization is the same operation as interpreting
> this "contiguous block" aggregation state. But if the aggregation state is
> fragmented, it is not going to be equivalent to a serialized blob anymore.
> This is how it is in the current C++ implementation - there is no
> equivalent of wrap() as in Direct mode in Java. This is a potential
> performance improvement that can be discussed separately. Some sketches can
> choose to operate in this contiguous block mode if it does not go against
> the algorithm.
>
> On Mon, Apr 13, 2020 at 10:43 PM Gian Merlino <gi...@apache.org> wrote:
>
>> Hi Alexander,
>>
>> It sounds like integrating with Druid is an important concern here. It
>> might be nice to cross post the discussion to dev@druid.
>>
>> Personally, I don't think it's likely the Druid community will move away
>> from requiring that aggregators be able to work with raw memory. The
>> requirement exists so it can allocate an arena for aggregation space,
>> minimizing GC load. But there have been some proposals floating around for
>> adding ability to dynamically allocate new memory (perhaps out of the same
>> arena or perhaps some other way — the proposals varied). I think it would
>> be useful for Druid devs to understand more deeply what effect the
>> allocator API has on the DataSketches implementation.
>>
>> (By the way, the Druid issue you're referring to might be
>> https://github.com/apache/druid/issues/8126.)
>>
>> On Mon, Apr 13, 2020 at 5:11 PM Alexander Saydakov
>> <sa...@verizonmedia.com.invalid> wrote:
>>
>>> Hi everyone!
>>> I would like to discuss some discrepancies we accumulated as results of
>>> some decisions around prioritizing our development efforts in Java and C++,
>>> and also around integration with data processing systems.
>>>
>>> Druid is one of the most important systems in which our library is used.
>>> It has a strong requirement to have off-heap memory support and also to
>>> operate in a single contiguous block. This was the main motivation behind
>>> what we call Direct memory support in our Java library.
>>>
>>> Currently some sketches do not support Direct memory, in particular: KLL
>>> quantiles sketch, Frequent Items sketch, CPC distinct counting sketch.
>>> Therefore they are not integrated with Druid yet.
>>>
>>> Quantiles sketch:
>>> In Java we have ItemsSketch<T> for any type and a specialized numeric
>>> DoublesSketch with Direct memory support - this is the one integrated with
>>> Druid. We consider this algorithm replaced by KLL quantiles sketch, but for
>>> historical reasons in Java we have implemented KllFloatsSketch only - no
>>> generic implementation and no Direct memory support. This contiguous memory
>>> block mode is so limiting that we did not think that KLL algorithm can be
>>> implemented efficiently in that mode. We might need to reconsider this.
>>> Development in C++ happened later, so we did not implement the older
>>> algorithm. We may want to do it if a strong use case arises to support
>>> reading data prepared using Java library. On the other hand, in C++
>>> kll_sketch is a template, and therefore it can be a container of any user
>>> type. In Java we felt the need to have both: a generic implementation and a
>>> specialized implementation for some numeric type to avoid object overhead.
>>>
>>> To summarize the situation with quantiles sketches:
>>> Java: KllFloatsSketch, older quantiles ItemsSketch<T> and DoublesSketch
>>> C++: kll_sketch<T> (kll_sketch<float> is compatible with Java), no older
>>> quantiles sketch.
>>> Druid: quantiles DoublesSketch
>>> PostgreSQL: kll_sketch<float>
>>>
>>> Regarding distinct counting sketches:
>>> Druid: Theta, HLL
>>> PostgreSQL: Theta, HLL, CPC
>>>
>>> Regarding frequent items:
>>> Druid: none
>>> PostgreSQL: frequent_items_sketch<std::string>
>>>
>>> It seems to me that we need to do something to improve the situation.
>>> Possible tasks:
>>> - Discuss alternatives to Direct memory mode in Druid with Druid
>>> developers (again). They were not quite happy with the current approach
>>> either. It leads to gross overallocation of BufferAggregator memory. There
>>> was an open issue in Druid repo about this I believe.
>>> - Find some way to have KLL sketch in Druid
>>> - Find some way to have Frequent Items (frequent strings?) sketch in
>>> Druid
>>> - Consider KllItemsSketch<T> in Java
>>> - Consider legacy quantiles_sketch<T> in C++ (and then in PostgreSQL -
>>> with double values, compatible with Druid)
>>>
>>> I would appreciate your thoughts about this.
>>> Thank you.
>>> Alexander Saydakov
>>>
>>

Re: Cross-platform discrepancies

Posted by Alexander Saydakov <sa...@verizonmedia.com.INVALID>.
Gian,
Thanks for your reply. I did not mean to suggest moving away from off-heap
memory. As I see it, the problem mostly is with preallocating fixed blocks
for each sketch of some maximum size, which, on the one hand, is wasteful
because most sketches will be small, and on the other hand, requires
operating in a "contiguous block" mode, which is very limiting.
Perhaps you might consider something like what PostgreSQL does: provide an
allocator, so that the memory allocation is controlled and can be done in the
context of a query or transaction, but it does not impose a single
contiguous block for a data structure. So a custom aggregation function
allocates memory for its own state on the first call, and PostgreSQL passes
this state to the next call as a pointer and so on. At any moment the
aggregation can choose to reallocate and return a new pointer. And that
state can be a complex data structure with pointers to other blocks
allocated using the same mechanism, so it does not have to be a single
contiguous block.
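
Roughly, the shape of that contract, translated into Java for illustration
(all names here are invented; the real PostgreSQL API is C, with allocation
done in the aggregate's memory context):

// Java-flavored sketch of the PostgreSQL aggregation pattern; all names are
// invented for illustration only. The engine owns the allocator, and the
// aggregate owns an opaque state handle that it may reallocate on any call.
interface QueryScopedAllocator {
  long allocate(int bytes); // returns a handle to memory owned by the query context
  void free(long handle);
}

interface AggregateFunction<IN, OUT> {
  /** Called per input value; state == 0 means "first call, allocate your state".
      May return a different handle if the state was reallocated. */
  long accumulate(long state, IN value, QueryScopedAllocator allocator);

  /** Called once at the end to produce the result. */
  OUT finish(long state, QueryScopedAllocator allocator);
}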

This will still require some changes to our library to support memory
allocation like this, but it seems to be less challenging than the current
Direct memory mode we have.

There are some trade-offs here, as usual. For instance, currently (where
implemented) wrapping a chunk of bytes - as a fast alternative to creating an
on-heap object during deserialization - is the same operation as interpreting
this "contiguous block" aggregation state. But if the aggregation state is
fragmented, it is no longer equivalent to a serialized blob. This is how it
is in the current C++ implementation - there is no equivalent of wrap() as in
Direct mode in Java. This is a potential performance improvement that can be
discussed separately. Some sketches can choose to operate in this contiguous
block mode if it does not go against the algorithm.
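
To illustrate what I mean by wrap() - a minimal sketch, assuming the current
datasketches-java and datasketches-memory APIs:

import org.apache.datasketches.memory.Memory;
import org.apache.datasketches.quantiles.DoublesSketch;
import org.apache.datasketches.quantiles.UpdateDoublesSketch;

public class WrapVsHeapify {
  public static void main(String[] args) {
    UpdateDoublesSketch s = DoublesSketch.builder().build();
    for (int i = 0; i < 10000; i++) { s.update(i); }
    byte[] blob = s.toByteArray(); // one contiguous serialized image

    // heapify: copies the image into a new on-heap sketch object
    DoublesSketch onHeap = DoublesSketch.heapify(Memory.wrap(blob));

    // wrap: interprets the same contiguous bytes in place, no copy
    DoublesSketch direct = DoublesSketch.wrap(Memory.wrap(blob));

    System.out.println(onHeap.getQuantile(0.5) + " vs " + direct.getQuantile(0.5));
  }
}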

On Mon, Apr 13, 2020 at 10:43 PM Gian Merlino <gi...@apache.org> wrote:

> Hi Alexander,
>
> It sounds like integrating with Druid is an important concern here. It
> might be nice to cross post the discussion to dev@druid.
>
> Personally, I don't think it's likely the Druid community will move away
> from requiring that aggregators be able to work with raw memory. The
> requirement exists so it can allocate an arena for aggregation space,
> minimizing GC load. But there have been some proposals floating around for
> adding ability to dynamically allocate new memory (perhaps out of the same
> arena or perhaps some other way — the proposals varied). I think it would
> be useful for Druid devs to understand more deeply what effect the
> allocator API has on the DataSketches implementation.
>
> (By the way, the Druid issue you're referring to might be
> https://github.com/apache/druid/issues/8126.)
>
> On Mon, Apr 13, 2020 at 5:11 PM Alexander Saydakov
> <sa...@verizonmedia.com.invalid> wrote:
>
>> Hi everyone!
>> I would like to discuss some discrepancies we accumulated as results of
>> some decisions around prioritizing our development efforts in Java and C++,
>> and also around integration with data processing systems.
>>
>> Druid is one of the most important systems in which our library is used.
>> It has a strong requirement to have off-heap memory support and also to
>> operate in a single contiguous block. This was the main motivation behind
>> what we call Direct memory support in our Java library.
>>
>> Currently some sketches do not support Direct memory, in particular: KLL
>> quantiles sketch, Frequent Items sketch, CPC distinct counting sketch.
>> Therefore they are not integrated with Druid yet.
>>
>> Quantiles sketch:
>> In Java we have ItemsSketch<T> for any type and a specialized numeric
>> DoublesSketch with Direct memory support - this is the one integrated with
>> Druid. We consider this algorithm replaced by KLL quantiles sketch, but for
>> historical reasons in Java we have implemented KllFloatsSketch only - no
>> generic implementation and no Direct memory support. This contiguous memory
>> block mode is so limiting that we did not think that KLL algorithm can be
>> implemented efficiently in that mode. We might need to reconsider this.
>> Development in C++ happened later, so we did not implement the older
>> algorithm. We may want to do it if a strong use case arises to support
>> reading data prepared using Java library. On the other hand, in C++
>> kll_sketch is a template, and therefore it can be a container of any user
>> type. In Java we felt the need to have both: a generic implementation and a
>> specialized implementation for some numeric type to avoid object overhead.
>>
>> To summarize the situation with quantiles sketches:
>> Java: KllFloatsSketch, older quantiles ItemsSketch<T> and DoublesSketch
>> C++: kll_sketch<T> (kll_sketch<float> is compatible with Java), no older
>> quantiles sketch.
>> Druid: quantiles DoublesSketch
>> PostgreSQL: kll_sketch<float>
>>
>> Regarding distinct counting sketches:
>> Druid: Theta, HLL
>> PostgreSQL: Theta, HLL, CPC
>>
>> Regarding frequent items:
>> Druid: none
>> PostgreSQL: frequent_items_sketch<std::string>
>>
>> It seems to me that we need to do something to improve the situation.
>> Possible tasks:
>> - Discuss alternatives to Direct memory mode in Druid with Druid
>> developers (again). They were not quite happy with the current approach
>> either. It leads to gross overallocation of BufferAggregator memory. There
>> was an open issue in Druid repo about this I believe.
>> - Find some way to have KLL sketch in Druid
>> - Find some way to have Frequent Items (frequent strings?) sketch in Druid
>> - Consider KllItemsSketch<T> in Java
>> - Consider legacy quantiles_sketch<T> in C++ (and then in PostgreSQL -
>> with double values, compatible with Druid)
>>
>> I would appreciate your thoughts about this.
>> Thank you.
>> Alexander Saydakov
>>
>

Re: Cross-platform discrepancies

Posted by Gian Merlino <gi...@apache.org>.
Hi Alexander,

It sounds like integrating with Druid is an important concern here. It
might be nice to cross post the discussion to dev@druid.

Personally, I don't think it's likely the Druid community will move away
from requiring that aggregators be able to work with raw memory. The
requirement exists so it can allocate an arena for aggregation space,
minimizing GC load. But there have been some proposals floating around for
adding the ability to dynamically allocate new memory (perhaps out of the same
arena or perhaps some other way — the proposals varied). I think it would
be useful for Druid devs to understand more deeply what effect the
allocator API has on the DataSketches implementation.
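
For anyone not familiar with the current contract, it is roughly this shape
(paraphrased from memory, not exact Druid source): each aggregator works in
place on a fixed-size slice of a shared off-heap ByteBuffer, with the slice
size declared up front by the corresponding factory.

// Paraphrased shape of the current raw-memory contract; not exact Druid
// source. Each aggregator works in place on a fixed-size slice of a shared
// off-heap ByteBuffer, identified by (buf, position).
import java.nio.ByteBuffer;

public interface RawMemoryAggregator {
  void init(ByteBuffer buf, int position);      // initialize state in place
  void aggregate(ByteBuffer buf, int position); // fold in the current row
  Object get(ByteBuffer buf, int position);     // read out the current state
  void close();                                 // release any on-heap helpers
}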

(By the way, the Druid issue you're referring to might be
https://github.com/apache/druid/issues/8126.)

On Mon, Apr 13, 2020 at 5:11 PM Alexander Saydakov
<sa...@verizonmedia.com.invalid> wrote:

> Hi everyone!
> I would like to discuss some discrepancies we accumulated as results of
> some decisions around prioritizing our development efforts in Java and C++,
> and also around integration with data processing systems.
>
> Druid is one of the most important systems in which our library is used.
> It has a strong requirement to have off-heap memory support and also to
> operate in a single contiguous block. This was the main motivation behind
> what we call Direct memory support in our Java library.
>
> Currently some sketches do not support Direct memory, in particular: KLL
> quantiles sketch, Frequent Items sketch, CPC distinct counting sketch.
> Therefore they are not integrated with Druid yet.
>
> Quantiles sketch:
> In Java we have ItemsSketch<T> for any type and a specialized numeric
> DoublesSketch with Direct memory support - this is the one integrated with
> Druid. We consider this algorithm replaced by KLL quantiles sketch, but for
> historical reasons in Java we have implemented KllFloatsSketch only - no
> generic implementation and no Direct memory support. This contiguous memory
> block mode is so limiting that we did not think that KLL algorithm can be
> implemented efficiently in that mode. We might need to reconsider this.
> Development in C++ happened later, so we did not implement the older
> algorithm. We may want to do it if a strong use case arises to support
> reading data prepared using Java library. On the other hand, in C++
> kll_sketch is a template, and therefore it can be a container of any user
> type. In Java we felt the need to have both: a generic implementation and a
> specialized implementation for some numeric type to avoid object overhead.
>
> To summarize the situation with quantiles sketches:
> Java: KllFloatsSketch, older quantiles ItemsSketch<T> and DoublesSketch
> C++: kll_sketch<T> (kll_sketch<float> is compatible with Java), no older
> quantiles sketch.
> Druid: quantiles DoublesSketch
> PostgreSQL: kll_sketch<float>
>
> Regarding distinct counting sketches:
> Druid: Theta, HLL
> PostgreSQL: Theta, HLL, CPC
>
> Regarding frequent items:
> Druid: none
> PostgreSQL: frequent_items_sketch<std::string>
>
> It seems to me that we need to do something to improve the situation.
> Possible tasks:
> - Discuss alternatives to Direct memory mode in Druid with Druid
> developers (again). They were not quite happy with the current approach
> either. It leads to gross overallocation of BufferAggregator memory. There
> was an open issue in Druid repo about this I believe.
> - Find some way to have KLL sketch in Druid
> - Find some way to have Frequent Items (frequent strings?) sketch in Druid
> - Consider KllItemsSketch<T> in Java
> - Consider legacy quantiles_sketch<T> in C++ (and then in PostgreSQL -
> with double values, compatible with Druid)
>
> I would appreciate your thoughts about this.
> Thank you.
> Alexander Saydakov
>