Posted to users@datasketches.apache.org by Andy Dang <na...@gmail.com> on 2020/09/16 00:59:21 UTC

Memory usage of frequent items datasketches-cpp package

Hi,

I was running some benchmarks with the CPP package and noticed some
strange memory behavior: the memory seems to increase linearly with the
item size when using size 32 or 64. The notebook is at
https://suspicious-bassi-380e27.netlify.app/

Is this the expected behavior?

- Andy

Re: [E] Re: Memory usage of frequent items datasketches-cpp package

Posted by Alexander Saydakov <sa...@verizonmedia.com>.
You can control the sketch size, which for the Frequent Items sketch is the
maximum size of the hash table. It is never exceeded. The threshold for
purging is at 3/4 of that size, I believe. Purging discards approximately
half of the entries, so the hash table oscillates between 1/4 and 3/4 of
the given size.
However, there might be some temporary allocations in the process of
purging (I am not sure, I need to look at the implementation), and entries
can have variable size depending on the type of items you are using.
How did you pick your sketch size? You should start from the relative
frequency of the items you would like to capture. Say you want to capture
items as heavy as 1% of the input: a sketch of size 512 (lgK = 9) should do.
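
The sizing advice above can be sketched in a few lines of plain Python.
This is only an illustration, not library code: the error factor of 3.5 is
an assumption that is not stated in this thread, the 3/4 purge threshold is
the value recalled above, and the function name is hypothetical.

```python
import math

# Assumption (not from this thread): the estimate error is roughly
# ERROR_FACTOR / map_size of the total input weight.
ERROR_FACTOR = 3.5
# Purge threshold as recalled in the message above ("3/4, I believe").
PURGE_LOAD = 0.75


def suggest_lg_size(min_relative_frequency: float) -> int:
    """Return log2 of the smallest power-of-two table size whose assumed
    error stays below the given relative frequency."""
    if not 0.0 < min_relative_frequency <= 1.0:
        raise ValueError("relative frequency must be in (0, 1]")
    min_entries = math.ceil(ERROR_FACTOR / min_relative_frequency)
    # Round up to the next power of two and return its exponent.
    return max(1, (min_entries - 1).bit_length())


lg_k = suggest_lg_size(0.01)           # items as heavy as 1% of the input
map_size = 1 << lg_k                   # maximum hash table size
purge_at = int(PURGE_LOAD * map_size)  # entries held before a purge
print(lg_k, map_size, purge_at)        # 9 512 384
```

Under these assumptions a 1% target gives lgK = 9, which matches the size
512 suggested above.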

Re: [E] Re: Memory usage of frequent items datasketches-cpp package

Posted by Andy Dang <na...@gmail.com>.
We're running the package in a memory-tight environment with a hard memory
limit for the process, and we would like to minimize the memory overhead.
We were wondering whether that is possible.

- Andy


Re: [E] Re: Memory usage of frequent items datasketches-cpp package

Posted by Alexander Saydakov <sa...@verizonmedia.com>.
Why would you trigger a compaction?

Re: Memory usage of frequent items datasketches-cpp package

Posted by Andy Dang <na...@gmail.com>.
Scrap this. Coming from the JVM library I embarrassingly misunderstood the
size parameter in the Python API (in Java you give the actual size, in
Python you give the log 2 of the size).
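
To make the mix-up concrete, here is a small sketch in plain Python (no
DataSketches calls; the helper name is hypothetical). It converts an actual
table size, as the Java API expects, into the log2 value the Python API
expects, and shows what 32 or 64 means when it is read as a log2 value.

```python
def lg_from_map_size(map_size: int) -> int:
    """Convert an actual table size (Java-style argument) into its
    base-2 logarithm (Python-style argument)."""
    if map_size < 1 or map_size & (map_size - 1):
        raise ValueError("map size must be a power of two")
    return map_size.bit_length() - 1


for size in (32, 64):
    print(f"size {size}: intended lg = {lg_from_map_size(size)}, "
          f"but lg = {size} allows up to {1 << size} entries")
```

With a limit that large the table would never fill up and purge in a
benchmark, so memory would be expected to keep growing with the input.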

On the other hand, is it possible to trigger a compaction explicitly, or
is that not supported?

- Andy

