Posted to dev@ignite.apache.org by Andrey Mashenkov <an...@gmail.com> on 2017/01/21 12:26:27 UTC

Optimize integer sets.

Hi guys,

Alexei Scherbakov filed a ticket some time ago [1]. The solution looks
promising.

Alexei, you wrote that this can save some memory. Moreover, replacing a
linked Set structure with an array-based bit set can give a speed-up,
because array-based structures are cache-friendly.

But one thing is not clear to me: how will we handle sparse bit sets? For
example, suppose we have 1024 partitions (the default) and many nodes,
e.g. 512. In this case, a bit set will occupy 256 bytes, which seems to be
more than a Set<Integer>.

What exactly do you mean: using a plain bit set as a more compact
structure than Set<Integer>, or a bit set with some additional compression?

I think that in cases like this we could use a hash set with open
addressing, to get the benefit of array-based structures over linked
structures and save memory.
For example, we could use such a hash set for small data (64 bytes, the
cache line size) and use bit sets for bigger data, if that's possible, of
course.


Thoughts?

[1] https://issues.apache.org/jira/browse/IGNITE-4554
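
The open-addressing idea from the message above can be sketched roughly as
follows. This is only an illustration, not code from Ignite or from the
ticket: the class name IntOpenHashSet, the 0.5 load factor and the -1
sentinel are assumptions, and removal is left out to keep it short. All
values live in one int[] table, so a 16-slot table is 64 bytes of payload
and lookups probe neighbouring slots instead of following node links.

```java
import java.util.Arrays;

/** Hypothetical open-addressing set for non-negative ints such as partition ids. */
final class IntOpenHashSet {
    // Sentinel for a free slot; this works only because stored values are >= 0.
    private static final int EMPTY = -1;

    private int[] slots;
    private int size;

    IntOpenHashSet(int expectedSize) {
        // Power-of-two capacity, at least twice the expected size (load factor <= 0.5).
        int cap = 4;
        while (cap < expectedSize * 2)
            cap <<= 1;
        slots = new int[cap];
        Arrays.fill(slots, EMPTY);
    }

    /** Adds a value; returns false if it was already present. */
    boolean add(int v) {
        if (v < 0)
            throw new IllegalArgumentException("Negative value: " + v);
        if ((size + 1) * 2 > slots.length)
            resize();
        int mask = slots.length - 1;
        int i = hash(v) & mask;
        // Linear probing: walk to the next slot until a free one or a match is found.
        while (slots[i] != EMPTY) {
            if (slots[i] == v)
                return false;
            i = (i + 1) & mask;
        }
        slots[i] = v;
        size++;
        return true;
    }

    boolean contains(int v) {
        int mask = slots.length - 1;
        int i = hash(v) & mask;
        // Terminates because the table is never more than half full.
        while (slots[i] != EMPTY) {
            if (slots[i] == v)
                return true;
            i = (i + 1) & mask;
        }
        return false;
    }

    int size() {
        return size;
    }

    /** Doubles the table and re-inserts every stored value. */
    private void resize() {
        int[] old = slots;
        slots = new int[old.length * 2];
        Arrays.fill(slots, EMPTY);
        size = 0;
        for (int v : old)
            if (v != EMPTY)
                add(v);
    }

    /** Multiplicative mixing so that consecutive ids spread over the table. */
    private static int hash(int v) {
        int h = v * 0x9E3779B9;
        return h ^ (h >>> 16);
    }
}
```

A node that owns only a handful of partitions would create the set with a
small expected size, e.g. new IntOpenHashSet(8), and the table would stay
within a single 64-byte block of ints until it has to grow.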

Re: Optimize integer sets.

Posted by Shawn Du <sh...@neulion.com.cn>.
Try this https://github.com/RoaringBitmap/RoaringBitmap


-----Original Message-----
From: Sergi Vladykin [mailto:sergi.vladykin@gmail.com]
Sent: 2017-01-21 21:29
To: dev@ignite.apache.org
Subject: Re: Optimize integer sets.

I'd suggest having a single abstract class `Partitions` with a protected constructor and a static factory method. This will allow us to transparently add different implementations, each optimized for a particular case.

Sergi


Re: Optimize integer sets.

Posted by Alexei Scherbakov <al...@gmail.com>.
Andrey,

IGNITE-4554 is about compressed bit sets, not standard bit sets.

Currently I'm implementing a data structure based on [1].

It should handle sparse bit sets efficiently.

[1] http://roaringbitmap.org/
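
For readers unfamiliar with [1], here is a heavily simplified sketch of why
a Roaring-style layout copes with sparse sets. It is not the implementation
discussed in the ticket and not the RoaringBitmap library API; the class
name TinyRoaring and its methods are made up for illustration. Each value
is split into its high 16 bits, which select a chunk, and its low 16 bits,
which are kept in a sorted array inside that chunk. The real format also
switches a chunk to a fixed-size bitmap once it holds more than 4096
values, and has run-length containers; both are omitted here.

```java
import java.util.Arrays;
import java.util.TreeMap;

/** Hypothetical, simplified two-level integer set in the spirit of Roaring bitmaps. */
final class TinyRoaring {
    // Key: high 16 bits of a value. Value: sorted low 16 bits of the members of that chunk.
    private final TreeMap<Integer, char[]> chunks = new TreeMap<>();

    void add(int v) {
        int hi = v >>> 16;
        char lo = (char) v; // The cast keeps only the low 16 bits.
        char[] arr = chunks.get(hi);
        if (arr == null) {
            chunks.put(hi, new char[] {lo});
            return;
        }
        int pos = Arrays.binarySearch(arr, lo);
        if (pos >= 0)
            return; // Already present.
        // binarySearch returns -(insertionPoint) - 1 for a missing key.
        int ins = -pos - 1;
        char[] grown = new char[arr.length + 1];
        System.arraycopy(arr, 0, grown, 0, ins);
        grown[ins] = lo;
        System.arraycopy(arr, ins, grown, ins + 1, arr.length - ins);
        chunks.put(hi, grown);
    }

    boolean contains(int v) {
        char[] arr = chunks.get(v >>> 16);
        return arr != null && Arrays.binarySearch(arr, (char) v) >= 0;
    }

    int cardinality() {
        int n = 0;
        for (char[] arr : chunks.values())
            n += arr.length;
        return n;
    }
}
```

In this sketch a sparse set costs about two bytes per stored value plus
per-chunk overhead, instead of one bit per possible value as in a plain
bit set.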


-- 

Best regards,
Alexei Scherbakov

Re: Optimize integer sets.

Posted by Sergi Vladykin <se...@gmail.com>.
I'd suggest having a single abstract class `Partitions` with a protected
constructor and a static factory method. This will allow us to
transparently add different implementations, each optimized for a
particular case.

Sergi
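
One possible shape of the suggestion above, as a rough sketch only. Apart
from the class name `Partitions`, everything here is an assumption made
for illustration: the method names, the two nested implementations and the
size threshold in the factory are not taken from the thread or from Ignite.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Abstract partition set; callers only ever see this type. */
abstract class Partitions {
    /** Protected constructor: instances are obtained through the factory only. */
    protected Partitions() {
    }

    abstract boolean contains(int part);

    abstract int size();

    /**
     * Static factory. The rule used here (illustrative assumption): a sorted int[]
     * costs 32 bits per element, a bit set costs 1 bit per possible partition,
     * so the smaller of the two payloads wins.
     */
    static Partitions of(int totalParts, int... parts) {
        return (long) parts.length * Integer.SIZE < totalParts
            ? new SortedArrayPartitions(parts)
            : new BitSetPartitions(totalParts, parts);
    }

    /** Sparse case: sorted array with binary search. */
    private static final class SortedArrayPartitions extends Partitions {
        private final int[] sorted;

        SortedArrayPartitions(int[] parts) {
            sorted = Arrays.stream(parts).sorted().distinct().toArray();
        }

        @Override boolean contains(int part) {
            return Arrays.binarySearch(sorted, part) >= 0;
        }

        @Override int size() {
            return sorted.length;
        }
    }

    /** Dense case: one bit per possible partition. */
    private static final class BitSetPartitions extends Partitions {
        private final BitSet bits;

        BitSetPartitions(int totalParts, int[] parts) {
            bits = new BitSet(totalParts);
            for (int p : parts)
                bits.set(p);
        }

        @Override boolean contains(int part) {
            // BitSet.get throws on negative indexes, so guard first.
            return part >= 0 && bits.get(part);
        }

        @Override int size() {
            return bits.cardinality();
        }
    }
}
```

Because callers depend only on `Partitions.of(...)`, another
implementation (for example a compressed bit set) could later be added
inside the factory without touching call sites.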
