Posted to dev@directmemory.apache.org by Simone Tripodi <si...@apache.org> on 2012/06/15 08:25:50 UTC

Sharing (maybe) useful experience to benefit DM

Hi all,

I was recently involved in building a simple prototype server based
on the NIO APIs, so I had the chance to think more about a few doubts
I've had about DM; hopefully sharing my experience will benefit the
project :)

So, this is a list of thoughts:

 * a single ByteBuffer _could_ be NOT enough to hold a single object:
no matter how good the serializer you've chosen is, there will always
be an object that, once serialized, is bigger than the ByteBuffer's
maximum capacity - a ByteBuffer's size is an int, while it could be a
long (and type casts are bad jokes! :P) - my proposal is that a single
DM entry _could_ be split across a sequence of ByteBuffers;
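As a sketch of the idea (class and method names are made up here, not
actual DM API), splitting a serialized payload across fixed-size
direct buffers and reassembling it could look like:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one entry's serialized bytes are spread over as
// many fixed-size direct ByteBuffers as needed, so a single entry is no
// longer limited by one ByteBuffer's int capacity.
public class BufferChunker {

    // Copies `payload` into a sequence of ByteBuffers of at most `chunkSize` bytes.
    public static List<ByteBuffer> split(byte[] payload, int chunkSize) {
        List<ByteBuffer> chunks = new ArrayList<>();
        int offset = 0;
        while (offset < payload.length) {
            int len = Math.min(chunkSize, payload.length - offset);
            ByteBuffer buf = ByteBuffer.allocateDirect(len);
            buf.put(payload, offset, len);
            buf.flip();
            chunks.add(buf);
            offset += len;
        }
        return chunks;
    }

    // Reassembles the original payload from the chunk sequence.
    public static byte[] join(List<ByteBuffer> chunks) {
        int total = 0;
        for (ByteBuffer b : chunks) total += b.remaining();
        byte[] out = new byte[total];
        int offset = 0;
        for (ByteBuffer b : chunks) {
            int len = b.remaining();
            b.duplicate().get(out, offset, len); // duplicate() keeps the chunk readable
            offset += len;
        }
        return out;
    }

    public static void main(String[] args) {
        List<ByteBuffer> chunks = split(new byte[10000], 4096);
        System.out.println(chunks.size());       // 3 chunks: 4096 + 4096 + 1808
        System.out.println(join(chunks).length); // 10000
    }
}
```

a Composite wrapper could hide the chunk list behind a single
buffer-like facade, but the plain List is enough to show the idea.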

 * Objects can be serialized directly to ByteBuffers: ATM we are
wrapping the produced byte[], which is still an object on the heap, so
we can optimize that step simply by implementing (Input|Output)Streams
that wrap the target sequence of ByteBuffers - Benoit already did some
work on this, but I don't see it committed, please correct me if I am
wrong!
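A minimal sketch of such a stream (a hypothetical class, not Benoit's
actual code): the serializer writes through a standard OutputStream
and the bytes land directly in a sequence of direct ByteBuffers,
never materializing the whole byte[] on the heap:

```java
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: an OutputStream backed by a growing sequence of
// direct ByteBuffers, allocated lazily as the serializer writes.
public class ByteBufferOutputStream extends OutputStream {

    private final List<ByteBuffer> buffers = new ArrayList<>();
    private final int bufferSize;
    private ByteBuffer current;

    public ByteBufferOutputStream(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    @Override
    public void write(int b) {
        if (current == null || !current.hasRemaining()) {
            current = ByteBuffer.allocateDirect(bufferSize);
            buffers.add(current);
        }
        current.put((byte) b);
    }

    // Returns read-ready views of the filled buffers.
    public List<ByteBuffer> toBuffers() {
        List<ByteBuffer> out = new ArrayList<>(buffers.size());
        for (ByteBuffer b : buffers) {
            ByteBuffer dup = b.duplicate();
            dup.flip();
            out.add(dup);
        }
        return out;
    }

    public static void main(String[] args) throws java.io.IOException {
        ByteBufferOutputStream out = new ByteBufferOutputStream(2048);
        // Any serializer that writes to an OutputStream works unchanged:
        try (java.io.ObjectOutputStream oos = new java.io.ObjectOutputStream(out)) {
            oos.writeObject("some cache value");
        }
        System.out.println(out.toBuffers().isEmpty()); // false
    }
}
```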

 * We didn't think of applying GZip compression - it is true that we
are working off-heap, but the allocated space could hopefully hold
more objects if we compress them.

WDYT? Does it make sense, or is there something we should discuss more carefully?

Thanks and best,
-Simo

http://people.apache.org/~simonetripodi/
http://simonetripodi.livejournal.com/
http://twitter.com/simonetripodi
http://www.99soft.org/

Re: Sharing (maybe) useful experience to benefit DM

Posted by Simone Tripodi <si...@apache.org>.
Always great suggestions from you, Tatu - thanks a lot, much
appreciated!

big +1 for adding support for different pluggable compressors.

best,
-Simo

http://people.apache.org/~simonetripodi/
http://simonetripodi.livejournal.com/
http://twitter.com/simonetripodi
http://www.99soft.org/


On Fri, Jun 15, 2012 at 5:19 PM, Tatu Saloranta <ts...@gmail.com> wrote:
> On Thu, Jun 14, 2012 at 11:25 PM, Simone Tripodi
> <si...@apache.org> wrote:
>> Hi all guys,
> ..
>>
>>  * We didn't think to apply a GZip compression - it is true that we
>> are working off-heap, but hopefully the allocated space can handle
>> more objects by compressing them
>
> Gzip is VERY CPU intensive, so maybe just say "support compression".
> LZF/Snappy/LZ4 are 4-6x faster to compress, 2-3x to uncompress, so
> they could be better choices here.
> So at least it would make sense to allow pluggable compression codecs.
>
> -+ Tatu +-

Re: Sharing (maybe) useful experience to benefit DM

Posted by Jeff MAURY <je...@gmail.com>.
+1 for having pluggable and configurable codecs

Jeff
Le 15 juin 2012 17:19, "Tatu Saloranta" <ts...@gmail.com> a écrit :

> On Thu, Jun 14, 2012 at 11:25 PM, Simone Tripodi
> <si...@apache.org> wrote:
> > Hi all guys,
> ..
> >
> >  * We didn't think to apply a GZip compression - it is true that we
> > are working off-heap, but hopefully the allocated space can handle
> > more objects by compressing them
>
> Gzip is VERY CPU intensive, so maybe just say "support compression".
> LZF/Snappy/LZ4 are 4-6x faster to compress, 2-3x to uncompress, so
> they could be better choices here.
> So at least it would make sense to allow pluggable compression codecs.
>
> -+ Tatu +-
>

Re: Sharing (maybe) useful experience to benefit DM

Posted by Daniel Manzke <da...@googlemail.com>.
+1 for making GZIP configurable.

We have integrated it into our FileStorageMemory. It is pretty easy,
because you can just wrap the streams. If you like, you could also add
encryption. ;)
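The stream wrapping Daniel describes really is just a few lines with
the JDK's java.util.zip classes; a rough sketch (a hypothetical
helper, not the actual FileStorageMemory code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: compression as a layer wrapped around whatever
// stream the storage already uses; an encryption layer would stack the same way.
public class GzipCodec {

    public static byte[] compress(byte[] plain) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(plain); // close() finishes the gzip trailer
        }
        return bos.toByteArray();
    }

    public static byte[] decompress(byte[] packed) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
            return bos.toByteArray();
        }
    }
}
```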


Bye,
Daniel

2012/6/15 Tatu Saloranta <ts...@gmail.com>

> On Thu, Jun 14, 2012 at 11:25 PM, Simone Tripodi
> <si...@apache.org> wrote:
> > Hi all guys,
> ..
> >
> >  * We didn't think to apply a GZip compression - it is true that we
> > are working off-heap, but hopefully the allocated space can handle
> > more objects by compressing them
>
> Gzip is VERY CPU intensive, so maybe just say "support compression".
> LZF/Snappy/LZ4 are 4-6x faster to compress, 2-3x to uncompress, so
> they could be better choices here.
> So at least it would make sense to allow pluggable compression codecs.
>
> -+ Tatu +-
>



-- 
Viele Grüße/Best Regards

Daniel Manzke

Re: Sharing (maybe) useful experience to benefit DM

Posted by Jeff MAURY <je...@jeffmaury.com>.
Compression will be a gain for large objects, probably those
containing many String or byte array attributes.

Jeff


On Fri, Jun 15, 2012 at 10:58 PM, Tatu Saloranta <ts...@gmail.com> wrote:

> On Fri, Jun 15, 2012 at 1:21 PM, Olivier Lamy <ol...@apache.org> wrote:
> > 2012/6/15 Tatu Saloranta <ts...@gmail.com>:
> >> On Thu, Jun 14, 2012 at 11:25 PM, Simone Tripodi
> >> <si...@apache.org> wrote:
> >>> Hi all guys,
> >> ..
> >>>
> >>>  * We didn't think to apply a GZip compression - it is true that we
> >>> are working off-heap, but hopefully the allocated space can handle
> >>> more objects by compressing them
> >>
> >> Gzip is VERY CPU intensive, so maybe just say "support compression".
> >> LZF/Snappy/LZ4 are 4-6x faster to compress, 2-3x to uncompress, so
> >> they could be better choices here.
> >> So at least it would make sense to allow pluggable compression codecs.
> > +1
> > Good idea about having a pluggable mechanism for this feature.
> >
> > I just ask myself if compression of a serialization of a object will
> > be a huge gain ?
> > At least for the server side using the plain/text (i.e. String)
> > transfer mode the factor can be high but for serialization of an
> > Object, I'm septic (but I agree don't have any figures :-) )
>
> Hard to know, depends on what gets compressed (single entry, multiple,
> page/block) and so forth.
> The biggest gain would be if there's actual disk storage to slow disk
> (no SSD), as fast codecs can compress as fast or faster than disks can
> write, and uncompress faster.
> But it can also help by allowing bigger data sets to kept in working set.
>
> Anyway, it all depends, and very hard to say without trying things out. :)
>
> -+ Tatu +-
>



-- 
Jeff MAURY


"Legacy code" often differs from its suggested alternative by actually
working and scaling.
 - Bjarne Stroustrup

http://www.jeffmaury.com
http://riadiscuss.jeffmaury.com
http://www.twitter.com/jeffmaury

Re: Sharing (maybe) useful experience to benefit DM

Posted by Tatu Saloranta <ts...@gmail.com>.
On Fri, Jun 15, 2012 at 1:21 PM, Olivier Lamy <ol...@apache.org> wrote:
> 2012/6/15 Tatu Saloranta <ts...@gmail.com>:
>> On Thu, Jun 14, 2012 at 11:25 PM, Simone Tripodi
>> <si...@apache.org> wrote:
>>> Hi all guys,
>> ..
>>>
>>>  * We didn't think to apply a GZip compression - it is true that we
>>> are working off-heap, but hopefully the allocated space can handle
>>> more objects by compressing them
>>
>> Gzip is VERY CPU intensive, so maybe just say "support compression".
>> LZF/Snappy/LZ4 are 4-6x faster to compress, 2-3x to uncompress, so
>> they could be better choices here.
>> So at least it would make sense to allow pluggable compression codecs.
> +1
> Good idea about having a pluggable mechanism for this feature.
>
> I just ask myself if compression of a serialization of a object will
> be a huge gain ?
> At least for the server side using the plain/text (i.e. String)
> transfer mode the factor can be high but for serialization of an
> Object, I'm septic (but I agree don't have any figures :-) )

Hard to know; it depends on what gets compressed (single entry,
multiple, page/block) and so forth.
The biggest gain would be if there's actual storage on slow disk (no
SSD), as fast codecs can compress as fast as or faster than disks can
write, and uncompress faster.
But it can also help by allowing bigger data sets to be kept in the
working set.

Anyway, it all depends, and very hard to say without trying things out. :)

-+ Tatu +-

Re: Sharing (maybe) useful experience to benefit DM

Posted by Olivier Lamy <ol...@apache.org>.
2012/6/15 Tatu Saloranta <ts...@gmail.com>:
> On Thu, Jun 14, 2012 at 11:25 PM, Simone Tripodi
> <si...@apache.org> wrote:
>> Hi all guys,
> ..
>>
>>  * We didn't think to apply a GZip compression - it is true that we
>> are working off-heap, but hopefully the allocated space can handle
>> more objects by compressing them
>
> Gzip is VERY CPU intensive, so maybe just say "support compression".
> LZF/Snappy/LZ4 are 4-6x faster to compress, 2-3x to uncompress, so
> they could be better choices here.
> So at least it would make sense to allow pluggable compression codecs.
+1
Good idea about having a pluggable mechanism for this feature.

I just ask myself whether compressing a serialized object will be a
huge gain?
At least for the server side using the plain/text (i.e. String)
transfer mode the factor can be high, but for serialization of an
Object I'm sceptical (though I agree I don't have any figures :-) )

>
> -+ Tatu +-



-- 
Olivier Lamy
Talend: http://coders.talend.com
http://twitter.com/olamy | http://linkedin.com/in/olamy

Re: Sharing (maybe) useful experience to benefit DM

Posted by Tatu Saloranta <ts...@gmail.com>.
On Thu, Jun 14, 2012 at 11:25 PM, Simone Tripodi
<si...@apache.org> wrote:
> Hi all guys,
..
>
>  * We didn't think to apply a GZip compression - it is true that we
> are working off-heap, but hopefully the allocated space can handle
> more objects by compressing them

Gzip is VERY CPU intensive, so maybe just say "support compression".
LZF/Snappy/LZ4 are 4-6x faster to compress, 2-3x to uncompress, so
they could be better choices here.
So at least it would make sense to allow pluggable compression codecs.
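One possible shape for such a pluggable codec (a hypothetical
interface, not an existing DirectMemory API) that gzip, LZF, Snappy
or LZ4 bindings could each implement:

```java
// Hypothetical sketch of a pluggable compression SPI: each codec is a
// tiny strategy object, and a no-op default keeps compression optional.
public interface CompressionCodec {

    byte[] compress(byte[] plain);

    byte[] decompress(byte[] packed);

    // Identity codec: pass bytes through untouched when compression is disabled.
    CompressionCodec IDENTITY = new CompressionCodec() {
        public byte[] compress(byte[] plain) { return plain; }
        public byte[] decompress(byte[] packed) { return packed; }
    };
}
```

the cache would then take a CompressionCodec at configuration time,
defaulting to IDENTITY, so swapping gzip for LZF/Snappy/LZ4 is a
one-line change for users.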

-+ Tatu +-

Re: Sharing (maybe) useful experience to benefit DM

Posted by Michael André Pearce <mi...@me.com>.
Whilst I think this is great, and additional features and support for
larger file sizes are definitely good things to be looking forward to,

I feel some of the more historical issues need to be addressed first to make this a commercial production-quality solution before we expand further. We have a lot of old open JIRA tickets where time needs to be spent on some of the less glamorous tasks.

For me, the two biggest hold-ups for advocating full use of the project in the systems I work with are memory fragmentation over time, causing usable-memory loss, and concurrency issues in a highly multithreaded environment.

My two cents anyhow.

Michael André Pearce

On 15 Jun 2012, at 07:25, Simone Tripodi <si...@apache.org> wrote:

> Hi all guys,
> 
> I was recently involved in the realization of a simple prototypal
> server  based on NIO APIs, so had the chance of thinking more on few
> doubts I've had in DM, hopefully sharing my experience will benefit DM
> :)
> 
> SO, this a list of thoughts:
> 
> * a single ByteBuffer _could_ be NOT enough to handle a single
> object: despite how good the serializer you've chosen is, there'll be
> always an object that, once serialized, has a dimension that is bigger
> that the ByteBuffer maximum capacity - ByteBuffer size is an int, it
> could be a long (and types cast are bad jokes! :P) - my proposal is
> that a single DM entry point _could_ be split in a sequence of
> ByteBuffers;
> 
> * Objects can be serialized directly to ByteBuffers: ATM we are
> wrapping the produced byte[], which still is an object in the Heap, so
> we can optimize that step simply by implementing (Input|Output)Streams
> wrapping the target sequence of ByteBuffers - Benoit already did some
> work on it, but I don't see it committed, please correct me if I am
> wrong!
> 
> * We didn't think to apply a GZip compression - it is true that we
> are working off-heap, but hopefully the allocated space can handle
> more objects by compressing them
> 
> WDYT? does it make sense or there is something we want to speak more carefully?
> 
> Thanks and best,
> -Simo
> 
> http://people.apache.org/~simonetripodi/
> http://simonetripodi.livejournal.com/
> http://twitter.com/simonetripodi
> http://www.99soft.org/

Re: Sharing (maybe) useful experience to benefit DM

Posted by Jeff MAURY <je...@jeffmaury.com>.
On Fri, Jun 15, 2012 at 8:25 AM, Simone Tripodi <si...@apache.org> wrote:

> Hi all guys,
>
> I was recently involved in the realization of a simple prototypal
> server  based on NIO APIs, so had the chance of thinking more on few
> doubts I've had in DM, hopefully sharing my experience will benefit DM
> :)
>
> SO, this a list of thoughts:
>
>  * a single ByteBuffer _could_ be NOT enough to handle a single
> object: despite how good the serializer you've chosen is, there'll be
> always an object that, once serialized, has a dimension that is bigger
> that the ByteBuffer maximum capacity - ByteBuffer size is an int, it
> could be a long (and types cast are bad jokes! :P) - my proposal is
> that a single DM entry point _could_ be split in a sequence of
> ByteBuffers;
>
Don't forget that ints are 32 bits in Java, so this gives a 2 GB size
limit for a single object. I see the cache used more for storing a
large number of objects, but we never know, with Moore's law.


>
>  * Objects can be serialized directly to ByteBuffers: ATM we are
> wrapping the produced byte[], which still is an object in the Heap, so
> we can optimize that step simply by implementing (Input|Output)Streams
> wrapping the target sequence of ByteBuffers - Benoit already did some
> work on it, but I don't see it committed, please correct me if I am
> wrong!
>
+1

>
>  * We didn't think to apply a GZip compression - it is true that we
> are working off-heap, but hopefully the allocated space can handle
> more objects by compressing them
>
+1

>
> WDYT? does it make sense or there is something we want to speak more
> carefully?
>
> Thanks and best,
> -Simo
>
> http://people.apache.org/~simonetripodi/
> http://simonetripodi.livejournal.com/
> http://twitter.com/simonetripodi
> http://www.99soft.org/
>



-- 
Jeff MAURY


"Legacy code" often differs from its suggested alternative by actually
working and scaling.
 - Bjarne Stroustrup

http://www.jeffmaury.com
http://riadiscuss.jeffmaury.com
http://www.twitter.com/jeffmaury

Re: Sharing (maybe) useful experience to benefit DM

Posted by Simone Tripodi <si...@apache.org>.
Hi Tom!

>>
>>  * We didn't think to apply a GZip compression - it is true that we
>> are working off-heap, but hopefully the allocated space can handle
>> more objects by compressing them
>>
>
> do you mean as an intermediate step before "going off-heap" ? I think
> that'd be good but we should be able to read both GZipped and not GZipped
> data thus we may need some kind of markers/headers for that (or just a try
> / catch block).
>

nope, I meant on the fly, while storing the serialized bytes - it is
possible to apply GZip compression in the Stream wrapper over
ByteBuffers; have a look at Zentaur's ResponseSerializer[1], which
already does that! :P

all the best, have a nice day!
-Simo

[1] http://s.apache.org/8wX

http://people.apache.org/~simonetripodi/
http://simonetripodi.livejournal.com/
http://twitter.com/simonetripodi
http://www.99soft.org/

Re: Sharing (maybe) useful experience to benefit DM

Posted by Tommaso Teofili <to...@gmail.com>.
Hi Simo,

really good points!

2012/6/15 Simone Tripodi <si...@apache.org>

> Hi all guys,
>
> I was recently involved in the realization of a simple prototypal
> server  based on NIO APIs, so had the chance of thinking more on few
> doubts I've had in DM, hopefully sharing my experience will benefit DM
> :)
>
> SO, this a list of thoughts:
>
>  * a single ByteBuffer _could_ be NOT enough to handle a single
> object: despite how good the serializer you've chosen is, there'll be
> always an object that, once serialized, has a dimension that is bigger
> that the ByteBuffer maximum capacity - ByteBuffer size is an int, it
> could be a long (and types cast are bad jokes! :P) - my proposal is
> that a single DM entry point _could_ be split in a sequence of
> ByteBuffers;
>

+1, we could use a Composite pattern for that and/or explicitly use
ByteBuffer[] or collections.


>
>  * Objects can be serialized directly to ByteBuffers: ATM we are
> wrapping the produced byte[], which still is an object in the Heap, so
> we can optimize that step simply by implementing (Input|Output)Streams
> wrapping the target sequence of ByteBuffers - Benoit already did some
> work on it, but I don't see it committed, please correct me if I am
> wrong!
>

+1


>
>  * We didn't think to apply a GZip compression - it is true that we
> are working off-heap, but hopefully the allocated space can handle
> more objects by compressing them
>

do you mean as an intermediate step before "going off-heap"? I think
that'd be good, but we should be able to read both GZipped and
non-GZipped data, thus we may need some kind of markers/headers for
that (or just a try/catch block).


>
> WDYT? does it make sense or there is something we want to speak more
> carefully?
>

What I think is missing, apart from a release, is documentation.
As far as I can remember, different users have asked in the past for
architecture and design diagrams, or just how to do X/Y/Z, so we
should definitely plan some work on that too.
My 2 cents,
Tommaso


>
> Thanks and best,
> -Simo
>
> http://people.apache.org/~simonetripodi/
> http://simonetripodi.livejournal.com/
> http://twitter.com/simonetripodi
> http://www.99soft.org/
>