Posted to dev@commons.apache.org by Roger Whitcomb <Ro...@actian.com> on 2017/05/09 23:44:38 UTC

[COMPRESS] Anyone implemented "pigz"?

Someone here was doing benchmarks using "pigz" (see here: http://zlib.net/pigz/, basically multi-threaded "gzip") and I couldn't find any "reasonable" Java implementations.  Anyone thought about it for Commons Compress?

Thanks,
Roger Whitcomb

RE: [COMPRESS] Anyone implemented "pigz"?

Posted by Roger Whitcomb <Ro...@actian.com>.
Exactly.

-----Original Message-----
From: Gary Gregory [mailto:garydgregory@gmail.com] 
Sent: Tuesday, May 09, 2017 5:29 PM
To: Commons Developers List <de...@commons.apache.org>
Subject: Re: [COMPRESS] Anyone implemented "pigz"?

I think the question is can/should [Compress] use any of the stock code in java.util.zip in a multi-threaded fashion for performance gains.

Gary


--
E-Mail: garydgregory@gmail.com | ggregory@apache.org
Java Persistence with Hibernate, Second Edition
<https://www.amazon.com/gp/product/1617290459/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1617290459&linkCode=as2&tag=garygregory-20&linkId=cadb800f39946ec62ea2b1af9fe6a2b8>
JUnit in Action, Second Edition
<https://www.amazon.com/gp/product/1935182021/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1935182021&linkCode=as2&tag=garygregory-20&linkId=31ecd1f6b6d1eaf8886ac902a24de418%22>
Spring Batch in Action
<https://www.amazon.com/gp/product/1935182951/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1935182951&linkCode=%7B%7BlinkCode%7D%7D&tag=garygregory-20&linkId=%7B%7Blink_id%7D%7D%22%3ESpring+Batch+in+Action>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory

Re: [COMPRESS] Anyone implemented "pigz"?

Posted by Stefan Bodewig <bo...@apache.org>.
On 2017-05-10, Matt Sicker wrote:

> Would the scattering and gathering byte channel APIs in java.nio be helpful
> in splitting up a stream into chunks for parallel processing?

Possibly. pigz breaks up the stream into chunks of 128k and using the
scattering part we should be able to do the same. I'm not so sure about
gathering as we'd need to massage the individual outputs and create new
overall headers and trailers.
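
A minimal sketch of the scattering half of that idea, assuming a FileChannel as the source and a fixed number of 128 KiB buffers per read. The class and method names (ScatterChunks, readChunks, chunkSizesOfTempFile) are made up for illustration and are not part of Commons Compress; only the java.nio calls are real API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ScatterChunks {
    /** Chunk size mentioned in the thread for pigz: 128 KiB. */
    static final int CHUNK_SIZE = 128 * 1024;

    /**
     * Fills up to maxChunks buffers of CHUNK_SIZE bytes using scattering reads,
     * starting at the channel's current position. Buffers are returned flipped,
     * so remaining() is the number of bytes each chunk holds.
     */
    static ByteBuffer[] readChunks(FileChannel channel, int maxChunks) throws IOException {
        if (maxChunks < 1) {
            throw new IllegalArgumentException("maxChunks must be at least 1");
        }
        ByteBuffer[] chunks = new ByteBuffer[maxChunks];
        for (int i = 0; i < maxChunks; i++) {
            chunks[i] = ByteBuffer.allocate(CHUNK_SIZE);
        }
        // A scattering read fills the buffers in order but may return early,
        // so keep reading until the last buffer is full or the file ends.
        while (chunks[maxChunks - 1].hasRemaining()) {
            if (channel.read(chunks) < 0) {
                break; // end of file
            }
        }
        for (ByteBuffer chunk : chunks) {
            chunk.flip();
        }
        return chunks;
    }

    /** Demo helper: writes fileSize bytes to a temp file and reports the chunk sizes read back. */
    static int[] chunkSizesOfTempFile(int fileSize, int maxChunks) {
        try {
            Path tmp = Files.createTempFile("scatter", ".bin");
            try {
                Files.write(tmp, new byte[fileSize]);
                try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                    ByteBuffer[] chunks = readChunks(ch, maxChunks);
                    int[] sizes = new int[chunks.length];
                    for (int i = 0; i < chunks.length; i++) {
                        sizes[i] = chunks[i].remaining();
                    }
                    return sizes;
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each filled buffer could then be handed to its own compression task. The gathering direction is left out on purpose, since the compressed outputs still need a shared header and trailer.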

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [COMPRESS] Anyone implemented "pigz"?

Posted by Matt Sicker <bo...@gmail.com>.
Would the scattering and gathering byte channel APIs in java.nio be helpful
in splitting up a stream into chunks for parallel processing?

-- 
Matt Sicker <bo...@gmail.com>

Re: [COMPRESS] Anyone implemented "pigz"?

Posted by Stefan Bodewig <bo...@apache.org>.
On 2017-05-10, Gary Gregory wrote:

> I think the question is can/should [Compress] use any of the stock code
> in java.util.zip in a multi-threaded fashion for performance gains.

We rely on java.util.zip.Deflater for DEFLATE, which isn't thread-safe by
itself.

But we could implement the same strategy pigz uses, which is to break up
the stream into chunks and work on the chunks in parallel. Combining the
output of several streams may become tricky using the Java API.
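
A rough sketch of that chunking strategy, assuming the whole input is already in a byte array and that every task creates its own Deflater so no instance is shared between threads. ParallelChunkDeflate and its methods are hypothetical names; the 128 KiB chunk size is the pigz figure mentioned elsewhere in this thread. Each chunk here is a complete raw DEFLATE stream of its own, so this does not yet produce a single combined stream.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class ParallelChunkDeflate {
    static final int CHUNK_SIZE = 128 * 1024;

    /** Compresses one chunk with a private Deflater (raw DEFLATE, no zlib wrapper). */
    static byte[] deflateChunk(byte[] input, int off, int len) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            deflater.setInput(input, off, len);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    /** Splits the input into chunks, compresses them on a pool, returns results in input order. */
    static List<byte[]> deflateInParallel(byte[] input, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (int off = 0; off < input.length; off += CHUNK_SIZE) {
                final int start = off;
                final int len = Math.min(CHUNK_SIZE, input.length - off);
                futures.add(pool.submit(() -> deflateChunk(input, start, len)));
            }
            List<byte[]> compressed = new ArrayList<>();
            for (Future<byte[]> future : futures) {
                compressed.add(future.get()); // list order preserves chunk order
            }
            return compressed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Inflates each chunk separately and concatenates the results (for checking only). */
    static byte[] inflateAll(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        for (byte[] chunk : chunks) {
            Inflater inflater = new Inflater(true);
            try {
                inflater.setInput(chunk);
                while (!inflater.finished()) {
                    int n = inflater.inflate(buf);
                    if (n == 0 && inflater.needsInput()) {
                        throw new IllegalStateException("truncated chunk");
                    }
                    out.write(buf, 0, n);
                }
            } catch (DataFormatException e) {
                throw new IllegalStateException(e);
            } finally {
                inflater.end();
            }
        }
        return out.toByteArray();
    }
}
```

Because every chunk starts with an empty history window, the compression ratio will likely be a little worse than a single-threaded pass over the same data.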

If my first read of the comments in
https://github.com/madler/pigz/blob/master/pigz.c is correct then we'd
need to manipulate the output of Deflater in order to strip headers and
trailers and insert empty stored blocks as well as create headers and
trailers of our own for the combined output.
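
One way this might look with the stock API, as a sketch under assumptions rather than a description of what pigz or Commons Compress actually does: creating the Deflater in "nowrap" mode avoids the zlib header and trailer, so nothing has to be stripped, and ending every non-final chunk with SYNC_FLUSH makes zlib append an empty stored block so the chunk ends on a byte boundary. The gzip header and trailer are then written by hand following RFC 1952. GzipFromChunks and its methods are hypothetical names. The chunks are compressed sequentially here to keep the example short; since each call uses its own Deflater, they could equally be run on separate threads and concatenated in order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

final class GzipFromChunks {
    static final int CHUNK_SIZE = 128 * 1024;

    /** Appends one independently compressed chunk of raw DEFLATE data to out. */
    static void deflateChunk(byte[] input, int off, int len, boolean last, ByteArrayOutputStream out) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // raw DEFLATE
        try {
            deflater.setInput(input, off, len);
            byte[] buf = new byte[8192];
            if (last) {
                // Only the final chunk is finished, so only it carries the final-block bit.
                deflater.finish();
                while (!deflater.finished()) {
                    out.write(buf, 0, deflater.deflate(buf));
                }
            } else {
                // SYNC_FLUSH emits all pending output plus an empty stored block;
                // a full buffer means there may be more, so call again.
                int n;
                do {
                    n = deflater.deflate(buf, 0, buf.length, Deflater.SYNC_FLUSH);
                    out.write(buf, 0, n);
                } while (n == buf.length);
            }
        } finally {
            deflater.end();
        }
    }

    /** Builds a single gzip member from per-chunk DEFLATE output. */
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Header: magic 1f 8b, CM=8 (deflate), FLG=0, MTIME=0, XFL=0, OS=255 (unknown).
        out.write(new byte[] {0x1f, (byte) 0x8b, 8, 0, 0, 0, 0, 0, 0, (byte) 0xff}, 0, 10);
        int off = 0;
        do {
            int len = Math.min(CHUNK_SIZE, input.length - off);
            boolean last = off + len == input.length;
            deflateChunk(input, off, len, last, out);
            off += len;
        } while (off < input.length);
        // Trailer: CRC-32 of the uncompressed data, then its length mod 2^32, little-endian.
        CRC32 crc = new CRC32();
        crc.update(input, 0, input.length);
        writeIntLE(out, (int) crc.getValue());
        writeIntLE(out, input.length);
        return out.toByteArray();
    }

    static void writeIntLE(ByteArrayOutputStream out, int value) {
        for (int shift = 0; shift < 32; shift += 8) {
            out.write((value >>> shift) & 0xff);
        }
    }

    /** Decompresses with the stock GZIPInputStream, which also checks CRC and length. */
    static byte[] gunzip(byte[] gz) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The CRC-32 is computed in one pass over the whole input here. Computing it per chunk and combining the partial values would need extra code that java.util.zip does not provide, as far as I can tell.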

In theory we could implement something like pigz on top of the LZ77
support I've added for Snappy and LZ4 (and some additional Huffman code
yet to be written) but it would be slower than zlib - probably a lot -
and likely eat up the speed gain provided by parallel processing.

Stefan


Re: [COMPRESS] Anyone implemented "pigz"?

Posted by Gary Gregory <ga...@gmail.com>.
I think the question is can/should [Compress] use any of the stock code
in java.util.zip in a multi-threaded fashion for performance gains.

Gary



Re: [COMPRESS] Anyone implemented "pigz"?

Posted by Matt Sicker <bo...@gmail.com>.
Those C libraries are pthread (don't need that in Java as it has its own
thread API) and zlib (pretty standard gz library). With that in mind, this
may be a useful reference: http://www.jcraft.com/jzlib/


-- 
Matt Sicker <bo...@gmail.com>

Re: [COMPRESS] Anyone implemented "pigz"?

Posted by sebb <se...@gmail.com>.
AFAICT the implementation is written in C and uses some C libraries.

It would have to be completely rewritten for Java.
Not a trivial job, though it may be possible to use the algorithm.


Re: [COMPRESS] Anyone implemented "pigz"?

Posted by Gary Gregory <ga...@gmail.com>.
I've not heard of it on the ML yet. Go for it! ;-)

Gary


