Posted to dev@giraph.apache.org by Maja Kabiljo <ma...@fb.com> on 2013/07/10 02:16:45 UTC
Review Request 12377: Provide an option to do request compression
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12377/
-----------------------------------------------------------
Review request for giraph.
Bugs: GIRAPH-713
https://issues.apache.org/jira/browse/GIRAPH-713
Repository: giraph-git
Description
-------
In some cases the network is much slower than the computation we do, and we could benefit from compressing requests.
I am using LZ4 compression. I tried out two versions of Snappy and some Java ones; LZ4 had the least CPU overhead and about the same speed as Snappy. It's also easy to plug in your own compressor. When we are not network-bound, the CPU overhead of compression is about 10% (testing with PageRankBenchmark) and the running time stays about the same. Depending on how slow your network is and how well your data compresses, this can lead to big time savings. I also added compression to the metrics; here is the output from one of the PageRank iterations:
https://gist.github.com/majakabiljo/5962269
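To make the "plug in your own compressor" idea concrete, here is a minimal sketch of a pluggable request codec. It is not the code from the patch: the names SketchCodec, SketchNoOpCodec and SketchDeflateCodec are hypothetical, and the JDK's DEFLATE (java.util.zip) stands in for LZ4 so the example runs without extra dependencies. The frame layout (a 4-byte uncompressed length followed by the compressed bytes) is likewise an assumption made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical pluggable codec; not the interface from the patch. */
interface SketchCodec {
  byte[] encode(byte[] request);
  byte[] decode(byte[] frame);
}

/** Pass-through codec, used when compression is switched off. */
class SketchNoOpCodec implements SketchCodec {
  public byte[] encode(byte[] request) { return request; }
  public byte[] decode(byte[] frame) { return frame; }
}

/** DEFLATE stand-in for LZ4. Frame = 4-byte original length + compressed bytes. */
class SketchDeflateCodec implements SketchCodec {
  public byte[] encode(byte[] request) {
    Deflater deflater = new Deflater(Deflater.BEST_SPEED);
    deflater.setInput(request);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    // The length prefix lets the receiver allocate the output buffer once.
    byte[] prefix = ByteBuffer.allocate(4).putInt(request.length).array();
    out.write(prefix, 0, prefix.length);
    byte[] chunk = new byte[4096];
    while (!deflater.finished()) {
      int n = deflater.deflate(chunk);
      out.write(chunk, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  public byte[] decode(byte[] frame) {
    int originalLength = ByteBuffer.wrap(frame, 0, 4).getInt();
    byte[] result = new byte[originalLength];
    Inflater inflater = new Inflater();
    inflater.setInput(frame, 4, frame.length - 4);
    try {
      int filled = 0;
      while (filled < originalLength) {
        int n = inflater.inflate(result, filled, originalLength - filled);
        if (n == 0) {
          break;  // stream ended early or input is missing
        }
        filled += n;
      }
      if (filled != originalLength) {
        throw new IllegalStateException("Truncated frame");
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException("Corrupt frame", e);
    } finally {
      inflater.end();
    }
    return result;
  }
}

class RequestCodecSketch {
  public static void main(String[] args) {
    byte[] request = new byte[64 * 1024];
    Arrays.fill(request, (byte) 42);  // highly compressible toy payload
    SketchCodec codec = new SketchDeflateCodec();
    byte[] frame = codec.encode(request);
    System.out.println(request.length + " bytes -> " + frame.length + " bytes");
    System.out.println("round trip ok: " + Arrays.equals(request, codec.decode(frame)));
  }
}
```

Keeping a pass-through implementation behind the same interface is one way to let the encoder and decoder stay unaware of whether compression is enabled.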
Diffs
-----
giraph-core/pom.xml cab0157
giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/LZ4RequestCompressor.java PRE-CREATION
giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestCompressor.java PRE-CREATION
giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestDecoder.java 6eb6549
giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestEncoder.java 83b408e
giraph-core/src/main/java/org/apache/giraph/conf/GiraphClasses.java 6655834
giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java c4cc96f
giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java ed63192
giraph-core/src/main/java/org/apache/giraph/metrics/GiraphMetrics.java 7d980ea
giraph-core/src/main/java/org/apache/giraph/metrics/MetricNames.java cc237ac
giraph-core/src/test/java/org/apache/giraph/comm/LZ4RequestCompressorTest.java PRE-CREATION
pom.xml 72fe6f2
Diff: https://reviews.apache.org/r/12377/diff/
Testing
-------
mvn clean verify
PageRank on cluster
Added test for compressor
Verified that NoOpCompressor has the same performance as before
Thanks,
Maja Kabiljo
Re: Review Request 12377: Provide an option to do request compression
Posted by Maja Kabiljo <ma...@fb.com>.
> On July 10, 2013, 12:18 p.m., Nitay Joffe wrote:
> >
Thanks for the review!
> On July 10, 2013, 12:18 p.m., Nitay Joffe wrote:
> > giraph-core/src/main/java/org/apache/giraph/metrics/GiraphMetrics.java, line 122
> > <https://reviews.apache.org/r/12377/diff/1/?file=319438#file319438line122>
> >
> > ?
The list of observers was getting corrupted: I was getting an NPE while iterating through it. The observers probably get registered from different threads. Adding 'synchronized' fixed it.
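For readers unfamiliar with this failure mode, the following is a minimal sketch of the kind of fix described, not the actual GiraphMetrics code. The class ObserverRegistrySketch, its ResetObserver interface and the helper registerConcurrently are all hypothetical names. An ArrayList is not thread-safe, so unsynchronized concurrent add() calls can lose elements or leave null slots; guarding both registration and iteration with the same lock avoids that.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a metrics registry that keeps a list of observers. */
class ObserverRegistrySketch {
  /** Hypothetical callback type. */
  interface ResetObserver {
    void onReset(int superstep);
  }

  private final List<ResetObserver> observers = new ArrayList<>();

  /** Registration and iteration share one lock (this object's monitor). */
  public synchronized void addObserver(ResetObserver observer) {
    observers.add(observer);
  }

  public synchronized void reset(int superstep) {
    for (ResetObserver observer : observers) {
      observer.onReset(superstep);
    }
  }

  public synchronized int size() {
    return observers.size();
  }

  /** Registers the same observer from several threads at once. */
  static void registerConcurrently(ObserverRegistrySketch registry, int threads,
                                   int perThread, ResetObserver observer) {
    List<Thread> workers = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      Thread worker = new Thread(() -> {
        for (int i = 0; i < perThread; i++) {
          registry.addObserver(observer);
        }
      });
      workers.add(worker);
      worker.start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
  }

  public static void main(String[] args) {
    ObserverRegistrySketch registry = new ObserverRegistrySketch();
    registerConcurrently(registry, 8, 1000, superstep -> { });
    System.out.println("registered observers: " + registry.size());
  }
}
```

If registration is rare and iteration is frequent, java.util.concurrent.CopyOnWriteArrayList is a common alternative to explicit locking; whether it fits here depends on how the observers are actually used.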
> On July 10, 2013, 12:18 p.m., Nitay Joffe wrote:
> > giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/LZ4RequestCompressor.java, line 95
> > <https://reviews.apache.org/r/12377/diff/1/?file=319431#file319431line95>
> >
> > Just curious, is there some on-the-fly decompressor, or do we have to do it all at once?
The Java ones have it, but they were much slower. This particular library doesn't; maybe some other one does. If someone has a suggestion I can look into it, or we can do that later if we come across one.
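As an illustration of what "on-the-fly" decompression means here, the sketch below uses the JDK's stream classes (java.util.zip.DeflaterOutputStream and InflaterInputStream), which inflate data incrementally as the reader pulls values. It is only an assumption-laden example: the class StreamingDecodeSketch and its methods are hypothetical, DEFLATE is used instead of LZ4, and it says nothing about relative speed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical example of reading values while the stream is being inflated. */
class StreamingDecodeSketch {
  /** Writes a count followed by the values through a compressing stream. */
  static byte[] compressLongs(long[] values) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out =
             new DataOutputStream(new DeflaterOutputStream(bytes))) {
      out.writeInt(values.length);
      for (long value : values) {
        out.writeLong(value);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Sums the values without ever materializing the full uncompressed buffer. */
  static long sumOnTheFly(byte[] compressed) {
    try (DataInputStream in = new DataInputStream(
             new InflaterInputStream(new ByteArrayInputStream(compressed)))) {
      int count = in.readInt();
      long sum = 0;
      for (int i = 0; i < count; i++) {
        sum += in.readLong();  // inflates only as much input as is needed
      }
      return sum;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    long[] values = {1L, 2L, 3L, 4L};
    System.out.println("sum = " + sumOnTheFly(compressLongs(values)));
  }
}
```

By contrast, a block-oriented decompressor needs the whole compressed request and a destination buffer of the full uncompressed size before any value can be read.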
- Maja
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12377/#review22952
-----------------------------------------------------------
Re: Review Request 12377: Provide an option to do request compression
Posted by Nitay Joffe <ni...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12377/#review22952
-----------------------------------------------------------
giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/LZ4RequestCompressor.java
<https://reviews.apache.org/r/12377/#comment46736>
Let's reuse UnsafeByteArrayOutputStream:SIZE_OF_INT / UnsafeByteArrayOutputStream:SIZE_OF_LONG. You probably want to move those constants to another place like Sizes or something.
giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/LZ4RequestCompressor.java
<https://reviews.apache.org/r/12377/#comment46738>
Just curious, is there some on-the-fly decompressor, or do we have to do it all at once?
giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestCompressor.java
<https://reviews.apache.org/r/12377/#comment46737>
Make this an interface.
Also rename to RequestCodec (since it's also a decompressor)?
giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestEncoder.java
<https://reviews.apache.org/r/12377/#comment46739>
SIZE_OF_INT
giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java
<https://reviews.apache.org/r/12377/#comment46740>
Pass in this as well, to make it configurable?
giraph-core/src/main/java/org/apache/giraph/metrics/GiraphMetrics.java
<https://reviews.apache.org/r/12377/#comment46741>
?
giraph-core/src/main/java/org/apache/giraph/metrics/MetricNames.java
<https://reviews.apache.org/r/12377/#comment46742>
cool, nice to have histograms tracking it :)
pom.xml
<https://reviews.apache.org/r/12377/#comment46735>
nit: put a constant above in <properties>, we should move all the versions up there.
- Nitay Joffe
Re: Review Request 12377: Provide an option to do request compression
Posted by Maja Kabiljo <ma...@fb.com>.
(Updated July 10, 2013, 6:01 p.m.)
Review request for giraph.
Changes
-------
Nitay's comments
Bugs: GIRAPH-713
https://issues.apache.org/jira/browse/GIRAPH-713
Repository: giraph-git
Description
-------
In some cases the network is much slower than the computation we do, and we could benefit from compressing requests.
I am using LZ4 compression. I tried out two versions of Snappy and some Java ones; LZ4 had the least CPU overhead and about the same speed as Snappy. It's also easy to plug in your own compressor. When we are not network-bound, the CPU overhead of compression is about 10% (testing with PageRankBenchmark) and the running time stays about the same. Depending on how slow your network is and how well your data compresses, this can lead to big time savings. I also added compression to the metrics; here is the output from one of the PageRank iterations:
https://gist.github.com/majakabiljo/5962269
Diffs (updated)
-----
giraph-core/pom.xml cab0157
giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/LZ4RequestCodec.java PRE-CREATION
giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/NoOpRequestCodec.java PRE-CREATION
giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestCodec.java PRE-CREATION
giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestDecoder.java 6eb6549
giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestEncoder.java 83b408e
giraph-core/src/main/java/org/apache/giraph/conf/GiraphClasses.java 6655834
giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java c4cc96f
giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java ed63192
giraph-core/src/main/java/org/apache/giraph/metrics/GiraphMetrics.java 7d980ea
giraph-core/src/main/java/org/apache/giraph/metrics/MetricNames.java cc237ac
giraph-core/src/main/java/org/apache/giraph/utils/Sizes.java PRE-CREATION
giraph-core/src/main/java/org/apache/giraph/utils/UnsafeByteArrayInputStream.java 20ed92b
giraph-core/src/main/java/org/apache/giraph/utils/UnsafeByteArrayOutputStream.java 9ff1242
giraph-core/src/test/java/org/apache/giraph/comm/LZ4RequestCodecTest.java PRE-CREATION
pom.xml 72fe6f2
Diff: https://reviews.apache.org/r/12377/diff/
Testing
-------
mvn clean verify
PageRank on cluster
Added test for compressor
Verified that NoOpCompressor has the same performance as before
Thanks,
Maja Kabiljo