Posted to dev@giraph.apache.org by Maja Kabiljo <ma...@fb.com> on 2013/07/10 02:16:45 UTC

Review Request 12377: Provide an option to do request compression

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12377/
-----------------------------------------------------------

Review request for giraph.


Bugs: GIRAPH-713
    https://issues.apache.org/jira/browse/GIRAPH-713


Repository: giraph-git


Description
-------

In some cases the network is much slower than all the computation we do, so we could benefit from compressing requests.

I am using LZ4 compression. I tried out two versions of Snappy and some Java ones; LZ4 had the least CPU overhead and about the same speed as Snappy. It's also easy to plug in your own compressor. When we are not network-bound, the CPU overhead of using compression is about 10% (tested with PageRankBenchmark) and the running time stays about the same. Depending on how slow your network is and how well your data compresses, this can lead to big time savings. I also added compression to the metrics; here is the output from one of the PageRank iterations:
https://gist.github.com/majakabiljo/5962269
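
For readers who want a concrete picture of whole-request compression, here is a minimal sketch. It is not the code from this patch: the class and method names are hypothetical, and it uses the JDK's java.util.zip Deflater/Inflater as a stand-in for LZ4 so that it runs without extra dependencies. The 4-byte prefix holding the uncompressed length is likewise an assumption about how a receiver could size its output buffer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch: compress a whole request at once, with a length prefix. */
final class BlockCompressionSketch {
  private BlockCompressionSketch() { }

  /** Returns a frame: 4 bytes of original length followed by the deflated bytes. */
  static byte[] compress(byte[] request) {
    Deflater deflater = new Deflater(Deflater.BEST_SPEED);
    deflater.setInput(request);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    // Assumed framing: prefix the uncompressed size so the receiver can allocate once.
    out.write(ByteBuffer.allocate(Integer.BYTES).putInt(request.length).array(),
        0, Integer.BYTES);
    byte[] chunk = new byte[4096];
    while (!deflater.finished()) {
      int written = deflater.deflate(chunk);
      out.write(chunk, 0, written);
    }
    deflater.end();
    return out.toByteArray();
  }

  /** Reads the length prefix, then inflates the rest of the frame in one go. */
  static byte[] decompress(byte[] frame) {
    int originalLength = ByteBuffer.wrap(frame).getInt();
    Inflater inflater = new Inflater();
    inflater.setInput(frame, Integer.BYTES, frame.length - Integer.BYTES);
    byte[] request = new byte[originalLength];
    int read = 0;
    try {
      while (read < originalLength && !inflater.finished()) {
        int n = inflater.inflate(request, read, originalLength - read);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          break; // truncated or unsupported input; reported below
        }
        read += n;
      }
    } catch (DataFormatException e) {
      throw new IllegalArgumentException("Corrupt frame", e);
    } finally {
      inflater.end();
    }
    if (read != originalLength) {
      throw new IllegalArgumentException("Frame shorter than its length prefix");
    }
    return request;
  }
}
```

Calling decompress(compress(bytes)) should return an array equal to bytes. How much smaller the frame gets depends entirely on the data.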


Diffs
-----

  giraph-core/pom.xml cab0157 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/LZ4RequestCompressor.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestCompressor.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestDecoder.java 6eb6549 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestEncoder.java 83b408e 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphClasses.java 6655834 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java c4cc96f 
  giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java ed63192 
  giraph-core/src/main/java/org/apache/giraph/metrics/GiraphMetrics.java 7d980ea 
  giraph-core/src/main/java/org/apache/giraph/metrics/MetricNames.java cc237ac 
  giraph-core/src/test/java/org/apache/giraph/comm/LZ4RequestCompressorTest.java PRE-CREATION 
  pom.xml 72fe6f2 

Diff: https://reviews.apache.org/r/12377/diff/


Testing
-------

mvn clean verify
PageRank on cluster
Added test for compressor
Verified that NoOpCompressor has the same performance as before


Thanks,

Maja Kabiljo


Re: Review Request 12377: Provide an option to do request compression

Posted by Maja Kabiljo <ma...@fb.com>.

> On July 10, 2013, 12:18 p.m., Nitay Joffe wrote:
> >

Thanks for the review!


> On July 10, 2013, 12:18 p.m., Nitay Joffe wrote:
> > giraph-core/src/main/java/org/apache/giraph/metrics/GiraphMetrics.java, line 122
> > <https://reviews.apache.org/r/12377/diff/1/?file=319438#file319438line122>
> >
> >     ?

The list of observers was getting corrupted: I was getting an NPE while iterating through it. The observers probably get registered from different threads. Adding 'synchronized' fixed it.
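
As an illustration of this kind of fix, here is a minimal sketch with hypothetical names. It is not the GiraphMetrics code. The assumption is that observers live in a plain ArrayList; without synchronization, concurrent add() calls can lose updates or leave null slots, which is one plausible way to hit an NPE while iterating.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: observer registration guarded by 'synchronized'. */
final class ObserverRegistrySketch {
  private final List<Runnable> observers = new ArrayList<>();

  /** Safe to call from several threads because all access takes the same lock. */
  synchronized void addObserver(Runnable observer) {
    observers.add(observer);
  }

  /** Copies the list under the lock, then notifies outside it. */
  void notifyObservers() {
    List<Runnable> snapshot;
    synchronized (this) {
      snapshot = new ArrayList<>(observers);
    }
    for (Runnable observer : snapshot) {
      observer.run();
    }
  }

  synchronized int size() {
    return observers.size();
  }

  /** Registers observers from several threads and returns how many were stored. */
  static int registerConcurrently(int threadCount, int perThread) {
    ObserverRegistrySketch registry = new ObserverRegistrySketch();
    List<Thread> threads = new ArrayList<>();
    for (int t = 0; t < threadCount; t++) {
      Thread thread = new Thread(() -> {
        for (int i = 0; i < perThread; i++) {
          registry.addObserver(() -> { });
        }
      });
      threads.add(thread);
      thread.start();
    }
    for (Thread thread : threads) {
      try {
        thread.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    registry.notifyObservers();
    return registry.size();
  }
}
```

With the lock in place, registerConcurrently(8, 1000) stores every registration. A CopyOnWriteArrayList would be another option when registrations are rare and iteration is frequent.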


> On July 10, 2013, 12:18 p.m., Nitay Joffe wrote:
> > giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/LZ4RequestCompressor.java, line 95
> > <https://reviews.apache.org/r/12377/diff/1/?file=319431#file319431line95>
> >
> >     Just curious, is there some on-the-fly decompressor, or do we have to do it all at once?

The Java ones have it, but they were much slower. This particular library doesn't; maybe some other library does. If someone has a suggestion I can look into it, or we can do that later if we come across one.
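
To make "on-the-fly" concrete, here is a minimal sketch of streaming decompression using the JDK's java.util.zip streams. The names are hypothetical, and these are not necessarily the Java compressors that were benchmarked above. The point is only that the reader consumes values as they are inflated instead of materializing the whole decompressed buffer first.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical sketch: decode values while the stream is being decompressed. */
final class StreamingDecompressionSketch {
  private StreamingDecompressionSketch() { }

  /** Writes a count followed by the values through a compressing stream. */
  static byte[] compressInts(int[] values) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out =
        new DataOutputStream(new DeflaterOutputStream(bytes))) {
      out.writeInt(values.length);
      for (int value : values) {
        out.writeInt(value);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Sums the values as they come out of the decompressor, one int at a time. */
  static long sumWhileDecompressing(byte[] compressed) {
    try (DataInputStream in = new DataInputStream(
        new InflaterInputStream(new ByteArrayInputStream(compressed)))) {
      int count = in.readInt();
      long sum = 0;
      for (int i = 0; i < count; i++) {
        sum += in.readInt();
      }
      return sum;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The trade-off discussed in this thread still applies: a streaming API avoids the intermediate buffer, but the measured speed of the underlying compressor matters more when CPU overhead is the concern.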


- Maja


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12377/#review22952
-----------------------------------------------------------


Re: Review Request 12377: Provide an option to do request compression

Posted by Nitay Joffe <ni...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12377/#review22952
-----------------------------------------------------------



giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/LZ4RequestCompressor.java
<https://reviews.apache.org/r/12377/#comment46736>

    Let's reuse UnsafeByteArrayOutputStream:SIZE_OF_INT / UnsafeByteArrayOutputStream:SIZE_OF_LONG. You probably want to move those constants to another place like Sizes or something.



giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/LZ4RequestCompressor.java
<https://reviews.apache.org/r/12377/#comment46738>

    Just curious, is there some on-the-fly decompressor, or do we have to do it all at once?



giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestCompressor.java
<https://reviews.apache.org/r/12377/#comment46737>

    Make this an interface.
    Also rename to RequestCodec (since it's also a decompressor)?
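
    A rough sketch of what the interface shape could look like, for illustration only. The names Codec, NoOpCodec and create(), and the byte[] signatures, are hypothetical; the actual RequestCodec in the patch may look quite different.

```java
/** Hypothetical sketch: a pluggable codec interface with a no-op default. */
final class PluggableCodecSketch {
  private PluggableCodecSketch() { }

  /** One interface covers both directions, hence "codec" rather than "compressor". */
  interface Codec {
    byte[] encode(byte[] request);

    byte[] decode(byte[] payload);
  }

  /** Passes bytes through untouched, so the default path pays no extra cost. */
  static final class NoOpCodec implements Codec {
    @Override
    public byte[] encode(byte[] request) {
      return request;
    }

    @Override
    public byte[] decode(byte[] payload) {
      return payload;
    }
  }

  /** Instantiates whichever implementation class was configured. */
  static Codec create(Class<? extends Codec> codecClass) {
    try {
      return codecClass.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(
          "Cannot instantiate " + codecClass.getName(), e);
    }
  }
}
```

    With an interface, callers depend only on encode/decode, and an alternative implementation can be swapped in by changing the configured class.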



giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestEncoder.java
<https://reviews.apache.org/r/12377/#comment46739>

    SIZE_OF_INT



giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java
<https://reviews.apache.org/r/12377/#comment46740>

    Pass in this as well, to make it configurable?



giraph-core/src/main/java/org/apache/giraph/metrics/GiraphMetrics.java
<https://reviews.apache.org/r/12377/#comment46741>

    ?



giraph-core/src/main/java/org/apache/giraph/metrics/MetricNames.java
<https://reviews.apache.org/r/12377/#comment46742>

    cool, nice to have histograms tracking it :)



pom.xml
<https://reviews.apache.org/r/12377/#comment46735>

    nit: put a constant above in <properties>, we should move all the versions up there.


- Nitay Joffe


Re: Review Request 12377: Provide an option to do request compression

Posted by Maja Kabiljo <ma...@fb.com>.

(Updated July 10, 2013, 6:01 p.m.)


Review request for giraph.


Changes
-------

Nitay's comments


Bugs: GIRAPH-713
    https://issues.apache.org/jira/browse/GIRAPH-713


Repository: giraph-git


Description
-------

In some cases the network is much slower than all the computation we do, so we could benefit from compressing requests.

I am using LZ4 compression. I tried out two versions of Snappy and some Java ones; LZ4 had the least CPU overhead and about the same speed as Snappy. It's also easy to plug in your own compressor. When we are not network-bound, the CPU overhead of using compression is about 10% (tested with PageRankBenchmark) and the running time stays about the same. Depending on how slow your network is and how well your data compresses, this can lead to big time savings. I also added compression to the metrics; here is the output from one of the PageRank iterations:
https://gist.github.com/majakabiljo/5962269


Diffs (updated)
-----

  giraph-core/pom.xml cab0157 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/LZ4RequestCodec.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/NoOpRequestCodec.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestCodec.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestDecoder.java 6eb6549 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestEncoder.java 83b408e 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphClasses.java 6655834 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java c4cc96f 
  giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java ed63192 
  giraph-core/src/main/java/org/apache/giraph/metrics/GiraphMetrics.java 7d980ea 
  giraph-core/src/main/java/org/apache/giraph/metrics/MetricNames.java cc237ac 
  giraph-core/src/main/java/org/apache/giraph/utils/Sizes.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/utils/UnsafeByteArrayInputStream.java 20ed92b 
  giraph-core/src/main/java/org/apache/giraph/utils/UnsafeByteArrayOutputStream.java 9ff1242 
  giraph-core/src/test/java/org/apache/giraph/comm/LZ4RequestCodecTest.java PRE-CREATION 
  pom.xml 72fe6f2 

Diff: https://reviews.apache.org/r/12377/diff/


Testing
-------

mvn clean verify
PageRank on cluster
Added test for compressor
Verified that NoOpCompressor has the same performance as before


Thanks,

Maja Kabiljo