Posted to dev@giraph.apache.org by "Claudio Martella (JIRA)" <ji...@apache.org> on 2013/02/01 07:59:12 UTC

[jira] [Commented] (GIRAPH-494) Edge should be an interface

    [ https://issues.apache.org/jira/browse/GIRAPH-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568530#comment-13568530 ] 

Claudio Martella commented on GIRAPH-494:
-----------------------------------------

Quite frankly, the memory impact of this patch is measurable without benchmarks. It is one reference per edge; there's no magic involved. Comparisons between Giraph and other systems show that we eat and waste far too much memory. I recently ran PageRankBenchmark on 64 workers with a 7GB heap each, on a graph of 65M vertices with 100 edges per vertex, and it went OOM. This is quite incredible. Other systems (Signal/Collect) run PR on fewer machines and less memory within 60 seconds on that graph.
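
To put a rough number on "one reference per edge", here is a back-of-the-envelope sketch for the graph above. The 4-byte and 8-byte reference sizes are assumptions (typical of a 64-bit JVM with and without compressed pointers), not measurements from Giraph, and object alignment can change the real figure. The class and method names are made up for illustration.

```java
// Back-of-the-envelope estimate only; names and reference sizes are assumptions.
final class EdgeReferenceOverhead {
    /** Total bytes spent if every edge stores one extra reference. */
    static long overheadBytes(long vertices, long edgesPerVertex, int bytesPerReference) {
        return vertices * edgesPerVertex * bytesPerReference;
    }

    public static void main(String[] args) {
        long vertices = 65_000_000L;  // graph size from the benchmark run
        long edgesPerVertex = 100L;   // edges per vertex from the benchmark run
        int workers = 64;             // workers from the benchmark run

        // Assumed reference sizes: 4 bytes (compressed) and 8 bytes (uncompressed).
        for (int bytesPerReference : new int[] {4, 8}) {
            long total = overheadBytes(vertices, edgesPerVertex, bytesPerReference);
            double totalGb = total / 1e9;
            double perWorkerMb = total / (double) workers / 1e6;
            System.out.printf("%d-byte references: %.1f GB in total, %.1f MB per worker%n",
                    bytesPerReference, totalGb, perWorkerMb);
        }
    }
}
```

Under these assumptions the extra reference alone costs tens of gigabytes across the cluster, before counting the edges themselves.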

Memory consumption should be at the top of our priorities. Plus, I strongly believe that most of the algorithms out there live happily without an edge value, and we should not penalize them.

I agree with you that the API is not there yet, that it is not coherent, and that there is no bigger picture. But we have not released 0.2 yet, so this is the moment to break the API. This does not mean that we should keep on breaking it indefinitely, of course.
                
> Edge should be an interface
> ---------------------------
>
>                 Key: GIRAPH-494
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-494
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Nitay Joffe
>            Assignee: Nitay Joffe
>         Attachments: GIRAPH-494.patch
>
>
> In terms of architecture and for flexibility, I think our Edge class should be an interface instead of a concrete class. In this diff I change it to an interface and add a subinterface called MutableEdge. The existing Edge class is now called DefaultEdge. Note that only one class in our codebase actually needs a MutableEdge: RepresentativeVertex. Everything else works perfectly fine using the immutable Edge interface.
> One nice thing this allowed me to do is to create an EdgeNoValue, which we can use for algorithms whose edges have no value at all. Currently the same functionality is achieved by using NullWritable. However, using EdgeNoValue means not storing a reference to the single NullWritable instance in every single edge. When working on a job that reads 1B+ edges per worker, a pointer per edge adds up.
> https://reviews.apache.org/r/9172/
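
The shape described in the issue can be sketched roughly as follows. This is an illustration under assumptions, not the contents of GIRAPH-494.patch: the type names Edge, MutableEdge, DefaultEdge and EdgeNoValue come from the description, while the method names, the generic parameters, and the NoValue singleton (a stand-in for Hadoop's NullWritable, so the sketch compiles without Hadoop) are hypothetical.

```java
// Illustrative sketch only; method names and the NoValue stand-in are assumptions.

/** Read-only view of an edge: a target vertex id plus an edge value. */
interface Edge<I, E> {
    I getTargetVertexId();
    E getValue();
}

/** Subinterface for the few callers that need to modify an edge in place. */
interface MutableEdge<I, E> extends Edge<I, E> {
    void setTargetVertexId(I targetVertexId);
    void setValue(E value);
}

/** General-purpose edge: stores both the target id and a value reference. */
final class DefaultEdge<I, E> implements MutableEdge<I, E> {
    private I targetVertexId;
    private E value;

    DefaultEdge(I targetVertexId, E value) {
        this.targetVertexId = targetVertexId;
        this.value = value;
    }

    @Override public I getTargetVertexId() { return targetVertexId; }
    @Override public E getValue() { return value; }
    @Override public void setTargetVertexId(I targetVertexId) { this.targetVertexId = targetVertexId; }
    @Override public void setValue(E value) { this.value = value; }
}

/** Hypothetical stand-in for Hadoop's NullWritable singleton. */
final class NoValue {
    static final NoValue INSTANCE = new NoValue();
    private NoValue() { }
}

/** Edge without a value: no value field, so no per-edge reference is stored. */
final class EdgeNoValue<I> implements Edge<I, NoValue> {
    private final I targetVertexId;

    EdgeNoValue(I targetVertexId) {
        this.targetVertexId = targetVertexId;
    }

    @Override public I getTargetVertexId() { return targetVertexId; }

    /** Returns the shared singleton instead of reading a stored field. */
    @Override public NoValue getValue() { return NoValue.INSTANCE; }
}
```

Code written against the Edge interface works unchanged with either implementation; only EdgeNoValue avoids holding a value reference in each instance.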

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira