You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@giraph.apache.org by "Alessandro Presta (JIRA)" <ji...@apache.org> on 2013/02/22 19:20:12 UTC

[jira] [Updated] (GIRAPH-535) Range-partitioning and edge locality benchmark

     [ https://issues.apache.org/jira/browse/GIRAPH-535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alessandro Presta updated GIRAPH-535:
-------------------------------------

    Attachment: GIRAPH-535.patch

- Adds support for balanced range-partitioning. The user specifies a mapping from his vertex key space to long. Default classes for int and long ids are provided.
- Extends the pseudo-random input formats so that one can specify a minimum ratio of partition-local edges. This works with both hash- and range-partitioning.
- Adds options to change partitioner and specify edge locality in PageRankBenchmark.

Example runs of PageRankBenchmark with 1B vertices, 100B edges, 200 workers, 32 compute threads:

Range-partitioning, random edges (run with "-p 1")
---
Superstep 0 (milliseconds) 66,906
Superstep 1 (milliseconds) 72,256
Superstep 2 (milliseconds) 71,551

Range-partitioning, 75% local edges (run with "-Dpartition.vertexKeySpaceSize=1000000000 -p 1 -l 0.75")
---
Superstep 0 (milliseconds) 27,883
Superstep 1 (milliseconds) 34,693
Superstep 2 (milliseconds) 33,729
                
> Range-partitioning and edge locality benchmark
> ----------------------------------------------
>
>                 Key: GIRAPH-535
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-535
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Alessandro Presta
>            Assignee: Alessandro Presta
>         Attachments: GIRAPH-535.patch
>
>
> Range-based partitioning can drastically reduce network communication when using a vertex id space where close ids correspond to vertices likely to be connected.
> We currently have an incomplete implementation of range-based partitioning that tries to be very generic (allowing arbitrarily different partition sizes).
> Talking about this with Avery, we thought that for now it's better to add a simpler version (which tries to split as evenly as possible) and leave the current classes there in case someone wants to implement a more complex logic.
> A nice-to-have is also extending the pseudo-random formats to generate a required ratio of partition-local edges, in order to estimate the impact of locality with benchmarks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira