You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@giraph.apache.org by "Alessandro Presta (JIRA)" <ji...@apache.org> on 2013/02/22 19:20:12 UTC
[jira] [Updated] (GIRAPH-535) Range-partitioning and edge locality
benchmark
[ https://issues.apache.org/jira/browse/GIRAPH-535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alessandro Presta updated GIRAPH-535:
-------------------------------------
Attachment: GIRAPH-535.patch
- Adds support for balanced range-partitioning. The user specifies a mapping from his vertex key space to long. Default classes for int and long ids are provided.
- Extends the pseudo-random input formats so that one can specify a minimum ratio of partition-local edges. This works with both hash- and range-partitioning.
- Adds options to change partitioner and specify edge locality in PageRankBenchmark.
Example runs of PageRankBenchmark with 1B vertices, 100B edges, 200 workers, 32 compute threads:
Range-partitioning, random edges (run with "-p 1")
---
Superstep 0 (milliseconds) 66,906
Superstep 1 (milliseconds) 72,256
Superstep 2 (milliseconds) 71,551
Range-partitioning, 75% local edges (run with "-Dpartition.vertexKeySpaceSize=1000000000 -p 1 -l 0.75")
---
Superstep 0 (milliseconds) 27,883
Superstep 1 (milliseconds) 34,693
Superstep 2 (milliseconds) 33,729
> Range-partitioning and edge locality benchmark
> ----------------------------------------------
>
> Key: GIRAPH-535
> URL: https://issues.apache.org/jira/browse/GIRAPH-535
> Project: Giraph
> Issue Type: New Feature
> Reporter: Alessandro Presta
> Assignee: Alessandro Presta
> Attachments: GIRAPH-535.patch
>
>
> Range-based partitioning can drastically reduce network communication when using a vertex id space where close ids correspond to vertices likely to be connected.
> We currently have an incomplete implementation of range-based partitioning that tries to be very generic (allowing arbitrarily different partition sizes).
> Talking about this with Avery, we thought that for now it's better to add a simpler version (which tries to split as evenly as possible) and leave the current classes there in case someone wants to implement a more complex logic.
> A nice-to-have is also extending the pseudo-random formats to generate a required ratio of partition-local edges, in order to estimate the impact of locality with benchmarks.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira