Posted to commits@spark.apache.org by pw...@apache.org on 2014/01/14 08:00:14 UTC
[50/50] git commit: Merge pull request #367 from ankurdave/graphx
Merge pull request #367 from ankurdave/graphx
GraphX: Unifying Graphs and Tables
GraphX extends Spark's distributed, fault-tolerant collections API and interactive console with a new graph API that leverages recent advances in graph systems (e.g., [GraphLab](http://graphlab.org)) to let users easily and interactively build, transform, and reason about graph-structured data at scale. See http://amplab.github.io/graphx/.
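As a rough illustration of "build, transform, and reason about", here is a sketch that builds a tiny property graph from two ordinary RDDs, derives a transformed graph, and queries its structure. It assumes the GraphX API added in this merge (`Graph`, `Edge`, `VertexId`, `mapVertices`, `triplets`, `inDegrees`) and a local `SparkContext`. The object name, helper names, usernames, and edge labels are hypothetical.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Hypothetical example object; not part of the GraphX codebase.
object PropertyGraphSketch {
  /** Builds a tiny follower graph whose vertex property is a username. */
  def build(sc: SparkContext): Graph[String, String] = {
    // Hypothetical data: vertex IDs are Longs (GraphX's VertexId alias).
    val users: RDD[(VertexId, String)] =
      sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
    val follows: RDD[Edge[String]] =
      sc.parallelize(Seq(Edge(1L, 2L, "follows"), Edge(2L, 3L, "follows"), Edge(3L, 2L, "follows")))
    Graph(users, follows)
  }

  /** In-degree per vertex; vertices with no incoming edges do not appear. */
  def inDegrees(graph: Graph[String, String]): Map[VertexId, Int] =
    graph.inDegrees.collect().toMap

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "PropertyGraphSketch")
    try {
      val graph = build(sc)
      // Transform: a new graph with upper-cased usernames and the same structure.
      val shouted = graph.mapVertices((_, name) => name.toUpperCase)
      // Each triplet pairs an edge with the properties of both endpoints.
      shouted.triplets
        .map(t => s"${t.srcAttr} ${t.attr} ${t.dstAttr}")
        .collect()
        .foreach(println)
      println(inDegrees(graph))
    } finally {
      sc.stop()
    }
  }
}
```

Because a `Graph` is backed by vertex and edge RDDs, results such as `inDegrees` come back as ordinary (vertex ID, value) collections that can be joined with other tables.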
Thanks to @jegonzal, @rxin, @ankurdave, @dcrankshaw, @jianpingjwang, @amatsukawa, @kellrott, and @adamnovak.
Tasks left:
- [x] Graph-level uncache
- [x] Uncache previous iterations in Pregel
- [x] ~~Uncache previous iterations in GraphLab~~ (postponed to post-release)
  - [x] Describe GC issue with GraphLab
- [ ] Write `docs/graphx-programming-guide.md`
  - [x] Mention future Bagel support in docs
  - [ ] Section on caching/uncaching in docs: As with Spark, cache anything that is used more than once. In an iterative algorithm, try to cache and force (i.e., materialize) the new result in every iteration, then uncache the cached results that the newly materialized RDD depended on but that won't be referenced again.
- [x] Undo modifications to core collections and instead copy them to org.apache.spark.graphx
- [x] Make Graph serializable to work around capture in Spark shell
- [x] Rename graph -> graphx in package name and subproject
- [x] Remove standalone PageRank
- [x] ~~Fix amplab/graphx#52 by checking `iter.hasNext`~~
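The caching advice in the task list above can be sketched with plain RDDs. This is a minimal sketch assuming only the core RDD API (`cache`, `count` as the forcing action, `unpersist(blocking = ...)`); the object name, the 100-element input, and the toy update rule standing in for one step of a real iterative algorithm are all hypothetical.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

// Hypothetical example object illustrating the cache/force/uncache pattern.
object IterativeCaching {
  /** Runs `numIters` rounds of a toy update, keeping only the latest result cached. */
  def run(sc: SparkContext, numIters: Int): RDD[(Long, Double)] = {
    // Hypothetical input: 100 keys, each starting with value 0.0.
    var current: RDD[(Long, Double)] =
      sc.parallelize((1L to 100L).map(id => (id, 0.0))).cache()
    current.count() // force (materialize) the initial RDD

    for (_ <- 1 to numIters) {
      // Toy update standing in for one iteration of a real algorithm.
      val next = current.mapValues(v => 0.5 * v + 0.5).cache()
      next.count() // materialize `next` while `current` is still cached
      // `next` is now cached, and nothing else references `current`, so drop it.
      current.unpersist(blocking = false)
      current = next
    }
    current
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "IterativeCaching")
    try {
      println(run(sc, 10).values.sum())
    } finally {
      sc.stop()
    }
  }
}
```

The order matters: if `current` were unpersisted before `next.count()` ran, computing `next` would have to recompute `current` from its lineage instead of reading it from the cache.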
Project: http://git-wip-us.apache.org/repos/asf/incubator-spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spark/commit/4a805aff
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spark/tree/4a805aff
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spark/diff/4a805aff
Branch: refs/heads/master
Commit: 4a805aff5e381752afb2bfd579af908d623743ed
Parents: 945fe7a 80e73ed
Author: Patrick Wendell <pw...@gmail.com>
Authored: Mon Jan 13 22:58:38 2014 -0800
Committer: Patrick Wendell <pw...@gmail.com>
Committed: Mon Jan 13 22:58:38 2014 -0800
----------------------------------------------------------------------
bin/compute-classpath.sh | 2 +
.../org/apache/spark/rdd/PairRDDFunctions.scala | 2 +-
.../main/scala/org/apache/spark/rdd/RDD.scala | 5 +
.../apache/spark/util/collection/BitSet.scala | 87 +-
.../spark/util/collection/OpenHashSet.scala | 23 +-
docs/_layouts/global.html | 8 +-
docs/_plugins/copy_api_dirs.rb | 2 +-
docs/api.md | 1 +
docs/bagel-programming-guide.md | 10 +-
docs/graphx-programming-guide.md | 1003 ++++++++++++++++++
docs/img/data_parallel_vs_graph_parallel.png | Bin 0 -> 432725 bytes
docs/img/edge-cut.png | Bin 0 -> 12563 bytes
docs/img/edge_cut_vs_vertex_cut.png | Bin 0 -> 79745 bytes
docs/img/graph_analytics_pipeline.png | Bin 0 -> 427220 bytes
docs/img/graph_parallel.png | Bin 0 -> 92288 bytes
docs/img/graphx_figures.pptx | Bin 0 -> 1123363 bytes
docs/img/graphx_logo.png | Bin 0 -> 40324 bytes
docs/img/graphx_performance_comparison.png | Bin 0 -> 166343 bytes
docs/img/property_graph.png | Bin 0 -> 225151 bytes
docs/img/tables_and_graphs.png | Bin 0 -> 166265 bytes
docs/img/triplet.png | Bin 0 -> 31489 bytes
docs/img/vertex-cut.png | Bin 0 -> 12246 bytes
docs/img/vertex_routing_edge_tables.png | Bin 0 -> 570007 bytes
docs/index.md | 4 +-
.../examples/graphx/LiveJournalPageRank.scala | 49 +
graphx/data/followers.txt | 8 +
graphx/data/users.txt | 7 +
graphx/pom.xml | 67 ++
.../scala/org/apache/spark/graphx/Edge.scala | 45 +
.../org/apache/spark/graphx/EdgeDirection.scala | 44 +
.../scala/org/apache/spark/graphx/EdgeRDD.scala | 102 ++
.../org/apache/spark/graphx/EdgeTriplet.scala | 49 +
.../scala/org/apache/spark/graphx/Graph.scala | 405 +++++++
.../spark/graphx/GraphKryoRegistrator.scala | 31 +
.../org/apache/spark/graphx/GraphLoader.scala | 72 ++
.../org/apache/spark/graphx/GraphOps.scala | 301 ++++++
.../apache/spark/graphx/PartitionStrategy.scala | 103 ++
.../scala/org/apache/spark/graphx/Pregel.scala | 139 +++
.../org/apache/spark/graphx/VertexRDD.scala | 347 ++++++
.../spark/graphx/impl/EdgePartition.scala | 220 ++++
.../graphx/impl/EdgePartitionBuilder.scala | 45 +
.../spark/graphx/impl/EdgeTripletIterator.scala | 42 +
.../apache/spark/graphx/impl/GraphImpl.scala | 379 +++++++
.../spark/graphx/impl/MessageToPartition.scala | 98 ++
.../graphx/impl/ReplicatedVertexView.scala | 195 ++++
.../apache/spark/graphx/impl/RoutingTable.scala | 65 ++
.../apache/spark/graphx/impl/Serializers.scala | 395 +++++++
.../spark/graphx/impl/VertexPartition.scala | 261 +++++
.../org/apache/spark/graphx/impl/package.scala | 7 +
.../org/apache/spark/graphx/lib/Analytics.scala | 136 +++
.../spark/graphx/lib/ConnectedComponents.scala | 38 +
.../org/apache/spark/graphx/lib/PageRank.scala | 147 +++
.../apache/spark/graphx/lib/SVDPlusPlus.scala | 138 +++
.../lib/StronglyConnectedComponents.scala | 94 ++
.../apache/spark/graphx/lib/TriangleCount.scala | 76 ++
.../scala/org/apache/spark/graphx/package.scala | 18 +
.../spark/graphx/util/BytecodeUtils.scala | 117 ++
.../spark/graphx/util/GraphGenerators.scala | 218 ++++
.../collection/PrimitiveKeyOpenHashMap.scala | 153 +++
graphx/src/test/resources/log4j.properties | 28 +
.../org/apache/spark/graphx/GraphOpsSuite.scala | 66 ++
.../org/apache/spark/graphx/GraphSuite.scala | 273 +++++
.../apache/spark/graphx/LocalSparkContext.scala | 28 +
.../org/apache/spark/graphx/PregelSuite.scala | 41 +
.../apache/spark/graphx/SerializerSuite.scala | 183 ++++
.../apache/spark/graphx/VertexRDDSuite.scala | 85 ++
.../spark/graphx/impl/EdgePartitionSuite.scala | 76 ++
.../graphx/impl/VertexPartitionSuite.scala | 113 ++
.../graphx/lib/ConnectedComponentsSuite.scala | 113 ++
.../apache/spark/graphx/lib/PageRankSuite.scala | 119 +++
.../spark/graphx/lib/SVDPlusPlusSuite.scala | 31 +
.../lib/StronglyConnectedComponentsSuite.scala | 57 +
.../spark/graphx/lib/TriangleCountSuite.scala | 70 ++
.../spark/graphx/util/BytecodeUtilsSuite.scala | 93 ++
pom.xml | 5 +-
project/SparkBuild.scala | 14 +-
76 files changed, 7132 insertions(+), 21 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/4a805aff/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/4a805aff/project/SparkBuild.scala
----------------------------------------------------------------------