You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by pw...@apache.org on 2014/01/14 08:00:14 UTC

[50/50] git commit: Merge pull request #367 from ankurdave/graphx

Merge pull request #367 from ankurdave/graphx

GraphX: Unifying Graphs and Tables

GraphX extends Spark's distributed fault-tolerant collections API and interactive console with a new graph API which leverages recent advances in graph systems (e.g., [GraphLab](http://graphlab.org)) to enable users to easily and interactively build, transform, and reason about graph structured data at scale. See http://amplab.github.io/graphx/.

Thanks to @jegonzal, @rxin, @ankurdave, @dcrankshaw, @jianpingjwang, @amatsukawa, @kellrott, and @adamnovak.

Tasks left:
- [x] Graph-level uncache
- [x] Uncache previous iterations in Pregel
- [x] ~~Uncache previous iterations in GraphLab~~ (postponed to post-release)
- [x] - Describe GC issue with GraphLab
- [ ] Write `docs/graphx-programming-guide.md`
- [x] - Mention future Bagel support in docs
- [ ] - Section on caching/uncaching in docs: As with Spark, cache something that is used more than once. In an iterative algorithm, try to cache and force (i.e., materialize) something every iteration, then uncache the cached things that depended on the newly materialized RDD but that won't be referenced again.
- [x] Undo modifications to core collections and instead copy them to org.apache.spark.graphx
- [x] Make Graph serializable to work around capture in Spark shell
- [x] Rename graph -> graphx in package name and subproject
- [x] Remove standalone PageRank
- [x] ~~Fix amplab/graphx#52 by checking `iter.hasNext`~~


Project: http://git-wip-us.apache.org/repos/asf/incubator-spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spark/commit/4a805aff
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spark/tree/4a805aff
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spark/diff/4a805aff

Branch: refs/heads/master
Commit: 4a805aff5e381752afb2bfd579af908d623743ed
Parents: 945fe7a 80e73ed
Author: Patrick Wendell <pw...@gmail.com>
Authored: Mon Jan 13 22:58:38 2014 -0800
Committer: Patrick Wendell <pw...@gmail.com>
Committed: Mon Jan 13 22:58:38 2014 -0800

----------------------------------------------------------------------
 bin/compute-classpath.sh                        |    2 +
 .../org/apache/spark/rdd/PairRDDFunctions.scala |    2 +-
 .../main/scala/org/apache/spark/rdd/RDD.scala   |    5 +
 .../apache/spark/util/collection/BitSet.scala   |   87 +-
 .../spark/util/collection/OpenHashSet.scala     |   23 +-
 docs/_layouts/global.html                       |    8 +-
 docs/_plugins/copy_api_dirs.rb                  |    2 +-
 docs/api.md                                     |    1 +
 docs/bagel-programming-guide.md                 |   10 +-
 docs/graphx-programming-guide.md                | 1003 ++++++++++++++++++
 docs/img/data_parallel_vs_graph_parallel.png    |  Bin 0 -> 432725 bytes
 docs/img/edge-cut.png                           |  Bin 0 -> 12563 bytes
 docs/img/edge_cut_vs_vertex_cut.png             |  Bin 0 -> 79745 bytes
 docs/img/graph_analytics_pipeline.png           |  Bin 0 -> 427220 bytes
 docs/img/graph_parallel.png                     |  Bin 0 -> 92288 bytes
 docs/img/graphx_figures.pptx                    |  Bin 0 -> 1123363 bytes
 docs/img/graphx_logo.png                        |  Bin 0 -> 40324 bytes
 docs/img/graphx_performance_comparison.png      |  Bin 0 -> 166343 bytes
 docs/img/property_graph.png                     |  Bin 0 -> 225151 bytes
 docs/img/tables_and_graphs.png                  |  Bin 0 -> 166265 bytes
 docs/img/triplet.png                            |  Bin 0 -> 31489 bytes
 docs/img/vertex-cut.png                         |  Bin 0 -> 12246 bytes
 docs/img/vertex_routing_edge_tables.png         |  Bin 0 -> 570007 bytes
 docs/index.md                                   |    4 +-
 .../examples/graphx/LiveJournalPageRank.scala   |   49 +
 graphx/data/followers.txt                       |    8 +
 graphx/data/users.txt                           |    7 +
 graphx/pom.xml                                  |   67 ++
 .../scala/org/apache/spark/graphx/Edge.scala    |   45 +
 .../org/apache/spark/graphx/EdgeDirection.scala |   44 +
 .../scala/org/apache/spark/graphx/EdgeRDD.scala |  102 ++
 .../org/apache/spark/graphx/EdgeTriplet.scala   |   49 +
 .../scala/org/apache/spark/graphx/Graph.scala   |  405 +++++++
 .../spark/graphx/GraphKryoRegistrator.scala     |   31 +
 .../org/apache/spark/graphx/GraphLoader.scala   |   72 ++
 .../org/apache/spark/graphx/GraphOps.scala      |  301 ++++++
 .../apache/spark/graphx/PartitionStrategy.scala |  103 ++
 .../scala/org/apache/spark/graphx/Pregel.scala  |  139 +++
 .../org/apache/spark/graphx/VertexRDD.scala     |  347 ++++++
 .../spark/graphx/impl/EdgePartition.scala       |  220 ++++
 .../graphx/impl/EdgePartitionBuilder.scala      |   45 +
 .../spark/graphx/impl/EdgeTripletIterator.scala |   42 +
 .../apache/spark/graphx/impl/GraphImpl.scala    |  379 +++++++
 .../spark/graphx/impl/MessageToPartition.scala  |   98 ++
 .../graphx/impl/ReplicatedVertexView.scala      |  195 ++++
 .../apache/spark/graphx/impl/RoutingTable.scala |   65 ++
 .../apache/spark/graphx/impl/Serializers.scala  |  395 +++++++
 .../spark/graphx/impl/VertexPartition.scala     |  261 +++++
 .../org/apache/spark/graphx/impl/package.scala  |    7 +
 .../org/apache/spark/graphx/lib/Analytics.scala |  136 +++
 .../spark/graphx/lib/ConnectedComponents.scala  |   38 +
 .../org/apache/spark/graphx/lib/PageRank.scala  |  147 +++
 .../apache/spark/graphx/lib/SVDPlusPlus.scala   |  138 +++
 .../lib/StronglyConnectedComponents.scala       |   94 ++
 .../apache/spark/graphx/lib/TriangleCount.scala |   76 ++
 .../scala/org/apache/spark/graphx/package.scala |   18 +
 .../spark/graphx/util/BytecodeUtils.scala       |  117 ++
 .../spark/graphx/util/GraphGenerators.scala     |  218 ++++
 .../collection/PrimitiveKeyOpenHashMap.scala    |  153 +++
 graphx/src/test/resources/log4j.properties      |   28 +
 .../org/apache/spark/graphx/GraphOpsSuite.scala |   66 ++
 .../org/apache/spark/graphx/GraphSuite.scala    |  273 +++++
 .../apache/spark/graphx/LocalSparkContext.scala |   28 +
 .../org/apache/spark/graphx/PregelSuite.scala   |   41 +
 .../apache/spark/graphx/SerializerSuite.scala   |  183 ++++
 .../apache/spark/graphx/VertexRDDSuite.scala    |   85 ++
 .../spark/graphx/impl/EdgePartitionSuite.scala  |   76 ++
 .../graphx/impl/VertexPartitionSuite.scala      |  113 ++
 .../graphx/lib/ConnectedComponentsSuite.scala   |  113 ++
 .../apache/spark/graphx/lib/PageRankSuite.scala |  119 +++
 .../spark/graphx/lib/SVDPlusPlusSuite.scala     |   31 +
 .../lib/StronglyConnectedComponentsSuite.scala  |   57 +
 .../spark/graphx/lib/TriangleCountSuite.scala   |   70 ++
 .../spark/graphx/util/BytecodeUtilsSuite.scala  |   93 ++
 pom.xml                                         |    5 +-
 project/SparkBuild.scala                        |   14 +-
 76 files changed, 7132 insertions(+), 21 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/4a805aff/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/4a805aff/project/SparkBuild.scala
----------------------------------------------------------------------