Posted to dev@tinkerpop.apache.org by Marko Rodriguez <ok...@gmail.com> on 2016/09/15 21:17:01 UTC
Ruminations on SparkGraphComputer Part 666
Hello,
It’s about that time again. Spark 2.0 was released recently and, with the help of Chen Xin Yu, TINKERPOP-1389 has been updated to support Spark 2.0. How does it perform? Compared with TinkerPop 3.2.1 on Spark 1.6.1, it is a little faster on some traversals and a little slower on others. Note that this work will go into TinkerPop 3.3.0. We don’t yet have a branch for TinkerPop 3.3.0 development work, so until we do, this work will remain in TINKERPOP-1389. Finally, note that SparkGraphComputer itself has not changed beyond tweaks to align with Spark’s API and updates to serialization.
g.V().count() -- answer 125000000 (125 million vertices)
- TinkerPop 3.0.0.MX: 2.5 hours
- TinkerPop 3.0.0: 1.5 hours
- TinkerPop 3.1.1: 23 minutes
- TinkerPop 3.2.0: 6.8 minutes (Spark 1.5.2)
- TinkerPop 3.2.0: 5.5 minutes (Spark 1.6.1)
- TinkerPop 3.2.1: 2.2 minutes (Spark 1.6.1)
- TinkerPop 3.3.x: 1.6 minutes (Spark 2.0.0)
g.V().out().count() -- answer 2586147869 (2.5 billion length-1 paths, i.e. edges)
- TinkerPop 3.0.0.MX: unknown
- TinkerPop 3.0.0: 2.5 hours
- TinkerPop 3.1.1: 1.1 hours
- TinkerPop 3.2.0: 13 minutes (Spark 1.5.2)
- TinkerPop 3.2.0: 12 minutes (Spark 1.6.1)
- TinkerPop 3.2.1: 2.4 minutes (Spark 1.6.1)
- TinkerPop 3.3.x: 2.1 minutes (Spark 2.0.0)
g.V().out().out().count() -- answer 640528666156 (640 billion length-2 paths)
- TinkerPop 3.2.0: 55 minutes (Spark 1.5.2)
- TinkerPop 3.2.0: 50 minutes (Spark 1.6.1)
- TinkerPop 3.2.1: 37 minutes (Spark 1.6.1)
- TinkerPop 3.3.x: 40 minutes (Spark 2.0.0)
g.V().out().out().out().count() -- answer 215664338057221 (215 trillion length-3 paths)
- TinkerPop 3.0.0.MX: 12.8 hours
- TinkerPop 3.0.0: 8.6 hours
- TinkerPop 3.1.1: 2.4 hours
- TinkerPop 3.2.0: 1.6 hours (Spark 1.5.2)
- TinkerPop 3.2.0: 1.5 hours (Spark 1.6.1)
- TinkerPop 3.2.1: 1.1 hours (Spark 1.6.1)
- TinkerPop 3.3.x: 1.3 hours (Spark 2.0.0)
g.V().out().out().out().out().count() -- answer 83841426570464575 (83 quadrillion length-4 paths)
- TinkerPop 3.2.0: 2.1 hours (Spark 1.6.1)
- TinkerPop 3.2.1: 1.7 hours (Spark 1.6.1)
- TinkerPop 3.3.x: 2.0 hours (Spark 2.0.0)
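The path counts above grow from billions to quadrillions, far more than could be enumerated one at a time in the times shown. One simplified way to see why such counts are computable at all: keep a single count (a "bulk") per vertex and push it along the edges at each out() step, so each step costs time proportional to the number of edges rather than the number of paths. This is in the spirit of TinkerPop's traverser bulking, but the sketch below is plain Java under stated assumptions, not SparkGraphComputer's implementation; the int[][] adjacency format and the PathCountSketch names are hypothetical.

```java
import java.util.Arrays;

// Simplified sketch (hypothetical, not TinkerPop code): count length-k paths
// by propagating a per-vertex bulk count instead of enumerating paths.
final class PathCountSketch {

    // out[v] lists the out-neighbors of vertex v (hypothetical toy format).
    static long countPaths(int[][] out, int k) {
        int n = out.length;
        long[] bulk = new long[n];
        Arrays.fill(bulk, 1L); // g.V(): one traverser (bulk 1) per vertex
        for (int step = 0; step < k; step++) { // one iteration per out() step
            long[] next = new long[n];
            for (int v = 0; v < n; v++) {
                for (int w : out[v]) {
                    next[w] += bulk[v]; // move v's bulk to each out-neighbor w
                }
            }
            bulk = next;
        }
        long total = 0L;
        for (long b : bulk) {
            total += b; // count(): sum all bulks (a plain long, so it can overflow)
        }
        return total;
    }

    public static void main(String[] args) {
        // Toy graph with edges 0->1, 0->2, 1->2, 2->0
        int[][] out = { {1, 2}, {2}, {0} };
        for (int k = 0; k <= 3; k++) {
            System.out.println("length-" + k + " paths: " + countPaths(out, k));
        }
    }
}
```

On the toy graph, k = 1 gives 4, the number of edges, matching the reading of g.V().out().count() as a count of edges; k = 2 and k = 3 give 5 and 7. Because the totals are plain longs, on a large enough graph they can overflow in the same way as the length-5 count below.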
g.V().out().out().out().out().out().count() -- answer -2280190503167902456 !! I blew the long space: the count overflowed the 64-bit long and wrapped around to a negative number.
- TinkerPop 3.2.0: 2.8 hours (Spark 1.6.1)
- TinkerPop 3.2.1: 2.2 hours (Spark 1.6.1)
- TinkerPop 3.3.x: 2.6 hours (Spark 2.0.0)
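Why the length-5 count came back negative: Long.MAX_VALUE is 9223372036854775807 (about 9.2 quintillion), only about 110 times the length-4 count of 83841426570464575, while each extra out() step above multiplied the count by roughly 250 to 390. Java long addition wraps modulo 2^64 without raising an error, so the total presumably passed Long.MAX_VALUE and landed on a negative value; the true count cannot be recovered from the wrapped value alone. A minimal sketch of the wraparound and of two standard-library alternatives, Math.addExact (throws ArithmeticException) and java.math.BigInteger (arbitrary precision); whether either would suit a distributed count is an open assumption here, and the class and method names are hypothetical.

```java
import java.math.BigInteger;

// Hypothetical illustration of 64-bit long overflow and two alternatives.
final class LongOverflowSketch {

    // Plain long addition: wraps around on overflow, no error is raised.
    static long wrappingAdd(long a, long b) {
        return a + b;
    }

    // Math.addExact throws ArithmeticException instead of wrapping.
    static boolean addOverflows(long a, long b) {
        try {
            Math.addExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // BigInteger has arbitrary precision, so the sum is always exact.
    static BigInteger exactAdd(long a, long b) {
        return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
    }

    public static void main(String[] args) {
        long max = Long.MAX_VALUE;
        System.out.println(wrappingAdd(max, 1L));  // -9223372036854775808
        System.out.println(addOverflows(max, 1L)); // true
        System.out.println(exactAdd(max, 1L));     // 9223372036854775808
    }
}
```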
g.V().group().by(outE().count()).by(count())
- TinkerPop 3.2.0: 12 minutes (Spark 1.6.1)
- TinkerPop 3.2.1: 2.4 minutes (Spark 1.6.1)
- TinkerPop 3.3.x: 3.1 minutes (Spark 2.0.0)
g.V().groupCount().by(outE().count())
- TinkerPop 3.2.0: 12 minutes (Spark 1.6.1)
- TinkerPop 3.2.1: 2.7 minutes (Spark 1.6.1)
- TinkerPop 3.3.x: 2.2 minutes (Spark 2.0.0)
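Both traversals compute the same result, the out-degree distribution of the graph: each vertex is keyed by its number of outgoing edges (outE().count()), and the result maps each degree to the number of vertices with that degree. groupCount().by(x) yields the same map as group().by(x).by(count()). A plain-Java sketch of that result on a toy in-memory graph, not how SparkGraphComputer evaluates it; the int[][] format and the names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of what both traversals return: out-degree -> vertex count.
final class OutDegreeDistributionSketch {

    // out[v] lists the out-neighbors of vertex v (hypothetical toy format).
    static Map<Long, Long> outDegreeDistribution(int[][] out) {
        Map<Long, Long> dist = new TreeMap<>(); // sorted by degree for readability
        for (int[] neighbors : out) {
            long degree = neighbors.length;    // like outE().count()
            dist.merge(degree, 1L, Long::sum); // like by(count()) / groupCount()
        }
        return dist;
    }

    public static void main(String[] args) {
        // Vertex 0 has 2 out-edges, vertices 1 and 2 have 1, vertex 3 has none
        int[][] out = { {1, 2}, {2}, {0}, {} };
        System.out.println(outDegreeDistribution(out)); // {0=1, 1=2, 2=1}
    }
}
```

Vertex 3 has no outgoing edges, so it is counted under key 0.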
Take care,
Marko.
http://markorodriguez.com