Posted to dev@tinkerpop.apache.org by Marko Rodriguez <ok...@gmail.com> on 2016/09/15 21:17:01 UTC

Ruminations on SparkGraphComputer Part 666

Hello,

It’s about that time again. Spark 2.0 was released recently and, with the help of Chen Xin Yu, TINKERPOP-1389 has been updated to support it. How does it perform? A little faster here and a little slower there. Note that this work will go into TinkerPop 3.3.0. We don’t currently have a branch for TinkerPop 3.3.0 development work, so until we do, this will remain in TINKERPOP-1389. Finally, note that there have been no changes to SparkGraphComputer apart from tweaks to align with Spark’s API and serialization updates.

g.V().count() -- answer 125000000 (125 million vertices)
	- TinkerPop 3.0.0.MX: 2.5 hours
	- TinkerPop 3.0.0:	1.5 hours
	- TinkerPop 3.1.1:	23 minutes
	- TinkerPop 3.2.0:	6.8 minutes (Spark 1.5.2)
	- TinkerPop 3.2.0:	5.5 minutes (Spark 1.6.1)
	- TinkerPop 3.2.1:	2.2 minutes (Spark 1.6.1)
	- TinkerPop 3.3.x:	1.6 minutes (Spark 2.0.0)

g.V().out().count() -- answer 2586147869 (2.5 billion length-1 paths (i.e. edges))
	- TinkerPop 3.0.0.MX: unknown
	- TinkerPop 3.0.0:	2.5 hours
	- TinkerPop 3.1.1:	1.1 hours
	- TinkerPop 3.2.0:	13 minutes (Spark 1.5.2)
	- TinkerPop 3.2.0:	12 minutes (Spark 1.6.1)
	- TinkerPop 3.2.1:	2.4 minutes (Spark 1.6.1)
	- TinkerPop 3.3.x:	2.1 minutes (Spark 2.0.0)
	
g.V().out().out().count() -- answer 640528666156 (640 billion length-2 paths)
	- TinkerPop 3.2.0:	55 minutes (Spark 1.5.2)
	- TinkerPop 3.2.0:	50 minutes (Spark 1.6.1)
	- TinkerPop 3.2.1:	37 minutes (Spark 1.6.1)
	- TinkerPop 3.3.x:	40 minutes (Spark 2.0.0)

g.V().out().out().out().count() -- answer 215664338057221 (215 trillion length-3 paths)
	- TinkerPop 3.0.0.MX: 12.8 hours
	- TinkerPop 3.0.0:	8.6 hours
	- TinkerPop 3.1.1:	2.4 hours
	- TinkerPop 3.2.0:	1.6 hours (Spark 1.5.2)
	- TinkerPop 3.2.0:	1.5 hours (Spark 1.6.1)
	- TinkerPop 3.2.1:	1.1 hours (Spark 1.6.1)
	- TinkerPop 3.3.x:	1.3 hours (Spark 2.0.0)

g.V().out().out().out().out().count() -- answer 83841426570464575 (83 quadrillion length-4 paths)
	- TinkerPop 3.2.0:	2.1 hours (Spark 1.6.1)
	- TinkerPop 3.2.1:	1.7 hours (Spark 1.6.1)
	- TinkerPop 3.3.x:	2.0 hours (Spark 2.0.0)

g.V().out().out().out().out().out().count() -- answer -2280190503167902456 !! I blew the long space -- 64-bit overflow.
	- TinkerPop 3.2.0:	2.8 hours (Spark 1.6.1)
	- TinkerPop 3.2.1:	2.2 hours (Spark 1.6.1)
	- TinkerPop 3.3.x:	2.6 hours (Spark 2.0.0)
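
To make the overflow concrete, here is a minimal sketch in Python, chosen only because its integers are unbounded. It assumes the count is held in a signed 64-bit Java long, as the "long space" remark suggests. The helper name wrap_to_long is hypothetical, not a TinkerPop API. It emulates 64-bit wraparound and shows that the negative answer above only pins down the real count modulo 2^64.

```python
# Sketch: emulate signed 64-bit (Java long) wraparound with Python's unbounded ints.
LONG_MAX = 2**63 - 1
LONG_MIN = -2**63


def wrap_to_long(n: int) -> int:
    """Return the value a signed 64-bit long would hold after storing n (hypothetical helper)."""
    return (n + 2**63) % 2**64 - 2**63


# Values copied from the post.
length4_paths = 83841426570464575        # still fits in a long
reported_length5 = -2280190503167902456  # wrapped: a path count cannot be negative

# One step past LONG_MAX wraps around to LONG_MIN.
assert wrap_to_long(LONG_MAX + 1) == LONG_MIN

# The true count is the reported value plus some positive multiple of 2**64.
# Which multiple applies cannot be recovered from the wrapped value alone.
candidates = [reported_length5 + k * 2**64 for k in (1, 2, 3)]

for c in candidates:
    print(c, "wraps to", wrap_to_long(c))
```

Every candidate wraps to the same reported value, so getting the real length-5 path count would need an arbitrary-precision counter rather than a long.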

g.V().group().by(outE().count()).by(count())
	- TinkerPop 3.2.0:	12 minutes (Spark 1.6.1)
	- TinkerPop 3.2.1:	2.4 minutes (Spark 1.6.1)
	- TinkerPop 3.3.x:	3.1 minutes (Spark 2.0.0)

g.V().groupCount().by(outE().count())
	- TinkerPop 3.2.0:	12 minutes (Spark 1.6.1)
	- TinkerPop 3.2.1:	2.7 minutes (Spark 1.6.1)
	- TinkerPop 3.3.x:	2.2 minutes (Spark 2.0.0)
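
For a rough sense of "a little faster here and a little slower there", the sketch below computes the relative runtime change from TinkerPop 3.2.1 to 3.3.x for each traversal. The timings are copied from the lists above and converted to minutes. The names timings and relative_change are hypothetical and used only for illustration. These are single reported figures, so treat the percentages as indicative only.

```python
# Timings from this post, in minutes: (TinkerPop 3.2.1, TinkerPop 3.3.x).
timings = {
    "g.V().count()": (2.2, 1.6),
    "g.V().out().count()": (2.4, 2.1),
    "g.V().out().out().count()": (37.0, 40.0),
    "g.V().out().out().out().count()": (1.1 * 60, 1.3 * 60),
    "g.V().out().out().out().out().count()": (1.7 * 60, 2.0 * 60),
    "g.V().out().out().out().out().out().count()": (2.2 * 60, 2.6 * 60),
    "g.V().group().by(outE().count()).by(count())": (2.4, 3.1),
    "g.V().groupCount().by(outE().count())": (2.7, 2.2),
}


def relative_change(old: float, new: float) -> float:
    """Fractional change in runtime; negative means the newer build was faster."""
    return (new - old) / old


changes = {query: relative_change(old, new) for query, (old, new) in timings.items()}
faster = [query for query, change in changes.items() if change < 0]
slower = [query for query, change in changes.items() if change > 0]

# Print one line per traversal, e.g. a signed percentage followed by the query.
for query, change in changes.items():
    print(f"{change:+7.1%}  {query}")
print(f"{len(faster)} faster, {len(slower)} slower")
```

On these numbers, the short traversals and groupCount() got faster under 3.3.x, while the multi-hop traversals and group() got slower.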

Take care,
Marko.

http://markorodriguez.com