Posted to dev@tinkerpop.apache.org by Daniel Quest <da...@gmail.com> on 2016/04/06 01:48:17 UTC

Re: [TinkerPop] Ruminations on SparkGraphComputer Part Drei

Marko, this is great!  I always like it when you send out posts like this!

Best
DQ

> On Apr 5, 2016, at 4:43 PM, Marko Rodriguez <ok...@gmail.com> wrote:
> 
> Hello,
> 
> With the imminent release of TinkerPop 3.2.0, during our week-long code freeze, I took 3.2.0 for a spin on a 4-node blade cluster using the Friendster graph, which is composed of 125 million vertices and 2.5 billion edges. TinkerPop 3.2.0 will be released using Spark 1.6.1. Note that there were some issues in the initial testing around SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES. The 1.5.2 settings I used previously were "too much" for 1.6.1. I toned them down a bit and things work smoothly, and interestingly enough, with seemingly less "firepower," we are getting better results (speed-wise). Enjoy the results.
> 
> g.V().count() -- answer 125000000 (125 million vertices)
> 	- TinkerPop 3.0.0.MX: 2.5 hours
> 	- TinkerPop 3.0.0:	1.5 hours
> 	- TinkerPop 3.1.1:	23 minutes
> 	- TinkerPop 3.2.0:	6.8 minutes (Spark 1.5.2)
> 	- TinkerPop 3.2.0:	5.5 minutes (Spark 1.6.1)
> 
> g.V().out().count() -- answer 2586147869 (2.5 billion length-1 paths (i.e. edges))
> 	- TinkerPop 3.0.0.MX: unknown
> 	- TinkerPop 3.0.0:	2.5 hours
> 	- TinkerPop 3.1.1:	1.1 hours
> 	- TinkerPop 3.2.0:	13 minutes (Spark 1.5.2)
> 	- TinkerPop 3.2.0:	12 minutes (Spark 1.6.1)
> 	
> g.V().out().out().count() -- answer 640528666156 (640 billion length-2 paths)
> 	- TinkerPop 3.0.0.MX: unknown
> 	- TinkerPop 3.0.0:	unknown
> 	- TinkerPop 3.1.1:	unknown
> 	- TinkerPop 3.2.0:	55 minutes (Spark 1.5.2)
> 	- TinkerPop 3.2.0:	50 minutes (Spark 1.6.1)
> 
> g.V().out().out().out().count() -- answer 215664338057221 (215 trillion length-3 paths)
> 	- TinkerPop 3.0.0.MX: 12.8 hours
> 	- TinkerPop 3.0.0:	8.6 hours
> 	- TinkerPop 3.1.1:	2.4 hours
> 	- TinkerPop 3.2.0:	1.6 hours (Spark 1.5.2)
> 	- TinkerPop 3.2.0:	1.5 hours (Spark 1.6.1)
> 
> g.V().out().out().out().out().count() -- answer 83841426570464575 (83 quadrillion length-4 paths)
> 	- TinkerPop 3.0.0.MX: unknown
> 	- TinkerPop 3.0.0:	unknown
> 	- TinkerPop 3.1.1:	unknown
> 	- TinkerPop 3.2.0:	unknown (Spark 1.5.2)
> 	- TinkerPop 3.2.0:	2.1 hours (Spark 1.6.1)
> 
> g.V().out().out().out().out().out().count() -- answer -2280190503167902456 !! I blew the long space -- 64-bit overflow.
> 	- TinkerPop 3.0.0.MX: unknown
> 	- TinkerPop 3.0.0:	unknown
> 	- TinkerPop 3.1.1:	unknown
> 	- TinkerPop 3.2.0:	unknown (Spark 1.5.2)
> 	- TinkerPop 3.2.0:	2.8 hours (Spark 1.6.1)
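
The negative count reported above is what signed 64-bit wraparound looks like on the JVM. The following is a minimal sketch of that behaviour using only the Java standard library; it is not TinkerPop code. The class name, the method names, and the fan-out factor of 400 are assumptions made up for illustration, not measurements from the benchmark.

```java
import java.math.BigInteger;

class LongOverflowSketch {
    // Plain long multiplication: wraps silently modulo 2^64 on overflow.
    static long wrappingMultiply(long a, long b) {
        return a * b;
    }

    // Returns true if a * b does not fit in a signed 64-bit long.
    static boolean overflows(long a, long b) {
        try {
            Math.multiplyExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // Exact product using arbitrary-precision arithmetic.
    static BigInteger exactMultiply(long a, long b) {
        return BigInteger.valueOf(a).multiply(BigInteger.valueOf(b));
    }

    public static void main(String[] args) {
        long fourHopCount = 83841426570464575L; // length-4 path count quoted in the post
        long hypotheticalFanOut = 400L;         // ASSUMPTION: illustrative factor only

        System.out.println("Long.MAX_VALUE: " + Long.MAX_VALUE);
        System.out.println("overflows:      " + overflows(fourHopCount, hypotheticalFanOut));
        System.out.println("wrapped long:   " + wrappingMultiply(fourHopCount, hypotheticalFanOut));
        System.out.println("exact value:    " + exactMultiply(fourHopCount, hypotheticalFanOut));
    }
}
```

The wrapped result only identifies the true value modulo 2^64, so the real path count cannot be read back from the negative number alone. Math.multiplyExact (or BigInteger) is one way to detect or avoid the wrap in ordinary Java code.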
> 
> Next, the group()-step has been redesigned to be much more efficient in OLAP mode when the by()-value traversal maintains a ReducingBarrierStep (e.g. count, sum, max, min, fold, mean, ...). Before this redesign, something like:
> 
> g.V().group().by(outE().count()).by(count())
> 
>   // this is equivalent to g.V().map(outE().count()).groupCount(), 
>   // but I wanted to test group()'s new reducer model.
> 
> ...would have failed miserably on such a large graph. However, with TinkerPop 3.2.0, because the second by() (the value traversal) maintains a ReducingBarrierStep (count()), we get on-the-fly reductions, which limit memory usage and ensure that such grouping traversals now work at scale in OLAP.
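
To make the memory argument concrete, here is a toy sketch in plain Java contrasting "buffer every grouped element, then count" with "keep one running counter per key". It only illustrates the general idea of reducing on the fly; it is not how TinkerPop or SparkGraphComputer implement group(). All names are hypothetical, and the int array stands in for the per-vertex outE().count() keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GroupReduceSketch {
    // Buffer-then-reduce: every element is held in a list until the end,
    // so memory grows with the number of elements (vertices).
    static Map<Integer, Long> groupThenCount(int[] outDegrees) {
        Map<Integer, List<Integer>> buffered = new HashMap<>();
        for (int vertex = 0; vertex < outDegrees.length; vertex++) {
            buffered.computeIfAbsent(outDegrees[vertex], k -> new ArrayList<>()).add(vertex);
        }
        Map<Integer, Long> counts = new HashMap<>();
        for (Map.Entry<Integer, List<Integer>> e : buffered.entrySet()) {
            counts.put(e.getKey(), (long) e.getValue().size());
        }
        return counts;
    }

    // Reduce-as-you-go: one running counter per key,
    // so memory grows only with the number of distinct keys.
    static Map<Integer, Long> countOnTheFly(int[] outDegrees) {
        Map<Integer, Long> counts = new HashMap<>();
        for (int degree : outDegrees) {
            counts.merge(degree, 1L, Long::sum);
        }
        return counts;
    }

    // Partial results from two partitions combine by summing per key,
    // which is what makes a count-style reduction easy to distribute.
    static Map<Integer, Long> mergePartials(Map<Integer, Long> a, Map<Integer, Long> b) {
        Map<Integer, Long> merged = new HashMap<>(a);
        b.forEach((key, value) -> merged.merge(key, value, Long::sum));
        return merged;
    }

    public static void main(String[] args) {
        int[] partitionA = {0, 0, 1, 2}; // hypothetical out-degrees of four vertices
        int[] partitionB = {0, 1};       // hypothetical out-degrees of two more
        Map<Integer, Long> merged =
                mergePartials(countOnTheFly(partitionA), countOnTheFly(partitionB));
        System.out.println(merged);
    }
}
```

Both methods return the same map; the difference is what has to be held in memory while the input streams past.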
> 
> g.V().group().by(outE().count()).by(count()) -- answer below. 
> 	- TinkerPop 3.2.0: 12 minutes (Spark 1.6.1)
> 
> ==>[0:68889802, 1:14490104, 2:5924264, 3:3630690, 4:2520455, 5:1887641, 6:1499489, 7:1235456, 8:1048559, 9:909576, 10:802183, 11:716357, 12:644813, 13:590507, 14:542157, 15:501000, 16:465449, 17:434955, 18:407146, 19:383250, 20:362687, 21:341529, 22:325269, 23:308506, 24:295382, 25:282257, 26:270540, 27:259267, 28:248882, 29:241110, 30:240857, 31:221426, 32:213362, 33:206135, 34:200053, 35:193185, 36:186947, 37:181301, 38:176271, 39:171148, 40:166312, 41:161646, 42:156552, 43:153162, 44:148875, 45:145339, 46:141780, 47:138058, 48:135479, 49:131795, 50:128793, 51:126391, 52:123254, 53:121081, 54:118758, 55:115864, 56:113936, 57:110845, 58:108192, 59:106723, 60:104243, 61:102829, 62:100759, 63:98617, 64:96827, 65:95385, 66:93629, 67:92324, 68:90519, 69:88766, 70:87682, 71:85794, 72:84279, 73:83389, 74:81654, 75:80978, 76:78906, 77:78126, 78:76857, 79:75987, 80:75312, 81:73354, 82:72901, 83:71195, 84:70463, 85:69502, 86:68107, 87:66984, 88:65986, 89:65349, 90:64568, 91:63761, 92:63283, 93:62092, 94:61089, 95:60195, 96:59655, 97:58788, 98:57847, 99:56935, 100:57341, 101:55483, 102:54973, 103:54610, 104:53367, 105:53699, 106:52948, 107:52060, 108:51386, 109:51032, 110:50442, 111:49429, 112:48994, 113:48790, 114:48250, 115:47808, 116:47517, 117:47024, 118:46299, 119:45855, 120:45529, 121:45262, 122:44453, 123:43738, 124:43768, 125:43257, 126:42852, 127:41977, 128:41580, 129:41091, 130:41027, 131:40569, 132:40019, 133:39416, 134:39448, 135:38935, 136:38228, 137:37863, 138:37641, 139:37261, 140:36908, 141:36326, 142:36090, 143:35654, 144:35610, 145:34760, 146:34946, 147:34355, 148:33948, 149:33946, 150:33341, 151:33193, 152:32877, 153:32440, 154:32268, 155:31728, 156:31627, 157:30762, 158:30625, 159:30233, 160:30345, 161:29881, 162:29851, 163:29523, 164:29081, 165:28844, 166:28402, 167:28053, 168:27706, 169:27623, 170:27502, 171:27156, 172:27112, 173:26538, 174:26578, 175:26187, 176:25951, 177:25572, 178:25297, 179:25441, 180:24653, 181:24935, 182:24478, 183:24262, 
> 184:23926, 185:24006, 186:23499, 187:23317, 188:22860, 189:22704, 190:22441, 191:22565, 192:22164, 193:22105, 194:21728, 195:21870, 196:21431, 197:21395,
> ...
> 
> Take care,
> Marko.
> 
> http://markorodriguez.com
> -- 
> You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-users+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/0F921BDF-E8C6-4A90-B479-68090E8AAEC5%40gmail.com.
> For more options, visit https://groups.google.com/d/optout.

Re: [TinkerPop] Ruminations on SparkGraphComputer Part Drei

Posted by Marko Rodriguez <ok...@gmail.com>.
https://pmcdeadline2.files.wordpress.com/2015/05/jonny-quest.jpg


