
Posted to dev@giraph.apache.org by "Eli Reisman (JIRA)" <ji...@apache.org> on 2012/08/23 22:55:41 UTC

[jira] [Created] (GIRAPH-314) Implement better message grouping to improve performance in SimpleTriangleClosingVertex

Eli Reisman created GIRAPH-314:
----------------------------------

             Summary: Implement better message grouping to improve performance in SimpleTriangleClosingVertex
                 Key: GIRAPH-314
                 URL: https://issues.apache.org/jira/browse/GIRAPH-314
             Project: Giraph
          Issue Type: Improvement
          Components: examples
    Affects Versions: 0.2.0
            Reporter: Eli Reisman
            Assignee: Eli Reisman
            Priority: Trivial
             Fix For: 0.2.0


After running SimpleTriangleClosingVertex at scale, I'm thinking that sendMessageToAllEdges() is pretty in the code but not a good idea in practice, since each vertex V sends degree(V)^2 messages right in the first superstep of this algorithm. We could do something with a combiner etc., but just grouping messages by hand at the application level, using IntArrayListWritable again, does the trick fine.

Probably should have just done it this way before, but sendMessageToAllEdges() looked so nice. Sigh. Changed unit tests to reflect this new approach, passes mvn verify and cluster, etc.
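
A minimal sketch of the difference in message count described above, using plain Java collections rather than Giraph types. The class and method names (GroupedMessages, individualMessages, groupedMessages) are hypothetical, and a List<Integer> stands in for IntArrayListWritable; this is not the code from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration; not taken from the GIRAPH-314 patch. */
public class GroupedMessages {
    /** One message per (recipient, neighbor id) pair: degree^2 messages in total. */
    public static List<int[]> individualMessages(List<Integer> neighbors) {
        List<int[]> out = new ArrayList<>();
        for (int recipient : neighbors) {
            for (int id : neighbors) {
                out.add(new int[] {recipient, id});
            }
        }
        return out;
    }

    /** One list-valued message per recipient: degree messages in total. */
    public static List<List<Integer>> groupedMessages(List<Integer> neighbors) {
        List<List<Integer>> out = new ArrayList<>();
        for (int i = 0; i < neighbors.size(); i++) {
            // Every recipient gets the same neighbor list, so one reference is shared.
            out.add(neighbors);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> neighbors = Arrays.asList(2, 3, 4, 5);
        System.out.println(individualMessages(neighbors).size()); // prints 16
        System.out.println(groupedMessages(neighbors).size());    // prints 4
    }
}
```

For a vertex of degree 4 this is 16 single-id messages versus 4 list messages; the total payload is the same, but the per-message overhead is paid far fewer times.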


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-314) Implement better message grouping to improve performance in SimpleTriangleClosingVertex

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455093#comment-13455093 ] 

Eli Reisman commented on GIRAPH-314:
------------------------------------

No problem, I welcome the input. The combiner is not needed at the beginning, or would just be one extra step on the sending side, because we already combined the messages by using IntArrayListWritable instead of many IntWritables right from the get-go. On the receiver side, combiners don't help us much because we still have incredible amounts of extra messages coming in over Netty all the time, as long as they are serialized and deserialized around the Partition -> vertexid -> List<M> layout, and that's what GIRAPH-322 addresses.

As for the message limiting, as long as the sender does not keep iterating on compute() and we don't overwhelm the sender that way, it's a great idea. But once we serialize and deserialize to disk or anywhere else, we lose the single reference to each message and get back individual objects, which then have to be put through a sender-side combiner or other extra plumbing, or just sent out duplicated over Netty. And we're talking about degree(V)^2 messages for all V in G(V), so it's a lot to churn through in one superstep. The amortizing is fast, and by avoiding the disk we leave open the possibility for GIRAPH-322 to manage the message growth without serializing and deserializing and ending up with a bunch of instances to send over the wire again, or with random access on the disk. So I'm not convinced 314 + 322 are a good alternative, but they seem worth exploring at this point. If it turns out the only way to make large jobs on an application like 314 run to completion is to focus on spill to disk entirely, I will certainly embrace that route.



                

[jira] [Commented] (GIRAPH-314) Implement better message grouping to improve performance in SimpleTriangleClosingVertex

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446155#comment-13446155 ] 

Eli Reisman commented on GIRAPH-314:
------------------------------------

There might be another tweak I want to make to this if testing reveals it's a good idea. Maybe wait on the commit and I'll report back ASAP...

                

[jira] [Updated] (GIRAPH-314) Implement better message grouping to improve performance in SimpleTriangleClosingVertex

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/GIRAPH-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Reisman updated GIRAPH-314:
-------------------------------

    Attachment: GIRAPH-314-2.patch

This patch represents a blatant tradeoff of compute time for space (in the form of memory, networking resources/buffers, etc.).

It adds a user-configurable option that lets you amortize the overwhelming load of degree(V)^2 messages in a triangle closing operation over as many supersteps as you can stomach. In practice this has turned out to be surprisingly effective, and still reasonably quick: there is little work for most vertices to do in such supersteps, since only 1/N of them will do any sending on any given superstep, and many will similarly not receive anything.

This has been tested on the cluster (even under surprisingly high load) and passes mvn verify etc.

It will be part of a 2-pronged strategy to get triangle closing to work at a useful scale, the other half of which will be posted in a separate JIRA. But this part is good to go.
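
One way to picture the "only 1/N of the vertices send on any given superstep" idea is the sketch below. The scheduling rule (bucket vertices by id mod N, let bucket k send in superstep k) is an assumption made for illustration; the thread does not say how the patch picks which vertices send when, and the class and method names are hypothetical.

```java
/** Hypothetical illustration of spreading sends over N supersteps. */
public class AmortizedSend {
    /**
     * Returns true if the vertex should send during this superstep.
     * Assumed rule: vertex ids are bucketed by (id mod n), and bucket k
     * sends in superstep k, for supersteps 0..n-1.
     */
    public static boolean sendsInSuperstep(long vertexId, long superstep, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        if (superstep < 0 || superstep >= n) {
            return false; // all sending is finished after n supersteps
        }
        // floorMod keeps the bucket non-negative even for negative ids.
        return Math.floorMod(vertexId, (long) n) == superstep;
    }

    public static void main(String[] args) {
        int n = 4;
        for (long step = 0; step < n; step++) {
            int senders = 0;
            for (long id = 0; id < 12; id++) {
                if (sendsInSuperstep(id, step, n)) {
                    senders++;
                }
            }
            System.out.println("superstep " + step + ": " + senders + " of 12 vertices send");
        }
    }
}
```

Under this rule every vertex sends exactly once, and the peak number of in-flight messages per superstep drops roughly by a factor of N at the cost of N supersteps instead of one.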


                

[jira] [Commented] (GIRAPH-314) Implement better message grouping to improve performance in SimpleTriangleClosingVertex

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448953#comment-13448953 ] 

Eli Reisman commented on GIRAPH-314:
------------------------------------

Yeah, the idea is not so much to gain any performance as to let us get past degree(V)^2 messages being sent without crashing. Neither the disk-backed solutions nor the in-memory solutions are working so far, as the messages just pile up so quickly at scale. So the "performance gain" is just surviving a large run at all.

This is sort of in prep for stage 2, where we assume we know some things about messages sent through sendMessageToAllEdges() calls (namely that there will be a lot of unneeded duplication as things stand now) and handle those differently through the whole pipeline. Even then, to run this at the scale I'm trying to, the amortization option has to be there as well, so this is just getting a scalable example up and running for testing purposes.

I can fix the hashCode and check the javadoc, thanks again. We're amortizing the cost of all those messages over time. So I guess it's more of a tradeoff than an amortization. But then the un-amortized cost is crashing the job, so maybe it is...?



                

[jira] [Commented] (GIRAPH-314) Implement better message grouping to improve performance in SimpleTriangleClosingVertex

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449342#comment-13449342 ] 

Eli Reisman commented on GIRAPH-314:
------------------------------------

FYI: The "nightmare scenario" described above is the message growth, not combiners! ;) The general thinking is that by grouping the messages we eliminate the need for a combiner on the sending side, and on the receiving side it is very rare for an identical grouped message of this sort to be sent to the same destination vertices in a partition more than once, so that is not what is driving the duplication.

But I'm open to thoughts on the matter! The "stage 2" stuff is getting run and tweaked right now. Any eventual version will need to pass muster with (and evolve to play nice with) Maja's new message store system, and should probably be generally pluggable by use case regardless. More to follow on that JIRA (when I post it).

                

[jira] [Commented] (GIRAPH-314) Implement better message grouping to improve performance in SimpleTriangleClosingVertex

Posted by "Maja Kabiljo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448657#comment-13448657 ] 

Maja Kabiljo commented on GIRAPH-314:
-------------------------------------

So the performance gain here comes from the fact that when you spread the load over time you get to process messages before receiving new ones, and processed messages take less space? Did you try implementing this with a combiner?

A few tiny comments:

- There is one small mistake in IntArrayListWritable.hashCode(), you probably wanted to do:
result = result * 19 + iw.get();
or something similar.

- Maybe add a comment there about what the value of giraph.amortizeMessagingCost exactly means. 

- The javadoc for SimpleTriangleClosingVertex seems incorrect to me, wouldn't A also have B in its list and vice versa? If so, can you please fix the comment while you are there?
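
The accumulation pattern suggested in the first comment can be sketched on a stand-alone class. IntListHash is a hypothetical stand-in, not IntArrayListWritable itself, and the seed value of 1 is an assumption; only the `result * 19 + value` step comes from the comment above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a list-of-ints message type. */
public class IntListHash {
    private final List<Integer> values = new ArrayList<>();

    public void add(int v) {
        values.add(v);
    }

    @Override
    public int hashCode() {
        int result = 1; // seed chosen for illustration
        for (int v : values) {
            // Multiply-then-add folds every element, and its position, into the hash.
            result = result * 19 + v;
        }
        return result;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof IntListHash)) {
            return false;
        }
        return values.equals(((IntListHash) o).values);
    }
}
```

Equal lists produce equal hashes, as the hashCode/equals contract requires, and reordering the elements generally changes the result.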
                

[jira] [Commented] (GIRAPH-314) Implement better message grouping to improve performance in SimpleTriangleClosingVertex

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449211#comment-13449211 ] 

Eli Reisman commented on GIRAPH-314:
------------------------------------

The "M" type in this patch changing from IntWritable to IntArrayListWritable, plus hash.userPartitionCount == # of workers, is the combiner minus the overhead. Our combiners want to aggregate or concatenate messages; by using an array message type we just skip that stage with the same effect. I was hoping to use a combiner originally, but this is kind of the nightmare use case for Giraph in general.

A combiner won't work alone because of the E^2 message growth as I scale the input data up to size. Giraph is sort of hardcoded to combine and interpret "PartitionId -> VertexId -> List<M>", and what I need to get algorithms like this to run at scale is PartitionId mapping to Message mapping to the set of vertex destinations within the partition. The messaging I'm using in "part 2" does a sort of "PartitionId -> M -> Set<I>" run-length encoding over Netty to do this, but I think I might just incorporate this into the pipeline from sendMessageToAllEdges() on down, because PartitionId -> VertexId -> List<M> is sort of baked in everywhere and that code path is guaranteed to be one message to many recipients.
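
The "PartitionId -> M -> Set<I>" shape can be sketched with nested maps. This is only an illustration of the mapping, with hypothetical names (DedupMessageBuffer, payloadCount, destinations); it is not the GIRAPH-322 code, and it assumes the message type M implements equals() and hashCode() so that identical messages collapse to one key.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a "PartitionId -> M -> Set<I>" buffer. */
public class DedupMessageBuffer<I, M> {
    private final Map<Integer, Map<M, Set<I>>> byPartition = new HashMap<>();

    /** Records that the message should reach the destination vertex in the given partition. */
    public void add(int partitionId, I destination, M message) {
        byPartition
            .computeIfAbsent(partitionId, p -> new HashMap<>())
            .computeIfAbsent(message, m -> new HashSet<>())
            .add(destination);
    }

    /** Payloads that would go out: one per (partition, distinct message) pair. */
    public int payloadCount() {
        int n = 0;
        for (Map<M, Set<I>> messages : byPartition.values()) {
            n += messages.size();
        }
        return n;
    }

    /** Destinations recorded for a message within a partition (empty if none). */
    public Set<I> destinations(int partitionId, M message) {
        Map<M, Set<I>> messages = byPartition.get(partitionId);
        if (messages == null) {
            return new HashSet<>();
        }
        Set<I> found = messages.get(message);
        return found == null ? new HashSet<>() : found;
    }
}
```

With this layout, a message sent to many vertices in the same partition is stored once alongside its set of destinations, instead of once per destination as in a VertexId -> List<M> layout.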

The out-of-core messaging just plain crashed no matter what I did, same as Netty with the settings: once the messages are being generated, they start to pile up very quickly wherever they are on the data path. I had the best luck so far with thousands of workers and in-core messaging plus the amortizing, but the growth is quite fast at the volume of input I'm dealing with, and in the end the message deduplication plus this "amortization" is going to be the only way to cobble this together. This implementation got me in the door, but no amount of dividing at this level of message growth will get all the way there and still scale to run my input graph.

I think my goal is just to get something like a practical scale for triangle closing working for us, and then look at how to refine it into a more general way for Giraph users to take advantage of this kind of message growth. I hope to involve some on disk buffering and I'll certainly try it again but the growth has to be managed before we can even get to that point.

Thanks for your input, I'll certainly be reviewing these ideas as I go along.



                

[jira] [Commented] (GIRAPH-314) Implement better message grouping to improve performance in SimpleTriangleClosingVertex

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450883#comment-13450883 ] 

Eli Reisman commented on GIRAPH-314:
------------------------------------

I'd love for us to move this to GIRAPH-322 and get your input now that the code is up and you can see what the idea was. I am instrumenting it now so I can see where I messed up the wiring, but the basic idea is there.

I didn't implement the combiner option yet in the solution I put up. I would be interested in trying some more as I am sure you're right that with better tuning (or more expert tuning) disk spill should be a part of a final solution. I was surprised it didn't work too, it looks like it should handle exactly this situation. And again, with better tuning perhaps it will.

But I was running real data, and a lot of it. Everyone here has noted the benchmarks are great for A/B'ing Giraph as it improves and measuring progress in a sane way, but not great for comparing with conditions out in the wild. I'm hoping GIRAPH-26 will help close this gap, but for us the benchmarks have been poor predictors of real performance in our target use cases.

As for my solution so far, the idea is to reduce the # of partitions to one per worker with -Dhash.userPartitionCount, and then store messages so that only a single object is in memory at any given time (with one reference per partition destination); they simply accumulate destination vertices and flush regularly. The only "real" messages that go out are one per partition that requires a copy of that message (depending on which vertices need it), which will differ per message. Again, I tried to make the patch simple and changeable so we can tune this or improve the idea and try things to see what works best.

The problem I had so far with combiners is they just aggregate messages for one vertex rather than destinations for one message. In the solution so far I found it easier to just sort of set this stuff up by hand to happen since we are in a special case where we know something we can use about the properties of a message sent with sendMessageToAllEdges() and can avoid some of the object creations and checks along the way. As you said, the place for a combiner in this scenario if anywhere seems to be on the receiving end.

Now that the "game plan" patch is up, I'll be very interested in ideas and observations. If this gets any traction, we could then set up a disk spill strategy for these types of messages that does not re-duplicate them on load (since the message stores are all currently set up for Giraph's existing Partition -> Vertex -> List<M> paradigm). Alternately, the whole exercise might be a waste of time ;) but I have it on good authority this is a route worth pursuing, so we'll see where it leads.

                

[jira] [Commented] (GIRAPH-314) Implement better message grouping to improve performance in SimpleTriangleClosingVertex

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449348#comment-13449348 ] 

Eli Reisman commented on GIRAPH-314:
------------------------------------

Actually, what I'm looking at here for "stage 2" is that the data structure from sendMessageToAllEdges() down to the run-length encoded request is going to map like M -> partitionId -> Set<I>, so that it's easy to make sure we aren't keeping any extra copies of the same message on the way. This doesn't help with my application, since there messages are likely to be unique more often than individual destinations are, but that won't be true for everyone, so this might be a better mapping for the general use case of this deduplicating feature, and it should not hurt my use case in the process.

Anyway, I'll post it soon so we can address shortcomings!
                

[jira] [Commented] (GIRAPH-314) Implement better message grouping to improve performance in SimpleTriangleClosingVertex

Posted by "Maja Kabiljo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449015#comment-13449015 ] 

Maja Kabiljo commented on GIRAPH-314:
-------------------------------------

I see. 

So you'll add a combiner in stage 2?

Out-of-core will be slower, but I'm just curious, have you tried setting giraph.waitForRequestsConfirmation and giraph.maxNumberOfOpenRequests together with out-of-core messages? 

Maybe once you add a combiner these two options alone could be enough, since in that case limiting the number of open requests should be somewhat equivalent to amortizing?
                

[jira] [Updated] (GIRAPH-314) Implement better message grouping to improve performance in SimpleTriangleClosingVertex

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/GIRAPH-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Reisman updated GIRAPH-314:
-------------------------------

    Attachment: GIRAPH-314-3.patch

Thanks for the review Maja, this really helped. While I was in there I fixed a couple of things and it was good to take a 2nd look. The comments had been butchered over several cut and paste style revisions and I figured a switch to a simple example was probably more illustrative anyway. 

Thanks again

                
> Implement better message grouping to improve performance in SimpleTriangleClosingVertex
> ---------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-314
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-314
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples
>    Affects Versions: 0.2.0
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>            Priority: Trivial
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-314-1.patch, GIRAPH-314-2.patch, GIRAPH-314-3.patch
>
>
> After running SimpleTriangleClosingVertex at scale I'm thinking the sendMessageToAllEdges() is pretty in the code, but it's not a good idea in practice since each vertex V sends degree(V)^2 messages right in the first superstep in this algorithm. Could do something with a combiner etc. but just grouping messages by hand at the application level by using IntArrayListWritable again does the trick fine.
> Probably should have just done it this way before, but sendMessageToAllEdges() looked so nice. Sigh. Changed unit tests to reflect this new approach, passes mvn verify and cluster, etc.
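
The grouping described in the issue text above can be sketched roughly as follows. This is plain Java and not the attached patch: GroupedNeighborMessages and its methods are hypothetical names, and a plain List<Integer> stands in for IntArrayListWritable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not code from the GIRAPH-314 patches. */
final class GroupedNeighborMessages {
  private GroupedNeighborMessages() { }

  /** Ungrouped: every neighbor ID goes to every neighbor as its own message. */
  static long ungroupedCount(int degree) {
    return (long) degree * degree;
  }

  /**
   * Grouped: each neighbor receives one message holding the whole ID list,
   * so a vertex of degree d sends d messages instead of d * d.
   */
  static Map<Integer, List<Integer>> grouped(List<Integer> neighborIds) {
    List<Integer> payload =
        Collections.unmodifiableList(new ArrayList<>(neighborIds));
    Map<Integer, List<Integer>> outbox = new LinkedHashMap<>();
    for (int target : neighborIds) {
      outbox.put(target, payload); // one list-valued message per target
    }
    return outbox;
  }
}
```

Note that the total number of IDs sent is still degree(V)^2 in this sketch; only the number of individual messages drops to degree(V).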

[jira] [Updated] (GIRAPH-314) Implement better message grouping to improve performance in SimpleTriangleClosingVertex

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/GIRAPH-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Reisman updated GIRAPH-314:
-------------------------------

    Attachment: GIRAPH-314-4.patch

Oops, an unused import stayed in the last patch. Should be good to go now.

                
> Implement better message grouping to improve performance in SimpleTriangleClosingVertex
> ---------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-314
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-314
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples
>    Affects Versions: 0.2.0
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>            Priority: Trivial
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-314-1.patch, GIRAPH-314-2.patch, GIRAPH-314-3.patch, GIRAPH-314-4.patch
>
>
> After running SimpleTriangleClosingVertex at scale I'm thinking the sendMessageToAllEdges() is pretty in the code, but it's not a good idea in practice since each vertex V sends degree(V)^2 messages right in the first superstep in this algorithm. Could do something with a combiner etc. but just grouping messages by hand at the application level by using IntArrayListWritable again does the trick fine.
> Probably should have just done it this way before, but sendMessageToAllEdges() looked so nice. Sigh. Changed unit tests to reflect this new approach, passes mvn verify and cluster, etc.

[jira] [Commented] (GIRAPH-314) Implement better message grouping to improve performance in SimpleTriangleClosingVertex

Posted by "Maja Kabiljo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454725#comment-13454725 ] 

Maja Kabiljo commented on GIRAPH-314:
-------------------------------------

Sorry that I keep asking about this, but here is what I'm trying to get at with your problem size discussion: is the combiner described above, together with limiting the number of open requests (but still keeping everything in-core), a good alternative to this solution, and if not, why not? Amortizing means converting messages to these maps every once in a while; that's what a combiner could do. And amortizing means waiting for part of the messages to be processed before sending/receiving new ones; that's what limiting the number of open requests does.
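
To make the "limiting the number of open requests" idea concrete, here is a minimal sketch using a counting semaphore. It only illustrates the back-pressure idea and is not Giraph's implementation; OpenRequestLimiter and its method names are hypothetical.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical illustration only; not Giraph's request-limiting code. */
final class OpenRequestLimiter {
  private final Semaphore permits;

  OpenRequestLimiter(int maxOpenRequests) {
    this.permits = new Semaphore(maxOpenRequests);
  }

  /** Blocks the sender until fewer than the maximum requests are open. */
  void beforeSend() throws InterruptedException {
    permits.acquire();
  }

  /** Non-blocking variant: returns false if the limit is already reached. */
  boolean trySend() {
    return permits.tryAcquire();
  }

  /** Called when the receiver confirms a request has been processed. */
  void onAck() {
    permits.release();
  }
}
```

With a limit of 2, a third send has to wait until one of the first two requests is acknowledged, which is the sense in which new messages cannot pile up ahead of processing.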
                
> Implement better message grouping to improve performance in SimpleTriangleClosingVertex
> ---------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-314
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-314
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples
>    Affects Versions: 0.2.0
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>            Priority: Trivial
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-314-1.patch, GIRAPH-314-2.patch, GIRAPH-314-3.patch, GIRAPH-314-4.patch
>
>
> After running SimpleTriangleClosingVertex at scale I'm thinking the sendMessageToAllEdges() is pretty in the code, but it's not a good idea in practice since each vertex V sends degree(V)^2 messages right in the first superstep in this algorithm. Could do something with a combiner etc. but just grouping messages by hand at the application level by using IntArrayListWritable again does the trick fine.
> Probably should have just done it this way before, but sendMessageToAllEdges() looked so nice. Sigh. Changed unit tests to reflect this new approach, passes mvn verify and cluster, etc.

[jira] [Updated] (GIRAPH-314) Implement better message grouping to improve performance in SimpleTriangleClosingVertex

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/GIRAPH-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Reisman updated GIRAPH-314:
-------------------------------

    Attachment: GIRAPH-314-1.patch
    
> Implement better message grouping to improve performance in SimpleTriangleClosingVertex
> ---------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-314
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-314
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples
>    Affects Versions: 0.2.0
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>            Priority: Trivial
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-314-1.patch
>
>
> After running SimpleTriangleClosingVertex at scale I'm thinking the sendMessageToAllEdges() is pretty in the code, but it's not a good idea in practice since each vertex V sends degree(V)^2 messages right in the first superstep in this algorithm. Could do something with a combiner etc. but just grouping messages by hand at the application level by using IntArrayListWritable again does the trick fine.
> Probably should have just done it this way before, but sendMessageToAllEdges() looked so nice. Sigh. Changed unit tests to reflect this new approach, passes mvn verify and cluster, etc.

[jira] [Commented] (GIRAPH-314) Implement better message grouping to improve performance in SimpleTriangleClosingVertex

Posted by "Maja Kabiljo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449716#comment-13449716 ] 

Maja Kabiljo commented on GIRAPH-314:
-------------------------------------

Great change, javadoc is much easier to understand now.

The two options I mentioned should prevent us from generating new messages until enough of the current messages have been processed. So if we use out-of-core messages, they shouldn't be able to pile up. With those options I was able to run RandomMessageBenchmark with a really huge number of messages (it was slow, of course, but it worked). I'm surprised to hear it didn't work for you.

I'm not sure that we are thinking of the same combiner. Correct me if I'm wrong, but the reason amortizing saves you is that you get to process part of the messages before receiving new ones. And processing messages decreases the memory used simply by replacing several occurrences of one second-degree neighbour with a single count of its occurrences. That's what a combiner should also do.
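
As a rough illustration of that kind of combining (plain Java, not Giraph's combiner interface; NeighborCountCombiner is a hypothetical name), repeated neighbour IDs can be collapsed into per-ID counts:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; this is not Giraph's combiner API. */
final class NeighborCountCombiner {
  private NeighborCountCombiner() { }

  /**
   * Collapse repeated second-degree neighbour IDs into (id -> occurrences),
   * so memory grows with distinct IDs rather than with messages received.
   */
  static Map<Integer, Integer> combine(List<Integer> neighborIds) {
    Map<Integer, Integer> counts = new TreeMap<>();
    for (int id : neighborIds) {
      counts.merge(id, 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    // Six incoming IDs, three distinct values.
    System.out.println(combine(Arrays.asList(7, 3, 7, 7, 3, 9)));
  }
}
```

Here six received IDs shrink to three map entries; the saving depends on how often the same neighbour ID repeats.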

So you are planning to change the infrastructure to better support sending the same message to several vertices on the same worker? So that in practice we only send the message and the list of destination vertices, and on the destination worker we have only one copy of the message? That sounds like a really good improvement for this and similar applications, where messages are big objects. If messages are not combinable, and if we had some good partitioning, this could really decrease the amount of traffic and memory usage here.
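
A minimal sketch of that idea, assuming a made-up partitioner (vertex ID modulo worker count) and hypothetical names throughout; it is not Giraph code. Destinations are grouped by worker, so each worker would receive one copy of the payload plus a destination list:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not part of Giraph's messaging layer. */
final class PerWorkerFanOut {
  private PerWorkerFanOut() { }

  /** Assumed partitioner: vertex ID modulo the number of workers. */
  static int workerFor(int vertexId, int numWorkers) {
    return Math.floorMod(vertexId, numWorkers);
  }

  /**
   * Group destination vertices by owning worker. The sender then ships
   * one payload copy per map entry instead of one per destination vertex.
   */
  static Map<Integer, List<Integer>> groupByWorker(
      List<Integer> destinations, int numWorkers) {
    Map<Integer, List<Integer>> byWorker = new TreeMap<>();
    for (int dest : destinations) {
      byWorker
          .computeIfAbsent(workerFor(dest, numWorkers), w -> new ArrayList<>())
          .add(dest);
    }
    return byWorker;
  }
}
```

With six destinations spread over two workers, the payload would be sent twice rather than six times; the benefit grows with payload size and with how well the partitioning co-locates neighbours.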
                
> Implement better message grouping to improve performance in SimpleTriangleClosingVertex
> ---------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-314
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-314
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples
>    Affects Versions: 0.2.0
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>            Priority: Trivial
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-314-1.patch, GIRAPH-314-2.patch, GIRAPH-314-3.patch, GIRAPH-314-4.patch
>
>
> After running SimpleTriangleClosingVertex at scale I'm thinking the sendMessageToAllEdges() is pretty in the code, but it's not a good idea in practice since each vertex V sends degree(V)^2 messages right in the first superstep in this algorithm. Could do something with a combiner etc. but just grouping messages by hand at the application level by using IntArrayListWritable again does the trick fine.
> Probably should have just done it this way before, but sendMessageToAllEdges() looked so nice. Sigh. Changed unit tests to reflect this new approach, passes mvn verify and cluster, etc.

[jira] [Commented] (GIRAPH-314) Implement better message grouping to improve performance in SimpleTriangleClosingVertex

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486437#comment-13486437 ] 

Eli Reisman commented on GIRAPH-314:
------------------------------------

Not too worried about getting this in; it's still an experiment, and it's one I'm not provisioned to stress test at the moment. When things in the message passing plumbing settle down, I will come back to this. The results I got were intriguing in practice, but as someone like Sebastian would tell you, this does not represent a solution to the message growth problems with triangle closing.

If this JIRA issue is holding up the new release, we can mark it Won't Fix. Otherwise, I'll come back to it when the messaging code ripens or I have a proper cluster to abuse. Schemes are afoot for both... ;)

                
> Implement better message grouping to improve performance in SimpleTriangleClosingVertex
> ---------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-314
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-314
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples
>    Affects Versions: 0.2.0
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>            Priority: Trivial
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-314-1.patch, GIRAPH-314-2.patch, GIRAPH-314-3.patch, GIRAPH-314-4.patch
>
>
> After running SimpleTriangleClosingVertex at scale I'm thinking the sendMessageToAllEdges() is pretty in the code, but it's not a good idea in practice since each vertex V sends degree(V)^2 messages right in the first superstep in this algorithm. Could do something with a combiner etc. but just grouping messages by hand at the application level by using IntArrayListWritable again does the trick fine.
> Probably should have just done it this way before, but sendMessageToAllEdges() looked so nice. Sigh. Changed unit tests to reflect this new approach, passes mvn verify and cluster, etc.
