Posted to dev@giraph.apache.org by Sebastian Schelter <ss...@apache.org> on 2012/05/03 15:44:06 UTC

Out-of-core messaging

Hi,

I'd like to ask whether someone is currently working on out-of-core
messaging for Giraph (e.g. by spilling messages to disk in case of
memory pressure).
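
To make "spilling messages to disk" concrete, here is a minimal sketch of the idea. It is not Giraph code and does not reflect Giraph's API: the class name SpillingMessageStore, its methods, and the max_in_memory threshold are all hypothetical, and "memory pressure" is approximated by a simple count of buffered messages.

```python
import os
import pickle
import tempfile
from collections import defaultdict


class SpillingMessageStore:
    """Hypothetical store: buffers messages per vertex in memory and writes
    the whole buffer to a file once more than `max_in_memory` are held."""

    def __init__(self, max_in_memory=1000, spill_dir=None):
        self.max_in_memory = max_in_memory
        self.spill_dir = spill_dir or tempfile.mkdtemp(prefix="msg-spill-")
        self.buffer = defaultdict(list)   # vertex_id -> [messages]
        self.buffered = 0                 # number of messages in memory
        self.spill_files = []             # paths, in the order written

    def add(self, vertex_id, message):
        self.buffer[vertex_id].append(message)
        self.buffered += 1
        if self.buffered > self.max_in_memory:
            self._spill()

    def _spill(self):
        # Append one pickled (vertex_id, messages) record per vertex.
        name = "spill-%d.bin" % len(self.spill_files)
        path = os.path.join(self.spill_dir, name)
        with open(path, "wb") as f:
            for vertex_id, messages in self.buffer.items():
                pickle.dump((vertex_id, messages), f)
        self.spill_files.append(path)
        self.buffer.clear()
        self.buffered = 0

    def messages_for(self, vertex_id):
        """Yield messages for one vertex: spilled ones first, then buffered."""
        for path in self.spill_files:
            with open(path, "rb") as f:
                while True:
                    try:
                        vid, messages = pickle.load(f)
                    except EOFError:
                        break
                    if vid == vertex_id:
                        yield from messages
        yield from self.buffer.get(vertex_id, [])
```

Reading scans every spill file for each vertex, which keeps the sketch short but would be far too slow in practice; a real implementation would more likely sort or index the spilled data by vertex.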

I ran some experiments with Giraph on a small 6-machine cluster and got
really nice results for smaller datasets such as the Wikipedia pagelink
graph (6M vertices, ~250M edges in its undirected version).

For larger graphs with an even more skewed degree distribution, such as
the Twitter follower graph from [1], Giraph unfortunately crashes in the
first superstep. My colleagues observed the same when they ran
benchmarks of Giraph against the Stratosphere system [2]: Giraph did
fairly well on small datasets, but again crashed on larger ones.

I think the lack of out-of-core messaging is currently the biggest
obstacle to recommending that people try Giraph in production.

Best,
Sebastian


[1] http://konect.uni-koblenz.de/networks/twitter
[2] http://www.stratosphere.eu/

Re: Out-of-core messaging

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Claudio,

Great to hear that!

Please send me the 4-liner; maybe I (or my colleagues) can help!

'Out-of-core messaging' would be a great topic for the BB workshop, I'll
keep that in mind :)

Best,
Sebastian





Re: Out-of-core messaging

Posted by Claudio Martella <cl...@gmail.com>.
Oh,

forgot to mention that the related JIRA is
https://issues.apache.org/jira/browse/GIRAPH-45




-- 
   Claudio Martella
   claudio.martella@gmail.com

Re: Out-of-core messaging

Posted by Claudio Martella <cl...@gmail.com>.
Hi Sebastian,

I definitely agree with you on this one.

I'm currently working on it, but I'm stuck on a small bug that seems to
come from a concurrency issue we don't understand yet (I have a 4-liner
that reproduces it, if you want to help out). Avery and I are currently
discussing the possibility of writing a paper on the solution, so
hopefully I'll be able to tell you more in a couple of weeks.





-- 
   Claudio Martella
   claudio.martella@gmail.com