Posted to user@giraph.apache.org by Andrew Munsell <an...@wizardapps.net> on 2014/09/12 00:10:53 UTC

Lockup During Edge Saving

Now that I have the loading and computation completing
successfully, I am having issues when saving the edges back to
disk. During the saving step, the machines will get to ~1-2
partitions before the cluster freezes up entirely (as in, I
can't even SSH into the machine or view the Hadoop web
console).



As in my message before, I have about 1.3 billion edges total
(600 million undirected, converted using the reverser) and a
cluster of 19 machines, each with 8 cores and 60 GB of RAM.



I am also using a custom linked-list based OutEdges class
because of the computation's high number of mutations of edge
values (the byte array/big data byte array was not efficient
for this use case).
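
For reference, here is roughly the shape of the linked-list backed
OutEdges I am describing (a simplified sketch using LongWritable ids and
DoubleWritable edge values as placeholder types, not my exact class):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Iterator;
import java.util.LinkedList;

import org.apache.giraph.edge.Edge;
import org.apache.giraph.edge.EdgeFactory;
import org.apache.giraph.edge.OutEdges;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;

/**
 * Linked-list backed edge store: cheap add/remove for heavily mutated
 * edge values, no byte-array packing. Placeholder id/value types.
 */
public class LinkedListEdges
    implements OutEdges<LongWritable, DoubleWritable> {

  private LinkedList<Edge<LongWritable, DoubleWritable>> edges;

  @Override
  public void initialize(Iterable<Edge<LongWritable, DoubleWritable>> edgeIt) {
    initialize();
    for (Edge<LongWritable, DoubleWritable> edge : edgeIt) {
      add(edge);
    }
  }

  @Override
  public void initialize(int capacity) {
    initialize();
  }

  @Override
  public void initialize() {
    edges = new LinkedList<>();
  }

  @Override
  public void add(Edge<LongWritable, DoubleWritable> edge) {
    // Copy id and value so Giraph's reusable edge objects are not aliased.
    edges.add(EdgeFactory.create(
        new LongWritable(edge.getTargetVertexId().get()),
        new DoubleWritable(edge.getValue().get())));
  }

  @Override
  public void remove(LongWritable targetVertexId) {
    // Remove all edges pointing at the given target vertex.
    Iterator<Edge<LongWritable, DoubleWritable>> it = edges.iterator();
    while (it.hasNext()) {
      if (it.next().getTargetVertexId().equals(targetVertexId)) {
        it.remove();
      }
    }
  }

  @Override
  public int size() {
    return edges.size();
  }

  @Override
  public Iterator<Edge<LongWritable, DoubleWritable>> iterator() {
    return edges.iterator();
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(edges.size());
    for (Edge<LongWritable, DoubleWritable> edge : edges) {
      edge.getTargetVertexId().write(out);
      edge.getValue().write(out);
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    int size = in.readInt();
    edges = new LinkedList<>();
    for (int i = 0; i < size; i++) {
      LongWritable id = new LongWritable();
      DoubleWritable value = new DoubleWritable();
      id.readFields(in);
      value.readFields(in);
      edges.add(EdgeFactory.create(id, value));
    }
  }
}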



The specific computation I am running has three supersteps (0,
1, 2), and during supersteps 1 and 2 there is extremely high
RAM usage (~97%), but the steps do complete. During saving this
high RAM usage is maintained and does not increase
significantly until the cluster freezes up.



When saving the edges (I am using a custom edge output format
as well, that is basically a CSV), are they flushed to disk
immediately/in batches or is the entire output file held in
memory before being flushed? If the latter, this seems like it
might cause the same sort of behavior I see. Also, if this is
the case, is there a way this can be changed?



If this doesn't seem like the issue, does anyone have any ideas
what may be causing the lockup?



Thanks in advance!



--
Andrew

Re: Lockup During Edge Saving

Posted by Claudio Martella <cl...@gmail.com>.
Looks like you're running out of memory, and it looks like your output
format is at fault. Since you're using a CSV, I suspect you're building a
single String line for each vertex before writing it to HDFS. For
vertices with many edges that string can get very large, which could be one of
your problems. Could you test it with a standard edge output format first?
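
To answer the flushing question: an edge output format built on
TextEdgeOutputFormat emits one short line per edge and hands each line
straight to the underlying record writer, which writes through a buffered
HDFS output stream rather than holding the whole output file in memory.
A minimal sketch (assuming LongWritable ids and DoubleWritable vertex and
edge values; adapt the types and the hypothetical class names to yours):

import java.io.IOException;

import org.apache.giraph.edge.Edge;
import org.apache.giraph.io.formats.TextEdgeOutputFormat;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

/** CSV edge output: one "sourceId,targetId,value" line per edge. */
public class CsvEdgeOutputFormat
    extends TextEdgeOutputFormat<LongWritable, DoubleWritable, DoubleWritable> {

  @Override
  public TextEdgeWriter createEdgeWriter(TaskAttemptContext context) {
    return new CsvEdgeWriter();
  }

  private class CsvEdgeWriter extends TextEdgeWriterToEachLine {
    @Override
    protected Text convertEdgeToLine(LongWritable sourceId,
        DoubleWritable sourceValue,
        Edge<LongWritable, DoubleWritable> edge) throws IOException {
      // Each edge becomes its own small Text line; nothing is accumulated
      // per vertex before being passed to the record writer.
      return new Text(sourceId.get() + ","
          + edge.getTargetVertexId().get() + ","
          + edge.getValue().get());
    }
  }
}

A standard format you could try first is
org.apache.giraph.io.formats.SrcIdDstIdEdgeValueTextOutputFormat, set via
setEdgeOutputFormatClass(...) on your GiraphConfiguration.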

On Fri, Sep 12, 2014 at 12:10 AM, Andrew Munsell <an...@wizardapps.net>
wrote:

>  Now that I have the loading and computation completing successfully, I
> am having issues when saving the edges back to disk. During the saving
> step, the machines will get to ~1-2 partitions before the cluster freezes
> up entirely (as in, I can't even SSH into the machine or view the Hadoop
> web console).
>
> As in my message before, I have about 1.3 billion edges total (600 million
> undirected, converted using the reverser) and a cluster of 19 machines,
> each with 8 cores and 60 GB of RAM.
>
> I am also using a custom linked-list based OutEdges class because of the
> computation's high number of mutations of edge values (the byte array/big
> data byte array was not efficient for this use case).
>
> The specific computation I am running has three supersteps (0, 1, 2), and
> during supersteps 1 and 2 there is extremely high RAM usage (~97%), but the
> steps do complete. During saving this high RAM usage is maintained and does
> not increase significantly until the cluster freezes up.
>
> When saving the edges (I am using a custom edge output format as well,
> that is basically a CSV), are they flushed to disk immediately/in batches
> or is the entire output file held in memory before being flushed? If the
> latter, this seems like it might cause the same sort of behavior I see.
> Also, if this is the case, is there a way this can be changed?
>
> If this doesn't seem like the issue, does anyone have any ideas what may
> be causing the lockup?
>
> Thanks in advance!
>
> --
> Andrew
>
>
>



-- 
   Claudio Martella