Posted to dev@giraph.apache.org by Charith Wickramarachchi <ch...@gmail.com> on 2014/11/11 19:26:07 UTC

Giraph job gets killed suddenly

Hi Devs,

I am sending this mail to the dev list because I think Giraph developers
may have run into the issue I am facing.

I am working on extending Giraph to support a programming model somewhat
similar to Giraph++. I got an initial proof-of-concept version running on
my local machine in pseudo-distributed mode, but when I run it with large
graphs on a cluster, the MapReduce job suddenly gets killed.

The job receives a kill signal, and I am still not sure of the root cause.
My hunch is that it has something to do with progress reporting from the
mappers. I am attaching the part of the log that might be helpful.

It would be great if you could share some insights based on your experience.

Giraph Version: 1.1.0
Hadoop version: 2.2.0
Application Type: Map Reduce

Thanks,
Charith

-- 
Charith Dhanushka Wickramaarachchi

Tel  +1 213 447 4253
Web  http://apache.org/~charith
Blog  http://charith.wickramaarachchi.org/
Twitter  @charithwiki

This communication may contain privileged or other confidential information
and is intended exclusively for the addressee/s. If you are not the
intended recipient/s, or believe that you may have
received this communication in error, please reply to the sender indicating
that fact and delete the copy you received and in addition, you should not
print, copy, retransmit, disseminate, or otherwise use the information
contained in this communication. Internet communications cannot be
guaranteed to be timely, secure, error or virus-free. The sender does not
accept liability for any errors or omissions

Re: Giraph job gets killed suddenly

Posted by Charith Wickramarachchi <ch...@gmail.com>.
Thanks for the quick replies.

I did some digging into the logs. It appears to be caused by a "GC overhead
limit exceeded" OutOfMemoryError. I think it may be due to some unnecessary
overhead in my implementation. I will optimize my code to avoid this.

2014-11-11 10:34:22,482 INFO [main] org.apache.giraph.graph.GraphTaskManager: execute: 8 partitions to process with 1 compute thread(s), originally 1 thread(s) on superstep 0
2014-11-11 10:34:38,266 WARN [netty-client-exec-0] io.netty.util.concurrent.SingleThreadEventExecutor: Unexpected exception from an event executor:
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073)
        at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
        at io.netty.util.concurrent.SingleThreadEventExecutor.takeTask(SingleThreadEventExecutor.java:219)
        at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:34)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
        at java.lang.Thread.run(Thread.java:722)


I tried setting the mapred.child.java.opts option, but then the job failed
with the following error.

2014-11-11 11:04:45,780 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1415143619219_0009_m_000004_0:
Container [pid=28984,containerID=container_1415143619219_0009_01_000006] is running beyond virtual memory limits. Current usage: 156.3 MB of 1 GB physical memory used; 2.7 GB of 2.1 GB virtual memory used. Killing container.
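The 2.1 GB cap in that diagnostics line is YARN's virtual-memory limit: the
container's physical memory allotment multiplied by
yarn.nodemanager.vmem-pmem-ratio (2.1 by default in Hadoop 2.x). A minimal
sketch of the arithmetic, assuming the default ratio and the values from the
log above:

```python
# Why YARN killed the container: the virtual-memory cap is the physical
# container limit times yarn.nodemanager.vmem-pmem-ratio (default 2.1).
physical_limit_gb = 1.0   # "1 GB physical memory" from the diagnostics
vmem_pmem_ratio = 2.1     # Hadoop 2.x default for yarn.nodemanager.vmem-pmem-ratio
vmem_limit_gb = physical_limit_gb * vmem_pmem_ratio  # the 2.1 GB cap in the log
vmem_used_gb = 2.7        # "2.7 GB of 2.1 GB virtual memory used"

print(vmem_limit_gb)                 # 2.1
print(vmem_used_gb > vmem_limit_gb)  # True -> YARN kills the container
```

The usual remedies are to raise mapreduce.map.memory.mb (the cap scales with
it), raise yarn.nodemanager.vmem-pmem-ratio, or set
yarn.nodemanager.vmem-check-enabled to false; all three are standard Hadoop
2.x properties.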


Thanks,

Charith










On Tue, Nov 11, 2014 at 10:57 AM, Unmesh Joshi <un...@gmail.com>
wrote:

> Try increasing the heap with the -Xmx1024m option.
> Replace 1024 with whatever memory is available and appropriate; this
> should be set in mapred.child.java.opts.
>
>
>
>
>    Regards,
>    Unmesh Joshi
>




Re: Giraph job gets killed suddenly

Posted by Unmesh Joshi <un...@gmail.com>.
Try increasing the heap with the -Xmx1024m option.
Replace 1024 with whatever memory is available and appropriate; this
should be set in mapred.child.java.opts.
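As a sketch, that setting goes in mapred-site.xml (the property name is the
classic MRv1-style one, which Hadoop 2.x still honors; the -Xmx value is only
an example):

```xml
<!-- mapred-site.xml: raise the child JVM heap for task attempts.
     The container itself must also be sized to fit this heap, or YARN
     will kill the task for exceeding its memory limits. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```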




   Regards,
   Unmesh Joshi



Re: Giraph job gets killed suddenly

Posted by Arghya Kusum Das <ar...@gmail.com>.
Can you try adding the mapreduce.jobtracker.address property to
mapred-site.xml?

If that does not fix the issue, try connecting Giraph to an external
ZooKeeper by setting the giraph.zkList property in your Giraph job; Giraph
cannot run without an external ZooKeeper list on Hadoop 2.
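A sketch of how that might look on the command line (GiraphRunner is
Giraph's standard entry point; the jar name and the ZooKeeper host:port
pairs below are placeholders):

```shell
# Point Giraph at an external ZooKeeper ensemble (hosts are placeholders).
hadoop jar giraph-examples-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  -Dgiraph.zkList=zk1.example.com:2181,zk2.example.com:2181 \
  ...
```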

Sent from my iPhone
