Posted to common-user@hadoop.apache.org by Frank Astier <fa...@yahoo-inc.com> on 2011/09/15 21:51:48 UTC

Debugging mapper

Hi -

I’m using IntelliJ and the WordCount example in Hadoop (which uses MiniMRCluster). Is it possible to set an IntelliJ debugger breakpoint straight into the map function of the mapper? - I’ve tried, but so far, the debugger does not stop at the breakpoint.

Thanks!

Frank

Re: Debugging mapper

Posted by John Armstrong <jo...@ccri.com>.
On Thu, 15 Sep 2011 12:51:48 -0700, Frank Astier <fa...@yahoo-inc.com>
wrote:
> I’m using IntelliJ and the WordCount example in Hadoop (which uses
> MiniMRCluster). Is it possible to set an IntelliJ debugger breakpoint
> straight into the map function of the mapper? - I’ve tried, but so far,
> the debugger does not stop at the breakpoint.

The problem is that the mapper is being run in a different JVM than the
one you launch.  Here's what I do (using IntelliJ on cluster nodes running
Ubuntu):

Add the following lines to your configuration.xml file:

    <property>
        <name>mapred.map.child.java.opts</name>
        <value>-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y</value>
    </property>

This adds the JPDA debugging listener to your mappers' JVMs.  There's a
similar property for reducers.

Now to connect, you first need to find them.  You can't specify an address
as usual, since your task JVMs would all be trying to use the same one.  If
there's a particular task you want to connect to, use the jobtracker web UI
(:50030) to find which cluster node it's running on and ssh to that one.
Otherwise, just connect to any cluster node you want.  Then run

    ps awfx | grep debug | awk '{print $1}'

which will give you a whole bunch of process ids whose command lines
contain the string "debug".  Some of these are your task JVMs!  Anyway,
pick one -- say it's 2317 -- and run

    sudo netstat -ap | grep 2317

which will tell you what port the task is waiting and listening on.
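
One caveat with a plain grep is that it also matches lines where 2317
happens to appear in a port number or address.  A minimal sketch of a
stricter filter follows, assuming Linux net-tools output from
`netstat -tlnp` (the column layout is an assumption, and the function name
ports_for_pid is made up for illustration):

```shell
#!/bin/sh
# Sketch: print the TCP ports a given PID is listening on.
# Reads `netstat -tlnp`-style lines on stdin; assumed column layout:
#   Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
ports_for_pid() {
    awk -v pid="$1" '
        $6 == "LISTEN" {
            split($7, owner, "/")          # "2317/java" -> owner[1] = "2317"
            if (owner[1] == pid) {
                n = split($4, addr, ":")   # port is the last ":"-separated piece
                print addr[n]
            }
        }'
}

# Usage on a cluster node (2317 is the example PID from above):
#   sudo netstat -tlnp | ports_for_pid 2317
```

Matching on the PID column rather than on the whole line keeps unrelated
sockets out of the result.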

NOW you can go back into your IntelliJ and configure a remote debugger. 
Tell it to connect to the host you were sshed into, at the port you just
found.  Set your breakpoint; connect your debugger; and you're good to go.

Oh, and by default you've only got about 10 minutes to get this done
before Hadoop decides that your suspended task is hung and kills it.
Set mapreduce.task.timeout (mapred.task.timeout in older releases; the
value is in milliseconds) higher if you want to have more time to work.
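
Since the timeout value is given in milliseconds (the 10-minute default is
600000), it's easy to be off by a factor of 1000.  A small sketch of the
conversion; the helper name minutes_to_ms is made up for illustration:

```shell
#!/bin/sh
# Sketch: convert a number of minutes to the millisecond value
# used by the task timeout property.
minutes_to_ms() {
    echo $(( $1 * 60 * 1000 ))
}

# Example: value to use for a 30-minute debugging window.
minutes_to_ms 30
```

The printed number is what would go in the <value> element of the timeout
property.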

hth

Re: Debugging mapper

Posted by Joey Echeverria <jo...@cloudera.com>.
You might also want to look into MRUnit[1]. It lets you mock the
behavior of the framework to test your map and reduce classes in
isolation. It can't discover all bugs, but it's a useful tool and works
nicely with IDE debuggers.

-Joey

[1] http://incubator.apache.org/mrunit/

-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434