Posted to mapreduce-user@hadoop.apache.org by psdc1978 <ps...@gmail.com> on 2010/05/01 17:43:00 UTC

Re: How to debug reducer thread?

Hi,

I really need to debug the threads that the ReduceTask launches, and unit
tests will not do. I am studying what happens inside the ReduceTask so that I
can make some changes to the code myself. So I tried to debug the ReduceTask
by setting the following in mapred-site.xml:

<property>
  <name>mapred.job.tracker</name>
  <value>local</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

But I can't start mapred; it gives me this error:
2010-05-01 17:35:35,155 FATAL org.apache.hadoop.mapred.JobTracker:3720
java.lang.RuntimeException: Not a host:port pair: local
        at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:136)
        at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:123)
        at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:1794)
        at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1581)
        at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:179)
        at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:171)
        at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:3717)


I haven't set the fs.default.name parameter, because I will use HDFS and not
the local filesystem.

So, how can I solve this problem?

Thanks,
PSC
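
A note on the error above: the trace shows JobTracker.getAddress passing the
value of mapred.job.tracker to NetUtils.createSocketAddr, so the JobTracker
daemon cannot start while that value is "local". With the local job runner
the job runs inside the client process, so there is no JobTracker or
TaskTracker to start. A minimal sketch of a client-side configuration,
assuming the goal is to keep reading input from HDFS while the job itself
runs locally (the NameNode host and port are placeholders, not values from
this thread):

```xml
<!-- Sketch only: client-side settings for the local job runner. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <!-- "local" selects the in-process runner; the JobTracker daemon
         is not started with this value -->
    <value>local</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <!-- Placeholder NameNode address (assumption); use file:/// for
         the local file system instead -->
    <value>hdfs://namenode.example.com:9000</value>
  </property>
</configuration>
```

With settings like these the job would be launched directly from the IDE or
the command-line client, without starting the mapred daemons first.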


On Wed, Apr 28, 2010 at 4:51 AM, Eric Sammer <es...@cloudera.com> wrote:

> If you want to step through a full map / reduce job, the easiest way
> to do this is to run a job using the local job runner in your IDE. The
> local job runner will run the MR job in a single thread making it very
> easy to debug. You will want to use the local file system and a small
> amount of data during this type of testing / debugging. Note that the
> local job runner runs map tasks, sort and shuffle, and reducers
> sequentially with no parallelism.
>
> Set the following properties to enable the local job runner and local
> file system:
>
> mapred.job.tracker = local
> fs.default.name = file:///
>
> Attempting to attach a debugger to a real task tracker is problematic
> because user code is run in separate jvms, etc. It's almost never
> worth it. Most debugging (with a real debugger) is better done using
> MRUnit and the local job runner.
>
> Hope this helps and good luck.
>
> On Tue, Apr 27, 2010 at 7:27 AM, psdc1978 <ps...@gmail.com> wrote:
> > Hi,
> >
> > The reduce tasks are threads that are launched by the Reducer. The output
> > below shows the stack trace of one reduce task.
> >
> > at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.fetchHashesOutputs(ReduceTask.java:2582)
> > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:395)
> > at org.apache.hadoop.mapred.Child.main(Child.java:194)
> >
> > I would like to debug this thread in an IDE but I don't know how to do it.
> > Should I define properties to do this? Is there a way to do it?
> >
> > Thanks
> >
> > --
> > PSC
> >
>
>
>
> --
> Eric Sammer
> phone: +1-917-287-2675
> twitter: esammer
> data: www.cloudera.com
>



-- 
Pedro

Re: How to debug reducer thread?

Posted by psdc1978 <ps...@gmail.com>.
I have another idea, but I don't know how to do it. Is it possible to pass the
Xdebug parameter to the JVM that MapReduce starts for the ReduceTask? If so, I
could attach the debugger to that thread, right?
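
One way this could be done, as a sketch rather than a tested recipe: the
options for task JVMs come from mapred.child.java.opts, so the JDWP agent
flags can be added there. The port (8000) is an arbitrary choice. The
setting applies to every child task JVM on a node, map and reduce alike, so
two tasks running at the same time would compete for the same port; the
sketch assumes a single-node setup that runs one task at a time.

```xml
<property>
  <name>mapred.child.java.opts</name>
  <!-- -Xmx200m is the usual default heap setting; the rest enables
       remote debugging. suspend=y makes each task JVM wait until a
       debugger attaches on port 8000 (arbitrary, an assumption). -->
  <value>-Xmx200m -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000</value>
</property>
```

A task that sits suspended or paused at a breakpoint may be killed once
mapred.task.timeout elapses, so that value may need to be raised while
debugging.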




-- 
Pedro