You are viewing a plain text version of this content. The canonical link for it is here.
Posted to hdfs-user@hadoop.apache.org by Björn-Elmar Macek <ma...@cs.uni-kassel.de> on 2012/05/22 11:42:24 UTC

Hadoop Debugging in LocalMode (Breakpoints not reached)

Hi there,


i am currently trying to get rid of bugs in my Hadoop program by 
debugging it. Everything went fine til some point yesterday. I dont know 
what exactly happened, but my program does not stop at breakpoints 
within the Reducer and also not within the RawComparator for the values 
which i do use for sorting my inputs in the ReducerIterator.
(see the classes set for the conf below:)

conf.setOutputValueGroupingComparator(TwitterValueGroupingComparator.class);
conf.setReducerClass(RetweetReducer.class);

The log looks like this:

Warning: $HADOOP_HOME is deprecated.

Listening for transport dt_socket at address: 5002

12/05/21 19:24:20 INFO util.NativeCodeLoader: Loaded the native-hadoop 
library

12/05/21 19:24:20 WARN mapred.JobClient: Use GenericOptionsParser for 
parsing the arguments. Applications should implement Tool for the same.

12/05/21 19:24:20 WARN snappy.LoadSnappy: Snappy native library not loaded

12/05/21 19:24:20 INFO mapred.FileInputFormat: Total input paths to 
process : 2

12/05/21 19:24:20 WARN conf.Configuration: 
file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a 
attempt to override final parameter: fs.default.name;Ignoring.

12/05/21 19:24:20 WARN conf.Configuration: 
file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a 
attempt to override final parameter: mapred.job.tracker;Ignoring.

12/05/21 19:24:20 INFO mapred.JobClient: Running job: job_local_0001

12/05/21 19:24:20 INFO util.ProcessTree: setsid exited with exit code 0

12/05/21 19:24:21 INFO mapred.Task:Using ResourceCalculatorPlugin : 
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1c4ff2c

12/05/21 19:24:21 INFO mapred.MapTask: numReduceTasks: 1

12/05/21 19:24:21 INFO mapred.MapTask: io.sort.mb = 100

12/05/21 19:24:22 INFO mapred.JobClient:map 0% reduce 0%

12/05/21 19:24:22 INFO mapred.MapTask: data buffer = 79691776/99614720

12/05/21 19:24:22 INFO mapred.MapTask: record buffer = 262144/327680

12/05/21 19:24:22 INFO mapred.MapTask: Starting flush of map output

12/05/21 19:24:22 INFO mapred.MapTask: Finished spill 0

12/05/21 19:24:22 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 
is done. And is in the process of commiting

12/05/21 19:24:23 INFO mapred.LocalJobRunner: 
file:/home/ema/INPUT-H/tweets_ext:0+968

12/05/21 19:24:23 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' 
done.

12/05/21 19:24:23 INFO mapred.Task:Using ResourceCalculatorPlugin : 
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1e8c585

12/05/21 19:24:23 INFO mapred.MapTask: numReduceTasks: 1

12/05/21 19:24:23 INFO mapred.MapTask: io.sort.mb = 100

12/05/21 19:24:24 INFO mapred.MapTask: data buffer = 79691776/99614720

12/05/21 19:24:24 INFO mapred.MapTask: record buffer = 262144/327680

12/05/21 19:24:24 INFO mapred.MapTask: Starting flush of map output

12/05/21 19:24:24 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 
is done. And is in the process of commiting

12/05/21 19:24:24 INFO mapred.JobClient:map 100% reduce 0%

12/05/21 19:24:26 INFO mapred.LocalJobRunner: 
file:/home/ema/INPUT-H/tweets~:0+0

12/05/21 19:24:26 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' 
done.

12/05/21 19:24:26 INFO mapred.Task:Using ResourceCalculatorPlugin : 
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@191e4c

12/05/21 19:24:26 INFO mapred.ReduceTask: ShuffleRamManager: 
MemoryLimit=709551680, MaxSingleShuffleLimit=177387920

12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 
Need another 2 map output(s) where 0 is already in progress

12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 
Thread started: Thread for merging on-disk files

12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 
Thread waiting: Thread for merging on-disk files

12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 
Scheduled 0 outputs (0 slow hosts and0 dup hosts)

12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 
Thread started: Thread for merging in memory files

12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 
Thread started: Thread for polling Map Completion Events

12/05/21 19:24:32 INFO mapred.LocalJobRunner: reduce > copy >

12/05/21 19:24:35 INFO mapred.LocalJobRunner: reduce > copy >

12/05/21 19:24:42 INFO mapred.LocalJobRunner: reduce > copy >

12/05/21 19:24:48 INFO mapred.LocalJobRunner: reduce > copy >

12/05/21 19:24:51 INFO mapred.LocalJobRunner: reduce > copy >

12/05/21 19:24:57 INFO mapred.LocalJobRunner: reduce > copy >

... etc ...

Is there something i have missed?

Thanks for your help in advance!

Best regards,
Björn-Elmar



Re: Hadoop Debugging in LocalMode (Breakpoints not reached)

Posted by Björn-Elmar Macek <ma...@cs.uni-kassel.de>.
Although the reactions did not give me the feeling there was much 
interest in my case, i have found a "solution" and some reasons for my 
problem. You might be interested in the discussion on Stackoverflow:
http://stackoverflow.com/questions/10720132/hadoop-reducer-is-waiting-for-mapper-inputs


Am 23.05.2012 10:47, schrieb Björn-Elmar Macek:
> Ok, i have look at the logs some further and googled every tiny bit of 
> them, hoping to find an answer out there.
> I fear that the following line nails my problem at a big scale:
>
> 12/05/22 01:30:21 INFO mapred.ReduceTask: 
> attempt_local_0001_r_000000_0 Need another 2 map output(s)
> where 0 is already in progress
>
> I found several discussions to problems, that also had this line in 
> their logs. I have checked my code for the following:
>
> * All inputs are collected in the mapper (tho not all would be neccessary)
> * The Comparators run well and return proper values for all inputs
> * The Partitioner always returns proper values
>
> Please, i would really need a hint, to where i have to look.
> Am 22.05.2012 16:57, schrieb Björn-Elmar Macek:
>> Hi Jayaseelan,
>>
>> thanks for the bump! ;)
>>
>> I have continued working on the problem, but with no further success. 
>> I emptied the log directory and started the debugging all over again, 
>> resulting in no new logfiles, so i guess the program did not run into 
>> serious problems. Also all the code other classes, namely ...
>>
>> * Mapper
>> * Partitioner
>> * OutputKeyComparatorClass
>>
>> is executed and can easily be debugged. Stil the Reducer and the 
>> OutputValueGroupingComparator do NOT work. After the execution of the 
>> comparisons made by OutputKeyComparatorClass i get alot of active 
>> processes in my debugging view in eclipse:
>>
>> OpenJDK Client VM[localhost:5002]
>>     Thread [main] (Running)
>>     Thread [Thread-2] (Running)
>>     Daemon Thread [communication thread] (Running)
>>     Thread [MapOutputCopier attempt_local_0001_r_000000_0.0] (Running)
>>     Daemon Thread [Thread for merging in memory files] (Running)
>>     Thread [MapOutputCopier attempt_local_0001_r_000000_0.4] (Running)
>>     Thread [MapOutputCopier attempt_local_0001_r_000000_0.3] (Running)
>>     Thread [MapOutputCopier attempt_local_0001_r_000000_0.1] (Running)
>>     Thread [MapOutputCopier attempt_local_0001_r_000000_0.2] (Running)
>>     Daemon Thread [Thread for merging on-disk files] (Running)
>>     Daemon Thread [Thread for polling Map Completion Events] (Running)
>>
>> And those processes are running, but obviously waiting for something, 
>> since no output is produced. And it is not due to the havy load of 
>> input data, since this is a 10 line csv file, which shouldnt make any 
>> problems.
>>
>> I somehow have the feeling that the framework cannot handle my 
>> classes, but i dont understand why.
>>
>> I would really appreciate a decent hint, how to fix that.
>>
>> Thanks you for your time and help!
>> Björn-Elmar
>> Am 22.05.2012 12:38, schrieb Jayaseelan E:
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Björn-Elmar Macek [mailto:macek@cs.uni-kassel.de]
>>> *Sent:* Tuesday, May 22, 2012 3:12 PM
>>> *To:* hdfs-user@hadoop.apache.org
>>> *Subject:* Hadoop Debugging in LocalMode (Breakpoints not reached)
>>>
>>> Hi there,
>>>
>>>
>>> i am currently trying to get rid of bugs in my Hadoop program by 
>>> debugging it. Everything went fine til some point yesterday. I dont 
>>> know what exactly happened, but my program does not stop at 
>>> breakpoints within the Reducer and also not within the RawComparator 
>>> for the values which i do use for sorting my inputs in the 
>>> ReducerIterator.
>>> (see the classes set for the conf below:)
>>>
>>> conf.setOutputValueGroupingComparator(TwitterValueGroupingComparator.class);
>>> conf.setReducerClass(RetweetReducer.class);
>>>
>>> The log looks like this:
>>>
>>> Warning: $HADOOP_HOME is deprecated.
>>>
>>> Listening for transport dt_socket at address: 5002
>>>
>>> 12/05/21 19:24:20 INFO util.NativeCodeLoader: Loaded the 
>>> native-hadoop library
>>>
>>> 12/05/21 19:24:20 WARN mapred.JobClient: Use GenericOptionsParser 
>>> for parsing the arguments. Applications should implement Tool for 
>>> the same.
>>>
>>> 12/05/21 19:24:20 WARN snappy.LoadSnappy: Snappy native library not 
>>> loaded
>>>
>>> 12/05/21 19:24:20 INFO mapred.FileInputFormat: Total input paths to 
>>> process : 2
>>>
>>> 12/05/21 19:24:20 WARN conf.Configuration: 
>>> file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a 
>>> attempt to override final parameter: fs.default.name;Ignoring.
>>>
>>> 12/05/21 19:24:20 WARN conf.Configuration: 
>>> file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a 
>>> attempt to override final parameter: mapred.job.tracker;Ignoring.
>>>
>>> 12/05/21 19:24:20 INFO mapred.JobClient: Running job: job_local_0001
>>>
>>> 12/05/21 19:24:20 INFO util.ProcessTree: setsid exited with exit code 0
>>>
>>> 12/05/21 19:24:21 INFO mapred.Task:Using ResourceCalculatorPlugin : 
>>> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1c4ff2c
>>>
>>> 12/05/21 19:24:21 INFO mapred.MapTask: numReduceTasks: 1
>>>
>>> 12/05/21 19:24:21 INFO mapred.MapTask: io.sort.mb = 100
>>>
>>> 12/05/21 19:24:22 INFO mapred.JobClient:map 0% reduce 0%
>>>
>>> 12/05/21 19:24:22 INFO mapred.MapTask: data buffer = 79691776/99614720
>>>
>>> 12/05/21 19:24:22 INFO mapred.MapTask: record buffer = 262144/327680
>>>
>>> 12/05/21 19:24:22 INFO mapred.MapTask: Starting flush of map output
>>>
>>> 12/05/21 19:24:22 INFO mapred.MapTask: Finished spill 0
>>>
>>> 12/05/21 19:24:22 INFO mapred.Task: 
>>> Task:attempt_local_0001_m_000000_0 is done. And is in the process of 
>>> commiting
>>>
>>> 12/05/21 19:24:23 INFO mapred.LocalJobRunner: 
>>> file:/home/ema/INPUT-H/tweets_ext:0+968
>>>
>>> 12/05/21 19:24:23 INFO mapred.Task: Task 
>>> 'attempt_local_0001_m_000000_0' done.
>>>
>>> 12/05/21 19:24:23 INFO mapred.Task:Using ResourceCalculatorPlugin : 
>>> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1e8c585
>>>
>>> 12/05/21 19:24:23 INFO mapred.MapTask: numReduceTasks: 1
>>>
>>> 12/05/21 19:24:23 INFO mapred.MapTask: io.sort.mb = 100
>>>
>>> 12/05/21 19:24:24 INFO mapred.MapTask: data buffer = 79691776/99614720
>>>
>>> 12/05/21 19:24:24 INFO mapred.MapTask: record buffer = 262144/327680
>>>
>>> 12/05/21 19:24:24 INFO mapred.MapTask: Starting flush of map output
>>>
>>> 12/05/21 19:24:24 INFO mapred.Task: 
>>> Task:attempt_local_0001_m_000001_0 is done. And is in the process of 
>>> commiting
>>>
>>> 12/05/21 19:24:24 INFO mapred.JobClient:map 100% reduce 0%
>>>
>>> 12/05/21 19:24:26 INFO mapred.LocalJobRunner: 
>>> file:/home/ema/INPUT-H/tweets~:0+0
>>>
>>> 12/05/21 19:24:26 INFO mapred.Task: Task 
>>> 'attempt_local_0001_m_000001_0' done.
>>>
>>> 12/05/21 19:24:26 INFO mapred.Task:Using ResourceCalculatorPlugin : 
>>> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@191e4c
>>>
>>> 12/05/21 19:24:26 INFO mapred.ReduceTask: ShuffleRamManager: 
>>> MemoryLimit=709551680, MaxSingleShuffleLimit=177387920
>>>
>>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>>> attempt_local_0001_r_000000_0 Need another 2 map output(s) where 0 
>>> is already in progress
>>>
>>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>>> attempt_local_0001_r_000000_0 Thread started: Thread for merging 
>>> on-disk files
>>>
>>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>>> attempt_local_0001_r_000000_0 Thread waiting: Thread for merging 
>>> on-disk files
>>>
>>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>>> attempt_local_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 
>>> dup hosts)
>>>
>>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>>> attempt_local_0001_r_000000_0 Thread started: Thread for merging in 
>>> memory files
>>>
>>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>>> attempt_local_0001_r_000000_0 Thread started: Thread for polling Map 
>>> Completion Events
>>>
>>> 12/05/21 19:24:32 INFO mapred.LocalJobRunner: reduce > copy >
>>>
>>> 12/05/21 19:24:35 INFO mapred.LocalJobRunner: reduce > copy >
>>>
>>> 12/05/21 19:24:42 INFO mapred.LocalJobRunner: reduce > copy >
>>>
>>> 12/05/21 19:24:48 INFO mapred.LocalJobRunner: reduce > copy >
>>>
>>> 12/05/21 19:24:51 INFO mapred.LocalJobRunner: reduce > copy >
>>>
>>> 12/05/21 19:24:57 INFO mapred.LocalJobRunner: reduce > copy >
>>>
>>> ... etc ...
>>>
>>> Is there something i have missed?
>>>
>>> Thanks for your help in advance!
>>>
>>> Best regards,
>>> Björn-Elmar
>>>
>>>
>>
>


Re: Hadoop Debugging in LocalMode (Breakpoints not reached)

Posted by Björn-Elmar Macek <ma...@cs.uni-kassel.de>.
Ok, i have look at the logs some further and googled every tiny bit of 
them, hoping to find an answer out there.
I fear that the following line nails my problem at a big scale:

12/05/22 01:30:21 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 
Need another 2 map output(s)
where 0 is already in progress

I found several discussions to problems, that also had this line in 
their logs. I have checked my code for the following:

* All inputs are collected in the mapper (tho not all would be neccessary)
* The Comparators run well and return proper values for all inputs
* The Partitioner always returns proper values

Please, i would really need a hint, to where i have to look.
Am 22.05.2012 16:57, schrieb Björn-Elmar Macek:
> Hi Jayaseelan,
>
> thanks for the bump! ;)
>
> I have continued working on the problem, but with no further success. 
> I emptied the log directory and started the debugging all over again, 
> resulting in no new logfiles, so i guess the program did not run into 
> serious problems. Also all the code other classes, namely ...
>
> * Mapper
> * Partitioner
> * OutputKeyComparatorClass
>
> is executed and can easily be debugged. Stil the Reducer and the 
> OutputValueGroupingComparator do NOT work. After the execution of the 
> comparisons made by OutputKeyComparatorClass i get alot of active 
> processes in my debugging view in eclipse:
>
> OpenJDK Client VM[localhost:5002]
>     Thread [main] (Running)
>     Thread [Thread-2] (Running)
>     Daemon Thread [communication thread] (Running)
>     Thread [MapOutputCopier attempt_local_0001_r_000000_0.0] (Running)
>     Daemon Thread [Thread for merging in memory files] (Running)
>     Thread [MapOutputCopier attempt_local_0001_r_000000_0.4] (Running)
>     Thread [MapOutputCopier attempt_local_0001_r_000000_0.3] (Running)
>     Thread [MapOutputCopier attempt_local_0001_r_000000_0.1] (Running)
>     Thread [MapOutputCopier attempt_local_0001_r_000000_0.2] (Running)
>     Daemon Thread [Thread for merging on-disk files] (Running)
>     Daemon Thread [Thread for polling Map Completion Events] (Running)
>
> And those processes are running, but obviously waiting for something, 
> since no output is produced. And it is not due to the havy load of 
> input data, since this is a 10 line csv file, which shouldnt make any 
> problems.
>
> I somehow have the feeling that the framework cannot handle my 
> classes, but i dont understand why.
>
> I would really appreciate a decent hint, how to fix that.
>
> Thanks you for your time and help!
> Björn-Elmar
> Am 22.05.2012 12:38, schrieb Jayaseelan E:
>>
>> ------------------------------------------------------------------------
>> *From:* Björn-Elmar Macek [mailto:macek@cs.uni-kassel.de]
>> *Sent:* Tuesday, May 22, 2012 3:12 PM
>> *To:* hdfs-user@hadoop.apache.org
>> *Subject:* Hadoop Debugging in LocalMode (Breakpoints not reached)
>>
>> Hi there,
>>
>>
>> i am currently trying to get rid of bugs in my Hadoop program by 
>> debugging it. Everything went fine til some point yesterday. I dont 
>> know what exactly happened, but my program does not stop at 
>> breakpoints within the Reducer and also not within the RawComparator 
>> for the values which i do use for sorting my inputs in the 
>> ReducerIterator.
>> (see the classes set for the conf below:)
>>
>> conf.setOutputValueGroupingComparator(TwitterValueGroupingComparator.class);
>> conf.setReducerClass(RetweetReducer.class);
>>
>> The log looks like this:
>>
>> Warning: $HADOOP_HOME is deprecated.
>>
>> Listening for transport dt_socket at address: 5002
>>
>> 12/05/21 19:24:20 INFO util.NativeCodeLoader: Loaded the 
>> native-hadoop library
>>
>> 12/05/21 19:24:20 WARN mapred.JobClient: Use GenericOptionsParser for 
>> parsing the arguments. Applications should implement Tool for the same.
>>
>> 12/05/21 19:24:20 WARN snappy.LoadSnappy: Snappy native library not 
>> loaded
>>
>> 12/05/21 19:24:20 INFO mapred.FileInputFormat: Total input paths to 
>> process : 2
>>
>> 12/05/21 19:24:20 WARN conf.Configuration: 
>> file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a 
>> attempt to override final parameter: fs.default.name;Ignoring.
>>
>> 12/05/21 19:24:20 WARN conf.Configuration: 
>> file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a 
>> attempt to override final parameter: mapred.job.tracker;Ignoring.
>>
>> 12/05/21 19:24:20 INFO mapred.JobClient: Running job: job_local_0001
>>
>> 12/05/21 19:24:20 INFO util.ProcessTree: setsid exited with exit code 0
>>
>> 12/05/21 19:24:21 INFO mapred.Task:Using ResourceCalculatorPlugin : 
>> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1c4ff2c
>>
>> 12/05/21 19:24:21 INFO mapred.MapTask: numReduceTasks: 1
>>
>> 12/05/21 19:24:21 INFO mapred.MapTask: io.sort.mb = 100
>>
>> 12/05/21 19:24:22 INFO mapred.JobClient:map 0% reduce 0%
>>
>> 12/05/21 19:24:22 INFO mapred.MapTask: data buffer = 79691776/99614720
>>
>> 12/05/21 19:24:22 INFO mapred.MapTask: record buffer = 262144/327680
>>
>> 12/05/21 19:24:22 INFO mapred.MapTask: Starting flush of map output
>>
>> 12/05/21 19:24:22 INFO mapred.MapTask: Finished spill 0
>>
>> 12/05/21 19:24:22 INFO mapred.Task: 
>> Task:attempt_local_0001_m_000000_0 is done. And is in the process of 
>> commiting
>>
>> 12/05/21 19:24:23 INFO mapred.LocalJobRunner: 
>> file:/home/ema/INPUT-H/tweets_ext:0+968
>>
>> 12/05/21 19:24:23 INFO mapred.Task: Task 
>> 'attempt_local_0001_m_000000_0' done.
>>
>> 12/05/21 19:24:23 INFO mapred.Task:Using ResourceCalculatorPlugin : 
>> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1e8c585
>>
>> 12/05/21 19:24:23 INFO mapred.MapTask: numReduceTasks: 1
>>
>> 12/05/21 19:24:23 INFO mapred.MapTask: io.sort.mb = 100
>>
>> 12/05/21 19:24:24 INFO mapred.MapTask: data buffer = 79691776/99614720
>>
>> 12/05/21 19:24:24 INFO mapred.MapTask: record buffer = 262144/327680
>>
>> 12/05/21 19:24:24 INFO mapred.MapTask: Starting flush of map output
>>
>> 12/05/21 19:24:24 INFO mapred.Task: 
>> Task:attempt_local_0001_m_000001_0 is done. And is in the process of 
>> commiting
>>
>> 12/05/21 19:24:24 INFO mapred.JobClient:map 100% reduce 0%
>>
>> 12/05/21 19:24:26 INFO mapred.LocalJobRunner: 
>> file:/home/ema/INPUT-H/tweets~:0+0
>>
>> 12/05/21 19:24:26 INFO mapred.Task: Task 
>> 'attempt_local_0001_m_000001_0' done.
>>
>> 12/05/21 19:24:26 INFO mapred.Task:Using ResourceCalculatorPlugin : 
>> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@191e4c
>>
>> 12/05/21 19:24:26 INFO mapred.ReduceTask: ShuffleRamManager: 
>> MemoryLimit=709551680, MaxSingleShuffleLimit=177387920
>>
>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>> attempt_local_0001_r_000000_0 Need another 2 map output(s) where 0 is 
>> already in progress
>>
>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>> attempt_local_0001_r_000000_0 Thread started: Thread for merging 
>> on-disk files
>>
>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>> attempt_local_0001_r_000000_0 Thread waiting: Thread for merging 
>> on-disk files
>>
>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>> attempt_local_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 
>> dup hosts)
>>
>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>> attempt_local_0001_r_000000_0 Thread started: Thread for merging in 
>> memory files
>>
>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>> attempt_local_0001_r_000000_0 Thread started: Thread for polling Map 
>> Completion Events
>>
>> 12/05/21 19:24:32 INFO mapred.LocalJobRunner: reduce > copy >
>>
>> 12/05/21 19:24:35 INFO mapred.LocalJobRunner: reduce > copy >
>>
>> 12/05/21 19:24:42 INFO mapred.LocalJobRunner: reduce > copy >
>>
>> 12/05/21 19:24:48 INFO mapred.LocalJobRunner: reduce > copy >
>>
>> 12/05/21 19:24:51 INFO mapred.LocalJobRunner: reduce > copy >
>>
>> 12/05/21 19:24:57 INFO mapred.LocalJobRunner: reduce > copy >
>>
>> ... etc ...
>>
>> Is there something i have missed?
>>
>> Thanks for your help in advance!
>>
>> Best regards,
>> Björn-Elmar
>>
>>
>


Re: Hadoop Debugging in LocalMode (Breakpoints not reached)

Posted by Björn-Elmar Macek <ma...@cs.uni-kassel.de>.
Hi Jayaseelan,

thanks for the bump! ;)

I have continued working on the problem, but with no further success. I 
emptied the log directory and started the debugging all over again, 
resulting in no new logfiles, so i guess the program did not run into 
serious problems. Also all the code other classes, namely ...

* Mapper
* Partitioner
* OutputKeyComparatorClass

is executed and can easily be debugged. Stil the Reducer and the 
OutputValueGroupingComparator do NOT work. After the execution of the 
comparisons made by OutputKeyComparatorClass i get alot of active 
processes in my debugging view in eclipse:

OpenJDK Client VM[localhost:5002]
     Thread [main] (Running)
     Thread [Thread-2] (Running)
     Daemon Thread [communication thread] (Running)
     Thread [MapOutputCopier attempt_local_0001_r_000000_0.0] (Running)
     Daemon Thread [Thread for merging in memory files] (Running)
     Thread [MapOutputCopier attempt_local_0001_r_000000_0.4] (Running)
     Thread [MapOutputCopier attempt_local_0001_r_000000_0.3] (Running)
     Thread [MapOutputCopier attempt_local_0001_r_000000_0.1] (Running)
     Thread [MapOutputCopier attempt_local_0001_r_000000_0.2] (Running)
     Daemon Thread [Thread for merging on-disk files] (Running)
     Daemon Thread [Thread for polling Map Completion Events] (Running)

And those processes are running, but obviously waiting for something, 
since no output is produced. And it is not due to the havy load of input 
data, since this is a 10 line csv file, which shouldnt make any problems.

I somehow have the feeling that the framework cannot handle my classes, 
but i dont understand why.

I would really appreciate a decent hint, how to fix that.

Thanks you for your time and help!
Björn-Elmar
Am 22.05.2012 12:38, schrieb Jayaseelan E:
>
> ------------------------------------------------------------------------
> *From:* Björn-Elmar Macek [mailto:macek@cs.uni-kassel.de]
> *Sent:* Tuesday, May 22, 2012 3:12 PM
> *To:* hdfs-user@hadoop.apache.org
> *Subject:* Hadoop Debugging in LocalMode (Breakpoints not reached)
>
> Hi there,
>
>
> i am currently trying to get rid of bugs in my Hadoop program by 
> debugging it. Everything went fine til some point yesterday. I dont 
> know what exactly happened, but my program does not stop at 
> breakpoints within the Reducer and also not within the RawComparator 
> for the values which i do use for sorting my inputs in the 
> ReducerIterator.
> (see the classes set for the conf below:)
>
> conf.setOutputValueGroupingComparator(TwitterValueGroupingComparator.class);
> conf.setReducerClass(RetweetReducer.class);
>
> The log looks like this:
>
> Warning: $HADOOP_HOME is deprecated.
>
> Listening for transport dt_socket at address: 5002
>
> 12/05/21 19:24:20 INFO util.NativeCodeLoader: Loaded the native-hadoop 
> library
>
> 12/05/21 19:24:20 WARN mapred.JobClient: Use GenericOptionsParser for 
> parsing the arguments. Applications should implement Tool for the same.
>
> 12/05/21 19:24:20 WARN snappy.LoadSnappy: Snappy native library not loaded
>
> 12/05/21 19:24:20 INFO mapred.FileInputFormat: Total input paths to 
> process : 2
>
> 12/05/21 19:24:20 WARN conf.Configuration: 
> file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a 
> attempt to override final parameter: fs.default.name;Ignoring.
>
> 12/05/21 19:24:20 WARN conf.Configuration: 
> file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a 
> attempt to override final parameter: mapred.job.tracker;Ignoring.
>
> 12/05/21 19:24:20 INFO mapred.JobClient: Running job: job_local_0001
>
> 12/05/21 19:24:20 INFO util.ProcessTree: setsid exited with exit code 0
>
> 12/05/21 19:24:21 INFO mapred.Task:Using ResourceCalculatorPlugin : 
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1c4ff2c
>
> 12/05/21 19:24:21 INFO mapred.MapTask: numReduceTasks: 1
>
> 12/05/21 19:24:21 INFO mapred.MapTask: io.sort.mb = 100
>
> 12/05/21 19:24:22 INFO mapred.JobClient:map 0% reduce 0%
>
> 12/05/21 19:24:22 INFO mapred.MapTask: data buffer = 79691776/99614720
>
> 12/05/21 19:24:22 INFO mapred.MapTask: record buffer = 262144/327680
>
> 12/05/21 19:24:22 INFO mapred.MapTask: Starting flush of map output
>
> 12/05/21 19:24:22 INFO mapred.MapTask: Finished spill 0
>
> 12/05/21 19:24:22 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 
> is done. And is in the process of commiting
>
> 12/05/21 19:24:23 INFO mapred.LocalJobRunner: 
> file:/home/ema/INPUT-H/tweets_ext:0+968
>
> 12/05/21 19:24:23 INFO mapred.Task: Task 
> 'attempt_local_0001_m_000000_0' done.
>
> 12/05/21 19:24:23 INFO mapred.Task:Using ResourceCalculatorPlugin : 
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1e8c585
>
> 12/05/21 19:24:23 INFO mapred.MapTask: numReduceTasks: 1
>
> 12/05/21 19:24:23 INFO mapred.MapTask: io.sort.mb = 100
>
> 12/05/21 19:24:24 INFO mapred.MapTask: data buffer = 79691776/99614720
>
> 12/05/21 19:24:24 INFO mapred.MapTask: record buffer = 262144/327680
>
> 12/05/21 19:24:24 INFO mapred.MapTask: Starting flush of map output
>
> 12/05/21 19:24:24 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 
> is done. And is in the process of commiting
>
> 12/05/21 19:24:24 INFO mapred.JobClient:map 100% reduce 0%
>
> 12/05/21 19:24:26 INFO mapred.LocalJobRunner: 
> file:/home/ema/INPUT-H/tweets~:0+0
>
> 12/05/21 19:24:26 INFO mapred.Task: Task 
> 'attempt_local_0001_m_000001_0' done.
>
> 12/05/21 19:24:26 INFO mapred.Task:Using ResourceCalculatorPlugin : 
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@191e4c
>
> 12/05/21 19:24:26 INFO mapred.ReduceTask: ShuffleRamManager: 
> MemoryLimit=709551680, MaxSingleShuffleLimit=177387920
>
> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
> attempt_local_0001_r_000000_0 Need another 2 map output(s) where 0 is 
> already in progress
>
> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
> attempt_local_0001_r_000000_0 Thread started: Thread for merging 
> on-disk files
>
> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
> attempt_local_0001_r_000000_0 Thread waiting: Thread for merging 
> on-disk files
>
> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
> attempt_local_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 
> dup hosts)
>
> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
> attempt_local_0001_r_000000_0 Thread started: Thread for merging in 
> memory files
>
> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
> attempt_local_0001_r_000000_0 Thread started: Thread for polling Map 
> Completion Events
>
> 12/05/21 19:24:32 INFO mapred.LocalJobRunner: reduce > copy >
>
> 12/05/21 19:24:35 INFO mapred.LocalJobRunner: reduce > copy >
>
> 12/05/21 19:24:42 INFO mapred.LocalJobRunner: reduce > copy >
>
> 12/05/21 19:24:48 INFO mapred.LocalJobRunner: reduce > copy >
>
> 12/05/21 19:24:51 INFO mapred.LocalJobRunner: reduce > copy >
>
> 12/05/21 19:24:57 INFO mapred.LocalJobRunner: reduce > copy >
>
> ... etc ...
>
> Is there something i have missed?
>
> Thanks for your help in advance!
>
> Best regards,
> Björn-Elmar
>
>


RE: Hadoop Debugging in LocalMode (Breakpoints not reached)

Posted by Jayaseelan E <ja...@ericsson.com>.

________________________________
From: Björn-Elmar Macek [mailto:macek@cs.uni-kassel.de]
Sent: Tuesday, May 22, 2012 3:12 PM
To: hdfs-user@hadoop.apache.org
Subject: Hadoop Debugging in LocalMode (Breakpoints not reached)

Hi there,


i am currently trying to get rid of bugs in my Hadoop program by debugging it. Everything went fine til some point yesterday. I dont know what exactly happened, but my program does not stop at breakpoints within the Reducer and also not within the RawComparator for the values which i do use for sorting my inputs in the ReducerIterator.
(see the classes set for the conf below:)

conf.setOutputValueGroupingComparator(TwitterValueGroupingComparator.class);
conf.setReducerClass(RetweetReducer.class);

The log looks like this:

Warning: $HADOOP_HOME is deprecated.
Listening for transport dt_socket at address: 5002
12/05/21 19:24:20 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/05/21 19:24:20 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/05/21 19:24:20 WARN snappy.LoadSnappy: Snappy native library not loaded
12/05/21 19:24:20 INFO mapred.FileInputFormat: Total input paths to process : 2
12/05/21 19:24:20 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: fs.default.name;  Ignoring.
12/05/21 19:24:20 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
12/05/21 19:24:20 INFO mapred.JobClient: Running job: job_local_0001
12/05/21 19:24:20 INFO util.ProcessTree: setsid exited with exit code 0
12/05/21 19:24:21 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1c4ff2c
12/05/21 19:24:21 INFO mapred.MapTask: numReduceTasks: 1
12/05/21 19:24:21 INFO mapred.MapTask: io.sort.mb = 100
12/05/21 19:24:22 INFO mapred.JobClient:  map 0% reduce 0%
12/05/21 19:24:22 INFO mapred.MapTask: data buffer = 79691776/99614720
12/05/21 19:24:22 INFO mapred.MapTask: record buffer = 262144/327680
12/05/21 19:24:22 INFO mapred.MapTask: Starting flush of map output
12/05/21 19:24:22 INFO mapred.MapTask: Finished spill 0
12/05/21 19:24:22 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/05/21 19:24:23 INFO mapred.LocalJobRunner: file:/home/ema/INPUT-H/tweets_ext:0+968
12/05/21 19:24:23 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/05/21 19:24:23 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1e8c585
12/05/21 19:24:23 INFO mapred.MapTask: numReduceTasks: 1
12/05/21 19:24:23 INFO mapred.MapTask: io.sort.mb = 100
12/05/21 19:24:24 INFO mapred.MapTask: data buffer = 79691776/99614720
12/05/21 19:24:24 INFO mapred.MapTask: record buffer = 262144/327680
12/05/21 19:24:24 INFO mapred.MapTask: Starting flush of map output
12/05/21 19:24:24 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
12/05/21 19:24:24 INFO mapred.JobClient:  map 100% reduce 0%
12/05/21 19:24:26 INFO mapred.LocalJobRunner: file:/home/ema/INPUT-H/tweets~:0+0
12/05/21 19:24:26 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
12/05/21 19:24:26 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@191e4c
12/05/21 19:24:26 INFO mapred.ReduceTask: ShuffleRamManager: MemoryLimit=709551680, MaxSingleShuffleLimit=177387920
12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Need another 2 map output(s) where 0 is already in progress
12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread started: Thread for merging on-disk files
12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread waiting: Thread for merging on-disk files
12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread started: Thread for merging in memory files
12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread started: Thread for polling Map Completion Events
12/05/21 19:24:32 INFO mapred.LocalJobRunner: reduce > copy >
12/05/21 19:24:35 INFO mapred.LocalJobRunner: reduce > copy >
12/05/21 19:24:42 INFO mapred.LocalJobRunner: reduce > copy >
12/05/21 19:24:48 INFO mapred.LocalJobRunner: reduce > copy >
12/05/21 19:24:51 INFO mapred.LocalJobRunner: reduce > copy >
12/05/21 19:24:57 INFO mapred.LocalJobRunner: reduce > copy >
... etc ...

Is there something i have missed?

Thanks for your help in advance!

Best regards,
Björn-Elmar