Posted to hdfs-user@hadoop.apache.org by Martinus Martinus <ma...@gmail.com> on 2012/05/22 08:24:59 UTC

Hadoop HA

Hi,

Is there any Hadoop HA distribution out there?

Thanks.

Re: Hadoop Debugging in LocalMode (Breakpoints not reached)

Posted by Björn-Elmar Macek <ma...@cs.uni-kassel.de>.
Although the reactions did not give me the feeling there was much 
interest in my case, I have found a "solution" and some reasons for my 
problem. You might be interested in the discussion on Stack Overflow:
http://stackoverflow.com/questions/10720132/hadoop-reducer-is-waiting-for-mapper-inputs


Am 23.05.2012 10:47, schrieb Björn-Elmar Macek:
> Ok, i have look at the logs some further and googled every tiny bit of 
> them, hoping to find an answer out there.
> I fear that the following line nails my problem at a big scale:
>
> 12/05/22 01:30:21 INFO mapred.ReduceTask: 
> attempt_local_0001_r_000000_0 Need another 2 map output(s)
> where 0 is already in progress
>
> I found several discussions to problems, that also had this line in 
> their logs. I have checked my code for the following:
>
> * All inputs are collected in the mapper (tho not all would be neccessary)
> * The Comparators run well and return proper values for all inputs
> * The Partitioner always returns proper values
>
> Please, i would really need a hint, to where i have to look.
> Am 22.05.2012 16:57, schrieb Björn-Elmar Macek:
>> Hi Jayaseelan,
>>
>> thanks for the bump! ;)
>>
>> I have continued working on the problem, but with no further success. 
>> I emptied the log directory and started the debugging all over again, 
>> resulting in no new logfiles, so i guess the program did not run into 
>> serious problems. Also all the code other classes, namely ...
>>
>> * Mapper
>> * Partitioner
>> * OutputKeyComparatorClass
>>
>> is executed and can easily be debugged. Stil the Reducer and the 
>> OutputValueGroupingComparator do NOT work. After the execution of the 
>> comparisons made by OutputKeyComparatorClass i get alot of active 
>> processes in my debugging view in eclipse:
>>
>> OpenJDK Client VM[localhost:5002]
>>     Thread [main] (Running)
>>     Thread [Thread-2] (Running)
>>     Daemon Thread [communication thread] (Running)
>>     Thread [MapOutputCopier attempt_local_0001_r_000000_0.0] (Running)
>>     Daemon Thread [Thread for merging in memory files] (Running)
>>     Thread [MapOutputCopier attempt_local_0001_r_000000_0.4] (Running)
>>     Thread [MapOutputCopier attempt_local_0001_r_000000_0.3] (Running)
>>     Thread [MapOutputCopier attempt_local_0001_r_000000_0.1] (Running)
>>     Thread [MapOutputCopier attempt_local_0001_r_000000_0.2] (Running)
>>     Daemon Thread [Thread for merging on-disk files] (Running)
>>     Daemon Thread [Thread for polling Map Completion Events] (Running)
>>
>> And those processes are running, but obviously waiting for something, 
>> since no output is produced. And it is not due to the havy load of 
>> input data, since this is a 10 line csv file, which shouldnt make any 
>> problems.
>>
>> I somehow have the feeling that the framework cannot handle my 
>> classes, but i dont understand why.
>>
>> I would really appreciate a decent hint, how to fix that.
>>
>> Thanks you for your time and help!
>> Björn-Elmar
>> Am 22.05.2012 12:38, schrieb Jayaseelan E:
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Björn-Elmar Macek [mailto:macek@cs.uni-kassel.de]
>>> *Sent:* Tuesday, May 22, 2012 3:12 PM
>>> *To:* hdfs-user@hadoop.apache.org
>>> *Subject:* Hadoop Debugging in LocalMode (Breakpoints not reached)
>>>
>>> Hi there,
>>>
>>>
>>> i am currently trying to get rid of bugs in my Hadoop program by 
>>> debugging it. Everything went fine til some point yesterday. I dont 
>>> know what exactly happened, but my program does not stop at 
>>> breakpoints within the Reducer and also not within the RawComparator 
>>> for the values which i do use for sorting my inputs in the 
>>> ReducerIterator.
>>> (see the classes set for the conf below:)
>>>
>>> conf.setOutputValueGroupingComparator(TwitterValueGroupingComparator.class);
>>> conf.setReducerClass(RetweetReducer.class);
>>>
>>> The log looks like this:
>>>
>>> Warning: $HADOOP_HOME is deprecated.
>>>
>>> Listening for transport dt_socket at address: 5002
>>>
>>> 12/05/21 19:24:20 INFO util.NativeCodeLoader: Loaded the 
>>> native-hadoop library
>>>
>>> 12/05/21 19:24:20 WARN mapred.JobClient: Use GenericOptionsParser 
>>> for parsing the arguments. Applications should implement Tool for 
>>> the same.
>>>
>>> 12/05/21 19:24:20 WARN snappy.LoadSnappy: Snappy native library not 
>>> loaded
>>>
>>> 12/05/21 19:24:20 INFO mapred.FileInputFormat: Total input paths to 
>>> process : 2
>>>
>>> 12/05/21 19:24:20 WARN conf.Configuration: 
>>> file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a 
>>> attempt to override final parameter: fs.default.name;Ignoring.
>>>
>>> 12/05/21 19:24:20 WARN conf.Configuration: 
>>> file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a 
>>> attempt to override final parameter: mapred.job.tracker;Ignoring.
>>>
>>> 12/05/21 19:24:20 INFO mapred.JobClient: Running job: job_local_0001
>>>
>>> 12/05/21 19:24:20 INFO util.ProcessTree: setsid exited with exit code 0
>>>
>>> 12/05/21 19:24:21 INFO mapred.Task:Using ResourceCalculatorPlugin : 
>>> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1c4ff2c
>>>
>>> 12/05/21 19:24:21 INFO mapred.MapTask: numReduceTasks: 1
>>>
>>> 12/05/21 19:24:21 INFO mapred.MapTask: io.sort.mb = 100
>>>
>>> 12/05/21 19:24:22 INFO mapred.JobClient:map 0% reduce 0%
>>>
>>> 12/05/21 19:24:22 INFO mapred.MapTask: data buffer = 79691776/99614720
>>>
>>> 12/05/21 19:24:22 INFO mapred.MapTask: record buffer = 262144/327680
>>>
>>> 12/05/21 19:24:22 INFO mapred.MapTask: Starting flush of map output
>>>
>>> 12/05/21 19:24:22 INFO mapred.MapTask: Finished spill 0
>>>
>>> 12/05/21 19:24:22 INFO mapred.Task: 
>>> Task:attempt_local_0001_m_000000_0 is done. And is in the process of 
>>> commiting
>>>
>>> 12/05/21 19:24:23 INFO mapred.LocalJobRunner: 
>>> file:/home/ema/INPUT-H/tweets_ext:0+968
>>>
>>> 12/05/21 19:24:23 INFO mapred.Task: Task 
>>> 'attempt_local_0001_m_000000_0' done.
>>>
>>> 12/05/21 19:24:23 INFO mapred.Task:Using ResourceCalculatorPlugin : 
>>> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1e8c585
>>>
>>> 12/05/21 19:24:23 INFO mapred.MapTask: numReduceTasks: 1
>>>
>>> 12/05/21 19:24:23 INFO mapred.MapTask: io.sort.mb = 100
>>>
>>> 12/05/21 19:24:24 INFO mapred.MapTask: data buffer = 79691776/99614720
>>>
>>> 12/05/21 19:24:24 INFO mapred.MapTask: record buffer = 262144/327680
>>>
>>> 12/05/21 19:24:24 INFO mapred.MapTask: Starting flush of map output
>>>
>>> 12/05/21 19:24:24 INFO mapred.Task: 
>>> Task:attempt_local_0001_m_000001_0 is done. And is in the process of 
>>> commiting
>>>
>>> 12/05/21 19:24:24 INFO mapred.JobClient:map 100% reduce 0%
>>>
>>> 12/05/21 19:24:26 INFO mapred.LocalJobRunner: 
>>> file:/home/ema/INPUT-H/tweets~:0+0
>>>
>>> 12/05/21 19:24:26 INFO mapred.Task: Task 
>>> 'attempt_local_0001_m_000001_0' done.
>>>
>>> 12/05/21 19:24:26 INFO mapred.Task:Using ResourceCalculatorPlugin : 
>>> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@191e4c
>>>
>>> 12/05/21 19:24:26 INFO mapred.ReduceTask: ShuffleRamManager: 
>>> MemoryLimit=709551680, MaxSingleShuffleLimit=177387920
>>>
>>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>>> attempt_local_0001_r_000000_0 Need another 2 map output(s) where 0 
>>> is already in progress
>>>
>>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>>> attempt_local_0001_r_000000_0 Thread started: Thread for merging 
>>> on-disk files
>>>
>>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>>> attempt_local_0001_r_000000_0 Thread waiting: Thread for merging 
>>> on-disk files
>>>
>>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>>> attempt_local_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 
>>> dup hosts)
>>>
>>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>>> attempt_local_0001_r_000000_0 Thread started: Thread for merging in 
>>> memory files
>>>
>>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>>> attempt_local_0001_r_000000_0 Thread started: Thread for polling Map 
>>> Completion Events
>>>
>>> 12/05/21 19:24:32 INFO mapred.LocalJobRunner: reduce > copy >
>>>
>>> 12/05/21 19:24:35 INFO mapred.LocalJobRunner: reduce > copy >
>>>
>>> 12/05/21 19:24:42 INFO mapred.LocalJobRunner: reduce > copy >
>>>
>>> 12/05/21 19:24:48 INFO mapred.LocalJobRunner: reduce > copy >
>>>
>>> 12/05/21 19:24:51 INFO mapred.LocalJobRunner: reduce > copy >
>>>
>>> 12/05/21 19:24:57 INFO mapred.LocalJobRunner: reduce > copy >
>>>
>>> ... etc ...
>>>
>>> Is there something i have missed?
>>>
>>> Thanks for your help in advance!
>>>
>>> Best regards,
>>> Björn-Elmar
>>>
>>>
>>
>


Re: Hadoop Debugging in LocalMode (Breakpoints not reached)

Posted by Björn-Elmar Macek <ma...@cs.uni-kassel.de>.
OK, I have looked at the logs some more and googled every tiny bit of 
them, hoping to find an answer out there.
I fear that the following line pinpoints my problem:

12/05/22 01:30:21 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 
Need another 2 map output(s) where 0 is already in progress

I found several discussions of problems that also had this line in 
their logs. I have checked my code for the following:

* All inputs are collected in the mapper (though not all would be necessary)
* The Comparators run correctly and return proper values for all inputs
* The Partitioner always returns proper values (see the sketch right after this list)
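
For reference, this is roughly what the Partitioner contract looks like in the old mapred API. It is only a minimal sketch: the class name and the Text key/value types are made-up placeholders, not the classes used in this job.

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Hypothetical partitioner: the class name and the Text key/value types are
// placeholders, not the classes used in the job discussed here.
public class ExamplePartitioner implements Partitioner<Text, Text> {

    public void configure(JobConf job) {
        // nothing to configure in this sketch
    }

    public int getPartition(Text key, Text value, int numPartitions) {
        // Contract: always return a value in [0, numPartitions), and return the
        // same partition for equal keys. Masking the sign bit avoids a negative
        // result for keys whose hashCode() is negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}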

Please, I would really appreciate a hint as to where I should look.
Am 22.05.2012 16:57, schrieb Björn-Elmar Macek:
> Hi Jayaseelan,
>
> thanks for the bump! ;)
>
> I have continued working on the problem, but with no further success. 
> I emptied the log directory and started the debugging all over again, 
> resulting in no new logfiles, so i guess the program did not run into 
> serious problems. Also all the code other classes, namely ...
>
> * Mapper
> * Partitioner
> * OutputKeyComparatorClass
>
> is executed and can easily be debugged. Stil the Reducer and the 
> OutputValueGroupingComparator do NOT work. After the execution of the 
> comparisons made by OutputKeyComparatorClass i get alot of active 
> processes in my debugging view in eclipse:
>
> OpenJDK Client VM[localhost:5002]
>     Thread [main] (Running)
>     Thread [Thread-2] (Running)
>     Daemon Thread [communication thread] (Running)
>     Thread [MapOutputCopier attempt_local_0001_r_000000_0.0] (Running)
>     Daemon Thread [Thread for merging in memory files] (Running)
>     Thread [MapOutputCopier attempt_local_0001_r_000000_0.4] (Running)
>     Thread [MapOutputCopier attempt_local_0001_r_000000_0.3] (Running)
>     Thread [MapOutputCopier attempt_local_0001_r_000000_0.1] (Running)
>     Thread [MapOutputCopier attempt_local_0001_r_000000_0.2] (Running)
>     Daemon Thread [Thread for merging on-disk files] (Running)
>     Daemon Thread [Thread for polling Map Completion Events] (Running)
>
> And those processes are running, but obviously waiting for something, 
> since no output is produced. And it is not due to the havy load of 
> input data, since this is a 10 line csv file, which shouldnt make any 
> problems.
>
> I somehow have the feeling that the framework cannot handle my 
> classes, but i dont understand why.
>
> I would really appreciate a decent hint, how to fix that.
>
> Thanks you for your time and help!
> Björn-Elmar
> Am 22.05.2012 12:38, schrieb Jayaseelan E:
>>
>> ------------------------------------------------------------------------
>> *From:* Björn-Elmar Macek [mailto:macek@cs.uni-kassel.de]
>> *Sent:* Tuesday, May 22, 2012 3:12 PM
>> *To:* hdfs-user@hadoop.apache.org
>> *Subject:* Hadoop Debugging in LocalMode (Breakpoints not reached)
>>
>> Hi there,
>>
>>
>> i am currently trying to get rid of bugs in my Hadoop program by 
>> debugging it. Everything went fine til some point yesterday. I dont 
>> know what exactly happened, but my program does not stop at 
>> breakpoints within the Reducer and also not within the RawComparator 
>> for the values which i do use for sorting my inputs in the 
>> ReducerIterator.
>> (see the classes set for the conf below:)
>>
>> conf.setOutputValueGroupingComparator(TwitterValueGroupingComparator.class);
>> conf.setReducerClass(RetweetReducer.class);
>>
>> The log looks like this:
>>
>> Warning: $HADOOP_HOME is deprecated.
>>
>> Listening for transport dt_socket at address: 5002
>>
>> 12/05/21 19:24:20 INFO util.NativeCodeLoader: Loaded the 
>> native-hadoop library
>>
>> 12/05/21 19:24:20 WARN mapred.JobClient: Use GenericOptionsParser for 
>> parsing the arguments. Applications should implement Tool for the same.
>>
>> 12/05/21 19:24:20 WARN snappy.LoadSnappy: Snappy native library not 
>> loaded
>>
>> 12/05/21 19:24:20 INFO mapred.FileInputFormat: Total input paths to 
>> process : 2
>>
>> 12/05/21 19:24:20 WARN conf.Configuration: 
>> file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a 
>> attempt to override final parameter: fs.default.name;Ignoring.
>>
>> 12/05/21 19:24:20 WARN conf.Configuration: 
>> file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a 
>> attempt to override final parameter: mapred.job.tracker;Ignoring.
>>
>> 12/05/21 19:24:20 INFO mapred.JobClient: Running job: job_local_0001
>>
>> 12/05/21 19:24:20 INFO util.ProcessTree: setsid exited with exit code 0
>>
>> 12/05/21 19:24:21 INFO mapred.Task:Using ResourceCalculatorPlugin : 
>> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1c4ff2c
>>
>> 12/05/21 19:24:21 INFO mapred.MapTask: numReduceTasks: 1
>>
>> 12/05/21 19:24:21 INFO mapred.MapTask: io.sort.mb = 100
>>
>> 12/05/21 19:24:22 INFO mapred.JobClient:map 0% reduce 0%
>>
>> 12/05/21 19:24:22 INFO mapred.MapTask: data buffer = 79691776/99614720
>>
>> 12/05/21 19:24:22 INFO mapred.MapTask: record buffer = 262144/327680
>>
>> 12/05/21 19:24:22 INFO mapred.MapTask: Starting flush of map output
>>
>> 12/05/21 19:24:22 INFO mapred.MapTask: Finished spill 0
>>
>> 12/05/21 19:24:22 INFO mapred.Task: 
>> Task:attempt_local_0001_m_000000_0 is done. And is in the process of 
>> commiting
>>
>> 12/05/21 19:24:23 INFO mapred.LocalJobRunner: 
>> file:/home/ema/INPUT-H/tweets_ext:0+968
>>
>> 12/05/21 19:24:23 INFO mapred.Task: Task 
>> 'attempt_local_0001_m_000000_0' done.
>>
>> 12/05/21 19:24:23 INFO mapred.Task:Using ResourceCalculatorPlugin : 
>> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1e8c585
>>
>> 12/05/21 19:24:23 INFO mapred.MapTask: numReduceTasks: 1
>>
>> 12/05/21 19:24:23 INFO mapred.MapTask: io.sort.mb = 100
>>
>> 12/05/21 19:24:24 INFO mapred.MapTask: data buffer = 79691776/99614720
>>
>> 12/05/21 19:24:24 INFO mapred.MapTask: record buffer = 262144/327680
>>
>> 12/05/21 19:24:24 INFO mapred.MapTask: Starting flush of map output
>>
>> 12/05/21 19:24:24 INFO mapred.Task: 
>> Task:attempt_local_0001_m_000001_0 is done. And is in the process of 
>> commiting
>>
>> 12/05/21 19:24:24 INFO mapred.JobClient:map 100% reduce 0%
>>
>> 12/05/21 19:24:26 INFO mapred.LocalJobRunner: 
>> file:/home/ema/INPUT-H/tweets~:0+0
>>
>> 12/05/21 19:24:26 INFO mapred.Task: Task 
>> 'attempt_local_0001_m_000001_0' done.
>>
>> 12/05/21 19:24:26 INFO mapred.Task:Using ResourceCalculatorPlugin : 
>> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@191e4c
>>
>> 12/05/21 19:24:26 INFO mapred.ReduceTask: ShuffleRamManager: 
>> MemoryLimit=709551680, MaxSingleShuffleLimit=177387920
>>
>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>> attempt_local_0001_r_000000_0 Need another 2 map output(s) where 0 is 
>> already in progress
>>
>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>> attempt_local_0001_r_000000_0 Thread started: Thread for merging 
>> on-disk files
>>
>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>> attempt_local_0001_r_000000_0 Thread waiting: Thread for merging 
>> on-disk files
>>
>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>> attempt_local_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 
>> dup hosts)
>>
>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>> attempt_local_0001_r_000000_0 Thread started: Thread for merging in 
>> memory files
>>
>> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
>> attempt_local_0001_r_000000_0 Thread started: Thread for polling Map 
>> Completion Events
>>
>> 12/05/21 19:24:32 INFO mapred.LocalJobRunner: reduce > copy >
>>
>> 12/05/21 19:24:35 INFO mapred.LocalJobRunner: reduce > copy >
>>
>> 12/05/21 19:24:42 INFO mapred.LocalJobRunner: reduce > copy >
>>
>> 12/05/21 19:24:48 INFO mapred.LocalJobRunner: reduce > copy >
>>
>> 12/05/21 19:24:51 INFO mapred.LocalJobRunner: reduce > copy >
>>
>> 12/05/21 19:24:57 INFO mapred.LocalJobRunner: reduce > copy >
>>
>> ... etc ...
>>
>> Is there something i have missed?
>>
>> Thanks for your help in advance!
>>
>> Best regards,
>> Björn-Elmar
>>
>>
>


Re: Hadoop Debugging in LocalMode (Breakpoints not reached)

Posted by Björn-Elmar Macek <ma...@cs.uni-kassel.de>.
Hi Jayaseelan,

thanks for the bump! ;)

I have continued working on the problem, but with no further success. I 
emptied the log directory and started the debugging all over again, 
which produced no new logfiles, so I guess the program did not run into 
serious problems. Also, the code in the other classes, namely ...

* Mapper
* Partitioner
* OutputKeyComparatorClass

is executed and can easily be debugged. Still, the Reducer and the 
OutputValueGroupingComparator do NOT work. After the execution of the 
comparisons made by the OutputKeyComparatorClass, I get a lot of active 
threads in my debugging view in Eclipse:

OpenJDK Client VM[localhost:5002]
     Thread [main] (Running)
     Thread [Thread-2] (Running)
     Daemon Thread [communication thread] (Running)
     Thread [MapOutputCopier attempt_local_0001_r_000000_0.0] (Running)
     Daemon Thread [Thread for merging in memory files] (Running)
     Thread [MapOutputCopier attempt_local_0001_r_000000_0.4] (Running)
     Thread [MapOutputCopier attempt_local_0001_r_000000_0.3] (Running)
     Thread [MapOutputCopier attempt_local_0001_r_000000_0.1] (Running)
     Thread [MapOutputCopier attempt_local_0001_r_000000_0.2] (Running)
     Daemon Thread [Thread for merging on-disk files] (Running)
     Daemon Thread [Thread for polling Map Completion Events] (Running)

Those threads are running but are obviously waiting for something, 
since no output is produced. It is not due to a heavy load of input 
data either, since the input is a 10-line CSV file, which shouldn't cause any problems.
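
As a purely hedged side note (not something established in this thread): for breakpoints in the Reducer and the comparators to be hit, the whole job has to run inside the single JVM the debugger is attached to, i.e. under the LocalJobRunner that the log shows. A minimal sketch of pinning a JobConf to local mode, using the Hadoop 1.x property names:

import org.apache.hadoop.mapred.JobConf;

public class LocalModeConfSketch {

    // Hedged sketch: these Hadoop 1.x properties keep the whole job (map, sort,
    // and reduce) inside the client JVM, which is the JVM the debugger attaches to.
    public static JobConf localJobConf() {
        JobConf conf = new JobConf();
        conf.set("mapred.job.tracker", "local"); // use the LocalJobRunner
        conf.set("fs.default.name", "file:///"); // read input from the local filesystem
        return conf;
    }
}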

I somehow have the feeling that the framework cannot handle my classes, 
but I don't understand why.

I would really appreciate a hint on how to fix this.

Thank you for your time and help!
Björn-Elmar
Am 22.05.2012 12:38, schrieb Jayaseelan E:
>
> ------------------------------------------------------------------------
> *From:* Björn-Elmar Macek [mailto:macek@cs.uni-kassel.de]
> *Sent:* Tuesday, May 22, 2012 3:12 PM
> *To:* hdfs-user@hadoop.apache.org
> *Subject:* Hadoop Debugging in LocalMode (Breakpoints not reached)
>
> Hi there,
>
>
> i am currently trying to get rid of bugs in my Hadoop program by 
> debugging it. Everything went fine til some point yesterday. I dont 
> know what exactly happened, but my program does not stop at 
> breakpoints within the Reducer and also not within the RawComparator 
> for the values which i do use for sorting my inputs in the 
> ReducerIterator.
> (see the classes set for the conf below:)
>
> conf.setOutputValueGroupingComparator(TwitterValueGroupingComparator.class);
> conf.setReducerClass(RetweetReducer.class);
>
> The log looks like this:
>
> Warning: $HADOOP_HOME is deprecated.
>
> Listening for transport dt_socket at address: 5002
>
> 12/05/21 19:24:20 INFO util.NativeCodeLoader: Loaded the native-hadoop 
> library
>
> 12/05/21 19:24:20 WARN mapred.JobClient: Use GenericOptionsParser for 
> parsing the arguments. Applications should implement Tool for the same.
>
> 12/05/21 19:24:20 WARN snappy.LoadSnappy: Snappy native library not loaded
>
> 12/05/21 19:24:20 INFO mapred.FileInputFormat: Total input paths to 
> process : 2
>
> 12/05/21 19:24:20 WARN conf.Configuration: 
> file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a 
> attempt to override final parameter: fs.default.name;Ignoring.
>
> 12/05/21 19:24:20 WARN conf.Configuration: 
> file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a 
> attempt to override final parameter: mapred.job.tracker;Ignoring.
>
> 12/05/21 19:24:20 INFO mapred.JobClient: Running job: job_local_0001
>
> 12/05/21 19:24:20 INFO util.ProcessTree: setsid exited with exit code 0
>
> 12/05/21 19:24:21 INFO mapred.Task:Using ResourceCalculatorPlugin : 
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1c4ff2c
>
> 12/05/21 19:24:21 INFO mapred.MapTask: numReduceTasks: 1
>
> 12/05/21 19:24:21 INFO mapred.MapTask: io.sort.mb = 100
>
> 12/05/21 19:24:22 INFO mapred.JobClient:map 0% reduce 0%
>
> 12/05/21 19:24:22 INFO mapred.MapTask: data buffer = 79691776/99614720
>
> 12/05/21 19:24:22 INFO mapred.MapTask: record buffer = 262144/327680
>
> 12/05/21 19:24:22 INFO mapred.MapTask: Starting flush of map output
>
> 12/05/21 19:24:22 INFO mapred.MapTask: Finished spill 0
>
> 12/05/21 19:24:22 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 
> is done. And is in the process of commiting
>
> 12/05/21 19:24:23 INFO mapred.LocalJobRunner: 
> file:/home/ema/INPUT-H/tweets_ext:0+968
>
> 12/05/21 19:24:23 INFO mapred.Task: Task 
> 'attempt_local_0001_m_000000_0' done.
>
> 12/05/21 19:24:23 INFO mapred.Task:Using ResourceCalculatorPlugin : 
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1e8c585
>
> 12/05/21 19:24:23 INFO mapred.MapTask: numReduceTasks: 1
>
> 12/05/21 19:24:23 INFO mapred.MapTask: io.sort.mb = 100
>
> 12/05/21 19:24:24 INFO mapred.MapTask: data buffer = 79691776/99614720
>
> 12/05/21 19:24:24 INFO mapred.MapTask: record buffer = 262144/327680
>
> 12/05/21 19:24:24 INFO mapred.MapTask: Starting flush of map output
>
> 12/05/21 19:24:24 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 
> is done. And is in the process of commiting
>
> 12/05/21 19:24:24 INFO mapred.JobClient:map 100% reduce 0%
>
> 12/05/21 19:24:26 INFO mapred.LocalJobRunner: 
> file:/home/ema/INPUT-H/tweets~:0+0
>
> 12/05/21 19:24:26 INFO mapred.Task: Task 
> 'attempt_local_0001_m_000001_0' done.
>
> 12/05/21 19:24:26 INFO mapred.Task:Using ResourceCalculatorPlugin : 
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@191e4c
>
> 12/05/21 19:24:26 INFO mapred.ReduceTask: ShuffleRamManager: 
> MemoryLimit=709551680, MaxSingleShuffleLimit=177387920
>
> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
> attempt_local_0001_r_000000_0 Need another 2 map output(s) where 0 is 
> already in progress
>
> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
> attempt_local_0001_r_000000_0 Thread started: Thread for merging 
> on-disk files
>
> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
> attempt_local_0001_r_000000_0 Thread waiting: Thread for merging 
> on-disk files
>
> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
> attempt_local_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 
> dup hosts)
>
> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
> attempt_local_0001_r_000000_0 Thread started: Thread for merging in 
> memory files
>
> 12/05/21 19:24:27 INFO mapred.ReduceTask: 
> attempt_local_0001_r_000000_0 Thread started: Thread for polling Map 
> Completion Events
>
> 12/05/21 19:24:32 INFO mapred.LocalJobRunner: reduce > copy >
>
> 12/05/21 19:24:35 INFO mapred.LocalJobRunner: reduce > copy >
>
> 12/05/21 19:24:42 INFO mapred.LocalJobRunner: reduce > copy >
>
> 12/05/21 19:24:48 INFO mapred.LocalJobRunner: reduce > copy >
>
> 12/05/21 19:24:51 INFO mapred.LocalJobRunner: reduce > copy >
>
> 12/05/21 19:24:57 INFO mapred.LocalJobRunner: reduce > copy >
>
> ... etc ...
>
> Is there something i have missed?
>
> Thanks for your help in advance!
>
> Best regards,
> Björn-Elmar
>
>


RE: Hadoop Debugging in LocalMode (Breakpoints not reached)

Posted by Jayaseelan E <ja...@ericsson.com>.

________________________________
From: Björn-Elmar Macek [mailto:macek@cs.uni-kassel.de]
Sent: Tuesday, May 22, 2012 3:12 PM
To: hdfs-user@hadoop.apache.org
Subject: Hadoop Debugging in LocalMode (Breakpoints not reached)

Hi there,


I am currently trying to get rid of bugs in my Hadoop program by debugging it. Everything went fine until some point yesterday. I don't know what exactly happened, but my program no longer stops at breakpoints within the Reducer, nor within the RawComparator for the values, which I use for sorting my inputs in the ReducerIterator.
(see the classes set for the conf below:)

conf.setOutputValueGroupingComparator(TwitterValueGroupingComparator.class);
conf.setReducerClass(RetweetReducer.class);

The log looks like this:

Warning: $HADOOP_HOME is deprecated.
Listening for transport dt_socket at address: 5002
12/05/21 19:24:20 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/05/21 19:24:20 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/05/21 19:24:20 WARN snappy.LoadSnappy: Snappy native library not loaded
12/05/21 19:24:20 INFO mapred.FileInputFormat: Total input paths to process : 2
12/05/21 19:24:20 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: fs.default.name;  Ignoring.
12/05/21 19:24:20 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
12/05/21 19:24:20 INFO mapred.JobClient: Running job: job_local_0001
12/05/21 19:24:20 INFO util.ProcessTree: setsid exited with exit code 0
12/05/21 19:24:21 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1c4ff2c
12/05/21 19:24:21 INFO mapred.MapTask: numReduceTasks: 1
12/05/21 19:24:21 INFO mapred.MapTask: io.sort.mb = 100
12/05/21 19:24:22 INFO mapred.JobClient:  map 0% reduce 0%
12/05/21 19:24:22 INFO mapred.MapTask: data buffer = 79691776/99614720
12/05/21 19:24:22 INFO mapred.MapTask: record buffer = 262144/327680
12/05/21 19:24:22 INFO mapred.MapTask: Starting flush of map output
12/05/21 19:24:22 INFO mapred.MapTask: Finished spill 0
12/05/21 19:24:22 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/05/21 19:24:23 INFO mapred.LocalJobRunner: file:/home/ema/INPUT-H/tweets_ext:0+968
12/05/21 19:24:23 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/05/21 19:24:23 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1e8c585
12/05/21 19:24:23 INFO mapred.MapTask: numReduceTasks: 1
12/05/21 19:24:23 INFO mapred.MapTask: io.sort.mb = 100
12/05/21 19:24:24 INFO mapred.MapTask: data buffer = 79691776/99614720
12/05/21 19:24:24 INFO mapred.MapTask: record buffer = 262144/327680
12/05/21 19:24:24 INFO mapred.MapTask: Starting flush of map output
12/05/21 19:24:24 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
12/05/21 19:24:24 INFO mapred.JobClient:  map 100% reduce 0%
12/05/21 19:24:26 INFO mapred.LocalJobRunner: file:/home/ema/INPUT-H/tweets~:0+0
12/05/21 19:24:26 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
12/05/21 19:24:26 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@191e4c
12/05/21 19:24:26 INFO mapred.ReduceTask: ShuffleRamManager: MemoryLimit=709551680, MaxSingleShuffleLimit=177387920
12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Need another 2 map output(s) where 0 is already in progress
12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread started: Thread for merging on-disk files
12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread waiting: Thread for merging on-disk files
12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread started: Thread for merging in memory files
12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread started: Thread for polling Map Completion Events
12/05/21 19:24:32 INFO mapred.LocalJobRunner: reduce > copy >
12/05/21 19:24:35 INFO mapred.LocalJobRunner: reduce > copy >
12/05/21 19:24:42 INFO mapred.LocalJobRunner: reduce > copy >
12/05/21 19:24:48 INFO mapred.LocalJobRunner: reduce > copy >
12/05/21 19:24:51 INFO mapred.LocalJobRunner: reduce > copy >
12/05/21 19:24:57 INFO mapred.LocalJobRunner: reduce > copy >
... etc ...

Is there something I have missed?

Thanks for your help in advance!

Best regards,
Björn-Elmar



Hadoop Debugging in LocalMode (Breakpoints not reached)

Posted by Björn-Elmar Macek <ma...@cs.uni-kassel.de>.
Hi there,


I am currently trying to get rid of bugs in my Hadoop program by 
debugging it. Everything went fine until some point yesterday. I don't know 
what exactly happened, but my program no longer stops at breakpoints 
within the Reducer, nor within the RawComparator for the values, 
which I use for sorting my inputs in the ReducerIterator.
(See the classes set on the conf below:)

conf.setOutputValueGroupingComparator(TwitterValueGroupingComparator.class);
conf.setReducerClass(RetweetReducer.class);
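
For readers following along, a value-grouping comparator registered this way is typically a WritableComparator subclass along the lines of the sketch below. This is only a hedged illustration: the class name and the Text key type are assumptions, not the actual TwitterValueGroupingComparator.

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Hypothetical grouping comparator: the real TwitterValueGroupingComparator is
// not shown in this thread, and the Text key type is an assumption.
public class ExampleGroupingComparator extends WritableComparator {

    public ExampleGroupingComparator() {
        // 'true' tells WritableComparator to create key instances for deserialization
        super(Text.class, true);
    }

    public int compare(WritableComparable a, WritableComparable b) {
        // Keys that compare as equal here are grouped into one reduce() call.
        // The comparator must define a consistent total order, otherwise the
        // sort/merge phase can behave unexpectedly.
        return ((Text) a).compareTo((Text) b);
    }
}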

The log looks like this:

Warning: $HADOOP_HOME is deprecated.

Listening for transport dt_socket at address: 5002

12/05/21 19:24:20 INFO util.NativeCodeLoader: Loaded the native-hadoop 
library

12/05/21 19:24:20 WARN mapred.JobClient: Use GenericOptionsParser for 
parsing the arguments. Applications should implement Tool for the same.

12/05/21 19:24:20 WARN snappy.LoadSnappy: Snappy native library not loaded

12/05/21 19:24:20 INFO mapred.FileInputFormat: Total input paths to 
process : 2

12/05/21 19:24:20 WARN conf.Configuration: 
file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a 
attempt to override final parameter: fs.default.name;Ignoring.

12/05/21 19:24:20 WARN conf.Configuration: 
file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a 
attempt to override final parameter: mapred.job.tracker;Ignoring.

12/05/21 19:24:20 INFO mapred.JobClient: Running job: job_local_0001

12/05/21 19:24:20 INFO util.ProcessTree: setsid exited with exit code 0

12/05/21 19:24:21 INFO mapred.Task:Using ResourceCalculatorPlugin : 
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1c4ff2c

12/05/21 19:24:21 INFO mapred.MapTask: numReduceTasks: 1

12/05/21 19:24:21 INFO mapred.MapTask: io.sort.mb = 100

12/05/21 19:24:22 INFO mapred.JobClient:map 0% reduce 0%

12/05/21 19:24:22 INFO mapred.MapTask: data buffer = 79691776/99614720

12/05/21 19:24:22 INFO mapred.MapTask: record buffer = 262144/327680

12/05/21 19:24:22 INFO mapred.MapTask: Starting flush of map output

12/05/21 19:24:22 INFO mapred.MapTask: Finished spill 0

12/05/21 19:24:22 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 
is done. And is in the process of commiting

12/05/21 19:24:23 INFO mapred.LocalJobRunner: 
file:/home/ema/INPUT-H/tweets_ext:0+968

12/05/21 19:24:23 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' 
done.

12/05/21 19:24:23 INFO mapred.Task:Using ResourceCalculatorPlugin : 
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1e8c585

12/05/21 19:24:23 INFO mapred.MapTask: numReduceTasks: 1

12/05/21 19:24:23 INFO mapred.MapTask: io.sort.mb = 100

12/05/21 19:24:24 INFO mapred.MapTask: data buffer = 79691776/99614720

12/05/21 19:24:24 INFO mapred.MapTask: record buffer = 262144/327680

12/05/21 19:24:24 INFO mapred.MapTask: Starting flush of map output

12/05/21 19:24:24 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 
is done. And is in the process of commiting

12/05/21 19:24:24 INFO mapred.JobClient:map 100% reduce 0%

12/05/21 19:24:26 INFO mapred.LocalJobRunner: 
file:/home/ema/INPUT-H/tweets~:0+0

12/05/21 19:24:26 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' 
done.

12/05/21 19:24:26 INFO mapred.Task:Using ResourceCalculatorPlugin : 
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@191e4c

12/05/21 19:24:26 INFO mapred.ReduceTask: ShuffleRamManager: 
MemoryLimit=709551680, MaxSingleShuffleLimit=177387920

12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 
Need another 2 map output(s) where 0 is already in progress

12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 
Thread started: Thread for merging on-disk files

12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 
Thread waiting: Thread for merging on-disk files

12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 
Scheduled 0 outputs (0 slow hosts and0 dup hosts)

12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 
Thread started: Thread for merging in memory files

12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 
Thread started: Thread for polling Map Completion Events

12/05/21 19:24:32 INFO mapred.LocalJobRunner: reduce > copy >

12/05/21 19:24:35 INFO mapred.LocalJobRunner: reduce > copy >

12/05/21 19:24:42 INFO mapred.LocalJobRunner: reduce > copy >

12/05/21 19:24:48 INFO mapred.LocalJobRunner: reduce > copy >

12/05/21 19:24:51 INFO mapred.LocalJobRunner: reduce > copy >

12/05/21 19:24:57 INFO mapred.LocalJobRunner: reduce > copy >

... etc ...

Is there something I have missed?

Thanks for your help in advance!

Best regards,
Björn-Elmar



Re: Hadoop HA

Posted by Ted Dunning <td...@maprtech.com>.
No. 2.0.0 will not have the same level of HA as MapR. Specifically, the JobTracker hasn't been addressed, and the NameNode issues have only been partially addressed.

On May 22, 2012, at 8:08 AM, Martinus Martinus <ma...@gmail.com> wrote:

> Hi Todd,
> 
> Thanks for your answer. Is that will have the same capability as the commercial M5 of MapR : http://www.mapr.com/products/why-mapr ?
> 
> Thanks.
> 
> On Tue, May 22, 2012 at 2:26 PM, Todd Lipcon <to...@cloudera.com> wrote:
> Hi Martinus,
> 
> Hadoop HA is available in Hadoop 2.0.0. This release is currently
> being voted on in the community.
> 
> You can read more here:
> http://www.cloudera.com/blog/2012/03/high-availability-for-the-hadoop-distributed-file-system-hdfs/
> 
> -Todd
> 
> On Mon, May 21, 2012 at 11:24 PM, Martinus Martinus
> <ma...@gmail.com> wrote:
> > Hi,
> >
> > Is there any hadoop HA distribution out there?
> >
> > Thanks.
> 
> 
> 
> --
> Todd Lipcon
> Software Engineer, Cloudera
> 

Re: Hadoop HA

Posted by Todd Lipcon <to...@cloudera.com>.
On Tue, May 22, 2012 at 12:08 AM, Martinus Martinus
<ma...@gmail.com> wrote:
> Hi Todd,
>
> Thanks for your answer. Is that will have the same capability as the
> commercial M5 of MapR : http://www.mapr.com/products/why-mapr ?

I can't speak to a closed-source product's feature set. But the 2.0.0
release has failover support between an active and a passive NameNode,
and an upcoming release will include automatic failover using Apache
ZooKeeper for failure detection and coordination. These features have been
tested significantly under HBase workloads and, based on our testing
results, should fail over quickly and seamlessly.

Furthermore, they are Apache 2 licensed open source, free of vendor lock-in.
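
For readers who want to try the 2.0.0 HA feature, a hedged sketch of the kind of client-side settings it introduces follows. These normally live in hdfs-site.xml; the nameservice id and host names below are made-up placeholders, not values from this thread.

import org.apache.hadoop.conf.Configuration;

public class HdfsHaConfSketch {

    // Hedged sketch of the HDFS HA (2.0.0) client-side settings, normally placed
    // in hdfs-site.xml / core-site.xml. The nameservice id "mycluster" and the
    // host names are made-up placeholders.
    public static Configuration haClientConf() {
        Configuration conf = new Configuration();
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "nn1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "nn2.example.com:8020");
        // The client fails over between the active and standby NameNode via this proxy provider.
        conf.set("dfs.client.failover.proxy.provider.mycluster",
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        conf.set("fs.defaultFS", "hdfs://mycluster");
        return conf;
    }
}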

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Hadoop HA

Posted by Konstantin Boudnik <co...@apache.org>.
Makes it two, countless enough. It's not that I disagree that the gasket
between the keyboard and the chair (aka user) is a typical source of most of
the troubles ;)

Cos

On Sat, May 26, 2012 at 08:44AM, M. C. Srivas wrote:
> On Fri, May 25, 2012 at 8:03 AM, Konstantin Boudnik <co...@apache.org> wrote:
> 
> > BTW, Srivas,
> >
> > I could find a single countless example of horror story of 'hadoop fs
> > -rmr' in
> > a form of hypothetical question (and not on this list ;)
> > http://is.gd/55KD1E
> >
> >
> Hi Cos,  accidentally deleting files is one of the most common user errors.
> Here's a real one from just last month
> 
> http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201204.mbox/%3CCAPwpkBvEx4OTUbf6mf8t43oOjZM%2BExUths7XNn3UidqsN3Y8hA%40mail.gmail.com%3E
> 
> 
> As Patrick says in the follow-up, the only way to recover in this situation
> is to shutdown the cluster:
> 
> http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201204.mbox/%3CCANS822ga1ivAPi2C9PJsyz6nZgft4msKkH%3Dyj06-i_V%2Bu1B1AA%40mail.gmail.com%3E
> 
> 
> 
> In fact, the above procedure is well-known and well-documented.  Here's
> even an excerpt from Jason's book ProHadoop where he says "it is not
> uncommon for a user to accidentally delete large portions of the HDFS file
> system due to a program error or a command-line error ... best bet is to
> terminate the NN and 2-N immediately, and then shutdown the DNs as fast as
> possible"
> 
> http://books.google.com/books?id=8DV-EzeKigQC&pg=PA122&lpg=PA122&dq=how+to+recover+deleted+files+%2B+hadoop&source=bl&ots=prgSMk1SHL&sig=LPJ0j5MFwJ3zUAcOrvR6FbiWQuQ&hl=en&sa=X&ei=UfXAT76HJuabiALbkdn8Bw&ved=0CLQBEOgBMAQ#v=onepage&q=how%20to%20recover%20deleted%20files%20%2B%20hadoop&f=false
> 
> 
> 
> Just for the sake of full disclosure, of course.
> 
> >
> > Enjoy,
> >  Cos
> >
> > On Tue, May 22, 2012 at 09:45PM, M. C. Srivas wrote:
> > > On Tue, May 22, 2012 at 12:08 AM, Martinus Martinus
> > > <ma...@gmail.com>wrote:
> > >
> > > > Hi Todd,
> > > >
> > > > Thanks for your answer. Is that will have the same capability as the
> > > > commercial M5 of MapR : http://www.mapr.com/products/why-mapr ?
> > > >
> > > > Thanks.
> > >
> > >
> > > Hi Martinus,   some major differences in HA between MapR's M5 and Apache
> > > Hadoop
> > >
> > > 1. with M5, any node become master at any time. It is a fully
> > active-active
> > > system. You can get create a fully bomb-proof cluster, such that in a
> > > 20-node cluster, you can configure to survive even if 19 of the 20 nodes
> > > are lost. With Apache, it is a 1-1 active-passive system.
> > >
> > > 2. M5 does not require a NFS filer in the backend. Apache Hadoop
> > requires a
> > > Netapp or similar NFS filer to assist in saving the NN data, even in its
> > HA
> > > configuration.  Note that for true HA, the Netapp or similar also will
> > need
> > > to be HA.
> > >
> > > 3. M5 has full HA for the Job-Tracker as well.
> > >
> > > Of course, HA is only a small part of the total business continuity
> > story.
> > >  Full recovery in the face of any kind of failures is critical:
> > >
> > > With M5:
> > >
> > > -  If there is a complete cluster crash and reboot (eg, a full
> > > power-failure of the entire cluster), M5 will recover in 5-10 minutes,
> > and
> > > submitted jobs will resume from where they were.
> > >
> > > - with snapshots, if you upgrade your software and it corrupts data, M5
> > > provides snapshots to help you recover. The number of times I've seen
> > > someone running  "hadoop fs -rmr /" accidentally and asking for help on
> > > this mailing list is beyond counting. With M5, it is completely
> > recoverable
> > >
> > > - full disaster-recovery across clusters by mirroring.
> > >
> > > Hope that clarifies some of the differences.
> > >
> > >
> > > >
> > > >
> > > > On Tue, May 22, 2012 at 2:26 PM, Todd Lipcon <to...@cloudera.com>
> > wrote:
> > > >
> > > >> Hi Martinus,
> > > >>
> > > >> Hadoop HA is available in Hadoop 2.0.0. This release is currently
> > > >> being voted on in the community.
> > > >>
> > > >> You can read more here:
> > > >>
> > > >>
> > http://www.cloudera.com/blog/2012/03/high-availability-for-the-hadoop-distributed-file-system-hdfs/
> > > >>
> > > >> -Todd
> > > >>
> > > >> On Mon, May 21, 2012 at 11:24 PM, Martinus Martinus
> > > >> <ma...@gmail.com> wrote:
> > > >> > Hi,
> > > >> >
> > > >> > Is there any hadoop HA distribution out there?
> > > >> >
> > > >> > Thanks.
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Todd Lipcon
> > > >> Software Engineer, Cloudera
> > > >>
> > > >
> > > >
> >

Re: Hadoop HA

Posted by highpointe <hi...@gmail.com>.
Here is my SS:  259 71 2451

On May 26, 2012, at 8:44 AM, "M. C. Srivas" <mc...@gmail.com> wrote:

> 
> On Fri, May 25, 2012 at 8:03 AM, Konstantin Boudnik <co...@apache.org> wrote:
> BTW, Srivas,
> 
> I could find a single countless example of horror story of 'hadoop fs -rmr' in
> a form of hypothetical question (and not on this list ;) http://is.gd/55KD1E
> 
>  
> Hi Cos,  accidentally deleting files is one of the most common user errors. Here's a real one from just last month
> 
> http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201204.mbox/%3CCAPwpkBvEx4OTUbf6mf8t43oOjZM%2BExUths7XNn3UidqsN3Y8hA%40mail.gmail.com%3E 
> 
> As Patrick says in the follow-up, the only way to recover in this situation is to shutdown the cluster:
> 
> http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201204.mbox/%3CCANS822ga1ivAPi2C9PJsyz6nZgft4msKkH%3Dyj06-i_V%2Bu1B1AA%40mail.gmail.com%3E 
> 
> 
> In fact, the above procedure is well-known and well-documented.  Here's even an excerpt from Jason's book ProHadoop where he says "it is not uncommon for a user to accidentally delete large portions of the HDFS file system due to a program error or a command-line error ... best bet is to terminate the NN and 2-N immediately, and then shutdown the DNs as fast as possible"
> 
> http://books.google.com/books?id=8DV-EzeKigQC&pg=PA122&lpg=PA122&dq=how+to+recover+deleted+files+%2B+hadoop&source=bl&ots=prgSMk1SHL&sig=LPJ0j5MFwJ3zUAcOrvR6FbiWQuQ&hl=en&sa=X&ei=UfXAT76HJuabiALbkdn8Bw&ved=0CLQBEOgBMAQ#v=onepage&q=how%20to%20recover%20deleted%20files%20%2B%20hadoop&f=false 
> 
> 
> Just for the sake of full disclosure, of course.
> 
> Enjoy,
>  Cos
> 
> On Tue, May 22, 2012 at 09:45PM, M. C. Srivas wrote:
> > On Tue, May 22, 2012 at 12:08 AM, Martinus Martinus
> > <ma...@gmail.com>wrote:
> >
> > > Hi Todd,
> > >
> > > Thanks for your answer. Is that will have the same capability as the
> > > commercial M5 of MapR : http://www.mapr.com/products/why-mapr ?
> > >
> > > Thanks.
> >
> >
> > Hi Martinus,   some major differences in HA between MapR's M5 and Apache
> > Hadoop
> >
> > 1. with M5, any node become master at any time. It is a fully active-active
> > system. You can get create a fully bomb-proof cluster, such that in a
> > 20-node cluster, you can configure to survive even if 19 of the 20 nodes
> > are lost. With Apache, it is a 1-1 active-passive system.
> >
> > 2. M5 does not require a NFS filer in the backend. Apache Hadoop requires a
> > Netapp or similar NFS filer to assist in saving the NN data, even in its HA
> > configuration.  Note that for true HA, the Netapp or similar also will need
> > to be HA.
> >
> > 3. M5 has full HA for the Job-Tracker as well.
> >
> > Of course, HA is only a small part of the total business continuity story.
> >  Full recovery in the face of any kind of failures is critical:
> >
> > With M5:
> >
> > -  If there is a complete cluster crash and reboot (eg, a full
> > power-failure of the entire cluster), M5 will recover in 5-10 minutes, and
> > submitted jobs will resume from where they were.
> >
> > - with snapshots, if you upgrade your software and it corrupts data, M5
> > provides snapshots to help you recover. The number of times I've seen
> > someone running  "hadoop fs -rmr /" accidentally and asking for help on
> > this mailing list is beyond counting. With M5, it is completely recoverable
> >
> > - full disaster-recovery across clusters by mirroring.
> >
> > Hope that clarifies some of the differences.
> >
> >
> > >
> > >
> > > On Tue, May 22, 2012 at 2:26 PM, Todd Lipcon <to...@cloudera.com> wrote:
> > >
> > >> Hi Martinus,
> > >>
> > >> Hadoop HA is available in Hadoop 2.0.0. This release is currently
> > >> being voted on in the community.
> > >>
> > >> You can read more here:
> > >>
> > >> http://www.cloudera.com/blog/2012/03/high-availability-for-the-hadoop-distributed-file-system-hdfs/
> > >>
> > >> -Todd
> > >>
> > >> On Mon, May 21, 2012 at 11:24 PM, Martinus Martinus
> > >> <ma...@gmail.com> wrote:
> > >> > Hi,
> > >> >
> > >> > Is there any hadoop HA distribution out there?
> > >> >
> > >> > Thanks.
> > >>
> > >>
> > >>
> > >> --
> > >> Todd Lipcon
> > >> Software Engineer, Cloudera
> > >>
> > >
> > >
> 

Re: Hadoop HA

Posted by "M. C. Srivas" <mc...@gmail.com>.
On Fri, May 25, 2012 at 8:03 AM, Konstantin Boudnik <co...@apache.org> wrote:

> BTW, Srivas,
>
> I could find a single countless example of horror story of 'hadoop fs
> -rmr' in
> a form of hypothetical question (and not on this list ;)
> http://is.gd/55KD1E
>
>
Hi Cos, accidentally deleting files is one of the most common user errors.
Here's a real one from just last month:

http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201204.mbox/%3CCAPwpkBvEx4OTUbf6mf8t43oOjZM%2BExUths7XNn3UidqsN3Y8hA%40mail.gmail.com%3E


As Patrick says in the follow-up, the only way to recover in this situation
is to shut down the cluster:

http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201204.mbox/%3CCANS822ga1ivAPi2C9PJsyz6nZgft4msKkH%3Dyj06-i_V%2Bu1B1AA%40mail.gmail.com%3E



In fact, the above procedure is well known and well documented. Here's
even an excerpt from Jason's book Pro Hadoop where he says "it is not
uncommon for a user to accidentally delete large portions of the HDFS file
system due to a program error or a command-line error ... best bet is to
terminate the NN and 2-N immediately, and then shutdown the DNs as fast as
possible"

http://books.google.com/books?id=8DV-EzeKigQC&pg=PA122&lpg=PA122&dq=how+to+recover+deleted+files+%2B+hadoop&source=bl&ots=prgSMk1SHL&sig=LPJ0j5MFwJ3zUAcOrvR6FbiWQuQ&hl=en&sa=X&ei=UfXAT76HJuabiALbkdn8Bw&ved=0CLQBEOgBMAQ#v=onepage&q=how%20to%20recover%20deleted%20files%20%2B%20hadoop&f=false



Just for the sake of full disclosure, of course.
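
As a small, hedged aside for readers of this thread: stock Hadoop does ship one mitigation for accidental deletes, the HDFS trash. The sketch below only illustrates the relevant property; the interval is an arbitrary example value, and the setting is normally placed in core-site.xml.

import org.apache.hadoop.conf.Configuration;

public class TrashConfSketch {

    // Hedged sketch: a non-zero fs.trash.interval (in minutes) makes
    // "hadoop fs -rm / -rmr" move files to the user's .Trash directory instead
    // of deleting them outright. Normally set in core-site.xml; 1440 (one day)
    // is an arbitrary example value, and it does not help once the trash is expunged.
    public static Configuration withTrashEnabled(Configuration conf) {
        conf.set("fs.trash.interval", "1440");
        return conf;
    }
}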

>
> Enjoy,
>  Cos
>
> On Tue, May 22, 2012 at 09:45PM, M. C. Srivas wrote:
> > On Tue, May 22, 2012 at 12:08 AM, Martinus Martinus
> > <ma...@gmail.com>wrote:
> >
> > > Hi Todd,
> > >
> > > Thanks for your answer. Is that will have the same capability as the
> > > commercial M5 of MapR : http://www.mapr.com/products/why-mapr ?
> > >
> > > Thanks.
> >
> >
> > Hi Martinus,   some major differences in HA between MapR's M5 and Apache
> > Hadoop
> >
> > 1. with M5, any node become master at any time. It is a fully
> active-active
> > system. You can get create a fully bomb-proof cluster, such that in a
> > 20-node cluster, you can configure to survive even if 19 of the 20 nodes
> > are lost. With Apache, it is a 1-1 active-passive system.
> >
> > 2. M5 does not require a NFS filer in the backend. Apache Hadoop
> requires a
> > Netapp or similar NFS filer to assist in saving the NN data, even in its
> HA
> > configuration.  Note that for true HA, the Netapp or similar also will
> need
> > to be HA.
> >
> > 3. M5 has full HA for the Job-Tracker as well.
> >
> > Of course, HA is only a small part of the total business continuity
> story.
> >  Full recovery in the face of any kind of failures is critical:
> >
> > With M5:
> >
> > -  If there is a complete cluster crash and reboot (eg, a full
> > power-failure of the entire cluster), M5 will recover in 5-10 minutes,
> and
> > submitted jobs will resume from where they were.
> >
> > - with snapshots, if you upgrade your software and it corrupts data, M5
> > provides snapshots to help you recover. The number of times I've seen
> > someone running  "hadoop fs -rmr /" accidentally and asking for help on
> > this mailing list is beyond counting. With M5, it is completely
> recoverable
> >
> > - full disaster-recovery across clusters by mirroring.
> >
> > Hope that clarifies some of the differences.
> >
> >
> > >
> > >
> > > On Tue, May 22, 2012 at 2:26 PM, Todd Lipcon <to...@cloudera.com>
> wrote:
> > >
> > >> Hi Martinus,
> > >>
> > >> Hadoop HA is available in Hadoop 2.0.0. This release is currently
> > >> being voted on in the community.
> > >>
> > >> You can read more here:
> > >>
> > >>
> http://www.cloudera.com/blog/2012/03/high-availability-for-the-hadoop-distributed-file-system-hdfs/
> > >>
> > >> -Todd
> > >>
> > >> On Mon, May 21, 2012 at 11:24 PM, Martinus Martinus
> > >> <ma...@gmail.com> wrote:
> > >> > Hi,
> > >> >
> > >> > Is there any hadoop HA distribution out there?
> > >> >
> > >> > Thanks.
> > >>
> > >>
> > >>
> > >> --
> > >> Todd Lipcon
> > >> Software Engineer, Cloudera
> > >>
> > >
> > >
>

Re: Hadoop HA

Posted by Konstantin Boudnik <co...@apache.org>.
BTW, Srivas,

I could find a single such "countless" example of a 'hadoop fs -rmr' horror story, in
the form of a hypothetical question (and not on this list ;) http://is.gd/55KD1E

Just for the sake of full disclosure, of course.

Enjoy,
  Cos

On Tue, May 22, 2012 at 09:45PM, M. C. Srivas wrote:
> On Tue, May 22, 2012 at 12:08 AM, Martinus Martinus
> <ma...@gmail.com>wrote:
> 
> > Hi Todd,
> >
> > Thanks for your answer. Is that will have the same capability as the
> > commercial M5 of MapR : http://www.mapr.com/products/why-mapr ?
> >
> > Thanks.
> 
> 
> Hi Martinus,   some major differences in HA between MapR's M5 and Apache
> Hadoop
> 
> 1. with M5, any node become master at any time. It is a fully active-active
> system. You can get create a fully bomb-proof cluster, such that in a
> 20-node cluster, you can configure to survive even if 19 of the 20 nodes
> are lost. With Apache, it is a 1-1 active-passive system.
> 
> 2. M5 does not require a NFS filer in the backend. Apache Hadoop requires a
> Netapp or similar NFS filer to assist in saving the NN data, even in its HA
> configuration.  Note that for true HA, the Netapp or similar also will need
> to be HA.
> 
> 3. M5 has full HA for the Job-Tracker as well.
> 
> Of course, HA is only a small part of the total business continuity story.
>  Full recovery in the face of any kind of failures is critical:
> 
> With M5:
> 
> -  If there is a complete cluster crash and reboot (eg, a full
> power-failure of the entire cluster), M5 will recover in 5-10 minutes, and
> submitted jobs will resume from where they were.
> 
> - with snapshots, if you upgrade your software and it corrupts data, M5
> provides snapshots to help you recover. The number of times I've seen
> someone running  "hadoop fs -rmr /" accidentally and asking for help on
> this mailing list is beyond counting. With M5, it is completely recoverable
> 
> - full disaster-recovery across clusters by mirroring.
> 
> Hope that clarifies some of the differences.
> 
> 
> >
> >
> > On Tue, May 22, 2012 at 2:26 PM, Todd Lipcon <to...@cloudera.com> wrote:
> >
> >> Hi Martinus,
> >>
> >> Hadoop HA is available in Hadoop 2.0.0. This release is currently
> >> being voted on in the community.
> >>
> >> You can read more here:
> >>
> >> http://www.cloudera.com/blog/2012/03/high-availability-for-the-hadoop-distributed-file-system-hdfs/
> >>
> >> -Todd
> >>
> >> On Mon, May 21, 2012 at 11:24 PM, Martinus Martinus
> >> <ma...@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > Is there any hadoop HA distribution out there?
> >> >
> >> > Thanks.
> >>
> >>
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>
> >
> >

Re: Hadoop HA

Posted by highpointe <hi...@gmail.com>.
Here is my SS:  259 71 2451

On May 26, 2012, at 8:53 AM, "M. C. Srivas" <mc...@gmail.com> wrote:

> 
> 
> On Fri, May 25, 2012 at 8:43 AM, Arun C Murthy <ac...@hortonworks.com> wrote:
> Srivas,
> 
> On May 22, 2012, at 9:45 PM, M. C. Srivas wrote:
>> 
>> 3. M5 has full HA for the Job-Tracker as well. 
> 
> Curious. Can you please share some information about what this means?
> 
> The JT will be restarted (perhaps on another node if the node where it's running has died).  On recovery, JT will resume currently running jobs from where they were, ie., they are not lost or abandoned like is the case today with Apache Hadoop or CDH.
> 
>  
> Will tasks continue to run if JT bounces?
> 
> Yes.
> 
>  
> Will jobs start from scratch?
> 
> No.  Works even across entire cluster reboots.  As I said in my original posting,
> 
>  "-  If there is a complete cluster crash and reboot (eg, a full power-failure of the entire cluster), M5 will recover in 5-10 minutes, and submitted jobs will resume from where they were."
> 
> 
> 
> thanks,
> Arun
> 
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
> 
> 
> 

Re: Hadoop HA

Posted by "M. C. Srivas" <mc...@gmail.com>.
On Fri, May 25, 2012 at 8:43 AM, Arun C Murthy <ac...@hortonworks.com> wrote:

> Srivas,
>
> On May 22, 2012, at 9:45 PM, M. C. Srivas wrote:
>
>
> 3. M5 has full HA for the Job-Tracker as well.
>
>
> Curious. Can you please share some information about what this means?
>

The JT will be restarted (perhaps on another node if the node where it was
running has died). On recovery, the JT will resume currently running jobs from
where they were, i.e., they are not lost or abandoned, as is the case today
with Apache Hadoop or CDH.



> Will tasks continue to run if JT bounces?
>

Yes.



> Will jobs start from scratch?
>

No.  Works even across entire cluster reboots.  As I said in my original
posting,

 "-  If there is a complete cluster crash and reboot (eg, a full
power-failure of the entire cluster), M5 will recover in 5-10 minutes, and
submitted jobs will resume from where they were."



> thanks,
> Arun
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>

Re: Hadoop HA

Posted by Arun C Murthy <ac...@hortonworks.com>.
Srivas,

On May 22, 2012, at 9:45 PM, M. C. Srivas wrote:
> 
> 3. M5 has full HA for the Job-Tracker as well. 

Curious. Can you please share some information about what this means? Will tasks continue to run if JT bounces? Will jobs start from scratch?

thanks,
Arun

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



Re: Hadoop HA

Posted by "M. C. Srivas" <mc...@gmail.com>.
On Tue, May 22, 2012 at 12:08 AM, Martinus Martinus
<ma...@gmail.com>wrote:

> Hi Todd,
>
> Thanks for your answer. Is that will have the same capability as the
> commercial M5 of MapR : http://www.mapr.com/products/why-mapr ?
>
> Thanks.


Hi Martinus, here are some major differences in HA between MapR's M5 and Apache
Hadoop:

1. With M5, any node can become master at any time. It is a fully active-active
system. You can create a fully bomb-proof cluster, such that in a
20-node cluster you can configure it to survive even if 19 of the 20 nodes
are lost. With Apache, it is a 1-1 active-passive system.

2. M5 does not require an NFS filer in the backend. Apache Hadoop requires a
NetApp or similar NFS filer to assist in saving the NN data, even in its HA
configuration. Note that for true HA, the NetApp or similar filer will also need
to be HA.

3. M5 has full HA for the Job-Tracker as well.

Of course, HA is only a small part of the total business continuity story.
Full recovery in the face of any kind of failure is critical:

With M5:

- If there is a complete cluster crash and reboot (e.g., a full
power failure of the entire cluster), M5 will recover in 5-10 minutes, and
submitted jobs will resume from where they were.

- If you upgrade your software and it corrupts data, M5
provides snapshots to help you recover. The number of times I've seen
someone accidentally running "hadoop fs -rmr /" and asking for help on
this mailing list is beyond counting. With M5, it is completely recoverable.

- full disaster-recovery across clusters by mirroring.

Hope that clarifies some of the differences.


>
>
> On Tue, May 22, 2012 at 2:26 PM, Todd Lipcon <to...@cloudera.com> wrote:
>
>> Hi Martinus,
>>
>> Hadoop HA is available in Hadoop 2.0.0. This release is currently
>> being voted on in the community.
>>
>> You can read more here:
>>
>> http://www.cloudera.com/blog/2012/03/high-availability-for-the-hadoop-distributed-file-system-hdfs/
>>
>> -Todd
>>
>> On Mon, May 21, 2012 at 11:24 PM, Martinus Martinus
>> <ma...@gmail.com> wrote:
>> > Hi,
>> >
>> > Is there any hadoop HA distribution out there?
>> >
>> > Thanks.
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>

Re: Hadoop HA

Posted by Martinus Martinus <ma...@gmail.com>.
Hi Todd,

Thanks for your answer. Will that have the same capabilities as the
commercial M5 of MapR: http://www.mapr.com/products/why-mapr ?

Thanks.

On Tue, May 22, 2012 at 2:26 PM, Todd Lipcon <to...@cloudera.com> wrote:

> Hi Martinus,
>
> Hadoop HA is available in Hadoop 2.0.0. This release is currently
> being voted on in the community.
>
> You can read more here:
>
> http://www.cloudera.com/blog/2012/03/high-availability-for-the-hadoop-distributed-file-system-hdfs/
>
> -Todd
>
> On Mon, May 21, 2012 at 11:24 PM, Martinus Martinus
> <ma...@gmail.com> wrote:
> > Hi,
> >
> > Is there any hadoop HA distribution out there?
> >
> > Thanks.
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Re: Hadoop HA

Posted by Todd Lipcon <to...@cloudera.com>.
Hi Martinus,

Hadoop HA is available in Hadoop 2.0.0. This release is currently
being voted on in the community.

You can read more here:
http://www.cloudera.com/blog/2012/03/high-availability-for-the-hadoop-distributed-file-system-hdfs/

-Todd

On Mon, May 21, 2012 at 11:24 PM, Martinus Martinus
<ma...@gmail.com> wrote:
> Hi,
>
> Is there any hadoop HA distribution out there?
>
> Thanks.



-- 
Todd Lipcon
Software Engineer, Cloudera