Posted to common-user@hadoop.apache.org by aa...@buffalo.edu on 2009/11/22 23:48:36 UTC

Re: Re: Help in Hadoop

Hello,
       If I write the output of the 10 tasks to 10 different files, how do I
go about merging the output? Is there some built-in functionality, or do I have
to write some code for that?

Thank You

Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)

On Sun 11/22/09  5:40 PM , Gang Luo lgpublic@yahoo.com.cn sent:
> Hi. If the output path already exists, it seems you cannot run any
> task with that same output path. I think you can write the results of the
> 10 tasks to 10 different paths, and then do something more (in an 11th task,
> for example) to merge the 10 results into 1 file.
> 
> Gang Luo
> ---------
> Department of Computer Science
> Duke University
> (919)316-0993
> gang.luo@duke.edu
> 
> 
> ----- Original Message ----
> From: "aa225@buffalo.edu" <aa225@buffalo.edu>
> To: common-user@hadoop.apache.org
> Sent: 2009/11/22 (Sun) 5:25:55 PM
> Subject: Help in Hadoop
> 
> Hello Everybody,
> I have a question about a MapReduce program and I would appreciate any
> help. I run the program using the command bin/hadoop jar HomeWork.jar prg1
> input output. Ideally, from within prg1, I want to sequentially launch 10
> MapReduce tasks. I want to store the output of all these MapReduce tasks in
> some file. Currently I have kept the input format and output format of the
> jobs as TextInputFormat and TextOutputFormat respectively. Now I have the
> following questions.
> 
> 1. When I run more than one task from the same program, the output file of
> all the tasks is the same. The framework does not allow the second MapReduce
> task to have the same output file as the first task.
> 
> 2. Before the second task launches, I also get this error:
> 
> Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 
> 3. When the second MapReduce task writes its output to the file
> "output", won't the previous content of this file get overwritten?
> 
> Thank You
> 
> Abhishek Agrawal
> 
> SUNY- Buffalo
> (716-435-7122)
> 
> 


Re: Re: Help in Hadoop

Posted by Gang Luo <lg...@yahoo.com.cn>.
Aha, I think you don't need MapReduce for that. A small program that reads the 10 files and writes what it reads to one output file is enough. Use the HDFS API.

--Gang
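
A minimal sketch of the "small program" idea above, not a definitive implementation. To stay self-contained it uses only the standard java.nio.file API, so it assumes the 10 output directories are on the local filesystem (for example after copying them out of HDFS). On HDFS the same loop would go through the Hadoop FileSystem API instead, which is not shown here. The class name PartFileMerger and the directory names output-1 ... output-10 are hypothetical; the "part-" prefix is the usual naming of MapReduce output files. As for built-in functionality, the shell command hadoop fs -getmerge concatenates the files of a single output directory into one local file, which may already be enough per job.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper (not part of Hadoop): concatenates the part files of
// several job output directories into a single file.
final class PartFileMerger {

    // Returns the "part-*" files of one output directory, sorted by name.
    // Other entries (logs, marker files) are skipped.
    static List<Path> partFiles(Path outputDir) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (Stream<Path> entries = Files.list(outputDir)) {
            entries.filter(p -> p.getFileName().toString().startsWith("part-"))
                   .forEach(parts::add);
        }
        Collections.sort(parts);
        return parts;
    }

    // Appends every part file of every directory to 'merged', keeping the
    // directory order given by the caller. An existing 'merged' file is
    // truncated first.
    static void merge(List<Path> outputDirs, Path merged) throws IOException {
        try (OutputStream out = Files.newOutputStream(merged)) {
            for (Path dir : outputDirs) {
                for (Path part : partFiles(dir)) {
                    Files.copy(part, out);
                }
            }
        }
    }

    // Usage (assumed layout): java PartFileMerger merged.txt output-1 ... output-10
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: PartFileMerger <merged-file> <output-dir>...");
            return;
        }
        List<Path> dirs = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            dirs.add(Paths.get(args[i]));
        }
        merge(dirs, Paths.get(args[0]));
    }
}
```

Giving each of the 10 runs its own output directory also sidesteps the "output path already exists" problem from the original question, since no run ever writes to a path another run has created.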





