
Posted to common-user@hadoop.apache.org by aa...@buffalo.edu on 2009/11/22 23:48:36 UTC

Re: Re: Help in Hadoop

Hello,
       If I write the output of the 10 jobs to 10 different files, how do I
go about merging the output? Is there some built-in functionality, or do I have
to write some code for that?

Thank You

Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)

On Sun 11/22/09 5:40 PM, Gang Luo <lgpublic@yahoo.com.cn> sent:
> Hi. If the output path already exists, it seems you cannot run another
> job with the same output path. I think you can write the results of the
> 10 jobs to 10 different paths and then do something more (with an 11th job,
> for example) to merge the 10 results into 1 file.
> 
> Gang Luo
> ---------
> Department of Computer Science
> Duke University
> (919)316-0993
> gang.luo@duke.edu
> 
> 
> ----- Original Message ----
> From: "aa225@buffalo.edu" <aa225@buffalo.edu>
> To: common-user@hadoop.apache.org
> Sent: 2009/11/22 (Sunday) 5:25:55 PM
> Subject: Help in Hadoop
> 
> Hello everybody,
> I have a question about a MapReduce program and would appreciate any
> help. I run the program with the command bin/hadoop jar HomeWork.jar prg1
> input output. Ideally, from within prg1, I want to launch 10 MapReduce
> jobs sequentially and store the output of all of these jobs in one
> file. Currently I have set the input and output formats of the jobs to
> TextInputFormat and TextOutputFormat, respectively. Now I have the
> following questions:
> 
> 1. When I run more than one job from the same program, all the jobs have
> the same output file. The framework does not allow the 2nd MapReduce job to
> have the same output file as job 1.
> 
> 2. Before the 2nd job launches, I also get this error:
> 
> Cannot initialize JVM Metrics with processName=JobTracker, sessionId= -
> already initialized
> 
> 3. When the 2nd MapReduce job writes its output to the file
> "output", won't the previous content of this file get overwritten?
> 
> Thank You
> 
> Abhishek Agrawal
> 
> SUNY- Buffalo
> (716-435-7122)
>
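Gang's suggestion (one output directory per job, then a separate merge step) can be sketched in plain Java. This is a hedged sketch, not Hadoop's API: `JobLauncher`, `runAll`, `outputDirFor` and the `job-NN` directory names are hypothetical, and it uses `java.nio.file.Path` so it runs without Hadoop on the classpath. In a real 0.20-era driver, the body of `JobLauncher.run` is where you would build a `JobConf`, set the input and output paths with `FileInputFormat`/`FileOutputFormat`, and call `JobClient.runJob`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class SequentialJobs {

    /** Hypothetical stand-in for configuring and submitting one MapReduce job. */
    @FunctionalInterface
    public interface JobLauncher {
        void run(int jobIndex, Path input, Path outputDir) throws IOException;
    }

    /** Output directory for job i: base/job-00, base/job-01, ... (naming is an assumption). */
    static Path outputDirFor(Path base, int jobIndex) {
        return base.resolve(String.format("job-%02d", jobIndex));
    }

    /**
     * Runs numJobs jobs one after another, giving each its own output directory
     * so that no job collides with or overwrites an earlier job's output.
     * Returns the output directories in job order.
     */
    public static List<Path> runAll(JobLauncher launcher, Path input, Path base, int numJobs)
            throws IOException {
        List<Path> outputs = new ArrayList<>();
        for (int i = 0; i < numJobs; i++) {
            Path out = outputDirFor(base, i);
            // Mirror the behaviour described in this thread: refuse an existing output dir.
            if (Files.exists(out)) {
                throw new IOException("Output directory already exists: " + out);
            }
            launcher.run(i, input, out);
            outputs.add(out);
        }
        return outputs;
    }
}
```

Because every job gets a fresh directory, question 3 does not arise in this setup; the existence check only mirrors the "output path already exists" behaviour reported above, rather than reproducing the framework's own check.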

Re: Re: Help in Hadoop

Posted by Gang Luo <lg...@yahoo.com.cn>.

Aha, I don't think you need MapReduce for that. A small program that reads the 10 files and writes what it reads to one output file is enough. Use the HDFS API.

--Gang
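A minimal sketch of the merge program described above, under stated assumptions: it concatenates the `part-*` files of each job output directory, in directory order and then file-name order, into one target file. It uses only `java.io`/`java.nio.file`, so it runs on a local filesystem, and the class and method names are hypothetical. On HDFS, the `concat` method should work unchanged with streams from `FileSystem.open(Path)` and `FileSystem.create(Path)`, since `FSDataInputStream` and `FSDataOutputStream` extend `InputStream` and `OutputStream`; the directory listing would use `FileSystem.listStatus(Path)` instead of `Files.newDirectoryStream`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class MergeOutputs {

    /**
     * Copies each input stream, in list order, into out and returns the number
     * of bytes written. Each input is closed once drained; out is left open.
     */
    static long concat(List<? extends InputStream> inputs, OutputStream out) throws IOException {
        byte[] buf = new byte[64 * 1024];
        long total = 0;
        for (InputStream in : inputs) {
            try (InputStream src = in) {
                int n;
                while ((n = src.read(buf)) != -1) {
                    out.write(buf, 0, n);
                    total += n;
                }
            }
        }
        out.flush();
        return total;
    }

    /** Lists the part-* files in one job output directory, sorted by name. */
    static List<Path> partFiles(Path outputDir) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : ds) {
                if (Files.isRegularFile(p)) {
                    parts.add(p);
                }
            }
        }
        Collections.sort(parts);
        return parts;
    }

    /** Appends the part files of every output directory, in order, to one target file. */
    static long mergeDirs(List<Path> outputDirs, Path target) throws IOException {
        long total = 0;
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path dir : outputDirs) {
                for (Path part : partFiles(dir)) {
                    total += Files.copy(part, out); // appends this part's bytes to out
                }
            }
        }
        return total;
    }
}
```

Files are opened one at a time, so only one input is held open at any moment. If a local copy of the merged result is all that is needed, the `hadoop fs -getmerge <src> <localdst>` shell command may already cover this; check the FsShell documentation for your Hadoop version.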
