Posted to common-user@hadoop.apache.org by Saptarshi Guha <sa...@gmail.com> on 2009/03/16 16:59:45 UTC
Task Side Effect files and copying(getWorkOutputPath)
Hello,
I would like to produce side-effect files which will later be copied
to the output folder.
I am using FileOutputFormat, and in the Map's close() method I copy
files (from the local tmp/ folder) to
FileOutputFormat.getWorkOutputPath(job):
void close() throws IOException {
    if (shouldcopy) {
        ArrayList<Path> lop = new ArrayList<Path>();
        for (String ff : tempdir.list()) {
            lop.add(new Path(temppfx + ff));
        }
        dstFS.moveFromLocalFile(lop.toArray(new Path[0]), dstPath);
    }
}
However, this throws an error java.io.IOException:
`hdfs://X:54310/tmp/testseq/_temporary/_attempt_200903160945_0010_m_000000_0':
specified destination directory doest not exist
I thought this was the right place to drop side-effect files. Prior
to this I was copying to the output folder, but many files were
missing afterwards; in fact all of them may have been copied and then
deleted during the reduce output stage, I am not sure (with
NullOutputFormat all the files were present in the output folder). So
I resorted to getWorkOutputPath, which threw the above exception.
So if I'm using FileOutputFormat, and my maps and/or reduces produce
side-effect files on the local FS:
1) When should I copy them to the DFS (e.g. in the close() method, or
one at a time in the map/reduce method)?
2) Where should I copy them to?
I am using Hadoop 0.19 and have set jobConf.setNumTasksToExecutePerJvm(-1).
Also, each side-effect file produced has a unique name, i.e. there is
no overwriting.
Thank you
Saptarshi Guha
Re: Task Side Effect files and copying(getWorkOutputPath)
Posted by Amareshwari Sriramadasu <am...@yahoo-inc.com>.
Saptarshi Guha wrote:
> Hello,
> I would like to produce side effect files which will be later copied
> to the outputfolder.
> I am using FileOuputFormat, and in the Map's close() method i copy
> files (from the local tmp/ folder) to
> FileOutputFormat.getWorkOutputPath(job);
>
>
FileOutputFormat.getWorkOutputPath(job) is the correct method to get the
directory for task side-effect files.
You should not use the close() method, because promotion to the output
directory happens before close(). You can use the configure() method.
See org.apache.hadoop.tools.HadoopArchives for an example.
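A minimal sketch of this approach, assuming the old 0.19 mapred API
(the class and field names below are illustrative, not from the
original post; only getWorkOutputPath and moveFromLocalFile are actual
Hadoop API calls):

```java
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

// Sketch only: resolve the task's work output path once in configure(),
// then move locally produced side-effect files there as they are created,
// rather than waiting for close().
public class SideEffectHelper {
    private FileSystem dstFS;
    private Path workDir;   // ${output}/_temporary/_${attemptid}

    // Call this from the task's configure(JobConf) method.
    public void configure(JobConf job) throws IOException {
        workDir = FileOutputFormat.getWorkOutputPath(job);
        dstFS = workDir.getFileSystem(job);
    }

    // Move one local side-effect file into the work directory; the
    // framework promotes a successful attempt's work directory to the
    // job output, so these files end up in the output folder.
    public void moveToWorkDir(String localFile) throws IOException {
        dstFS.moveFromLocalFile(new Path(localFile), workDir);
    }
}
```

This compiles only against the Hadoop 0.19-era jars and is meant as a
sketch of the structure, not a drop-in class.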
> void close() .... {
> if (shouldcopy) {
> ArrayList<Path> lop = new ArrayList<Path>();
> for(String ff : tempdir.list()){
> lop.add(new Path(temppfx+ff));
> }
> dstFS.moveFromLocalFile(lop.toArray(new Path[]{}), dstPath);
> }
>
> However, this throws an error java.io.IOException:
> `hdfs://X:54310/tmp/testseq/_temporary/_attempt_200903160945_0010_m_000000_0':
> specified destination directory doest not exist
>
> I though this is the right to place to drop side effect files. Prior
> to this I was copying o the output folder, but many were not copied,
> or in fact all were, but during the reduce output stage many were
> deleted - am not sure(I used NullOutputFormat and all the files were
> present in the output folder) So i resorted to getWorkOutputPath
> which threw the above exception.
>
> So if I'm using FileOutputFormat, and my maps and/or reduces produce
> side effects files on the localFS
> 1)when should I copy them to the DFS (e.g the close method? or one at
> a time in the map/reduce method)
> 2) Where should i copy them to.
>
> I am using Hadoop 0.19 and have set jobConf.setNumTasksToExecutePerJvm(-1);
> Also, each side effect file produced has a unique name, i.e there is
> no overwriting.
>
You need not set jobConf.setNumTasksToExecutePerJvm(-1); even without
it, each task attempt gets a unique work output path.
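To illustrate why the names cannot collide, here is a small
hypothetical demo; the directory pattern is taken from the exception
quoted earlier in the thread (${output}/_temporary/_${attemptid}), and
the helper method is not part of any Hadoop API:

```java
// Hypothetical demo: the work output path embeds the task attempt id,
// so two attempts of the same task write to distinct directories.
public class WorkPathDemo {
    static String workPath(String output, String attemptId) {
        return output + "/_temporary/_" + attemptId;
    }

    public static void main(String[] args) {
        String first  = workPath("hdfs://X:54310/tmp/testseq",
                                 "attempt_200903160945_0010_m_000000_0");
        String second = workPath("hdfs://X:54310/tmp/testseq",
                                 "attempt_200903160945_0010_m_000000_1");
        // Same task (m_000000), different attempts: distinct paths.
        System.out.println(first.equals(second));
    }
}
```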
Thanks
Amareshwari