Posted to common-user@hadoop.apache.org by Shi Yu <sh...@uchicago.edu> on 2012/04/10 23:59:37 UTC

New question: Passing files and directory structures to the map reduce cluster via hadoop streaming?

Hi,

I looked back at this old post while trying to find a solution to my 
problem. I am using Hadoop 0.20.203 streaming with a C++ program. The 
program loads many dictionaries stored in local folders. For example,

mainfolder - dir1 ->  dicfile 1
mainfolder - dir1 ->  dicfile 2
mainfolder - dir2 ->  dicfile 3
mainfolder - dir2 ->  dicfile 4

I didn't change the dictionary-loading functions in the C++ code, on the 
assumption that the whole directory at the mainfolder level could be passed 
to streaming. However, this does not seem to work, because I observed the 
following error:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
	at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
	at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
	at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
	at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
	at org.apache.hadoop.mapred.Child.main(Child.java:253)


It seems the program failed to load the dictionaries. What is the most 
efficient way to pass multiple files with directory dependencies in 
Hadoop streaming? I would guess I don't need to change the C++ code, or 
should I remove all the directory dependencies from the dictionary loading?
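
To narrow down which dictionary fails to load, a startup check along the
following lines might help. This is only a minimal sketch: the function name
find_unreadable is hypothetical, and it assumes the dictionaries are opened
by paths relative to the task's working directory (for example
mainfolder/dir1/...). Anything written to stderr by a streaming subprocess
should end up in the task attempt's stderr log.

```cpp
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical helper: returns the relative paths under base_dir that
// cannot be opened for reading, and reports each one on stderr.
std::vector<std::string> find_unreadable(
        const std::string& base_dir,
        const std::vector<std::string>& relative_paths) {
    std::vector<std::string> missing;
    for (std::size_t i = 0; i < relative_paths.size(); ++i) {
        const std::string full = base_dir + "/" + relative_paths[i];
        std::ifstream in(full.c_str());
        if (!in.is_open()) {
            // Name the exact path so the task log shows what was expected.
            std::cerr << "cannot open dictionary: " << full << std::endl;
            missing.push_back(relative_paths[i]);
        }
    }
    return missing;
}
```

The mapper could call this at the top of main with "mainfolder" and the list
of expected dictionary paths, and exit with a nonzero code if the returned
list is not empty. The "subprocess failed with code 1" error would then come
with a log line naming the file that was not found.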

Thanks!

Shi

On 6/29/2011 1:44 AM, Guang-Nan Cheng wrote:
> Well, my bad. I made a simple test and confirmed that  -files works that way
> already.
>
> On 06/28/2011 11:19 AM, Guang-Nan Cheng wrote:

> I'd like to pass a whole Ruby app to streaming, so I don't need to
> bother with Ruby file dependencies.
>
> For example,
>
> ./streaming
>
> ...
> -mapper 'ruby aaa/bbb/ccc'
> -files aaa    <--- pass the folder
>
>
>
>
> Is this supported already? If not, any tips on how to make this work? I'm
> willing to add some code by myself and rebuild the streaming jar.
>>>>
>>> --
>>> Nick Jones