Posted to common-dev@hadoop.apache.org by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org> on 2008/03/11 10:10:46 UTC

[jira] Updated: (HADOOP-2622) Fix -file option in Streaming to use Distributed Cache

     [ https://issues.apache.org/jira/browse/HADOOP-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HADOOP-2622:
--------------------------------------------

    Attachment: patch-2622.txt

bq. If users have their own InputFormat, they have to jar it together with the streaming jar and use the custom jar, because setInputFormat is done on the client side. So passing it via -file does not work. It would be really helpful if this were also addressed.

Adding the InputFormat class (and its package directory hierarchy, if any) with -file works with the current code, provided the jar is added to the classpath.
For example, if the input format is a.b.c.MyInputFormat and the directory hierarchy is dir/a/b/c/MyInputFormat.class, the input format can be added to the jar with the following command:
{noformat}
bin/hadoop jar build/contrib/streaming/hadoop-0.17.0-dev-streaming.jar -mapper my.pl -input t.txt -output output -file my.pl -file dir/ -inputformat a.b.c.MyInputFormat
{noformat}
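A class named a.b.c.MyInputFormat is resolved on the classpath from the entry a/b/c/MyInputFormat.class, which is why the package directories have to be shipped along with the class file. The Java sketch below shows that mapping. The class and method names are hypothetical, not part of Hadoop Streaming, and it assumes, as the example command implies, that the contents of dir/ end up at the root of the job jar:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop: maps a class name to its classpath
// entry and a file under the -file directory to the jar entry it would become.
public class ClassEntryNames {

    // "a.b.c.MyInputFormat" -> "a/b/c/MyInputFormat.class" (top-level classes only)
    static String classFileEntry(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Path of file relative to root, joined with '/' as jar entry names require.
    static String jarEntryName(Path root, Path file) {
        Path rel = root.relativize(file);
        StringBuilder sb = new StringBuilder();
        for (Path part : rel) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append(part.toString());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String expected = classFileEntry("a.b.c.MyInputFormat");
        String packed = jarEntryName(Paths.get("dir"),
                Paths.get("dir", "a", "b", "c", "MyInputFormat.class"));
        // Prints the expected entry and whether the packed path matches it.
        System.out.println(expected + " " + expected.equals(packed));
    }
}
```

If the class file were shipped without its a/b/c directories, its entry name would not match what the class loader looks up for a.b.c.MyInputFormat.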

Here is a patch that adds the jar file to the classpath.
I tested it by adding an input format, and it worked fine.
Lohit, can you apply this patch and check whether -file works for adding an input format?
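Putting a jar on the classpath is what lets the class loader resolve a.b.c.MyInputFormat from the entry a/b/c/MyInputFormat.class inside it. As a JDK-only illustration (hypothetical names; it does not reproduce what the patch changes inside Streaming), the sketch writes a temporary jar holding a placeholder entry at that path and checks whether a URLClassLoader built over just that jar can see it. The entry is a dummy byte, not a real compiled class, so the sketch checks resource visibility rather than loading the class:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

// Hypothetical illustration, not Hadoop code: a jar on a loader's classpath
// makes its entries resolvable by their full package path.
public class JarOnClasspath {

    // Write a temporary jar containing one placeholder entry (not a real class).
    static File writeJar(String entryName) throws IOException {
        File jar = File.createTempFile("job", ".jar");
        jar.deleteOnExit();
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar))) {
            out.putNextEntry(new JarEntry(entryName));
            out.write(new byte[] {0});
            out.closeEntry();
        }
        return jar;
    }

    // True if a loader whose only classpath element is this jar can find the resource.
    // A null parent keeps the application classpath out of the lookup.
    static boolean visibleOnClasspath(File jar, String resource) throws IOException {
        try (URLClassLoader loader =
                 new URLClassLoader(new URL[] {jar.toURI().toURL()}, null)) {
            return loader.findResource(resource) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        File jar = writeJar("a/b/c/MyInputFormat.class");
        System.out.println(visibleOnClasspath(jar, "a/b/c/MyInputFormat.class"));
        System.out.println(visibleOnClasspath(jar, "MyInputFormat.class"));
    }
}
```

The first lookup succeeds and the second fails: the loader only finds the entry under its full package path.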

> Fix -file option in Streaming to use Distributed Cache
> ------------------------------------------------------
>
>                 Key: HADOOP-2622
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2622
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/streaming
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.17.0
>
>         Attachments: patch-2622.txt
>
>
> The -file option works by putting the script into the job's jar file by unjar-ing, copying and then jar-ing it again.
> We should rework the -file option to use the DistributedCache and the symlink option it provides.
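The repacking step described in the issue can be sketched with java.util.jar: copy every entry (and the manifest, if any) of the existing jar into a new one, then append the user's file. This is a simplified in-memory illustration with hypothetical names and a placeholder job jar, not the actual Streaming code. It shows that each -file addition means rewriting the whole jar, which is the cost the proposed DistributedCache approach is meant to avoid:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

// Hypothetical sketch of "unjar, copy, re-jar"; not Hadoop Streaming's code.
public class RepackJar {

    // Build a jar (no manifest) holding a single entry; stands in for the job jar.
    static byte[] singleEntryJar(String name, byte[] content) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (JarOutputStream out = new JarOutputStream(buf)) {
            out.putNextEntry(new JarEntry(name));
            out.write(content);
            out.closeEntry();
        }
        return buf.toByteArray();
    }

    // Copy every entry of the existing jar (keeping its manifest, if any),
    // then append one extra file as a new entry.
    static byte[] addFile(byte[] jar, String name, byte[] content) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(jar));
             JarOutputStream out = in.getManifest() == null
                     ? new JarOutputStream(buf)
                     : new JarOutputStream(buf, in.getManifest())) {
            byte[] chunk = new byte[8192];
            JarEntry entry;
            while ((entry = in.getNextJarEntry()) != null) {
                // Fresh entry so sizes/compression are recomputed on write.
                out.putNextEntry(new JarEntry(entry.getName()));
                int n;
                while ((n = in.read(chunk)) != -1) {
                    out.write(chunk, 0, n);
                }
                out.closeEntry();
            }
            out.putNextEntry(new JarEntry(name));  // ZipException if name already exists
            out.write(content);
            out.closeEntry();
        }
        return buf.toByteArray();
    }

    // List entry names in order (JarInputStream does not return the manifest).
    static List<String> entryNames(byte[] jar) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(jar))) {
            JarEntry entry;
            while ((entry = in.getNextJarEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        byte[] jobJar = singleEntryJar("streaming/Placeholder.class", new byte[] {1, 2, 3});
        byte[] repacked = addFile(jobJar, "my.pl",
                "print \"hello\\n\";\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(entryNames(repacked));
    }
}
```

Adding a name that already exists in the jar makes putNextEntry throw a ZipException for the duplicate entry, so a real implementation would also need to handle collisions.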

-- 
This message is automatically generated by JIRA.