Posted to mapreduce-issues@hadoop.apache.org by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org> on 2010/07/01 00:26:50 UTC

[jira] Created: (MAPREDUCE-1902) job jar file is not distributed via DistributedCache

job jar file is not distributed via DistributedCache
----------------------------------------------------

                 Key: MAPREDUCE-1902
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1902
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
            Reporter: Joydeep Sen Sarma


The main jar file for a job is not distributed via the distributed cache. It would be more efficient if that were the case.

It would also allow us to comprehensively tackle the inefficiencies in distribution of jar files and such (see MAPREDUCE-1901).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1902) job jar file is not distributed via DistributedCache

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884081#action_12884081 ] 

Joydeep Sen Sarma commented on MAPREDUCE-1902:
----------------------------------------------

Perhaps there is some history behind why things are this way. If anyone knows, please do share.



[jira] Commented: (MAPREDUCE-1902) job jar file is not distributed via DistributedCache

Posted by "Vinod K V (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891532#action_12891532 ] 

Vinod K V commented on MAPREDUCE-1902:
--------------------------------------

I think both approaches are equally efficient, unless job jars are also shared across jobs.

Moving to the distributed cache would definitely help code reuse, though.

I checked trunk and found only a minor difference between the current approach and the distributed-cache approach: we also un-jar job.jar so that classes inside sub-directories matching a job-configurable pattern (e.g., lib/ and classes/) are also available on the classpath. Accommodating this should be straightforward.
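
To illustrate the un-jar step described above, here is a minimal sketch in plain Java. It is not the code in trunk. The class name, the method names, the work-directory layout, and the pattern value "(?:classes/|lib/).*" are all assumptions made for illustration; the only thing taken from the discussion is that entries are selected by a job-configurable pattern and that lib/ and classes/ end up on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: which job.jar entries a pattern-driven un-jar would
// extract, and which classpath entries would result. Not Hadoop's own code.
final class JobJarUnpackSketch {

    // Assumed pattern value; the thread only says the pattern is job-configurable.
    static final Pattern UNPACK_PATTERN = Pattern.compile("(?:classes/|lib/).*");

    // Return the jar entry names that fully match the unpack pattern.
    static List<String> entriesToUnpack(List<String> entryNames, Pattern pattern) {
        List<String> selected = new ArrayList<>();
        for (String name : entryNames) {
            if (pattern.matcher(name).matches()) {
                selected.add(name);
            }
        }
        return selected;
    }

    // Build illustrative classpath entries: the job jar itself, the unpacked
    // classes/ directory, and every unpacked jar under lib/.
    static List<String> classpath(String workDir, List<String> unpacked) {
        List<String> cp = new ArrayList<>();
        cp.add(workDir + "/job.jar");
        cp.add(workDir + "/classes");
        for (String name : unpacked) {
            if (name.startsWith("lib/") && name.endsWith(".jar")) {
                cp.add(workDir + "/" + name);
            }
        }
        return cp;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
            "com/example/MyMapper.class", "lib/util.jar",
            "classes/conf.xml", "README.txt");
        List<String> unpacked = entriesToUnpack(names, UNPACK_PATTERN);
        System.out.println("unpacked:  " + unpacked);
        System.out.println("classpath: " + classpath("/tmp/job_0001", unpacked));
    }
}
```

A distributed-cache based path would need an equivalent of entriesToUnpack applied after localization, since a plain cached file is not un-jarred on its own.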



[jira] Commented: (MAPREDUCE-1902) job jar file is not distributed via DistributedCache

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884435#action_12884435 ] 

Doug Cutting commented on MAPREDUCE-1902:
-----------------------------------------

This sounds like a good thing to explore.  DistributedCache was added when folks wanted to distribute stuff besides the job jar, but merging the two might be good.

