Posted to common-dev@hadoop.apache.org by "Alejandro Abdelnur (JIRA)" <ji...@apache.org> on 2007/01/12 11:51:27 UTC

[jira] Commented: (HADOOP-452) Adding caching to Hadoop which is independent of the task trackers.

    [ https://issues.apache.org/jira/browse/HADOOP-452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464166 ] 

Alejandro Abdelnur commented on HADOOP-452:
-------------------------------------------

We need to run MR jobs whose submitted JAR file adds up to 10 MB in size (dependent JARs being the main culprits).

This slows things down, as the JAR has to be copied to all nodes participating in the MR job before the job can run.

Many of these jobs are the same MR job run with different input/output and arguments.

These MR jobs are fired thousands of times a day; even when the JARs are copied with high priority, the copy costs a few seconds per job.

If we could somehow upload a job JAR with an ID and then reuse it repeatedly just by sending the job JAR ID and a JobConf file, that would be great.
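
A minimal sketch of that idea, to make the request concrete. Everything here is hypothetical: JobJarStore, upload(), fetch() and storedCount() are illustrative names and not part of any Hadoop API, and deriving the ID from a SHA-256 hash of the JAR bytes is just one possible way to assign IDs. The point it shows is that a second submission of the same JAR costs only an ID lookup, not another copy.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch (not a Hadoop API): job JARs stored once and referenced by ID. */
public class JobJarStore {
    // Maps a JAR ID to the stored JAR bytes. A real cluster would keep these on disk or in DFS.
    private final Map<String, byte[]> jarsById = new HashMap<>();

    /** Stores the JAR if it has not been seen before and returns its ID. */
    public String upload(byte[] jarBytes) {
        String id = sha256Hex(jarBytes);
        // Identical bytes hash to the same ID, so a repeated upload stores nothing new.
        jarsById.putIfAbsent(id, jarBytes.clone());
        return id;
    }

    /** Returns the JAR previously uploaded under the given ID. */
    public byte[] fetch(String jarId) {
        byte[] jar = jarsById.get(jarId);
        if (jar == null) {
            throw new IllegalArgumentException("Unknown job JAR ID: " + jarId);
        }
        return jar.clone();
    }

    /** Number of distinct JARs currently stored. */
    public int storedCount() {
        return jarsById.size();
    }

    // Assumption: the ID is the hex-encoded SHA-256 digest of the JAR contents.
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        JobJarStore store = new JobJarStore();
        byte[] jar = "stand-in for real JAR bytes".getBytes(StandardCharsets.UTF_8);

        String firstId = store.upload(jar);   // first job: JAR is stored
        String secondId = store.upload(jar);  // later job: same bytes, same ID

        System.out.println("same ID: " + firstId.equals(secondId));
        System.out.println("stored copies: " + store.storedCount());
    }
}
```

With something like this in place, each later job submission would carry only the job JAR ID plus its own JobConf file, and nodes that already hold that ID would skip the copy.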

It is not possible/practical to put the JARs in hadoop/lib, as the clusters are shared and we need to be able to update JARs without bringing down the cluster.


> Adding caching to Hadoop which is independent of the task trackers.
> -------------------------------------------------------------------
>
>                 Key: HADOOP-452
>                 URL: https://issues.apache.org/jira/browse/HADOOP-452
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Mahadev konar
>         Assigned To: Owen O'Malley
>            Priority: Minor
>
> It would be nice to have a feature in Hadoop that could cache files locally and that is independent of TaskTrackers and JobTrackers. HADOOP-288 caching is dependent on the TaskTrackers. In an environment where you would dynamically bring the TaskTrackers up and down for resource sharing, that is problematic. It would be good to have this feature so that you can install TaskTrackers/JobTrackers on these machines using this caching mechanism. The caching feature could use something like BitTorrent/HTTP/rsync to copy the main hadoop.jar.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira