Posted to common-dev@hadoop.apache.org by "Milind Bhandarkar (JIRA)" <ji...@apache.org> on 2007/10/28 20:39:50 UTC

[jira] Created: (HADOOP-2116) Job.local.dir to be exposed to tasks

Job.local.dir to be exposed to tasks
------------------------------------

                 Key: HADOOP-2116
                 URL: https://issues.apache.org/jira/browse/HADOOP-2116
             Project: Hadoop
          Issue Type: Improvement
          Components: mapred
    Affects Versions: 0.14.3
         Environment: All
            Reporter: Milind Bhandarkar
             Fix For: 0.16.0


Currently, since all task cwds are created under a jobcache directory, users who need a job-specific shared directory for scratch space create ../work. This is hacky, and will break when HADOOP-2115 is addressed. For such jobs, hadoop mapred should expose job.local.dir via the localized configuration.
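
To make the request concrete, here is a minimal sketch of how task-side code might look up the scratch directory once job.local.dir is exposed. It is not taken from any patch on this issue: a plain Map stands in for the localized job configuration, and the class name, method name, and example paths are all hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Sketch only: a plain Map stands in for the task's localized job configuration.
final class ScratchDirResolver {

    // Property name taken from the issue text.
    static final String JOB_LOCAL_DIR = "job.local.dir";

    /**
     * Returns the job-wide scratch directory. Prefers the configured value and
     * falls back to the legacy "../work" convention relative to the task cwd.
     */
    static Path resolve(Map<String, String> localizedConf, Path taskCwd) {
        String configured = localizedConf.get(JOB_LOCAL_DIR);
        if (configured != null && !configured.isEmpty()) {
            return Paths.get(configured).normalize();
        }
        // Legacy behaviour: relies on the task cwd being a sibling of "work".
        return taskCwd.resolve("../work").normalize();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        Path cwd = Paths.get("/tmp/jobcache/job_1/task_1"); // hypothetical task cwd
        System.out.println(resolve(conf, cwd));
        conf.put(JOB_LOCAL_DIR, "/disk2/jobcache/job_1/work"); // hypothetical value
        System.out.println(resolve(conf, cwd));
    }
}
```

The fallback branch is the fragile part: it only works while the task cwd happens to sit next to the work directory, which is the dependency this issue wants to remove.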

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Amareshwari Sri Ramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sri Ramadasu updated HADOOP-2116:
---------------------------------------------

    Attachment: patch-2116.txt

> Job.local.dir to be exposed to tasks
> ------------------------------------
>
>                 Key: HADOOP-2116
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2116
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Amareshwari Sri Ramadasu
>             Fix For: 0.16.0
>
>         Attachments: patch-2116.txt, patch-2116.txt, patch-2116.txt, patch-2116.txt
>
>
> Currently, since all task cwds are created under a jobcache directory, users that need a job-specific shared directory for use as scratch space, create ../work. This is hacky, and will break when HADOOP-2115 is addressed. For such jobs, hadoop mapred should expose job.local.dir via localized configuration.



[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556966#action_12556966 ] 

Milind Bhandarkar commented on HADOOP-2116:
-------------------------------------------

There are several advantages to having job.local.dir be empty when the first task from that job starts on a tasktracker. (It would simplify the logic for user code to populate it with job-specific cached data that cannot use jobCache functionality.)

That is why I suggest that mapred.jar, jobCacheDir, and job.local.dir all need to be different locations.
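
A rough sketch of the "populate it if it is still empty" logic that an empty job.local.dir would allow. This is an illustration only: the class, its methods, and the file name are hypothetical, and the check is not safe against two tasks racing on the same tasktracker.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class ScratchPopulator {

    /** True when the directory exists and has no entries. */
    static boolean isEmpty(Path dir) {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            return !entries.iterator().hasNext();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Writes the (hypothetical) job-specific data file only if the scratch
     * directory is still empty. Returns true if this call populated it.
     * Not safe against two tasks racing; a real version would need locking.
     */
    static boolean populateIfEmpty(Path scratchDir, String fileName, String data) {
        if (!isEmpty(scratchDir)) {
            return false; // an earlier task already filled it
        }
        try {
            Files.write(scratchDir.resolve(fileName), data.getBytes(StandardCharsets.UTF_8));
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper for demos: creates a fresh, empty temporary directory. */
    static Path newEmptyDir() {
        try {
            return Files.createTempDirectory("job-local-dir");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path scratch = newEmptyDir();
        System.out.println(populateIfEmpty(scratch, "dictionary.txt", "alpha\nbeta\n"));
        System.out.println(populateIfEmpty(scratch, "dictionary.txt", "ignored"));
    }
}
```

If job.jar or job.xml also lived in that directory, the emptiness test would never succeed, which is one reason to keep the three locations separate.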



[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557918#action_12557918 ] 

Owen O'Malley commented on HADOOP-2116:
---------------------------------------

*Ugh* is right.

I'd propose some better names:

$local/work/$jobid/
       cache/          -- file cache
       jars/           -- expanded jar
       job.xml         -- the generic job conf
       $taskid/
             job.xml   -- task localized job conf
             output/   -- map outputs
             work/     -- cwd for task
 
with each of the leaf directories being placed independently on the partitions.

We should define localized attributes to point to where each of the leaf directories is.

I agree with Arun that we should re-work the -file option to use the file cache with symlinks.
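
One way to picture "localized attributes pointing at each leaf directory" is the sketch below. The attribute names (job.cache.dir, job.jars.dir, task.output.dir, task.work.dir) are invented for illustration and do not come from the thread, and the round-robin placement merely stands in for whatever allocation policy the tasktracker would really use.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LeafDirLayout {

    /**
     * Assigns each leaf directory of the proposed layout to one of the local
     * roots (round-robin here, purely for illustration) and records the result
     * under a hypothetical attribute name.
     */
    static Map<String, String> localize(List<String> localRoots, String jobId, String taskId) {
        String[][] leaves = {
            {"job.cache.dir",   "work/" + jobId + "/cache"},
            {"job.jars.dir",    "work/" + jobId + "/jars"},
            {"task.output.dir", "work/" + jobId + "/" + taskId + "/output"},
            {"task.work.dir",   "work/" + jobId + "/" + taskId + "/work"},
        };
        Map<String, String> attrs = new LinkedHashMap<>();
        for (int i = 0; i < leaves.length; i++) {
            String root = localRoots.get(i % localRoots.size());
            attrs.put(leaves[i][0], root + "/" + leaves[i][1]);
        }
        return attrs;
    }

    public static void main(String[] args) {
        List<String> roots = Arrays.asList("/disk1/local", "/disk2/local"); // hypothetical roots
        localize(roots, "job_0001", "task_0001_m_000000")
            .forEach((name, path) -> System.out.println(name + " = " + path));
    }
}
```

Because every leaf gets its own attribute, user code never has to walk up from one directory to find another, so the leaves can land on different partitions without breaking anything.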
    



[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560621#action_12560621 ] 

Milind Bhandarkar commented on HADOOP-2116:
-------------------------------------------

As long as ../work works currently from task cwd to a shared job-specific directory, I am okay with punting this.
So, +1.



[jira] Updated: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Amareshwari Sri Ramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sri Ramadasu updated HADOOP-2116:
---------------------------------------------

    Attachment: patch-2116.txt



[jira] Updated: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Amareshwari Sri Ramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sri Ramadasu updated HADOOP-2116:
---------------------------------------------

    Status: Patch Available  (was: Open)

This patch makes the empty work directory available as scratch space through the environment variable "job.local.dir".
The directory layout is as described earlier. 
I tested thoroughly: wordcount, sort, and a streaming job.




[jira] Updated: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-2116:
----------------------------------

    Status: Open  (was: Patch Available)

This patch sets a system property 'job.local.dir'; I'm assuming that it is inherited by the children?

{noformat}
+        System.setProperty("job.local.dir", workDir.toString());
{noformat}

Even so, I think we should set a property in the JobConf to be consistent.
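
On the inheritance question: a value set with System.setProperty lives only in the JVM that set it, so a separately launched child JVM would see it only if it is passed along explicitly, for example as a -D option. The sketch below shows that forwarding step. How the patch or the tasktracker actually launches children is not shown in this thread, so the class name, the main class, and the path are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ChildJvmCommand {

    /**
     * Builds the argument list for a child JVM. A value set with
     * System.setProperty in the parent is not visible to a separately launched
     * JVM, so it is forwarded explicitly as a -D option.
     */
    static List<String> build(String javaBin, String mainClass, String jobLocalDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-Djob.local.dir=" + jobLocalDir);
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) {
        // Setting the property here affects only this JVM.
        System.setProperty("job.local.dir", "/disk1/jobcache/job_0001/work");
        List<String> cmd = build("java", "ExampleTaskMain", System.getProperty("job.local.dir"));
        System.out.println(String.join(" ", cmd));
    }
}
```

Putting the value in the job configuration avoids the question altogether, since the task reads its localized configuration regardless of how its JVM was started.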

----

Overall, I'm a little concerned that this is quite late (w.r.t. 0.16.0) to be getting this in. I spoke to Milind and he is happy with HADOOP-2570 (the symlink to ../work), especially given the number of changes we need to make where we use something.getParent().{}. Hence I propose we push this to 0.17.0 and also make it a bigger change incorporating the wider changes to the task's local directories proposed by Owen. Thoughts?



[jira] Updated: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-2116:
----------------------------------

    Status: Open  (was: Patch Available)

In light of HADOOP-2570, I'm cancelling this patch.

Reasoning:

The *-file* option works by putting the script into the job's jar file by unjar-ing, copying and then jar-ing it again. (yuck!) 

This means that on the TaskTracker the script has moved from jobCache/work to jobCache/job_jar_xml (I propose we rename that to *private*, heh). Clearly user-scripts which rely on "../work/<script_name>" will break again...

Having said that, we need to debate whether this feature is an incompatible change. What do folks think?

If people say otherwise, we need to ensure all files in jobCache/private are symlinked into jobCache/work... ugh!

----

I'd like to take this opportunity to take a hard look at streaming's *-file* option too. The unjar/jar way is completely backwards! We _should_ rework the -file option to use the DistributedCache and the symlink option it provides.
So, user-scripts can simply be "./<script>" rather than "../work/<script>". Yes, the way to maintain compatibility (if we want) is to use the previous option of symlinking files into jobCache/work also. I'd strongly vote for this option.

Thoughts?



[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554603 ] 

Milind Bhandarkar commented on HADOOP-2116:
-------------------------------------------

I would prefer separating the two. I.e. where job.jar goes, versus where the job.local.dir goes. Especially for streaming, where side-effect tasks are common, the mapper and reducer commands would need to have a clean directory (empty) where they can cache job-specific data (dictionaries downloaded off the network etc, that cannot be packaged as distributed archives). If job.jar also lives there, it might someday clash with the files downloaded, and cause issues.

So, mapred.jar, jobCacheDir, and job.local.dir all need to be different locations.

Is jobCacheDir available via a config variable?



[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560633#action_12560633 ] 

Hadoop QA commented on HADOOP-2116:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12373478/patch-2116.txt
against trunk revision r613115.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1644/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1644/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1644/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1644/console

This message is automatically generated.



[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Amareshwari Sri Ramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12555287#action_12555287 ] 

Amareshwari Sri Ramadasu commented on HADOOP-2116:
--------------------------------------------------

As things stand, jobCacheDir is "mapred/local/taskTracker/jobcache/<job_id>/work". 
As far as I understand, this needs to be accessible as "job.local.dir", a job-specific shared directory for use as scratch space. 

bq. So, mapred.jar, jobCacheDir, and job.local.dir all need to be different locations.
Here, the existing jobCacheDir would become job.local.dir.

If you want jobCacheDir to point to "mapred/local/taskTracker/jobcache/<job_id>/", it cannot be made available via a config variable, because it has no unique value: the directory can be present on more than one disk. For example, the task directory (mapred/local/taskTracker/jobcache/<job_id>/<taskid>) can be on a disk other than the one holding job.local.dir.

Finally, we will have mapred.jar and job.local.dir (formerly jobCacheDir), each at a different location.
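
The multi-disk point can be illustrated with a small sketch. It assumes the local roots are given as a comma-separated list, which is the shape "mapred.local.dir" is taken to have here; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class JobCacheCandidates {

    /**
     * Expands a comma-separated list of local roots (assumed to be the shape of
     * "mapred.local.dir") into every place the job's cache directory could live.
     */
    static List<String> candidates(String localDirs, String jobId) {
        List<String> result = new ArrayList<>();
        for (String root : localDirs.split(",")) {
            String trimmed = root.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed + "/taskTracker/jobcache/" + jobId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two hypothetical disks yield two equally valid job cache directories.
        for (String dir : candidates("/disk1/mapred/local, /disk2/mapred/local", "job_0001")) {
            System.out.println(dir);
        }
    }
}
```

With more than one root there is no single string to store for the job-level directory, whereas each leaf such as work/ does resolve to exactly one path.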

Thoughts? 



[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558045#action_12558045 ] 

Devaraj Das commented on HADOOP-2116:
-------------------------------------

The only problem with this is the incompatible changes (like ../work and ../work/script); code, especially scripts that assume paths, will break. So, is everyone okay with this for 0.16? Should we do the symlink stuff to maintain backward compatibility? As an aside, in the directory organization Owen suggested, one thing that needs to be added is the common scratch space for all tasks (like the file cache). 

Another thing IMO is that we should probably just do the basic dir organization as was proposed by Amareshwari earlier and the streaming fix. The magnitude of the change required by the dir organization proposed by Owen seems pretty significant and seems aggressive for 0.16. Maybe we can do the remaining for 0.17. Thoughts?



[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Amareshwari Sri Ramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559476#action_12559476 ] 

Amareshwari Sri Ramadasu commented on HADOOP-2116:
--------------------------------------------------

bq. I'd like to take this opportunity to take a hard look at streaming's -file option too. The unjar/jar way is completely backwards! We should rework the -file option to use the DistributedCache and the symlink option it provides.

I created HADOOP-2622 to look at -file option. 
For 0.16.0, this issue will address the directory structure proposed earlier rather than the elaborated structure proposed later.



[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Amareshwari Sri Ramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557149#action_12557149 ] 

Amareshwari Sri Ramadasu commented on HADOOP-2116:
--------------------------------------------------

I propose the following job cache directory structure to address the above needs:
{noformat}
mapred/local/tasktracker/jobcache/<jobid>/
                                  --------> job_jar_xml/
                                             ---------> job.jar
                                             ---------> job.xml
                                             ---------> unJarred directory
                                  --------> work/
                                  --------> <taskdir>/

{noformat}
And we can have the directories job_jar_xml, work and taskdir on different disks.
 mapred/local/tasktracker/jobcache/<jobid>/job_jar_xml/job.jar is available via mapred.jar
and mapred/local/tasktracker/jobcache/<jobid>/work is available via job.local.dir , which is an empty directory.
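
As a rough illustration of this proposal, the sketch below computes the two localized values for a job whose job_jar_xml and work directories sit on different disks. The property names come from the proposal; the class, the helper names, and the disk prefixes are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ProposedJobLayout {

    // Relative prefix copied from the proposal in the thread.
    static final String JOBCACHE = "mapred/local/tasktracker/jobcache";

    /** Computes the two localized values for directories placed on (possibly) different disks. */
    static Map<String, String> localizedProperties(String jarDisk, String workDisk, String jobId) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapred.jar", jarDisk + "/" + JOBCACHE + "/" + jobId + "/job_jar_xml/job.jar");
        props.put("job.local.dir", workDisk + "/" + JOBCACHE + "/" + jobId + "/work");
        return props;
    }

    /** True when the jar does not live inside the scratch directory. */
    static boolean jarOutsideScratch(Map<String, String> props) {
        return !props.get("mapred.jar").startsWith(props.get("job.local.dir") + "/");
    }

    public static void main(String[] args) {
        Map<String, String> props = localizedProperties("/disk1", "/disk2", "job_0001");
        props.forEach((name, value) -> System.out.println(name + " = " + value));
        System.out.println("jar outside scratch: " + jarOutsideScratch(props));
    }
}
```

Keeping the jar under job_jar_xml is what lets work/ start out empty, which is the property requested earlier in the thread.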

Thoughts?




[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554006 ] 

Konstantin Shvachko commented on HADOOP-2116:
---------------------------------------------

This is also practically fixed by HADOOP-2227. The only thing left is to expose the shared directory through the configuration.
JobConf now has a property "mapred.jar" accessible through getJar() method, which points to the jar file located in the jobcache 
directory, which in fact is in the common shared directory for the job tasks.
Namely,
{code}
"mapred.jar" = "mapred.local.dir"[i]/taskTracker/jobcache/<job_id>/job.jar
{code}

So we can replace configuration parameter "mapred.jar" by "job.local.dir", which will point to the parent of "mapred.jar".
JobConf.getJar() can be implemented then as
{code}
String getJar() {
    return get("job.local.dir") + "/job.jar";
}
{code}

Will that work?

With respect to all the above, I wonder why we need to use LocalDirAllocator in TaskRunner.run()
if the job cache directory (jobCacheDir) can be obtained directly from TaskRunner.conf:
{code}
File jobCacheDir = new File(new File(conf.getJar()).getParentFile(), "work");
{code}




[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Amareshwari Sri Ramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554412 ] 

Amareshwari Sri Ramadasu commented on HADOOP-2116:
--------------------------------------------------

bq. So we can replace configuration parameter "mapred.jar" by "job.local.dir", which will point to the parent of "mapred.jar".

We cannot replace mapred.jar by job.local.dir because mapred.jar can be set and read from the client side via setJar() and getJar(). For example, launchWordCount in TestMiniMRClassPath gives a different path for the jar file. 

To expose the shared directory through the configuration, we can set
localJobConf.set("job.local.dir", jobDir) in localizeJob()
and job Cache directory can be obtained as
File jobCacheDir = new File(new File(conf.get("job.local.dir")), "work");
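
The difference between the two derivations can be sketched without Hadoop on the classpath. Below, a plain Map stands in for the localized job configuration, and the class name, method names, and paths are hypothetical. The point is only that deriving the directory from mapred.jar goes wrong as soon as the client has pointed mapred.jar somewhere else.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

final class JobDirFromConf {

    /** Derivation suggested in the thread: <job.local.dir>/work. */
    static File jobCacheDirFromLocalDir(Map<String, String> conf) {
        return new File(new File(conf.get("job.local.dir")), "work");
    }

    /** Alternative derivation: sibling "work" of wherever mapred.jar points. */
    static File jobCacheDirFromJar(Map<String, String> conf) {
        return new File(new File(conf.get("mapred.jar")).getParentFile(), "work");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Value the tasktracker would set while localizing the job (hypothetical path).
        conf.put("job.local.dir", "/disk1/jobcache/job_0001");
        // Value a client may have set itself (hypothetical path).
        conf.put("mapred.jar", "/home/user/build/wordcount.jar");
        System.out.println(jobCacheDirFromLocalDir(conf));
        System.out.println(jobCacheDirFromJar(conf));
    }
}
```

Setting job.local.dir on the tasktracker side keeps the value independent of anything the client wrote into the configuration.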



[jira] Updated: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Amareshwari Sri Ramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sri Ramadasu updated HADOOP-2116:
---------------------------------------------

    Status: Patch Available  (was: Open)

Submitting the patch with the proposed approach. 

> Job.local.dir to be exposed to tasks
> ------------------------------------
>
>                 Key: HADOOP-2116
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2116
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Amareshwari Sri Ramadasu
>             Fix For: 0.16.0
>
>         Attachments: patch-2116.txt
>
>
> Currently, since all task cwds are created under a jobcache directory, users that need a job-specific shared directory for use as scratch space, create ../work. This is hacky, and will break when HADOOP-2115 is addressed. For such jobs, hadoop mapred should expose job.local.dir via localized configuration.



[jira] Issue Comment Edited: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554006 ] 

shv edited comment on HADOOP-2116 at 12/21/07 11:32 AM:
------------------------------------------------------------------------

This is also practically fixed by HADOOP-2227. The only thing left is to expose the shared directory through the configuration.
JobConf now has a property "mapred.jar", accessible through the getJar() method, which points to the jar file located in the jobcache
directory, that is, in the common shared directory for the job's tasks.
Namely,
{code}
"mapred.jar" = "mapred.local.dir"[i]/taskTracker/jobcache/<job_id>/job.jar
{code}

So we can replace configuration parameter "mapred.jar" by "job.local.dir", which will point to the parent of "mapred.jar".
JobConf.getJar() can then be implemented as
{code}
String getJar() {
    return get("job.local.dir") + "/job.jar";
}
{code}

Will that work?

With respect to all the above, I wonder why we need to use LocalDirAllocator in TaskRunner.run()
if the job cache directory (jobCacheDir) can be obtained directly from TaskRunner.conf:
{code}
File jobCacheDir = new File(new File(conf.getJar()).getParentFile(), "work");
{code}
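
As a rough check of the path arithmetic in this proposal, here is a minimal sketch using only java.io.File. The class and method names are hypothetical, and the jar path is passed in as a plain string rather than read from JobConf.

```java
import java.io.File;

// Hypothetical sketch of the two derivations discussed above.
final class JarPathSketch {

    // job.local.dir -> job.jar, built with File so the separator is not
    // forgotten (plain concatenation needs the leading "/" before job.jar).
    static String getJar(String jobLocalDir) {
        return new File(jobLocalDir, "job.jar").getPath();
    }

    // job.jar -> sibling "work" directory, i.e. the jobCacheDir expression.
    static File jobCacheDir(String jarPath) {
        return new File(new File(jarPath).getParentFile(), "work");
    }
}
```

Both derivations only hold while the jar really lives directly under the job directory, which is the assumption the proposal depends on.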


      was (Author: shv):
    This is also practically fixed by HADOOP-2227. The only thing left is to expose the shared directory through the configuration.
JobConf now has a property "mapred.jar" accessible through getJar() method, which points to the jar file located in the jobcache 
directory, which in fact is in the common shared directory for the job tasks.
Namely,
{code}
"mapred.jar" = "mapred.local.dir"[i]/taskTracker/jobcache/<job_id>/job.jar
{code}

So we can replace configuration parameter "mapred.jar" by "job.local.dir", which will point to the parent of "mapred.jar".
JobConf.getJar() can be implemented then as
{code}
String getJar() {
    return get("job.local.dir") + "job.xml";
}
{code}

Will that work?

With respect to all the above I wonder why do we need to use LocalDirAllocator in TaskRunner.run()
if job cache directory (jobCacheDir) can be obtained directly from TaskRunner.conf
{code}
File jobCacheDir = new File(new File(conf.getJar()).getParentFile(), "work");
{code}

  
> Job.local.dir to be exposed to tasks
> ------------------------------------
>
>                 Key: HADOOP-2116
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2116
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>             Fix For: 0.16.0
>
>
> Currently, since all task cwds are created under a jobcache directory, users that need a job-specific shared directory for use as scratch space, create ../work. This is hacky, and will break when HADOOP-2115 is addressed. For such jobs, hadoop mapred should expose job.local.dir via localized configuration.



[jira] Updated: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Amareshwari Sri Ramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sri Ramadasu updated HADOOP-2116:
---------------------------------------------

    Attachment: patch-2116.txt

> Job.local.dir to be exposed to tasks
> ------------------------------------
>
>                 Key: HADOOP-2116
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2116
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Amareshwari Sri Ramadasu
>             Fix For: 0.16.0
>
>         Attachments: patch-2116.txt
>
>
> Currently, since all task cwds are created under a jobcache directory, users that need a job-specific shared directory for use as scratch space, create ../work. This is hacky, and will break when HADOOP-2115 is addressed. For such jobs, hadoop mapred should expose job.local.dir via localized configuration.



[jira] Updated: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Amareshwari Sri Ramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sri Ramadasu updated HADOOP-2116:
---------------------------------------------

    Status: Patch Available  (was: Open)

Submitting again with fixes for streaming and the isolation runner.

> Job.local.dir to be exposed to tasks
> ------------------------------------
>
>                 Key: HADOOP-2116
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2116
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Amareshwari Sri Ramadasu
>             Fix For: 0.16.0
>
>         Attachments: patch-2116.txt, patch-2116.txt
>
>
> Currently, since all task cwds are created under a jobcache directory, users that need a job-specific shared directory for use as scratch space, create ../work. This is hacky, and will break when HADOOP-2115 is addressed. For such jobs, hadoop mapred should expose job.local.dir via localized configuration.



[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558150#action_12558150 ] 

Milind Bhandarkar commented on HADOOP-2116:
-------------------------------------------

Since this bug is scheduled for 0.16, having incompatible changes in that release is fine (as long as they are flagged as such in the release notes, of course).

> Job.local.dir to be exposed to tasks
> ------------------------------------
>
>                 Key: HADOOP-2116
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2116
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Amareshwari Sri Ramadasu
>             Fix For: 0.16.0
>
>         Attachments: patch-2116.txt, patch-2116.txt
>
>
> Currently, since all task cwds are created under a jobcache directory, users that need a job-specific shared directory for use as scratch space, create ../work. This is hacky, and will break when HADOOP-2115 is addressed. For such jobs, hadoop mapred should expose job.local.dir via localized configuration.



[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557418#action_12557418 ] 

Milind Bhandarkar commented on HADOOP-2116:
-------------------------------------------

Does this mean that all the taskdirs will again use the same partition?
That would be the opposite of HADOOP-2227, right?
That's not good performance-wise either, since all tasks would be using the same spindle.


> Job.local.dir to be exposed to tasks
> ------------------------------------
>
>                 Key: HADOOP-2116
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2116
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Amareshwari Sri Ramadasu
>             Fix For: 0.16.0
>
>         Attachments: patch-2116.txt
>
>
> Currently, since all task cwds are created under a jobcache directory, users that need a job-specific shared directory for use as scratch space, create ../work. This is hacky, and will break when HADOOP-2115 is addressed. For such jobs, hadoop mapred should expose job.local.dir via localized configuration.



[jira] Assigned: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Amareshwari Sri Ramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sri Ramadasu reassigned HADOOP-2116:
------------------------------------------------

    Assignee: Amareshwari Sri Ramadasu

> Job.local.dir to be exposed to tasks
> ------------------------------------
>
>                 Key: HADOOP-2116
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2116
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Amareshwari Sri Ramadasu
>             Fix For: 0.16.0
>
>
> Currently, since all task cwds are created under a jobcache directory, users that need a job-specific shared directory for use as scratch space, create ../work. This is hacky, and will break when HADOOP-2115 is addressed. For such jobs, hadoop mapred should expose job.local.dir via localized configuration.



[jira] Issue Comment Edited: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557918#action_12557918 ] 

owen.omalley edited comment on HADOOP-2116 at 1/10/08 10:42 PM:
-----------------------------------------------------------------

*Ugh* is right.

I'd propose some better names:

{code}
$local/work/$jobid/
       cache/          -- file cache
       jars/           -- expanded jar
       job.xml         -- the generic job conf
       $taskid/
             job.xml   -- task localized job conf
             output/   -- map outputs
             work/     -- cwd for task
{code}

with each of the leaf directories being placed independently on the partitions.

We should define localized attributes to point to where each of the leaf directories is.

I agree with Arun that we should re-work the -file option to use the file cache with symlinks.
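
To make the "localized attributes" idea concrete, here is a small sketch that maps the proposed layout to one attribute per leaf directory. The attribute names ("job.cache.dir" and so on) and the class name are hypothetical, since none are named above. For brevity the sketch resolves every leaf under a single $local root, whereas the proposal would place each leaf independently on the partitions.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: attribute names are illustrative only.
final class JobLayoutSketch {

    // Returns one attribute per leaf directory of the proposed layout.
    static Map<String, File> leafDirs(File local, String jobId, String taskId) {
        File jobRoot = new File(new File(local, "work"), jobId);
        File taskRoot = new File(jobRoot, taskId);

        Map<String, File> attrs = new LinkedHashMap<>();
        attrs.put("job.cache.dir", new File(jobRoot, "cache"));     // file cache
        attrs.put("job.jars.dir", new File(jobRoot, "jars"));       // expanded jar
        attrs.put("task.output.dir", new File(taskRoot, "output")); // map outputs
        attrs.put("task.work.dir", new File(taskRoot, "work"));     // cwd for task
        return attrs;
    }
}
```

With attributes like these, a task would look up each directory by name rather than assuming where it sits relative to its cwd.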
    

      was (Author: owen.omalley):
    *Ugh* is right.

I'd propose some better names:

$local/work/$jobid/
       cache/               -- file cache
       jars/                    -- expanded jar
       job.xml               -- the generic job conf
       $taskid/
             job.xml        -- task localized job conf
             output/         -- map outputs
             work/            -- cwd for task
 
with each of the leaf directories being placed independently on the partitions.

We should define localized attributes to point to where each of the leaf directories is.

I agree with Arun that we should re-work the -file option to use the file cache with symlinks.
    
  
> Job.local.dir to be exposed to tasks
> ------------------------------------
>
>                 Key: HADOOP-2116
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2116
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Amareshwari Sri Ramadasu
>             Fix For: 0.16.0
>
>         Attachments: patch-2116.txt, patch-2116.txt
>
>
> Currently, since all task cwds are created under a jobcache directory, users that need a job-specific shared directory for use as scratch space, create ../work. This is hacky, and will break when HADOOP-2115 is addressed. For such jobs, hadoop mapred should expose job.local.dir via localized configuration.



[jira] Updated: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Amareshwari Sri Ramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sri Ramadasu updated HADOOP-2116:
---------------------------------------------

    Status: Open  (was: Patch Available)

The streaming jobCacheDir still has to be fixed.

> Job.local.dir to be exposed to tasks
> ------------------------------------
>
>                 Key: HADOOP-2116
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2116
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Amareshwari Sri Ramadasu
>             Fix For: 0.16.0
>
>         Attachments: patch-2116.txt
>
>
> Currently, since all task cwds are created under a jobcache directory, users that need a job-specific shared directory for use as scratch space, create ../work. This is hacky, and will break when HADOOP-2115 is addressed. For such jobs, hadoop mapred should expose job.local.dir via localized configuration.



[jira] Updated: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Amareshwari Sri Ramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sri Ramadasu updated HADOOP-2116:
---------------------------------------------

    Attachment: patch-2116.txt

> Job.local.dir to be exposed to tasks
> ------------------------------------
>
>                 Key: HADOOP-2116
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2116
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Amareshwari Sri Ramadasu
>             Fix For: 0.16.0
>
>         Attachments: patch-2116.txt, patch-2116.txt
>
>
> Currently, since all task cwds are created under a jobcache directory, users that need a job-specific shared directory for use as scratch space, create ../work. This is hacky, and will break when HADOOP-2115 is addressed. For such jobs, hadoop mapred should expose job.local.dir via localized configuration.



[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Amareshwari Sri Ramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12561229#action_12561229 ] 

Amareshwari Sri Ramadasu commented on HADOOP-2116:
--------------------------------------------------

A clarification regarding distributed cache:
The current behavior of the distributed cache is that it is shared among jobs. The cache is localized under mapred/local/tasktracker/archive, i.e., if two jobs want to localize files with the same name, they actually share them unless the files have different timestamps. Whenever a task releases a cache, it decrements the reference count for the cache-id. The cache is cleaned up only when its size exceeds the allowed size (local.cache.size).
Is this the intended behavior, or should the cache be job-specific? With the directory structure that Owen has suggested, it seems the cache should be job-specific.
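
The behavior described above can be modeled in a few lines. This is a simplified sketch, not the actual DistributedCache code: the class is hypothetical, the cache-id is assumed to combine file name and timestamp, the size check is assumed to run on release, and cleanup is assumed to drop every unreferenced entry.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical, simplified model of a cache shared among jobs.
final class SharedCacheSketch {
    private static final class Entry { long size; int refCount; }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long allowedSize; // plays the role of local.cache.size
    private long totalSize;

    SharedCacheSketch(long allowedSize) { this.allowedSize = allowedSize; }

    // Same name and same timestamp give the same id, so the entry is shared.
    static String cacheId(String name, long timestamp) {
        return name + "@" + timestamp;
    }

    void acquire(String cacheId, long size) {
        Entry e = entries.get(cacheId);
        if (e == null) {
            e = new Entry();
            e.size = size;
            entries.put(cacheId, e);
            totalSize += size;
        }
        e.refCount++;
    }

    // Releasing only decrements the count; entries are dropped later, and
    // only once the total size exceeds the allowed size.
    void release(String cacheId) {
        Entry e = entries.get(cacheId);
        if (e != null && e.refCount > 0) e.refCount--;
        if (totalSize > allowedSize) cleanup();
    }

    private void cleanup() {
        Iterator<Entry> it = entries.values().iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            if (e.refCount == 0) { totalSize -= e.size; it.remove(); }
        }
    }

    boolean contains(String cacheId) { return entries.containsKey(cacheId); }
}
```

In this model an entry outlives the job that localized it until the size limit forces a cleanup, which is the cross-job sharing the question is about.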

> Job.local.dir to be exposed to tasks
> ------------------------------------
>
>                 Key: HADOOP-2116
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2116
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Amareshwari Sri Ramadasu
>             Fix For: 0.16.0
>
>         Attachments: patch-2116.txt, patch-2116.txt, patch-2116.txt, patch-2116.txt
>
>
> Currently, since all task cwds are created under a jobcache directory, users that need a job-specific shared directory for use as scratch space, create ../work. This is hacky, and will break when HADOOP-2115 is addressed. For such jobs, hadoop mapred should expose job.local.dir via localized configuration.



[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557429#action_12557429 ] 

Milind Bhandarkar commented on HADOOP-2116:
-------------------------------------------

In that case, +1 for this approach !

> Job.local.dir to be exposed to tasks
> ------------------------------------
>
>                 Key: HADOOP-2116
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2116
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Amareshwari Sri Ramadasu
>             Fix For: 0.16.0
>
>         Attachments: patch-2116.txt
>
>
> Currently, since all task cwds are created under a jobcache directory, users that need a job-specific shared directory for use as scratch space, create ../work. This is hacky, and will break when HADOOP-2115 is addressed. For such jobs, hadoop mapred should expose job.local.dir via localized configuration.



[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558209#action_12558209 ] 

Hadoop QA commented on HADOOP-2116:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12372896/patch-2116.txt
against trunk revision r611361.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs -1.  The patch appears to introduce 1 new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests -1.  The patch failed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1552/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1552/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1552/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1552/console

This message is automatically generated.

> Job.local.dir to be exposed to tasks
> ------------------------------------
>
>                 Key: HADOOP-2116
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2116
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Amareshwari Sri Ramadasu
>             Fix For: 0.16.0
>
>         Attachments: patch-2116.txt, patch-2116.txt
>
>
> Currently, since all task cwds are created under a jobcache directory, users that need a job-specific shared directory for use as scratch space, create ../work. This is hacky, and will break when HADOOP-2115 is addressed. For such jobs, hadoop mapred should expose job.local.dir via localized configuration.



[jira] Updated: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Amareshwari Sri Ramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sri Ramadasu updated HADOOP-2116:
---------------------------------------------

    Fix Version/s: 0.17.0

> Job.local.dir to be exposed to tasks
> ------------------------------------
>
>                 Key: HADOOP-2116
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2116
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Amareshwari Sri Ramadasu
>             Fix For: 0.17.0
>
>         Attachments: patch-2116.txt, patch-2116.txt, patch-2116.txt, patch-2116.txt
>
>
> Currently, since all task cwds are created under a jobcache directory, users that need a job-specific shared directory for use as scratch space, create ../work. This is hacky, and will break when HADOOP-2115 is addressed. For such jobs, hadoop mapred should expose job.local.dir via localized configuration.



[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557425#action_12557425 ] 

Devaraj Das commented on HADOOP-2116:
-------------------------------------

bq. Does this mean that all the taskdirs will again use the same partition?
No, the taskdirs will be on different disks (using the LocalDirAllocator). The common directories for all tasks of a given job will be the job.local.dir/jobCacheDir and the job_jar_xml (they will be configured/set up once per job using the LocalDirAllocator).
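
The split between once-per-job and per-task allocation can be pictured as follows. This sketch deliberately does not use LocalDirAllocator and does not reproduce its selection policy: a plain round-robin over the configured local directories is an assumed stand-in, and the class name is hypothetical.

```java
import java.io.File;
import java.util.List;

// Hypothetical round-robin stand-in for a local directory allocator.
final class RoundRobinDirsSketch {
    private final List<File> localDirs;
    private int next;

    RoundRobinDirsSketch(List<File> localDirs) {
        if (localDirs.isEmpty()) {
            throw new IllegalArgumentException("at least one local dir is required");
        }
        this.localDirs = localDirs;
    }

    // Each call picks the next local dir, so successive allocations spread
    // over the available disks.
    synchronized File allocate(String relativePath) {
        File root = localDirs.get(next);
        next = (next + 1) % localDirs.size();
        return new File(root, relativePath);
    }
}
```

In this picture the shared job directory is allocated once and its location is published (for example as job.local.dir), while each taskdir gets its own allocate() call and may therefore land on a different disk.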

> Job.local.dir to be exposed to tasks
> ------------------------------------
>
>                 Key: HADOOP-2116
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2116
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Amareshwari Sri Ramadasu
>             Fix For: 0.16.0
>
>         Attachments: patch-2116.txt
>
>
> Currently, since all task cwds are created under a jobcache directory, users that need a job-specific shared directory for use as scratch space, create ../work. This is hacky, and will break when HADOOP-2115 is addressed. For such jobs, hadoop mapred should expose job.local.dir via localized configuration.



[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks

Posted by "lohit vijayarenu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557807#action_12557807 ] 

lohit vijayarenu commented on HADOOP-2116:
------------------------------------------

Hi Amareshwari,

I tested this patch against trunk for resolution of HADOOP-2570, and it solves the problem mentioned there. Should this patch be marked to go into 0.15.3?

Thanks,
Lohit

> Job.local.dir to be exposed to tasks
> ------------------------------------
>
>                 Key: HADOOP-2116
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2116
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Amareshwari Sri Ramadasu
>             Fix For: 0.16.0
>
>         Attachments: patch-2116.txt, patch-2116.txt
>
>
> Currently, since all task cwds are created under a jobcache directory, users that need a job-specific shared directory for use as scratch space, create ../work. This is hacky, and will break when HADOOP-2115 is addressed. For such jobs, hadoop mapred should expose job.local.dir via localized configuration.
