You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/02/21 23:38:34 UTC

[jira] Created: (HADOOP-52) mapred input and output dirs must be absolute

mapred input and output dirs must be absolute
---------------------------------------------

         Key: HADOOP-52
         URL: http://issues.apache.org/jira/browse/HADOOP-52
     Project: Hadoop
        Type: Bug
  Components: mapred  
    Versions: 0.1    
    Reporter: Doug Cutting
     Fix For: 0.1


DFS converts relative pathnames to be under /user/$USER.  But MapReduce jobs may be submitted by a different user than is running the jobtracker and tasktracker.  Thus relative paths must be resolved before a job is submitted, so that only absolute paths are seen on the job tracker and tasktracker.  I think the simplest way to fix this is to make JobConf.setInputDir(), setOutputDir(), etc. resolve relative pathnames. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (HADOOP-52) mapred input and output dirs must be absolute

Posted by "chris (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-52?page=comments#action_12371761 ] 

chris commented on HADOOP-52:
-----------------------------

Sorry
but It seems this code will thown IOException in windows cygwin env......

it can't get file path correctly.

> mapred input and output dirs must be absolute
> ---------------------------------------------
>
>          Key: HADOOP-52
>          URL: http://issues.apache.org/jira/browse/HADOOP-52
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.1
>     Reporter: Doug Cutting
>     Assignee: Owen O'Malley
>      Fix For: 0.1
>  Attachments: cwd.patch, cwd2.patch, cwd3.patch
>
> DFS converts relative pathnames to be under /user/$USER.  But MapReduce jobs may be submitted by a different user than is running the jobtracker and tasktracker.  Thus relative paths must be resolved before a job is submitted, so that only absolute paths are seen on the job tracker and tasktracker.  I think the simplest way to fix this is to make JobConf.setInputDir(), setOutputDir(), etc. resolve relative pathnames. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Assigned: (HADOOP-52) mapred input and output dirs must be absolute

Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-52?page=all ]

Sameer Paranjpye reassigned HADOOP-52:
--------------------------------------

    Assign To: Owen O'Malley  (was: Sameer Paranjpye)

> mapred input and output dirs must be absolute
> ---------------------------------------------
>
>          Key: HADOOP-52
>          URL: http://issues.apache.org/jira/browse/HADOOP-52
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.1
>     Reporter: Doug Cutting
>     Assignee: Owen O'Malley
>      Fix For: 0.1

>
> DFS converts relative pathnames to be under /user/$USER.  But MapReduce jobs may be submitted by a different user than is running the jobtracker and tasktracker.  Thus relative paths must be resolved before a job is submitted, so that only absolute paths are seen on the job tracker and tasktracker.  I think the simplest way to fix this is to make JobConf.setInputDir(), setOutputDir(), etc. resolve relative pathnames. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (HADOOP-52) mapred input and output dirs must be absolute

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-52?page=comments#action_12371476 ] 

Owen O'Malley commented on HADOOP-52:
-------------------------------------

The synchronized (fs) will help the case where the application submits two jobs to the LocalJobRunner with different working directories. Because the LocalJobRunner obeys the synchronization with respect to the file system object, it should work. (I don't have any parallel map/reduce job applications to test it with and testing for missing synchronization is almost impossible anyways.)

> mapred input and output dirs must be absolute
> ---------------------------------------------
>
>          Key: HADOOP-52
>          URL: http://issues.apache.org/jira/browse/HADOOP-52
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.1
>     Reporter: Doug Cutting
>     Assignee: Owen O'Malley
>      Fix For: 0.1
>  Attachments: cwd.patch, cwd2.patch
>
> DFS converts relative pathnames to be under /user/$USER.  But MapReduce jobs may be submitted by a different user than is running the jobtracker and tasktracker.  Thus relative paths must be resolved before a job is submitted, so that only absolute paths are seen on the job tracker and tasktracker.  I think the simplest way to fix this is to make JobConf.setInputDir(), setOutputDir(), etc. resolve relative pathnames. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (HADOOP-52) mapred input and output dirs must be absolute

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-52?page=comments#action_12371765 ] 

Owen O'Malley commented on HADOOP-52:
-------------------------------------

Can you please include the text from the exception and the stack trace? Depending on where the exception was thrown, it might be in the server log files.

> mapred input and output dirs must be absolute
> ---------------------------------------------
>
>          Key: HADOOP-52
>          URL: http://issues.apache.org/jira/browse/HADOOP-52
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.1
>     Reporter: Doug Cutting
>     Assignee: Owen O'Malley
>      Fix For: 0.1
>  Attachments: cwd.patch, cwd2.patch, cwd3.patch
>
> DFS converts relative pathnames to be under /user/$USER.  But MapReduce jobs may be submitted by a different user than is running the jobtracker and tasktracker.  Thus relative paths must be resolved before a job is submitted, so that only absolute paths are seen on the job tracker and tasktracker.  I think the simplest way to fix this is to make JobConf.setInputDir(), setOutputDir(), etc. resolve relative pathnames. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (HADOOP-52) mapred input and output dirs must be absolute

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-52?page=all ]

Doug Cutting updated HADOOP-52:
-------------------------------

    Assign To: Sameer Paranjpye

> mapred input and output dirs must be absolute
> ---------------------------------------------
>
>          Key: HADOOP-52
>          URL: http://issues.apache.org/jira/browse/HADOOP-52
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.1
>     Reporter: Doug Cutting
>     Assignee: Sameer Paranjpye
>      Fix For: 0.1

>
> DFS converts relative pathnames to be under /user/$USER.  But MapReduce jobs may be submitted by a different user than is running the jobtracker and tasktracker.  Thus relative paths must be resolved before a job is submitted, so that only absolute paths are seen on the job tracker and tasktracker.  I think the simplest way to fix this is to make JobConf.setInputDir(), setOutputDir(), etc. resolve relative pathnames. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (HADOOP-52) mapred input and output dirs must be absolute

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-52?page=comments#action_12371471 ] 

Doug Cutting commented on HADOOP-52:
------------------------------------

The 'synchronize (fs) ...' is worthless unless everyone does it, which they don't, so I wouldn't bother.

I think you're right that my concern about LocalJobRunner is misplaced.  It is not a public class, and a new instance is created for each job, so there's currently no way to have multiple jobs sharing a LocalJobRunner.

Longer term, if this becomes an issue, I think making FileSystem cloneable is preferable to synchronization.


> mapred input and output dirs must be absolute
> ---------------------------------------------
>
>          Key: HADOOP-52
>          URL: http://issues.apache.org/jira/browse/HADOOP-52
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.1
>     Reporter: Doug Cutting
>     Assignee: Owen O'Malley
>      Fix For: 0.1
>  Attachments: cwd.patch
>
> DFS converts relative pathnames to be under /user/$USER.  But MapReduce jobs may be submitted by a different user than is running the jobtracker and tasktracker.  Thus relative paths must be resolved before a job is submitted, so that only absolute paths are seen on the job tracker and tasktracker.  I think the simplest way to fix this is to make JobConf.setInputDir(), setOutputDir(), etc. resolve relative pathnames. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (HADOOP-52) mapred input and output dirs must be absolute

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-52?page=comments#action_12371466 ] 

Owen O'Malley commented on HADOOP-52:
-------------------------------------

I'll go ahead and update the variable names to camelCast.

The LocalJobRunner only has a problem when the user is submitting multiple jobs at the same time, right?

I knew I was opening a can of worms with local directories with respect to threads. There are lots of potential solutions including making the current directory thread local. (Although the LocalFileSystem would have discontinuities, because it also sets the system property, which is by definition global.) I think for now that we are better off following the unix semantics of treating the working directory as global.

For now, I'm proposing synchronizing on the file system, like:

        synchronize (fs) {
          setWorkingDirectory(job, fs);
          MapTask map = new MapTask(file, (String)mapIds.get(i), splits[i]);
        }

around each of the places where the LocalJobRunner sets the working directory.

On a side note, if we are worried about local runners with multiple jobs, we should put synchronization in around the updates of the jobs list and other fields of the LocalJobRunner.

> mapred input and output dirs must be absolute
> ---------------------------------------------
>
>          Key: HADOOP-52
>          URL: http://issues.apache.org/jira/browse/HADOOP-52
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.1
>     Reporter: Doug Cutting
>     Assignee: Owen O'Malley
>      Fix For: 0.1
>  Attachments: cwd.patch
>
> DFS converts relative pathnames to be under /user/$USER.  But MapReduce jobs may be submitted by a different user than is running the jobtracker and tasktracker.  Thus relative paths must be resolved before a job is submitted, so that only absolute paths are seen on the job tracker and tasktracker.  I think the simplest way to fix this is to make JobConf.setInputDir(), setOutputDir(), etc. resolve relative pathnames. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (HADOOP-52) mapred input and output dirs must be absolute

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-52?page=comments#action_12371483 ] 

Owen O'Malley commented on HADOOP-52:
-------------------------------------

By the way, i didn't see your comment from 10:17 when I submitted my patch at 10:18.

I'll go ahead and unsynchronize LocalJobRunner and roll a new patch.

I guess it does make sense to make FileSystem's clonable, even if LocalFileSystem sets the global system property. We can do that later.

> mapred input and output dirs must be absolute
> ---------------------------------------------
>
>          Key: HADOOP-52
>          URL: http://issues.apache.org/jira/browse/HADOOP-52
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.1
>     Reporter: Doug Cutting
>     Assignee: Owen O'Malley
>      Fix For: 0.1
>  Attachments: cwd.patch, cwd2.patch
>
> DFS converts relative pathnames to be under /user/$USER.  But MapReduce jobs may be submitted by a different user than is running the jobtracker and tasktracker.  Thus relative paths must be resolved before a job is submitted, so that only absolute paths are seen on the job tracker and tasktracker.  I think the simplest way to fix this is to make JobConf.setInputDir(), setOutputDir(), etc. resolve relative pathnames. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (HADOOP-52) mapred input and output dirs must be absolute

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-52?page=all ]

Owen O'Malley updated HADOOP-52:
--------------------------------

    Attachment: cwd.patch

Here is a patch that fixes the problem. 

It does:
   1. It adds {set,get}WorkingDirectory to FileSystem.
   2. It implements them in both LocalFileSystem and DFS.
   3. The LocalFileSystem implementation both sets the System property user.dir and does an explicit
        conversion to absolute filenames at the API.
   4. Added new junit test cases to test the WorkingDirectory functionality.
   5. Added a utility class in the test directory to create a single-process DFS cluster for junit tests.
   6. Added the user name into the JobConf.
   7. Added the user name into the JobProfile.
   8. Added the user name into the webapp, so you can see who ran the job.
   9. Added the working directory in the default file system to the JobConf.
   10. Set the job's working directory before starting the user's Map or Reduce code. (The input splitter is given an absolute pathname
          for the input directory, but the working directory is not set, since it is done in the context of the JobTracker.)
   11. Changed the format of the percentage complete in the webapp to be ##0.00 so that you don't get 16 digits of meaningless precision
          about your job status.

> mapred input and output dirs must be absolute
> ---------------------------------------------
>
>          Key: HADOOP-52
>          URL: http://issues.apache.org/jira/browse/HADOOP-52
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.1
>     Reporter: Doug Cutting
>     Assignee: Owen O'Malley
>      Fix For: 0.1
>  Attachments: cwd.patch
>
> DFS converts relative pathnames to be under /user/$USER.  But MapReduce jobs may be submitted by a different user than is running the jobtracker and tasktracker.  Thus relative paths must be resolved before a job is submitted, so that only absolute paths are seen on the job tracker and tasktracker.  I think the simplest way to fix this is to make JobConf.setInputDir(), setOutputDir(), etc. resolve relative pathnames. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Work started: (HADOOP-52) mapred input and output dirs must be absolute

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-52?page=all ]
     
Work on HADOOP-52 started by Owen O'Malley

> mapred input and output dirs must be absolute
> ---------------------------------------------
>
>          Key: HADOOP-52
>          URL: http://issues.apache.org/jira/browse/HADOOP-52
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.1
>     Reporter: Doug Cutting
>     Assignee: Owen O'Malley
>      Fix For: 0.1
>  Attachments: cwd.patch, cwd2.patch
>
> DFS converts relative pathnames to be under /user/$USER.  But MapReduce jobs may be submitted by a different user than is running the jobtracker and tasktracker.  Thus relative paths must be resolved before a job is submitted, so that only absolute paths are seen on the job tracker and tasktracker.  I think the simplest way to fix this is to make JobConf.setInputDir(), setOutputDir(), etc. resolve relative pathnames. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (HADOOP-52) mapred input and output dirs must be absolute

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-52?page=comments#action_12371470 ] 

Owen O'Malley commented on HADOOP-52:
-------------------------------------

I'll go ahead and update the variable names to camelCast.

The LocalJobRunner only has a problem when the user is submitting multiple jobs at the same time, right?

I knew I was opening a can of worms with local directories with respect to threads. There are lots of potential solutions including making the current directory thread local. (Although the LocalFileSystem would have discontinuities, because it also sets the system property, which is by definition global.) I think for now that we are better off following the unix semantics of treating the working directory as global.

For now, I'm proposing synchronizing on the file system, like:

        synchronize (fs) {
          setWorkingDirectory(job, fs);
          MapTask map = new MapTask(file, (String)mapIds.get(i), splits[i]);
        }

around each of the places where the LocalJobRunner sets the working directory.

On a side note, if we are worried about local runners with multiple jobs, we should put synchronization in around the updates of the jobs list and other fields of the LocalJobRunner.

> mapred input and output dirs must be absolute
> ---------------------------------------------
>
>          Key: HADOOP-52
>          URL: http://issues.apache.org/jira/browse/HADOOP-52
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.1
>     Reporter: Doug Cutting
>     Assignee: Owen O'Malley
>      Fix For: 0.1
>  Attachments: cwd.patch
>
> DFS converts relative pathnames to be under /user/$USER.  But MapReduce jobs may be submitted by a different user than is running the jobtracker and tasktracker.  Thus relative paths must be resolved before a job is submitted, so that only absolute paths are seen on the job tracker and tasktracker.  I think the simplest way to fix this is to make JobConf.setInputDir(), setOutputDir(), etc. resolve relative pathnames. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (HADOOP-52) mapred input and output dirs must be absolute

Posted by "chris (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-52?page=comments#action_12371823 ] 

chris commented on HADOOP-52:
-----------------------------

I just update to the lately, start all the namenode, datanode, jobtracker and tasktracker

run nutch command (crawl / readdb etc.), and jobtracker thrown exception,


060325 001911 Property 'file.separator' is \
060325 001911 Property 'java.vendor.url.bug' is http://java.sun.com/cgi-bin/bugr
eport.cgi
060325 001911 Property 'sun.io.unicode.encoding' is UnicodeLittle
060325 001911 Property 'sun.cpu.endian' is little
060325 001911 Property 'sun.desktop' is windows
060325 001911 Property 'sun.cpu.isalist' is
060325 001911 Version Jetty/5.1.4
060325 001911 Checking Resource aliases
060325 001912 Started org.mortbay.jetty.servlet.WebApplicationHandler@1d7fbfb
060325 001912 Started WebApplicationContext[/,/]
060325 001912 Started SocketListener on 0.0.0.0:50030
060325 001912 Started org.mortbay.jetty.Server@c88440
060325 001921 Server connection on port 50020 from 192.168.1.10: starting
060325 001927 Server connection on port 50020 from 192.168.1.11: starting
060325 001933 Server connection on port 50020 from 192.168.1.12: starting
060325 002240 parsing file:/D:/jobcall_trunk/hadoop/conf/hadoop-default.xml
060325 002240 parsing file:/D:/jobcall_trunk/hadoop/conf/mapred-default.xml
060325 002240 parsing file:/D:/jobcall_trunk/hadoop/conf/hadoop-site.xml
060325 002240 parsing file:/D:/jobcall_trunk/hadoop/conf/hadoop-default.xml
060325 002240 parsing file:/D:/jobcall_trunk/hadoop/conf/mapred-default.xml
060325 002240 parsing \tmp\hadoop\mapred\local\job_3dgnm3.xml\jobTracker
060325 002240 parsing file:/D:/jobcall_trunk/hadoop/conf/hadoop-site.xml
060325 002258 parsing file:/D:/jobcall_trunk/hadoop/conf/hadoop-default.xml
060325 002258 parsing file:/D:/jobcall_trunk/hadoop/conf/mapred-default.xml
060325 002258 parsing \tmp\hadoop\mapred\local\job_3dgnm3.xml\jobTracker
060325 002258 parsing file:/D:/jobcall_trunk/hadoop/conf/hadoop-site.xml
060325 002258 job init failed
java.io.IOException: No input directories specified in: Configuration: defaults:
 hadoop-default.xml , mapred-default.xml , \tmp\hadoop\mapred\local\job_3dgnm3.x
ml\jobTrackerfinal: hadoop-site.xml
        at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.ja
va:90)
        at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.ja
va:100)
        at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:1
30)
        at org.apache.hadoop.mapred.JobTracker$JobInitThread.run(JobTracker.java
:204)
        at java.lang.Thread.run(Thread.java:595)
060325 002300 Server connection on port 50020 from 192.168.1.12: exiting


> mapred input and output dirs must be absolute
> ---------------------------------------------
>
>          Key: HADOOP-52
>          URL: http://issues.apache.org/jira/browse/HADOOP-52
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.1
>     Reporter: Doug Cutting
>     Assignee: Owen O'Malley
>      Fix For: 0.1
>  Attachments: cwd.patch, cwd2.patch, cwd3.patch
>
> DFS converts relative pathnames to be under /user/$USER.  But MapReduce jobs may be submitted by a different user than is running the jobtracker and tasktracker.  Thus relative paths must be resolved before a job is submitted, so that only absolute paths are seen on the job tracker and tasktracker.  I think the simplest way to fix this is to make JobConf.setInputDir(), setOutputDir(), etc. resolve relative pathnames. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (HADOOP-52) mapred input and output dirs must be absolute

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-52?page=all ]

Owen O'Malley updated HADOOP-52:
--------------------------------

    Attachment: cwd2.patch

Ok, here is an updated patch that uses camelCase everywhere for variable names.

I also put in synchronization between the setWorkingDirectory and running the 
application code in the LocalJobRunner.

> mapred input and output dirs must be absolute
> ---------------------------------------------
>
>          Key: HADOOP-52
>          URL: http://issues.apache.org/jira/browse/HADOOP-52
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.1
>     Reporter: Doug Cutting
>     Assignee: Owen O'Malley
>      Fix For: 0.1
>  Attachments: cwd.patch, cwd2.patch
>
> DFS converts relative pathnames to be under /user/$USER.  But MapReduce jobs may be submitted by a different user than is running the jobtracker and tasktracker.  Thus relative paths must be resolved before a job is submitted, so that only absolute paths are seen on the job tracker and tasktracker.  I think the simplest way to fix this is to make JobConf.setInputDir(), setOutputDir(), etc. resolve relative pathnames. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (HADOOP-52) mapred input and output dirs must be absolute

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-52?page=all ]

Owen O'Malley updated HADOOP-52:
--------------------------------

    Comment: was deleted

> mapred input and output dirs must be absolute
> ---------------------------------------------
>
>          Key: HADOOP-52
>          URL: http://issues.apache.org/jira/browse/HADOOP-52
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.1
>     Reporter: Doug Cutting
>     Assignee: Owen O'Malley
>      Fix For: 0.1
>  Attachments: cwd.patch
>
> DFS converts relative pathnames to be under /user/$USER.  But MapReduce jobs may be submitted by a different user than is running the jobtracker and tasktracker.  Thus relative paths must be resolved before a job is submitted, so that only absolute paths are seen on the job tracker and tasktracker.  I think the simplest way to fix this is to make JobConf.setInputDir(), setOutputDir(), etc. resolve relative pathnames. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (HADOOP-52) mapred input and output dirs must be absolute

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-52?page=comments#action_12371451 ] 

Doug Cutting commented on HADOOP-52:
------------------------------------

Overall this looks great!  A few minor comments:

. your new local variables and non-static fields use underscores as word separators, rather than camelCase, as is both the Java convention and the convention used in the files you're changing.  Can you please switch these to use camelCase?

. In LocalJobRunner there are potentially multiple threads changing the working directory of a single fs.  Perhaps we should instead have a different fs instance for each job thread?  Should we make FileSystem support clone()?

Thanks!


> mapred input and output dirs must be absolute
> ---------------------------------------------
>
>          Key: HADOOP-52
>          URL: http://issues.apache.org/jira/browse/HADOOP-52
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.1
>     Reporter: Doug Cutting
>     Assignee: Owen O'Malley
>      Fix For: 0.1
>  Attachments: cwd.patch
>
> DFS converts relative pathnames to be under /user/$USER.  But MapReduce jobs may be submitted by a different user than is running the jobtracker and tasktracker.  Thus relative paths must be resolved before a job is submitted, so that only absolute paths are seen on the job tracker and tasktracker.  I think the simplest way to fix this is to make JobConf.setInputDir(), setOutputDir(), etc. resolve relative pathnames. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (HADOOP-52) mapred input and output dirs must be absolute

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-52?page=all ]

Owen O'Malley updated HADOOP-52:
--------------------------------

    Attachment: cwd3.patch

Ok, this patch includes cwd2.patch with the synchronization in LocalJobRunner rolled back.

> mapred input and output dirs must be absolute
> ---------------------------------------------
>
>          Key: HADOOP-52
>          URL: http://issues.apache.org/jira/browse/HADOOP-52
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.1
>     Reporter: Doug Cutting
>     Assignee: Owen O'Malley
>      Fix For: 0.1
>  Attachments: cwd.patch, cwd2.patch, cwd3.patch
>
> DFS converts relative pathnames to be under /user/$USER.  But MapReduce jobs may be submitted by a different user than is running the jobtracker and tasktracker.  Thus relative paths must be resolved before a job is submitted, so that only absolute paths are seen on the job tracker and tasktracker.  I think the simplest way to fix this is to make JobConf.setInputDir(), setOutputDir(), etc. resolve relative pathnames. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Resolved: (HADOOP-52) mapred input and output dirs must be absolute

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-52?page=all ]
     
Doug Cutting resolved HADOOP-52:
--------------------------------

    Resolution: Fixed

I have committed this.  Thanks for adding unit tests! I had to add a Thread.sleep(1000) to the dfs test code before the tests would pass, to give the datanode a chance to introduce itself to the namenode before the client code started using the fs.  Perhaps this is just masking a bug.  Perhaps DFSClient should sleep and retry when NameNode.create() throws an exception?

> mapred input and output dirs must be absolute
> ---------------------------------------------
>
>          Key: HADOOP-52
>          URL: http://issues.apache.org/jira/browse/HADOOP-52
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.1
>     Reporter: Doug Cutting
>     Assignee: Owen O'Malley
>      Fix For: 0.1
>  Attachments: cwd.patch, cwd2.patch, cwd3.patch
>
> DFS converts relative pathnames to be under /user/$USER.  But MapReduce jobs may be submitted by a different user than is running the jobtracker and tasktracker.  Thus relative paths must be resolved before a job is submitted, so that only absolute paths are seen on the job tracker and tasktracker.  I think the simplest way to fix this is to make JobConf.setInputDir(), setOutputDir(), etc. resolve relative pathnames. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (HADOOP-52) mapred input and output dirs must be absolute

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-52?page=comments#action_12371479 ] 

Doug Cutting commented on HADOOP-52:
------------------------------------

The synchronization fixes the problem, but causes everything to run sequentially.  I think with the same amount of code we could add a clone() method to FileSystem and provide a better fix.  Currently one cannot pass a FileSystem instance to JobClient, but if one could, then we would be changing the CWD of that filesystem underneath other processes that might be using it w/o synchronization.

If we want to be totally safe, we could make setWorkingDirectory return a clone, with no side-effects.  Cloning an FS should be cheap enough.

> mapred input and output dirs must be absolute
> ---------------------------------------------
>
>          Key: HADOOP-52
>          URL: http://issues.apache.org/jira/browse/HADOOP-52
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.1
>     Reporter: Doug Cutting
>     Assignee: Owen O'Malley
>      Fix For: 0.1
>  Attachments: cwd.patch, cwd2.patch
>
> DFS converts relative pathnames to be under /user/$USER.  But MapReduce jobs may be submitted by a different user than is running the jobtracker and tasktracker.  Thus relative paths must be resolved before a job is submitted, so that only absolute paths are seen on the job tracker and tasktracker.  I think the simplest way to fix this is to make JobConf.setInputDir(), setOutputDir(), etc. resolve relative pathnames. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira