Posted to user@hadoop.apache.org by Parth Savani <pa...@sensenetworks.com> on 2012/10/23 19:32:39 UTC

File Permissions on s3 FileSystem

Hello Everyone,
        I am trying to run a Hadoop job with s3n as my filesystem.
I changed the following properties in my hdfs-site.xml:

fs.default.name=s3n://KEY:VALUE@bucket/
mapreduce.jobtracker.staging.root.dir=s3n://KEY:VALUE@bucket/tmp

When I run the job from EC2, I get the following error:

The ownership on the staging directory
s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging
is not as expected. It is owned by   The directory must be owned by the
submitter ec2-user or by ec2-user
at
org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:481)

I am using the Cloudera CDH4 Hadoop distribution. The error is thrown from
the JobSubmissionFiles.java class:
 public static Path getStagingDir(JobClient client, Configuration conf)
  throws IOException, InterruptedException {
    Path stagingArea = client.getStagingAreaDir();
    FileSystem fs = stagingArea.getFileSystem(conf);
    String realUser;
    String currentUser;
    UserGroupInformation ugi = UserGroupInformation.getLoginUser();
    realUser = ugi.getShortUserName();
    currentUser = UserGroupInformation.getCurrentUser().getShortUserName();
    if (fs.exists(stagingArea)) {
      FileStatus fsStatus = fs.getFileStatus(stagingArea);
      String owner = fsStatus.getOwner();
      if (!(owner.equals(currentUser) || owner.equals(realUser))) {
         throw new IOException("The ownership on the staging directory " +
                      stagingArea + " is not as expected. " +
                      "It is owned by " + owner + ". The directory must " +
                      "be owned by the submitter " + currentUser + " or " +
                      "by " + realUser);
      }
      if (!fsStatus.getPermission().equals(JOB_DIR_PERMISSION)) {
        LOG.info("Permissions on staging directory " + stagingArea + " are " +
          "incorrect: " + fsStatus.getPermission() + ". Fixing permissions " +
          "to correct value " + JOB_DIR_PERMISSION);
        fs.setPermission(stagingArea, JOB_DIR_PERMISSION);
      }
    } else {
      fs.mkdirs(stagingArea,
          new FsPermission(JOB_DIR_PERMISSION));
    }
    return stagingArea;
  }



I think my job calls getOwner(), which returns NULL since S3 does not have
file permissions, and that results in the IOException I am getting.
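
For reference, a minimal sketch (bucket, credentials and path below are
placeholders, not the real values) that prints what the s3n FileStatus
reports as the owner of the staging path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckStagingOwner {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder credentials; use your own keys or set them in core-site.xml.
    conf.set("fs.s3n.awsAccessKeyId", "AWS_ACCESS_KEY_ID");
    conf.set("fs.s3n.awsSecretAccessKey", "AWS_SECRET_ACCESS_KEY");

    // Placeholder staging path on s3n.
    Path staging = new Path("s3n://your-bucket/tmp/ec2-user/.staging");
    FileSystem fs = staging.getFileSystem(conf);
    if (fs.exists(staging)) {
      FileStatus status = fs.getFileStatus(staging);
      // On s3n the owner typically comes back empty, which is what makes
      // the ownership check in JobSubmissionFiles.getStagingDir() fail.
      System.out.println("owner = '" + status.getOwner() + "'");
      System.out.println("permission = " + status.getPermission());
    } else {
      System.out.println("Staging directory does not exist yet.");
    }
  }
}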

Any workaround for this? Any idea how I could use S3 as the filesystem with
Hadoop in distributed mode?

Re: File Permissions on s3 FileSystem

Posted by Marcos Ortiz <ml...@uci.cu>.
On 23/10/12 13:32, Parth Savani wrote:
> Hello Everyone,
>         I am trying to run a hadoop job with s3n as my filesystem.
> I changed the following properties in my hdfs-site.xml
>
> fs.default.name <http://fs.default.name>=s3n://KEY:VALUE@bucket/
A good practice is to set these two properties in core-site.xml if you
will use S3 often:
<property>
     <name>fs.s3.awsAccessKeyId</name>
     <value>AWS_ACCESS_KEY_ID</value>
</property>

<property>
     <name>fs.s3.awsSecretAccessKey</name>
     <value>AWS_SECRET_ACCESS_KEY</value>
</property>

After that, you can access your URIs in a friendlier way:
S3:
  s3://<s3-bucket>/<s3-filepath>

S3n:
  s3n://<s3-bucket>/<s3-filepath>
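
For s3n:// URIs, the analogous fs.s3n.awsAccessKeyId and
fs.s3n.awsSecretAccessKey properties apply. A minimal sketch of the same
idea set programmatically (bucket name and key values are placeholders):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListS3nBucket {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Same idea as the core-site.xml properties, set in code instead.
    // fs.s3.* is read for s3:// URIs, fs.s3n.* for s3n:// URIs.
    conf.set("fs.s3n.awsAccessKeyId", "AWS_ACCESS_KEY_ID");
    conf.set("fs.s3n.awsSecretAccessKey", "AWS_SECRET_ACCESS_KEY");

    // No credentials embedded in the URI any more.
    FileSystem fs = FileSystem.get(URI.create("s3n://your-bucket/"), conf);
    for (FileStatus status : fs.listStatus(new Path("s3n://your-bucket/"))) {
      System.out.println(status.getPath());
    }
  }
}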

> mapreduce.jobtracker.staging.root.dir=s3n://KEY:VALUE@bucket/tmp
>
> When i run the job from ec2, I get the following error
>
> The ownership on the staging directory 
> s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging is not as expected. It is 
> owned by   The directory must be owned by the submitter ec2-user or by 
> ec2-user
> at 
> org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:481)
>
> I am using cloudera CDH4 hadoop distribution. The error is thrown from 
> JobSubmissionFiles.java class
>  public static Path getStagingDir(JobClient client, Configuration conf)
>   throws IOException, InterruptedException {
>     Path stagingArea = client.getStagingAreaDir();
>     FileSystem fs = stagingArea.getFileSystem(conf);
>     String realUser;
>     String currentUser;
>     UserGroupInformation ugi = UserGroupInformation.getLoginUser();
>     realUser = ugi.getShortUserName();
>     currentUser = 
> UserGroupInformation.getCurrentUser().getShortUserName();
>     if (fs.exists(stagingArea)) {
>       FileStatus fsStatus = fs.getFileStatus(stagingArea);
>       String owner = fsStatus.getOwner();
>       if (!(owner.equals(currentUser) || owner.equals(realUser))) {
>          throw new IOException("The ownership on the staging directory " +
>                       stagingArea + " is not as expected. " +
>                       "It is owned by " + owner + ". The directory must " +
>                       "be owned by the submitter " + currentUser + " or " +
>                       "by " + realUser);
>       }
>       if (!fsStatus.getPermission().equals(JOB_DIR_PERMISSION)) {
>         LOG.info("Permissions on staging directory " + stagingArea + " are " +
>           "incorrect: " + fsStatus.getPermission() + ". Fixing permissions " +
>           "to correct value " + JOB_DIR_PERMISSION);
>         fs.setPermission(stagingArea, JOB_DIR_PERMISSION);
>       }
>     } else {
>       fs.mkdirs(stagingArea,
>           new FsPermission(JOB_DIR_PERMISSION));
>     }
>     return stagingArea;
>   }
>
>
> I think my job calls getOwner() which returns NULL since s3 does not 
> have file permissions which results in the IO exception that i am 
> getting.
Which user are you launching the job as in EC2?


>
> Any workaround for this? Any idea how i could you s3 as the filesystem 
> with hadoop on distributed mode?

Look here:
http://wiki.apache.org/hadoop/AmazonS3




Re: File Permissions on s3 FileSystem

Posted by Harsh J <ha...@cloudera.com>.
Parth,

I think your problems are easier to solve if you run a 1-node HDFS as
the stage area for MR (i.e. JT FS is HDFS), and just do the I/O of
actual data over S3 (i.e. input and output paths for jobs are s3 or
s3n prefixed).

The JIRA you mention does have interesting workarounds, but the mere
fact that S3 doesn't support a permissions model may break other
places in MR where we do permission logic for security reasons. You
could get away with one of the mentioned source hacks, but that won't
necessarily guarantee you'll solve all problems, because we don't
test MR running atop S3, though I think we do test S3 as a general FS
for I/O.
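
A minimal sketch of that split at job-submission time, with placeholder
namenode host, bucket and key values (the mapper/reducer setup is
whatever the job already uses):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class S3nIoWithHdfsStaging {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The framework filesystem stays on HDFS, so staging dirs, job jars
    // and the ownership/permission checks all behave as MR expects.
    conf.set("fs.default.name", "hdfs://namenode-host:8020");
    // Credentials for the s3n input/output paths (placeholders).
    conf.set("fs.s3n.awsAccessKeyId", "AWS_ACCESS_KEY_ID");
    conf.set("fs.s3n.awsSecretAccessKey", "AWS_SECRET_ACCESS_KEY");

    Job job = new Job(conf, "s3n-io-example");
    job.setJarByClass(S3nIoWithHdfsStaging.class);
    // Only the actual data I/O goes over S3.
    FileInputFormat.addInputPath(job, new Path("s3n://your-bucket/input"));
    FileOutputFormat.setOutputPath(job, new Path("s3n://your-bucket/output"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}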

On Fri, Oct 26, 2012 at 1:22 AM, Parth Savani <pa...@sensenetworks.com> wrote:
> Hello Harsh,
>          I am following steps based on this link:
> http://wiki.apache.org/hadoop/AmazonS3
>
> When i run the job, I am seeing that the hadoop places all the jars required
> for the job on s3. However, when it tries to run the job, it complains
> The ownership on the staging directory
> s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging is not as expected. It is owned
> by   The directory must be owned by the submitter ec2-user or by ec2-user
>
> Some people have seemed to solved this problem of permissions here ->
> https://issues.apache.org/jira/browse/HDFS-1333
> But they have made changes to some hadoop java classes and I wonder if
> there's an easy workaround.
>
>
> On Wed, Oct 24, 2012 at 12:21 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Hey Parth,
>>
>> I don't think its possible to run MR by basing the FS over S3
>> completely. You can use S3 for I/O for your files, but your
>> fs.default.name (or fs.defaultFS) must be either file:/// or hdfs://
>> filesystems. This way, your MR framework can run/distribute its files
>> well, and also still be able to process S3 URLs passed as input or
>> output locations.
>>
>> On Tue, Oct 23, 2012 at 11:02 PM, Parth Savani <pa...@sensenetworks.com>
>> wrote:
>> > Hello Everyone,
>> >         I am trying to run a hadoop job with s3n as my filesystem.
>> > I changed the following properties in my hdfs-site.xml
>> >
>> > fs.default.name=s3n://KEY:VALUE@bucket/
>> > mapreduce.jobtracker.staging.root.dir=s3n://KEY:VALUE@bucket/tmp
>> >
>> > When i run the job from ec2, I get the following error
>> >
>> > The ownership on the staging directory
>> > s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging is not as expected. It is
>> > owned
>> > by   The directory must be owned by the submitter ec2-user or by
>> > ec2-user
>> > at
>> >
>> > org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
>> > at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
>> > at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
>> > at java.security.AccessController.doPrivileged(Native Method)
>> > at javax.security.auth.Subject.doAs(Subject.java:415)
>> > at
>> >
>> > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>> > at
>> > org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
>> > at org.apache.hadoop.mapreduce.Job.submit(Job.java:481)
>> >
>> > I am using cloudera CDH4 hadoop distribution. The error is thrown from
>> > JobSubmissionFiles.java class
>> >  public static Path getStagingDir(JobClient client, Configuration conf)
>> >   throws IOException, InterruptedException {
>> >     Path stagingArea = client.getStagingAreaDir();
>> >     FileSystem fs = stagingArea.getFileSystem(conf);
>> >     String realUser;
>> >     String currentUser;
>> >     UserGroupInformation ugi = UserGroupInformation.getLoginUser();
>> >     realUser = ugi.getShortUserName();
>> >     currentUser =
>> > UserGroupInformation.getCurrentUser().getShortUserName();
>> >     if (fs.exists(stagingArea)) {
>> >       FileStatus fsStatus = fs.getFileStatus(stagingArea);
>> >       String owner = fsStatus.getOwner();
>> >       if (!(owner.equals(currentUser) || owner.equals(realUser))) {
>> >          throw new IOException("The ownership on the staging directory "
>> > +
>> >                       stagingArea + " is not as expected. " +
>> >                       "It is owned by " + owner + ". The directory must
>> > " +
>> >                       "be owned by the submitter " + currentUser + " or
>> > " +
>> >                       "by " + realUser);
>> >       }
>> >       if (!fsStatus.getPermission().equals(JOB_DIR_PERMISSION)) {
>> >         LOG.info("Permissions on staging directory " + stagingArea + "
>> > are "
>> > +
>> >           "incorrect: " + fsStatus.getPermission() + ". Fixing
>> > permissions "
>> > +
>> >           "to correct value " + JOB_DIR_PERMISSION);
>> >         fs.setPermission(stagingArea, JOB_DIR_PERMISSION);
>> >       }
>> >     } else {
>> >       fs.mkdirs(stagingArea,
>> >           new FsPermission(JOB_DIR_PERMISSION));
>> >     }
>> >     return stagingArea;
>> >   }
>> >
>> >
>> >
>> > I think my job calls getOwner() which returns NULL since s3 does not
>> > have
>> > file permissions which results in the IO exception that i am getting.
>> >
>> > Any workaround for this? Any idea how i could you s3 as the filesystem
>> > with
>> > hadoop on distributed mode?
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J

Re: File Permissions on s3 FileSystem

Posted by Parth Savani <pa...@sensenetworks.com>.
Hello Harsh,
         I am following the steps from this link:
http://wiki.apache.org/hadoop/AmazonS3

When I run the job, I see that Hadoop places all the jars required for
the job on S3. However, when it tries to run the job, it complains:
The ownership on the staging directory
s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging
is not as expected. It is owned by   The directory must be owned by the
submitter ec2-user or by ec2-user

Some people seem to have solved this permissions problem here:
https://issues.apache.org/jira/browse/HDFS-1333
But they made changes to some Hadoop Java classes, and I wonder if
there's an easier workaround.


On Wed, Oct 24, 2012 at 12:21 AM, Harsh J <ha...@cloudera.com> wrote:

> Hey Parth,
>
> I don't think its possible to run MR by basing the FS over S3
> completely. You can use S3 for I/O for your files, but your
> fs.default.name (or fs.defaultFS) must be either file:/// or hdfs://
> filesystems. This way, your MR framework can run/distribute its files
> well, and also still be able to process S3 URLs passed as input or
> output locations.
>
> On Tue, Oct 23, 2012 at 11:02 PM, Parth Savani <pa...@sensenetworks.com>
> wrote:
> > Hello Everyone,
> >         I am trying to run a hadoop job with s3n as my filesystem.
> > I changed the following properties in my hdfs-site.xml
> >
> > fs.default.name=s3n://KEY:VALUE@bucket/
> > mapreduce.jobtracker.staging.root.dir=s3n://KEY:VALUE@bucket/tmp
> >
> > When i run the job from ec2, I get the following error
> >
> > The ownership on the staging directory
> > s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging is not as expected. It is
> owned
> > by   The directory must be owned by the submitter ec2-user or by ec2-user
> > at
> >
> org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
> > at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
> > at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:415)
> > at
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> > at
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
> > at org.apache.hadoop.mapreduce.Job.submit(Job.java:481)
> >
> > I am using cloudera CDH4 hadoop distribution. The error is thrown from
> > JobSubmissionFiles.java class
> >  public static Path getStagingDir(JobClient client, Configuration conf)
> >   throws IOException, InterruptedException {
> >     Path stagingArea = client.getStagingAreaDir();
> >     FileSystem fs = stagingArea.getFileSystem(conf);
> >     String realUser;
> >     String currentUser;
> >     UserGroupInformation ugi = UserGroupInformation.getLoginUser();
> >     realUser = ugi.getShortUserName();
> >     currentUser =
> UserGroupInformation.getCurrentUser().getShortUserName();
> >     if (fs.exists(stagingArea)) {
> >       FileStatus fsStatus = fs.getFileStatus(stagingArea);
> >       String owner = fsStatus.getOwner();
> >       if (!(owner.equals(currentUser) || owner.equals(realUser))) {
> >          throw new IOException("The ownership on the staging directory "
> +
> >                       stagingArea + " is not as expected. " +
> >                       "It is owned by " + owner + ". The directory must
> " +
> >                       "be owned by the submitter " + currentUser + " or
> " +
> >                       "by " + realUser);
> >       }
> >       if (!fsStatus.getPermission().equals(JOB_DIR_PERMISSION)) {
> >         LOG.info("Permissions on staging directory " + stagingArea + "
> are "
> > +
> >           "incorrect: " + fsStatus.getPermission() + ". Fixing
> permissions "
> > +
> >           "to correct value " + JOB_DIR_PERMISSION);
> >         fs.setPermission(stagingArea, JOB_DIR_PERMISSION);
> >       }
> >     } else {
> >       fs.mkdirs(stagingArea,
> >           new FsPermission(JOB_DIR_PERMISSION));
> >     }
> >     return stagingArea;
> >   }
> >
> >
> >
> > I think my job calls getOwner() which returns NULL since s3 does not have
> > file permissions which results in the IO exception that i am getting.
> >
> > Any workaround for this? Any idea how i could you s3 as the filesystem
> with
> > hadoop on distributed mode?
>
>
>
> --
> Harsh J
>

Re: File Permissions on s3 FileSystem

Posted by Harsh J <ha...@cloudera.com>.
Hey Parth,

I don't think it's possible to run MR by basing the FS over S3
completely. You can use S3 for I/O for your files, but your
fs.default.name (or fs.defaultFS) must point to either a file:/// or
hdfs:// filesystem. This way, your MR framework can run/distribute
its files well, and still be able to process S3 URLs passed as input
or output locations.
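
A fully qualified path carries its own scheme, so it resolves to the
right FileSystem regardless of the default; a small sketch of that, with
placeholder host, bucket and key names:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SchemeResolution {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://namenode-host:8020");       // framework FS
    conf.set("fs.s3n.awsAccessKeyId", "AWS_ACCESS_KEY_ID");         // placeholder
    conf.set("fs.s3n.awsSecretAccessKey", "AWS_SECRET_ACCESS_KEY"); // placeholder

    Path s3Input = new Path("s3n://your-bucket/input");
    Path hdfsInput = new Path("/user/ec2-user/input");

    // The qualified s3n path resolves to the S3 native filesystem,
    // the unqualified path to the default (HDFS) filesystem.
    System.out.println(s3Input.getFileSystem(conf).getUri());   // s3n://your-bucket
    System.out.println(hdfsInput.getFileSystem(conf).getUri()); // hdfs://namenode-host:8020
  }
}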

On Tue, Oct 23, 2012 at 11:02 PM, Parth Savani <pa...@sensenetworks.com> wrote:
> Hello Everyone,
>         I am trying to run a hadoop job with s3n as my filesystem.
> I changed the following properties in my hdfs-site.xml
>
> fs.default.name=s3n://KEY:VALUE@bucket/
> mapreduce.jobtracker.staging.root.dir=s3n://KEY:VALUE@bucket/tmp
>
> When i run the job from ec2, I get the following error
>
> The ownership on the staging directory
> s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging is not as expected. It is owned
> by   The directory must be owned by the submitter ec2-user or by ec2-user
> at
> org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:481)
>
> I am using cloudera CDH4 hadoop distribution. The error is thrown from
> JobSubmissionFiles.java class
>  public static Path getStagingDir(JobClient client, Configuration conf)
>   throws IOException, InterruptedException {
>     Path stagingArea = client.getStagingAreaDir();
>     FileSystem fs = stagingArea.getFileSystem(conf);
>     String realUser;
>     String currentUser;
>     UserGroupInformation ugi = UserGroupInformation.getLoginUser();
>     realUser = ugi.getShortUserName();
>     currentUser = UserGroupInformation.getCurrentUser().getShortUserName();
>     if (fs.exists(stagingArea)) {
>       FileStatus fsStatus = fs.getFileStatus(stagingArea);
>       String owner = fsStatus.getOwner();
>       if (!(owner.equals(currentUser) || owner.equals(realUser))) {
>          throw new IOException("The ownership on the staging directory " +
>                       stagingArea + " is not as expected. " +
>                       "It is owned by " + owner + ". The directory must " +
>                       "be owned by the submitter " + currentUser + " or " +
>                       "by " + realUser);
>       }
>       if (!fsStatus.getPermission().equals(JOB_DIR_PERMISSION)) {
>         LOG.info("Permissions on staging directory " + stagingArea + " are "
> +
>           "incorrect: " + fsStatus.getPermission() + ". Fixing permissions "
> +
>           "to correct value " + JOB_DIR_PERMISSION);
>         fs.setPermission(stagingArea, JOB_DIR_PERMISSION);
>       }
>     } else {
>       fs.mkdirs(stagingArea,
>           new FsPermission(JOB_DIR_PERMISSION));
>     }
>     return stagingArea;
>   }
>
>
>
> I think my job calls getOwner() which returns NULL since s3 does not have
> file permissions which results in the IO exception that i am getting.
>
> Any workaround for this? Any idea how i could you s3 as the filesystem with
> hadoop on distributed mode?



-- 
Harsh J

Re: File Permissions on s3 FileSystem

Posted by Harsh J <ha...@cloudera.com>.
Hey Parth,

I don't think its possible to run MR by basing the FS over S3
completely. You can use S3 for I/O for your files, but your
fs.default.name (or fs.defaultFS) must be either file:/// or hdfs://
filesystems. This way, your MR framework can run/distribute its files
well, and also still be able to process S3 URLs passed as input or
output locations.

On Tue, Oct 23, 2012 at 11:02 PM, Parth Savani <pa...@sensenetworks.com> wrote:
> Hello Everyone,
>         I am trying to run a hadoop job with s3n as my filesystem.
> I changed the following properties in my hdfs-site.xml
>
> fs.default.name=s3n://KEY:VALUE@bucket/
> mapreduce.jobtracker.staging.root.dir=s3n://KEY:VALUE@bucket/tmp
>
> When i run the job from ec2, I get the following error
>
> The ownership on the staging directory
> s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging is not as expected. It is owned
> by   The directory must be owned by the submitter ec2-user or by ec2-user
> at
> org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:481)
>
> I am using cloudera CDH4 hadoop distribution. The error is thrown from
> JobSubmissionFiles.java class
>  public static Path getStagingDir(JobClient client, Configuration conf)
>   throws IOException, InterruptedException {
>     Path stagingArea = client.getStagingAreaDir();
>     FileSystem fs = stagingArea.getFileSystem(conf);
>     String realUser;
>     String currentUser;
>     UserGroupInformation ugi = UserGroupInformation.getLoginUser();
>     realUser = ugi.getShortUserName();
>     currentUser = UserGroupInformation.getCurrentUser().getShortUserName();
>     if (fs.exists(stagingArea)) {
>       FileStatus fsStatus = fs.getFileStatus(stagingArea);
>       String owner = fsStatus.getOwner();
>       if (!(owner.equals(currentUser) || owner.equals(realUser))) {
>          throw new IOException("The ownership on the staging directory " +
>                       stagingArea + " is not as expected. " +
>                       "It is owned by " + owner + ". The directory must " +
>                       "be owned by the submitter " + currentUser + " or " +
>                       "by " + realUser);
>       }
>       if (!fsStatus.getPermission().equals(JOB_DIR_PERMISSION)) {
>         LOG.info("Permissions on staging directory " + stagingArea + " are "
> +
>           "incorrect: " + fsStatus.getPermission() + ". Fixing permissions "
> +
>           "to correct value " + JOB_DIR_PERMISSION);
>         fs.setPermission(stagingArea, JOB_DIR_PERMISSION);
>       }
>     } else {
>       fs.mkdirs(stagingArea,
>           new FsPermission(JOB_DIR_PERMISSION));
>     }
>     return stagingArea;
>   }
>
>
>
> I think my job calls getOwner() which returns NULL since s3 does not have
> file permissions which results in the IO exception that i am getting.
>
> Any workaround for this? Any idea how i could you s3 as the filesystem with
> hadoop on distributed mode?



-- 
Harsh J

Re: File Permissions on s3 FileSystem

Posted by Marcos Ortiz <ml...@uci.cu>.
El 23/10/12 13:32, Parth Savani escribió:
> Hello Everyone,
>         I am trying to run a hadoop job with s3n as my filesystem.
> I changed the following properties in my hdfs-site.xml
>
> fs.default.name <http://fs.default.name>=s3n://KEY:VALUE@bucket/
A good practice to this is to use these two properties in the 
core-site.xml, if you will use S3 often:
<property>
     <name>fs.s3.awsAccessKeyId</name>
     <value>AWS_ACCESS_KEY_ID</value>
</property>

<property>
     <name>fs.s3.awsSecretAccessKey</name>
     <value>AWS_SECRET_ACCESS_KEY</value>
</property>

After that, you can access to your URI with a more friendly way:
S3:
  s3://<s3-bucket>/<s3-filepath>

S3n:
  s3n://<s3-bucket>/<s3-filepath>

> mapreduce.jobtracker.staging.root.dir=s3n://KEY:VALUE@bucket/tmp
>
> When i run the job from ec2, I get the following error
>
> The ownership on the staging directory 
> s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging is not as expected. It is 
> owned by   The directory must be owned by the submitter ec2-user or by 
> ec2-user
> at 
> org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:481)
>
> I am using cloudera CDH4 hadoop distribution. The error is thrown from 
> JobSubmissionFiles.java class
>  public static Path getStagingDir(JobClient client, Configuration conf)
>   throws IOException, InterruptedException {
>     Path stagingArea = client.getStagingAreaDir();
>     FileSystem fs = stagingArea.getFileSystem(conf);
>     String realUser;
>     String currentUser;
>     UserGroupInformation ugi = UserGroupInformation.getLoginUser();
>     realUser = ugi.getShortUserName();
>     currentUser = 
> UserGroupInformation.getCurrentUser().getShortUserName();
>     if (fs.exists(stagingArea)) {
>       FileStatus fsStatus = fs.getFileStatus(stagingArea);
>       String owner = fsStatus.*getOwner();*
>       if (!(owner.equals(currentUser) || owner.equals(realUser))) {
>          throw new IOException("*The ownership on the staging 
> directory " +*
> *                      stagingArea + " is not as expected. " + *
> *                      "It is owned by " + owner + ". The directory 
> must " +*
> *                      "be owned by the submitter " + currentUser + " 
> or " +*
> *                      "by " + realUser*);
>       }
>       if (!fsStatus.getPermission().equals(JOB_DIR_PERMISSION)) {
>         LOG.info("Permissions on staging directory " + stagingArea + " 
> are " +
>           "incorrect: " + fsStatus.getPermission() + ". Fixing 
> permissions " +
>           "to correct value " + JOB_DIR_PERMISSION);
>         fs.setPermission(stagingArea, JOB_DIR_PERMISSION);
>       }
>     } else {
>       fs.mkdirs(stagingArea,
>           new FsPermission(JOB_DIR_PERMISSION));
>     }
>     return stagingArea;
>   }
>
>
> I think my job calls getOwner() which returns NULL since s3 does not 
> have file permissions which results in the IO exception that i am 
> getting.
Which what user are you launching the job in EC2?


>
> Any workaround for this? Any idea how I could use s3 as the filesystem
> with hadoop in distributed mode?

Look here:
http://wiki.apache.org/hadoop/AmazonS3



Re: File Permissions on s3 FileSystem

Posted by Harsh J <ha...@cloudera.com>.
Hey Parth,

I don't think it's possible to run MR by basing the FS over S3
completely. You can use S3 for I/O for your files, but your
fs.default.name (or fs.defaultFS) must be either a file:/// or hdfs://
filesystem. This way, your MR framework can run/distribute its files
well, and still be able to process S3 URLs passed as input or
output locations.
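
For instance, a minimal driver along these lines (only a sketch; the
class, bucket and path names are placeholders, and it assumes
fs.defaultFS in core-site.xml still points at HDFS or the local
filesystem):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class S3InputOutputDriver {
  public static void main(String[] args) throws Exception {
    // fs.defaultFS (or the older fs.default.name) is left as hdfs:// or
    // file:///, so the job staging directory sits on a filesystem that
    // reports real ownership.
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "s3-in-s3-out");
    job.setJarByClass(S3InputOutputDriver.class);
    // Only the data paths point at S3; the framework files do not.
    FileInputFormat.addInputPath(job, new Path("s3n://mybucket/input/"));
    FileOutputFormat.setOutputPath(job, new Path("s3n://mybucket/output/"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

With that setup the staging-directory ownership check runs against HDFS
(or the local FS), while the tasks still read from and write to S3.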

On Tue, Oct 23, 2012 at 11:02 PM, Parth Savani <pa...@sensenetworks.com> wrote:
> Hello Everyone,
>         I am trying to run a hadoop job with s3n as my filesystem.
> I changed the following properties in my hdfs-site.xml
>
> fs.default.name=s3n://KEY:VALUE@bucket/
> mapreduce.jobtracker.staging.root.dir=s3n://KEY:VALUE@bucket/tmp
>
> When i run the job from ec2, I get the following error
>
> The ownership on the staging directory
> s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging is not as expected. It is owned
> by   The directory must be owned by the submitter ec2-user or by ec2-user
> at
> org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:481)
>
> I am using cloudera CDH4 hadoop distribution. The error is thrown from
> JobSubmissionFiles.java class
>  public static Path getStagingDir(JobClient client, Configuration conf)
>   throws IOException, InterruptedException {
>     Path stagingArea = client.getStagingAreaDir();
>     FileSystem fs = stagingArea.getFileSystem(conf);
>     String realUser;
>     String currentUser;
>     UserGroupInformation ugi = UserGroupInformation.getLoginUser();
>     realUser = ugi.getShortUserName();
>     currentUser = UserGroupInformation.getCurrentUser().getShortUserName();
>     if (fs.exists(stagingArea)) {
>       FileStatus fsStatus = fs.getFileStatus(stagingArea);
>       String owner = fsStatus.getOwner();
>       if (!(owner.equals(currentUser) || owner.equals(realUser))) {
>          throw new IOException("The ownership on the staging directory " +
>                       stagingArea + " is not as expected. " +
>                       "It is owned by " + owner + ". The directory must " +
>                       "be owned by the submitter " + currentUser + " or " +
>                       "by " + realUser);
>       }
>       if (!fsStatus.getPermission().equals(JOB_DIR_PERMISSION)) {
>         LOG.info("Permissions on staging directory " + stagingArea + " are " +
>           "incorrect: " + fsStatus.getPermission() + ". Fixing permissions " +
>           "to correct value " + JOB_DIR_PERMISSION);
>         fs.setPermission(stagingArea, JOB_DIR_PERMISSION);
>       }
>     } else {
>       fs.mkdirs(stagingArea,
>           new FsPermission(JOB_DIR_PERMISSION));
>     }
>     return stagingArea;
>   }
>
>
>
> I think my job calls getOwner() which returns NULL since s3 does not have
> file permissions which results in the IO exception that i am getting.
>
> Any workaround for this? Any idea how I could use s3 as the filesystem with
> hadoop in distributed mode?



-- 
Harsh J
