Posted to user@spark.apache.org by Asaf Lahav <as...@gmail.com> on 2014/04/10 14:14:30 UTC

Executing spark jobs with predefined Hadoop user

Hi,

We are using Spark with data files on HDFS. The files are owned by a
predefined Hadoop user ("hdfs").

The folder is permitted with

·         read, write, and execute permission for the hdfs user

·         read and execute permission for users in the group

·         read permission only for all other users



Now the Spark write operation fails due to a mismatch between the user of
the Spark context and the Hadoop permissions.

Is there a way to start the Spark Context with another user than the one
configured on the local machine?







Please see the technical details below:

The permissions on the HDFS folder "/tmp/Iris" are as follows:

drwxr-xr-x   - hdfs      hadoop          0 2014-04-10 14:12 /tmp/Iris





The Spark context is initiated on my local machine, and according to the
configured HDFS permissions "rwxr-xr-x" there is no problem loading the
HDFS file into an RDD:

final JavaRDD<String> rdd = sparkContext.textFile(filePath);



But saving the resulting RDD back to Hadoop results in a Hadoop security
exception:

rdd.saveAsTextFile("/tmp/Iris/output");



I then receive the following Hadoop security exception:

org.apache.hadoop.security.AccessControlException:
org.apache.hadoop.security.AccessControlException: *Permission denied:
user=halbani, access=WRITE, inode="/tmp/Iris":hdfs:hadoop:drwxr-xr-x*

          at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)

          at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)

          at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

          at java.lang.reflect.Constructor.newInstance(Constructor.java:525)

          at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)

          at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)

          at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1428)

          at
org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:332)

          at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1126)

          at
org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:52)

          at
org.apache.hadoop.mapred.SparkHadoopWriter.preSetup(SparkHadoopWriter.scala:65)

          at
org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:713)

          at
org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:686)

          at
org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:572)

          at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:894)

          at
org.apache.spark.api.java.JavaRDDLike$class.saveAsTextFile(JavaRDDLike.scala:355)

          at
org.apache.spark.api.java.JavaRDD.saveAsTextFile(JavaRDD.scala:27)

          at org.apache.spark.reader.FileSpliter.split(FileSpliter.java:73)

          at
org.apache.spark.reader.FileReaderMain.main(FileReaderMain.java:17)

          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

          at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

          at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

          at java.lang.reflect.Method.invoke(Method.java:601)

          at
com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)

Caused by: org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=halbani, access=WRITE, inode="/tmp/Iris":hdfs:hadoop:drwxr-xr-x

          at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:225)

          at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:205)

          at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:151)

          at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5951)

          at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:5924)

          at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:2628)

          at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:2593)

          at
org.apache.hadoop.hdfs.server.namenode.NameNode.mkdirs(NameNode.java:927)

          at sun.reflect.GeneratedMethodAccessor132.invoke(Unknown Source)

          at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

          at java.lang.reflect.Method.invoke(Method.java:606)

          at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)

          at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1444)

          at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1440)

          at java.security.AccessController.doPrivileged(Native Method)

          at javax.security.auth.Subject.doAs(Subject.java:415)

          at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)

          at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1438)



          at org.apache.hadoop.ipc.Client.call(Client.java:1107)

          at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)

          at $Proxy7.mkdirs(Unknown Source)

          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

          at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

          at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

          at java.lang.reflect.Method.invoke(Method.java:601)

          at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)

          at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)

          at $Proxy7.mkdirs(Unknown Source)

          at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1426)

          ... 17 more





Apparently the Spark context is initiated with the user configured on the
local machine.

Is there a way to start the Spark Context with another user than the one
configured on the local machine?

Re: Executing spark jobs with predefined Hadoop user

Posted by Asaf Lahav <as...@gmail.com>.
Thank you all very much for your responses....


We are going to test these recommendations.
Adnan, with regard to the HDFS URI: this is actually the manner in which we
are already accessing the file system. It was simply removed from the post.

Thank you,
Asaf


On Thu, Apr 10, 2014 at 5:33 PM, Shao, Saisai <sa...@intel.com> wrote:

>  Hi Asaf,
>
>
>
> The user who runs SparkContext is determined by the code below in
> SparkContext. Normally user.name is the user who started the JVM; you can
> start your application with -Duser.name=xxx to specify the username you
> want, and that username will be used to communicate with HDFS.
>
>
>
>   val sparkUser = Option {
>     Option(System.getProperty("user.name")).getOrElse(System.getenv("SPARK_USER"))
>   }.getOrElse {
>     SparkContext.SPARK_UNKNOWN_USER
>   }
>
>
>
> Thanks
>
> Jerry
>
>
>
> From: Asaf Lahav [mailto:asaf.lahav@gmail.com]
> Sent: Thursday, April 10, 2014 8:15 PM
> To: user@spark.apache.org
> Subject: Executing spark jobs with predefined Hadoop user
>
> [...]

RE: Executing spark jobs with predefined Hadoop user

Posted by "Shao, Saisai" <sa...@intel.com>.
Hi Asaf,

The user who runs SparkContext is determined by the code below in SparkContext. Normally user.name is the user who started the JVM; you can start your application with -Duser.name=xxx to specify the username you want, and that username will be used to communicate with HDFS.

 val sparkUser = Option {
    Option(System.getProperty("user.name")).getOrElse(System.getenv("SPARK_USER"))
  }.getOrElse {
    SparkContext.SPARK_UNKNOWN_USER
  }
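
To make the lookup order concrete, here is a minimal Java sketch (the class and method names are hypothetical, not part of Spark) mirroring the same resolution: the user.name JVM property wins, then the SPARK_USER environment variable, then an unknown-user placeholder. Launching the driver JVM with -Duser.name=hdfs therefore makes Spark talk to HDFS as the hdfs user (assuming simple, non-Kerberos authentication).

```java
// Hypothetical standalone mirror of the Scala snippet quoted above.
public class SparkUserResolution {
    static final String SPARK_UNKNOWN_USER = "<unknown>";

    // Resolution order: JVM property user.name first, then SPARK_USER env var.
    static String resolveUser(String userNameProperty, String sparkUserEnv) {
        if (userNameProperty != null) {
            return userNameProperty;
        }
        if (sparkUserEnv != null) {
            return sparkUserEnv;
        }
        return SPARK_UNKNOWN_USER;
    }

    public static void main(String[] args) {
        // In a real run the inputs come from -Duser.name=... and the environment.
        System.out.println(resolveUser(System.getProperty("user.name"),
                                       System.getenv("SPARK_USER")));
    }
}
```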

Thanks
Jerry

From: Asaf Lahav [mailto:asaf.lahav@gmail.com]
Sent: Thursday, April 10, 2014 8:15 PM
To: user@spark.apache.org
Subject: Executing spark jobs with predefined Hadoop user

[...]

Is there a way to start the Spark Context with another user than the one configured on the local machine?

Re: Executing spark jobs with predefined Hadoop user

Posted by Adnan <ns...@gmail.com>.
Then the problem is not on the Spark side. You have three options; choose
any one of them:

1. Change the permissions on the /tmp/Iris folder from a shell on the
NameNode with the "hdfs dfs -chmod" command.
2. Run your Hadoop service as the hdfs user.
3. Disable dfs.permissions in conf/hdfs-site.xml.
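
For the first option, a sketch of the commands (run as a user with HDFS superuser rights against your own cluster; the mode bits and the halbani:hadoop owner below are only illustrative for the /tmp/Iris case in this thread):

```shell
# Grant group write on the output folder and everything under it:
hdfs dfs -chmod -R 775 /tmp/Iris

# Or, alternatively, transfer ownership to the user submitting the Spark job:
hdfs dfs -chown -R halbani:hadoop /tmp/Iris
```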

Regards,
Adnan


avito wrote
> Thanks Adnan for the quick answer. You are absolutely right.
> We are indeed using the entire HDFS URI. Just for the post I have removed
> the name node details.





--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Executing-spark-jobs-with-predefined-Hadoop-user-tp4059p4063.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Executing spark jobs with predefined Hadoop user

Posted by Adnan <ns...@gmail.com>.
You need to use a proper HDFS URI with saveAsTextFile.

For Example:

rdd.saveAsTextFile("hdfs://NameNode:Port/tmp/Iris/output.tmp")

Regards,
Adnan


Asaf Lahav wrote
> Hi,
> 
> [...]
> 
> Is there a way to start the Spark Context with another user than the one
> configured on the local machine?




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Executing-spark-jobs-with-predefined-Hadoop-user-tp4059p4061.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.