Posted to user@spark.apache.org by Greg Hill <gr...@RACKSPACE.COM> on 2014/09/09 21:30:16 UTC

spark on yarn history server + hdfs permissions issue

I am running Spark on YARN with the HDP 2.1 technical preview.  I'm having trouble getting the Spark history server permission to read the Spark event logs from HDFS.  Both sides are configured to write/read logs from:

hdfs:///apps/spark/events
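
For reference, both sides point at that directory through the usual settings in spark-defaults.conf, roughly like this (exact property names vary a bit by Spark version; these are from the 1.x docs, and the values are just ours):

spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///apps/spark/events
spark.history.fs.logDirectory    hdfs:///apps/spark/events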

The history server runs as user spark; the jobs run as user lavaqe.  Both users are in the hdfs group on all the nodes in the cluster.
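
For example, checking on one of the nodes (output illustrative, but both users resolve to the hdfs group):

id -Gn spark     # e.g. spark hadoop hdfs
id -Gn lavaqe    # e.g. lavaqe hdfs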

That root log folder is globally writable, but owned by the spark user:

drwxrwxrwx   - spark hdfs          0 2014-09-09 18:19 /apps/spark/events
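
It was set up beforehand with something along these lines:

hdfs dfs -mkdir -p /apps/spark/events
hdfs dfs -chown spark:hdfs /apps/spark/events
hdfs dfs -chmod 777 /apps/spark/events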

All good so far.  Spark jobs create subfolders and put their event logs in there just fine.  The problem is that the history server, running as the spark user, cannot read those logs.  They're owned by the user that initiated the job, but still in the same hdfs group:

drwxrwx---   - lavaqe hdfs          0 2014-09-09 19:24 /apps/spark/events/spark-pi-1410290714996

The directory and the files in it are group readable/writable, but this is the error I get:

Permission denied: user=spark, access=READ_EXECUTE, inode="/apps/spark/events/spark-pi-1410290714996":lavaqe:hdfs:drwxrwx---
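
It's easy to reproduce outside the history server, too; listing that directory as the spark user hits the same check, since listing a directory needs READ_EXECUTE on it (the sudo form is just illustrative):

sudo -u spark hdfs dfs -ls /apps/spark/events/spark-pi-1410290714996
# ls: Permission denied: user=spark, access=READ_EXECUTE, ...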

So, two questions, I guess:

1. Do group permissions just plain not work in hdfs or am I missing something?
2. Is there a way to tell Spark to log with more permissive permissions so the history server can read the generated logs?

Greg

Re: spark on yarn history server + hdfs permissions issue

Posted by Greg Hill <gr...@RACKSPACE.COM>.
To answer my own question, in case someone else runs into this: the spark user needs to be in that same group on the namenode itself, and HDFS seems to cache the group mapping for at least an hour.  It magically started working on its own.
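
Rather than waiting out the cache, you can check what the namenode actually resolves and force a refresh (standard Hadoop commands; the cache TTL is controlled by hadoop.security.groups.cache.secs):

hdfs groups spark                            # groups as the namenode resolves them
hdfs dfsadmin -refreshUserToGroupsMappings   # flush the cache (needs HDFS superuser)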

Greg
