Posted to issues@spark.apache.org by "paul mackles (JIRA)" <ji...@apache.org> on 2017/11/15 13:35:00 UTC

[jira] [Created] (SPARK-22528) History service and non-HDFS filesystems

paul mackles created SPARK-22528:
------------------------------------

             Summary: History service and non-HDFS filesystems
                 Key: SPARK-22528
                 URL: https://issues.apache.org/jira/browse/SPARK-22528
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.2.0
            Reporter: paul mackles
            Priority: Minor


We are using Azure Data Lake (ADL) to store our event logs. This worked fine in 2.1.x but in 2.2.0, the event logs are no longer visible to the history server. I tracked it down to the call to:

{code}
SparkHadoopUtil.get.checkAccessPermission()
{code}

which was added to "FsHistoryProvider" in 2.2.0.
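To make the failure mode concrete, here is a minimal Scala sketch of a POSIX-style, client-side read check. It assumes, without quoting the Spark source, that "checkAccessPermission()" compares the file's owner, group and permission bits against the current user's short name and groups. `Perm`, `FileMeta`, `Caller` and `canRead` are hypothetical names, and the GUIDs are placeholders for Azure objectIds.

```scala
// Hypothetical, simplified model of a POSIX-style client-side read check.
// Perm, FileMeta, Caller and canRead are illustrative names, not Spark or Hadoop APIs.
final case class Perm(user: Int, group: Int, other: Int) // one octal digit per class, e.g. 7, 5, 0

final case class FileMeta(owner: String, group: String, perm: Perm)

final case class Caller(shortName: String, groups: Set[String])

object PermissionCheckSketch {
  private val ReadBit = 4 // the "r" bit within a single octal digit

  /** Evaluates exactly one class (owner, else group, else other), in that order, as POSIX does. */
  def canRead(file: FileMeta, caller: Caller): Boolean =
    if (caller.shortName == file.owner) (file.perm.user & ReadBit) != 0
    else if (caller.groups.contains(file.group)) (file.perm.group & ReadBit) != 0
    else (file.perm.other & ReadBit) != 0

  def main(args: Array[String]): Unit = {
    // Placeholder GUIDs standing in for Azure objectIds reported as owner and group.
    val ownerId = "00000000-0000-0000-0000-000000000001"
    val groupId = "00000000-0000-0000-0000-000000000002"
    val eventLog = FileMeta(ownerId, groupId, Perm(7, 7, 0))
    val historyServer = Caller("spark", Set("spark"))

    println(canRead(eventLog, historyServer))                            // false: only "other" bits apply
    println(canRead(eventLog.copy(perm = Perm(7, 7, 4)), historyServer)) // true: world-readable
    println(canRead(eventLog, Caller(ownerId, Set.empty)))               // true: caller name equals owner
  }
}
```

Under that assumption, a file whose owner is reported as an objectId never matches the local user name, so only the "other" bits are consulted. That is consistent with both workarounds listed below.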

I was able to work around it by:
* making the files world-readable
* setting HADOOP_PROXY to the Azure objectId of the service principal that owns the files

Neither of those workarounds is particularly desirable in our environment. That said, I am not sure how this should be addressed:
* Is this an issue with the Azure/Hadoop bindings not setting up the user context correctly, such that the "checkAccessPermission()" call could succeed without relying on the username under which the process is running?
* Is this an issue with "checkAccessPermission()" not really accounting for all of the possible FileSystem implementations? If so, I would imagine that there are similar issues with using S3.

Even though this check fails, I know the files are accessible through the underlying FileSystem object, so it feels like the latter. However, I am not sure whether the FileSystem object alone could be used to implement this check.
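For illustration only, one way to rely on the FileSystem object alone is to skip the client-side comparison and attempt the read, treating an access-denied failure as "not readable" so that each implementation's own authorization (ADL, S3 or HDFS) makes the decision. This is a minimal Scala sketch under that assumption, not Spark's actual behavior: `LogStore`, `ProbeByOpening` and `isReadable` are hypothetical, and the JDK's `java.nio.file.AccessDeniedException` stands in for whatever exception a given connector raises (Hadoop's own is `org.apache.hadoop.security.AccessControlException`).

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.file.AccessDeniedException
import scala.util.{Failure, Success, Try}

// Hypothetical minimal abstraction over a filesystem client; not a Hadoop or Spark API.
trait LogStore {
  def open(path: String): InputStream
}

object ProbeByOpening {
  /** Lets the store's own authorization decide: open, close, and map access-denied to false. */
  def isReadable(store: LogStore, path: String): Boolean =
    Try(store.open(path)) match {
      case Success(in) =>
        in.close()
        true
      case Failure(_: AccessDeniedException) => false
      case Failure(other) => throw other // other I/O errors are not a permission answer
    }

  def main(args: Array[String]): Unit = {
    // In-memory stand-in: "denied" simulates a store that rejects the caller's credentials.
    val store = new LogStore {
      def open(path: String): InputStream =
        if (path == "denied") throw new AccessDeniedException(path)
        else new ByteArrayInputStream(Array[Byte](1, 2, 3))
    }
    println(isReadable(store, "app-1"))  // true
    println(isReadable(store, "denied")) // false
  }
}
```

The trade-off is an extra open per log file, and errors other than access-denied are rethrown rather than silently hiding the log.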




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
