
Posted to issues@spark.apache.org by "Vijay Singh (JIRA)" <ji...@apache.org> on 2015/09/03 20:42:46 UTC

[jira] [Commented] (SPARK-9042) Spark SQL incompatibility with Apache Sentry

    [ https://issues.apache.org/jira/browse/SPARK-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729548#comment-14729548 ] 

Vijay Singh commented on SPARK-9042:
------------------------------------

This happens because Sentry locks down access to the Hive metastore server, so HiveContext-based execution fails. The Hive CLI has been affected in a similar way starting with Hadoop 2.6 and Hive 1.0.

I am going to take a look at this and see whether we can resolve it by issuing all metadata operations through the HiveServer2 JDBC interface, with a configuration option to choose between the metastore and HiveServer2 at configuration time. Two of my customers are affected by the Hive metastore lockdown.
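
As a rough illustration of the idea above, here is a minimal sketch of what a configurable metadata path could look like. It is not an existing Spark API: the key "spark.sql.hive.metadata.mode", the MetadataMode type and the MetadataAccess object are all invented for this example. Only the jdbc:hive2:// URL form, the default HiveServer2 port 10000, and the standard java.sql calls are real. The listTables helper assumes the Hive JDBC driver is on the classpath, a reachable HiveServer2, and simple user-name authentication (a Sentry cluster would more likely use Kerberos).

```scala
import java.sql.{Connection, DriverManager}
import scala.collection.mutable.ListBuffer

// Hypothetical setting: which service answers metadata requests.
sealed trait MetadataMode
case object Metastore extends MetadataMode   // direct metastore access (current behaviour)
case object HiveServer2 extends MetadataMode // proposed path that goes through HiveServer2

object MetadataAccess {
  // Invented configuration key, used only in this sketch.
  val ModeKey = "spark.sql.hive.metadata.mode"

  // Reads the mode from a plain key/value configuration, defaulting to the metastore.
  def modeFrom(conf: Map[String, String]): MetadataMode =
    conf.getOrElse(ModeKey, "metastore").trim.toLowerCase match {
      case "hiveserver2" => HiveServer2
      case "metastore"   => Metastore
      case other =>
        throw new IllegalArgumentException(s"Unknown value for $ModeKey: $other")
    }

  // Builds a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/database.
  def jdbcUrl(host: String, port: Int = 10000, database: String = "default"): String =
    s"jdbc:hive2://$host:$port/$database"

  // Lists table names through generic JDBC metadata calls instead of the metastore.
  // Assumes the Hive JDBC driver is available and the server accepts this user.
  def listTables(url: String, user: String, database: String): List[String] = {
    val conn: Connection = DriverManager.getConnection(url, user, "")
    try {
      val rs = conn.getMetaData.getTables(null, database, "%", Array("TABLE"))
      val names = ListBuffer.empty[String]
      while (rs.next()) names += rs.getString("TABLE_NAME")
      names.toList
    } finally conn.close()
  }
}
```

With something like this, a deployment behind Sentry would set the mode to "hiveserver2" and metadata lookups would go through MetadataAccess.listTables(MetadataAccess.jdbcUrl(host), user, database), while other deployments keep the metastore default.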

Please let me know in case of any issues or concerns.

> Spark SQL incompatibility with Apache Sentry
> --------------------------------------------
>
>                 Key: SPARK-9042
>                 URL: https://issues.apache.org/jira/browse/SPARK-9042
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0
>            Reporter: Nitin Kak
>
> Hive queries executed from Spark using HiveContext use the CLI to create the query plan and then access the Hive table directories (under /user/hive/warehouse/) directly. This throws an AccessControlException if Apache Sentry is installed:
> org.apache.hadoop.security.AccessControlException: Permission denied: user=kakn, access=READ_EXECUTE, inode="/user/hive/warehouse/mastering.db/sample_table":hive:hive:drwxrwx--t 
> With Apache Sentry, only the "hive" user (created only for Sentry) has permission to access the Hive warehouse directory. After Sentry is installed, all queries are directed to HiveServer2, which changes the invoking user to "hive" and then accesses the Hive warehouse directory. However, HiveContext does not execute the query through HiveServer2, which leads to this issue. Here is an example of executing a Hive query through HiveContext:
> import org.apache.spark.sql.hive.HiveContext
> val hqlContext = new HiveContext(sc) // Create a context to run Hive queries (sc is the existing SparkContext)
> val pairRDD = hqlContext.sql(hql) // where hql is a string containing the Hive query


