Posted to user@ranger.apache.org by Julien Carme <ju...@gmail.com> on 2016/01/18 18:27:04 UTC

Spark + Hive + Ranger

Hello,

I am trying to access Hive from Spark in a Hadoop cluster where I use Ranger
to control Hive access.

Since Ranger is installed, I have set up Hive accordingly:

hive.security.authorization.manager=
org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory

When I run Spark and ask it to access a Hive table, it uses this class, but
I get several errors:

16/01/18 17:51:50 INFO provider.AuditProviderFactory: No v3 audit
configuration found. Trying v2 audit configurations
16/01/18 17:51:50 ERROR util.PolicyRefresher:
PolicyRefresher(serviceName=null): failed to refresh policies. Will
continue to use last known version of policies (-1)
com.sun.jersey.api.client.ClientHandlerException:
java.lang.IllegalArgumentException: URI is not absolute
        at
com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
        at com.sun.jersey.api.client.Client.handle(Client.java:648)
        at
com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
        at
com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
        at
com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:503)
        at
org.apache.ranger.admin.client.RangerAdminRESTClient.getServicePoliciesIfUpdated(RangerAdminRESTClient.java:71)
        at
org.apache.ranger.plugin.util.PolicyRefresher.loadPolicyfromPolicyAdmin(PolicyRefresher.java:205)
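The serviceName=null in the PolicyRefresher message suggests the plugin started without its Ranger configuration, leaving the policy REST URL empty, which would explain the "URI is not absolute" exception. Those values normally come from ranger-hive-security.xml; a minimal sketch of the two properties involved (the URL and service name below are placeholders, not values from this thread):

```xml
<!-- ranger-hive-security.xml (sketch; placeholder values) -->
<configuration>
  <property>
    <!-- name of the Hive service/repository defined in Ranger Admin -->
    <name>ranger.plugin.hive.service.name</name>
    <value>cluster_hive</value>
  </property>
  <property>
    <!-- must be an absolute URL to Ranger Admin -->
    <name>ranger.plugin.hive.policy.rest.url</name>
    <value>http://ranger-admin-host:6080</value>
  </property>
</configuration>
```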



--

And then (though it is not at all clear that the two errors are connected):

16/01/18 17:51:50 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
16/01/18 17:51:50 ERROR authorizer.RangerHiveAuthorizer:
filterListCmdObjects: Internal error: null RangerAccessResult object
received back from isAccessAllowed()!
16/01/18 17:51:50 ERROR authorizer.RangerHiveAuthorizer:
filterListCmdObjects: Internal error: null RangerAccessResult object
received back from isAccessAllowed()!
16/01/18 17:51:50 ERROR authorizer.RangerHiveAuthorizer:
filterListCmdObjects: Internal error: null RangerAccessResult object
received back from isAccessAllowed()!
-- 

And then the access to Hive tables fails.

I am not sure where to go from there. Any help would be appreciated.

Best Regards,

Julien

Re: Spark + Hive + Ranger

Posted by Dilli Dorai <di...@apache.org>.
Julien,
Interesting.
Thanks for sharing.
I was under the impression Spark would not be aware of
hive.security.authorization.manager.
Regards
Dilli



Re: Spark + Hive + Ranger

Posted by Julien Carme <ju...@gmail.com>.
Hello,

I am replying to my own message, as I am happy to say that I have solved
the problem and have been able to access Hive tables from SparkSQL with
Ranger enabled. Policies defined in Ranger are properly enforced in Spark.

So here is how to do it (assuming you have been able to make it work
without Ranger):
- Check that you have set hive.security.authorization.manager=
org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory
- Get ranger-hive-security.xml and ranger-hive-audit.xml from your Ranger
Hive plugin folder and copy them into your Spark conf directory.
- Add these jars from your Ranger distribution to your classpath (or use
Spark's --driver-class-path argument): ranger-hive-plugin,
ranger-plugins-common, ranger-plugins-audit, guava

That's all. It should work.
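The steps above can be sketched as a shell session. This is only a sketch: the plugin and config paths below assume an HDP-style layout and the jar version globs are guesses; adjust them for your cluster.

```shell
# Sketch only: paths assume an HDP-style layout (assumption, adjust as needed).
RANGER_CONF=/etc/hive/conf                                  # where ranger-hive-*.xml were generated
RANGER_LIB=/usr/hdp/current/ranger-hive-plugin/lib          # where the Ranger jars live
SPARK_CONF=/etc/spark/conf

# Step 2: copy the plugin configuration next to Spark's own config.
cp "$RANGER_CONF/ranger-hive-security.xml" "$RANGER_CONF/ranger-hive-audit.xml" "$SPARK_CONF/"

# Step 3: put the jars named above on the driver classpath.
RANGER_JARS=$(echo "$RANGER_LIB"/ranger-hive-plugin-*.jar \
                   "$RANGER_LIB"/ranger-plugins-common-*.jar \
                   "$RANGER_LIB"/ranger-plugins-audit-*.jar \
                   "$RANGER_LIB"/guava-*.jar | tr ' ' ':')

spark-sql --driver-class-path "$RANGER_JARS" -e "SHOW TABLES"
```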

The only thing that bothers me a little now is that SparkSQL does not
handle 'doAs=F'. This is not surprising, considering Spark is run by the
user rather than by a server process owned by a system user. So I am afraid
it will be an issue with Ranger: all tables written through Hive will be
owned by the hive user, but all tables written with Spark will be owned by
the user who wrote them. We have to find a solution for that.

Regards,

Julien



Re: Spark + Hive + Ranger

Posted by Julien Carme <ju...@gmail.com>.
Hello,

Thanks Madhan and Bosco for your answers.

I am using HDP 2.3 and installed Ranger from Ambari. I suppose Ambari does
run enable-hive-plugin, since Ranger works correctly with Hive when I use
it through HiveServer2. It is only when I try to use it from Spark (via
SparkSQL) that it does not work.

SparkSQL does not use HiveServer2, but it does not use the Hive CLI either
(at least not directly); the Hive engine is not used at all. SparkSQL is a
standalone SQL engine that is part of Spark: it reads Hive tables directly
from where they are stored, using metadata it gets from HCatalog. At least
that is my understanding.

Until recently, SparkSQL ignored Ranger, just like the Hive CLI, and it
worked (I could access Hive data from Spark on a cluster with Ranger up,
but of course Ranger rules were ignored). Since a recent update, however,
SparkSQL clearly does interact with Ranger, as I get Ranger exceptions when
I use SparkSQL. I think it reads the value of
hive.security.authorization.manager (which on my system is a Ranger class)
and instantiates that class in order to comply with the security rules the
class defines. I am no expert in Spark internals or Ranger; these are just
assumptions.

I have solved multiple classpath (Ranger jars not found) and configuration
file (xa-secure.xml?) issues to reach the point where I am now. I no longer
get missing class or missing file exceptions, but it still does not work,
and I get the issue described in my previous mail.

I will try to continue my investigation. If I make progress, I will post it
here, but any additional help would be appreciated.

Best regards,

Julien



Re: Spark + Hive + Ranger

Posted by Don Bosco Durai <bo...@apache.org>.
Ideally, Ranger shouldn’t be in play when the Hive CLI is used. If I am not wrong, Spark is using the HiveCLI API.

To avoid this issue, I thought we update only hiveserver2.properties. Julien, I assume you are using the standard enable-plugin scripts.

Thanks

Bosco



Re: Spark + Hive + Ranger

Posted by Madhan Neethiraj <ma...@apache.org>.
Julien,

The Ranger Hive plugin requires additional configuration, such as the location of Ranger Admin, the name of the service containing the policies for Hive, etc. These configurations (in files named ranger-*.xml) are created when the enable-hive-plugin.sh script is run with appropriate values in install.properties. The script also updates hive-site.xml with the necessary changes, such as registering Ranger as the authorizer in hive.security.authorization.manager. If you haven’t installed the plugin using enable-hive-plugin.sh, please do so and let us know the result.
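As a rough sketch, the relevant entries in install.properties might look like the following; the Ranger Admin URL and repository name are placeholders, not values from this thread:

```properties
# install.properties (sketch; placeholder values)
POLICY_MGR_URL=http://ranger-admin-host:6080
REPOSITORY_NAME=cluster_hive
```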

Hope this helps.

Madhan

