Posted to user@ranger.apache.org by Odon Copon <od...@gmail.com> on 2019/03/25 15:36:06 UTC

Ranger + Hive

Hi,
On my last test using HDFS + Ranger I had to sync my LDAP groups with
Hadoop based on the following link:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_security/content/setting_up_hadoop_group_mappping_for_ldap_ad.html

That means users and groups had to exist in both Ranger and the Hadoop
cluster for policies to work.
But what about Hive + Ranger?
Is that mapping also required?
Do I also need users to be mapped in the Hadoop cluster?
What if the tables are in S3 instead of HDFS, for example?

Thanks.

Re: Ranger + Hive

Posted by Don Bosco Durai <bo...@apache.org>.
To be on the safe side, I would just install and configure SSSD on all the nodes.

Bosco



Re: Ranger + Hive

Posted by Odon Copon <od...@gmail.com>.
Hi Bosco,
I was reading this link:
https://community.hortonworks.com/articles/175133/how-hiveserver2-and-ranger-interact-internals.html
It specifically mentions HiveServer2, which is why I was wondering whether
only the HiveServer2 host needs to run SSSD.

When you say "as a good practice", do you mean it is mandatory?

Thanks.


Re: Ranger + Hive

Posted by Don Bosco Durai <bo...@apache.org>.
Hi Odon,

As a good practice, each node should have SSSD installed and configured.

If you are doing a PoC or just testing, then at least the master nodes should have it configured.

Bosco



Re: Ranger + Hive

Posted by Odon Copon <od...@gmail.com>.
Good point. I'm not sure which components need to have the users and groups
from LDAP.
Just HiveServer2? Any other Hadoop component?
Is there a link to that information?

Thanks


Re: Ranger + Hive

Posted by Don Bosco Durai <bo...@apache.org>.
If you already have LDAP, then you should install SSSD on all nodes. SSSD will only materialize users when they are requested.

I think, in your case, if you are only using Hive, then you would just need SSSD on the server that is running HiveServer2.

Depending on the users you want to set policies for in Ranger, you can apply filters during user sync.

Bosco
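
To illustrate the user sync filters mentioned above, here is a minimal sketch in Python that renders two filter settings as a Hadoop-style site XML fragment. The property names are the ones I recall from Ranger's usersync configuration (ranger-ugsync-site.xml) and should be checked against your Ranger version; the filter values and the helper name `to_site_xml` are hypothetical. Note that LDAP filters often contain `&`, which has to be escaped when written into XML.

```python
from xml.sax.saxutils import escape

# Property names as recalled from Ranger usersync (ranger-ugsync-site.xml);
# verify them for your Ranger version. The filter values are made-up examples.
USERSYNC_FILTERS = {
    "ranger.usersync.ldap.user.searchfilter":
        "(&(objectClass=person)(memberOf=cn=hive-users,ou=groups,dc=example,dc=com))",
    "ranger.usersync.group.searchfilter": "(cn=hive-*)",
}


def to_site_xml(props: dict) -> str:
    """Render a dict of properties as Hadoop-style <configuration> XML."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        # escape() turns '&', '<' and '>' into XML entities
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_site_xml(USERSYNC_FILTERS))
```

With filters like these, only the matching users and groups would be pulled into Ranger, which keeps the policy UI manageable when the directory is large.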

 

 



Re: Ranger + Hive

Posted by Odon Copon <od...@gmail.com>.
Hi Bosco,
Thank you for your help and the information provided.

I don't want to create all users and groups on the server itself; that's why
I'm looking at the LDAP mapping option.
My groups change rapidly, and I'm not considering having something like
Ansible constantly adding and removing users and groups on the server.

Does that make sense?
Thanks


Re: Ranger + Hive

Posted by Don Bosco Durai <bo...@apache.org>.
Hi Odon,

If you are not using Kerberos, then it is much simpler. You don’t need to do a lot…

Do you even need groups or group-level policies? If so, you just need to create OS users and assign the groups you want on the server where HiveServer2 is running.

> Why do you say "LdapGroupsMapping is not recommended"? It seems to be the only way to ingest and use information from LDAP.

By default, Hadoop will go to the OS and get the groups for the user. So if you are using SSSD (or a similar technology), it will get the groups from LDAP for you, and you don’t need to do any configuration in core-site.xml.

Check this article: https://www.cloudera.com/documentation/enterprise/5-9-x/topics/cm_sg_ldap_grp_mappings.html

Bosco
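
The OS-level lookup described above can be tried out directly. The following is a minimal sketch in Python, assuming a Unix-like host; `os_groups_for` is a hypothetical helper name and not part of Hadoop or Ranger. It asks the operating system's name service for a user's groups, which is roughly the kind of lookup the default mapping relies on. On a host where SSSD is configured, the result would be expected to include groups that live in LDAP/AD.

```python
import grp
import os
import pwd
import sys


def os_groups_for(user: str) -> list:
    """Return the group names the operating system reports for `user`."""
    # pwd.getpwnam raises KeyError if the OS cannot resolve the user at all
    primary_gid = pwd.getpwnam(user).pw_gid
    names = []
    for gid in os.getgrouplist(user, primary_gid):
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A gid without a name entry; keep the number as text
            names.append(str(gid))
    return names


if __name__ == "__main__":
    user = sys.argv[1] if len(sys.argv) > 1 else "root"
    print(user, "->", os_groups_for(user))
```

Running this on the HiveServer2 host for an LDAP user is a quick way to see whether the groups a Ranger policy refers to are actually visible to the OS there.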

 

 

 

From: Odon Copon <od...@gmail.com>
Reply-To: <us...@ranger.apache.org>
Date: Monday, March 25, 2019 at 10:09 AM
To: <us...@ranger.apache.org>
Subject: Re: Ranger + Hive

 

Hi Bosco,

Thanks for your help.

For this test I'm not using Kerberos; I'm just testing a simple pipeline with Hive + Ranger and some external tables in S3 to see what the requirements are.

From your comments, I understand I need to set up SSSD as explained in the link you provided: https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.2.0/bk_ambari-security/content/setting_up_hadoop_group_mappping_for_ldap_ad.html

and that also having the sync for Ranger would allow me to create policies.

Why do you say "LdapGroupsMapping is not recommended"? It seems to be the only way to ingest and use information from LDAP.

 

Thanks.

 

On Mon, 25 Mar 2019 at 16:47, Don Bosco Durai <bo...@apache.org> wrote:

There are few things:
In Kerberos/secure mode, users needs to be materialized on each node. If you are using AD/LDAP, then you can use SSSD (or equivalent), else you need to create the users explicitly on each node using ansible or puppet or manually…
The group mapping can be via LDAP or by groups from unix (SSSD will also do this you). FYI, LdapGroupsMapping is not recommended due to performance reasons. FYI, if you are using SSSD, it will get the groups from LDAP/AD
In Kerberos/secure mode, you need to materialize users on each node regardless whether you are accessing S3 or HDFS. This is a YARN requirement. So the that the YARN job process will run as the end user.
The users and groups in Ranger are just for convenience to create policy. Having it or not in Ranger, doesn’t affect the service. However, you will not be able to create the policies in Ranger. During testing or PoC, if you don’t want to sync, you can manually add to Ranger to using Ranger Admin UI
 

Bosco

 

 

From: Odon Copon <od...@gmail.com>
Reply-To: <us...@ranger.apache.org>
Date: Monday, March 25, 2019 at 8:36 AM
To: <us...@ranger.apache.org>
Subject: Ranger + Hive

 

Hi,

On my last test using HDFS + Ranger I had to sync my LDAP groups with Hadoop based on the following link: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_security/content/setting_up_hadoop_group_mappping_for_ldap_ad.html

 

That means users and groups had to be in Ranger and Hadoop cluster to make policies to work.

But what about Hive + Ranger? 

Is that mapping also required? 

do I need users also to be mapped in Hadoop cluster?

what if tables are in S3 instead of HDFS per example?

 

Thanks.


Re: Ranger + Hive

Posted by Odon Copon <od...@gmail.com>.
Hi Bosco,
Thanks for your help.
For this test I'm not using Kerberos. I'm just testing a simple pipeline
with Hive + Ranger and some external tables in S3 to see what the
requirements are.
From your comments, I understand I need to set up SSSD as explained in the
link you provided:
https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.2.0/bk_ambari-security/content/setting_up_hadoop_group_mappping_for_ldap_ad.html
and that also having the sync for Ranger would allow me to create policies.

Why do you say "LdapGroupsMapping is not recommended"? It seems to be the
only way to ingest and use information from LDAP.

Thanks.


Re: Ranger + Hive

Posted by Don Bosco Durai <bo...@apache.org>.
There are a few things:

   1. In Kerberos/secure mode, users need to be materialized on each node. If you are using AD/LDAP, you can use SSSD (or an equivalent); otherwise you need to create the users explicitly on each node using Ansible, Puppet, or manually.
   2. The group mapping can come from LDAP or from Unix groups (SSSD will also do this for you). LdapGroupsMapping is not recommended for performance reasons. If you are using SSSD, it will get the groups from LDAP/AD.
   3. In Kerberos/secure mode, you need to materialize users on each node regardless of whether you are accessing S3 or HDFS. This is a YARN requirement, so that the YARN job process runs as the end user.
   4. The users and groups in Ranger are just a convenience for creating policies. Whether or not they exist in Ranger doesn’t affect the service. However, you will not be able to create policies for them in Ranger until they do. During testing or a PoC, if you don’t want to sync, you can add them manually using the Ranger Admin UI.
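
As a rough way to verify the "materialized on each node" requirement from the points above, the following Python sketch (not part of the original thread) reports which accounts the local OS cannot resolve. The function name `unresolved_users` and the sample user names are hypothetical. The check only confirms that the node's name service (local files, SSSD, or similar) knows the user; it says nothing about Kerberos or Ranger configuration.

```python
import pwd
from typing import Callable, Iterable


def unresolved_users(
    users: Iterable[str],
    resolve: Callable[[str], object] = pwd.getpwnam,
) -> list[str]:
    """Return the users that `resolve` cannot find, in input order.

    `resolve` must raise KeyError for unknown users, as pwd.getpwnam does.
    It is a parameter so the check can be exercised without real accounts.
    """
    missing = []
    for user in users:
        try:
            resolve(user)
        except KeyError:
            missing.append(user)
    return missing


if __name__ == "__main__":
    # Placeholder names; replace with the accounts your jobs run as.
    print(unresolved_users(["hive", "alice", "bob"]))
```

Running this on every node (for example through whatever automation already manages the cluster) and getting an empty list everywhere suggests the users are visible where YARN needs them.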
 

Bosco

 

 
