
Posted to user@hive.apache.org by John Omernik <jo...@omernik.com> on 2013/05/04 15:31:48 UTC

Hive Authorization and Views

We were doing some tests this past week with Hive authorization. One of our
current "challenges" is the case where we have an underlying, well-managed,
partitioned table and we want to allow access to only certain columns in
that table.  Our first thoughts went to VIEWs, as that's a common use case
with relational databases (i.e. set up a view with only the columns you
want the user to access and set the permissions appropriately).

In testing, and this is not surprising given the "newness" of Hive
Authorization, we found that a VIEW cannot be created to allow access to a
table without also granting access to the underlying table, which defeats
the idea of the view as a tool to manage that access.

So I wanted to put this to the user group. I've done some JIRA searching and
didn't find anything (I will admit my JIRA search-fu is not stellar), but
is there an option that could be thrown together in Hive that would allow
that use case?  Perhaps a configuration setting that would allow views to
execute as a specific user (perhaps a global user, or perhaps a user
specified at view creation).  This would allow the "view" to have access to
the underlying table, and since the view could not be changed by the end
user, you could grant "read" permission on the view to the user or group
of users you want to have access.
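
To make the proposal above concrete, here is a minimal sketch in Python of
the two permission models. It is a toy model only, not Hive's API: the
Catalog class, its methods, and the user, table, and view names are all
hypothetical. "Invoker rights" mirrors the behavior seen in the tests
described above (the caller needs access to the underlying table), while
"definer rights" is the suggested alternative where the view runs with its
creator's permissions.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Toy metadata store (hypothetical): who may SELECT from which object."""
    grants: dict = field(default_factory=dict)  # object name -> set of users
    views: dict = field(default_factory=dict)   # view name -> (definer, base table)

    def grant_select(self, obj, user):
        self.grants.setdefault(obj, set()).add(user)

    def create_view(self, view, definer, base_table):
        self.views[view] = (definer, base_table)

    def can_select(self, user, obj, definer_rights):
        # The caller always needs SELECT on the object named in the query.
        if user not in self.grants.get(obj, set()):
            return False
        if obj not in self.views:
            return True
        definer, base_table = self.views[obj]
        # Invoker rights: the caller must also hold SELECT on the base table.
        # Definer rights: the view creator's grant is checked instead.
        checked_user = definer if definer_rights else user
        return self.can_select(checked_user, base_table, definer_rights)


catalog = Catalog()
catalog.grant_select("events", "etl_admin")
catalog.create_view("events_public", definer="etl_admin", base_table="events")
catalog.grant_select("events_public", "analyst")

# Same user, same view, two models.
print(catalog.can_select("analyst", "events_public", definer_rights=False))
print(catalog.can_select("analyst", "events_public", definer_rights=True))
# Under either model the analyst still cannot query the base table directly.
print(catalog.can_select("analyst", "events", definer_rights=True))
```

In this sketch the only difference between the models is which user's grant
is checked against the base table, which is why restricting who may create
or modify views matters so much under definer rights.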

I suppose this has challenges, e.g. can a user just create a view to bypass
table-level restrictions? Perhaps if this model were adopted, a privilege
for CREATING/MODIFYING views could be created and granted only to a
superuser of some sort.  I am really just walking through ideas here, as
this is one of the last stumbling blocks we have with Hive from an
"Enterprise ready" point of view. Heck, if done right, you could almost do
data masking at the view level. You have a column in your source data that
is sensitive, so instead of returning that column you return an MD5 of that
column (can we have a native MD5 function? :) or you blank that column. If
we put strong security on the creation and modification of views, and allow
views to execute as a different user that has access to the source data,
you have a powerful way to represent your data to all levels within your
org.
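
As an illustration of the masking idea above, here is a small Python sketch
of what such a view would return: sensitive columns are either replaced by
their MD5 digest or blanked. The function name masked_view, the column
names, and the sample rows are made up for the example; this models the
projection a view would apply and is not Hive code.

```python
import hashlib


def masked_view(rows, hash_columns=(), blank_columns=()):
    """Yield a copy of each row with sensitive columns hashed or blanked."""
    for row in rows:
        out = dict(row)  # copy, so the source rows are left untouched
        for col in hash_columns:
            out[col] = hashlib.md5(str(row[col]).encode("utf-8")).hexdigest()
        for col in blank_columns:
            out[col] = ""
        yield out


# Hypothetical source data with two sensitive columns.
source = [
    {"user_id": 1, "email": "a@example.com", "ssn": "000-00-0000"},
    {"user_id": 2, "email": "b@example.com", "ssn": "000-00-0001"},
]

for row in masked_view(source, hash_columns=["email"], blank_columns=["ssn"]):
    print(row)
```

Hashing keeps the column usable for joins and grouping, since equal inputs
give equal digests, while blanking removes it entirely. An unsalted MD5 of
low-entropy values (such as short numeric IDs) can be recovered by guessing
inputs, so it is a masking convenience rather than strong protection.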

Also: since I am just brainstorming here, I'd love to hear what others
may be doing in this area. Perhaps the Hive user community can come up
with a strategic plan while at the same time sharing some shorter-term
workarounds.

Thanks!

Re: Hive Authorization and Views

Posted by John Omernik <jo...@omernik.com>.

Edward - I agree that Hive and an RDBMS are different animals, so in looking
at the current work around Hive authorization, I get that the user would
still have access to the underlying file system.  We have to assume that
permissions are only enforced from a metadata perspective.  But given that
access control is high on the list of questions in enterprise adoption of
any data warehousing solution, views may provide enough of a control to
pass audit requirements. Users can access data directly (outside of Hive),
but within Hive they can't access the table directly, only the view.  I
need to think it through some more. Even in an RDBMS, certain users
(administrators etc.) are sometimes able to access the files of the data
store but are still controlled when accessing the data through the RDBMS.
Great discussion, I love stuff like this. Hive is awesome; it's community
discussion that makes it kick ass (excuse the language) :)




Re: Hive Authorization and Views

Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.

Also, we use external tables everywhere to ensure that accidentally dropping a table does not delete data. Plus, the good part of the HDFS architecture is that data is immutable, which means you cannot update rows. You can move partitions or delete/insert data in HDFS, which IMHO is very cool, but it may not solve all use cases.
Regards
sanjay


CONFIDENTIALITY NOTICE
======================
This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. If you are the intended recipient, please be advised that the content of this message is subject to access, review and disclosure by the sender's Email System Administrator.

Re: Hive Authorization and Views

Posted by Edward Capriolo <ed...@gmail.com>.

The largest issue is that the RDBMS security model does not match Hive's.
Hive/Hadoop has file permissions; RDBMSs have column-level and sometimes
row-level permissions.

When you physically have access to the underlying file, (row-level)
permissions are not enforceable. The only way to enforce this type of
security is to force users through a "turnstile", which changes how Hive
currently works.
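
The mismatch above can be shown with a toy Python model of the two access
paths. Everything in it is hypothetical (the user names, the row filter,
and the functions read_file and query_via_gateway); it only illustrates
why a filter held in metadata does nothing for a user who can read the
file directly, and what a "turnstile" would change.

```python
# Rows as they sit in a file on the shared filesystem (made-up data).
FILE_ROWS = [
    {"region": "us", "amount": 10},
    {"region": "eu", "amount": 20},
]
FILE_READERS = {"alice", "hive_service"}  # users with read permission on the file
ROW_FILTERS = {"alice": lambda row: row["region"] == "us"}  # policy kept in metadata


def read_file(user):
    """Direct read: only the file permission is checked, so every row comes back."""
    if user not in FILE_READERS:
        raise PermissionError(user)
    return list(FILE_ROWS)


def query_via_gateway(user):
    """Turnstile read: the service reads the file and applies the user's row filter."""
    allowed = ROW_FILTERS.get(user, lambda row: False)
    return [row for row in read_file("hive_service") if allowed(row)]


print(len(query_via_gateway("alice")))  # rows visible through the filter
print(len(read_file("alice")))          # rows visible by reading the file directly

# The turnstile: remove direct file access so the gateway is the only path.
FILE_READERS.discard("alice")
```

Once the user's file permission is removed, read_file raises PermissionError
for that user and the gateway becomes the only route to the data, which is
the change in how things work that the turnstile implies.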





Re: Hive Authorization and Views

Posted by John Omernik <jo...@omernik.com>.

I am curious about the thoughts of the community here. This seems like
something many enterprises would drool over with Hive... I am not a coder,
so the level of coding involved in something like this is unknown to me.

