Posted to dev@hive.apache.org by "Ashish Thusoo (JIRA)" <ji...@apache.org> on 2008/11/22 00:05:44 UTC

[jira] Created: (HIVE-78) Authentication infrastructure for Hive

Authentication infrastructure for Hive
--------------------------------------

                 Key: HIVE-78
                 URL: https://issues.apache.org/jira/browse/HIVE-78
             Project: Hadoop Hive
          Issue Type: New Feature
            Reporter: Ashish Thusoo
            Assignee: Ashish Thusoo


Allow hive to integrate with existing user repositories for authentication and authorization information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756823#action_12756823 ] 

Ashish Thusoo commented on HIVE-78:
-----------------------------------

@Min

I agree with Edward's thought here. We have to foster a collaborative environment and not be dismissive of each other's ideas and approaches. Much of the work in the community happens on a volunteer basis, and whatever time anyone puts into the project is a bonus and should be respected by all.

It does make sense to keep authentication separate from authorization, because in most environments there are already directories which deal with the former. Creating yet another store for passwords just leads to an administration nightmare, as the account administrators have to create accounts for new users in multiple places. So let's just focus on authorization and let the directory infrastructure deal with authentication. I will look at your patch as well.




> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Updated: (HIVE-78) Authentication infrastructure for Hive

Posted by "Min Zhou (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Min Zhou updated HIVE-78:
-------------------------

    Attachment: hive-78-metadata-v1.patch

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929896#action_12929896 ] 

John Sichi commented on HIVE-78:
--------------------------------

It looks like HIVE-78.1.nothrift.patch still has a bunch of Thrift-generated files in it (metastore/src/gen-javabean/org/apache/hadoop/hive/metastore/api/*).


> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor, Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: He Yongqiang
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, HIVE-78.1.nothrift.patch, HIVE-78.1.thrift.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Updated: (HIVE-78) Authorization infrastructure for Hive

Posted by "Min Zhou (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Min Zhou updated HIVE-78:
-------------------------

    Attachment: createuser-v1.patch

> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924039#action_12924039 ] 

Carl Steinbach commented on HIVE-78:
------------------------------------

@Namit: I think it's fine to take an incremental approach with this, but then it's important
to spell out what the known security holes are so users and administrators
know what they're getting. Otherwise we're going to spend a lot of time answering
questions on the hive-user list.



> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor, Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: He Yongqiang
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

Posted by "Min Zhou (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758112#action_12758112 ] 

Min Zhou commented on HIVE-78:
------------------------------

@Namit

Got your meaning. We are maintaining a version of our own; it will need a couple of weeks to adapt to the trunk.

> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Min Zhou (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698305#action_12698305 ] 

Min Zhou commented on HIVE-78:
------------------------------

Is there any further progress on this issue?

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923667#action_12923667 ] 

He Yongqiang commented on HIVE-78:
----------------------------------

The other option we came up with in the offline discussion is the rule of "one accept then accept", but in a hierarchical style. First check the privileges granted to the user and groups: one accept then accept; one deny then deny. Then check the role-level privileges: one accept then accept; one deny then deny.

We prefer to go with this rule. Please comment, and if there are no concerns about this, I will update the wiki.

> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: He Yongqiang
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757998#action_12757998 ] 

Namit Jain commented on HIVE-78:
--------------------------------

Looking at Min's patch createuser-v1.patch, 

I don't think we need create user/drop user etc. at all.

As Edward mentioned before,
When HWI starts the session on behalf of the user, it runs "SET hadoop.ugi={what user entered in the text box}". At that point, if the user initiates a Hive job, the output of that job should be files owned by that user. I am pretty sure the code in QL just chowns the files at job end, or perhaps the entire job runs as that user (I can't remember).

The user is always available from the environment, and for now let us assume that all authorizations apply to that user.
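
A minimal sketch of taking the user from the environment, as suggested above, instead of from a Hive-managed account. The class and method names are hypothetical and are not part of Hive; only the USER environment variable and the user.name system property are standard. The override parameter is an assumption standing in for a value set explicitly on the session.

```java
import java.util.Map;

// Hypothetical helper: resolves the session user from the process environment
// instead of from an account store kept inside Hive.
final class EnvironmentUser {
    private EnvironmentUser() {}

    // Prefer an explicit override (for example a value set on the session),
    // then the USER environment variable, then the JVM's user.name property.
    static String resolve(String override, Map<String, String> env) {
        if (override != null && !override.isEmpty()) {
            return override;
        }
        String fromEnv = env.get("USER");
        if (fromEnv != null && !fromEnv.isEmpty()) {
            return fromEnv;
        }
        return System.getProperty("user.name");
    }

    // Convenience form that reads the real process environment.
    static String current() {
        return resolve(null, System.getenv());
    }
}
```

All authorization checks would then be made against the name returned by such a helper, with no user or password stored in Hive.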

> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756936#action_12756936 ] 

Edward Capriolo commented on HIVE-78:
-------------------------------------

@Min
 
I would think the code should apply to any client: the CLI, Hive server, or HWI.

We should probably also provide a configuration variable:

{noformat}
<property>
   <name>hive.authorize</name>
   <value>true</value>
</property>
{noformat}
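
A minimal sketch of how code might consult such a switch before running any checks. It uses java.util.Properties as a stand-in for Hive's own configuration classes; the helper class is hypothetical, and the default of "false" when the property is absent is an assumption, not something decided in this thread.

```java
import java.util.Properties;

// Hypothetical helper that reads the proposed hive.authorize switch.
final class AuthorizationSwitch {
    static final String KEY = "hive.authorize";

    private AuthorizationSwitch() {}

    // Assumption: checks stay off unless the property is explicitly "true".
    static boolean isEnabled(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(KEY, "false"));
    }
}
```

Each client (CLI, Hive server, HWI) could call the same check, so that authorization is either on or off everywhere.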

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923598#action_12923598 ] 

Namit Jain commented on HIVE-78:
--------------------------------

Please comment - we would like to hear all use cases before finalizing the design.

> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: He Yongqiang
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652091#action_12652091 ] 

Ashish Thusoo commented on HIVE-78:
-----------------------------------

At first look, Realms seems to be a nice fit for this problem.

One capability that is missing there, and which may become an issue later, is the ability to compose roles into higher-level roles. To me it seems that roles are strictly flat and not hierarchical, so I cannot create an admin role that has the basic roles within it. Can this be achieved with Realms? I have not used it before, so I am not sure whether it is achievable.

The other issue that I can think of is whether Realms is generic enough to protect any kind of resource, and is not limited to web resources. We have tables and partitions, servers, etc. Could you elaborate on how this would work for the capabilities that I listed in my previous comments?
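
A minimal sketch of the role composition being asked about, where a higher-level role such as admin contains basic roles. This is a generic illustration and says nothing about what Realms supports; the class, method, and role names are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical registry in which a role may include other roles.
final class RoleHierarchy {
    private final Map<String, Set<String>> includes = new HashMap<>();

    // Declare that `parent` contains `child` (for example, admin contains reader).
    void include(String parent, String child) {
        includes.computeIfAbsent(parent, k -> new HashSet<>()).add(child);
    }

    // Return the role plus every role reachable from it. The visited set keeps
    // the walk finite even if roles were declared in a cycle.
    Set<String> expand(String role) {
        Set<String> seen = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(role);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (seen.add(current)) {
                for (String child : includes.getOrDefault(current, Collections.emptySet())) {
                    pending.push(child);
                }
            }
        }
        return seen;
    }
}
```

A privilege check against a composed role would then test the user's privileges against every role in the expanded set.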



> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699270#action_12699270 ] 

Edward Capriolo commented on HIVE-78:
-------------------------------------

>> 1) What would be the syntax to create user/passwd combos and logging in?
The username and password would come from an external source. I noticed a Hadoop JIRA on authenticating via Kerb4 and LDAP. We are best off splitting authentication and authorization, as we spoke of above. The user and group are your external POSIX user and group.

>> 2) Are the permissions stored in the metastore per user, per table, or a combo?
They should be stored in the metastore. A rule like GRANT * on '*' TO '*' AS my_permission would have to be stored everywhere, and that would be a PITA.

>> 3) Do we really need groups? I don't think MySQL implements groups
The group is your POSIX login group. Allowing groups is a simple way to reduce the number of per-user rules.

>> 4) 
Right again. The separation here is that we let the authentication system carry all the burden of usernames, groups, and passwords. The metastore is only concerned with what that user can do inside Hive.
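
A minimal sketch of the separation described in these answers: the user name and groups arrive from the external authentication system, and the store only records what a principal may do. The class is a hypothetical in-memory stand-in for the metastore, and the "user:" / "group:" key prefixes and privilege names are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the grants the metastore would hold.
final class GrantStore {
    // Principal ("user:alice" or "group:analysts") -> granted privileges.
    private final Map<String, Set<String>> grants = new HashMap<>();

    void grant(String principal, String privilege) {
        grants.computeIfAbsent(principal, k -> new HashSet<>()).add(privilege);
    }

    // The user name and groups are supplied by the external authentication
    // system; this class never sees a password.
    boolean isAllowed(String user, List<String> groups, String privilege) {
        if (has("user:" + user, privilege)) {
            return true;
        }
        for (String group : groups) {
            if (has("group:" + group, privilege)) {
                return true;
            }
        }
        return false;
    }

    private boolean has(String principal, String privilege) {
        Set<String> privileges = grants.get(principal);
        return privileges != null && privileges.contains(privilege);
    }
}
```

A single grant to a group covers every member, which is how groups reduce the number of per-user rules.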

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923734#action_12923734 ] 

He Yongqiang commented on HIVE-78:
----------------------------------

Bypassing the HDFS permissions from the Hive layer is just one option. The implementation should also support setting user groups on the HDFS side, and letting the MapReduce job run as the user.

Just a quick update about the authorization rule:

In the offline discussion we had internally this afternoon, removing DENY came up as another option to consider. We examined our use cases without DENY, and they work. So removing DENY from the authorization rules will simplify the implementation a lot.

Regarding views and indexes, we should not cover them in the first version. We can do them later, when we have a better understanding after implementing the first version.

> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor, Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: He Yongqiang
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923671#action_12923671 ] 

He Yongqiang commented on HIVE-78:
----------------------------------

Sorry, in the previous comment, by "one accept then accept; one deny then deny" I meant "accept overrides deny: one accept then accept; no accept then deny".
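
A minimal sketch of the corrected rule, under one reading of it: each level (user and groups first, then roles) accepts as soon as any of its entries accepts, and the request is denied only if no level accepts. That reading, and all names below, are assumptions made for illustration rather than the agreed design.

```java
import java.util.List;

// Hypothetical evaluator for "accept overrides deny; no accept then deny".
final class AcceptOverridesDeny {
    enum Decision { ACCEPT, DENY }

    private AcceptOverridesDeny() {}

    // A single level accepts as soon as one of its entries accepts,
    // regardless of any DENY entries at that level.
    static boolean levelAccepts(List<Decision> entries) {
        return entries.contains(Decision.ACCEPT);
    }

    // Assumption: levels are consulted in order (user and groups, then roles),
    // and the request is denied only if no level accepts.
    static Decision evaluate(List<Decision> userAndGroups, List<Decision> roles) {
        if (levelAccepts(userAndGroups) || levelAccepts(roles)) {
            return Decision.ACCEPT;
        }
        return Decision.DENY;
    }
}
```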

> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: He Yongqiang
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Updated: (HIVE-78) Authentication infrastructure for Hive

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-78:
------------------------------

    Component/s: Server Infrastructure

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Ashish Thusoo
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650271#action_12650271 ] 

Ashish Thusoo commented on HIVE-78:
-----------------------------------

+1 on this.

I also wanted to integrate this with AD through Kerberos, as that is perhaps the dominant user repository in most enterprises, and at least internally we have some users who do not have Unix accounts (mostly analysts). We could use Samba to provide the bridge to AD, as there are certain nuances when it comes to Kerberos with AD, as well as NTLM and NTLMv2 auth, that Samba has already solved.

We should also think of providing integration with Unix accounts (those maintained in the passwd db), especially for folks who just want to test authentication-specific features.

In the past, the most dominant directories that I have found in enterprise environments are AD (can be bridged through LDAP and Samba), Sun Java One, Novell and OID (all LDAP directories), and Unix accounts.

Thoughts?


> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Ashish Thusoo
>            Assignee: Ashish Thusoo
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757252#action_12757252 ] 

Namit Jain commented on HIVE-78:
--------------------------------

Copying an earlier comment from the JIRA:

I agree that authentication and authorization (much of what I have been talking about in this comment), need to be separated out and while we use the directory infrastructure for authentication, we should store the authorization information in the metastore as that is specific to our application and no sane directory administrator would allow us to touch the directory to support custom attributes.

I agree with the above. It might be a good idea not to do password handling in Hive in the first step; we can add it later if need be. Let us assume that the user has already been authenticated by some external entity, and proceed from there. What do you think?

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756662#action_12756662 ] 

Namit Jain commented on HIVE-78:
--------------------------------

I think we should spend some time on finalizing the functionality before implementing it. It is very difficult to change something once it is out, due to all kinds of backward compatibility issues.

For the AS syntax: won't it be simpler to add permissions to a role, and then assign roles to a user?



GRANT WITH_GRANT,RC, ON '*' TO 'USER1','USER2' AS my_permission

ALTER GRANT my_permission add USER 'USER3'


Can I revoke some privileges from my_permission?

If yes, how is it different from doing the two things separately?


CREATE ROLE my_permission AS GRANT WITH_GRANT,RC, ON '*' ;
GRANT my_permission to USER1, USER2;

later

GRANT my_permission to USER3;
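
A minimal sketch of the role-based alternative proposed here: privileges are attached to a role, and roles are then assigned to users. The class is hypothetical and keeps everything in memory; the comments map each method only loosely onto the proposed statements, whose final syntax was still under discussion.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of "add permissions to a role, then assign roles to users".
final class RoleGrants {
    private final Map<String, Set<String>> rolePrivileges = new HashMap<>();
    private final Map<String, Set<String>> userRoles = new HashMap<>();

    // Roughly: CREATE ROLE <role> AS GRANT <privilege> ...
    void addPrivilege(String role, String privilege) {
        rolePrivileges.computeIfAbsent(role, k -> new HashSet<>()).add(privilege);
    }

    // Revoking from the role affects every user who holds the role.
    void revokePrivilege(String role, String privilege) {
        Set<String> privileges = rolePrivileges.get(role);
        if (privileges != null) {
            privileges.remove(privilege);
        }
    }

    // Roughly: GRANT <role> TO <user>
    void assign(String role, String user) {
        userRoles.computeIfAbsent(user, k -> new HashSet<>()).add(role);
    }

    boolean userHas(String user, String privilege) {
        for (String role : userRoles.getOrDefault(user, Collections.emptySet())) {
            Set<String> privileges = rolePrivileges.get(role);
            if (privileges != null && privileges.contains(privilege)) {
                return true;
            }
        }
        return false;
    }
}
```

With this shape, adding USER3 later is one more assignment, and revoking a privilege from the role takes effect for all of its users at once.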

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756068#action_12756068 ] 

Edward Capriolo commented on HIVE-78:
-------------------------------------

Min,

First, let me say you have probably come along much farther than me on this issue.

Your approach is too strong. Hive is an open-community process. Though it is not very detailed, we have loosely agreed on a spec (above). In that spec we decided not to store username/password information in Hive; rather, upstream is still going to be responsible for this information. We also agreed on syntax.

You should not throw up a new spec, and some code, and say something along the lines of "We are going to take over and do it this way". Imagine if, on each JIRA issue you were working on, you were 20% to 50% done, and then someone jumped in and said "I already finished it a different way". That would be rather annoying. It would be a "first patch wins" system.

First, before you write a line of code, you should let someone know your intention to work on it. Otherwise, what is the point of having two people work on something where one version gets thrown away? It is a waste, and this would be the second issue on which this has happened to me.

Second, even if you want to start coding it up, it has to be what people agreed on. We agreed not to store user/pass (Hadoop will be doing this upstream soon), and we agreed on syntax. If you want to reopen that issue, you should discuss it before coding it. It has to be good for the community, not just your deployment.

So where do we go from here? Do we go back to the design phase and describe all the syntax we want to support?

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

Posted by "Min Zhou (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757615#action_12757615 ] 

Min Zhou commented on HIVE-78:
------------------------------

Well, I wrote another Authorizer, like your Authenticator, yesterday:
{noformat}
import java.util.List;

// Privileges that can be checked for the current user.
public enum Privilege {
  SELECT_PRIV,
  INSERT_PRIV,
  CREATE_PRIV,
  ALTER_PRIV,
  DROP_PRIV,
  CREATE_USER_PRIV,
  GRANT_PRIV,
  SUPER_PRIV
}

// Checks whether the user holds a privilege, globally or on specific tables.
public interface Authenticator {
  boolean authenticate(Privilege priv);
  boolean authenticate(Privilege priv, Table table);
  boolean authenticate(Privilege priv, List<Table> tables);
}

// Constructed from the metastore handle and the user being checked.
public class GenericAuthenticator {
  public GenericAuthenticator(Hive db, User user);
  ...
}
{noformat}
and added an Authenticator instance into the thread-local SessionState.
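
A minimal sketch of keeping a per-thread checker in a thread-local session object, as described above. SessionContext is a hypothetical stand-in for Hive's SessionState, whose real API is not shown in this thread, and a Predicate over privilege names stands in for the Authenticator instance. The deny-by-default initial state is an assumption.

```java
import java.util.function.Predicate;

// Hypothetical per-thread session holder; not Hive's actual SessionState.
final class SessionContext {
    private static final ThreadLocal<SessionContext> CURRENT =
            ThreadLocal.withInitial(SessionContext::new);

    // Stand-in for the Authenticator instance: a check keyed by privilege name.
    // Assumption: a fresh session denies everything until a check is installed.
    private Predicate<String> privilegeCheck = privilege -> false;

    // Each thread sees its own SessionContext instance.
    static SessionContext get() {
        return CURRENT.get();
    }

    void setPrivilegeCheck(Predicate<String> check) {
        this.privilegeCheck = check;
    }

    boolean isAllowed(String privilege) {
        return privilegeCheck.test(privilege);
    }
}
```

Because the holder is thread-local, a check installed for one session's thread is not visible to another thread's session.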

> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919104#action_12919104 ] 

Namit Jain commented on HIVE-78:
--------------------------------

Is anyone working on this?

> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757621#action_12757621 ] 

Edward Capriolo commented on HIVE-78:
-------------------------------------

@Min, 

I think you are on the right track, but you might have your terminology mixed up. In AAA:
* The first A is authentication, which usually means supplying a user/password.
* The second A is authorization, which determines what privileges the user has.
* The third A is accounting (we already have that).

The interface you supplied above looks like an Authorizer, not an Authenticator. I think it should be:

{noformat}
public interface Authorizer {
  public boolean authorize(Privilege priv);
  public boolean authorize(Privilege priv, Table table);
  public boolean authorize(Privilege priv,  List<Table> table);
}
{noformat}

But you seem to be on a roll. I will hang back and wait to see what you come up with.
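
To make the shape of the Authorizer interface concrete, here is a minimal in-memory sketch. It is only an illustration under assumptions: table names are plain strings standing in for the `Table` type, the enum is shortened, and `InMemoryAuthorizer`, `grant`, and `grantGlobal` are hypothetical names that do not come from any patch on this issue.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; String table names stand in for Hive's Table objects.
class AuthorizerSketch {
    enum Privilege { SELECT_PRIV, INSERT_PRIV, DROP_PRIV }

    interface Authorizer {
        boolean authorize(Privilege priv);
        boolean authorize(Privilege priv, String table);
        boolean authorize(Privilege priv, List<String> tables);
    }

    static final class InMemoryAuthorizer implements Authorizer {
        private final Set<Privilege> global = EnumSet.noneOf(Privilege.class);
        private final Map<String, Set<Privilege>> perTable = new HashMap<>();

        void grantGlobal(Privilege priv) { global.add(priv); }

        void grant(Privilege priv, String table) {
            perTable.computeIfAbsent(table, t -> EnumSet.noneOf(Privilege.class)).add(priv);
        }

        // Global check: true only if the privilege was granted without a target.
        @Override
        public boolean authorize(Privilege priv) {
            return global.contains(priv);
        }

        // Table check: a global grant implies the same privilege on every table.
        @Override
        public boolean authorize(Privilege priv, String table) {
            Set<Privilege> onTable = perTable.getOrDefault(table, Collections.emptySet());
            return global.contains(priv) || onTable.contains(priv);
        }

        // Multi-table check: every table touched by the query must pass.
        @Override
        public boolean authorize(Privilege priv, List<String> tables) {
            for (String table : tables) {
                if (!authorize(priv, table)) {
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        InMemoryAuthorizer authorizer = new InMemoryAuthorizer();
        authorizer.grant(Privilege.SELECT_PRIV, "src");
        System.out.println(authorizer.authorize(Privilege.SELECT_PRIV, "src"));
        System.out.println(authorizer.authorize(Privilege.SELECT_PRIV, Arrays.asList("src", "other")));
    }
}
```

Requiring every table in the list to pass is one reasonable reading of the list overload; the thread does not specify its semantics.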

> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698771#action_12698771 ] 

Edward Capriolo commented on HIVE-78:
-------------------------------------

My last comment is a blocker in my mind. How can we implement complex access controls at the Hive level when we have basic file ownership issues at the file level? Will daemons like HiveService and HiveWebInterface have to run as supergroup or as a hive group? How will this affect the CLI, which runs as the individual user?

These are not so much Hive issues as environment/setup issues, but I do not want to assume my environment is the target environment. Will we be assuming that users are members of a 'hive' posix group, or that all the files in the warehouse are owned by user 'Hive' group 'Hive'? I wanted to get others' opinions on this.

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699247#action_12699247 ] 

Edward Capriolo commented on HIVE-78:
-------------------------------------

GRANT 
* 	SELECT
*	ALTER
*	INSERT	
*	UPDATE --RESERVED
*	DROP
*	CREATE

GLOBAL GRANT PERMISSIONS
* PROCESS_LIST -List Query 
* PROCESS_KILL -Kill query
* RC - start shutdown
* WITH_GRANT - Give user permission to grant other permissions

SPECIAL
* 'ALL' ALL PERMISSIONS 

Target Objects: ALL, DataBase, Table, Partition, Column
	
* Permissions are additive
* Upper level implies lower level i.e. select on table implies select on all columns in table

Suggested Syntax
* GRANT WITH_GRANT, RC ON '*' TO 'USER1','USER2' AS my_permission
* GRANT SELECT ON 'cat1','cat2' TO 'USER1' AS my_permission
* GRANT SELECT ON 'cat1.*', 'cat2.homes.name'  TO 'USER4', '%GROUP1' AS my_permission
* GRANT SELECT on 'cat1.*', 'cat2.homes.PARTITION="5.5.4".owner' TO 'USER5' AS my_permission

In the metastore we can store the permissions like this:
PERMISSION SET {
	Vector <User|GROUP> ,
	Vector <TargetObject>,
 	Vector <PRIV>,
	String Name
}
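
The permission-set idea above, with additive grants and upper levels implying lower levels, could be modeled roughly as follows. This is a sketch under assumptions: targets are dotted paths such as `cat1.homes.name`, a trailing `.*` is treated the same as a grant on the parent object, group membership is not resolved, and all class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the proposed PERMISSION SET; not actual metastore code.
class PermissionSetSketch {
    enum Priv { SELECT, ALTER, INSERT, DROP, CREATE }

    static final class PermissionSet {
        final String name;
        final Set<String> principals;   // users, or groups written as "%GROUP"
        final Set<String> targets;      // "*", "db", "db.*", "db.table", "db.table.column"
        final Set<Priv> privs;

        PermissionSet(String name, Set<String> principals, Set<String> targets, Set<Priv> privs) {
            this.name = name;
            this.principals = principals;
            this.targets = targets;
            this.privs = privs;
        }
    }

    // Upper level implies lower level: a grant covers the object itself
    // and anything beneath it in the dotted hierarchy.
    static boolean covers(String granted, String requested) {
        if (granted.equals("*")) {
            return true;
        }
        String base = granted.endsWith(".*")
                ? granted.substring(0, granted.length() - 2)
                : granted;
        return requested.equals(base) || requested.startsWith(base + ".");
    }

    // Permissions are additive: any one matching set is enough.
    static boolean allowed(List<PermissionSet> sets, String principal, Priv priv, String target) {
        for (PermissionSet set : sets) {
            if (!set.principals.contains(principal) || !set.privs.contains(priv)) {
                continue;
            }
            for (String granted : set.targets) {
                if (covers(granted, target)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        PermissionSet set = new PermissionSet(
                "my_permission",
                new HashSet<>(Arrays.asList("USER4", "%GROUP1")),
                new HashSet<>(Arrays.asList("cat1.*", "cat2.homes.name")),
                EnumSet.of(Priv.SELECT));
        List<PermissionSet> sets = Arrays.asList(set);
        System.out.println(allowed(sets, "USER4", Priv.SELECT, "cat1.pets.name"));
        System.out.println(allowed(sets, "USER4", Priv.SELECT, "cat2.homes.owner"));
    }
}
```

The prefix match is done on component boundaries so that a grant on `cat1` does not accidentally cover `cat10`.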

	

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Min Zhou (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756949#action_12756949 ] 

Min Zhou commented on HIVE-78:
------------------------------

I do not think the HiveServer you have in mind is the same as mine, which supports multiple users, not just one.

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682719#action_12682719 ] 

Edward Capriolo commented on HIVE-78:
-------------------------------------

We also have to look at this on the file system level. For example, files in my warehouse are owned by the user who created the table.

{quote}
/user/hive/warehouse/edward      <dir>           2008-10-30 17:13        rwxr-xr-x       edward supergroup
{quote}

Regardless of what permissions are granted in the metastore (via this jira), hadoop ACL governs what a user can do to that file. 

This is not an issue in mysql. In a typical mysql deployment all of the data files are owned by a mysql user. 

I do not see a clear cut solution for this. 

In one scenario, we make sure all the files in the warehouse are readable and writable by all, or are owned by a specific user. A component like HiveServer, the CLI, or HWI would then decide whether the user's action succeeds based on the metadata.

The other option is that an operation like 'GRANT SELECT' would have to physically modify the Hadoop ACL/owner. This method will not help us get the fine grained control we desire.
 

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

Posted by "Amr Awadallah (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802720#action_12802720 ] 

Amr Awadallah commented on HIVE-78:
-----------------------------------

I am also very curious about the latest on this jira; there have been no updates since September of last year. Min, did you stop working on this?

-- amr


> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Min Zhou (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755876#action_12755876 ] 

Min Zhou commented on HIVE-78:
------------------------------

We will take over this issue; it should be finished in two weeks. Here are the SQL statements that will be added:
{noformat}
CREATE USER;
DROP USER;
ALTER USER SET PASSWORD;
GRANT;
REVOKE;
{noformat}

Metadata is stored in some persistent store, such as a MySQL DBMS, through JDO. We will add three tables for this issue: USER, DBS_PRIV, and TABLES_PRIV. Privileges can be granted at several levels, and each of these tables corresponds to one privilege level.
#  Global level
Global privileges apply to all databases on a given server. These privileges are stored in the USER table. GRANT ALL ON *.* and REVOKE ALL ON *.* grant and revoke only global privileges. 
GRANT ALL ON *.* TO 'someuser';
GRANT SELECT, INSERT ON *.* TO 'someuser';

#  Database level
Database privileges apply to all objects in a given database. These privileges are stored in the DBS_PRIV table. GRANT ALL ON db_name.* and REVOKE ALL ON db_name.* grant and revoke only database privileges. 
GRANT ALL ON mydb.* TO 'someuser';
GRANT SELECT, INSERT ON mydb.* TO 'someuser';
Although Hive cannot create databases currently, this level is reserved until Hive supports them.

# Table level
Table privileges apply to all columns in a given table. These privileges are stored in the TABLES_PRIV table. GRANT ALL ON db_name.tbl_name and REVOKE ALL ON db_name.tbl_name grant and revoke only table privileges. 
GRANT ALL ON mydb.mytbl TO 'someuser';
GRANT SELECT, INSERT ON mydb.mytbl TO 'someuser';

Hive account information is stored in the USER table, which includes the username, password, and privileges. A user who has been granted any privilege on a particular table, such as select/insert/drop, always has the right to show that table.
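
The three privilege levels described above could be resolved along these lines. This is a minimal sketch, assuming in-memory maps as stand-ins for the USER, DBS_PRIV, and TABLES_PRIV tables and a simple string key per row; the class and method names are hypothetical and do not reflect the actual metastore schema in the patch.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of global, database, and table privilege levels.
class PrivilegeLevelsSketch {
    enum Priv { SELECT, INSERT, DROP }

    static final class Store {
        // In-memory stand-ins for the USER, DBS_PRIV, and TABLES_PRIV tables.
        private final Map<String, Set<Priv>> user = new HashMap<>();
        private final Map<String, Set<Priv>> dbsPriv = new HashMap<>();
        private final Map<String, Set<Priv>> tablesPriv = new HashMap<>();

        // "on" is "*.*", "db.*", or "db.tbl", as in the GRANT examples.
        void grant(String userName, String on, Priv... privs) {
            int dot = on.indexOf('.');
            if (dot < 0) {
                throw new IllegalArgumentException("expected db.tbl, got: " + on);
            }
            String db = on.substring(0, dot);
            String tbl = on.substring(dot + 1);
            Map<String, Set<Priv>> level;
            String key;
            if (db.equals("*")) {
                level = user;                       // global level
                key = userName;
            } else if (tbl.equals("*")) {
                level = dbsPriv;                    // database level
                key = userName + "|" + db;
            } else {
                level = tablesPriv;                 // table level
                key = userName + "|" + db + "." + tbl;
            }
            level.computeIfAbsent(key, k -> EnumSet.noneOf(Priv.class))
                 .addAll(Arrays.asList(privs));
        }

        // A privilege held at any enclosing level applies to the table.
        boolean has(String userName, String db, String tbl, Priv priv) {
            return holds(user, userName, priv)
                    || holds(dbsPriv, userName + "|" + db, priv)
                    || holds(tablesPriv, userName + "|" + db + "." + tbl, priv);
        }

        // Any privilege on a table implies the right to show it.
        boolean canShow(String userName, String db, String tbl) {
            for (Priv priv : Priv.values()) {
                if (has(userName, db, tbl, priv)) {
                    return true;
                }
            }
            return false;
        }

        private static boolean holds(Map<String, Set<Priv>> level, String key, Priv priv) {
            Set<Priv> privs = level.get(key);
            return privs != null && privs.contains(priv);
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.grant("someuser", "mydb.mytbl", Priv.SELECT, Priv.INSERT);
        System.out.println(store.has("someuser", "mydb", "mytbl", Priv.SELECT));
        System.out.println(store.canShow("someuser", "mydb", "othertbl"));
    }
}
```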



> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

Posted by "Royce Rollins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775117#action_12775117 ] 

Royce Rollins commented on HIVE-78:
-----------------------------------

I'm very interested in working on this issue this week but don't want to tread on anyone's work. What's the status? Is anything checked in yet? I'd like to get this done as soon as possible.

> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699306#action_12699306 ] 

Ashish Thusoo commented on HIVE-78:
-----------------------------------

I agree, it is best to punt authentication to the authentication systems (LDAP, Kerberos, etc.) and concentrate on authorization (privileges) here.

About the syntax:

1.  I am not sure what AS is used for.
2. Column-level permissions are good, but they can perhaps be addressed with views, treating permissions on views as we do for tables.
3. I would add the keyword TABLE to the GRANT statement, as in MySQL, because we may have permissions on user-defined functions and types in the future. So something like:
   GRANT SELECT ON TABLE 'cat1' TO 'USER1' 
4. Also, maybe make the user and group explicit in the TO clause (TO USERS a, b, c GROUPS g1, g2); otherwise the reader of the command may not know what is a group and what is a user. I presume this would also make the authorization logic somewhat simpler, as you would know exactly what to look for?

About the blocker that you mentioned, we should perhaps let the hadoop file permissions be independent of Hive ACLs. Of course you need both to be able to do anything on the table. Can be tricky though.. Will spend a bit more time thinking about this - this looks pretty cool...
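
The point that both layers must agree can be illustrated with a small sketch. It assumes POSIX-style mode strings such as the `rwxr-xr-x` shown in the directory listing earlier in the thread, checks only the single class (owner, group, or other) that applies to the user, and treats the Hive-level grant as a plain boolean; all names are hypothetical.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch: an action succeeds only if the file system AND Hive allow it.
class TwoLayerCheckSketch {
    // op is 'r', 'w', or 'x'; mode is a 9-character string such as "rwxr-xr-x".
    static boolean fsAllows(String user, Set<String> groups,
                            String owner, String group, String mode, char op) {
        int offset = op == 'r' ? 0 : op == 'w' ? 1 : 2;
        // Pick the owner, group, or other triple that applies to this user.
        int triple = user.equals(owner) ? 0 : groups.contains(group) ? 3 : 6;
        return mode.charAt(triple + offset) == op;
    }

    static boolean allowed(boolean hiveGrant, boolean fsOk) {
        return hiveGrant && fsOk;
    }

    public static void main(String[] args) {
        Set<String> noGroups = Collections.emptySet();
        // Directory owned by edward:supergroup with mode rwxr-xr-x.
        boolean canWrite = fsAllows("min", noGroups, "edward", "supergroup", "rwxr-xr-x", 'w');
        // Even with a Hive-level INSERT grant, the file system still blocks the write.
        System.out.println(allowed(true, canWrite));
    }
}
```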


> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699253#action_12699253 ] 

Prasad Chakka commented on HIVE-78:
-----------------------------------

This is great. I have a few questions:
1) What would be the syntax for creating user/passwd combos and for logging in?
2) Are the permissions stored in the metastore per user, per table, or a combination?
3) Do we really need groups? I don't think MySQL implements groups.
4) I am totally naive about authentication systems, but I am assuming only access details are stored in the metastore and authentication is done by one of the systems discussed. Is that correct?

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Updated: (HIVE-78) Authorization infrastructure for Hive

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-78:
-----------------------------

    Attachment: HIVE-78.1.nothrift.patch
                HIVE-78.1.thrift.patch

Attached two patches. One includes the Thrift-generated code, in case anyone wants to try it.
The other contains just the Java code changes, for a clean review.

These two patches contain only DDL and metadata changes. There is no integration with the query execution part yet; I will do that in the following patch.

Some examples:

hive> show grant user `test` on table `src`;
OK
Time taken: 0.081 seconds

hive> grant `select` on table src to user test, group grp;   
OK
Time taken: 0.118 seconds

hive> show grant user `test` on table `src`;              
OK
dbName:default
tableName:src
userName:test
isRole:false
isGroup:false
privileges:Select
grantTime:1288850969
grantor:
grantor:
Time taken: 0.09 seconds

hive> show grant group `grp` on table `src`;                 
OK
dbName:default
tableName:src
userName:grp
isRole:false
isGroup:true
privileges:Select
grantTime:1288850969
grantor:
grantor:
Time taken: 0.08 seconds

hive> revoke `select` on table src from user test;           
OK
Time taken: 0.041 seconds

hive> show grant user `test` on table `src`;      
OK
Time taken: 0.078 seconds

hive> show grant group `grp` on table `src`;      
OK
dbName:default
tableName:src
userName:grp
isRole:false
isGroup:true
privileges:Select
grantTime:1288850969
grantor:
grantor:
Time taken: 0.079 seconds

hive> grant `select`(key, value) on table src to user test;
OK
Time taken: 0.174 seconds

hive> show grant user `test` on table `src`(key);
OK
dbName:default
tableName:src
columnName:key
userName:test
isRole:false
isGroup:false
privileges:Select
grantTime:1288851160
grantor:
grantor:
Time taken: 6.722 seconds

hive> show grant user `test` on table `src`(key, value);
OK
dbName:default
tableName:src
columnName:key
userName:test
isRole:false
isGroup:false
privileges:Select
grantTime:1288851160
grantor:
dbName:default
tableName:src
columnName:value
userName:test
isRole:false
isGroup:false
privileges:Select
grantTime:1288851160
grantor:
grantor:


> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor, Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: He Yongqiang
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, HIVE-78.1.nothrift.patch, HIVE-78.1.thrift.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923597#action_12923597 ] 

Carl Steinbach commented on HIVE-78:
------------------------------------

Authorization proposal on the wiki: http://wiki.apache.org/hadoop/Hive/AuthDev


> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: He Yongqiang
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652252#action_12652252 ] 

Ashish Thusoo commented on HIVE-78:
-----------------------------------

The roles are actually per object. I would say that these are at least per table, if not per partition. I don't have a use case for the latter, but separation on the basis of table is actually very desirable.

Given that, and the fact that we currently have around 5000 tables in our warehouse, do you have some idea of how realms will scale with such a large number of objects?

I agree that a generic recursive role infrastructure does not have a lot of utility, but considering that we have so many permissions, I would think that it would be quite cumbersome for an administrator to enumerate all of them for every user that is created (though some good defaults can surely alleviate some of the concerns here). So I think being able to package permissions into some higher level roles would help. Note that we do not need a generic role within a role, but it would be nice to have a role be a set of permissions on certain objects and an ability to allow authorization framework to be able to associate a role or permission with a user.

The other way to do this is to define groups which can be assigned a set of permissions and a set of users. That level of indirection would also work in reducing the number of user to permission assignments that we would have to make otherwise.

I agree that authentication and authorization (much of what I have been talking about in this comment), need to be separated out  and while we use the directory infrastructure for authentication, we should store the authorization information in the metastore as that is specific to our application and no sane directory administrator would allow us to touch the directory to support custom attributes.

If we do that separation, then Realms perhaps can take care of just the authentication portion, and once the user is authenticated, the authorization infrastructure looks up the user by ID in metastore to figure out what capabilities the user has.

Is that what you have in mind?

In this scenario, I presume that we would have a realm for AD and just have all the users authenticate with that realm. So the number of realms would be a function of the number of directories or user repositories as opposed to being a function of the number of objects.
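
The separation described in this comment, where a realm only authenticates and the metastore only answers what an authenticated user may do, might be sketched as follows. Everything here is an assumption for illustration: the in-memory realm merely stands in for a directory such as AD or LDAP, capabilities are plain strings, and none of the names come from an existing Hive or realm API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: authentication in a realm, authorization in the metastore.
class RealmSeparationSketch {
    // A realm only answers "who is this?" and knows nothing about privileges.
    interface Realm {
        Optional<String> authenticate(String login, String password);
    }

    // In-memory stand-in for an external directory; a real deployment would
    // delegate to the directory rather than hold credentials itself.
    static final class InMemoryRealm implements Realm {
        private final Map<String, String> accounts = new HashMap<>();

        void addAccount(String login, String password) {
            accounts.put(login, password);
        }

        @Override
        public Optional<String> authenticate(String login, String password) {
            String expected = accounts.get(login);
            if (expected != null && expected.equals(password)) {
                return Optional.of(login);      // the user ID used for the metastore lookup
            }
            return Optional.empty();
        }
    }

    // The metastore only answers "what may this user ID do?".
    static final class Metastore {
        private final Map<String, Set<String>> capabilities = new HashMap<>();

        void assign(String userId, String... caps) {
            capabilities.put(userId, new HashSet<>(Arrays.asList(caps)));
        }

        Set<String> capabilitiesOf(String userId) {
            Set<String> caps = capabilities.get(userId);
            return caps == null ? Collections.<String>emptySet() : caps;
        }
    }

    // Authenticate first; only then look up capabilities by user ID.
    static Set<String> login(Realm realm, Metastore metastore, String login, String password) {
        Optional<String> userId = realm.authenticate(login, password);
        if (!userId.isPresent()) {
            return Collections.emptySet();
        }
        return metastore.capabilitiesOf(userId.get());
    }

    public static void main(String[] args) {
        InMemoryRealm realm = new InMemoryRealm();
        realm.addAccount("alice", "secret");
        Metastore metastore = new Metastore();
        metastore.assign("alice", "SELECT:default.src");
        System.out.println(login(realm, metastore, "alice", "secret"));
        System.out.println(login(realm, metastore, "alice", "wrong"));
    }
}
```

With this split, the number of realms depends on the number of user repositories, while the per-object data lives only in the metastore map.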


> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923733#action_12923733 ] 

Carl Steinbach commented on HIVE-78:
------------------------------------

The issue that Todd raised is pretty important and needs to be addressed in the proposal.
My personal opinion is that running all queries as a "hive" super-user is the most
practical approach and will also yield behavior that is familiar to users of traditional
RDBMS systems (who I expect will increasingly define the average Hive user/administrator).

There are some other follow-on issues that need to be decided if we end up settling
on this approach:

* This approach to authorization presupposes that users are accessing Hive through a HiveServer process. This follows from the fact that A) you want Hive to execute the query plans as the Hive superuser, and B) that user can circumvent the authorization model if they are given direct access to the MetaStore DB. It would be nice if the proposal explicitly stated this requirement and mentioned some of the follow-on work that this necessitates, e.g. fixing concurrency issues in HiveServer, reducing the memory requirements of HiveServer, etc.

* We need to apply the authorization model to the '{{add [archive|file|jar]}}' commands as well as {{add temporary function}}. {{add jar}} and {{add file}} both currently allow the user to inject code into MR jobs, and {{add jar}} in conjunction with {{add temporary function}} allows the user to inject and execute arbitrary code within the HiveServer process. We may also want to add a new {{add executable}} command for adding executable scripts that has a different permission model than {{add file}}.

* I think there also may be security issues stemming from external tables, e.g. if I create an external table that points to another user's home directory and then run a query on it which executes with Hive's superuser permissions.

* Loading data into the Hive warehouse from an arbitrary HDFS location and exporting data to other locations in HDFS are two issues that need to be considered. In each case I think the correct behavior depends on both the Hive process's permissions and those of the user.




> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor, Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: He Yongqiang
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923624#action_12923624 ] 

dhruba borthakur commented on HIVE-78:
--------------------------------------

Can somebody please comment on how this ties in with HDFS permission/authorization? There is a small subsection in the doc about this issue, but I am unable to understand that part.

> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: He Yongqiang
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Min Zhou (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756335#action_12756335 ] 

Min Zhou commented on HIVE-78:
------------------------------

@Edward

Sorry for my misuse of some words; I hope this will not affect our work.

Can you point me to the jiras where it was decided that Hive should not store username/password information and that Hadoop will?
I think most companies are using Hadoop versions from 0.17 to 0.20, which do not have good password security. Once a company adopts a particular version, upgrading is a major undertaking, and many companies will stay on a more stable version. Moreover, Hadoop still does not have that feature, and it may take a very long time to implement. Why should we wait rather than accomplish it ourselves? I think Hive needs to support user/password, at least for current versions of Hadoop. Many companies using Hive have reported that the current Hive is inconvenient for multiple users in areas such as environment isolation, table sharing, and security. We must try to meet the requirements of most of them.

Regarding the syntax, I guess we can do it in two steps. 
# Support GRANT/REVOKE privileges to users.
# Support some sort of server administration privileges, as Ashish mentioned. 
The GRANT statement enables system administrators to create Hive user accounts and to grant rights to accounts. To use GRANT, you must have the GRANT OPTION privilege, and you must have the privileges that you are granting. The REVOKE statement is related and enables administrators to remove account privileges.
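
The GRANT precondition stated above (the grantor must hold GRANT OPTION and every privilege being granted) can be sketched in a few lines. This is only an illustration under assumptions: privileges are not scoped to any object, granting to an unknown name implicitly creates the account, and `Accounts`, `seed`, `grant`, and `revoke` are hypothetical names rather than anything in hive-78-syntax-v1.patch.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the GRANT OPTION rule; privileges are not object-scoped here.
class GrantRuleSketch {
    enum Priv { SELECT, INSERT, DROP, GRANT_OPTION }

    static final class Accounts {
        private final Map<String, Set<Priv>> held = new HashMap<>();

        // Bootstrap helper, e.g. for the initial administrator account.
        void seed(String user, Set<Priv> privs) {
            privsOf(user).addAll(privs);
        }

        // Succeeds only if the grantor has GRANT_OPTION and every privilege being granted.
        boolean grant(String grantor, String grantee, Set<Priv> privs) {
            Set<Priv> own = privsOf(grantor);
            if (!own.contains(Priv.GRANT_OPTION) || !own.containsAll(privs)) {
                return false;
            }
            privsOf(grantee).addAll(privs);   // creates the grantee account if needed
            return true;
        }

        void revoke(String user, Set<Priv> privs) {
            privsOf(user).removeAll(privs);
        }

        boolean has(String user, Priv priv) {
            return privsOf(user).contains(priv);
        }

        private Set<Priv> privsOf(String user) {
            return held.computeIfAbsent(user, u -> EnumSet.noneOf(Priv.class));
        }
    }

    public static void main(String[] args) {
        Accounts accounts = new Accounts();
        accounts.seed("admin", EnumSet.of(Priv.SELECT, Priv.GRANT_OPTION));
        System.out.println(accounts.grant("admin", "bob", EnumSet.of(Priv.SELECT)));
        System.out.println(accounts.grant("admin", "bob", EnumSet.of(Priv.DROP)));
        System.out.println(accounts.grant("bob", "carol", EnumSet.of(Priv.SELECT)));
    }
}
```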

The file hive-78-syntax-v1.patch modifies the syntax. Any comments on that?


> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Assigned: (HIVE-78) Authentication infrastructure for Hive

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo reassigned HIVE-78:
---------------------------------

    Assignee: Edward Capriolo  (was: Ashish Thusoo)

Assigning to Edward as he is going to start on this... Thanks Edward!!

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923719#action_12923719 ] 

Todd Lipcon commented on HIVE-78:
---------------------------------

I'm a little unclear on how the user identity is passed down to the MR layer. Carl and I had chatted about this a few weeks back -- is the idea now that all hive queries will run MR jobs as a "hive" user, rather than "todd"? If so, we need to add authorization control for UDFs and TRANSFORM as well, since a user could trivially take over the "hive" user credentials from within a UDF. If the MR jobs will continue to run as "todd", then I don't understand how we can apply any permissions model that is any different than HDFS permissions. More restrictive is impossible because I can just read the files myself, and less restrictive is impossible because HDFS is applying permissions based on the "todd" identity.

> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: He Yongqiang
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Updated: (HIVE-78) Authentication infrastructure for Hive

Posted by "Min Zhou (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Min Zhou updated HIVE-78:
-------------------------

    Attachment: hive-78-syntax-v1.patch

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

Posted by "Min Zhou (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757616#action_12757616 ] 

Min Zhou commented on HIVE-78:
------------------------------

sorry, 
{noformat}
public class GenericAuthenticator extends Authenticator {
  public GenericAuthenticator(Hive db, User user);
   ...
}
{noformat}

> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650289#action_12650289 ] 

Edward Capriolo commented on HIVE-78:
-------------------------------------

I wanted to mention one more solution: JDBCRealm. It is well established in Tomcat, should be easy to retrofit, and has support for roles.
A password file is a good solution as well.

Q. Active Directory is LDAP at its core. In what case would you need Samba to get at data in LDAP? It seems like we should be able to support both Active Directory and LDAP using JNDI -- http://forums.sun.com/thread.jspa?threadID=581425
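
To make the JNDI suggestion concrete, here is a minimal sketch of checking a user's credentials with an LDAP simple bind through the standard `javax.naming` API. It is an assumption about how such a check could look, not part of any attached patch: the class name `LdapBindSketch`, the URL, and the DN layout are all hypothetical, and it uses a plain simple bind rather than GSSAPI/Kerberos.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.InitialDirContext;

final class LdapBindSketch {
    private LdapBindSketch() {}

    /** Builds the JNDI environment for a simple (DN + password) bind. */
    static Hashtable<String, String> environment(String url, String userDn, String password) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, url);               // for example ldap://host:389
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, userDn);
        env.put(Context.SECURITY_CREDENTIALS, password);
        return env;
    }

    /** Returns true only if the directory accepts the bind. */
    static boolean authenticate(String url, String userDn, String password) {
        // An empty password can be treated by a directory as an anonymous bind, so reject it up front.
        if (password == null || password.isEmpty()) {
            return false;
        }
        try {
            new InitialDirContext(environment(url, userDn, password)).close();
            return true;
        } catch (NamingException e) {
            // Covers wrong credentials as well as connection problems.
            return false;
        }
    }
}
```

Note that returning false for every `NamingException` does not distinguish a bad password from an unreachable server; a real implementation would probably want to report those separately.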

I was thinking about 'roles': hiveuser can issue queries and kill their own queries; hiveadmin can kill other users' queries.

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Ashish Thusoo
>            Assignee: Ashish Thusoo
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Min Zhou (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756904#action_12756904 ] 

Min Zhou commented on HIVE-78:
------------------------------

Let me guess: you are all talking about the CLI. But we are using HiveServer as a multi-user server, as mysqld is, not as something that supports only one user.

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Updated: (HIVE-78) Authorization infrastructure for Hive

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-78:
-------------------------------

    Component/s: Query Processor
                 Metastore

> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor, Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: He Yongqiang
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Min Zhou (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755882#action_12755882 ] 

Min Zhou commented on HIVE-78:
------------------------------

We currently use separate MySQL databases to achieve an isolated CLI environment, which is not practical. We urgently need an authentication infrastructure.

Almost all statements would be affected, for example:
SELECT 
INSERT
SHOW TABLES
SHOW PARTITIONS
DESCRIBE TABLE
MSCK
CREATE TABLE
CREATE FUNCTION -- we are considering how to control people creating udfs.
DROP TABLE
DROP FUNCTION
LOAD
plus GRANT/REVOKE themselves, and CREATE USER/DROP USER/SET PASSWORD. This even includes some non-SQL commands such as set, add file, and add jar.


> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652083#action_12652083 ] 

Edward Capriolo commented on HIVE-78:
-------------------------------------

I would like to leverage the 'Realm' work that has already been done in Tomcat. This would give us the ability to plug into many standard authentication architectures.
http://tomcat.apache.org/tomcat-4.1-doc/catalina/docs/api/org/apache/catalina/realm/package-tree.html

If we include a jar file from Tomcat in binary form, should it be part of the patch, or should we fork some of the Tomcat source? We should not have to alter the original code; we will be using it directly or extending it.

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923642#action_12923642 ] 

He Yongqiang commented on HIVE-78:
----------------------------------

@dhruba
HDFS has its own authorization. So if we allow an access in the Hive layer and pass it on to HDFS (by setting the correct HDFS username and groups), the job can still fail with an HDFS permission problem.
So we need to solve the problem of two independent authorization layers.
One way is to allow all accesses to HDFS and let Hive do the authorization, so that Hive runs as root in terms of HDFS.
The other way is to plug HDFS authorization into the Hive layer and accept an access only if both Hive and HDFS say YES. A user belongs to different unix groups, and HDFS permissions are set based on the unix group. [I am not sure how many groups a user can have in terms of HDFS, that is, how many group settings you can put on an HDFS file. Let's simply say I want these 2 groups to be able to read the file.] Another problem is column-level privileges.
This is very open for discussion; please comment on it.


About the proposal, there is one authorization rule that we are not sure about. It's the simple rule: one deny, then deny.

Take this example:
5.3.1 I want to grant everyone (new people may join at any time) access to db_name.*, and then later I want to protect one table, db_name.T, from ALL users but a few.
1) Add all users to a group 'users' (assumption: new users will automatically join this group), and grant 'users' ALL privileges on db_name.*
2) Add those few users to a new group 'users2', AND REMOVE them from 'users'
3) DENY 'users' on db_name.T
4) Grant ALL on db_name.T to 'users2'

The main problem with this approach is that "REMOVE them from 'users'" is not practical.


The other option that we have thought about is a different rule.

First, try the user name.

Try to deny this access by looking up the deny tables by user name:

1. If there is an entry in 'user' that denies this access, return DENY
2. If there is an entry in 'db' that denies this access, return DENY
3. If there is an entry in 'table' that denies this access, return DENY
4. If there is an entry in 'column' that denies this access, return DENY

If we get one deny, we return DENY for this attempt.

If no deny is found, go through all privilege levels with the user name:

5. If there is an entry in 'user' that accepts this access, return ACCEPT
6. If there is an entry in 'db' that accepts this access, return ACCEPT
7. If there is an entry in 'table' that accepts this access, return ACCEPT
8. If there is an entry in 'column' that accepts this access, return ACCEPT


Second, try the user's group/role names one by one until we get an ACCEPT. If we get an ACCEPT from one group/role, we ACCEPT this access. Otherwise, deny.

For each role/group, we follow the same routine as for the user name.
The problem with this approach is that it's a little complex, and we did not find any system that uses it. MySQL has no deny. SQL Server uses one deny, then deny.
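
To help discuss the second rule, here is a minimal Java sketch of the lookup order described above. It is one possible reading, not the proposed implementation: it assumes a DENY or ACCEPT found under the user name is final, and that a DENY found for one group/role only rules out that group/role. All class and field names (`PrivEntry`, `DenyFirstAuthorizer`, and so on) are hypothetical, and the entries are assumed to be already filtered to those that match the requested object and operation.

```java
import java.util.List;

enum Level { USER, DB, TABLE, COLUMN }

enum Decision { ACCEPT, DENY, NO_MATCH }

/** One row of a hypothetical privilege table, already matched to the requested access. */
final class PrivEntry {
    final String principal;  // user, group, or role name
    final Level level;
    final boolean deny;      // true = deny entry, false = accept entry

    PrivEntry(String principal, Level level, boolean deny) {
        this.principal = principal;
        this.level = level;
        this.deny = deny;
    }
}

final class DenyFirstAuthorizer {
    private final List<PrivEntry> entries;

    DenyFirstAuthorizer(List<PrivEntry> entries) {
        this.entries = entries;
    }

    private boolean hasEntry(String principal, Level level, boolean deny) {
        for (PrivEntry e : entries) {
            if (e.principal.equals(principal) && e.level == level && e.deny == deny) {
                return true;
            }
        }
        return false;
    }

    /** Steps 1-8 for one principal: all deny lookups first, then all accept lookups. */
    Decision check(String principal) {
        for (Level level : Level.values()) {
            if (hasEntry(principal, level, true)) return Decision.DENY;
        }
        for (Level level : Level.values()) {
            if (hasEntry(principal, level, false)) return Decision.ACCEPT;
        }
        return Decision.NO_MATCH;
    }

    /** User name first, then each group/role until one of them accepts; otherwise deny. */
    boolean isAllowed(String user, List<String> groupsAndRoles) {
        Decision byUser = check(user);
        if (byUser != Decision.NO_MATCH) {
            return byUser == Decision.ACCEPT;
        }
        for (String name : groupsAndRoles) {
            if (check(name) == Decision.ACCEPT) return true;
        }
        return false;
    }
}
```

With the entries from example 5.3.1 for db_name.T ('users' accepted at db level and denied at table level, 'users2' accepted at table level), a member of both groups is accepted through 'users2' without being removed from 'users', while a member of 'users' only is denied.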


> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: He Yongqiang
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924032#action_12924032 ] 

John Sichi commented on HIVE-78:
--------------------------------

(implementation note)

If we really need multiple metastore tables, let's name them consistently:

user_priv
db_priv
tbl_priv
col_priv


> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor, Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: He Yongqiang
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757299#action_12757299 ] 

Edward Capriolo commented on HIVE-78:
-------------------------------------

@namit,

Yes, I agree/agreed. I was off topic there, describing how we could do it if we wanted to. I will open a separate JIRA for that.

At the upcoming Hadoop World NYC, someone is going to present the new authentication code in Hadoop. I would like to watch that; then we (I) might better understand what the long-term strategy is for Hadoop. I will split authentication and authorization into two separate JIRAs to avoid confusion.

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Updated: (HIVE-78) Authorization infrastructure for Hive

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-78:
--------------------------------

    Summary: Authorization infrastructure for Hive  (was: Authentication infrastructure for Hive)

This deals with authorization, not authentication.

> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652162#action_12652162 ] 

Edward Capriolo commented on HIVE-78:
-------------------------------------

Recursive Role processing is probably not possible with JDBCRealm.

Recursive role processing is generally difficult to implement. N.I.S. netgroups are an example of this: because of the recursive nature, you have a more complicated implementation. First, you have to check for loops in the group definition, such as Role1 memberOf Role2, Role2 memberOf Role3, Role3 memberOf Role1. This needs to be done when the rule is created, or evaluated, or both. I have found (in my experience) that dynamic/recursive groups are less practical than they originally seem. They do have merit, however.
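
As an illustration of the loop check at rule-creation time, the following Java sketch rejects a new membership if it would close a cycle. It is only a sketch under assumptions: `RoleGraph` and `addMembership` are hypothetical names, and roles are plain strings held in memory rather than in any realm or metastore.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class RoleGraph {
    // role -> roles it is directly a member of
    private final Map<String, Set<String>> memberOf = new HashMap<>();

    /** Adds "child memberOf parent"; returns false and changes nothing if that would create a loop. */
    boolean addMembership(String child, String parent) {
        if (child.equals(parent) || reaches(parent, child)) {
            return false;
        }
        memberOf.computeIfAbsent(child, k -> new HashSet<>()).add(parent);
        return true;
    }

    /** True if 'target' can be reached from 'start' by following memberOf edges. */
    private boolean reaches(String start, String target) {
        Deque<String> stack = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String role = stack.pop();
            if (role.equals(target)) {
                return true;
            }
            if (seen.add(role)) {
                for (String next : memberOf.getOrDefault(role, Collections.<String>emptySet())) {
                    stack.push(next);
                }
            }
        }
        return false;
    }
}
```

Checking at creation time keeps evaluation simple, because stored memberships can never contain a loop; if memberships can also be edited outside this path, the same check would be needed at evaluation time.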

The roles you mentioned were:
    *  SELECT
    * INSERT
    * ALTER TABLE
    * CREATE
    * DROP
    * KILL SESSION(QUERY)
    * SHUTDOWN
    * STARTUP
    * VIEW SESSIONS

IMPORTANT: Are roles global or per object? Realms really only make sense with global permissions.

Let's look at a scenario:

* Hive
** tableA
** tableB
** tableC

* Users
** john
*** uid 3000
*** gid 3000,4000
** bob
*** uid 3001
*** gid 3001,4000

* Groups
** john
*** gid 3000
** bob
*** gid 3001
** hr
*** gid 4000

Goal: root has full access to all tables, john has access to tableA, and bob has access to tableB. tableC can be read by anyone in hr.

* Realms
**  tableA_select
*** root
*** john
** tableA_insert
*** root
*** john
** tableB_select
*** root
*** bob
** tableB_insert
*** root
*** bob
** tableC_select
*** root
*** bob
*** john

Using '_' as a delimiter and constructing several roles per table is slightly nonstandard for realms, but it would work. User lists are flat.
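
The per-table role naming above could be sketched in Java roughly as follows. This is an in-memory stand-in, not Tomcat's Realm API: `TableRoleRealm`, `roleName`, and `isUserInRole` are hypothetical names, and the hr group has to be flattened into individual users because the user lists are flat.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class TableRoleRealm {
    // role name (for example "tableA_select") -> flat set of users in that role
    private final Map<String, Set<String>> usersByRole = new HashMap<>();

    /** Builds the role name from a table and an action, using '_' as the delimiter. */
    static String roleName(String table, String action) {
        return table + "_" + action.toLowerCase(Locale.ROOT);
    }

    void addRole(String table, String action, String... users) {
        usersByRole.put(roleName(table, action), new HashSet<>(Arrays.asList(users)));
    }

    boolean isUserInRole(String user, String table, String action) {
        Set<String> users = usersByRole.get(roleName(table, action));
        return users != null && users.contains(user);
    }
}
```

One caveat of this scheme is that role names are only built, never parsed: a table name that itself contains '_' would make parsing a role name back into table and action ambiguous.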

About these permissions:
    *  SELECT
    * INSERT
    * ALTER TABLE
    * CREATE
    * DROP

Suppose an external table was created. If my UID has access to the file through HDFS, I would expect to have SELECT access inside Hive. If I could not write the file in HDFS, I would not expect Hive to give me these permissions. I think we should clearly define the difference between AUTHENTICATION and ACCESS.

For example, the AUTHENTICATION information for a user is commonly stored in Active Directory. However ACCESS information like, what tables a user may run SELECT on can not be stored in Active Directory without changing the Active Directory schema.

Realm or JAAS gives us a quick way to answer the authentication question. As for ACCESS, we have to store that information either in the metastore or in an external system.

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650407#action_12650407 ] 

Ashish Thusoo commented on HIVE-78:
-----------------------------------

For Active Directory I think JNDI will work as long as we work off GSSAPI - so I think Kerb V should work with JNDI.

However, I think the traditional authentication mechanisms, NTLM and NTLMv2, will not work with AD, as they are proprietary protocols and the only public-domain implementations of them are in Samba. They are mostly an issue for old machines and old directory installations. We may as well do JNDI for now and then address these later.

Will check out JDBCRealm, I have not used those in the past.

For query side roles we could just model those on mysql privileges. Some of the basic ones include:

- SELECT
- INSERT
- ALTER TABLE
- CREATE
- DROP

And on the server administration side, things like:
- KILL SESSION(QUERY)
- SHUTDOWN
- STARTUP
- VIEW SESSIONS

are useful...

We could roll these privileges up into role objects, so essentially your

hiveuser role would become SELECT, INSERT, CREATE
while hiveadmin would become KILL SESSION, SHUTDOWN, STARTUP, VIEW SESSIONS, DROP, ALTER + whatever is in hiveuser
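
The roll-up could be modeled, for example, with an enum of privileges and roles built as sets. This is a sketch only: `HivePrivilege` and `HiveRoles` are hypothetical names, and "ALTER" is assumed to mean the ALTER TABLE privilege listed earlier.

```java
import java.util.EnumSet;
import java.util.Set;

enum HivePrivilege {
    SELECT, INSERT, ALTER_TABLE, CREATE, DROP,
    KILL_SESSION, SHUTDOWN, STARTUP, VIEW_SESSIONS
}

final class HiveRoles {
    private HiveRoles() {}

    /** hiveuser: the basic query-side privileges. */
    static Set<HivePrivilege> hiveUser() {
        return EnumSet.of(HivePrivilege.SELECT, HivePrivilege.INSERT, HivePrivilege.CREATE);
    }

    /** hiveadmin: the administration privileges plus everything in hiveuser. */
    static Set<HivePrivilege> hiveAdmin() {
        EnumSet<HivePrivilege> privileges = EnumSet.of(
            HivePrivilege.KILL_SESSION, HivePrivilege.SHUTDOWN, HivePrivilege.STARTUP,
            HivePrivilege.VIEW_SESSIONS, HivePrivilege.DROP, HivePrivilege.ALTER_TABLE);
        privileges.addAll(hiveUser());
        return privileges;
    }
}
```

Defining hiveadmin as "its own privileges plus hiveuser" means any privilege later added to hiveuser is picked up by hiveadmin automatically.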






> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Ashish Thusoo
>            Assignee: Ashish Thusoo
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

Posted by "Min Zhou (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757622#action_12757622 ] 

Min Zhou commented on HIVE-78:
------------------------------

Oops, my code wasn't on my machine. I just pasted yours and modified it into mine.
Here is a patch that shows my code for that.


> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699313#action_12699313 ] 

Edward Capriolo commented on HIVE-78:
-------------------------------------

All those points make sense. 

>>1. I am not sure what AS is used for.
I am thinking AS is the way to name the PermissionSet. Imagine a rule like this:
{noformat}
GRANT WITH_GRANT,RC, ON '*' TO 'USER1','USER2' AS my_permission
{noformat}
At some point 'USER3' might become an administrator. It would be nice to issue a command like:  
{noformat}
ALTER GRANT my_permission add USER 'USER3'
{noformat}

It also makes the grant self documenting.

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929888#action_12929888 ] 

John Sichi commented on HIVE-78:
--------------------------------

https://reviews.apache.org/r/55/diff/#index_header

> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor, Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: He Yongqiang
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, HIVE-78.1.nothrift.patch, HIVE-78.1.thrift.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757179#action_12757179 ] 

Edward Capriolo commented on HIVE-78:
-------------------------------------

@Min
(This may be somewhat misstated, but) Hadoop-Core gets the user/group information for a POSIX user by running shell commands such as
WHOAMI, GROUPS, ID, etc. The Hive CLI inherits this information, as do HiveServer and HWI.

The Hive web interface starts as the user who ran the start script. The first screen on the web interface is a de facto log-in screen. This allows the user to enter their user and group information in text boxes.

When HWI starts the session on behalf of the user, it runs "SET hadoop.ugi={what user entered in the text box}". At that point, if the user initiates a Hive job, the output of that job should be files owned by that user. I am pretty sure the code in QL just chowns the files at job end, or perhaps the entire job runs as that user (I can't remember).

My comment above just refers to the fact that in some cases Hadoop ACLs and our Hive authorization rules would conflict. For example,
if the files were owned by mzhou, saying "grant delete to * user edward" would not give me privileges to drop files you owned. In that case, sections of the HiveServer would have to run as superuser to elevate privileges, but we punted on that issue too. (We are like a football team with a bad offense, always punting.)

(If we were going to tackle password we could do it in this way)
I would think if we wanted to enforce strong user/password authentication we could do this 

{noformat}
<property>
  <name>hive.password.insession</name>
  <value>hive_password</value>
  <description>Empty for no password checking; if defined, this is the session variable to look for the password in.</description>
</property>
{noformat}

In this way QL would read this value and would not execute any task for the user unless they had  run "set hive_password=XYXYXYY"
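
A rough Java sketch of that check is below. It is an assumption about how QL could consult the setting, not existing Hive code: `SessionPasswordGate` and the `PasswordVerifier` hook are hypothetical, and how the password is actually verified (password file, realm, directory) is left open.

```java
import java.util.Map;

final class SessionPasswordGate {
    /** Hypothetical hook for whatever store actually verifies passwords. */
    interface PasswordVerifier {
        boolean verify(String user, String password);
    }

    // Value of hive.password.insession; empty or null disables password checking.
    private final String passwordVariable;
    private final PasswordVerifier verifier;

    SessionPasswordGate(String passwordVariable, PasswordVerifier verifier) {
        this.passwordVariable = passwordVariable;
        this.verifier = verifier;
    }

    /** Returns true if tasks may run for this user given the current session variables. */
    boolean mayExecute(String user, Map<String, String> sessionVariables) {
        if (passwordVariable == null || passwordVariable.isEmpty()) {
            return true;  // no password checking configured
        }
        String supplied = sessionVariables.get(passwordVariable);
        return supplied != null && verifier.verify(user, supplied);
    }
}
```

In this sketch a session that never ran "set hive_password=..." is refused, while an empty hive.password.insession value keeps the current behavior of running without a password.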

Does that make sense? Session already holds the user. It could hold the password as well. Do you see anything wrong with that approach?

I will trim down some of the stuff I have and upload it for reference.



> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Updated: (HIVE-78) Authentication infrastructure for Hive

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-78:
--------------------------------

    Attachment: hive-78.diff

The metastore would be a good place to start the ball rolling. Any comments?



> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Assigned: (HIVE-78) Authorization infrastructure for Hive

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain reassigned HIVE-78:
------------------------------

    Assignee: He Yongqiang  (was: Edward Capriolo)

> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: He Yongqiang
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924026#action_12924026 ] 

Namit Jain commented on HIVE-78:
--------------------------------

Overall, there are many security holes in the system, and we are not proposing to close all of them.

To start with, this is aimed at well-intentioned users, not malicious ones:
the idea is to prevent good users from making a mistake.

> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor, Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: He Yongqiang
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756817#action_12756817 ] 

Edward Capriolo commented on HIVE-78:
-------------------------------------

@namit,

I think I can explain why AS made sense at the time. My plan was not to decouple users from a rule. See my little patch.

{noformat}
+struct AccessControl {
+  1: list<string>	user,
+  2: list<string>	group,
+  3: list<string>	database,
+  4: list<string>	table,
+  5: list<string>	partition,
+  6: list<string>	column,
+  7: list<string>	priv,
+  8: string		name
+}
{noformat}

I wanted a rule to be more or less immutable, or to support really simple syntax.

Something like this is doable:
{noformat}
GRANT my_permission to USER3;
{noformat}
But it seems to imply that users are decoupled from the rule.
This is really not true in my design: a user or group is just another multivalued attribute of the rule.

I would like the format to be interchangeable:
{noformat}
ALTER my_permission add db 'db';
ALTER my_permission add table 'db.table';
ALTER my_permission drop table 'db.table';
{noformat}
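
To make the "everything is a multivalued attribute" idea concrete, here is a Python sketch that mirrors the Thrift struct above. The helper names `alter_add` and `alter_drop` are hypothetical; they only illustrate how GRANT and ALTER could both reduce to adding or removing a value on one attribute of a named rule.

```python
from dataclasses import dataclass, field
from typing import List

# Attributes of the rule that hold lists of values (from the struct above).
MULTIVALUED = ("user", "group", "database", "table", "partition", "column", "priv")


@dataclass
class AccessControl:
    """Mirror of the AccessControl Thrift struct."""
    name: str
    user: List[str] = field(default_factory=list)
    group: List[str] = field(default_factory=list)
    database: List[str] = field(default_factory=list)
    table: List[str] = field(default_factory=list)
    partition: List[str] = field(default_factory=list)
    column: List[str] = field(default_factory=list)
    priv: List[str] = field(default_factory=list)


def _values(rule: AccessControl, attribute: str) -> List[str]:
    if attribute not in MULTIVALUED:
        raise ValueError(f"not a multivalued attribute: {attribute}")
    return getattr(rule, attribute)


def alter_add(rule: AccessControl, attribute: str, value: str) -> None:
    """Add a value to one attribute of the rule, ignoring duplicates."""
    values = _values(rule, attribute)
    if value not in values:
        values.append(value)


def alter_drop(rule: AccessControl, attribute: str, value: str) -> None:
    """Remove a value from one attribute of the rule if it is present."""
    values = _values(rule, attribute)
    if value in values:
        values.remove(value)


rule = AccessControl(name="my_permission")
alter_add(rule, "user", "USER3")        # GRANT my_permission to USER3;
alter_add(rule, "table", "db.table")    # ALTER my_permission add table 'db.table';
alter_drop(rule, "table", "db.table")   # ALTER my_permission drop table 'db.table';
```

In this reading, granting a rule to a user is the same operation as adding a table to it, which is the sense in which users are not decoupled from the rule.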

@Min,
See Ashish's comment above in this Jira:

{noformat}
I agree, it is best to punt authentication to the authentication systems (LDAP, kerb etc. etc.) and concentrate on authorization (privileges) here. 
{noformat}

The goal here is to trust the user/group information as Hadoop does, and to create a system that grants and revokes privileges. Authentication and authorization are two separate things, so our Jira is misnamed :)

I will review your patch, just to see what you came up with. As I said, you are farther along than I am, and this has been off my radar, so I don't mind passing the baton. But Namit is right: we have to agree on the syntax and on what we are controlling, because down the road it will be an issue.





> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Min Zhou (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756951#action_12756951 ] 

Min Zhou commented on HIVE-78:
------------------------------

From your comment:
{noformat}
Daemons like HiveService and HiveWebInterface will have to run as supergroup or a hive group? 
{noformat}

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649855#action_12649855 ] 

Edward Capriolo commented on HIVE-78:
-------------------------------------

LDAP seems like a good way to handle this. We have a few alternatives.

Any posixAccount can log into Hive. The LDAP search would be (&(objectClass=posixAccount)(uid=<user>))

We could require that the user have some other attribute: (&(objectClass=posixAccount)(uid=<user>)(businessCategory=hiveuser))

We could require that the user be valid and be a member of a specific groupOfUniqueNames:
(&(objectClass=posixAccount)(uid=<user>)) and memberof (hiveGroup). Apache mod_ldap can do this.

We could create a supplemental schema attribute that can be added to already existing LDAP users.
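
As an illustration of the first two alternatives, here is a small Python sketch that builds the search filters as strings. It uses no LDAP library; the function names are hypothetical, and the escaping follows the special characters listed in RFC 4515 so that a user-supplied uid cannot alter the filter.

```python
from typing import Optional

# Characters that must be escaped inside an LDAP filter value (RFC 4515).
_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}


def escape_filter_value(value: str) -> str:
    """Escape special characters in a value placed inside a filter."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def hive_user_filter(uid: str, business_category: Optional[str] = None) -> str:
    """Build a filter matching a posixAccount, optionally requiring an attribute."""
    clauses = ["(objectClass=posixAccount)", f"(uid={escape_filter_value(uid)})"]
    if business_category is not None:
        clauses.append(
            f"(businessCategory={escape_filter_value(business_category)})"
        )
    return "(&" + "".join(clauses) + ")"


print(hive_user_filter("edward"))
print(hive_user_filter("edward", business_category="hiveuser"))
```

The group-membership alternative is left out because how membership is expressed depends on the directory's schema.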


> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Ashish Thusoo
>            Assignee: Ashish Thusoo
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.
