Posted to dev@hive.apache.org by "Edward Capriolo (JIRA)" <ji...@apache.org> on 2009/09/18 20:02:16 UTC

[jira] Created: (HIVE-842) Authentication Infrastructure for Hive

Authentication Infrastructure for Hive
--------------------------------------

                 Key: HIVE-842
                 URL: https://issues.apache.org/jira/browse/HIVE-842
             Project: Hadoop Hive
          Issue Type: New Feature
          Components: Server Infrastructure
            Reporter: Edward Capriolo


This issue deals with the authentication (user name, password) infrastructure, not the authorization components that specify what a user should be able to do.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HIVE-842) Authentication Infrastructure for Hive

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon reassigned HIVE-842:
--------------------------------

    Assignee: Todd Lipcon



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

Posted by "Venkatesh S (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913706#action_12913706 ] 

Venkatesh S commented on HIVE-842:
----------------------------------

Sounds good to me.






[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918072#action_12918072 ] 

Pradeep Kamath commented on HIVE-842:
-------------------------------------

I tried applying this patch after applying the patch for HIVE-1264 and got the following compile errors which seem to suggest I am missing some jar (seems thrift related) - any pointers on how to resolve these errors?

{noformat}
build_shims:
     [echo] Compiling shims against hadoop 0.20.104.3.1007202301 (/tmp/hive-svn/build/hadoopcore/hadoop-0.20.104.3.1007202301)
    [javac] Compiling 8 source files to /tmp/hive-svn/build/shims/classes
    [javac] /tmp/hive-svn/shims/src/0.20S/java/org/apache/thrift/transport/TSaslTransport.java:109: cannot find symbol
    [javac] symbol  : class TMemoryInputTransport
    [javac] location: class org.apache.thrift.transport.TSaslTransport
    [javac]   private TMemoryInputTransport readBuffer = new TMemoryInputTransport();
    [javac]           ^
    [javac] /tmp/hive-svn/shims/src/0.20S/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java:354: cannot find symbol
    [javac] symbol  : method getBuffer()
    [javac] location: class org.apache.thrift.transport.TTransport
    [javac]       return wrapped.getBuffer();
    [javac]                     ^
    [javac] /tmp/hive-svn/shims/src/0.20S/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java:352: method does not override or implement a method from a supertype
    [javac]     @Override
    [javac]     ^
    [javac] /tmp/hive-svn/shims/src/0.20S/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java:359: cannot find symbol
    [javac] symbol  : method getBufferPosition()
    [javac] location: class org.apache.thrift.transport.TTransport
    [javac]       return wrapped.getBufferPosition();
    [javac]                     ^
    [javac] /tmp/hive-svn/shims/src/0.20S/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java:357: method does not override or implement a method from a supertype
    [javac]     @Override
    [javac]     ^
    [javac] /tmp/hive-svn/shims/src/0.20S/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java:364: cannot find symbol
    [javac] symbol  : method getBytesRemainingInBuffer()
    [javac] location: class org.apache.thrift.transport.TTransport
    [javac]       return wrapped.getBytesRemainingInBuffer();
    [javac]                     ^
    [javac] /tmp/hive-svn/shims/src/0.20S/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java:362: method does not override or implement a method from a supertype
    [javac]     @Override
    [javac]     ^
    [javac] /tmp/hive-svn/shims/src/0.20S/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java:369: cannot find symbol
    [javac] symbol  : method consumeBuffer(int)
    [javac] location: class org.apache.thrift.transport.TTransport
    [javac]       wrapped.consumeBuffer(len);
    [javac]              ^
    [javac] /tmp/hive-svn/shims/src/0.20S/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java:367: method does not override or implement a method from a supertype
    [javac]     @Override
    [javac]     ^
    [javac] /tmp/hive-svn/shims/src/0.20S/java/org/apache/thrift/transport/TSaslTransport.java:109: cannot find symbol
    [javac] symbol  : class TMemoryInputTransport
    [javac] location: class org.apache.thrift.transport.TSaslTransport
    [javac]   private TMemoryInputTransport readBuffer = new TMemoryInputTransport();
    [javac]                                                  ^
    [javac] /tmp/hive-svn/shims/src/0.20S/java/org/apache/thrift/transport/TSaslTransport.java:352: cannot find symbol
    [javac] symbol  : method encodeFrameSize(int,byte[])
    [javac] location: class org.apache.thrift.transport.TFramedTransport
    [javac]     TFramedTransport.encodeFrameSize(length, lenBuf);
    [javac]                     ^
    [javac] Note: /tmp/hive-svn/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java uses or overrides a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: /tmp/hive-svn/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java uses unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
    [javac] 11 errors

{noformat}



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918239#action_12918239 ] 

Todd Lipcon commented on HIVE-842:
----------------------------------

Seems like the patch that updates Thrift has fallen out of date with trunk. I'll try to regenerate it ASAP. You can probably fix the above issues by (a) importing StageType in MapRedTask, and (b) replacing StatsTask.getType's return type with the StageType enum. (The new version of Thrift uses Java enums instead of ints to represent Thrift enums.)
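
For readers following the fix, here is a minimal, self-contained sketch of the int-to-enum change described above. The StageType, Task, and StatsTask types below are simplified stand-ins written for illustration, not the actual Hive classes; only the names MAPREDLOCAL, STATS, and getType come from the compiler output in this thread.

```java
// Simplified stand-in for the Thrift-generated enum. The real class is
// org.apache.hadoop.hive.ql.plan.api.StageType and has more constants.
enum StageType {
  MAPREDLOCAL,
  STATS
}

// Simplified stand-in for Hive's Task base class.
abstract class Task {
  // With the newer Thrift, the declared return type is the enum, not int.
  public abstract StageType getType();
}

class StatsTask extends Task {
  // A subclass still declaring "public int getType()" would no longer
  // override Task.getType(), which is the kind of error javac reports.
  @Override
  public StageType getType() {
    return StageType.STATS;
  }
}

class StageTypeDemo {
  public static void main(String[] args) {
    Task task = new StatsTask();
    System.out.println(task.getType());
  }
}
```

The point of the sketch is only that every subclass has to change its return type together with the base class; the constant itself also has to exist in the regenerated enum.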



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913787#action_12913787 ] 

Todd Lipcon commented on HIVE-842:
----------------------------------

I don't anticipate breaking the web UI (or anything) on non-secure Hadoop versions. But it will probably be insecure to run the web UI, which currently trusts users to say who they want to be; i.e., I don't plan in the short term to integrate an auth layer for the web UI itself.



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

Posted by "Venkatesh S (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914408#action_12914408 ] 

Venkatesh S commented on HIVE-842:
----------------------------------

> Should the metastore always take HDFS actions as the user making the RPC?
Yes, metastore will run as a super-user (Hadoop proxy user) enabling DO AS operations and impersonate the target user while accessing data on HDFS.

> If we see that Hadoop Security is enabled, should we enable SASL on the metastore thrift server by default?
I'd think so.

> should there be an option whereby the metastore uses a keytab to authenticate to HDFS, but doesn't require users to authenticate to it?
Wouldn't this leave a hole as it currently exists?



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913439#action_12913439 ] 

Todd Lipcon commented on HIVE-842:
----------------------------------

As discussed at the last contributor meeting, I am working on authenticating access to the metastore by kerberizing the Thrift interface.

Plan is currently:
1) Update the version of Thrift in Hive to 0.4.0
2) Temporarily check in the SASL support from Thrift trunk (this will be in 0.5.0 release, due out in October some time)
3) Build a bridge between Thrift's SASL support and Hadoop's UserGroupInformation classes. Thus, if a user has a current UGI on the client side, it will get propagated to the JAAS context on the handler side.
4) In places where the metastore accesses the file system, use the "proxy user" functionality to act on behalf of the authenticated user.
5) When we detect that we are running on secure hadoop with security enabled, enable the above functionality.

I'd like to attack the Hive Web UI separately.

One open question:
- Do Hive *tasks* ever need to authenticate to the metastore? If so, we will have to build a delegation token system into Hive.
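
Steps 3 and 4 of the plan can be pictured with a small sketch. This is not Hadoop's UserGroupInformation API and not code from the patch: CallerIdentity, ProxyUserDemo, and createTableDir are hypothetical names, and the user names are made up. It only shows the general shape of a service acting on behalf of an authenticated caller and then restoring its previous identity.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for an authenticated identity. Hadoop's real class
// for this role is UserGroupInformation; nothing below is its actual API.
final class CallerIdentity {
  // Identity that the current thread is acting as, if any.
  private static final ThreadLocal<CallerIdentity> CURRENT = new ThreadLocal<>();

  final String user;              // effective user that actions are attributed to
  final CallerIdentity realUser;  // service identity doing the impersonation, or null

  private CallerIdentity(String user, CallerIdentity realUser) {
    this.user = user;
    this.realUser = realUser;
  }

  // Identity of the service itself, e.g. the metastore's own login.
  static CallerIdentity login(String user) {
    return new CallerIdentity(user, null);
  }

  // Identity of an authenticated caller, backed by the service's credentials.
  static CallerIdentity proxy(String user, CallerIdentity realUser) {
    return new CallerIdentity(user, realUser);
  }

  static CallerIdentity current() {
    return CURRENT.get();
  }

  // Runs the action as this identity, then restores the previous one.
  <T> T doAs(Supplier<T> action) {
    CallerIdentity previous = CURRENT.get();
    CURRENT.set(this);
    try {
      return action.get();
    } finally {
      if (previous == null) {
        CURRENT.remove();
      } else {
        CURRENT.set(previous);
      }
    }
  }
}

final class ProxyUserDemo {
  // Pretend file system call that records who it ran as.
  static String createTableDir(String table) {
    CallerIdentity who = CallerIdentity.current();
    return table + " owned by " + who.user + " via " + who.realUser.user;
  }

  public static void main(String[] args) {
    CallerIdentity metastore = CallerIdentity.login("metastore");
    // "alice" is assumed to be the user name the RPC layer authenticated.
    CallerIdentity caller = CallerIdentity.proxy("alice", metastore);
    System.out.println(caller.doAs(() -> createTableDir("t1")));
  }
}
```

Restoring the previous identity in a finally block matters in a server that reuses handler threads, since otherwise one caller's identity could leak into the next request.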



[jira] Updated: (HIVE-842) Authentication Infrastructure for Hive

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-842:
----------------------------

    Attachment: HiveSecurityThoughts.pdf

For lack of a better place, uploading this doc from Venkatesh here so I can link it from wiki.




[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761880#action_12761880 ] 

Edward Capriolo commented on HIVE-842:
--------------------------------------

@Min, I added you as a watcher on this issue; I hope you do not mind.

At Hadoop World NYC I got to listen to Owen O'Malley do his presentation on Hadoop security. He also took some time to answer some questions for me.

In summary, Hadoop 0.22 is going to have authentication at the RPC layer. This can be turned on and off through configuration in Hadoop. This authentication will be able to use Kerberos or Active Directory's Kerberos implementation.

DFS is the easy case. You authenticate directly to it.
MapReduce is another beast. The job tracker/task tracker will have to run jobs as the user on the system! So my jobs will be run from my POSIX account (I am not sure if this is in place on only the JobTracker or on the TaskTracker as well).

Programs that act as proxies, like the JobTracker, might need a binary shim that starts them as the root user and then drops to a hadoop user; this is also required to run jobs as that user.

"Why Kerberos?" I asked him. Kerberos allows a ticket to be created and attached to your session, and you can then pass that ticket on to the job tracker, for example. Otherwise you would have to store a password/key on the job tracker itself, and it would be nasty to put your password in a jobconf.

So, it seems like proxy-type applications like HWI and HiveServer may have to take some part in passing around the Kerberos tickets.

The Hadoop web interfaces will use Kerberos as well. SPNEGO is a protocol for this and it has good cross-browser support. So that is the future...



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918085#action_12918085 ] 

Pradeep Kamath commented on HIVE-842:
-------------------------------------

Hey Todd, I applied the patches in the following sequence on current hive trunk:
hive-1264.txt, hive-842.txt and then HIVE-1526.2.patch.txt. The last one didn't apply cleanly for ql/src/gen-javabean/org/apache/hadoop/hive/ql/plan/api/StageType.java - so I manually edited it based on the reject file. After that, I get the following compile error:

  [javac] Compiling 607 source files to /tmp/hive-svn/build/ql/classes
    [javac] /tmp/hive-svn/ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java:384: cannot find symbol
    [javac] symbol  : class StageType
    [javac] location: class org.apache.hadoop.hive.ql.exec.MapRedTask
    [javac]   public StageType getType() {
    [javac]          ^
    [javac] /tmp/hive-svn/ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java:385: cannot find symbol
    [javac] symbol  : variable StageType
    [javac] location: class org.apache.hadoop.hive.ql.exec.MapRedTask
    [javac]     return StageType.MAPREDLOCAL;
    [javac]            ^
    [javac] /tmp/hive-svn/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java:214: getType() in org.apache.hadoop.hive.ql.exec.StatsTask cannot override getType() in org.apache.hadoop.hive.ql.exec.Task; attempting to use incompatible return type
    [javac] found   : int
    [javac] required: org.apache.hadoop.hive.ql.plan.api.StageType
    [javac]   public int getType() {
    [javac]              ^
    [javac] /tmp/hive-svn/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java:215: cannot find symbol
    [javac] symbol  : variable STATS
    [javac] location: class org.apache.hadoop.hive.ql.plan.api.StageType
    [javac]     return StageType.STATS;
    [javac]                     ^
    [javac] /tmp/hive-svn/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java:213: method does not override or implement a method from a supertype
    [javac]   @Override
    [javac]   ^




[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

Posted by "Min Zhou (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765362#action_12765362 ] 

Min Zhou commented on HIVE-842:
-------------------------------

@Edward

I think Kerberos is a good way to do authentication; user/password is not needed here. This issue will be implemented in the future.
Btw, we've finished the development of the authorization infrastructure for Hive.



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

Posted by "Venkatesh S (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913466#action_12913466 ] 

Venkatesh S commented on HIVE-842:
----------------------------------

>     *  Do Hive tasks ever need to authenticate to the metastore? If so, we will have to build a delegation token system into Hive.
I learnt from Alan and Pradeep that Howl uses the commit task to talk to the metastore. Hence we'll have to build the delegation token system.



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918689#action_12918689 ] 

Pradeep Kamath commented on HIVE-842:
-------------------------------------

Hey Todd, I made the changes you mentioned and got it to compile. While trying to test it out, I had to run the metastore as a user whose keytab file only had a "user" principal and not a "service" principal, so I hacked the code in the patch a little to not check whether the principal had the service/host@realm structure, and I hardcoded the host name into the calls. With all these machinations I got the server to run, tried running "show tables", and got the following with loglevel DEBUG (on the client side):

javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Fail to create credential. (63) - No service creds)]
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:194)
        at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:95)
        at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:254)
        at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:38)

Do you think this is because I don't have a "service" principal in the keytab used by the metastore? 



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914281#action_12914281 ] 

Todd Lipcon commented on HIVE-842:
----------------------------------

I have this basically working. A couple questions I wanted to run by people before posting a patch:

- Should the metastore always take HDFS actions as the user making the RPC? Or, for example, with a create table call, should it act as the "owner" specified in the thrift call regardless of the authenticated user? If the latter, what authorization mechanism do we need? (i.e., is there a use case where user A can make tables on behalf of user B?)

- Are there any metastore operations that should be done as a metastore principal, or should all HDFS access be done as the authenticated user?

- If we see that Hadoop Security is enabled, should we enable SASL on the metastore thrift server by default? If SASL-thrift is not enabled, what user should the metastore act as? In other words, should there be an option whereby the metastore uses a keytab to authenticate to HDFS, but doesn't require users to authenticate to it?




[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918076#action_12918076 ] 

Todd Lipcon commented on HIVE-842:
----------------------------------

Hey Pradeep. You also need HIVE-1526 which updates Hive to use Thrift 0.4.0.



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913691#action_12913691 ] 

Todd Lipcon commented on HIVE-842:
----------------------------------

OK. The code in Hadoop Common is somewhat reusable for this, so it shouldn't be too hard to implement. If I recall correctly, though, the delegation tokens rely on a secret key that the master daemon periodically rotates. We need to add some kind of persistent token storage for this to work - I guess in the metastore's DB?

To make this easier to review, I'd like to do the straight kerberos first, and then add delegation tokens in a second patch/JIRA. Sound good?
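
To make the storage question above concrete, here is a rough sketch of a token password derived from a rotating secret. It is not Hadoop's or Hive's delegation token code: TokenSigner, its methods, the in-memory map standing in for persistent storage, and the choice of HmacSHA1 are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch; not Hadoop's or Hive's delegation token code.
final class TokenSigner {
  private static final String ALGORITHM = "HmacSHA1";

  // Stands in for persistent storage (for example, the metastore's DB).
  private final Map<Integer, byte[]> keyStore = new HashMap<>();
  private final SecureRandom random = new SecureRandom();
  private int currentKeyId = 0;

  TokenSigner() {
    rotateKey();
  }

  // Generates a fresh secret and makes it the signing key. Older keys are
  // kept so that tokens issued before the rotation still verify.
  void rotateKey() {
    byte[] secret = new byte[20];
    random.nextBytes(secret);
    currentKeyId++;
    keyStore.put(currentKeyId, secret);
  }

  // Simulates losing a key, e.g. a restart without persistent storage.
  void forgetKey(int keyId) {
    keyStore.remove(keyId);
  }

  int currentKeyId() {
    return currentKeyId;
  }

  // Token password = HMAC(current secret, token identifier).
  byte[] sign(String identifier) {
    return hmac(keyStore.get(currentKeyId), identifier);
  }

  // Recomputes the password with the key the token was issued under.
  boolean verify(int keyId, String identifier, byte[] password) {
    byte[] secret = keyStore.get(keyId);
    if (secret == null) {
      return false; // key unknown: the token cannot be checked
    }
    return MessageDigest.isEqual(hmac(secret, identifier), password);
  }

  private static byte[] hmac(byte[] secret, String identifier) {
    try {
      Mac mac = Mac.getInstance(ALGORITHM);
      mac.init(new SecretKeySpec(secret, ALGORITHM));
      return mac.doFinal(identifier.getBytes(StandardCharsets.UTF_8));
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    TokenSigner signer = new TokenSigner();
    int keyId = signer.currentKeyId();
    byte[] password = signer.sign("owner=alice");
    signer.rotateKey();
    System.out.println(signer.verify(keyId, "owner=alice", password));
  }
}
```

In this sketch a token issued before a rotation keeps verifying only as long as the older key is still available, which is why losing the keys on a restart would invalidate outstanding tokens.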



[jira] Updated: (HIVE-842) Authentication Infrastructure for Hive

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HIVE-842:
-----------------------------

    Attachment: hive-842.txt

Here's a "preview" patch of this work. A few notes:
- This checks in a bunch of Thrift classes that are in Thrift trunk. Thrift is currently in rc phase for an 0.5.0 release, so we can ditch these thrift classes out of Hive as soon as that's out (probably before this patch is even ready for commit)
- There are still some javadocs that could be improved a little bit.
- There's currently not any integration into the "guts" of Hive - we simply assume the calling user's identity as soon as the RPC is received. I think that's OK for the scope of this patch, as discussed above.

There's a bit of a lurking bug, I believe, due to HADOOP-6982, but it shouldn't be major.



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918745#action_12918745 ] 

Todd Lipcon commented on HIVE-842:
----------------------------------

Hey Pradeep. It sounds like it might be - I haven't seen that error before, but I have also only been testing with actual service principals (i.e. principals of the type metastore/<hostname>).

You can try running both sides with HADOOP_OPTS="-Dsun.security.krb5.debug=true" and it should give you some extra details.



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916687#action_12916687 ] 

Todd Lipcon commented on HIVE-842:
----------------------------------

bq. > should there be an option whereby the metastore uses a keytab to authenticate to HDFS, but doesn't require users to authenticate to it?
bq. Wouldn't this leave a hole as it currently exists?

Yeah - I think the use case is that you may have some old Thrift clients that haven't yet been updated to work with the SASL implementation (e.g. PHP). For those clients, perhaps you can provide security based on firewall rules, etc. But you would still like to run Hive on top of a secured HDFS.



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757309#action_12757309 ] 

Edward Capriolo commented on HIVE-842:
--------------------------------------

hive.conf
{noformat}
<property>
  <name>hive.authenticate.class</name>
  <value>org.apache.hadoop.hive.auth.DefaultAuthenticator</value>
  <description>Use this setting to plug in your own authentication framework (LDAP, MySQL, etc.).</description>
</property>
{noformat}

{noformat}
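// Pluggable authentication hook; implementations decide whether a session may proceed.
public interface Authenticator {
  boolean authenticate(SessionState session);
}

// Default implementation: accepts every session (no authentication).
class DefaultAuthenticator implements Authenticator {
  public boolean authenticate(SessionState session) {
    return true;
  }
}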
{noformat}

Thus the authentication is pluggable.

{noformat}
// Example: accepts a single hard-coded username/password pair.
class SharedSecretAuthenticator implements Authenticator {
  public boolean authenticate(SessionState session) {
    // Constant-first equals() avoids a NullPointerException when a value is unset.
    return "admin".equals(session.getConf().get("USERNAME"))
        && "secret".equals(session.getConf().get("PASSWORD"));
  }
}
{noformat}

It would then be trivial to implement LDAP, MySQL, or other types of authentication. The call to the authenticator could be plugged into the API anywhere a reference to the client's SessionState exists.
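
To show how the configured class name could be turned into an authenticator instance, here is a minimal, self-contained sketch. It is an assumption about how the proposal might be wired up, not code from any patch: Session is a hypothetical stand-in for SessionState, the configuration is a plain Map, and AuthenticatorLoader is an invented helper. Only the property name hive.authenticate.class comes from the proposal above.

```java
import java.util.Map;

class AuthenticatorDemo {
    public static void main(String[] args) throws Exception {
        Map<String, String> conf =
            Map.of("hive.authenticate.class", SharedSecretAuthenticator.class.getName());
        Authenticator auth = AuthenticatorLoader.load(conf);

        Session good = new Session(Map.of("USERNAME", "admin", "PASSWORD", "secret"));
        Session bad = new Session(Map.of("USERNAME", "admin", "PASSWORD", "wrong"));
        System.out.println(auth.authenticate(good));
        System.out.println(auth.authenticate(bad));
    }
}

// Hypothetical stand-in for Hive's SessionState; it only carries key/value settings.
final class Session {
    private final Map<String, String> vars;
    Session(Map<String, String> vars) { this.vars = vars; }
    String get(String key) { return vars.get(key); }
}

interface Authenticator {
    boolean authenticate(Session session);
}

// Accepts everyone, mirroring the DefaultAuthenticator proposed above.
class DefaultAuthenticator implements Authenticator {
    public boolean authenticate(Session session) { return true; }
}

// Accepts one hard-coded username/password pair.
class SharedSecretAuthenticator implements Authenticator {
    public boolean authenticate(Session session) {
        return "admin".equals(session.get("USERNAME"))
            && "secret".equals(session.get("PASSWORD"));
    }
}

// Invented helper: instantiates the configured class, or the default if none is set.
final class AuthenticatorLoader {
    static Authenticator load(Map<String, String> conf) throws ReflectiveOperationException {
        String className = conf.getOrDefault(
            "hive.authenticate.class", DefaultAuthenticator.class.getName());
        // asSubclass fails fast if the configured class is not an Authenticator.
        Class<? extends Authenticator> cls =
            Class.forName(className).asSubclass(Authenticator.class);
        return cls.getDeclaredConstructor().newInstance();
    }
}
```

An LDAP- or MySQL-backed implementation would then only need to implement Authenticator and be named in the configuration; the calling code would not change.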



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913741#action_12913741 ] 

Edward Capriolo commented on HIVE-842:
--------------------------------------

What is meant by "attack the Web UI separately"? Will it be broken or non-functional at any phase here? That is what I find happens often. Some of it is really the WUI's fault for using JSP and not servlets, but there is no simple way to get code coverage of the WUI and all the different ways it gets broken.



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924020#action_12924020 ] 

Pradeep Kamath commented on HIVE-842:
-------------------------------------

I looked at the issue of the server requiring restarts with Devaraj Das, who worked on Hadoop security. He suggested a couple of changes (below) and that solved it - the server now does not need a restart.
Apparently UserGroupInformation.loginUserFromKeytabAndReturnUGI() does not set the loginUser member, whereas UserGroupInformation.loginUserFromKeytab() does. He also suggested a second change: not caching the realUser. Both changes are below:

{noformat}

In the following code:

  private Server(String keytabFile, String principalConf)
      throws TTransportException {
    ...

    realUgi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
        kerberosName, keytabFile);
    assert realUgi.isFromKeytab();

I had to change the above lines to the lines below:

    UserGroupInformation.loginUserFromKeytab(
        kerberosName, keytabFile);
    realUgi = UserGroupInformation.getLoginUser();


Likewise in:

  public boolean process(final TProtocol inProt, final TProtocol outProt) throws TException {
    TTransport trans = inProt.getTransport();
    ...
    UserGroupInformation clientUgi = UserGroupInformation.createProxyUser(
        authId, realUgi);

I changed the above to:

    UserGroupInformation clientUgi = UserGroupInformation.createProxyUser(
        authId, UserGroupInformation.getLoginUser());

{noformat}



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924034#action_12924034 ] 

Todd Lipcon commented on HIVE-842:
----------------------------------

Hey Pradeep. Those changes seem reasonable. I'm not personally a fan of the "login user" concept in Hadoop security - it's static state, which prevents servers that may want to use multiple principals from doing so easily (e.g. if running a Hive server with an embedded metastore, you may need a different principal for each of the two pieces). But given that there is no "renewer" thread for non-loginuser keytab logins, it may be the only choice for now.
