Posted to user@hive.apache.org by Elliot West <te...@gmail.com> on 2016/07/13 13:17:30 UTC

Hive authentication and authorisation in AWS.

Hello,

I am attempting to set up a long-running, shared Hive metastore in AWS. The
intention is for it to serve as the core repository of metadata for
datasets shared across multiple AWS accounts. Users will be able to spin up
their own short-lived EMR clusters, Spark jobs, etc., and then locate the
data they need using this metastore. The data will be stored on S3, the
metadata database will be provided by RDS MySQL or Aurora, and I have the
metastore service running on EC2 instances. I’m trying to determine the
best way to both authenticate and authorise users of the metastore in this
scenario. As I’m no expert on user identity management and security, I’m
finding it rather difficult to make headway.

On the subject of authentication, I’d ideally like to use each user’s
global IAM identity. However, I’m at a loss as to where and how I could
integrate this with the metastore service. The metastore apparently supports
Kerberos and LDAP, but I’m not sure how these fit into an AWS setting. I’d
rather not run a separate directory server that maintains a set of
identities separate from the IAM identities in the accounts, although this
seems to be a possibility.

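Whatever mechanism authenticates the caller, the metastore identifies owners and grantees by name strings, so an IAM-based scheme would still need a mapping from an IAM principal to such a name. Below is a minimal sketch of one possible mapping. It assumes that a trusted mechanism, such as an STS GetCallerIdentity call, has already verified the caller’s ARN. The helper name `hive_user_from_arn` and the `name@account` convention are hypothetical:

```python
def hive_user_from_arn(arn: str) -> str:
    """Map an (already authenticated) IAM/STS ARN to a metastore user name.

    Hypothetical convention: "<name>@<account-id>", so that identically
    named users in different AWS accounts do not collide.
    """
    # ARN layout: arn:partition:service:region:account-id:resource
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    service, account, resource = parts[2], parts[4], parts[5]

    if service == "iam" and resource.startswith("user/"):
        # user/alice or user/some/path/alice -> the last segment is the user name
        return f"{resource.rsplit('/', 1)[1]}@{account}"

    if service == "sts" and resource.startswith("assumed-role/"):
        # assumed-role/RoleName/SessionName -> keep the role, drop the session
        segments = resource.split("/")
        if len(segments) == 3:
            return f"{segments[1]}@{account}"

    raise ValueError(f"unsupported principal in ARN: {arn!r}")
```

Collapsing assumed roles to the role name means that, for example, every EMR cluster running under the same instance-profile role would likely appear as a single metastore user. Whether that granularity is acceptable is a policy decision rather than a technical one.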
On the subject of authorisation, I suspect that storage-based authorisation
will not work with S3. Hive appears to use the Hadoop FileSystem
abstraction to interrogate file system permissions, and the S3 FileSystem
implementations do not appear to provide any visibility into S3 bucket
permissions. SQL-based authorisation also appears inappropriate for this
use case, as it requires HiveServer2 to enforce the finer-grained
permissions (column-level access control, for example). However, I don’t
want to force all users to access data via HiveServer2, as this would
require them to use a client that supports HS2. At this point I wonder
whether I must implement my own metastore authorisation hook that
interrogates the S3 bucket policy using the AWS APIs.

Any suggestions or thoughts would be appreciated.
Thanks,
Elliot.