Posted to user@hive.apache.org by Jay Ramadorai <jr...@tripadvisor.com> on 2011/02/14 17:01:01 UTC

Multi-user use and tracking in Hive with Cloudera Hadoop

I want clients on various machines outside the cluster to run queries on Hive as different users. I also want to track who is running what, and to use the Fair Scheduler or something similar as a governor to throttle usage. Bottom line: I need fine-grained tracking and control at the user and user-group level.

Is this possible for:
(a) remote clients connecting to the Derby metastore listener and running Hive queries from their Hive clients as different users?
(b) remote clients connecting with JDBC through Thrift to Hive?

I'm running Hive from Apache trunk (0.7.0) on top of Cloudera Hadoop CDH3b3. The Hive Thrift server runs as the user hive, and the Hive tables are owned by the Linux user hive.

Must I use something like Kerberos to make this work or is there an alternative? In fact, will Kerberos even help in achieving the above?