Posted to user@hive.apache.org by Mich Talebzadeh <mi...@gmail.com> on 2016/03/09 01:00:40 UTC
Re: Hive Context: Hive Metastore Client
The current scenario resembles a three-tier architecture, but without the
security of the second tier. In a typical three-tier setup, users connect
to the application server (read HiveServer2), are independently
authenticated and, if that succeeds, the second tier creates new .NET-type
or JDBC threads to connect to the database, much like multi-threading. The
problem, I believe, is that HiveServer2 does not yet have that concept of
handling individual logins. HiveServer2 should be able to handle LDAP
logins as well. It is a useful layer to have.
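A minimal Python sketch of the three-tier pattern described above. It assumes the middle tier checks credentials itself (a dictionary stands in for an LDAP directory) and then opens each backend session under the end user's own identity. All class names are hypothetical, and nothing here is HiveServer2 code. The `shared_login` option models the situation described in this thread, where every session reaches the metastore as one account.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BackendSession:
    """A hypothetical data-tier connection, tagged with the identity it runs as."""
    effective_user: str


@dataclass
class MiddleTier:
    """Hypothetical second tier: authenticates each user, then opens a backend session."""
    directory: Dict[str, str]            # stand-in for LDAP: user -> password
    shared_login: Optional[str] = None   # if set, every session runs as this one login
    sessions: List[BackendSession] = field(default_factory=list)

    def connect(self, user: str, password: str) -> BackendSession:
        # Step 1: authenticate the end user independently at the middle tier.
        if self.directory.get(user) != password:
            raise PermissionError(f"authentication failed for {user!r}")
        # Step 2: open a backend session as the end user, unless a shared login is configured.
        session = BackendSession(effective_user=self.shared_login or user)
        self.sessions.append(session)
        return session


directory = {"alice": "s3cret", "bob": "hunter2"}
per_user = MiddleTier(directory)
shared = MiddleTier(directory, shared_login="hduser")
print(per_user.connect("alice", "s3cret").effective_user)  # alice
print(shared.connect("bob", "hunter2").effective_user)     # hduser
```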
Dr Mich Talebzadeh
LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
On 8 March 2016 at 23:28, Alex <th...@gmail.com> wrote:
> Yes: when a HiveContext is created, a Hive Metastore client should be
> created with the user that the Spark application will use to talk to the
> *remote* Hive Metastore. We would like to add a custom authorization plugin
> to our remote Hive Metastore to authorize the query requests that the Spark
> application submits, which would also add authorization for any other
> applications hitting the Hive Metastore. Furthermore, we would like to
> extend this so that we can submit "jobs" to our Spark application that
> allow us to run against the metastore as different users while leveraging
> the abilities of our Spark cluster. But, as you mentioned, only one login
> connects to the Hive Metastore, and it is shared among all HiveContext sessions.
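A metastore-side authorization plugin ultimately answers one question per request: may this user perform this call on this object? The sketch below models only that decision with an in-memory policy. The call names `get_table` and `get_tables` are taken from the audit log in this thread, while the policy format and the `authorize` function are hypothetical. A real plugin would be a Java class configured on the metastore.

```python
from typing import Dict, Set, Tuple

# Hypothetical policy: (user, database) -> set of allowed metastore calls.
# "*" as the database name grants the calls on every database.
Policy = Dict[Tuple[str, str], Set[str]]

POLICY: Policy = {
    ("alice", "test"): {"get_table", "get_tables"},
    ("bob", "*"): {"get_tables"},
}


def authorize(policy: Policy, user: str, action: str, db: str) -> bool:
    """Return True if `user` may run metastore call `action` against database `db`."""
    for key in ((user, db), (user, "*")):
        if action in policy.get(key, set()):
            return True
    return False


print(authorize(POLICY, "alice", "get_table", "test"))     # True
print(authorize(POLICY, "bob", "get_table", "asehadoop"))  # False
print(authorize(POLICY, "hduser", "get_tables", "test"))   # False
```

A check like this is only meaningful if requests reach the metastore under the end user's identity. With a single shared login, every request would be evaluated as the same user.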
>
> Likely the authentication would have to be completed either through a
> secured Hive Metastore (Kerberos) or by having the requests go through
> HiveServer2.
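If the requests go through HiveServer2, each end user connects with their own credentials. With Kerberos, the HiveServer2 JDBC URL carries the server's principal as a `principal=` session variable after the database name. The helper below only assembles such a URL string. The function is hypothetical, and the host name and realm in the example are placeholders. Port 10000 is HiveServer2's usual default for the binary transport.

```python
from typing import Optional


def hive2_jdbc_url(host: str, port: int = 10000, database: str = "default",
                   principal: Optional[str] = None) -> str:
    """Assemble a HiveServer2 JDBC URL, adding the server principal when Kerberos is used."""
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if principal:
        # Session variables follow the database name, separated by semicolons.
        url += f";principal={principal}"
    return url


print(hive2_jdbc_url("hs2.example.com"))
# jdbc:hive2://hs2.example.com:10000/default
print(hive2_jdbc_url("hs2.example.com", principal="hive/hs2.example.com@EXAMPLE.COM"))
# jdbc:hive2://hs2.example.com:10000/default;principal=hive/hs2.example.com@EXAMPLE.COM
```

A user would typically obtain a ticket with `kinit` and then connect with `beeline -u "<url>"`, so that HiveServer2 authenticates each user individually.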
>
> --Alex
>
>
> On 3/8/2016 3:13 PM, Mich Talebzadeh wrote:
>
> Hi,
>
> What do you mean by Hive Metastore Client? Are you referring to a
> HiveServer2 login, much like Beeline?
>
> Spark uses hive-site.xml to get the details of the Hive metastore and the
> login to the metastore database, which could be any database. Mine is
> Oracle, and as far as I know, even in Hive 2, hive-site.xml has an entry,
> javax.jdo.option.ConnectionUserName, that specifies the username to use
> against the metastore database. These are all multi-threaded JDBC
> connections to the database using the same login, as shown below:
>
> LOGIN    SID/serial# LOGGED IN S HOST       OS PID         Client PID     PROGRAM               MEM/KB      Logical I/O Physical I/O ACT
> -------- ----------- ----------- ---------- -------------- -------------- --------------- ------------ ---------------- ------------ ---
> INFO
> -------
> HIVEUSER 67,6160     08/03 08:11 rhes564    oracle/20539   hduser/1234    JDBC Thin Clien        1,017               37            0 N
> HIVEUSER 89,6421     08/03 08:11 rhes564    oracle/20541   hduser/1234    JDBC Thin Clien        1,081              528            0 N
> HIVEUSER 112,561     08/03 10:45 rhes564    oracle/24624   hduser/1234    JDBC Thin Clien          889               37            0 N
> HIVEUSER 131,8811    08/03 08:11 rhes564    oracle/20543   hduser/1234    JDBC Thin Clien        1,017               37            0 N
> HIVEUSER 47,30114    08/03 10:45 rhes564    oracle/24626   hduser/1234    JDBC Thin Clien        1,017               37            0 N
> HIVEUSER 170,8955    08/03 08:11 rhes564    oracle/20545   hduser/1234    JDBC Thin Clien        1,017              323            0 N
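The metastore database login is an ordinary `<property>` entry in hive-site.xml. The sketch below reads such entries with Python's standard XML parser. The sample file is illustrative only: the Oracle URL and host names are placeholders, and `HIVEUSER` simply mirrors the login in the session list above.

```python
import xml.etree.ElementTree as ET

SAMPLE_HIVE_SITE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>HIVEUSER</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore.example.com:9083</value>
  </property>
</configuration>"""


def read_hive_site(xml_text: str) -> dict:
    """Return a {name: value} mapping of every <property> in a hive-site.xml document."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.iter("property")
    }


conf = read_hive_site(SAMPLE_HIVE_SITE)
print(conf["javax.jdo.option.ConnectionUserName"])  # HIVEUSER
```

Whichever process opens the metastore database uses this single account for all of its JDBC connections. That is why every session above reports the same login.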
>
> As I understand it, you are suggesting that each Spark user use a
> different login to connect to the Hive metastore. As of now there is only
> one login connecting to the Hive metastore, shared among all of them:
>
> 2016-03-08T23:08:01,890 INFO [pool-5-thread-72]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217 cmd=source:50.140.197.217 get_table : db=test tbl=t
> 2016-03-08T23:18:10,432 INFO [pool-5-thread-81]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.216 cmd=source:50.140.197.216 get_tables: db=asehadoop pat=.*
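The audit entries make the point mechanically: whichever client host issues the call, `ugi` is always `hduser`. The following short sketch tallies audit events by user and client address. It assumes each entry is one log line in the format shown above. The regular expression is ours, not part of Hive.

```python
import re
from collections import Counter

# Matches the tail of a HiveMetaStore.audit line: "ugi=<user> ip=<addr> cmd=<command>".
AUDIT_RE = re.compile(r"ugi=(?P<ugi>\S+)\s+ip=(?P<ip>\S+)\s+cmd=(?P<cmd>.*)$")

AUDIT_LINES = [
    "2016-03-08T23:08:01,890 INFO [pool-5-thread-72]: HiveMetaStore.audit "
    "(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217 "
    "cmd=source:50.140.197.217 get_table : db=test tbl=t",
    "2016-03-08T23:18:10,432 INFO [pool-5-thread-81]: HiveMetaStore.audit "
    "(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.216 "
    "cmd=source:50.140.197.216 get_tables: db=asehadoop pat=.*",
]


def users_by_client(lines):
    """Count audit events per (ugi, client ip) pair, skipping lines that do not match."""
    counts = Counter()
    for line in lines:
        match = AUDIT_RE.search(line)
        if match:
            counts[(match["ugi"], match["ip"])] += 1
    return counts


# Two different client addresses, but only one user name.
print({ugi for ugi, _ in users_by_client(AUDIT_LINES)})  # {'hduser'}
```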
>
> And these are the entries in the Hive log when a connection is made
> through the Zeppelin UI:
>
> 2016-03-08T23:20:13,546 INFO [pool-5-thread-84]: metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(499)) - 84: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
> 2016-03-08T23:20:13,547 INFO [pool-5-thread-84]: metastore.ObjectStore (ObjectStore.java:initialize(318)) - ObjectStore, initialize called
> 2016-03-08T23:20:13,550 INFO [pool-5-thread-84]: metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:<init>(142)) - Using direct SQL, underlying DB is ORACLE
> 2016-03-08T23:20:13,550 INFO [pool-5-thread-84]: metastore.ObjectStore (ObjectStore.java:setConf(301)) - Initialized ObjectStore
>
> I am not sure there is currently any plan to allow different logins to
> the Hive Metastore, but it would add another level of security. I am also
> not sure how such logins would be authenticated.
>
> HTH
>
> On 8 March 2016 at 22:23, Alex F <th...@gmail.com> wrote:
>
>> As of Spark 1.6.0 it is possible to create new HiveContext sessions that
>> share various components, but right now the Hive Metastore Client is
>> shared among all of these HiveContext sessions.
>>
>> Are there any plans to create individual Metastore Clients for each
>> HiveContext?
>>
>> Related to the question above, are there any plans to create an interface
>> for customizing the username that the Metastore Client uses to connect to
>> the Hive Metastore? Right now it uses either the user specified in an
>> environment variable or the application's process owner.
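The fallback Alex describes can be modelled in a few lines. This sketch assumes the environment variable in question is `HADOOP_USER_NAME`, which Hadoop consults under simple (non-Kerberos) authentication. The function name is hypothetical. The real logic lives in Hadoop's `UserGroupInformation`, which also handles Kerberos logins.

```python
import getpass
import os
from typing import Mapping, Optional


def effective_hadoop_user(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the user name a client would present under simple authentication.

    Assumption: HADOOP_USER_NAME wins if set; otherwise fall back to the login
    name reported by getpass.getuser(), approximating the process owner.
    """
    env = os.environ if env is None else env
    override = env.get("HADOOP_USER_NAME")
    if override:
        return override
    return getpass.getuser()


print(effective_hadoop_user({"HADOOP_USER_NAME": "alice"}))  # alice
```

Every HiveContext session in one application shares the same process environment, so they all resolve to the same user. That is the limitation being asked about.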
>>
>
>
>
Re: Hive Context: Hive Metastore Client
Posted by Alex <th...@gmail.com>.
I agree it is a useful layer. During my investigations into individual
user connections from a Spark application, I ran some tests with
HiveServer2. Using Beeline, I was able to authenticate the users passed in
correctly, but when it came down to authorizing the queries on the
metastore, they all used the initial user connection that HiveServer2 had
made with the Hive Metastore.
My intention is that, should we get access to the Hive Metastore Client
and its configuration through the HiveContext, we could create new
HiveContext sessions, each with its own connection to the Hive Metastore.
Authorization of each query would then be completed on the Metastore
itself, while we, acting as the second tier, would handle the
authentication of the users.
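The design Alex outlines could look roughly like this: one metastore client per user, with authentication handled by the application. This is a hypothetical sketch, not Spark API. `MetastoreClient` stands in for whatever connection object would be exposed, and the application is assumed to have already authenticated `user` before calling `client_for`.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MetastoreClient:
    """Stand-in for a connection to the remote Hive Metastore, bound to one user."""
    user: str

    def get_table(self, db: str, table: str) -> str:
        # A real client would send the request under `self.user`, so the
        # metastore could authorize it against that identity.
        return f"ugi={self.user} get_table db={db} tbl={table}"


class SessionFactory:
    """Hands each (already authenticated) user a metastore client bound to that user."""

    def __init__(self) -> None:
        self._clients: Dict[str, MetastoreClient] = {}

    def client_for(self, user: str) -> MetastoreClient:
        # Reuse one client per user rather than opening one per session.
        if user not in self._clients:
            self._clients[user] = MetastoreClient(user)
        return self._clients[user]


factory = SessionFactory()
print(factory.client_for("alice").get_table("test", "t"))
# ugi=alice get_table db=test tbl=t
print(factory.client_for("bob").get_table("test", "t"))
# ugi=bob get_table db=test tbl=t
```

Caching one client per user, rather than per session, keeps the number of metastore connections bounded by the number of distinct users.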
It sounds like this functionality is not likely to be implemented any
time soon, though, so we will have to find a solution in the meantime.
Thanks,
Alex
On 3/8/2016 4:00 PM, Mich Talebzadeh wrote:
> The current scenario resembles a three-tier architecture, but without
> the security of the second tier. In a typical three-tier setup, users
> connect to the application server (read HiveServer2), are independently
> authenticated and, if that succeeds, the second tier creates new .NET-type
> or JDBC threads to connect to the database, much like multi-threading.
> The problem, I believe, is that HiveServer2 does not yet have that
> concept of handling individual logins. HiveServer2 should be able to
> handle LDAP logins as well. It is a useful layer to have.
Re: Hive Context: Hive Metastore Client
Posted by Mich Talebzadeh <mi...@gmail.com>.
Thanks, Alan, for the info. I will have a look.
Some products, such as MongoDB (which provides its own database), offer a
layer of access control through an admin database: admin users are
authenticated against it, and new users can be added to the individual
databases.
When it comes to Hadoop and its ultimate storage system, HDFS, it is clear
that a common security framework is needed. Having said that, one can
bypass anything by going directly to the HDFS file system.
We are also concerned that when data is ingested into Hive through
temporary OS file storage, the data on the file system needs to be
encrypted to avoid exposing client data. Most RDBMSs offer encrypted tables
and columns, and I presume that if and when data ends up in Hive, it needs
to be protected through encryption. I have not heard of any encryption
utility within Hive yet.
Cheers,
Mich
On 9 March 2016 at 15:58, Alan Gates <al...@gmail.com> wrote:
> One way people have gotten around the lack of LDAP connectivity in HS2 has
> been to use Apache Knox. That project’s goal is to provide a single login
> capability for Hadoop-related projects so that users can tie their LDAP or
> Active Directory servers into Hadoop.
>
> Alan.
Re: Hive Context: Hive Metastore Client
Posted by Jörn Franke <jo...@gmail.com>.
Apache Knox for authentication makes sense. For Hive authorization there are tools such as Apache Ranger or Apache Sentry, which can themselves connect via LDAP.
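Tools such as Sentry and Ranger typically map a user to groups, often looked up in LDAP, and attach privileges to those groups (in Sentry's case, privileges go to roles, and roles are granted to groups). The sketch below models that indirection in plain Python, with a dictionary standing in for the LDAP group lookup. It illustrates the idea only and does not use either project's API or policy format. All names and privileges in it are hypothetical.

```python
from typing import Dict, Set, Tuple

# Stand-in for an LDAP lookup: user -> groups.
USER_GROUPS: Dict[str, Set[str]] = {
    "alice": {"analysts"},
    "bob": {"etl"},
}

# Roles granted to groups, and privileges (action, database) granted to roles.
GROUP_ROLES: Dict[str, Set[str]] = {
    "analysts": {"read_test"},
    "etl": {"read_test", "write_test"},
}
ROLE_PRIVILEGES: Dict[str, Set[Tuple[str, str]]] = {
    "read_test": {("select", "test")},
    "write_test": {("insert", "test")},
}


def is_allowed(user: str, action: str, db: str) -> bool:
    """Check whether any role reachable from the user's groups grants (action, db)."""
    for group in USER_GROUPS.get(user, set()):
        for role in GROUP_ROLES.get(group, set()):
            if (action, db) in ROLE_PRIVILEGES.get(role, set()):
                return True
    return False


print(is_allowed("alice", "select", "test"))  # True
print(is_allowed("alice", "insert", "test"))  # False
print(is_allowed("bob", "insert", "test"))    # True
```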
> On 09 Mar 2016, at 16:58, Alan Gates <al...@gmail.com> wrote:
>
> One way people have gotten around the lack of LDAP connectivity in HS2 has been to use Apache Knox. That project’s goal is to provide a single login capability for Hadoop-related projects so that users can tie their LDAP or Active Directory servers into Hadoop.
>
> Alan.
Re: Hive Context: Hive Metastore Client
Posted by Alan Gates <al...@gmail.com>.
One way people have gotten around the lack of LDAP connectivity in HS2 has been to use Apache Knox. That project’s goal is to provide a single login capability for Hadoop-related projects so that users can tie their LDAP or Active Directory servers into Hadoop.
Alan.
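With Knox in front, clients reach HiveServer2 through the gateway over HTTPS rather than connecting to it directly. Knox's documentation describes a Hive JDBC URL of roughly the form `jdbc:hive2://<gateway-host>:<gateway-port>/;ssl=true;transportMode=http;httpPath=<gateway-path>/<cluster-name>/hive`. The hypothetical helper below just fills in that template. The defaults (port 8443, path `gateway`) are common Knox defaults but should be treated as assumptions for any given deployment. Trust-store parameters are omitted.

```python
def knox_hive2_url(gateway_host: str, cluster: str, port: int = 8443,
                   gateway_path: str = "gateway") -> str:
    """Assemble a HiveServer2 JDBC URL that goes through a Knox gateway over HTTPS."""
    params = [
        "ssl=true",            # Knox serves HTTPS; a trust store is usually also required
        "transportMode=http",  # HiveServer2 HTTP transport instead of binary Thrift
        f"httpPath={gateway_path}/{cluster}/hive",
    ]
    return f"jdbc:hive2://{gateway_host}:{port}/;" + ";".join(params)


print(knox_hive2_url("knox.example.com", "sandbox"))
# jdbc:hive2://knox.example.com:8443/;ssl=true;transportMode=http;httpPath=gateway/sandbox/hive
```

Knox then typically authenticates the user, for example against LDAP, from the credentials the client supplies, such as Beeline's `-n` and `-p` options.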
> On Mar 8, 2016, at 16:00, Mich Talebzadeh <mi...@gmail.com> wrote:
>
> The current scenario resembles a three-tier architecture, but without the security of the second tier. In a typical three-tier setup, users connect to the application server (read HiveServer2), are independently authenticated and, if that succeeds, the second tier creates new .NET-type or JDBC threads to connect to the database, much like multi-threading. The problem, I believe, is that HiveServer2 does not yet have that concept of handling individual logins. HiveServer2 should be able to handle LDAP logins as well. It is a useful layer to have.