Posted to user@hive.apache.org by Alex Holmes <gr...@gmail.com> on 2011/11/29 03:06:45 UTC

Is there a reason for the Hive remote metastore to execute commands as different users?

Hi,

I'm running Hive 0.7.1 with a remote metastore (Derby) on Hadoop 0.20.2.

Is there a reason that CREATE and DROP commands, when translated into
HDFS operations, are run as the remote Hive metastore user, but a LOAD
is translated into HDFS operations that are executed as the Hive
client user?  If my understanding is correct, doesn't this mean that:

1.  The Hive remote metastore must always be run as a superuser, which
is arguably a security risk.  If I run the Hive remote metastore as a
non-superuser different from the Hive client user, then a LOAD DATA
LOCAL (with the HDFS umask default of 022) creates a directory chmod'd
755, which doesn't give the Hive metastore user permissions to remove
the directory in a subsequent DROP.

2.  The Hive client must have write permissions on the initial table
directory created by the CREATE command executed as the Hive remote
metastore user.  This would only work in cases where both the remote
Hive metastore user and the client Hive user were the same user, or if
the Hive client were a superuser.  In my own testing, the only way I
could get this to work when they were different users (and not
superusers) was by applying a locally written patch that
addresses HIVE-2504.
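The 755 in point 1 follows directly from the umask. Below is a minimal Python sketch of that arithmetic. It assumes that a new directory is requested with mode 0777 before the umask is applied. That is an illustrative assumption, not something stated in the thread.

```python
def apply_umask(requested: int, umask: int) -> int:
    """Clear every permission bit that is set in the umask."""
    return requested & ~umask

def to_rwx(mode: int) -> str:
    """Render the low nine permission bits ls-style, e.g. 'rwxr-xr-x'."""
    letters = "rwx" * 3
    return "".join(c if mode & (1 << (8 - i)) else "-" for i, c in enumerate(letters))

REQUESTED_DIR_MODE = 0o777  # assumed mode requested for a new directory
HDFS_UMASK = 0o022          # the default umask cited in the post

mode = apply_umask(REQUESTED_DIR_MODE, HDFS_UMASK)
print(oct(mode), to_rwx(mode))                        # 0o755 rwxr-xr-x
print("group/other may write:", bool(mode & 0o022))  # group/other may write: False
```

With a 022 umask, only the owner keeps write access. Any other non-superuser, including a metastore user in the same group, can list and traverse such a directory but cannot add or remove entries in it.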

Maybe I'm over-simplifying, but couldn't all the Hive remote metastore
HDFS operations be run as the Hive client's user/group?
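The toy simulation below makes the two failure modes concrete by replaying the CREATE, LOAD, DROP sequence. It is a simplified model, not Hive or Hadoop code. The following names are all hypothetical:

- the user names `hdfs`, `hivems` and `alice`;
- the shared group `hive`;
- the group-writable warehouse directory;
- the `load_dir` subdirectory.

The permission rules are assumptions loosely modelled on the HDFS permissions guide:

- mkdir needs write and execute on the parent;
- a recursive delete needs write on the parent plus rwx on every directory removed;
- new directories take 0777 minus the umask and inherit the parent's group.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Toy model of HDFS-style permission checks (hypothetical, not Hadoop code).
READ, WRITE, EXECUTE = 0o4, 0o2, 0o1
ALL = READ | WRITE | EXECUTE
UMASK = 0o022
SUPERUSER = "hdfs"  # hypothetical superuser; bypasses every check
GROUPS = {"hivems": frozenset({"hive"}), "alice": frozenset({"hive"})}


@dataclass
class Dir:
    owner: str
    group: str
    mode: int
    children: Dict[str, "Dir"] = field(default_factory=dict)


def allowed(user: str, d: Dir, wanted: int) -> bool:
    """Pick the owner, group or other triple (POSIX-style) and test the wanted bits."""
    if user == SUPERUSER:
        return True
    if user == d.owner:
        triple = (d.mode >> 6) & 0o7
    elif d.group in GROUPS.get(user, frozenset()):
        triple = (d.mode >> 3) & 0o7
    else:
        triple = d.mode & 0o7
    return (triple & wanted) == wanted


def mkdir(user: str, parent: Dir, name: str) -> bool:
    """Assumed rule: write+execute on the parent; the new dir gets
    0777 & ~umask and inherits the parent's group."""
    if not allowed(user, parent, WRITE | EXECUTE):
        return False
    parent.children[name] = Dir(user, parent.group, 0o777 & ~UMASK)
    return True


def subtree_ok(user: str, d: Dir) -> bool:
    """Assumed rule: rwx on every directory that a recursive delete removes."""
    return allowed(user, d, ALL) and all(subtree_ok(user, c) for c in d.children.values())


def rmr(user: str, parent: Dir, name: str) -> bool:
    """Recursive delete: write+execute on the parent plus rwx on the whole subtree."""
    if not (allowed(user, parent, WRITE | EXECUTE) and subtree_ok(user, parent.children[name])):
        return False
    del parent.children[name]
    return True


def scenario(create_as: str, load_as: str, drop_as: str,
             table_mode: Optional[int] = None) -> Dict[str, bool]:
    warehouse = Dir(SUPERUSER, "hive", 0o775)  # assumed group-writable warehouse
    results = {"CREATE": mkdir(create_as, warehouse, "t")}
    if not results["CREATE"]:
        return results
    if table_mode is not None:
        warehouse.children["t"].mode = table_mode  # hypothetical workaround
    results["LOAD"] = mkdir(load_as, warehouse.children["t"], "load_dir")
    results["DROP"] = rmr(drop_as, warehouse, "t")
    return results


print(scenario("hivems", "alice", "hivems"))
# {'CREATE': True, 'LOAD': False, 'DROP': True}
print(scenario("hivems", "alice", "hivems", table_mode=0o775))
# {'CREATE': True, 'LOAD': True, 'DROP': False}
print(scenario("alice", "alice", "alice"))
# {'CREATE': True, 'LOAD': True, 'DROP': True}
```

Under these assumptions:

- Splitting the operations across two non-superusers fails at LOAD (point 2).
- Loosening the table directory moves the failure to DROP (point 1).
- Running every step as the client user succeeds, which is the behavior the question above proposes.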

Thanks,
Alex

Re: Is there a reason for the Hive remote metastore to execute commands as different users?

Posted by Ashutosh Chauhan <ha...@apache.org>.
Hey Alex,

This is indeed a bug. I have posted a patch for it on
https://issues.apache.org/jira/browse/HIVE-2616
Would you like to try it out to see if it works for you?

Ashutosh
On Tue, Nov 29, 2011 at 02:45, Alex Holmes <gr...@gmail.com> wrote:

> Running MySQL as the metastore doesn't change the behavior of the HDFS
> operations or, more importantly, who (the UGI) they are executed as.
>
> Does anyone have any thoughts as to why Hive HDFS operations are run
> as different users?
>
> Many thoughts,
> Alex

Re: Is there a reason for the Hive remote metastore to execute commands as different users?

Posted by Alex Holmes <gr...@gmail.com>.
Running MySQL as the metastore doesn't change the behavior of the HDFS
operations or, more importantly, who (the UGI) they are executed as.

Does anyone have any thoughts as to why Hive HDFS operations are run
as different users?

Many thoughts,
Alex


On Tue, Nov 29, 2011 at 2:47 AM, Alexander C.H. Lorenz
<wg...@googlemail.com> wrote:
> Derby depends on a local file store; for more flexibility and security, I
> suggest MySQL as a metastore.
> - Alex

Re: Is there a reason for the Hive remote metastore to execute commands as different users?

Posted by "Alexander C.H. Lorenz" <wg...@googlemail.com>.
Derby depends on a local file store; for more flexibility and security, I
suggest MySQL as a metastore.

- Alex

-- 
Alexander Lorenz
http://mapredit.blogspot.com

Think of the environment: please don't print this email unless you
really need to.