Posted to user@hive.apache.org by Pradeep Kamath <pr...@yahoo-inc.com> on 2010/07/13 23:04:51 UTC

Thrift metastore server and dfs file owner

Hi,

   I suspect this is true but wanted to confirm: if I start a thrift
metastore service as user "joe", then all internal tables created will
have directories under the warehouse directory owned by "joe", regardless
of the actual user running the create table statement - is this correct?
Is there no way for the thrift server to create the directory as the
actual user? However, if the thrift service is not used and the hive
client works directly against the metastore database, then the
directories are created by the actual user - is this correct?
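
(For reference, the owner shows up in the owner column of a dfs listing -
the path and output below are only illustrative:

hadoop fs -ls /user/hive/warehouse
drwxr-xr-x   - joe supergroup          0 2010-07-13 15:19 /user/hive/warehouse/foo
)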

 

Thanks,

Pradeep


RE: Thrift metastore server and dfs file owner

Posted by Pradeep Kamath <pr...@yahoo-inc.com>.
I have created https://issues.apache.org/jira/browse/HIVE-1476 to track
this.

 

________________________________

From: Paul Yang [mailto:pyang@facebook.com] 
Sent: Wednesday, July 21, 2010 11:33 AM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

 

Yeah, I think that the conf var is a better solution, because it would
give consistent behavior once the switch is made. Plus, it would avoid
cluttering up the metastore API (at the expense of another conf var...).
If the CLI were configured to use a remote metastore, it would need to
have additional checks to see if the directory were created by the
metastore call.

 

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 
Sent: Wednesday, July 21, 2010 9:07 AM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

 

I favor the option of a conf variable - "strict.owner.mode" - to indicate
that dirs will not be created by the server and will instead be created
by the client. In installations where there are thrift clients, this can
be set to false until the clients are ready to create the dirs
themselves. Is this an acceptable solution? If so, I can open a jira with
this proposed solution.
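
(In hive-site.xml the entry might look something like the snippet below -
"strict.owner.mode" is just the name proposed here, not an existing
variable:

<property>
  <name>strict.owner.mode</name>
  <value>true</value>
</property>
)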

 

Thoughts?

 

Pradeep

 

________________________________

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 
Sent: Tuesday, July 20, 2010 10:10 AM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

 

In addition to the options below, if there is some way to hook custom
code into thrift clients, then that could be a third option. From what
little I know of thrift, the client code is generated and there is no way
to add additional logic into the methods - but if there is a way to do
that, then that might be the best option.

 

________________________________

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 
Sent: Monday, July 19, 2010 1:09 PM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

 

I agree this will be an issue for direct thrift clients. How about the
following options:

 

1) Add a conf variable - "strict.owner.mode". If this is set to true on
the server, dirs will not be created by the server and will instead be
created on the client (both client and server should have the same
value, true or false).

OR

2) Add a new method to the thrift API which takes an extra boolean arg
indicating whether or not to create dirs. The HiveMetaStoreClient code
would use this new api with a "false" argument value and create the dir
on the client side. The issue with this is that existing Thrift clients
would be calling the current API method, which would create dirs as the
thrift server user. So depending on whether you create the table using
thrift (with the old method) or the CLI, you get different results. The
old method could be deprecated and the thrift clients could migrate to
the new one.
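
(For illustration, such a method might be declared along these lines in
the thrift IDL - the name create_table_ext is invented here and the
exception list is omitted:

void create_table_ext(1:Table tbl, 2:bool createDirs)
)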

 

Thoughts?

 

(This directory creation/deletion is relevant for create table/drop
table/add partition/alter table/alter partition, I think.)

 

Pradeep 

 

-----Original Message-----
From: Paul Yang [mailto:pyang@facebook.com] 
Sent: Monday, July 19, 2010 10:53 AM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

 

That approach would work for the CLI, but then the semantics of the
create table/create partition calls for thrift clients would be
different - they would no longer create the table directory. This might
be a problem if there are scripts that rely on this behavior for
copying/moving files. The table renaming code would also need to be
modified.

 

-----Original Message-----

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 

Sent: Monday, July 19, 2010 10:24 AM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

I was thinking about this a little more and was wondering if the
following alternative approach is feasible:

Instead of the metastore code creating the directories, why not have
HiveMetaStoreClient create them in createTable() after the table is
created - i.e. it can do a getTable().getSd().getLocation() and perform
wh.mkdirs() on that path. We could do the same thing with
addPartition().
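
(Roughly, something like the sketch below - not the actual
HiveMetaStoreClient code; exception handling is simplified and it assumes
the client holds its own Warehouse helper, wh:

public void createTable(Table tbl) throws TException, MetaException {
  // metastore records the table; under this proposal it no longer mkdirs
  client.create_table(tbl);
  // re-read the table to pick up the location the metastore chose
  Table created = client.get_table(tbl.getDbName(), tbl.getTableName());
  // create the directory from the client, i.e. as the actual user
  wh.mkdirs(new Path(created.getSd().getLocation()));
}
)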

 

This way, we can have the metastore thrift server running as a
non-hdfs-superuser. Also, we no longer need to keep track of user/group
information, since the client is already running with the right
user/group credentials.

 

Thoughts?

 

Pradeep

 

-----Original Message-----

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 

Sent: Thursday, July 15, 2010 10:23 AM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

Currently, group information is not present in Table, and both owner
and group information are absent from Database. If these are added to
those classes, we could change Warehouse.mkdirs(). This method is also
called from addPartition() - should we just use the table's owner/group
in that case? That could potentially fail in the non-thrift case if some
other user is creating the partitions, OR we would need to add
owner/group to Partition as well, with the implication that table and
partition owners could differ, causing query failures.

 

Paul's concern about security is valid but is there any other way around
this?

 

Pradeep

 

-----Original Message-----

From: Paul Yang [mailto:pyang@facebook.com] 

Sent: Wednesday, July 14, 2010 3:18 PM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

Yeah, you could overload Warehouse.mkdirs() to allow specification of an
owner/group and then use FileSystem.setOwner() within the method.
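
(Something along these lines - a sketch only of the suggested overload
inside Warehouse.java, assuming the metastore user is an HDFS superuser
and that getFs() is the existing helper that resolves the FileSystem for
a path:

public boolean mkdirs(Path f, String owner, String group) throws MetaException {
  try {
    FileSystem fs = getFs(f);
    boolean success = fs.mkdirs(f);
    if (success) {
      // only the dfs superuser may chown to an arbitrary user/group
      fs.setOwner(f, owner, group);
    }
    return success;
  } catch (IOException e) {
    throw new MetaException("Unable to create or chown " + f + ": " + e.getMessage());
  }
}
)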

 

If the thrift server has full permissions for DFS though, wouldn't this
present a security hole? 

 

-----Original Message-----

From: Ashish Thusoo [mailto:athusoo@facebook.com] 

Sent: Wednesday, July 14, 2010 12:34 PM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

We could just fix this in Warehouse.java so that the mkdirs call makes
the directories according to the owner field that is passed with the
table. That would probably be a simple fix for this, no?

 

Ashish

 

-----Original Message-----

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 

Sent: Wednesday, July 14, 2010 11:14 AM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

<name>dfs.permissions</name>
<value>true</value>
..
<name>dfs.permissions.supergroup</name>
<value>hdfs</value>

 

You mentioned: "I think the thrift server can use the dfs processor." -
were you suggesting the metastore implementation in HiveMetaStore should
always do a chown user:user in create_table_core() (or selectively look
at the conf to know it is being run as a thrift server and chown only in
that case)?

 

Pradeep

 

-----Original Message-----

From: Edward Capriolo [mailto:edlinuxguru@gmail.com]

Sent: Tuesday, July 13, 2010 4:52 PM

To: hive-user@hadoop.apache.org

Subject: Re: Thrift metastore server and dfs file owner

 

On Tue, Jul 13, 2010 at 6:20 PM, Pradeep Kamath <pr...@yahoo-inc.com> wrote:
> I tried:
> hive -e "set user.name=$USER;create table foo2 ( name string);"
>
> My warehouse table dir still got created by "root" (the user my thrift
> server is running as) drwxr-xr-x   - root supergroup          0
> 2010-07-13 15:19 /user/pradeepk/hive/warehouse/foo2
>
> -----Original Message-----
> From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
> Sent: Tuesday, July 13, 2010 2:47 PM
> To: hive-user@hadoop.apache.org
> Subject: Re: Thrift metastore server and dfs file owner
>
> On Tue, Jul 13, 2010 at 5:04 PM, Pradeep Kamath <pr...@yahoo-inc.com> wrote:
>> Hi,
>>
>>    I suspect this is true but wanted to confirm: If I start a thrift
>> metastore service as user "joe" then all internal tables created will
>> have directories under the warehouse directory owned by "joe"
>> regardless of the actual user running the create table statement - is
>> this correct? There is no way for the thrift server to create the
>> directory as the actual user?
>> However if thrift service is not used and the hive client directly
>> works against the metastore database, then the directories are
>> created by the actual user - is this correct?
>>
>> Thanks,
>>
>> Pradeep
>
> The hive web interface does this:
>
>    queries.add("set hadoop.job.ugi=" + auth.getUser() + ","
>        + auth.getGroups()[0]);
>    queries.add("set user.name=" + auth.getUser());
>
> You should be able to accomplish the same thing using set commands
> with the Thrift Server to impersonate.
>
> Regards,
> Edward
>

You are right. That technique may only affect files created during the
map/reduce job. I think the thrift server can use the dfs processor.

hive> dfs -chown user:user /user/hive/warehouse/foo2;

Questions:
Who is your hadoop superuser?
Are you enforcing dfs permissions?

If you are enforcing permissions only the hadoop superuser (hadoop) will
be able to chown files to other users and groups.


RE: Thrift metastore server and dfs file owner

Posted by Paul Yang <py...@facebook.com>.
Yeah, I think that the conf var is a better solution, because it would give consistent behavior once the switch is made. Plus, it would avoid cluttering up the metastore API (at the expense of another conf var...). If the CLI were configured to use a remote metastore, it would need to have additional checks to see if the directory were created by the metastore call.

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com]
Sent: Wednesday, July 21, 2010 9:07 AM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

I favor the option of a conf variable - "strict.owner.mode" to indicate that dirs will not be created by server and will be done by the client. In installations where there are thrift clients, this can be set to false till the point the clients are ready to create the dirs themselves - is this an acceptable solution - I can then open a jira with this proposed solution.

Thoughts?

Pradeep

________________________________
From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com]
Sent: Tuesday, July 20, 2010 10:10 AM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

In addition to the options below, if there is some way to have custom code into thrift clients then that could be a third option - from what little I know of thrift, I think the client code is generated and there is no way to add additional logic into the methods - but in case there is a way to do that, then that might be the best option.

________________________________
From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com]
Sent: Monday, July 19, 2010 1:09 PM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner


I agree this will be an issue for direct thrift clients. How about the following options:



1) Add a conf variable - "strict.owner.mode" - if this is set to true on the server, dirs will not be created and they will be created on the client (both client and server should have the same value (true or false).

OR

2) Add a new API method in the thrift API which takes an extra Boolean arg whether or not to create dirs. The HiveMetaStoreClient code will use this new api with a "false" argument value and create the dir on the client side. The issue with this is that existing Thrift client would be calling the current API method which would create dirs as the thrift server users. So depending on whether you are creating the table using thrift (with old method) or CLI you get different results. The old method could be deprecated and the thrift clients can migrate to the new one.



Thoughts?



(This directory creation/deletion is relevant to create table/drop table/add partition/alter table/alter partition I think)



Pradeep



-----Original Message-----
From: Paul Yang [mailto:pyang@facebook.com]
Sent: Monday, July 19, 2010 10:53 AM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner



That approach would work for the CLI, but then the semantics for the create table/create partition calls for thrift clients would be different - it would no longer create the table directory. This might be a problem if there are scripts that rely on this property for copying/moving files. Also, table renaming code would need to be modified as well.



-----Original Message-----

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com]

Sent: Monday, July 19, 2010 10:24 AM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner



I was thinking about this a little more and was wondering if the following alternative approach is feasible:

Instead of the Metastore code creating the directories why not have HiveMetastoreClient create it in createTable() after the table is created - i.e. it can do a getTable().getSd().getLocation() and perform wh.mkdirs() on that path. We could do the same thing with addPartition().



This way, we can have the metastore thrift server running as a non-hdfs-superuser. Also, we no longer need to keep track or user/group information since the client already is running with the right user/group credentials.



Thoughts?



Pradeep



-----Original Message-----

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com]

Sent: Thursday, July 15, 2010 10:23 AM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner



Currently group information is not present in the Table and both owner and group information are absent from Database. If these are added to these classes, we could change Warehouse.mkdirs(). This method is also called form addPartition(), should we just use the table's owner/group in this case? - could potentially fail in non thrift case if some other user is creating the partitions OR we would need to add owner/group to Partition as well with the implication that table and partition owner's could differ causing query failures.



Paul's concern about security is valid but is there any other way around this?



Pradeep



-----Original Message-----

From: Paul Yang [mailto:pyang@facebook.com]

Sent: Wednesday, July 14, 2010 3:18 PM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner



Yeah, you could overload Warehouse.mkdirs() to allow specification of an owner/group and then use Filesystem.setOwner() within the method.



If the thrift server has full permissions for DFS though, wouldn't this present a security hole?



-----Original Message-----

From: Ashish Thusoo [mailto:athusoo@facebook.com]

Sent: Wednesday, July 14, 2010 12:34 PM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner



We could just fix this in Warehouse.java so that the mkdirs call make the directories according to the owner field that is passed to the table? That probably would be a simple fix for this, no?



Ashish



-----Original Message-----

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com]

Sent: Wednesday, July 14, 2010 11:14 AM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner



<name>dfs.permissions</name>

<value>true</value>

..

<name>dfs.permissions.supergroup</name>

<value>hdfs</value>



You mentioned: "I think the thrift server can use the dfs processor." - were you suggesting the metastore implementation in HiveMetastore should always do chown user:user on create_table_core() (or selectively look at the conf and known it is being run as a thrift server and chown only in that case)?



Pradeep



-----Original Message-----

From: Edward Capriolo [mailto:edlinuxguru@gmail.com]

Sent: Tuesday, July 13, 2010 4:52 PM

To: hive-user@hadoop.apache.org

Subject: Re: Thrift metastore server and dfs file owner



On Tue, Jul 13, 2010 at 6:20 PM, Pradeep Kamath <pr...@yahoo-inc.com> wrote:

> I tried:

> hive -e "set user.name=$USER;create table foo2 ( name string);"

>

> My warehouse table dir still got created by "root" (the user my thrift

> server is running as) drwxr-xr-x   - root supergroup          0

> 2010-07-13 15:19 /user/pradeepk/hive/warehouse/foo2

>

> -----Original Message-----

> From: Edward Capriolo [mailto:edlinuxguru@gmail.com]

> Sent: Tuesday, July 13, 2010 2:47 PM

> To: hive-user@hadoop.apache.org

> Subject: Re: Thrift metastore server and dfs file owner

>

> On Tue, Jul 13, 2010 at 5:04 PM, Pradeep Kamath <pr...@yahoo-inc.com> wrote:

>> Hi,

>>

>>    I suspect this is true but wanted to confirm: If I start a thrift

>> metastore service as user "joe" then all internal tables created will

>> have directories under the warehouse directory owned by "joe"

>> regardless of the actual user running the create table statement - is

>> this correct? There is no way for the thrift server to create the directory as the actual user?

>> However if thrift service is not used and the hive client directly

>> works against the metastore database, then the directories are

>> created by the actual user - is this correct?

>>

>>

>>

>> Thanks,

>>

>> Pradeep

>

> The hive web interface does this:

>

>    queries.add("set hadoop.job.ugi=" + auth.getUser() + ","

>        + auth.getGroups()[0]);

>    queries.add("set user.name=" + auth.getUser());

>

> You should be able to accomplish the same thing using set commands

> with the Thrift Server to impersonate.

>

> Regards,

> Edward

>



You are right. That technique may only affect files created during the map/reduce job. I think the thrift server can use the dfs processor.



hive> dfs -chown user:user /user/hive/warehouse/foo2;



Questions:

Who is your hadoop superuser?

Are you enforcing dfs permissions?



If you are enforcing permissions only the hadoop superuser (hadoop) will be able to chown files to other users and groups.

RE: Thrift metastore server and dfs file owner

Posted by Pradeep Kamath <pr...@yahoo-inc.com>.
I favor the option of a conf variable - "strict.owner.mode" to indicate
that dirs will not be created by server and will be done by the client.
In installations where there are thrift clients, this can be set to
false till the point the clients are ready to create the dirs themselves
- is this an acceptable solution - I can then open a jira with this
proposed solution.

 

Thoughts?

 

Pradeep

 

________________________________

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 
Sent: Tuesday, July 20, 2010 10:10 AM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

 

In addition to the options below, if there is some way to have custom
code into thrift clients then that could be a third option - from what
little I know of thrift, I think the client code is generated and there
is no way to add additional logic into the methods - but in case there
is a way to do that, then that might be the best option.

 

________________________________

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 
Sent: Monday, July 19, 2010 1:09 PM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

 

I agree this will be an issue for direct thrift clients. How about the
following options:

 

1) Add a conf variable - "strict.owner.mode" - if this is set to true on
the server, dirs will not be created and they will be created on the
client (both client and server should have the same value (true or
false).

OR

2) Add a new API method in the thrift API which takes an extra Boolean
arg whether or not to create dirs. The HiveMetaStoreClient code will use
this new api with a "false" argument value and create the dir on the
client side. The issue with this is that existing Thrift client would be
calling the current API method which would create dirs as the thrift
server users. So depending on whether you are creating the table using
thrift (with old method) or CLI you get different results. The old
method could be deprecated and the thrift clients can migrate to the new
one.

 

Thoughts?

 

(This directory creation/deletion is relevant to create table/drop
table/add partition/alter table/alter partition I think)

 

Pradeep 

 

-----Original Message-----
From: Paul Yang [mailto:pyang@facebook.com] 
Sent: Monday, July 19, 2010 10:53 AM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

 

That approach would work for the CLI, but then the semantics for the
create table/create partition calls for thrift clients would be
different - it would no longer create the table directory. This might be
a problem if there are scripts that rely on this property for
copying/moving files. Also, table renaming code would need to be
modified as well.

 

-----Original Message-----

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 

Sent: Monday, July 19, 2010 10:24 AM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

I was thinking about this a little more and was wondering if the
following alternative approach is feasible:

Instead of the Metastore code creating the directories why not have
HiveMetastoreClient create it in createTable() after the table is
created - i.e. it can do a getTable().getSd().getLocation() and perform
wh.mkdirs() on that path. We could do the same thing with
addPartition().

 

This way, we can have the metastore thrift server running as a
non-hdfs-superuser. Also, we no longer need to keep track or user/group
information since the client already is running with the right
user/group credentials.

 

Thoughts?

 

Pradeep

 

-----Original Message-----

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 

Sent: Thursday, July 15, 2010 10:23 AM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

Currently group information is not present in the Table and both owner
and group information are absent from Database. If these are added to
these classes, we could change Warehouse.mkdirs(). This method is also
called form addPartition(), should we just use the table's owner/group
in this case? - could potentially fail in non thrift case if some other
user is creating the partitions OR we would need to add owner/group to
Partition as well with the implication that table and partition owner's
could differ causing query failures.

 

Paul's concern about security is valid but is there any other way around
this?

 

Pradeep

 

-----Original Message-----

From: Paul Yang [mailto:pyang@facebook.com] 

Sent: Wednesday, July 14, 2010 3:18 PM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

Yeah, you could overload Warehouse.mkdirs() to allow specification of an
owner/group and then use Filesystem.setOwner() within the method.

 

If the thrift server has full permissions for DFS though, wouldn't this
present a security hole? 

 

-----Original Message-----

From: Ashish Thusoo [mailto:athusoo@facebook.com] 

Sent: Wednesday, July 14, 2010 12:34 PM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

We could just fix this in Warehouse.java so that the mkdirs call make
the directories according to the owner field that is passed to the
table? That probably would be a simple fix for this, no?

 

Ashish

 

-----Original Message-----

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 

Sent: Wednesday, July 14, 2010 11:14 AM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

<name>dfs.permissions</name>

<value>true</value>

..

<name>dfs.permissions.supergroup</name>

<value>hdfs</value>

 

You mentioned: "I think the thrift server can use the dfs processor." -
were you suggesting the metastore implementation in HiveMetastore should
always do chown user:user on create_table_core() (or selectively look at
the conf and known it is being run as a thrift server and chown only in
that case)?

 

Pradeep

 

-----Original Message-----

From: Edward Capriolo [mailto:edlinuxguru@gmail.com]

Sent: Tuesday, July 13, 2010 4:52 PM

To: hive-user@hadoop.apache.org

Subject: Re: Thrift metastore server and dfs file owner

 

On Tue, Jul 13, 2010 at 6:20 PM, Pradeep Kamath <pr...@yahoo-inc.com>
wrote:

> I tried:

> hive -e "set user.name=$USER;create table foo2 ( name string);"

> 

> My warehouse table dir still got created by "root" (the user my thrift


> server is running as) drwxr-xr-x   - root supergroup          0 

> 2010-07-13 15:19 /user/pradeepk/hive/warehouse/foo2

> 

> -----Original Message-----

> From: Edward Capriolo [mailto:edlinuxguru@gmail.com]

> Sent: Tuesday, July 13, 2010 2:47 PM

> To: hive-user@hadoop.apache.org

> Subject: Re: Thrift metastore server and dfs file owner

> 

> On Tue, Jul 13, 2010 at 5:04 PM, Pradeep Kamath
<pr...@yahoo-inc.com> wrote:

>> Hi,

>> 

>>    I suspect this is true but wanted to confirm: If I start a thrift 

>> metastore service as user "joe" then all internal tables created will


>> have directories under the warehouse directory owned by "joe" 

>> regardless of the actual user running the create table statement - is


>> this correct? There is no way for the thrift server to create the
directory as the actual user?

>> However if thrift service is not used and the hive client directly 

>> works against the metastore database, then the directories are 

>> created by the actual user - is this correct?

>> 

>> 

>> 

>> Thanks,

>> 

>> Pradeep

> 

> The hive web interface does this:

> 

>    queries.add("set hadoop.job.ugi=" + auth.getUser() + ","

>        + auth.getGroups()[0]);

>    queries.add("set user.name=" + auth.getUser());

> 

> You should be able to accomplish the same thing using set commands 

> with the Thrift Server to impersonate.

> 

> Regards,

> Edward

> 

 

You are right. That technique may only affect files created during the
map/reduce job. I think the thrift server can use the dfs processor.

 

hive> dfs -chown user:user /user/hive/warehouse/foo2;

 

Questions:

Who is your hadoop superuser?

Are you enforcing dfs permissions?

 

If you are enforcing permissions only the hadoop superuser (hadoop) will
be able to chown files to other users and groups.


RE: Thrift metastore server and dfs file owner

Posted by Pradeep Kamath <pr...@yahoo-inc.com>.
In addition to the options below, if there is some way to have custom
code into thrift clients then that could be a third option - from what
little I know of thrift, I think the client code is generated and there
is no way to add additional logic into the methods - but in case there
is a way to do that, then that might be the best option.

 

________________________________

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 
Sent: Monday, July 19, 2010 1:09 PM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

 

I agree this will be an issue for direct thrift clients. How about the
following options:

 

1) Add a conf variable - "strict.owner.mode" - if this is set to true on
the server, dirs will not be created and they will be created on the
client (both client and server should have the same value (true or
false).

OR

2) Add a new API method in the thrift API which takes an extra Boolean
arg whether or not to create dirs. The HiveMetaStoreClient code will use
this new api with a "false" argument value and create the dir on the
client side. The issue with this is that existing Thrift client would be
calling the current API method which would create dirs as the thrift
server users. So depending on whether you are creating the table using
thrift (with old method) or CLI you get different results. The old
method could be deprecated and the thrift clients can migrate to the new
one.

 

Thoughts?

 

(This directory creation/deletion is relevant to create table/drop
table/add partition/alter table/alter partition I think)

 

Pradeep 

 

-----Original Message-----
From: Paul Yang [mailto:pyang@facebook.com] 
Sent: Monday, July 19, 2010 10:53 AM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

 

That approach would work for the CLI, but then the semantics for the
create table/create partition calls for thrift clients would be
different - it would no longer create the table directory. This might be
a problem if there are scripts that rely on this property for
copying/moving files. Also, table renaming code would need to be
modified as well.

 

-----Original Message-----

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 

Sent: Monday, July 19, 2010 10:24 AM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

I was thinking about this a little more and was wondering if the
following alternative approach is feasible:

Instead of the Metastore code creating the directories why not have
HiveMetastoreClient create it in createTable() after the table is
created - i.e. it can do a getTable().getSd().getLocation() and perform
wh.mkdirs() on that path. We could do the same thing with
addPartition().

 

This way, we can have the metastore thrift server running as a
non-hdfs-superuser. Also, we no longer need to keep track or user/group
information since the client already is running with the right
user/group credentials.

 

Thoughts?

 

Pradeep

 

-----Original Message-----

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 

Sent: Thursday, July 15, 2010 10:23 AM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

Currently group information is not present in the Table and both owner
and group information are absent from Database. If these are added to
these classes, we could change Warehouse.mkdirs(). This method is also
called form addPartition(), should we just use the table's owner/group
in this case? - could potentially fail in non thrift case if some other
user is creating the partitions OR we would need to add owner/group to
Partition as well with the implication that table and partition owner's
could differ causing query failures.

 

Paul's concern about security is valid but is there any other way around
this?

 

Pradeep

 

-----Original Message-----

From: Paul Yang [mailto:pyang@facebook.com] 

Sent: Wednesday, July 14, 2010 3:18 PM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

Yeah, you could overload Warehouse.mkdirs() to allow specification of an
owner/group and then use Filesystem.setOwner() within the method.

 

If the thrift server has full permissions for DFS though, wouldn't this
present a security hole? 

 

-----Original Message-----

From: Ashish Thusoo [mailto:athusoo@facebook.com] 

Sent: Wednesday, July 14, 2010 12:34 PM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

We could just fix this in Warehouse.java so that the mkdirs call make
the directories according to the owner field that is passed to the
table? That probably would be a simple fix for this, no?

 

Ashish

 

-----Original Message-----

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 

Sent: Wednesday, July 14, 2010 11:14 AM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

<name>dfs.permissions</name>

<value>true</value>

..

<name>dfs.permissions.supergroup</name>

<value>hdfs</value>

 

You mentioned: "I think the thrift server can use the dfs processor." -
were you suggesting the metastore implementation in HiveMetastore should
always do chown user:user on create_table_core() (or selectively look at
the conf and known it is being run as a thrift server and chown only in
that case)?

 

Pradeep

 

-----Original Message-----

From: Edward Capriolo [mailto:edlinuxguru@gmail.com]

Sent: Tuesday, July 13, 2010 4:52 PM

To: hive-user@hadoop.apache.org

Subject: Re: Thrift metastore server and dfs file owner

 

On Tue, Jul 13, 2010 at 6:20 PM, Pradeep Kamath <pr...@yahoo-inc.com>
wrote:

> I tried:

> hive -e "set user.name=$USER;create table foo2 ( name string);"

> 

> My warehouse table dir still got created by "root" (the user my thrift


> server is running as) drwxr-xr-x   - root supergroup          0 

> 2010-07-13 15:19 /user/pradeepk/hive/warehouse/foo2

> 

> -----Original Message-----

> From: Edward Capriolo [mailto:edlinuxguru@gmail.com]

> Sent: Tuesday, July 13, 2010 2:47 PM

> To: hive-user@hadoop.apache.org

> Subject: Re: Thrift metastore server and dfs file owner

> 

> On Tue, Jul 13, 2010 at 5:04 PM, Pradeep Kamath
<pr...@yahoo-inc.com> wrote:

>> Hi,

>> 

>>    I suspect this is true but wanted to confirm: If I start a thrift 

>> metastore service as user "joe" then all internal tables created will


>> have directories under the warehouse directory owned by "joe" 

>> regardless of the actual user running the create table statement - is


>> this correct? There is no way for the thrift server to create the
directory as the actual user?

>> However if thrift service is not used and the hive client directly 

>> works against the metastore database, then the directories are 

>> created by the actual user - is this correct?

>> 

>> 

>> 

>> Thanks,

>> 

>> Pradeep

> 

> The hive web interface does this:

> 

>    queries.add("set hadoop.job.ugi=" + auth.getUser() + ","

>        + auth.getGroups()[0]);

>    queries.add("set user.name=" + auth.getUser());

> 

> You should be able to accomplish the same thing using set commands 

> with the Thrift Server to impersonate.

> 

> Regards,

> Edward

> 

 

You are right. That technique may only affect files created during the
map/reduce job. I think the thrift server can use the dfs processor.

 

hive> dfs -chown user:user /user/hive/warehouse/foo2;

 

Questions:

Who is your hadoop superuser?

Are you enforcing dfs permissions?

 

If you are enforcing permissions only the hadoop superuser (hadoop) will
be able to chown files to other users and groups.


RE: Thrift metastore server and dfs file owner

Posted by Pradeep Kamath <pr...@yahoo-inc.com>.
I agree this will be an issue for direct thrift clients. How about the
following options:

 

1) Add a conf variable - "strict.owner.mode" - if this is set to true on
the server, dirs will not be created and they will be created on the
client (both client and server should have the same value (true or
false).

OR

2) Add a new API method in the thrift API which takes an extra Boolean
arg whether or not to create dirs. The HiveMetaStoreClient code will use
this new api with a "false" argument value and create the dir on the
client side. The issue with this is that existing Thrift client would be
calling the current API method which would create dirs as the thrift
server users. So depending on whether you are creating the table using
thrift (with old method) or CLI you get different results. The old
method could be deprecated and the thrift clients can migrate to the new
one.

 

Thoughts?

 

(This directory creation/deletion is relevant to create table/drop
table/add partition/alter table/alter partition I think)

 

Pradeep 

 

-----Original Message-----
From: Paul Yang [mailto:pyang@facebook.com] 
Sent: Monday, July 19, 2010 10:53 AM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

 

That approach would work for the CLI, but then the semantics for the
create table/create partition calls for thrift clients would be
different - it would no longer create the table directory. This might be
a problem if there are scripts that rely on this property for
copying/moving files. Also, table renaming code would need to be
modified as well.

 

-----Original Message-----

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 

Sent: Monday, July 19, 2010 10:24 AM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

I was thinking about this a little more and was wondering if the
following alternative approach is feasible:

Instead of the Metastore code creating the directories why not have
HiveMetastoreClient create it in createTable() after the table is
created - i.e. it can do a getTable().getSd().getLocation() and perform
wh.mkdirs() on that path. We could do the same thing with
addPartition().

 

This way, we can have the metastore thrift server running as a
non-hdfs-superuser. Also, we no longer need to keep track or user/group
information since the client already is running with the right
user/group credentials.

 

Thoughts?

 

Pradeep

 

-----Original Message-----

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 

Sent: Thursday, July 15, 2010 10:23 AM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

Currently group information is not present in the Table and both owner
and group information are absent from Database. If these are added to
these classes, we could change Warehouse.mkdirs(). This method is also
called form addPartition(), should we just use the table's owner/group
in this case? - could potentially fail in non thrift case if some other
user is creating the partitions OR we would need to add owner/group to
Partition as well with the implication that table and partition owner's
could differ causing query failures.

 

Paul's concern about security is valid but is there any other way around
this?

 

Pradeep

 

-----Original Message-----

From: Paul Yang [mailto:pyang@facebook.com] 

Sent: Wednesday, July 14, 2010 3:18 PM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

Yeah, you could overload Warehouse.mkdirs() to allow specification of an
owner/group and then use Filesystem.setOwner() within the method.

 

If the thrift server has full permissions for DFS though, wouldn't this
present a security hole? 

 

-----Original Message-----

From: Ashish Thusoo [mailto:athusoo@facebook.com] 

Sent: Wednesday, July 14, 2010 12:34 PM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

We could just fix this in Warehouse.java so that the mkdirs call make
the directories according to the owner field that is passed to the
table? That probably would be a simple fix for this, no?

 

Ashish

 

-----Original Message-----

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 

Sent: Wednesday, July 14, 2010 11:14 AM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

<name>dfs.permissions</name>

<value>true</value>

..

<name>dfs.permissions.supergroup</name>

<value>hdfs</value>

 

You mentioned: "I think the thrift server can use the dfs processor." -
were you suggesting the metastore implementation in HiveMetastore should
always do chown user:user on create_table_core() (or selectively look at
the conf and known it is being run as a thrift server and chown only in
that case)?

 

Pradeep

 

-----Original Message-----

From: Edward Capriolo [mailto:edlinuxguru@gmail.com]

Sent: Tuesday, July 13, 2010 4:52 PM

To: hive-user@hadoop.apache.org

Subject: Re: Thrift metastore server and dfs file owner

 

On Tue, Jul 13, 2010 at 6:20 PM, Pradeep Kamath <pr...@yahoo-inc.com>
wrote:

> I tried:

> hive -e "set user.name=$USER;create table foo2 ( name string);"

> 

> My warehouse table dir still got created by "root" (the user my thrift


> server is running as) drwxr-xr-x   - root supergroup          0 

> 2010-07-13 15:19 /user/pradeepk/hive/warehouse/foo2

> 

> -----Original Message-----

> From: Edward Capriolo [mailto:edlinuxguru@gmail.com]

> Sent: Tuesday, July 13, 2010 2:47 PM

> To: hive-user@hadoop.apache.org

> Subject: Re: Thrift metastore server and dfs file owner

> 

> On Tue, Jul 13, 2010 at 5:04 PM, Pradeep Kamath
<pr...@yahoo-inc.com> wrote:

>> Hi,

>> 

>>    I suspect this is true but wanted to confirm: If I start a thrift 

>> metastore service as user "joe" then all internal tables created will


>> have directories under the warehouse directory owned by "joe" 

>> regardless of the actual user running the create table statement - is


>> this correct? There is no way for the thrift server to create the
directory as the actual user?

>> However if thrift service is not used and the hive client directly 

>> works against the metastore database, then the directories are 

>> created by the actual user - is this correct?

>> 

>> 

>> 

>> Thanks,

>> 

>> Pradeep

> 

> The hive web interface does this:

> 

>    queries.add("set hadoop.job.ugi=" + auth.getUser() + ","

>        + auth.getGroups()[0]);

>    queries.add("set user.name=" + auth.getUser());

> 

> You should be able to accomplish the same thing using set commands 

> with the Thrift Server to impersonate.

> 

> Regards,

> Edward

> 

 

You are right. That technique may only affect files created during the
map/reduce job. I think the thrift server can use the dfs processor.

 

hive> dfs -chown user:user /user/hive/warehouse/foo2;

 

Questions:

Who is your hadoop superuser?

Are you enforcing dfs permissions?

 

If you are enforcing permissions only the hadoop superuser (hadoop) will
be able to chown files to other users and groups.


RE: Thrift metastore server and dfs file owner

Posted by Paul Yang <py...@facebook.com>.
That approach would work for the CLI, but then the semantics for the create table/create partition calls for thrift clients would be different - it would no longer create the table directory. This might be a problem if there are scripts that rely on this property for copying/moving files. Also, table renaming code would need to be modified as well.

-----Original Message-----
From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 
Sent: Monday, July 19, 2010 10:24 AM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

I was thinking about this a little more and was wondering if the following alternative approach is feasible:
Instead of the Metastore code creating the directories why not have HiveMetastoreClient create it in createTable() after the table is created - i.e. it can do a getTable().getSd().getLocation() and perform wh.mkdirs() on that path. We could do the same thing with addPartition().

This way, we can have the metastore thrift server running as a non-hdfs-superuser. Also, we no longer need to keep track or user/group information since the client already is running with the right user/group credentials.

Thoughts?

Pradeep

-----Original Message-----
From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 
Sent: Thursday, July 15, 2010 10:23 AM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

Currently group information is not present in the Table and both owner and group information are absent from Database. If these are added to these classes, we could change Warehouse.mkdirs(). This method is also called form addPartition(), should we just use the table's owner/group in this case? - could potentially fail in non thrift case if some other user is creating the partitions OR we would need to add owner/group to Partition as well with the implication that table and partition owner's could differ causing query failures.

Paul's concern about security is valid but is there any other way around this?

Pradeep

-----Original Message-----
From: Paul Yang [mailto:pyang@facebook.com] 
Sent: Wednesday, July 14, 2010 3:18 PM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

Yeah, you could overload Warehouse.mkdirs() to allow specification of an owner/group and then use Filesystem.setOwner() within the method.

If the thrift server has full permissions for DFS though, wouldn't this present a security hole? 

-----Original Message-----
From: Ashish Thusoo [mailto:athusoo@facebook.com] 
Sent: Wednesday, July 14, 2010 12:34 PM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

We could just fix this in Warehouse.java so that the mkdirs call make the directories according to the owner field that is passed to the table? That probably would be a simple fix for this, no?

Ashish

-----Original Message-----
From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 
Sent: Wednesday, July 14, 2010 11:14 AM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

<name>dfs.permissions</name>
<value>true</value>
..
<name>dfs.permissions.supergroup</name>
<value>hdfs</value>

You mentioned: "I think the thrift server can use the dfs processor." - were you suggesting the metastore implementation in HiveMetastore should always do chown user:user on create_table_core() (or selectively look at the conf and known it is being run as a thrift server and chown only in that case)?

Pradeep
 
-----Original Message-----
From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
Sent: Tuesday, July 13, 2010 4:52 PM
To: hive-user@hadoop.apache.org
Subject: Re: Thrift metastore server and dfs file owner

On Tue, Jul 13, 2010 at 6:20 PM, Pradeep Kamath <pr...@yahoo-inc.com> wrote:
> I tried:
> hive -e "set user.name=$USER;create table foo2 ( name string);"
>
> My warehouse table dir still got created by "root" (the user my thrift 
> server is running as) drwxr-xr-x   - root supergroup          0 
> 2010-07-13 15:19 /user/pradeepk/hive/warehouse/foo2
>
> -----Original Message-----
> From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
> Sent: Tuesday, July 13, 2010 2:47 PM
> To: hive-user@hadoop.apache.org
> Subject: Re: Thrift metastore server and dfs file owner
>
> On Tue, Jul 13, 2010 at 5:04 PM, Pradeep Kamath <pr...@yahoo-inc.com> wrote:
>> Hi,
>>
>>    I suspect this is true but wanted to confirm: If I start a thrift 
>> metastore service as user "joe" then all internal tables created will 
>> have directories under the warehouse directory owned by "joe" 
>> regardless of the actual user running the create table statement - is 
>> this correct? There is no way for the thrift server to create the directory as the actual user?
>> However if thrift service is not used and the hive client directly 
>> works against the metastore database, then the directories are 
>> created by the actual user - is this correct?
>>
>>
>>
>> Thanks,
>>
>> Pradeep
>
> The hive web interface does this:
>
>    queries.add("set hadoop.job.ugi=" + auth.getUser() + ","
>        + auth.getGroups()[0]);
>    queries.add("set user.name=" + auth.getUser());
>
> You should be able to accomplish the same thing using set commands 
> with the Thrift Server to impersonate.
>
> Regards,
> Edward
>

You are right. That technique may only affect files created during the map/reduce job. I think the thrift server can use the dfs processor.

hive> dfs -chown user:user /user/hive/warehouse/foo2;

Questions:
Who is your hadoop superuser?
Are you enforcing dfs permissions?

If you are enforcing permissions only the hadoop superuser (hadoop) will be able to chown files to other users and groups.

RE: Thrift metastore server and dfs file owner

Posted by Pradeep Kamath <pr...@yahoo-inc.com>.
I was thinking about this a little more and was wondering if the following alternative approach is feasible:
Instead of the Metastore code creating the directories why not have HiveMetastoreClient create it in createTable() after the table is created - i.e. it can do a getTable().getSd().getLocation() and perform wh.mkdirs() on that path. We could do the same thing with addPartition().

This way, we can have the metastore thrift server running as a non-hdfs-superuser. Also, we no longer need to keep track or user/group information since the client already is running with the right user/group credentials.

Thoughts?

Pradeep

-----Original Message-----
From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 
Sent: Thursday, July 15, 2010 10:23 AM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

Currently group information is not present in the Table and both owner and group information are absent from Database. If these are added to these classes, we could change Warehouse.mkdirs(). This method is also called form addPartition(), should we just use the table's owner/group in this case? - could potentially fail in non thrift case if some other user is creating the partitions OR we would need to add owner/group to Partition as well with the implication that table and partition owner's could differ causing query failures.

Paul's concern about security is valid but is there any other way around this?

Pradeep

-----Original Message-----
From: Paul Yang [mailto:pyang@facebook.com] 
Sent: Wednesday, July 14, 2010 3:18 PM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

Yeah, you could overload Warehouse.mkdirs() to allow specification of an owner/group and then use Filesystem.setOwner() within the method.

If the thrift server has full permissions for DFS though, wouldn't this present a security hole? 

-----Original Message-----
From: Ashish Thusoo [mailto:athusoo@facebook.com] 
Sent: Wednesday, July 14, 2010 12:34 PM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

We could just fix this in Warehouse.java so that the mkdirs call make the directories according to the owner field that is passed to the table? That probably would be a simple fix for this, no?

Ashish

-----Original Message-----
From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 
Sent: Wednesday, July 14, 2010 11:14 AM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

<name>dfs.permissions</name>
<value>true</value>
..
<name>dfs.permissions.supergroup</name>
<value>hdfs</value>

You mentioned: "I think the thrift server can use the dfs processor." - were you suggesting the metastore implementation in HiveMetastore should always do chown user:user on create_table_core() (or selectively look at the conf and known it is being run as a thrift server and chown only in that case)?

Pradeep
 
-----Original Message-----
From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
Sent: Tuesday, July 13, 2010 4:52 PM
To: hive-user@hadoop.apache.org
Subject: Re: Thrift metastore server and dfs file owner

On Tue, Jul 13, 2010 at 6:20 PM, Pradeep Kamath <pr...@yahoo-inc.com> wrote:
> I tried:
> hive -e "set user.name=$USER;create table foo2 ( name string);"
>
> My warehouse table dir still got created by "root" (the user my thrift 
> server is running as) drwxr-xr-x   - root supergroup          0 
> 2010-07-13 15:19 /user/pradeepk/hive/warehouse/foo2
>
> -----Original Message-----
> From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
> Sent: Tuesday, July 13, 2010 2:47 PM
> To: hive-user@hadoop.apache.org
> Subject: Re: Thrift metastore server and dfs file owner
>
> On Tue, Jul 13, 2010 at 5:04 PM, Pradeep Kamath <pr...@yahoo-inc.com> wrote:
>> Hi,
>>
>>    I suspect this is true but wanted to confirm: If I start a thrift 
>> metastore service as user "joe" then all internal tables created will 
>> have directories under the warehouse directory owned by "joe" 
>> regardless of the actual user running the create table statement - is 
>> this correct? There is no way for the thrift server to create the directory as the actual user?
>> However if thrift service is not used and the hive client directly 
>> works against the metastore database, then the directories are 
>> created by the actual user - is this correct?
>>
>>
>>
>> Thanks,
>>
>> Pradeep
>
> The hive web interface does this:
>
>    queries.add("set hadoop.job.ugi=" + auth.getUser() + ","
>        + auth.getGroups()[0]);
>    queries.add("set user.name=" + auth.getUser());
>
> You should be able to accomplish the same thing using set commands 
> with the Thrift Server to impersonate.
>
> Regards,
> Edward
>

You are right. That technique may only affect files created during the map/reduce job. I think the thrift server can use the dfs processor.

hive> dfs -chown user:user /user/hive/warehouse/foo2;

Questions:
Who is your hadoop superuser?
Are you enforcing dfs permissions?

If you are enforcing permissions only the hadoop superuser (hadoop) will be able to chown files to other users and groups.

RE: Thrift metastore server and dfs file owner

Posted by Pradeep Kamath <pr...@yahoo-inc.com>.
Currently group information is not present in the Table and both owner and group information are absent from Database. If these are added to these classes, we could change Warehouse.mkdirs(). This method is also called form addPartition(), should we just use the table's owner/group in this case? - could potentially fail in non thrift case if some other user is creating the partitions OR we would need to add owner/group to Partition as well with the implication that table and partition owner's could differ causing query failures.

Paul's concern about security is valid but is there any other way around this?

Pradeep

RE: Thrift metastore server and dfs file owner

Posted by Paul Yang <py...@facebook.com>.
Yeah, you could overload Warehouse.mkdirs() to allow specification of an owner/group and then use FileSystem.setOwner() within the method.
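
Something like this, as a rough sketch - getFs() and the exact error handling are from memory, so treat the names as approximate:

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hive.metastore.api.MetaException;

    // Hypothetical overload: create the directory, then chown it to the
    // requesting user/group. FileSystem.setOwner() will only succeed if
    // the caller is the DFS superuser.
    public boolean mkdirs(Path f, String owner, String group) throws MetaException {
      try {
        FileSystem fs = getFs(f);
        boolean success = fs.mkdirs(f);
        if (success && owner != null) {
          fs.setOwner(f, owner, group);
        }
        return success;
      } catch (IOException e) {
        throw new MetaException("Unable to create directory " + f + ": " + e.getMessage());
      }
    }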

If the thrift server has full permissions for DFS though, wouldn't this present a security hole? 

RE: Thrift metastore server and dfs file owner

Posted by Ashish Thusoo <at...@facebook.com>.
We could just fix this in Warehouse.java so that the mkdirs call makes the directories according to the owner field that is passed with the table? That would probably be a simple fix for this, no?
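
Roughly like this at the create_table call site - a sketch only, assuming the mkdirs(Path, owner, group) overload from Paul's mail, with helper names from memory:

    // Pass the owner recorded on the Table so the new directory ends up
    // owned by the actual user rather than the user the server runs as.
    Path tblPath = wh.getDefaultTablePath(tbl.getDbName(), tbl.getTableName());
    if (!wh.mkdirs(tblPath, tbl.getOwner(), tbl.getOwner())) {
      throw new MetaException(tblPath + " is not a directory or unable to create one");
    }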

Ashish

RE: Thrift metastore server and dfs file owner

Posted by Pradeep Kamath <pr...@yahoo-inc.com>.
<name>dfs.permissions</name>
<value>true</value>
..
<name>dfs.permissions.supergroup</name>
<value>hdfs</value>

You mentioned: "I think the thrift server can use the dfs processor." - were you suggesting the metastore implementation in HiveMetaStore should always do chown user:user on create_table_core() (or selectively look at the conf, know it is being run as a thrift server, and chown only in that case)?
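
For the selective variant, something like the following inside create_table_core() - note the conf property name is invented for discussion, not an existing Hive setting:

    // Chown the new table directory to the requesting user only when this
    // metastore is configured to run as a shared (thrift) server.
    // "hive.metastore.chown.dirs" is a made-up name for illustration.
    if (hiveConf.getBoolean("hive.metastore.chown.dirs", false)) {
      FileSystem fs = wh.getFs(tblPath);
      // Only succeeds if the server runs as the DFS superuser.
      fs.setOwner(tblPath, tbl.getOwner(), tbl.getOwner());
    }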

Pradeep
 
Re: Thrift metastore server and dfs file owner

Posted by Edward Capriolo <ed...@gmail.com>.
On Tue, Jul 13, 2010 at 6:20 PM, Pradeep Kamath <pr...@yahoo-inc.com> wrote:
> I tried:
> hive -e "set user.name=$USER;create table foo2 ( name string);"
>
> My warehouse table dir still got created by "root" (the user my thrift server is running as)
> drwxr-xr-x   - root supergroup          0 2010-07-13 15:19 /user/pradeepk/hive/warehouse/foo2

You are right. That technique may only affect files created during the
map/reduce job. I think the thrift server can use the dfs processor.

hive> dfs -chown user:user /user/hive/warehouse/foo2;

Questions:
Who is your hadoop superuser?
Are you enforcing dfs permissions?

If you are enforcing permissions, only the hadoop superuser (hadoop)
will be able to chown files to other users and groups.

RE: Thrift metastore server and dfs file owner

Posted by Pradeep Kamath <pr...@yahoo-inc.com>.
I tried:
hive -e "set user.name=$USER;create table foo2 ( name string);"

My warehouse table dir still got created by "root" (the user my thrift server is running as)
drwxr-xr-x   - root supergroup          0 2010-07-13 15:19 /user/pradeepk/hive/warehouse/foo2

Re: Thrift metastore server and dfs file owner

Posted by Edward Capriolo <ed...@gmail.com>.
On Tue, Jul 13, 2010 at 5:04 PM, Pradeep Kamath <pr...@yahoo-inc.com> wrote:
> Hi,
>
>    I suspect this is true but wanted to confirm: If I start a thrift
> metastore service as user “joe” then all internal tables created will have
> directories under the warehouse directory owned by “joe” regardless of the
> actual user running the create table statement – is this correct? There is
> no way for the thrift server to create the directory as the actual user?
> However if thrift service is not used and the hive client directly works
> against the metastore database, then the directories are created by the
> actual user – is this correct?
>
>
>
> Thanks,
>
> Pradeep

The hive web interface does this:

    queries.add("set hadoop.job.ugi=" + auth.getUser() + ","
        + auth.getGroups()[0]);
    queries.add("set user.name=" + auth.getUser());

You should be able to accomplish the same thing using set commands
with the Thrift Server to impersonate.

Regards,
Edward