Posted to yarn-dev@hadoop.apache.org by wang <ww...@126.com> on 2013/02/09 10:54:40 UTC

in security mode, one MR job visit two user's data

Hi,

In security mode, is it possible for one MR job to access two users' data in
HDFS? That is, there are two maps in one job: one map reads user1's data and
the other reads user2's data. As far as I know, the JobClient gets the
delegation token for the MR tasks before it submits the job, but in the
Credentials class the token map can hold only one token for each type of
service. If I get user2's token and add it to the credentials, user1's token
will be overwritten.
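
The overwrite described above can be illustrated with a minimal sketch. It is a simplified model only, not Hadoop's actual Credentials class: the names TokenMapSketch and Token and the NameNode address are hypothetical, and the assumption is that tokens are stored in a map keyed by service name.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in for a credentials store; NOT Hadoop's real Credentials class. */
public class TokenMapSketch {
    /** Hypothetical token that only records its owner and the service it is for. */
    public static final class Token {
        public final String owner;
        public final String service;

        public Token(String owner, String service) {
            this.owner = owner;
            this.service = service;
        }
    }

    // Assumption: one entry per service name.
    private final Map<String, Token> tokenMap = new HashMap<>();

    /** Stores a token under its service name, replacing any entry already there. */
    public void addToken(Token token) {
        tokenMap.put(token.service, token);
    }

    public Token getToken(String service) {
        return tokenMap.get(service);
    }

    public int size() {
        return tokenMap.size();
    }

    public static void main(String[] args) {
        TokenMapSketch creds = new TokenMapSketch();
        String nameNode = "namenode.example.com:8020"; // hypothetical service address
        creds.addToken(new Token("user1", nameNode));
        creds.addToken(new Token("user2", nameNode)); // same key, so user1's entry is replaced
        // prints "1 token(s), owner=user2"
        System.out.println(creds.size() + " token(s), owner=" + creds.getToken(nameNode).owner);
    }
}
```

With a map keyed this way, a second token for the same service can only be kept by storing it under a different key, and the code that looks tokens up would then need to know which key to use.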

Has anyone met the same situation, or can someone give some suggestions? The
background is Hive, where one SQL statement may access different users' data.
Thanks.

 

Regards

wwli


Re: Re: in security mode, one MR job visit two user's data

Posted by wang <ww...@126.com>.
I am very happy to get the response. Thank you.

Giving read permission to a group is also not OK, because it means other users
can easily read the data with DFSClient. So my thinking is that one user's data
in HDFS should only be 700.

Coming to what you said about ACLs in HDFS: my understanding is that the
permission model in HDFS is too simple, since a file has only one owner. So if
I set the permission to 700, how can I give another user read permission?

Let me give more background:
We want to implement Hive security. We think the Hive user should be
propagated to HDFS and MR, but currently everything just uses the HiveServer's
principal.
We think a user's table data in HDFS should only be 700; otherwise, other
users can directly use the HDFS API to get the data easily.

In Hive, one SQL statement accessing multiple users' data should be allowed;
in an RDBMS like Oracle, this requirement is a basic function.
So on the HiveServer side, we will check whether the user has permission to
access other users' tables. Once that check passes, one SQL statement may
access multiple users' table data in HDFS through an MR job.

According to what you said below, this requirement is difficult to meet.
I will think more. Thank you, and more suggestions are welcome.







Re: Re: in security mode, one MR job visit two user's data

Posted by Robert Evans <ev...@yahoo-inc.com>.
I think he is talking about using groups and read-only permissions.

Once the table is loaded into Hive you can make the files read-only by a
group that both users share.  The Hadoop code is really not set up to allow
a single job to pretend to be more than one user.  You might be able to
fake it, but because the assumption has always been one user there are
likely to be other problems that you run into, even if you get the tokens
to work.  I think the preferable alternative would be to work for true
ACLs in HDFS.  Then you can set up an ACL to give read-only access to the
table for the one user that needs it, and you don't have to set up a
special HDFS group for it.
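
The difference between the shared-group approach and a per-user ACL entry can be sketched as below. This is a simplified model under stated assumptions, not HDFS's real permission checker: it ignores ACL masks, and the class ReadAccessSketch, the method grantRead, and the group name hive_shared are all hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Simplified read-permission model; it ignores ACL masks and is NOT HDFS's real checker. */
public class ReadAccessSketch {
    private final String owner;
    private final String group;
    private final int mode; // octal permission bits such as 0700 or 0750
    private final Set<String> aclReaders = new HashSet<>(); // hypothetical named-user read entries

    public ReadAccessSketch(String owner, String group, int mode) {
        this.owner = owner;
        this.group = group;
        this.mode = mode;
    }

    /** Adds a named-user entry that grants read access to exactly one extra user. */
    public void grantRead(String user) {
        aclReaders.add(user);
    }

    /** Checks the owner first, then named-user entries, then the group bits, then the other bits. */
    public boolean canRead(String user, Set<String> userGroups) {
        if (user.equals(owner)) {
            return (mode & 0400) != 0;
        }
        if (aclReaders.contains(user)) {
            return true;
        }
        if (userGroups.contains(group)) {
            return (mode & 0040) != 0;
        }
        return (mode & 0004) != 0;
    }

    public static void main(String[] args) {
        Set<String> noGroups = Collections.emptySet();
        Set<String> shared = Collections.singleton("hive_shared"); // hypothetical group name

        // Mode 700: only the owner can read.
        ReadAccessSketch privateTable = new ReadAccessSketch("user1", "user1", 0700);
        System.out.println(privateTable.canRead("user2", noGroups)); // false

        // Mode 750 with a shared group: every member of the group can read.
        ReadAccessSketch groupTable = new ReadAccessSketch("user1", "hive_shared", 0750);
        System.out.println(groupTable.canRead("user2", shared)); // true

        // Mode 700 plus one named-user entry: only user2 gains read access.
        ReadAccessSketch aclTable = new ReadAccessSketch("user1", "user1", 0700);
        aclTable.grantRead("user2");
        System.out.println(aclTable.canRead("user2", noGroups)); // true
        System.out.println(aclTable.canRead("user3", noGroups)); // false
    }
}
```

In this model the table can stay at 700 while one named user still gets read access, which is the case a shared group cannot express without a dedicated group per grant.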

--Bobby



Re: in security mode, one MR job visit two user's data

Posted by wang <ww...@126.com>.
Thank you for your response.
In Hive, a user can directly execute the load path command. If the directory
is accessible by two users, then one user can directly load another user's
data into his own table. Also, a user can execute dfs commands directly
through HiveServer. So a user's data in HDFS had better be 700.

Is it possible for me to customize the TokenSelector? What I want is that the
job client gets all the users' delegation tokens, and each map task can then
choose the correct user's token according to the path it accesses.
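
A minimal sketch of the path-based selection step described above, for illustration only. It does not implement Hadoop's TokenSelector interface and does not show how a task would actually switch identities; the class PathTokenSelectorSketch, its methods, and the directory layout are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical path-based token lookup; NOT an implementation of Hadoop's TokenSelector. */
public class PathTokenSelectorSketch {
    // Maps an HDFS directory prefix (no trailing slash) to the user whose token covers it.
    private final Map<String, String> tokenOwnerByPrefix = new LinkedHashMap<>();

    public void register(String pathPrefix, String tokenOwner) {
        tokenOwnerByPrefix.put(pathPrefix, tokenOwner);
    }

    /** Returns the owner registered for the longest matching prefix, or null if none matches. */
    public String selectOwner(String path) {
        String best = null;
        int bestLength = -1;
        for (Map.Entry<String, String> entry : tokenOwnerByPrefix.entrySet()) {
            String prefix = entry.getKey();
            // Match whole path components so that /user/user1 does not cover /user/user10.
            boolean matches = path.equals(prefix) || path.startsWith(prefix + "/");
            if (matches && prefix.length() > bestLength) {
                best = entry.getValue();
                bestLength = prefix.length();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        PathTokenSelectorSketch selector = new PathTokenSelectorSketch();
        selector.register("/user/user1", "user1"); // hypothetical directory layout
        selector.register("/user/user2", "user2");
        System.out.println(selector.selectOwner("/user/user1/warehouse/t1/part-00000")); // user1
        System.out.println(selector.selectOwner("/user/user2/warehouse/t2/part-00000")); // user2
    }
}
```

The lookup itself is simple; the open question is whether the rest of the job, which assumes a single user, would accept a different token per path.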

I am not sure whether I can achieve this or how much effort it would require.
I am still thinking about this and welcome your guidance.




Re: in security mode, one MR job visit two user's data

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
How about leveraging filesystem permissions so the user has access to both directories?
