Posted to common-user@hadoop.apache.org by Harsh J <ha...@cloudera.com> on 2012/09/08 21:33:41 UTC

Re: group assignment on HDFS from Hadoop and Hive

Hi Chen,

Inline.

On Mon, Aug 13, 2012 at 9:51 PM, Chen Song <ch...@gmail.com> wrote:
> I am wondering how Hadoop assigns groups when dirs/files are created by
> a user; below are some tests I have done. In my cluster, group hadoop is
> configured as the supergroup.

A good article on how group resolution generally works in Hadoop
(with default settings) can be found here:
http://www.cloudera.com/blog/2012/03/authorization-and-authentication-in-hadoop/

>> hadoop fs -ls /tmp
> drwxrwxrwx   - abc hadoop          0 2012-08-10 23:02 /tmp/abc
> drwxrwxrwx   - def other_group          0 2012-08-10 23:02 /tmp/def
>
>> groups apache
> apache: apache wheel
>
>> sudo -u apache hadoop fs -put somefile /tmp/abc
>> hadoop fs -ls /tmp/abc
> -rw-rw-r--   3 apache hadoop     120962 2012-08-13 16:03 /tmp/abc/somefile
>
>> sudo -u apache hadoop fs -put somefile /tmp/def
>> hadoop fs -lsr /tmp/def
> -rw-rw-r--   3 apache other_group     120962 2012-08-13 16:03 /tmp/def/somefile
>
> Based on the experiments above, it looks like a file pushed to HDFS
> always inherits its group from the parent (enclosing) folder. Is that always
> the case?

Yes, HDFS follows BSD-style semantics here: a new file or directory
inherits its group from the parent directory unless a group is set
explicitly. Groups are resolved properly (by default) if the user and
their groups both exist on the NameNode. The above article explains this.
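
As a rough way to see the inheritance on a test cluster, something like
the sketch below could be used. The function name, the default path
/tmp/grouptest and the default group other_group are placeholders I made
up for illustration; the chgrp step only succeeds if you own the
directory and belong to the target group, or are the superuser.

```shell
# Sketch only: function name, default path and default group are hypothetical.
check_group_inheritance() {
  dir="${1:-/tmp/grouptest}"    # scratch directory; its parent must already exist
  group="${2:-other_group}"     # a group you are permitted to chgrp to

  # Create the directory and give it a group different from its parent's.
  hadoop fs -mkdir "$dir" &&
  hadoop fs -chgrp "$group" "$dir" &&
  # Create an empty file inside it without setting any group explicitly.
  hadoop fs -touchz "$dir/child" &&
  # List the directory so the group column of the new file can be inspected.
  hadoop fs -ls "$dir"
}
```

Running check_group_inheritance with no arguments and reading the group
column of the listing should show whether the child picked up the
directory's group.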

> A follow-up question on one finding in Hive: when executing a query to
> overwrite a table (or a partition within a table), the newly written
> directory always ends up belonging to HDFS's supergroup, no matter
> what context it is running from:
> 1. The user who is executing the Hive query
> 2. The group the user belongs to
> 3. The group the parent table directory belongs to
> Is this always expected in Hive?
>
> For example, table A is stored on /path/A and is partitioned on column dh.
> /path/A has group other_group.
> After running insert overwrite A partition (dh = "12") select column list
> from ... where ...
>
> /path/A/12 always ends up with group hadoop. This contradicts the
> assumption of inheritance I drew above. Any thoughts would be
> appreciated.

Can you post a short set of commands that reproduces this, for example
using hadoop fs -xyz commands? That way we can determine whether Hive
is the cause here.
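
To give an idea of the kind of reproduction I mean, here is a minimal
sketch. All paths, the group name and the function name are
hypothetical. It compares a directory created directly under a parent
with one created elsewhere and then moved under it. Whether Hive
actually writes to a scratch location and moves the result into place
is an assumption to verify, not something established in this thread.

```shell
# Sketch only: every path and name below is a hypothetical placeholder.
repro_move_into_parent() {
  parent="${1:-/tmp/grouptest_A}"        # stands in for the table directory
  staging="${2:-/tmp/grouptest_stage}"   # stands in for a scratch directory
  group="${3:-other_group}"              # group assigned to the parent

  hadoop fs -mkdir "$parent" &&
  hadoop fs -chgrp "$group" "$parent" &&
  # Case 1: a directory created directly under the parent.
  hadoop fs -mkdir "$parent/direct" &&
  # Case 2: a directory created elsewhere, then moved under the parent.
  hadoop fs -mkdir "$staging" &&
  hadoop fs -mv "$staging" "$parent/moved" &&
  # Compare the group column of the two entries in this listing.
  hadoop fs -ls "$parent"
}
```

If the two entries show different groups with plain hadoop fs commands,
the behaviour is coming from HDFS rather than from Hive.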

> Thanks
> Chen
>
>
>
>



-- 
Harsh J