Posted to common-user@hadoop.apache.org by "Kumar, Amit H." <AH...@odu.edu> on 2011/02/09 20:12:56 UTC

Hadoop Multi user - Cluster Setup

Dear All,

I am trying to set up Hadoop for multiple users in a class on our cluster. For some reason I don't seem to get it right: if only one user is running, it works great.
I would like all of the users to submit their Hadoop jobs to the existing DataNodes on the cluster, but I am not sure if this is the right approach.
Do I need to start a DataNode for every user? If so, I was not able to, because I ran into issues with the ports already being in use.
Please advise. Below are a few of the config files.

Also, I have tried searching other documents, which tell us to create a user "hadoop" and a group "hadoop" and then start the daemons as that user. This didn't work for me either. I am sure I am doing something wrong. Could anyone please throw in some more ideas?

=> List of environment variables changed in hadoop-env.sh:
export HADOOP_LOG_DIR=/scratch/$USER/hadoop-logs
export HADOOP_PID_DIR=/scratch/$USER/.var/hadoop/pids

# cat core-site.xml
<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://frontend:9000</value>
     </property>
     <property>
        <name>hadoop.tmp.dir</name>
        <value>/scratch/${user.name}/hadoop-FS</value>
        <description>A base for other temporary directories.</description>
     </property>
</configuration>

# cat hdfs-site.xml
<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
     <property>
         <name>dfs.name.dir</name>
         <value>/scratch/${user.name}/.hadoop/.transaction/.edits</value>
     </property>
</configuration>

# cat mapred-site.xml
<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>frontend:9001</value>
     </property>
     <property>
         <name>mapreduce.tasktracker.map.tasks.maximum</name>
         <value>2</value>
     </property>
     <property>
         <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
         <value>2</value>
     </property>
</configuration>


Thank you,
Amit



Re: Hadoop Multi user - Cluster Setup

Posted by Harsh J <qw...@gmail.com>.
Please read the HDFS Permissions Guide, which explains what you need to
know to run a working permissions model on the DFS:
http://hadoop.apache.org/hdfs/docs/current/hdfs_permissions_guide.html
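
With permissions left enabled, one common pattern is for the HDFS
superuser (the account that started the NameNode) to create a home
directory per student and hand ownership over. A minimal sketch, where
"alice" is a placeholder for a student's UNIX account name:

# Run these as the user that started the NameNode (the HDFS superuser).
# "alice" is a placeholder for a student's account name.
hadoop fs -mkdir /user/alice
hadoop fs -chown alice:alice /user/alice
hadoop fs -chmod 700 /user/alice

Jobs from every user can then share the same NameNode, DataNodes and
JobTracker, with each user confined to their own /user/<name> tree.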

On Thu, Feb 10, 2011 at 11:15 PM, Kumar, Amit H. <AH...@odu.edu> wrote:
> [...]



-- 
Harsh J
www.harshj.com

RE: Hadoop Multi user - Cluster Setup

Posted by "Kumar, Amit H." <AH...@odu.edu>.
Li Ping: Disabling dfs.permissions did the trick!

I have the following questions, if you can help me understand this better:
1. I am not sure what the consequences are of disabling it, or even of doing chmod o+w on the entire filesystem (/).
2. Is there any need to have the permissions in place, other than securing users from each other's work?
3. Is it still possible to have HDFS permissions enabled and yet be able to have multiple users submitting jobs to a common pool of resources?

Thank you so much for your help!
Amit


> -----Original Message-----
> From: li ping [mailto:li.j2ee@gmail.com]
> Sent: Wednesday, February 09, 2011 9:00 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Hadoop Multi user - Cluster Setup
>
> [...]


Re: Hadoop Multi user - Cluster Setup

Posted by li ping <li...@gmail.com>.
You can check this property in hdfs-site.xml:

<property>
  <name>dfs.permissions</name>
  <value>true</value>
  <description>
    If "true", enable permission checking in HDFS.
    If "false", permission checking is turned off,
    but all other behavior is unchanged.
    Switching from one parameter value to the other does not change
    the mode, owner or group of files or directories.
  </description>
</property>

You can disable this option by setting the value to false.
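
For instance, a minimal sketch of the override in hdfs-site.xml (the
NameNode reads this flag at startup, so it would need a restart to pick
up the change):

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>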

The second way is to run this command against HDFS:

hadoop fs -chmod o+w /

It has much the same effect as the first one.

On Thu, Feb 10, 2011 at 3:12 AM, Kumar, Amit H. <AH...@odu.edu> wrote:

> [...]


-- 
-----李平

Re: Hadoop Multi user - Cluster Setup

Posted by Piyush Joshi <sr...@gmail.com>.
Hey Amit, please try HOD, the Hadoop On Demand tool. It should suffice for
your need to support multiple users on your cluster.

-Piyush
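
For reference, a typical HOD session looks roughly like the sketch below
(based on the HOD user guide; the cluster directory, node count, and
examples jar name are placeholders):

# Allocate a private Map/Reduce cluster on 5 nodes through the batch system
hod allocate -d ~/hod-clusters/test -n 5

# Run jobs against it using the client configuration HOD generated
hadoop --config ~/hod-clusters/test jar hadoop-examples.jar wordcount input output

# Give the nodes back when done
hod deallocate -d ~/hod-clusters/test

Each user then gets a transient Map/Reduce cluster of their own, which
avoids the port clashes you hit when starting per-user daemons on fixed
ports.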

On Thu, Feb 10, 2011 at 12:42 AM, Kumar, Amit H. <AH...@odu.edu> wrote:

> [...]