Posted to common-user@hadoop.apache.org by Arvind Sundararajan <ar...@gmail.com> on 2015/08/05 19:32:46 UTC

hadoop installation in pseudo distributed mode regular user vs dedicated user

Hi All,

I have a laptop running Ubuntu 14.04 LTS and am trying to install hadoop
2.7.1 (current stable version) in pseudo-distributed mode.

I have a regular user account on my laptop, but am confused if i should
install hadoop using a dedicated hadoop user on my laptop.
NOTE: By 'regular user', i mean the linux user account that i use for
day-to-day personal work

The current hadoop documentation at [1] does not mention setting up a
dedicated user for hadoop installation.

However, the hadoop installation tutorial at [2] mentions setting up a
dedicated user for hadoop installation in pseudo-distributed mode on a
single machine. This tutorial references an outdated hadoop installation
tutorial [3], which also mentions setting up a dedicated user for hadoop
installation in pseudo-distributed mode on a single machine.

I found several tutorials online which all seem to mention setting up
dedicated user for hadoop installation in pseudo-distributed mode on a
single machine, without mentioning why we should set up a dedicated user.

My questions are as follows:

a) Is it possible for me to execute hadoop programs as a regular user even
if hadoop is installed in pseudo-distributed mode via a dedicated 'hadoop'
user?
If yes, what linux filesystem folder permissions and HDFS permissions do i
need to give to the regular user for executing hadoop programs?

b) Quoting from the outdated hadoop installation tutorial [3]:

    "We will use a dedicated Hadoop user account for running Hadoop.
     While that's not required it is recommended because it helps to separate
     the Hadoop installation from other software applications and
     user accounts running on the same machine
     (think: security, permissions, backups, etc)."

Can someone elaborate on this? what are the issues regarding security,
permissions, backups when running hadoop in pseudo-distributed mode on a
single laptop which will most likely have only one user account (my current
user account) ?

c) Can someone please elaborate on the pros and cons of running hadoop in
pseudo-distributed mode on a single machine as the regular user versus
creating a dedicated user?

My thoughts on the cons, thus far, have been:

    i) if hadoop is unable to execute from a 'regular user' and
    only works from the dedicated hadoop user account, then i
    will have to edit my hadoop java programs from my
    'regular user' account where i have my development environment
    and IDE/text editor setup, copy the .jar files to the
    dedicated hadoop user account and execute. if any error occurs,
    i have to go back to the 'regular user' account, edit and
    then copy the new .jar files and execute again. this moving
    back and forth between accounts is a definite pain while
    working in pseudo-distributed mode and i have experienced
    this while working in Hadoop 1.x version

    ii) if hadoop is unable to execute from a 'regular user' and
    only works from the dedicated hadoop user account, then
    the hadoop operations copyFromLocal and copyToLocal will
    require a shared folder for both user accounts.

P.S. I also referred to [4] and [5] before asking this question.

References:

[1]
http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/SingleCluster.html
[2] http://dogdogfish.com/big-data/installing-hadoop-2-4-on-ubuntu-14-04/
[3]
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
[4]
http://stackoverflow.com/questions/20192140/hadoop-pseudo-distributed-mode-for-multiple-users
[5]
http://stackoverflow.com/questions/23807486/hadoop-development-dedicated-user-in-ubuntu-how-to-access-hadoop-node-running

Re: hadoop installation in pseudo distributed mode regular user vs dedicated user

Posted by Alexander Striffeler <a....@students.unibe.ch>.
Hi Arvind

I can't fully answer your questions on how to install Hadoop in
pseudo-distributed mode, but I can partly counter your cons: by
using sudo su <user> in your shell, you can easily switch users during a
session. Giving the hadoop user access to your directories should then
take two minutes at most...
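
To make that concrete, here is a rough sketch of what the setup might look
like. It is not taken from the Hadoop docs or from this thread: the account
name "hduser" and the directory "hadoop-jobs" are made-up placeholders, and
it assumes simple (non-Kerberos) authentication, the acl tools installed,
and that the dedicated user is the one who started the NameNode (which makes
it the HDFS superuser). By default the script only prints the commands so
you can review them first.

```shell
#!/bin/sh
# Sketch only. "hduser" and "hadoop-jobs" are hypothetical names,
# not taken from the thread; adjust them to your own setup.
HADOOP_USER="${HADOOP_USER:-hduser}"     # dedicated account running the daemons
DEV_USER="${DEV_USER:-$(id -un)}"        # your regular account
JAR_DIR="${JAR_DIR:-$HOME/hadoop-jobs}"  # where your build writes the .jar files
DRY_RUN="${DRY_RUN:-1}"                  # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Local filesystem: let the dedicated user read your jar directory.
grant_local_access() {
    run setfacl -R -m "u:$HADOOP_USER:rX" "$JAR_DIR"
}

# HDFS: create a home directory for the regular user and hand it over,
# so jobs can be submitted from the regular account.
grant_hdfs_home() {
    run sudo -u "$HADOOP_USER" hdfs dfs -mkdir -p "/user/$DEV_USER"
    run sudo -u "$HADOOP_USER" hdfs dfs -chown "$DEV_USER" "/user/$DEV_USER"
}

grant_local_access
grant_hdfs_home
```

Run it with DRY_RUN=0 once the printed commands look right. Two caveats:
the dedicated user also needs traverse (execute) permission on every parent
directory of the jar directory, and sudo may not find hdfs unless it is on
the PATH that sudo uses. For an interactive session as the dedicated user,
sudo su - hduser (or sudo -u hduser -i) gives you a login shell.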

Cheers,
alex


On 05.08.2015 19:32, Arvind Sundararajan wrote:
>
> Hi All,
>
> I have a laptop running Ubuntu 14.04 LTS and am trying to install 
> hadoop 2.7.1 (current stable version) in pseudo-distributed mode.
>
> I have a regular user account on my laptop, but am confused if i 
> should install hadoop using a dedicated hadoop user on my laptop.
> NOTE: By 'regular user', i mean the linux user account that i use for 
> day-to-day personal work
>
> The current hadoop documentation at [1] does not mention setting up a 
> dedicated user for hadoop installation.
>
> However, the hadoop installation tutorial at [2] mentions setting up a 
> dedicated user for hadoop installation in pseudo-distributed mode on a 
> single machine. This tutorial references an outdated hadoop 
> installation tutorial [3] which too mentions setting up a dedicated 
> user for hadoop installation in pseudo-distributed mode on a single 
> machine.
>
> I found several tutorials online which all seem to mention setting up 
> dedicated user for hadoop installation in pseudo-distributed mode on a 
> single machine, without mentioning why we should set up a dedicated user.
>
> My questions are as follows:
>
> a) Is it possible for me to execute hadoop programs as a regular user 
> even if hadoop is installed in pseudo-distributed mode via a dedicated 
> 'hadoop' user?
> If yes, what linux filesystem folder permissions and HDFS permissions 
> do i need to give to the regular user for executing hadoop programs?
>
> b) Quoting from the outdated hadoop installation tutorial [3]:
>
> |     "We will use a dedicated Hadoop user account for running Hadoop.
>       While that's not required it is recommended because it helps to separate
>       the Hadoop installation from other software applications and
>       user accounts running on the same machine
>       (think: security, permissions, backups, etc)."
> |
>
> Can someone elaborate on this? what are the issues regarding security, 
> permissions, backups when running hadoop in pseudo-distributed mode on 
> a single laptop which will most likely have only one user account (my 
> current user account) ?
>
> c) Can someone please elaborate on the pros and cons of running hadoop 
> in pseudo-distributed mode on a single machine as the regular user 
> versus creating a dedicated user?
>
> My thoughts on the cons, thus far has been:
>
> |     i) if hadoop is unable to execute from a 'regular user' and
>      only works from the dedicated hadoop user account, then i
>      will have to edit my hadoop java programs from my
>      'regular user' account where i have my development environment
>      and IDE/text editor setup, copy the .jar files to the
>      dedicated hadoop user account and execute. if any error occurs,
>      i have to go back to the 'regular user' account, edit and
>      then copy the new .jar files and execute again. this moving
>      back and forth between accounts is a definite pain while
>      working in pseudo-distributed mode and i have experienced
>      this while working in Hadoop 1.x version
>
>      ii) if hadoop is unable to execute from a 'regular user' and
>      only works from the dedicated hadoop user account, then
>      the hadoop operations copyFromLocal and copyToLocal will
>      require a shared folder for both user accounts.
> |
>
> P.S. I also referred [4] and [5] before asking this question.
>
> References:
>
> [1] 
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/SingleCluster.html
> [2] http://dogdogfish.com/big-data/installing-hadoop-2-4-on-ubuntu-14-04/
> [3] 
> http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
> [4] 
> http://stackoverflow.com/questions/20192140/hadoop-pseudo-distributed-mode-for-multiple-users
> [5] 
> http://stackoverflow.com/questions/23807486/hadoop-development-dedicated-user-in-ubuntu-how-to-access-hadoop-node-running
>

