Posted to user@hadoop.apache.org by sindhu hosamane <si...@gmail.com> on 2014/08/12 16:25:38 UTC

Pseudo-distributed mode

Can setting up 2 datanodes on the same machine be considered
pseudo-distributed mode Hadoop?

Thanks,
Sindhu

Re: Pseudo-distributed mode

Posted by Sergey Murylev <se...@gmail.com>.

You may be misreading that phrase. There are 3 ways to configure
Hadoop:

 1. local mode - a fresh Hadoop install has empty configs and no
    daemon processes. In this case the local file system is used
    instead of HDFS, and map-reduce jobs run in the same process as
    the Hadoop client.
 2. pseudo-distributed mode
    <http://hadoop.apache.org/docs/r1.2.1/single_node_setup.html> - you
    have a simple configuration with namenode, secondarynamenode,
    datanode, jobtracker and tasktracker daemons. In this case
    map-reduce tasks run in separate child processes of the
    tasktracker.
 3. distributed mode
    <http://hadoop.apache.org/docs/r1.2.1/cluster_setup.html> - you have
    multiple computers with a more complicated distribution of daemons.
    In the general case your map-reduce program can run on every node
    in the cluster.
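
For reference, here is a minimal sketch of what turns a local-mode
install into a pseudo-distributed one. It follows the r1.2.1 single-node
guide linked above as I remember it; the localhost ports (9000, 9001)
and the replication factor of 1 are that guide's example values, so
treat them as assumptions and check them against your own version.

```xml
<!-- conf/core-site.xml: point the default file system at a local HDFS
     namenode instead of the local file system (port is an assumption) -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml: a single datanode cannot hold more than one
     replica of a block, so replication is set to 1 -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml: submit jobs to a local jobtracker daemon
     instead of running them inside the client process
     (port is an assumption) -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```

With these three files left empty, Hadoop stays in local mode, which is
the "single Java process" case.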

> But if my hadoop is pseudo distributed mode , why does it still runs
> as a single Java process and utilizes only 1 cpu core even if there
> are many more ?
>
Having only one tasktracker doesn't mean that all jobs are processed on
a single CPU. Hadoop has an abstraction over threads and processes
called 'slots'. There are two types of slots, map and reduce, and each
slot type processes the corresponding kind of task. For each task (map
or reduce) the tasktracker can create a child process, but the total
number of child processes is limited. According to the Hadoop wiki
<http://wiki.apache.org/hadoop/LimitingTaskSlotUsage> you need to set
the properties 'mapred.tasktracker.map.tasks.maximum' and
'mapred.tasktracker.reduce.tasks.maximum' to appropriate values.
Actually, I think that in pseudo-distributed mode you often don't need
to set the number of slots manually, because Hadoop ships with default
values for both properties.
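
If you do want to set the slots yourself, a sketch of the relevant part
of conf/mapred-site.xml could look like the fragment below. The two
property names are the ones from the wiki page above; the values 4 and 2
are purely illustrative assumptions for a machine with several cores,
not recommendations. As far as I recall, the defaults shipped with
Hadoop 1.x are 2 for each, so check mapred-default.xml for your version.

```xml
<!-- conf/mapred-site.xml (fragment); values are hypothetical examples -->
<configuration>
  <property>
    <!-- maximum number of map tasks this tasktracker runs at once -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <!-- maximum number of reduce tasks this tasktracker runs at once -->
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```

The tasktracker reads these values at startup, so it has to be
restarted after the change.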

--
Thanks,
Sergey

On 12/08/14 18:36, sindhu hosamane wrote:

> I have read "By default, Hadoop is configured to run in a
> non-distributed mode, as a single Java process" .
>
> But if my hadoop is pseudo distributed mode , why does it still runs
> as a single Java process and utilizes only 1 cpu core even if there
> are many more ?
>
>
> On Tue, Aug 12, 2014 at 4:32 PM, Sergey Murylev
> <sergeymurylev@gmail.com <ma...@gmail.com>> wrote:
>
>     Yes :)
>
>     Pseudo-distributed mode is such configuration when we have some Hadoop
>     environment on single computer.
>
>     On 12/08/14 18:25, sindhu hosamane wrote:
>     > Can Setting up 2 datanodes on same machine  be considered as
>     > pseudo-distributed mode hadoop  ?
>     >
>     > Thanks,
>     > Sindhu
>
>
>

Re: Pseudo-distributed mode

Posted by sindhu hosamane <si...@gmail.com>.

I have read "By default, Hadoop is configured to run in a non-distributed
mode, as a single Java process".

But if my Hadoop is in pseudo-distributed mode, why does it still run as
a single Java process and use only 1 CPU core even if there are many
more?

On Tue, Aug 12, 2014 at 4:32 PM, Sergey Murylev <se...@gmail.com>
wrote:

> Yes :)
>
> Pseudo-distributed mode is such configuration when we have some Hadoop
> environment on single computer.
>
> On 12/08/14 18:25, sindhu hosamane wrote:
> > Can Setting up 2 datanodes on same machine  be considered as
> > pseudo-distributed mode hadoop  ?
> >
> > Thanks,
> > Sindhu
>
>
>

Re: Pseudo-distributed mode

Posted by Sergey Murylev <se...@gmail.com>.

Yes :)

Pseudo-distributed mode is a configuration in which a Hadoop
environment runs on a single computer.

On 12/08/14 18:25, sindhu hosamane wrote:
> Can Setting up 2 datanodes on same machine  be considered as
> pseudo-distributed mode hadoop  ?
>
> Thanks,
> Sindhu
