Posted to user@hadoop.apache.org by sindhu hosamane <si...@gmail.com> on 2014/08/12 16:25:38 UTC
Pseudo-distributed mode
Can setting up two DataNodes on the same machine be considered a
pseudo-distributed Hadoop setup?
Thanks,
Sindhu
Re: Pseudo-distributed mode
Posted by Sergey Murylev <se...@gmail.com>.
You may be misreading what this phrase means. There are three ways
to configure Hadoop:
1. local (standalone) mode - when you install Hadoop, the configuration
files are empty and no daemon processes run. Your local file system is
used instead of HDFS, and MapReduce jobs are processed in the same
process as the Hadoop client.
2. pseudo-distributed mode
<http://hadoop.apache.org/docs/r1.2.1/single_node_setup.html> - you
have a simple configuration with NameNode, SecondaryNameNode,
DataNode, JobTracker and TaskTracker daemons, all on one machine. In
this case MapReduce jobs are processed in separate child processes of
the TaskTracker.
3. distributed mode
<http://hadoop.apache.org/docs/r1.2.1/cluster_setup.html> - you have
multiple computers with a more complex distribution of daemons. In
general, your MapReduce program can run on every node in the cluster.
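As a rough illustration of the three modes, the Python sketch below guesses the mode from two Hadoop 1.x properties, fs.default.name (core-site.xml) and mapred.job.tracker (mapred-site.xml), plus the hosts that run worker daemons. The function name guess_mode and the "every daemon host is localhost" rule are assumptions made for illustration, not a check Hadoop itself performs; file:/// and local are the values an unconfigured Hadoop 1.x install uses.

```python
from urllib.parse import urlparse

# Hostnames treated as "this machine" (an assumption for the sketch).
LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def guess_mode(fs_default_name, job_tracker, worker_hosts):
    """Hypothetical helper: classify a Hadoop 1.x setup as local,
    pseudo-distributed or distributed."""
    fs = urlparse(fs_default_name)
    if fs.scheme in ("", "file") and job_tracker == "local":
        # No daemons: local file system, jobs run inside the client process.
        return "local"
    # Collect every host that runs a daemon: the NameNode host from the
    # filesystem URI, the JobTracker host, and the worker hosts.
    hosts = set(worker_hosts)
    if fs.hostname:
        hosts.add(fs.hostname)
    if job_tracker != "local":
        hosts.add(job_tracker.split(":")[0])
    if hosts and hosts <= LOCAL_HOSTS:
        # All daemons on one machine, each in its own JVM.
        return "pseudo-distributed"
    return "distributed"


if __name__ == "__main__":
    print(guess_mode("hdfs://localhost:9000", "localhost:9001", ["localhost"]))
```

Under this rule, two DataNodes that both run on localhost still count as pseudo-distributed, because only the number of machines matters, not the number of daemons.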
> But if my Hadoop is in pseudo-distributed mode, why does it still run
> as a single Java process and use only one CPU core even if there
> are many more?
>
Having only one TaskTracker doesn't mean that all jobs are processed on
a single CPU. Hadoop has an abstraction over threads and processes
called 'slots'. There are two types of slots, map and reduce, and each
slot type runs the corresponding kind of task. For each task (map or
reduce) the TaskTracker can create a child process, but the total number
of child processes is limited by the number of slots. According to the
Hadoop wiki <http://wiki.apache.org/hadoop/LimitingTaskSlotUsage>, you
need to set the properties 'mapred.tasktracker.map.tasks.maximum' and
'mapred.tasktracker.reduce.tasks.maximum' to appropriate values.
Actually, I think that if you have Hadoop in pseudo-distributed mode, you
don't need to set the number of slots manually; Hadoop is fairly clever
and should choose an appropriate number of slots automatically.
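To make the slot settings concrete, here is a Python sketch that renders them in the <configuration>/<property> layout used by mapred-site.xml. The rule of one map slot per core and half as many reduce slots (at least one) is a hypothetical starting point, not an official Hadoop formula. These are TaskTracker-side settings, so they belong in mapred-site.xml and take effect after the TaskTracker is restarted; in Hadoop 1.x's mapred-default.xml both default to 2.

```python
import os
import xml.etree.ElementTree as ET


def slot_properties(cores):
    """Hypothetical rule of thumb for slot counts on one TaskTracker."""
    return {
        "mapred.tasktracker.map.tasks.maximum": max(1, cores),
        "mapred.tasktracker.reduce.tasks.maximum": max(1, cores // 2),
    }


def to_mapred_site(props):
    """Render name/value pairs in the layout of Hadoop's *-site.xml files."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to a single core.
    cores = os.cpu_count() or 1
    print(to_mapred_site(slot_properties(cores)))
```

With these limits, one TaskTracker can run up to map slots + reduce slots child JVMs at once, which is how a single pseudo-distributed node can keep several cores busy.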
--
Thanks,
Sergey
On 12/08/14 18:36, sindhu hosamane wrote:
> I have read "By default, Hadoop is configured to run in a
> non-distributed mode, as a single Java process".
>
> But if my Hadoop is in pseudo-distributed mode, why does it still run
> as a single Java process and use only one CPU core even if there
> are many more?
>
>
> On Tue, Aug 12, 2014 at 4:32 PM, Sergey Murylev
> <sergeymurylev@gmail.com <ma...@gmail.com>> wrote:
>
> Yes :)
>
> Pseudo-distributed mode is a configuration in which a Hadoop
> environment runs on a single computer.
>
> On 12/08/14 18:25, sindhu hosamane wrote:
> > Can setting up two DataNodes on the same machine be considered a
> > pseudo-distributed Hadoop setup?
> >
> > Thanks,
> > Sindhu
>
>
>
Re: Pseudo-distributed mode
Posted by sindhu hosamane <si...@gmail.com>.
I have read "By default, Hadoop is configured to run in a non-distributed
mode, as a single Java process".
But if my Hadoop is in pseudo-distributed mode, why does it still run as a
single Java process and use only one CPU core even if there are many
more?
On Tue, Aug 12, 2014 at 4:32 PM, Sergey Murylev <se...@gmail.com>
wrote:
> Yes :)
>
> Pseudo-distributed mode is a configuration in which a Hadoop
> environment runs on a single computer.
>
> On 12/08/14 18:25, sindhu hosamane wrote:
> > Can setting up two DataNodes on the same machine be considered a
> > pseudo-distributed Hadoop setup?
> >
> > Thanks,
> > Sindhu
>
>
>
Re: Pseudo-distributed mode
Posted by Sergey Murylev <se...@gmail.com>.
Yes :)
Pseudo-distributed mode is a configuration in which a Hadoop
environment runs on a single computer.
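If you do run two DataNodes on one machine, each instance needs its own storage directory and its own ports, or the second one would likely fail to start. The Python sketch below computes per-instance overrides for the Hadoop 1.x properties dfs.data.dir, dfs.datanode.address, dfs.datanode.ipc.address and dfs.datanode.http.address, offsetting their default ports 50010, 50020 and 50075. The helper name, the /tmp/hadoop-dn base path and the port-offset scheme are hypothetical; each instance would also typically need its own configuration directory (HADOOP_CONF_DIR) and separate log and PID directories.

```python
# Default Hadoop 1.x DataNode ports (hdfs-default.xml).
DEFAULT_PORTS = {
    "dfs.datanode.address": 50010,
    "dfs.datanode.ipc.address": 50020,
    "dfs.datanode.http.address": 50075,
}


def datanode_overrides(index, base_dir="/tmp/hadoop-dn"):
    """Hypothetical helper: settings for the index-th DataNode on one host.

    Each instance gets its own storage directory and shifts every default
    port by its index, so instances do not collide on disk or on sockets
    (valid for fewer than 10 instances, before port ranges overlap).
    """
    props = {"dfs.data.dir": f"{base_dir}{index}"}
    for name, port in DEFAULT_PORTS.items():
        props[name] = f"0.0.0.0:{port + index}"
    return props


if __name__ == "__main__":
    for i in range(2):
        print(i, datanode_overrides(i))
```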
On 12/08/14 18:25, sindhu hosamane wrote:
> Can setting up two DataNodes on the same machine be considered a
> pseudo-distributed Hadoop setup?
>
> Thanks,
> Sindhu