Posted to hdfs-user@hadoop.apache.org by kishore alajangi <al...@gmail.com> on 2013/12/12 06:04:07 UTC

Hadoop setup

Hi Experts,

Today I have a task to build a 4-node Hadoop cluster on physical hardware.
Can anybody suggest hardware specifications, an OS, and a Hadoop version?

-- 
Thanks,
Kishore.

Re: Hadoop setup

Posted by Adam Kawa <ka...@gmail.com>.
Again, the hardware may depend on what types of frameworks and applications you
aim to run on the YARN cluster. If you mostly run MapReduce jobs, then
there should not be any significant difference. In our case, we simply
migrated our cluster from MRv1 to YARN on the same hardware.

Installing YARN is not necessary: you can choose either YARN or MRv1. On the
other hand, YARN has a couple of advantages over MRv1, and I can mention
some of them:

1) Possibility to run alternative frameworks
Without YARN, the Hadoop cluster contains a MapReduce computation engine that
can only run MapReduce jobs. With YARN, on the other hand, the Hadoop
cluster becomes much more generic, and we can run various types of
distributed applications on the same cluster. These applications can
include MapReduce, but also Tez, (optimized) Giraph, Hama, MPI etc.

2) Better utilization of the cluster's resources
Without YARN, the computational resources in the Hadoop cluster must be
artificially divided into separate map slots and reduce slots. Map
tasks can run only in map slots, and reduce tasks can run only in reduce
slots (a map task cannot run in a reduce slot). Because of that separation,
we often see situations where all map slots are occupied and we want more of
them, but we will not get them even if many reduce slots are available (and
vice versa).

With YARN, we can say that we simply have a slot that can run any
task, no matter whether it is a map or a reduce task. Thanks to that, we get
much better utilization of our Hadoop cluster, and our machines stay busy
processing our data most of the time (assuming that you have jobs to run and
properly configure the scheduler/cluster).

3) Better scalability (but this matters more for a large cluster)
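
To make point 2 above concrete, here is a minimal sketch in Python. It is a toy
model, not Hadoop code: the function names and the numbers (8 map slots, 4
reduce slots, 20 pending map tasks) are made up for illustration, and it treats
a YARN container as a uniform "slot", whereas real YARN sizes containers by the
memory (and optionally CPU) each task requests.

```python
def tasks_running_mrv1(map_slots, reduce_slots, pending_maps, pending_reduces):
    """MRv1-style: map tasks use only map slots, reduce tasks only reduce slots."""
    return min(map_slots, pending_maps) + min(reduce_slots, pending_reduces)


def tasks_running_yarn(containers, pending_maps, pending_reduces):
    """YARN-style (simplified): any free container can run either kind of task."""
    return min(containers, pending_maps + pending_reduces)


# Hypothetical node: 8 map slots + 4 reduce slots, versus 12 generic containers.
map_slots, reduce_slots = 8, 4
total = map_slots + reduce_slots

# Map-heavy phase: many map tasks waiting, no reduce tasks ready yet.
pending_maps, pending_reduces = 20, 0

mrv1 = tasks_running_mrv1(map_slots, reduce_slots, pending_maps, pending_reduces)
yarn = tasks_running_yarn(total, pending_maps, pending_reduces)

print(f"MRv1 slots busy:      {mrv1}/{total}")
print(f"YARN containers busy: {yarn}/{total}")
```

In this map-heavy case the MRv1 model leaves the 4 reduce slots idle, while the
simplified YARN model can use all 12 containers for map tasks.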





2013/12/14 kishore alajangi <al...@gmail.com>

> what makes difference in H/W selection, when we choosed "yarn" to
> install, and is necessary ?
>
> On 12/14/13, Adam Kawa <ka...@gmail.com> wrote:
> > In general, it is very open question and there are many possibilities
> > depending on your workload (e.g. CPU-bound, IO-bound etc).
> >
> > If it is your first Hadoop cluster, and you do not know too much about what
> > types of jobs you will be running, I would recommend just to collect any
> > available machines that you have in your data-center (they should not be a
> > garage machines, though). Personally, I try to avoid buying hardware, if I
> > am not sure what to buy :)
> >
> > If you type "hadoop hardware recommnedations" in Google, you will get many
> > interesting links:
> > e.g.
> > http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/
> > http://my.safaribooksonline.com/book/databases/hadoop/9781449327279/4dot-planning-a-hadoop-cluster/id2760689
> > http://www.youtube.com/watch?v=UQJnJvwcsA8
> > http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_cluster-planning-guide/content/ch_hardware-recommendations.html
> >
> >
> > 2013/12/12 kishore alajangi <al...@gmail.com>
> >
> >> Hi Experts,
> >>
> >> Today I have a task to build hadoop cluster with 4 nodes in hardware.
> >> Anybody suggest me the hardware specifications, OS and Hadoop version.
> >>
> >> --
> >> Thanks,
> >> Kishore.
> >>
> >
>
>
> --
> Thanks,
> Kishore.
>

Re: Hadoop setup

Posted by kishore alajangi <al...@gmail.com>.
What difference does it make in H/W selection if we choose to install
"yarn", and is it necessary?

On 12/14/13, Adam Kawa <ka...@gmail.com> wrote:
> In general, it is very open question and there are many possibilities
> depending on your workload (e.g. CPU-bound, IO-bound etc).
>
> If it is your first Hadoop cluster, and you do not know too much about what
> types of jobs you will be running, I would recommend just to collect any
> available machines that you have in your data-center (they should not be a
> garage machines, though). Personally, I try to avoid buying hardware, if I
> am not sure what to buy :)
>
> If you type "hadoop hardware recommnedations" in Google, you will get many
> interesting links:
> e.g.
> http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/
> http://my.safaribooksonline.com/book/databases/hadoop/9781449327279/4dot-planning-a-hadoop-cluster/id2760689
> http://www.youtube.com/watch?v=UQJnJvwcsA8
> http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_cluster-planning-guide/content/ch_hardware-recommendations.html
>
>
> 2013/12/12 kishore alajangi <al...@gmail.com>
>
>> Hi Experts,
>>
>> Today I have a task to build hadoop cluster with 4 nodes in hardware.
>> Anybody suggest me the hardware specifications, OS and Hadoop version.
>>
>> --
>> Thanks,
>> Kishore.
>>
>


-- 
Thanks,
Kishore.

Re: Hadoop setup

Posted by Adam Kawa <ka...@gmail.com>.
In general, it is a very open question, and there are many possibilities
depending on your workload (e.g. CPU-bound, IO-bound etc).

If it is your first Hadoop cluster, and you do not know too much about what
types of jobs you will be running, I would recommend just collecting any
available machines that you have in your data center (they should not be
garage machines, though). Personally, I try to avoid buying hardware if I
am not sure what to buy :)

If you type "hadoop hardware recommendations" into Google, you will get many
interesting links, e.g.:
http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/
http://my.safaribooksonline.com/book/databases/hadoop/9781449327279/4dot-planning-a-hadoop-cluster/id2760689
http://www.youtube.com/watch?v=UQJnJvwcsA8
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_cluster-planning-guide/content/ch_hardware-recommendations.html


2013/12/12 kishore alajangi <al...@gmail.com>

> Hi Experts,
>
> Today I have a task to build hadoop cluster with 4 nodes in hardware.
> Anybody suggest me the hardware specifications, OS and Hadoop version.
>
> --
> Thanks,
> Kishore.
>
