Posted to user@hbase.apache.org by Ronen Itkin <ro...@taykey.com> on 2011/09/15 11:01:21 UTC

Cloudera HBase (+ZooKeeper), Hadoop HDFS, MapReduce, EC2 instance selection

 Hi,

Can someone recommend best practices for selecting the right combination of
Amazon EC2 instances for the following deployment:

Cloudera Hadoop HDFS and MapReduce:

   - 1 NameNode + JobTracker server.
   - 1 SecondaryNameNode server.
   - 3 DataNodes + TaskTrackers.


Cloudera HBase:

   - 2 HMaster servers.
   - 3 ZooKeeper servers.
   - 2 RegionServers.


From your own experience, which Amazon EC2 instance types should I choose?
How would you distribute the services above across the instances?
Should I place the DataNode and TaskTracker on the same instance as the
HRegionServer?

Thanks!

-- 
*
Ronen.*

<http://www.taykey.com/>

Re: Cloudera HBase (+ZooKeeper), Hadoop HDFS, MapReduce, EC2 instance selection

Posted by Ronen Itkin <ro...@taykey.com>.
Thanks Gary!!





-- 
*
Ronen Itkin*
Taykey | www.taykey.com

Re: Cloudera HBase (+ZooKeeper), Hadoop HDFS, MapReduce, EC2 instance selection

Posted by Gary Helmling <gh...@gmail.com>.
Running on EC2 has been discussed on this list quite a bit in the past, so
you might want to search the archives.  Here are a few threads I pulled up:

http://search-hadoop.com/m/paQmKTxSgj

http://search-hadoop.com/m/7E9PaA6U1V

http://search-hadoop.com/m/sGXTATdlIg2

For instance types, it appears that only c1.xlarge, m2.4xlarge and
cc1.4xlarge instances will get you a physical server for each instance, so
you will pay the least IO virtualization "tax" by using these with instance
storage.  Even then, expect reduced IO performance compared with physical
hardware.

For the node layout, I'd suggest something like:

1 - NameNode, JobTracker, ZooKeeper, HMaster
1 - SecondaryNameNode, HMaster
3 - DataNode, TaskTracker, RegionServer
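
A minimal sketch of that layout as a Python mapping, in case it helps to
sanity-check the placement before launching anything.  The host labels
(master-1, worker-1, ...) are made up for illustration; only the daemons per
node come from the layout above.

```python
from collections import Counter

# Host labels are hypothetical; the daemons per node mirror the layout above.
LAYOUT = {
    "master-1": ["NameNode", "JobTracker", "ZooKeeper", "HMaster"],
    "master-2": ["SecondaryNameNode", "HMaster"],
    "worker-1": ["DataNode", "TaskTracker", "RegionServer"],
    "worker-2": ["DataNode", "TaskTracker", "RegionServer"],
    "worker-3": ["DataNode", "TaskTracker", "RegionServer"],
}


def role_counts(layout):
    """Return how many instances run each daemon."""
    return Counter(role for roles in layout.values() for role in roles)


def colocated(layout, first, second):
    """Return the sorted hosts that run both daemons."""
    return sorted(
        host for host, roles in layout.items()
        if first in roles and second in roles
    )


if __name__ == "__main__":
    for role, count in sorted(role_counts(LAYOUT).items()):
        print(f"{role}: {count}")
    print("DataNode + RegionServer on:",
          colocated(LAYOUT, "DataNode", "RegionServer"))
```

The tally makes it easy to see that this layout runs a single ZooKeeper
server and two HMasters across five instances, with every RegionServer
sharing an instance with a DataNode and TaskTracker.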

You could run more ZK instances on smaller instance types (m1.medium?), but
beware that these are more exposed to erratic IO throughput caused by other
instances running on the same physical server, which could hurt ZooKeeper
performance and overall cluster stability.  So for a cluster this small, I
don't think I would bother.

Which instance type to pick depends on your workload and memory requirements.
I usually use c1.xlarge for HBase testing, but those have somewhat limited
memory, so you'll be constrained in the number of MR tasks you can run
without overcommitting memory (you want to avoid swapping at all costs).
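
A rough way to budget this, sketched in Python.  The 7 GB figure is the
c1.xlarge memory size; every heap size below is an assumption chosen only to
illustrate the arithmetic, not a recommended setting, and per-JVM overhead
beyond the heap is ignored.

```python
def max_task_slots(total_mb, reserved_mb, per_task_mb):
    """Return how many task JVMs fit in the memory left after reserved daemons."""
    if per_task_mb <= 0:
        raise ValueError("per_task_mb must be positive")
    free_mb = total_mb - sum(reserved_mb.values())
    return max(0, free_mb // per_task_mb)


C1_XLARGE_MB = 7 * 1024  # c1.xlarge provides 7 GB of RAM

# Assumed (hypothetical) reservations for everything that is not an MR task.
RESERVED_MB = {
    "os_and_cache": 1024,
    "DataNode": 1024,
    "TaskTracker": 1024,
    "RegionServer": 3072,
}

PER_TASK_MB = 512  # assumed heap for each map or reduce child JVM

if __name__ == "__main__":
    slots = max_task_slots(C1_XLARGE_MB, RESERVED_MB, PER_TASK_MB)
    print(f"Task slots that fit without overcommitting: {slots}")
```

With these assumed numbers only two task slots fit on a node, which is the
kind of constraint I mean; plug in your own heap sizes to see where you land.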

I would test with your own workload and see which instance types give you
the best performance at an acceptable price.

--gh

