Posted to user@hbase.apache.org by Vishal Kapoor <vi...@gmail.com> on 2011/05/09 21:24:12 UTC

VMWare and Hadoop/Hbase

We were wondering if it's advisable to provision HBase/Hadoop nodes as VMware
instances?
Any suggestions?

thanks,
Vishal

RE: VMWare and Hadoop/Hbase

Posted by Doug Meil <do...@explorysmedical.com>.

For a dev cluster (i.e., something where you aren't trying to do performance testing) it's a reasonable approach.  But I wouldn't do it on a production cluster.


Re: Adding new disks to an Hadoop Cluster

Posted by lohit <lo...@gmail.com>.

Yes, you have to bounce the datanode so that it can start using the disk.
Also note that you have to tell the datanode to use this disk via the
dfs.data.dir config parameter in hdfs-site.xml. The same goes for the
tasktracker: if you want it to use this disk for its temp output, you have
to tell it via mapred-site.xml.
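
As an illustration of the config change described above, here is a minimal
Python sketch that appends a new directory to the comma-separated
dfs.data.dir value. It is not an official Hadoop tool. The /data/N/dfs/dn
paths and the add_data_dir helper are hypothetical, and the sample XML is
only assumed to resemble a real hdfs-site.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml content; the /data/1 and /data/2 paths are made up.
HDFS_SITE = """<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/1/dfs/dn,/data/2/dfs/dn</value>
  </property>
</configuration>"""


def add_data_dir(xml_text, new_dir, prop_name="dfs.data.dir"):
    """Return xml_text with new_dir appended to the comma-separated property value."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == prop_name:
            value = prop.find("value")
            # Split the existing list, dropping empty entries and stray spaces.
            dirs = [d.strip() for d in (value.text or "").split(",") if d.strip()]
            if new_dir not in dirs:
                dirs.append(new_dir)
            value.text = ",".join(dirs)
            return ET.tostring(root, encoding="unicode")
    raise KeyError(prop_name + " not found in configuration")


# Add the mount point of the (hypothetical) new disk.
updated = add_data_dir(HDFS_SITE, "/data/3/dfs/dn")
print(updated)
```

The same helper could be pointed at mapred-site.xml by passing a different
prop_name (in Hadoop releases of this era the tasktracker's local directories
are listed under mapred.local.dir). The daemons still need the restart
described above before they pick up the edited files.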

2011/5/9 Pete Haidinyak <ja...@cox.net>

> Hi all,
>   When you add a disk to a Hadoop data node do you have to bounce the node
> (restart mapreduce and dfs) before Hadoop can use the new disk?
>
> Thanks
>
> -Pete
>
>


-- 
Have a Nice Day!
Lohit

Adding new disks to an Hadoop Cluster

Posted by Pete Haidinyak <ja...@cox.net>.

Hi all,
    When you add a disk to a Hadoop data node do you have to bounce the  
node (restart mapreduce and dfs) before Hadoop can use the new disk?

Thanks

-Pete

Re: VMWare and Hadoop/Hbase

Posted by "M.Deniz OKTAR" <de...@gmail.com>.

Thanks, David :)

--
M.Deniz OKTAR

iletken Recommendation Technologies
http://www.iletken.com.tr
Tel:      +90(212)328-0290
GSM:   +90(533)477-6358



On Sat, May 14, 2011 at 12:39 AM, Buttler, David <bu...@llnl.gov> wrote:

> This is a fantastic test and should be made more public.  Great work
> Dave
>
>

RE: VMWare and Hadoop/Hbase

Posted by "Buttler, David" <bu...@llnl.gov>.

This is a fantastic test and should be made more public.  Great work.
Dave



Re: VMWare and Hadoop/Hbase

Posted by "M.Deniz OKTAR" <de...@gmail.com>.

I've tried this in our internal tests with Xen.

We tried to see whether performance degrades in proportion to the amount of
resources we take away from the standalone machine and give to the
virtualized one.

The cluster I tested is described below. We gave one disk (25% of 4), 1 core
with medium priority (12.5% of 8) and ~1100 MB of memory (10%) to a second
virtual machine, and the results of the virtual machines running with the
remaining resources were awful. Write performance degraded by around 25%,
which is fine. But reads were around 3x slower, which suggests that the
hypervisor is not good at handling large numbers of random disk accesses.

Tests: Yahoo benchmark (YCSB), 100M

Cluster:
5x cluster (1 namenode, 4 data nodes)
Xeon 5620
12 GB RAM
4x SATA 7200 rpm drives

Results:

=== TEST 1: 100M inserts ===

---STANDALONE CLUSTER---

YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -load -p columnfamily=values -P myworkloads/4 -threads 100 -s
[OVERALL], RunTime(ms), 5374309.0
[OVERALL], Throughput(ops/sec), 18607.043249653118
[INSERT], Operations, 100000000
[INSERT], AverageLatency(ms), 4.88454509
[INSERT], MinLatency(ms), 0
[INSERT], MaxLatency(ms), 92837




---VIRTUAL CLUSTER---

Command line: -db com.yahoo.ycsb.db.HBaseClient -load -p columnfamily=values -P myworkloads/4 -threads 100 -s
[OVERALL], RunTime(ms), 6912776.0
[OVERALL], Throughput(ops/sec), 14465.968519737946
[INSERT], Operations, 100000000
[INSERT], AverageLatency(ms), 6.36836144
[INSERT], MinLatency(ms), 0
[INSERT], MaxLatency(ms), 104990


=== TEST 2: 100M read/update/write (transaction) ===



---STANDALONE CLUSTER---

YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -p columnfamily=values -P myworkloads/4 -threads 100 -s
[OVERALL], RunTime(ms), 9012970.0
[OVERALL], Throughput(ops/sec), 3328.53654233843
[UPDATE], Operations, 4500829
[UPDATE], AverageLatency(ms), 0.09758402285445637
[UPDATE], MinLatency(ms), 0
[UPDATE], MaxLatency(ms), 5477



---VIRTUAL CLUSTER---

YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -p columnfamily=values -P myworkloads/4 -threads 100 -s
[OVERALL], RunTime(ms), 2.0272831E7
[OVERALL], Throughput(ops/sec), 1479.813056203152
[UPDATE], Operations, 4502501
[UPDATE], AverageLatency(ms), 2.803586717693122
[UPDATE], MinLatency(ms), 0
[UPDATE], MaxLatency(ms), 560605
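
To make the comparison between the two clusters easier to read, here is a
rough Python sketch that parses the YCSB summary lines posted above and
prints the virtual/standalone ratio for each metric. The parse_ycsb and
compare helpers are hypothetical names, not part of YCSB. Only lines that
appear in the output above are used, so READ latencies (which were not
posted) cannot be checked this way.

```python
def parse_ycsb(text):
    """Parse '[SECTION], Metric, value' lines into {(section, metric): float}."""
    stats = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 3 and parts[0].startswith("["):
            stats[(parts[0].strip("[]"), parts[1])] = float(parts[2])
    return stats


def compare(standalone, virtual):
    """Return virtual/standalone ratios for metrics present in both outputs."""
    base, virt = parse_ycsb(standalone), parse_ycsb(virtual)
    return {k: virt[k] / base[k] for k in base if k in virt and base[k] != 0}


# Summary lines copied from the results above.
STANDALONE_LOAD = """[OVERALL], RunTime(ms), 5374309.0
[OVERALL], Throughput(ops/sec), 18607.043249653118
[INSERT], AverageLatency(ms), 4.88454509"""
VIRTUAL_LOAD = """[OVERALL], RunTime(ms), 6912776.0
[OVERALL], Throughput(ops/sec), 14465.968519737946
[INSERT], AverageLatency(ms), 6.36836144"""
STANDALONE_TXN = """[OVERALL], RunTime(ms), 9012970.0
[OVERALL], Throughput(ops/sec), 3328.53654233843
[UPDATE], AverageLatency(ms), 0.09758402285445637"""
VIRTUAL_TXN = """[OVERALL], RunTime(ms), 2.0272831E7
[OVERALL], Throughput(ops/sec), 1479.813056203152
[UPDATE], AverageLatency(ms), 2.803586717693122"""

load_ratios = compare(STANDALONE_LOAD, VIRTUAL_LOAD)
txn_ratios = compare(STANDALONE_TXN, VIRTUAL_TXN)

for label, ratios in (("TEST 1 (load)", load_ratios), ("TEST 2 (transaction)", txn_ratios)):
    print(label)
    for (section, metric), ratio in ratios.items():
        print("  %s %s: virtual/standalone = %.2f" % (section, metric, ratio))
```

With the posted figures, overall throughput on the virtual cluster comes out
at roughly 0.78 of standalone for the load test and roughly 0.44 of
standalone for the transaction test, with the transaction run taking about
2.25 times as long.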

--
M.Deniz OKTAR

iletken Recommendation Technologies
http://www.iletken.com.tr
Tel:      +90(212)328-0290
GSM:   +90(533)477-6358



On Mon, May 9, 2011 at 10:41 PM, Andrew Purtell <ap...@apache.org> wrote:

> It is not advisable to do this.
>
> Hadoop and HBase are very I/O intensive. They should have dedicated hardware.
> Why add the overhead of hypervisor mediation on the I/O path?
>
> --- On Mon, 5/9/11, Vishal Kapoor <vi...@gmail.com> wrote:
>
> > From: Vishal Kapoor <vi...@gmail.com>
> > Subject: VMWare and Hadoop/Hbase
> > To: user@hbase.apache.org
> > Date: Monday, May 9, 2011, 12:24 PM
> > We were wondering if it's advisable to
> > provision HBase/Hadoop nodes as VMware
> > instances?
> > Any suggestions?
> >
> > thanks,
> > Vishal
> >
>

Re: VMWare and Hadoop/Hbase

Posted by Andrew Purtell <ap...@apache.org>.

It is not advisable to do this.

Hadoop and HBase are very I/O intensive. They should have dedicated hardware. Why add the overhead of hypervisor mediation on the I/O path?

--- On Mon, 5/9/11, Vishal Kapoor <vi...@gmail.com> wrote:

> From: Vishal Kapoor <vi...@gmail.com>
> Subject: VMWare and Hadoop/Hbase
> To: user@hbase.apache.org
> Date: Monday, May 9, 2011, 12:24 PM
> We were wondering if it's advisable to
> provision HBase/Hadoop nodes as VMware
> instances?
> Any suggestions?
> 
> thanks,
> Vishal
>