Posted to user@hbase.apache.org by Vishal Kapoor <vi...@gmail.com> on 2011/05/09 21:24:12 UTC
VMWare and Hadoop/Hbase
We were wondering if it's advisable to provision HBase/Hadoop nodes as VMware
instances?
Any suggestions?
thanks,
Vishal
RE: VMWare and Hadoop/Hbase
Posted by Doug Meil <do...@explorysmedical.com>.
For a dev cluster (i.e., something where you aren't trying to do performance testing) it's a reasonable approach. But I wouldn't do it on a production cluster.
Re: Adding new disks to an Hadoop Cluster
Posted by lohit <lo...@gmail.com>.
Yes, you have to bounce the datanode so that it can start using the disk.
Also note that you have to tell the datanode to use this disk via the
dfs.data.dir config parameter in hdfs-site.xml. The same goes for the
tasktracker: if you want it to use this disk for its temp output, you have
to tell it via mapred-site.xml.
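
As a rough illustration of the config change described above, here is a minimal Python sketch that appends a new directory to a comma-separated directory property in a Hadoop site file. The property name dfs.data.dir comes from the message; mapred.local.dir is my assumption for the tasktracker's temp-output property in Hadoop releases of this era (the message only names mapred-site.xml). The directory paths and the add_directory helper are hypothetical. The sketch only edits the XML; the datanode and tasktracker still have to be restarted afterwards.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml with two existing data directories.
HDFS_SITE = """<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/1/dfs/dn,/data/2/dfs/dn</value>
  </property>
</configuration>"""


def add_directory(site_xml, prop_name, new_dir):
    """Return site_xml with new_dir appended to the comma-separated
    directory list held in the property named prop_name."""
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == prop_name:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            dirs = [d.strip() for d in (value.text or "").split(",") if d.strip()]
            if new_dir not in dirs:
                dirs.append(new_dir)
            value.text = ",".join(dirs)
            break
    else:
        # Property not present yet: create it with only the new directory.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = prop_name
        ET.SubElement(prop, "value").text = new_dir
    return ET.tostring(root, encoding="unicode")


# Hypothetical mount point for the newly added disk.
updated = add_directory(HDFS_SITE, "dfs.data.dir", "/data/3/dfs/dn")
print(updated)
```

The same helper could be pointed at mapred-site.xml with the tasktracker's local-directory property, under the assumption stated above.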
2011/5/9 Pete Haidinyak <ja...@cox.net>
> Hi all,
> When you add a disk to a Hadoop data node do you have to bounce the node
> (restart mapreduce and dfs) before Hadoop can use the new disk?
>
> Thanks
>
> -Pete
>
>
--
Have a Nice Day!
Lohit
Adding new disks to an Hadoop Cluster
Posted by Pete Haidinyak <ja...@cox.net>.
Hi all,
When you add a disk to a Hadoop data node do you have to bounce the
node (restart mapreduce and dfs) before Hadoop can use the new disk?
Thanks
-Pete
Re: VMWare and Hadoop/Hbase
Posted by "M.Deniz OKTAR" <de...@gmail.com>.
Thanks David :)
--
M.Deniz OKTAR
iletken Recommendation Technologies
http://www.iletken.com.tr
Tel: +90(212)328-0290
GSM: +90(533)477-6358
On Sat, May 14, 2011 at 12:39 AM, Buttler, David <bu...@llnl.gov> wrote:
> This is a fantastic test and should be made more public. Great work
> Dave
RE: VMWare and Hadoop/Hbase
Posted by "Buttler, David" <bu...@llnl.gov>.
This is a fantastic test and should be made more public. Great work
Dave
-----Original Message-----
From: M.Deniz OKTAR [mailto:deniz.oktar@gmail.com]
Sent: Friday, May 13, 2011 2:20 PM
To: user@hbase.apache.org; apurtell@apache.org
Subject: Re: VMWare and Hadoop/Hbase
Re: VMWare and Hadoop/Hbase
Posted by "M.Deniz OKTAR" <de...@gmail.com>.
I've tried this in our internal tests with Xen.
We wanted to see whether performance degrades in proportion to the amount of
resources we take away from the standalone machine for the virtualized one.
The cluster I tested is described below. I took one disk (25% of 4), 1 core
with medium priority (12.5% of 8) and ~1100 MB of memory (~10%) away for a
second virtual machine, and the results on the virtual machines with the
remaining resources were awful. Write performance degraded by around 25%,
which is acceptable. But the read/update test was far slower (about 2.25x
by overall throughput), which suggests that the hypervisor is not good at
handling heavy random disk access.
Tests: Yahoo! Cloud Serving Benchmark (YCSB), 100M
Cluster:
5-node cluster (1 namenode, 4 datanodes)
Xeon 5620
12 GB RAM
4x SATA 7200 rpm drives
Results:
=== TEST 1: 100M inserts ===
---STANDALONE CLUSTER---
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -load -p columnfamily=values
-P myworkloads/4 -threads 100 -s
[OVERALL], RunTime(ms), 5374309.0
[OVERALL], Throughput(ops/sec), 18607.043249653118
[INSERT], Operations, 100000000
[INSERT], AverageLatency(ms), 4.88454509
[INSERT], MinLatency(ms), 0
[INSERT], MaxLatency(ms), 92837
---VIRTUAL CLUSTER---
Command line: -db com.yahoo.ycsb.db.HBaseClient -load -p columnfamily=values
-P myworkloads/4 -threads 100 -s
[OVERALL], RunTime(ms), 6912776.0
[OVERALL], Throughput(ops/sec), 14465.968519737946
[INSERT], Operations, 100000000
[INSERT], AverageLatency(ms), 6.36836144
[INSERT], MinLatency(ms), 0
[INSERT], MaxLatency(ms), 104990
=== TEST2: 100M read/update/write (transaction) ===
---STANDALONE CLUSTER---
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -p columnfamily=values -P
myworkloads/4 -threads 100 -s
[OVERALL], RunTime(ms), 9012970.0
[OVERALL], Throughput(ops/sec), 3328.53654233843
[UPDATE], Operations, 4500829
[UPDATE], AverageLatency(ms), 0.09758402285445637
[UPDATE], MinLatency(ms), 0
[UPDATE], MaxLatency(ms), 5477
---VIRTUAL CLUSTER---
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -p columnfamily=values -P
myworkloads/4 -threads 100 -s
[OVERALL], RunTime(ms), 2.0272831E7
[OVERALL], Throughput(ops/sec), 1479.813056203152
[UPDATE], Operations, 4502501
[UPDATE], AverageLatency(ms), 2.803586717693122
[UPDATE], MinLatency(ms), 0
[UPDATE], MaxLatency(ms), 560605
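
To put the figures above side by side, here is a small Python sketch that computes the relative throughput drop and the slowdown factor for each test. The throughput values are copied from the [OVERALL] lines in this message; the compare helper and the dictionary layout are only illustrative.

```python
# Throughput (ops/sec) copied from the [OVERALL] lines reported above.
results = {
    "insert (test 1)": {
        "standalone": 18607.043249653118,
        "virtual": 14465.968519737946,
    },
    "read/update (test 2)": {
        "standalone": 3328.53654233843,
        "virtual": 1479.813056203152,
    },
}


def compare(standalone, virtual):
    """Return (percentage throughput drop, slowdown factor)."""
    drop_pct = (standalone - virtual) / standalone * 100.0
    slowdown = standalone / virtual
    return drop_pct, slowdown


for name, r in results.items():
    drop, slow = compare(r["standalone"], r["virtual"])
    print(f"{name}: throughput down {drop:.1f}%, {slow:.2f}x slower")
```

Comparing overall throughput is only one way to read the results; the per-operation latencies reported above diverge much more sharply than the throughput figures do.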
--
M.Deniz OKTAR
iletken Recommendation Technologies
http://www.iletken.com.tr
Tel: +90(212)328-0290
GSM: +90(533)477-6358
On Mon, May 9, 2011 at 10:41 PM, Andrew Purtell <ap...@apache.org> wrote:
> It is not advisable to do this.
>
> Hadoop/HBase is very I/O intensive. They should have dedicated hardware.
> Why add the overhead of Hypervisor mediation on the I/O path then?
Re: VMWare and Hadoop/Hbase
Posted by Andrew Purtell <ap...@apache.org>.
It is not advisable to do this.
Hadoop/HBase is very I/O intensive. They should have dedicated hardware. Why add the overhead of Hypervisor mediation on the I/O path then?