Posted to common-user@hadoop.apache.org by praveenesh kumar <pr...@gmail.com> on 2011/09/28 08:38:41 UTC

hadoop question using VMWARE

Hi,

Suppose I have 10 Windows machines, each running its own VM instance
independently. Can these VM instances communicate with each other, so that I
can build a Hadoop cluster out of them?

Has anyone tried this?

I know we can set up multiple VM instances on the same machine, but can we do
it across different machines as well?
And if I do it this way, is it a good approach, considering I don't have
dedicated Ubuntu machines for Hadoop?

Thanks,
Praveenesh

Re: hadoop question using VMWARE

Posted by Steve Loughran <st...@apache.org>.
On 28/09/11 08:37, N Keywal wrote:
> For example:
> - It adds two layers (Windows & Linux) that can both fail, especially under
> heavy workload (and Hadoop is built to use all the resources available).
> Both also need to be managed (software upgrades, hardware support...), which
> is an extra cost.
> - The two layers will use the different resources (HDD, CPU, network)
> unpredictably, making troubleshooting and performance analysis more
> complicated.
> - There will be a real performance impact. It depends on what you do and on
> how Windows & VMware are configured, but on my non-optimized laptop I lose
> more than 50%. VMware claims 15% max, but that is without Windows (running
> directly on ESX).


Where you take a big hit is in disk IO, as what your OS thinks is a disk 
with sequentially stored files is just a single file in the host OS that 
may be scattered round the real HDD. Disk IO goes through too many 
layers. It's often faster to NFS mount the real HDD.

For compute intensive work, the performance hit isn't so bad, at least 
provided you don't swap.
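
One rough way to see the disk IO hit is to time a sequential write inside the guest and again on the host, then compare the two numbers. The sketch below is only an illustration: the function name, the 64 MB file size, and the 1 MB block size are arbitrary choices of mine, not anything from this thread, and a serious measurement would use a dedicated benchmark tool and much larger files.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(directory, size_mb=64, block_kb=1024):
    """Write size_mb of data sequentially into directory and return MB/s.

    Hypothetical helper for illustration; sizes are arbitrary defaults.
    """
    # Random data, so a compressing or sparse virtual disk cannot cheat.
    block = os.urandom(block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            # Push the data down to the (virtual) disk before stopping the clock.
            os.fsync(f.fileno())
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.remove(path)
    return size_mb / elapsed


if __name__ == "__main__":
    rate = sequential_write_mb_per_s(tempfile.gettempdir())
    print(f"sequential write: {rate:.1f} MB/s")
```

Run the same script in the VM and on the host OS, against the same physical disk if possible; the ratio of the two results gives a first impression of how much the extra layers cost for sequential writes.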

> - Last time I checked (a few months ago), VMware was not able to use all the
> cores & memory of medium-sized servers.

Same with VirtualBox, which I like because it is lighter weight.

I use VMs because the infrastructure provides them; things like Elastic
MapReduce from AWS also offer them. Your code may be slower, but what you get
is the ability to bring up clusters on a pay-per-hour basis, and the ability
to vary the number of machines based on the workload/execution plan. If you
can compensate for the IO hit by renting four more servers, you may still
come out ahead.
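
The "rent more servers" trade-off can be put into back-of-the-envelope arithmetic. The sketch below assumes that throughput scales linearly with node count and that every VM delivers the same fixed fraction of a physical node's throughput; both are simplifications, and the function names, the slowdown figures, and the hourly price are made-up values for illustration only.

```python
import math


def vms_needed(physical_nodes, vm_slowdown):
    """VMs needed to match physical_nodes of throughput.

    vm_slowdown is the fraction of throughput lost per VM (0.3 means 30%).
    Assumes linear scaling, which real clusters only approximate.
    """
    if not 0 <= vm_slowdown < 1:
        raise ValueError("vm_slowdown must be in [0, 1)")
    return math.ceil(physical_nodes / (1 - vm_slowdown))


def rented_cost(vm_count, hours, price_per_vm_hour):
    """Total rental cost for vm_count VMs over the given number of hours."""
    return vm_count * hours * price_per_vm_hour


if __name__ == "__main__":
    # Hypothetical inputs: 10 physical nodes' worth of work, 30% loss per VM,
    # an 8 hour job, and an invented price of 0.50 per VM-hour.
    count = vms_needed(10, 0.30)
    print(f"VMs needed: {count}")
    print(f"rental cost: {rented_cost(count, 8, 0.50):.2f}")
```

Whether this comes out ahead depends on what the dedicated hardware would have cost for the same period, which the sketch deliberately leaves out.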

http://www.slideshare.net/steve_l/farming-hadoop-inthecloud

Re: hadoop question using VMWARE

Posted by N Keywal <nk...@gmail.com>.
For example:
- It adds two layers (Windows & Linux) that can both fail, especially under
heavy workload (and Hadoop is built to use all the resources available).
Both also need to be managed (software upgrades, hardware support...), which
is an extra cost.
- The two layers will use the different resources (HDD, CPU, network)
unpredictably, making troubleshooting and performance analysis more
complicated.
- There will be a real performance impact. It depends on what you do and on
how Windows & VMware are configured, but on my non-optimized laptop I lose
more than 50%. VMware claims 15% max, but that is without Windows (running
directly on ESX).
- Last time I checked (a few months ago), VMware was not able to use all the
cores & memory of medium-sized servers.
- The namenode needs to be secured, as it's a SPOF (single point of failure).

On Wed, Sep 28, 2011 at 9:07 AM, praveenesh kumar <pr...@gmail.com> wrote:

>  "it's not something you can do for production nor performance
> analysis."
> Can you please tell me what this means?
> Why can't we use this approach for production?
>
> Thanks

Re: hadoop question using VMWARE

Posted by praveenesh kumar <pr...@gmail.com>.
 "it's not something you can do for production nor performance
analysis."
Can you please tell me what this means?
Why can't we use this approach for production?

Thanks

On Tue, Sep 27, 2011 at 11:56 PM, N Keywal <nk...@gmail.com> wrote:

> Hi,
>
> Yes, it will work. HBase won't see the difference, it's a pure vmware
> stuff.
> Obviously, it's not something you can do for production nor performance
> analysis.
>
> Cheers,
>
> N.

Re: hadoop question using VMWARE

Posted by N Keywal <nk...@gmail.com>.
Hi,

Yes, it will work. HBase won't see the difference; it's purely a VMware matter.
Obviously, it's not something you can do for production nor performance
analysis.
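
Before installing anything, it may be worth checking that the VMs can actually reach one another across the physical machines. That normally requires each VM to have an address on the shared network (for example VMware's bridged networking mode rather than NAT). The sketch below is an assumption-laden illustration: the `hadoop-vm01` style hostnames are hypothetical, and port 22 is chosen only because the Hadoop start scripts use SSH to reach the other nodes.

```python
import socket


def can_reach(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable_nodes(hosts, port, timeout=2.0):
    """Return the subset of hosts that do not accept a TCP connection on port."""
    return [h for h in hosts if not can_reach(h, port, timeout)]


if __name__ == "__main__":
    # Hypothetical hostnames for the 10 VMs; replace with the real ones.
    nodes = [f"hadoop-vm{i:02d}" for i in range(1, 11)]
    missing = unreachable_nodes(nodes, port=22)
    if missing:
        print("cannot reach:", ", ".join(missing))
    else:
        print("all nodes reachable on port 22")
```

Running it from every VM in turn shows whether connectivity works in both directions, which a Hadoop cluster needs.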

Cheers,

N.

On Wed, Sep 28, 2011 at 8:38 AM, praveenesh kumar <pr...@gmail.com> wrote:

> Hi,
>
> Suppose I am having 10 windows machines and if I have 10 VM individual
> instances running on these machines independently, can I use these VM
> instances to communicate with each other so that I can make hadoop cluster
> using those VM instances.
>
> Did anyone tried that thing ?
>
> I know we can setup multiple VM instances on same machine, but can we do it
> across different machines also ?
> And if I do like this, Is it a good approach, considering I don't have
> dedicated ubuntu machines for hadoop ?
>
> Thanks,
> Praveenesh
>