Posted to common-user@hadoop.apache.org by Tim Dunphy <bl...@gmail.com> on 2014/11/04 20:28:36 UTC

Hadoop Learning Environment

Hey all,

I want to set up an environment where I can teach myself Hadoop. Usually
the way I handle this is to grab a machine off the Amazon free tier and
set up whatever software I want.

However, I realize that Hadoop is a memory-intensive big data solution. So
what I'm wondering is: would t2.micro instances be sufficient for setting
up a cluster of Hadoop nodes with the intention of learning it? To keep
things running longer in the free tier, I would either set up as many
nodes as I want and keep them stopped when I'm not actively using them, or
just set up a few nodes across a few different accounts (with a different
Gmail address for each one; easy enough to do).

Failing that, what are some other free/cheap solutions for setting up a
Hadoop learning environment?

Thanks,
Tim

-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
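
On the t2.micro question, a back-of-the-envelope memory budget may help.
This is only a sketch under stated assumptions: that a t2.micro has 1 GiB
of RAM, that each Hadoop daemon is left at a maximum heap of roughly
1000 MB (the commonly cited Hadoop 2.x default; check your own
hadoop-env.sh), and a made-up 256 MB allowance for the operating system.
The variable and function names are hypothetical.

```shell
#!/usr/bin/env bash
# Rough memory budget for a single Hadoop node.
# Every number below is an assumption; adjust to your own setup.
INSTANCE_MB=1024     # assumed RAM of a t2.micro (1 GiB)
OS_RESERVE_MB=256    # hypothetical allowance for the OS itself
HEAP_MB=1000         # assumed per-daemon maximum heap

# heap_budget DAEMON_COUNT
# Prints the MB left after reserving the maximum heap for each daemon.
# A negative result means the daemons could ask for more than exists.
heap_budget() {
  local daemons=$1
  echo $(( INSTANCE_MB - OS_RESERVE_MB - daemons * HEAP_MB ))
}
```

For example, `heap_budget 4` (NameNode, DataNode, ResourceManager and
NodeManager all on one node) prints -3232. Maximum heap is a ceiling
rather than actual usage, so lowering it may still make a tiny node
workable for toy data, but there is not much headroom.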

Re: Hadoop Learning Environment

Posted by Sandeep Khurana <sk...@gmail.com>.
Or, on your local laptop or desktop, you can set up the environment using
a VM and a VM image of Hadoop and related components. I wrote instructions
some time back here:
https://www.linkedin.com/today/post/article/20140924133831-2560863-new-to-hadoop-and-want-to-setup-dev-environment
On Nov 5, 2014 2:25 AM, "Jim Colestock" <jc...@ramblingredneck.com> wrote:

> Hello Tim,
>
> Hortonworks and Cloudera both offer VMs (including ones for VirtualBox,
> which is free) that you can pull down to play with, if you're just
> looking for something small to get you started. I'm partial to the
> Hortonworks one myself.
>
> Hope that helps.
>
> JC

Re: Hadoop Learning Environment

Posted by Jim Colestock <jc...@ramblingredneck.com>.
Hello Tim, 

Hortonworks and Cloudera both offer VMs (including ones for VirtualBox, which is free) that you can pull down to play with, if you're just looking for something small to get you started. I'm partial to the Hortonworks one myself.

Hope that helps.

JC

Re: Hadoop Learning Environment

Posted by Gavin Yue <yu...@gmail.com>.
Try Docker!

http://ferry.opencore.io/en/latest/examples/hadoop.html



On Tue, Nov 4, 2014 at 6:36 PM, jay vyas <ja...@gmail.com>
wrote:

> Hi Daemeon: Actually, for most folks who want to actually use a Hadoop
> cluster, I would think setting up Bigtop is super easy! If you have
> issues with it, ping me and I can help you get started.
> Also, we have Docker containers, so you don't even *need* a VM to run a
> 4- or 5-node Hadoop cluster.
>
> install vagrant
> install VirtualBox
> git clone https://github.com/apache/bigtop
> cd bigtop/bigtop-deploy/vm/vagrant-puppet
> vagrant up
> Then vagrant destroy when you're done.
>
> This to me is easier than manually downloading an appliance, picking
> memory, starting the VirtualBox GUI, loading the appliance, etc., and
> it's also easy to turn the simple single-node Bigtop VM into a multi-node
> one just by modifying the Vagrantfile.
>
>
> On Tue, Nov 4, 2014 at 5:32 PM, daemeon reiydelle <da...@gmail.com>
> wrote:
>
>> What you want as a sandbox depends on what you are trying to learn.
>>
>> If you are trying to learn to code in, e.g., Pig Latin, Sqoop, or similar,
>> all of the suggestions (perhaps excluding Bigtop, due to its setup
>> complexities) are great. Laptop? Perhaps, but laptops are really kind of
>> infuriatingly slow (because of the hardware: you pay a price for a
>> 30-45 watt average heating bill). A laptop is an OK place to start if it
>> is, e.g., an i5 or i7 with lots of memory. What do you think of the
>> thought that you will pretty quickly graduate to wanting a smallish
>> desktop for your sandbox?
>>
>> A simple, single-node Hadoop instance will let you learn many things.
>> The next level of complexity comes when you are dealing with data whose
>> processing needs to be split up, so you can learn how to split data in
>> the map phase, combine the splits via reduce jobs, etc. For that, you
>> could get a desktop box running Windows or, e.g., Red Hat/CentOS and use
>> virtualization: something like a 4-core i5 with 32 GB of memory, running
>> 3 (or, for some things, 4) VMs. You could load, e.g., Hortonworks into
>> each of the VMs and practice setting up a 3- or 4-node cluster. Throw in
>> two or three 1 TB drives off of eBay and you can get a lot of learning
>> done.
>>
>> *“The race is not to the swift, nor the battle to the strong, but to
>> those who can see it coming and jump aside.” - Hunter Thompson*
>>
>> *Daemeon*
>>
>> On Tue, Nov 4, 2014 at 1:24 PM, oscar sumano <os...@gmail.com> wrote:
>>
>>> You can try the Pivotal VM as well.
>>>
>>>
>>> http://pivotalhd.docs.pivotal.io/tutorial/getting-started/pivotalhd-vm.html
>>>
>>> On Tue, Nov 4, 2014 at 3:13 PM, Leonid Fedotov <lfedotov@hortonworks.com> wrote:
>>>
>>>> Tim,
>>>> Download the Sandbox from http://hortonworks.com
>>>> You will have everything you need in a small VM instance that will run
>>>> on your home desktop.
>>>>
>>>>
>>>> *Thank you!*
>>>>
>>>>
>>>> *Sincerely,*
>>>>
>>>> *Leonid Fedotov*
>>>>
>>>> Systems Architect - Professional Services
>>>>
>>>> lfedotov@hortonworks.com
>>>>
>>>> office: +1 855 846 7866 ext 292
>>>>
>>>> mobile: +1 650 430 1673
>>>>
>>>
>>>
>>>
>>
>
>
> --
> jay vyas
>

Re: Hadoop Learning Environment

Posted by jay vyas <ja...@gmail.com>.
Hey! I forgot, you have to run

 "vagrant plugin install vagrant-hostmanager",

then it will work for you.

(See
https://github.com/apache/bigtop/tree/master/bigtop-deploy/vm/vagrant-puppet/README.md
)
Thanks for noting this.

On Wed, Nov 5, 2014 at 1:55 PM, Jim Shi <ha...@apple.com> wrote:

> Hi Jay,
>    I followed the steps you described and got the following error.
> Any idea?
>
>   vagrant up
> creating provisioner directive for running tests
> Bringing machine 'bigtop1' up with 'virtualbox' provider...
> ==> bigtop1: Box 'puppetlab-centos-64-nocm' could not be found. Attempting to find and install...
>     bigtop1: Box Provider: virtualbox
>     bigtop1: Box Version: >= 0
> ==> bigtop1: Adding box 'puppetlab-centos-64-nocm' (v0) for provider: virtualbox
>     bigtop1: Downloading: http://puppet-vagrant-boxes.puppetlabs.com/centos-64-x64-vbox4210-nocm.box
> ==> bigtop1: Successfully added box 'puppetlab-centos-64-nocm' (v0) for 'virtualbox'!
> There are errors in the configuration of this machine. Please fix
> the following errors and try again:
>
> vm:
> * The 'hostmanager' provisioner could not be found.
>
> Thanks
> Jim
>


-- 
jay vyas
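
Putting the steps from this thread together, the workflow could be
scripted roughly as below. This is a sketch, not the Bigtop project's own
tooling: the DRY_RUN and BIGTOP_DIR variables and the function names are
made up for illustration, and the plugin name vagrant-hostmanager is an
assumption about how the hostmanager plugin is published (check the README
linked above). By default the script only prints the commands it would
run.

```shell
#!/usr/bin/env bash
# Sketch of the Bigtop + Vagrant workflow from this thread.
# DRY_RUN, BIGTOP_DIR and the function names are hypothetical helpers.

DRY_RUN="${DRY_RUN:-1}"              # 1 = only print commands, 0 = run them
BIGTOP_DIR="${BIGTOP_DIR:-bigtop}"   # directory the repository is cloned into

# Print a command, then execute it inside directory $1 when DRY_RUN=0.
run_in() {
  local dir=$1
  shift
  echo "+ (cd $dir && $*)"
  if [ "$DRY_RUN" = "0" ]; then
    (cd "$dir" && "$@")
  fi
}

# Clone Bigtop, install the plugin Vagrant complained about, start the VM.
bigtop_up() {
  local vm_dir="$BIGTOP_DIR/bigtop-deploy/vm/vagrant-puppet"
  run_in . git clone https://github.com/apache/bigtop "$BIGTOP_DIR" &&
  run_in . vagrant plugin install vagrant-hostmanager &&
  run_in "$vm_dir" vagrant up
}

# Tear the VM down again when you are done.
bigtop_down() {
  run_in "$BIGTOP_DIR/bigtop-deploy/vm/vagrant-puppet" vagrant destroy
}
```

Sourcing the file and calling bigtop_up lists the three commands in order;
setting DRY_RUN=0 first makes it actually run them, which assumes git,
Vagrant and VirtualBox are already installed.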

Re: Hadoop Learning Environment

Posted by jay vyas <ja...@gmail.com>.
Hey ! I forgot, you have to run

 "vagrant plugin install hostmanager" ,

then it will work for you.

... (see
https://github.com/apache/bigtop/tree/master/bigtop-deploy/vm/vagrant-puppet/README.md
)
Thanks for noting this

On Wed, Nov 5, 2014 at 1:55 PM, Jim Shi <ha...@apple.com> wrote:

> Hi, Yay,
>    I followed the steps you described and got the following error.
> Any idea?
>
>   vagrant up
> creating provisioner directive for running tests
> Bringing machine 'bigtop1' up with 'virtualbox' provider...
> *==> bigtop1: Box 'puppetlab-centos-64-nocm' could not be found.
> Attempting to find and install...*
>     bigtop1: Box Provider: virtualbox
>     bigtop1: Box Version: >= 0
> *==> bigtop1: Adding box 'puppetlab-centos-64-nocm' (v0) for provider:
> virtualbox*
>     bigtop1: Downloading:
> http://puppet-vagrant-boxes.puppetlabs.com/centos-64-x64-vbox4210-nocm.box
> *==> bigtop1: Successfully added box 'puppetlab-centos-64-nocm' (v0) for
> 'virtualbox'!*
> There are errors in the configuration of this machine. Please fix
> the following errors and try again:
>
> vm:
> * The 'hostmanager' provisioner could not be found.
>
> Thanks
> Jim
>
>
>
>
>
> On Nov 4, 2014, at 6:36 PM, jay vyas <ja...@gmail.com> wrote:
>
> Hi daemon:  Actually, for most folks who would want to actually use a
> hadoop cluster,  i would think setting up bigtop is super easy ! If you
> have issues with it ping me and I can help you get started.
> Also, we have docker containers - so you dont even *need* a VM to run a 4
> or 5 node hadoop cluster.
>
> install vagrant
> install VirtualBox
> git clone https://github.com/apache/bigtop
> cd bigtop/bigtop-deploy/vm/vagrant-puppet
> vagrant up
> Then vagrant destroy when your done.
>
> This to me is easier than manually downloading an appliance, picking memory
> starting the virtualbox gui, loading the appliance , etc...  and also its
> easy to turn the simple single node bigtop VM into a multinode one,
> by just modifying the vagrantile.
>
>
> On Tue, Nov 4, 2014 at 5:32 PM, daemeon reiydelle <da...@gmail.com>
> wrote:
>
>> What you want as a sandbox depends on what you are trying to learn.
>>
>> If you are trying to learn to code in e.g PigLatin, Sqooz, or similar,
>> all of the suggestions (perhaps excluding BigTop due to its setup
>> complexities) are great. Laptop? perhaps but laptop's are really kind of
>> infuriatingly slow (because of the hardware - you pay a price for a
>> 30-45watt average heating bill). A laptop is an OK place to start if it is
>> e.g. an i5 or i7 with lots of memory. What do you think of the thought that
>> you will pretty quickly graduate to wanting a small'ish desktop for your
>> sandbox?
>>
>> A simple, single node, Hadoop instance will let you learn many things.
>> The next level of complexity comes when you are attempting to deal with
>> data whose processing needs to be split up, so you can learn about how to
>> split data in Mapping, reduce the splits via reduce jobs, etc. For that,
>> you could get a windows desktop box or e.g. RedHat/CentOS and use
>> virtualization. Something like a 4 core i5 with 32gb of memory, running 3
>> or for some things 4, vm's. You could load e.g. hortonworks into each of
>> the vm's and practice setting up a 3/4 way cluster. Throw in 2-3 1tb drives
>> off of eBay and you can have a lot of learning.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *.......“The race is not to the swift,nor the battle to the strong,but to
>> those who can see it coming and jump aside.” - Hunter ThompsonDaemeon*
>> On Tue, Nov 4, 2014 at 1:24 PM, oscar sumano <os...@gmail.com> wrote:
>>
>>> you can try the pivotal vm as well.
>>>
>>>
>>> http://pivotalhd.docs.pivotal.io/tutorial/getting-started/pivotalhd-vm.html
>>>
>>> On Tue, Nov 4, 2014 at 3:13 PM, Leonid Fedotov <lfedotov@hortonworks.com
>>> > wrote:
>>>
>>>> Tim,
>>>> download Sandbox from http://hortonworks/com
>>>> You will have everything needed in a small VM instance which will run
>>>> on your home desktop.
>>>>
>>>>
>>>> *Thank you!*
>>>>
>>>>
>>>> *Sincerely,*
>>>>
>>>> *Leonid Fedotov*
>>>>
>>>> Systems Architect - Professional Services
>>>>
>>>> lfedotov@hortonworks.com
>>>>
>>>> office: +1 855 846 7866 ext 292
>>>>
>>>> mobile: +1 650 430 1673
>>>>
>>>> On Tue, Nov 4, 2014 at 11:28 AM, Tim Dunphy <bl...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hey all,
>>>>>
>>>>>  I want to setup an environment where I can teach myself hadoop.
>>>>> Usually the way I'll handle this is to grab a machine off the Amazon free
>>>>> tier and setup whatever software I want.
>>>>>
>>>>> However I realize that Hadoop is a memory intensive, big data
>>>>> solution. So what I'm wondering is, would a t2.micro instance be sufficient
>>>>> for setting up a cluster of hadoop nodes with the intention of learning it?
>>>>> To keep things running longer in the free tier I would either setup however
>>>>> many nodes as I want and keep them stopped when I'm not actively using
>>>>> them. Or just setup a few nodes with a few different accounts (with a
>>>>> different gmail address for each one.. easy enough to do).
>>>>>
>>>>> Failing that, what are some other free/cheap solutions for setting up
>>>>> a hadoop learning environment?
>>>>>
>>>>> Thanks,
>>>>> Tim
>>>>>
>>>>> --
>>>>> GPG me!!
>>>>>
>>>>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>>>>
>>>>>
>>>>
>>>> CONFIDENTIALITY NOTICE
>>>> NOTICE: This message is intended for the use of the individual or
>>>> entity to which it is addressed and may contain information that is
>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>> If the reader of this message is not the intended recipient, you are hereby
>>>> notified that any printing, copying, dissemination, distribution,
>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>> you have received this communication in error, please contact the sender
>>>> immediately and delete it from your system. Thank You.
>>>
>>>
>>>
>>
>
>
> --
> jay vyas
>
>
>


-- 
jay vyas

Re: Hadoop Learning Environment

Posted by jay vyas <ja...@gmail.com>.
Hey ! I forgot, you have to run

 "vagrant plugin install hostmanager" ,

then it will work for you.

... (see
https://github.com/apache/bigtop/tree/master/bigtop-deploy/vm/vagrant-puppet/README.md
)
Thanks for noting this

On Wed, Nov 5, 2014 at 1:55 PM, Jim Shi <ha...@apple.com> wrote:

> Hi Jay,
>    I followed the steps you described and got the following error.
> Any idea?
>
>   vagrant up
> creating provisioner directive for running tests
> Bringing machine 'bigtop1' up with 'virtualbox' provider...
> ==> bigtop1: Box 'puppetlab-centos-64-nocm' could not be found. Attempting to find and install...
>     bigtop1: Box Provider: virtualbox
>     bigtop1: Box Version: >= 0
> ==> bigtop1: Adding box 'puppetlab-centos-64-nocm' (v0) for provider: virtualbox
>     bigtop1: Downloading: http://puppet-vagrant-boxes.puppetlabs.com/centos-64-x64-vbox4210-nocm.box
> ==> bigtop1: Successfully added box 'puppetlab-centos-64-nocm' (v0) for 'virtualbox'!
> There are errors in the configuration of this machine. Please fix
> the following errors and try again:
>
> vm:
> * The 'hostmanager' provisioner could not be found.
>
> Thanks
> Jim
>
>
>
>
>
> On Nov 4, 2014, at 6:36 PM, jay vyas <ja...@gmail.com> wrote:
>
> Hi Daemeon: Actually, for most folks who want to actually use a
> Hadoop cluster, I would think setting up Bigtop is super easy! If you
> have issues with it, ping me and I can help you get started.
> Also, we have Docker containers, so you don't even *need* a VM to run a 4
> or 5 node Hadoop cluster.
>
> install Vagrant
> install VirtualBox
> git clone https://github.com/apache/bigtop
> cd bigtop/bigtop-deploy/vm/vagrant-puppet
> vagrant up
> Then run "vagrant destroy" when you're done.
>
> To me this is easier than manually downloading an appliance, picking memory,
> starting the VirtualBox GUI, loading the appliance, etc. It is also
> easy to turn the simple single-node Bigtop VM into a multi-node one
> just by modifying the Vagrantfile.
>
>
> On Tue, Nov 4, 2014 at 5:32 PM, daemeon reiydelle <da...@gmail.com>
> wrote:
>
>> What you want as a sandbox depends on what you are trying to learn.
>>
>> If you are trying to learn to code in e.g. Pig Latin, Sqoop, or similar,
>> all of the suggestions (perhaps excluding Bigtop due to its setup
>> complexities) are great. Laptop? Perhaps, but laptops are really kind of
>> infuriatingly slow (because of the hardware: you pay a price for a
>> 30-45 watt average heating bill). A laptop is an OK place to start if it is
>> e.g. an i5 or i7 with lots of memory. What do you think of the thought that
>> you will pretty quickly graduate to wanting a smallish desktop for your
>> sandbox?
>>
>> A simple, single-node Hadoop instance will let you learn many things.
>> The next level of complexity comes when you are attempting to deal with
>> data whose processing needs to be split up, so you can learn how to
>> split data in the map phase, reduce the splits via reduce jobs, etc. For
>> that, you could get a Windows desktop box or e.g. RedHat/CentOS and use
>> virtualization. Something like a 4-core i5 with 32 GB of memory, running 3
>> or, for some things, 4 VMs. You could load e.g. Hortonworks into each of
>> the VMs and practice setting up a 3- or 4-node cluster. Throw in two or
>> three 1 TB drives off of eBay and you can have a lot of learning.
>>
>> .......
>> “The race is not to the swift,
>> nor the battle to the strong,
>> but to those who can see it coming and jump aside.” - Hunter Thompson
>> Daemeon
>>
>> On Tue, Nov 4, 2014 at 1:24 PM, oscar sumano <os...@gmail.com> wrote:
>>
>>> you can try the pivotal vm as well.
>>>
>>>
>>> http://pivotalhd.docs.pivotal.io/tutorial/getting-started/pivotalhd-vm.html
>>>
>>> On Tue, Nov 4, 2014 at 3:13 PM, Leonid Fedotov <lfedotov@hortonworks.com
>>> > wrote:
>>>
>>>> Tim,
>>>> download Sandbox from http://hortonworks.com
>>>> You will have everything needed in a small VM instance which will run
>>>> on your home desktop.
>>>>
>>>>
>>>> *Thank you!*
>>>>
>>>>
>>>> *Sincerely,*
>>>>
>>>> *Leonid Fedotov*
>>>>
>>>> Systems Architect - Professional Services
>>>>
>>>> lfedotov@hortonworks.com
>>>>
>>>> office: +1 855 846 7866 ext 292
>>>>
>>>> mobile: +1 650 430 1673
>>>>
>>>> On Tue, Nov 4, 2014 at 11:28 AM, Tim Dunphy <bl...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hey all,
>>>>>
>>>>>  I want to setup an environment where I can teach myself hadoop.
>>>>> Usually the way I'll handle this is to grab a machine off the Amazon free
>>>>> tier and setup whatever software I want.
>>>>>
>>>>> However I realize that Hadoop is a memory intensive, big data
>>>>> solution. So what I'm wondering is, would a t2.micro instance be sufficient
>>>>> for setting up a cluster of hadoop nodes with the intention of learning it?
>>>>> To keep things running longer in the free tier I would either setup however
>>>>> many nodes as I want and keep them stopped when I'm not actively using
>>>>> them. Or just setup a few nodes with a few different accounts (with a
>>>>> different gmail address for each one.. easy enough to do).
>>>>>
>>>>> Failing that, what are some other free/cheap solutions for setting up
>>>>> a hadoop learning environment?
>>>>>
>>>>> Thanks,
>>>>> Tim
>>>>>
>>>>> --
>>>>> GPG me!!
>>>>>
>>>>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
> --
> jay vyas
>
>
>


-- 
jay vyas

Re: Hadoop Learning Environment

Posted by Jim Shi <ha...@apple.com>.
Hi Jay,
   I followed the steps you described and got the following error.
Any idea?

  vagrant up
creating provisioner directive for running tests
Bringing machine 'bigtop1' up with 'virtualbox' provider...
==> bigtop1: Box 'puppetlab-centos-64-nocm' could not be found. Attempting to find and install...
    bigtop1: Box Provider: virtualbox
    bigtop1: Box Version: >= 0
==> bigtop1: Adding box 'puppetlab-centos-64-nocm' (v0) for provider: virtualbox
    bigtop1: Downloading: http://puppet-vagrant-boxes.puppetlabs.com/centos-64-x64-vbox4210-nocm.box
==> bigtop1: Successfully added box 'puppetlab-centos-64-nocm' (v0) for 'virtualbox'!
There are errors in the configuration of this machine. Please fix
the following errors and try again:

vm:
* The 'hostmanager' provisioner could not be found.

Thanks
Jim





On Nov 4, 2014, at 6:36 PM, jay vyas <ja...@gmail.com> wrote:

> Hi Daemeon: Actually, for most folks who want to actually use a Hadoop cluster, I would think setting up Bigtop is super easy! If you have issues with it, ping me and I can help you get started.
> Also, we have Docker containers, so you don't even *need* a VM to run a 4 or 5 node Hadoop cluster.
> 
> install Vagrant
> install VirtualBox
> git clone https://github.com/apache/bigtop
> cd bigtop/bigtop-deploy/vm/vagrant-puppet
> vagrant up
> Then run "vagrant destroy" when you're done.
> 
> To me this is easier than manually downloading an appliance, picking memory,
> starting the VirtualBox GUI, loading the appliance, etc. It is also easy to turn the simple single-node Bigtop VM into a multi-node one
> just by modifying the Vagrantfile.
> 
> 
> On Tue, Nov 4, 2014 at 5:32 PM, daemeon reiydelle <da...@gmail.com> wrote:
> What you want as a sandbox depends on what you are trying to learn.
> 
> If you are trying to learn to code in e.g. Pig Latin, Sqoop, or similar, all of the suggestions (perhaps excluding Bigtop due to its setup complexities) are great. Laptop? Perhaps, but laptops are really kind of infuriatingly slow (because of the hardware: you pay a price for a 30-45 watt average heating bill). A laptop is an OK place to start if it is e.g. an i5 or i7 with lots of memory. What do you think of the thought that you will pretty quickly graduate to wanting a smallish desktop for your sandbox?
> 
> A simple, single-node Hadoop instance will let you learn many things. The next level of complexity comes when you are attempting to deal with data whose processing needs to be split up, so you can learn how to split data in the map phase, reduce the splits via reduce jobs, etc. For that, you could get a Windows desktop box or e.g. RedHat/CentOS and use virtualization. Something like a 4-core i5 with 32 GB of memory, running 3 or, for some things, 4 VMs. You could load e.g. Hortonworks into each of the VMs and practice setting up a 3- or 4-node cluster. Throw in two or three 1 TB drives off of eBay and you can have a lot of learning.
> 
> 
> 
> 
> 
> .......
> “The race is not to the swift,
> nor the battle to the strong,
> but to those who can see it coming and jump aside.” - Hunter Thompson
> Daemeon
> 
> On Tue, Nov 4, 2014 at 1:24 PM, oscar sumano <os...@gmail.com> wrote:
> you can try the pivotal vm as well. 
> 
> http://pivotalhd.docs.pivotal.io/tutorial/getting-started/pivotalhd-vm.html
> 
> On Tue, Nov 4, 2014 at 3:13 PM, Leonid Fedotov <lf...@hortonworks.com> wrote:
> Tim,
> download Sandbox from http://hortonworks.com
> You will have everything needed in a small VM instance which will run on your home desktop.
> 
> 
> Thank you!
> 
> 
> 
> Sincerely,
> 
> Leonid Fedotov
> 
> Systems Architect - Professional Services
> 
> lfedotov@hortonworks.com
> 
> office: +1 855 846 7866 ext 292
> 
> mobile: +1 650 430 1673
> 
> 
> On Tue, Nov 4, 2014 at 11:28 AM, Tim Dunphy <bl...@gmail.com> wrote:
> Hey all,
> 
>  I want to setup an environment where I can teach myself hadoop. Usually the way I'll handle this is to grab a machine off the Amazon free tier and setup whatever software I want. 
> 
> However I realize that Hadoop is a memory intensive, big data solution. So what I'm wondering is, would a t2.micro instance be sufficient for setting up a cluster of hadoop nodes with the intention of learning it? To keep things running longer in the free tier I would either setup however many nodes as I want and keep them stopped when I'm not actively using them. Or just setup a few nodes with a few different accounts (with a different gmail address for each one.. easy enough to do).
> 
> Failing that, what are some other free/cheap solutions for setting up a hadoop learning environment?
> 
> Thanks,
> Tim
> 
> -- 
> GPG me!!
> 
> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
> 
> 
> 
> 
> 
> 
> 
> 
> -- 
> jay vyas


Re: Hadoop Learning Environment

Posted by Gavin Yue <yu...@gmail.com>.
Try docker!

http://ferry.opencore.io/en/latest/examples/hadoop.html



On Tue, Nov 4, 2014 at 6:36 PM, jay vyas <ja...@gmail.com>
wrote:

> Hi Daemeon: Actually, for most folks who want to actually use a
> Hadoop cluster, I would think setting up Bigtop is super easy! If you
> have issues with it, ping me and I can help you get started.
> Also, we have Docker containers, so you don't even *need* a VM to run a 4
> or 5 node Hadoop cluster.
>
> install Vagrant
> install VirtualBox
> git clone https://github.com/apache/bigtop
> cd bigtop/bigtop-deploy/vm/vagrant-puppet
> vagrant up
> Then run "vagrant destroy" when you're done.
>
> To me this is easier than manually downloading an appliance, picking memory,
> starting the VirtualBox GUI, loading the appliance, etc. It is also
> easy to turn the simple single-node Bigtop VM into a multi-node one
> just by modifying the Vagrantfile.
>
>
> On Tue, Nov 4, 2014 at 5:32 PM, daemeon reiydelle <da...@gmail.com>
> wrote:
>
>> What you want as a sandbox depends on what you are trying to learn.
>>
>> If you are trying to learn to code in e.g. Pig Latin, Sqoop, or similar,
>> all of the suggestions (perhaps excluding Bigtop due to its setup
>> complexities) are great. Laptop? Perhaps, but laptops are really kind of
>> infuriatingly slow (because of the hardware: you pay a price for a
>> 30-45 watt average heating bill). A laptop is an OK place to start if it is
>> e.g. an i5 or i7 with lots of memory. What do you think of the thought that
>> you will pretty quickly graduate to wanting a smallish desktop for your
>> sandbox?
>>
>> A simple, single-node Hadoop instance will let you learn many things.
>> The next level of complexity comes when you are attempting to deal with
>> data whose processing needs to be split up, so you can learn how to
>> split data in the map phase, reduce the splits via reduce jobs, etc. For
>> that, you could get a Windows desktop box or e.g. RedHat/CentOS and use
>> virtualization. Something like a 4-core i5 with 32 GB of memory, running 3
>> or, for some things, 4 VMs. You could load e.g. Hortonworks into each of
>> the VMs and practice setting up a 3- or 4-node cluster. Throw in two or
>> three 1 TB drives off of eBay and you can have a lot of learning.
>>
>> .......
>> “The race is not to the swift,
>> nor the battle to the strong,
>> but to those who can see it coming and jump aside.” - Hunter Thompson
>> Daemeon
>>
>> On Tue, Nov 4, 2014 at 1:24 PM, oscar sumano <os...@gmail.com> wrote:
>>
>>> you can try the pivotal vm as well.
>>>
>>>
>>> http://pivotalhd.docs.pivotal.io/tutorial/getting-started/pivotalhd-vm.html
>>>
>>> On Tue, Nov 4, 2014 at 3:13 PM, Leonid Fedotov <lfedotov@hortonworks.com
>>> > wrote:
>>>
>>>> Tim,
>>>> download Sandbox from http://hortonworks.com
>>>> You will have everything needed in a small VM instance which will run
>>>> on your home desktop.
>>>>
>>>>
>>>> *Thank you!*
>>>>
>>>>
>>>> *Sincerely,*
>>>>
>>>> *Leonid Fedotov*
>>>>
>>>> Systems Architect - Professional Services
>>>>
>>>> lfedotov@hortonworks.com
>>>>
>>>> office: +1 855 846 7866 ext 292
>>>>
>>>> mobile: +1 650 430 1673
>>>>
>>>> On Tue, Nov 4, 2014 at 11:28 AM, Tim Dunphy <bl...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hey all,
>>>>>
>>>>>  I want to setup an environment where I can teach myself hadoop.
>>>>> Usually the way I'll handle this is to grab a machine off the Amazon free
>>>>> tier and setup whatever software I want.
>>>>>
>>>>> However I realize that Hadoop is a memory intensive, big data
>>>>> solution. So what I'm wondering is, would a t2.micro instance be sufficient
>>>>> for setting up a cluster of hadoop nodes with the intention of learning it?
>>>>> To keep things running longer in the free tier I would either setup however
>>>>> many nodes as I want and keep them stopped when I'm not actively using
>>>>> them. Or just setup a few nodes with a few different accounts (with a
>>>>> different gmail address for each one.. easy enough to do).
>>>>>
>>>>> Failing that, what are some other free/cheap solutions for setting up
>>>>> a hadoop learning environment?
>>>>>
>>>>> Thanks,
>>>>> Tim
>>>>>
>>>>> --
>>>>> GPG me!!
>>>>>
>>>>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>>>>
>>>>>
>>>>
>>>> CONFIDENTIALITY NOTICE
>>>> NOTICE: This message is intended for the use of the individual or
>>>> entity to which it is addressed and may contain information that is
>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>> If the reader of this message is not the intended recipient, you are hereby
>>>> notified that any printing, copying, dissemination, distribution,
>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>> you have received this communication in error, please contact the sender
>>>> immediately and delete it from your system. Thank You.
>>>
>>>
>>>
>>
>
>
> --
> jay vyas
>

Re: Hadoop Learning Environment

Posted by Jim Shi <ha...@apple.com>.
Hi Jay,
   I followed the steps you described and got the following error.
Any idea?

  vagrant up
creating provisioner directive for running tests
Bringing machine 'bigtop1' up with 'virtualbox' provider...
==> bigtop1: Box 'puppetlab-centos-64-nocm' could not be found. Attempting to find and install...
    bigtop1: Box Provider: virtualbox
    bigtop1: Box Version: >= 0
==> bigtop1: Adding box 'puppetlab-centos-64-nocm' (v0) for provider: virtualbox
    bigtop1: Downloading: http://puppet-vagrant-boxes.puppetlabs.com/centos-64-x64-vbox4210-nocm.box
==> bigtop1: Successfully added box 'puppetlab-centos-64-nocm' (v0) for 'virtualbox'!
There are errors in the configuration of this machine. Please fix
the following errors and try again:

vm:
* The 'hostmanager' provisioner could not be found.

Thanks
Jim
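
A note on the error above: a "provisioner could not be found" message generally means the Vagrantfile refers to a provisioner that comes from a plugin which is not installed. Assuming the missing piece here is the vagrant-hostmanager plugin (an inference from the provisioner name; nobody in this thread confirms it), a minimal sketch of checking for it and installing it could look like this. The has_plugin helper is hypothetical, not part of vagrant.

```shell
#!/bin/sh
# has_plugin NAME: reads `vagrant plugin list` output on stdin and
# succeeds if a line starts with NAME. (Hypothetical helper.)
has_plugin() {
    grep -q "^$1 "
}

if command -v vagrant >/dev/null 2>&1; then
    if vagrant plugin list | has_plugin vagrant-hostmanager; then
        echo "vagrant-hostmanager already installed"
    else
        # Assumption: this plugin supplies the 'hostmanager' provisioner.
        vagrant plugin install vagrant-hostmanager
    fi
else
    echo "vagrant not found on PATH"
fi
```

After the plugin is installed, re-running vagrant up from the same directory should get past the configuration check, if that assumption holds.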





On Nov 4, 2014, at 6:36 PM, jay vyas <ja...@gmail.com> wrote:

> Hi daemon:  Actually, for most folks who would want to actually use a hadoop cluster,  i would think setting up bigtop is super easy ! If you have issues with it ping me and I can help you get started.
> Also, we have docker containers - so you dont even *need* a VM to run a 4 or 5 node hadoop cluster.
> 
> install vagrant
> install VirtualBox
> git clone https://github.com/apache/bigtop
> cd bigtop/bigtop-deploy/vm/vagrant-puppet
> vagrant up
> Then vagrant destroy when your done.
> 
> This to me is easier than manually downloading an appliance, picking memory
> starting the virtualbox gui, loading the appliance , etc...  and also its easy to turn the simple single node bigtop VM into a multinode one, 
> by just modifying the vagrantile. 
> 
> 
> On Tue, Nov 4, 2014 at 5:32 PM, daemeon reiydelle <da...@gmail.com> wrote:
> What you want as a sandbox depends on what you are trying to learn. 
> 
> If you are trying to learn to code in e.g PigLatin, Sqooz, or similar, all of the suggestions (perhaps excluding BigTop due to its setup complexities) are great. Laptop? perhaps but laptop's are really kind of infuriatingly slow (because of the hardware - you pay a price for a 30-45watt average heating bill). A laptop is an OK place to start if it is e.g. an i5 or i7 with lots of memory. What do you think of the thought that you will pretty quickly graduate to wanting a small'ish desktop for your sandbox?
> 
> A simple, single node, Hadoop instance will let you learn many things. The next level of complexity comes when you are attempting to deal with data whose processing needs to be split up, so you can learn about how to split data in Mapping, reduce the splits via reduce jobs, etc. For that, you could get a windows desktop box or e.g. RedHat/CentOS and use virtualization. Something like a 4 core i5 with 32gb of memory, running 3 or for some things 4, vm's. You could load e.g. hortonworks into each of the vm's and practice setting up a 3/4 way cluster. Throw in 2-3 1tb drives off of eBay and you can have a lot of learning. 
> 
> 
> 
> 
> 
> .......
> “The race is not to the swift,
> nor the battle to the strong,
> but to those who can see it coming and jump aside.” - Hunter Thompson
> Daemeon
> 
> On Tue, Nov 4, 2014 at 1:24 PM, oscar sumano <os...@gmail.com> wrote:
> you can try the pivotal vm as well. 
> 
> http://pivotalhd.docs.pivotal.io/tutorial/getting-started/pivotalhd-vm.html
> 
> On Tue, Nov 4, 2014 at 3:13 PM, Leonid Fedotov <lf...@hortonworks.com> wrote:
> Tim,
> download Sandbox from http://hortonworks.com
> You will have everything needed in a small VM instance which will run on your home desktop.
> 
> 
> Thank you!
> 
> 
> 
> Sincerely,
> 
> Leonid Fedotov
> 
> Systems Architect - Professional Services
> 
> lfedotov@hortonworks.com
> 
> office: +1 855 846 7866 ext 292
> 
> mobile: +1 650 430 1673
> 
> 
> On Tue, Nov 4, 2014 at 11:28 AM, Tim Dunphy <bl...@gmail.com> wrote:
> Hey all,
> 
>  I want to setup an environment where I can teach myself hadoop. Usually the way I'll handle this is to grab a machine off the Amazon free tier and setup whatever software I want. 
> 
> However I realize that Hadoop is a memory intensive, big data solution. So what I'm wondering is, would a t2.micro instance be sufficient for setting up a cluster of hadoop nodes with the intention of learning it? To keep things running longer in the free tier I would either setup however many nodes as I want and keep them stopped when I'm not actively using them. Or just setup a few nodes with a few different accounts (with a different gmail address for each one.. easy enough to do).
> 
> Failing that, what are some other free/cheap solutions for setting up a hadoop learning environment?
> 
> Thanks,
> Tim
> 
> -- 
> GPG me!!
> 
> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
> 
> 
> 
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
> 
> 
> 
> 
> 
> -- 
> jay vyas


Re: Hadoop Learning Environment

Posted by Gavin Yue <yu...@gmail.com>.
Try docker!

http://ferry.opencore.io/en/latest/examples/hadoop.html



On Tue, Nov 4, 2014 at 6:36 PM, jay vyas <ja...@gmail.com>
wrote:

> Hi daemon:  Actually, for most folks who would want to actually use a
> hadoop cluster,  i would think setting up bigtop is super easy ! If you
> have issues with it ping me and I can help you get started.
> Also, we have docker containers - so you dont even *need* a VM to run a 4
> or 5 node hadoop cluster.
>
> install vagrant
> install VirtualBox
> git clone https://github.com/apache/bigtop
> cd bigtop/bigtop-deploy/vm/vagrant-puppet
> vagrant up
> Then vagrant destroy when your done.
>
> This to me is easier than manually downloading an appliance, picking memory
> starting the virtualbox gui, loading the appliance , etc...  and also its
> easy to turn the simple single node bigtop VM into a multinode one,
> by just modifying the vagrantile.
>
>
> On Tue, Nov 4, 2014 at 5:32 PM, daemeon reiydelle <da...@gmail.com>
> wrote:
>
>> What you want as a sandbox depends on what you are trying to learn.
>>
>> If you are trying to learn to code in e.g PigLatin, Sqooz, or similar,
>> all of the suggestions (perhaps excluding BigTop due to its setup
>> complexities) are great. Laptop? perhaps but laptop's are really kind of
>> infuriatingly slow (because of the hardware - you pay a price for a
>> 30-45watt average heating bill). A laptop is an OK place to start if it is
>> e.g. an i5 or i7 with lots of memory. What do you think of the thought that
>> you will pretty quickly graduate to wanting a small'ish desktop for your
>> sandbox?
>>
>> A simple, single node, Hadoop instance will let you learn many things.
>> The next level of complexity comes when you are attempting to deal with
>> data whose processing needs to be split up, so you can learn about how to
>> split data in Mapping, reduce the splits via reduce jobs, etc. For that,
>> you could get a windows desktop box or e.g. RedHat/CentOS and use
>> virtualization. Something like a 4 core i5 with 32gb of memory, running 3
>> or for some things 4, vm's. You could load e.g. hortonworks into each of
>> the vm's and practice setting up a 3/4 way cluster. Throw in 2-3 1tb drives
>> off of eBay and you can have a lot of learning.
>>
>> .......
>> “The race is not to the swift,
>> nor the battle to the strong,
>> but to those who can see it coming and jump aside.” - Hunter Thompson
>> Daemeon
>>
>> On Tue, Nov 4, 2014 at 1:24 PM, oscar sumano <os...@gmail.com> wrote:
>>
>>> you can try the pivotal vm as well.
>>>
>>>
>>> http://pivotalhd.docs.pivotal.io/tutorial/getting-started/pivotalhd-vm.html
>>>
>>> On Tue, Nov 4, 2014 at 3:13 PM, Leonid Fedotov <lfedotov@hortonworks.com
>>> > wrote:
>>>
>>>> Tim,
>>>> download Sandbox from http://hortonworks.com
>>>> You will have everything needed in a small VM instance which will run
>>>> on your home desktop.
>>>>
>>>>
>>>> *Thank you!*
>>>>
>>>>
>>>> *Sincerely,*
>>>>
>>>> *Leonid Fedotov*
>>>>
>>>> Systems Architect - Professional Services
>>>>
>>>> lfedotov@hortonworks.com
>>>>
>>>> office: +1 855 846 7866 ext 292
>>>>
>>>> mobile: +1 650 430 1673
>>>>
>>>> On Tue, Nov 4, 2014 at 11:28 AM, Tim Dunphy <bl...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hey all,
>>>>>
>>>>>  I want to setup an environment where I can teach myself hadoop.
>>>>> Usually the way I'll handle this is to grab a machine off the Amazon free
>>>>> tier and setup whatever software I want.
>>>>>
>>>>> However I realize that Hadoop is a memory intensive, big data
>>>>> solution. So what I'm wondering is, would a t2.micro instance be sufficient
>>>>> for setting up a cluster of hadoop nodes with the intention of learning it?
>>>>> To keep things running longer in the free tier I would either setup however
>>>>> many nodes as I want and keep them stopped when I'm not actively using
>>>>> them. Or just setup a few nodes with a few different accounts (with a
>>>>> different gmail address for each one.. easy enough to do).
>>>>>
>>>>> Failing that, what are some other free/cheap solutions for setting up
>>>>> a hadoop learning environment?
>>>>>
>>>>> Thanks,
>>>>> Tim
>>>>>
>>>>> --
>>>>> GPG me!!
>>>>>
>>>>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>>>>
>>>>>
>>>>
>>>> CONFIDENTIALITY NOTICE
>>>> NOTICE: This message is intended for the use of the individual or
>>>> entity to which it is addressed and may contain information that is
>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>> If the reader of this message is not the intended recipient, you are hereby
>>>> notified that any printing, copying, dissemination, distribution,
>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>> you have received this communication in error, please contact the sender
>>>> immediately and delete it from your system. Thank You.
>>>
>>>
>>>
>>
>
>
> --
> jay vyas
>

Re: Hadoop Learning Environment

Posted by jay vyas <ja...@gmail.com>.
Hi Daemeon: Actually, for most folks who would want to actually use a
hadoop cluster, I would think setting up bigtop is super easy! If you
have issues with it, ping me and I can help you get started.
Also, we have docker containers - so you don't even *need* a VM to run a 4
or 5 node hadoop cluster.

install vagrant
install VirtualBox
git clone https://github.com/apache/bigtop
cd bigtop/bigtop-deploy/vm/vagrant-puppet
vagrant up
Then vagrant destroy when you're done.

This to me is easier than manually downloading an appliance, picking memory,
starting the VirtualBox GUI, loading the appliance, etc... and also it's
easy to turn the simple single-node bigtop VM into a multinode one,
by just modifying the Vagrantfile.
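
For anyone who wants the steps above as a script, here is a rough sketch. The repository URL and directory are the ones given in this message and may have moved in later Bigtop releases. The DRY_RUN switch and the function names are made up for illustration. By default the script only prints what it would run.

```shell
#!/bin/sh
# Dry-run wrapper: prints each command unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Clone Bigtop and bring up the single-node VM (path as given in the post).
bigtop_vm_up() {
    run git clone https://github.com/apache/bigtop &&
    run cd bigtop/bigtop-deploy/vm/vagrant-puppet &&
    run vagrant up
}

# Tear the VM down; run from the directory used for `vagrant up`.
# -f skips the confirmation prompt.
bigtop_vm_down() {
    run vagrant destroy -f
}

bigtop_vm_up
```

Setting DRY_RUN=0 executes the commands for real, so vagrant and VirtualBox need to be installed first. Call bigtop_vm_down when finished to free the disk and memory.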


On Tue, Nov 4, 2014 at 5:32 PM, daemeon reiydelle <da...@gmail.com>
wrote:

> What you want as a sandbox depends on what you are trying to learn.
>
> If you are trying to learn to code in e.g PigLatin, Sqooz, or similar, all
> of the suggestions (perhaps excluding BigTop due to its setup complexities)
> are great. Laptop? perhaps but laptop's are really kind of infuriatingly
> slow (because of the hardware - you pay a price for a 30-45watt average
> heating bill). A laptop is an OK place to start if it is e.g. an i5 or i7
> with lots of memory. What do you think of the thought that you will pretty
> quickly graduate to wanting a small'ish desktop for your sandbox?
>
> A simple, single node, Hadoop instance will let you learn many things. The
> next level of complexity comes when you are attempting to deal with data
> whose processing needs to be split up, so you can learn about how to split
> data in Mapping, reduce the splits via reduce jobs, etc. For that, you
> could get a windows desktop box or e.g. RedHat/CentOS and use
> virtualization. Something like a 4 core i5 with 32gb of memory, running 3
> or for some things 4, vm's. You could load e.g. hortonworks into each of
> the vm's and practice setting up a 3/4 way cluster. Throw in 2-3 1tb drives
> off of eBay and you can have a lot of learning.
>
> .......
> “The race is not to the swift,
> nor the battle to the strong,
> but to those who can see it coming and jump aside.” - Hunter Thompson
> Daemeon
>
> On Tue, Nov 4, 2014 at 1:24 PM, oscar sumano <os...@gmail.com> wrote:
>
>> you can try the pivotal vm as well.
>>
>>
>> http://pivotalhd.docs.pivotal.io/tutorial/getting-started/pivotalhd-vm.html
>>
>> On Tue, Nov 4, 2014 at 3:13 PM, Leonid Fedotov <lf...@hortonworks.com>
>> wrote:
>>
>>> Tim,
>>> download Sandbox from http://hortonworks.com
>>> You will have everything needed in a small VM instance which will run on
>>> your home desktop.
>>>
>>>
>>> *Thank you!*
>>>
>>>
>>> *Sincerely,*
>>>
>>> *Leonid Fedotov*
>>>
>>> Systems Architect - Professional Services
>>>
>>> lfedotov@hortonworks.com
>>>
>>> office: +1 855 846 7866 ext 292
>>>
>>> mobile: +1 650 430 1673
>>>
>>> On Tue, Nov 4, 2014 at 11:28 AM, Tim Dunphy <bl...@gmail.com>
>>> wrote:
>>>
>>>> Hey all,
>>>>
>>>>  I want to setup an environment where I can teach myself hadoop.
>>>> Usually the way I'll handle this is to grab a machine off the Amazon free
>>>> tier and setup whatever software I want.
>>>>
>>>> However I realize that Hadoop is a memory intensive, big data solution.
>>>> So what I'm wondering is, would a t2.micro instance be sufficient for
>>>> setting up a cluster of hadoop nodes with the intention of learning it? To
>>>> keep things running longer in the free tier I would either setup however
>>>> many nodes as I want and keep them stopped when I'm not actively using
>>>> them. Or just setup a few nodes with a few different accounts (with a
>>>> different gmail address for each one.. easy enough to do).
>>>>
>>>> Failing that, what are some other free/cheap solutions for setting up a
>>>> hadoop learning environment?
>>>>
>>>> Thanks,
>>>> Tim
>>>>
>>>> --
>>>> GPG me!!
>>>>
>>>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>>>
>>>>
>>>
>>> CONFIDENTIALITY NOTICE
>>> NOTICE: This message is intended for the use of the individual or entity
>>> to which it is addressed and may contain information that is confidential,
>>> privileged and exempt from disclosure under applicable law. If the reader
>>> of this message is not the intended recipient, you are hereby notified that
>>> any printing, copying, dissemination, distribution, disclosure or
>>> forwarding of this communication is strictly prohibited. If you have
>>> received this communication in error, please contact the sender immediately
>>> and delete it from your system. Thank You.
>>
>>
>>
>


-- 
jay vyas

Re: Hadoop Learning Environment

Posted by daemeon reiydelle <da...@gmail.com>.
What you want as a sandbox depends on what you are trying to learn.

If you are trying to learn to code in e.g. Pig Latin, Sqoop, or similar, all
of the suggestions (perhaps excluding BigTop due to its setup complexities)
are great. Laptop? Perhaps, but laptops are really kind of infuriatingly
slow (because of the hardware - you pay a price for a 30-45 watt average
heating bill). A laptop is an OK place to start if it is e.g. an i5 or i7
with lots of memory. What do you think of the thought that you will pretty
quickly graduate to wanting a smallish desktop for your sandbox?

A simple, single-node Hadoop instance will let you learn many things. The
next level of complexity comes when you are attempting to deal with data
whose processing needs to be split up, so you can learn about how to split
data in the map phase, reduce the splits via reduce jobs, etc. For that, you
could get a Windows desktop box or e.g. RedHat/CentOS and use
virtualization. Something like a 4-core i5 with 32 GB of memory, running 3
or, for some things, 4 VMs. You could load e.g. Hortonworks into each of
the VMs and practice setting up a 3- or 4-node cluster. Throw in 2-3 1 TB
drives off of eBay and you can have a lot of learning.
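
To make the split/map/reduce idea concrete before any cluster exists, here is a toy word count using ordinary shell tools. It is only an analogy, not Hadoop itself, and the function names are invented for this sketch: the map step emits one word per line, the sort plays the role of the shuffle that groups equal keys together, and the reduce step counts each group.

```shell
#!/bin/sh
# Map: turn each input line into one word per line, dropping empty lines.
map_words() {
    tr -s '[:space:]' '\n' | sed '/^$/d'
}

# Shuffle: bring identical keys next to each other.
shuffle() {
    LC_ALL=C sort
}

# Reduce: count each run of identical keys, printed as "word<TAB>count".
reduce_counts() {
    uniq -c | awk '{ print $2 "\t" $1 }'
}

printf 'to be or not to be\n' | map_words | shuffle | reduce_counts
```

On a real cluster the same three roles are spread across machines, which is where the multi-VM setup described above starts to matter.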











.......
“The race is not to the swift,
nor the battle to the strong,
but to those who can see it coming and jump aside.” - Hunter Thompson
Daemeon

On Tue, Nov 4, 2014 at 1:24 PM, oscar sumano <os...@gmail.com> wrote:

> you can try the pivotal vm as well.
>
> http://pivotalhd.docs.pivotal.io/tutorial/getting-started/pivotalhd-vm.html
>
> On Tue, Nov 4, 2014 at 3:13 PM, Leonid Fedotov <lf...@hortonworks.com>
> wrote:
>
>> Tim,
>> download Sandbox from http://hortonworks.com
>> You will have everything needed in a small VM instance which will run on
>> your home desktop.
>>
>>
>> *Thank you!*
>>
>>
>> *Sincerely,*
>>
>> *Leonid Fedotov*
>>
>> Systems Architect - Professional Services
>>
>> lfedotov@hortonworks.com
>>
>> office: +1 855 846 7866 ext 292
>>
>> mobile: +1 650 430 1673
>>
>> On Tue, Nov 4, 2014 at 11:28 AM, Tim Dunphy <bl...@gmail.com> wrote:
>>
>>> Hey all,
>>>
>>>  I want to setup an environment where I can teach myself hadoop. Usually
>>> the way I'll handle this is to grab a machine off the Amazon free tier and
>>> setup whatever software I want.
>>>
>>> However I realize that Hadoop is a memory intensive, big data solution.
>>> So what I'm wondering is, would a t2.micro instance be sufficient for
>>> setting up a cluster of hadoop nodes with the intention of learning it? To
>>> keep things running longer in the free tier I would either setup however
>>> many nodes as I want and keep them stopped when I'm not actively using
>>> them. Or just setup a few nodes with a few different accounts (with a
>>> different gmail address for each one.. easy enough to do).
>>>
>>> Failing that, what are some other free/cheap solutions for setting up a
>>> hadoop learning environment?
>>>
>>> Thanks,
>>> Tim
>>>
>>> --
>>> GPG me!!
>>>
>>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>>
>>>
>>
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity
>> to which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender immediately
>> and delete it from your system. Thank You.
>
>
>

Re: Hadoop Learning Environment

Posted by oscar sumano <os...@gmail.com>.
You can try the Pivotal VM as well.

http://pivotalhd.docs.pivotal.io/tutorial/getting-started/pivotalhd-vm.html

On Tue, Nov 4, 2014 at 3:13 PM, Leonid Fedotov <lf...@hortonworks.com>
wrote:

> Tim,
> download Sandbox from http://hortonworks.com
> You will have everything needed in a small VM instance which will run on
> your home desktop.
>
>
> *Thank you!*
>
>
> *Sincerely,*
>
> *Leonid Fedotov*
>
> Systems Architect - Professional Services
>
> lfedotov@hortonworks.com
>
> office: +1 855 846 7866 ext 292
>
> mobile: +1 650 430 1673
>
> On Tue, Nov 4, 2014 at 11:28 AM, Tim Dunphy <bl...@gmail.com> wrote:
>
>> Hey all,
>>
>>  I want to setup an environment where I can teach myself hadoop. Usually
>> the way I'll handle this is to grab a machine off the Amazon free tier and
>> setup whatever software I want.
>>
>> However I realize that Hadoop is a memory intensive, big data solution.
>> So what I'm wondering is, would a t2.micro instance be sufficient for
>> setting up a cluster of hadoop nodes with the intention of learning it? To
>> keep things running longer in the free tier I would either setup however
>> many nodes as I want and keep them stopped when I'm not actively using
>> them. Or just setup a few nodes with a few different accounts (with a
>> different gmail address for each one.. easy enough to do).
>>
>> Failing that, what are some other free/cheap solutions for setting up a
>> hadoop learning environment?
>>
>> Thanks,
>> Tim
>>
>> --
>> GPG me!!
>>
>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>
>>
>
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity
> to which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.

Re: Hadoop Learning Environment

Posted by Leonid Fedotov <lf...@hortonworks.com>.
Tim,
download Sandbox from http://hortonworks.com
You will have everything needed in a small VM instance which will run on
your home desktop.


*Thank you!*


*Sincerely,*

*Leonid Fedotov*

Systems Architect - Professional Services

lfedotov@hortonworks.com

office: +1 855 846 7866 ext 292

mobile: +1 650 430 1673

On Tue, Nov 4, 2014 at 11:28 AM, Tim Dunphy <bl...@gmail.com> wrote:

> Hey all,
>
>  I want to setup an environment where I can teach myself hadoop. Usually
> the way I'll handle this is to grab a machine off the Amazon free tier and
> setup whatever software I want.
>
> However I realize that Hadoop is a memory intensive, big data solution. So
> what I'm wondering is, would a t2.micro instance be sufficient for setting
> up a cluster of hadoop nodes with the intention of learning it? To keep
> things running longer in the free tier I would either setup however many
> nodes as I want and keep them stopped when I'm not actively using them. Or
> just setup a few nodes with a few different accounts (with a different
> gmail address for each one.. easy enough to do).
>
> Failing that, what are some other free/cheap solutions for setting up a
> hadoop learning environment?
>
> Thanks,
> Tim
>
> --
> GPG me!!
>
> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>
>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Hadoop Learning Environment

Posted by Jim Colestock <jc...@ramblingredneck.com>.
Hello Tim, 

Hortonworks and Cloudera both offer VMs (including for VirtualBox, which is free) you can pull down to play with, if you’re just looking for something small to get you started. I’m partial to the Hortonworks one myself.

Hope that helps.

JC



> On Nov 4, 2014, at 2:28 PM, Tim Dunphy <bl...@gmail.com> wrote:
> 
> Hey all,
> 
>  I want to setup an environment where I can teach myself hadoop. Usually the way I'll handle this is to grab a machine off the Amazon free tier and setup whatever software I want. 
> 
> However I realize that Hadoop is a memory intensive, big data solution. So what I'm wondering is, would a t2.micro instance be sufficient for setting up a cluster of hadoop nodes with the intention of learning it? To keep things running longer in the free tier I would either setup however many nodes as I want and keep them stopped when I'm not actively using them. Or just setup a few nodes with a few different accounts (with a different gmail address for each one.. easy enough to do).
> 
> Failing that, what are some other free/cheap solutions for setting up a hadoop learning environment?
> 
> Thanks,
> Tim
> 
> -- 
> GPG me!!
> 
> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
> 


Re: Hadoop Learning Environment

Posted by jay vyas <ja...@gmail.com>.
Hi Tim. I'd suggest using Apache Bigtop for this.

Bigtop integrates the Hadoop ecosystem into a single upstream distribution,
packages everything, and curates smoke tests plus Vagrant and Docker recipes
for deployment.
Also, we curate a blueprint Hadoop application (BigPetStore) which you
build yourself, easily, and can run to generate, process, and visualize the
big data ecosystem.

You can also easily deploy Bigtop onto EC2 if you want to pay for it.




On Tue, Nov 4, 2014 at 2:28 PM, Tim Dunphy <bl...@gmail.com> wrote:

> Hey all,
>
>  I want to setup an environment where I can teach myself hadoop. Usually
> the way I'll handle this is to grab a machine off the Amazon free tier and
> setup whatever software I want.
>
> However I realize that Hadoop is a memory intensive, big data solution. So
> what I'm wondering is, would a t2.micro instance be sufficient for setting
> up a cluster of hadoop nodes with the intention of learning it? To keep
> things running longer in the free tier I would either setup however many
> nodes as I want and keep them stopped when I'm not actively using them. Or
> just setup a few nodes with a few different accounts (with a different
> gmail address for each one.. easy enough to do).
>
> Failing that, what are some other free/cheap solutions for setting up a
> hadoop learning environment?
>
> Thanks,
> Tim
>
> --
> GPG me!!
>
> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>
>


-- 
jay vyas

Re: Hadoop Learning Environment

Posted by Leonid Fedotov <lf...@hortonworks.com>.
Tim,
download Sandbox from http://hortonworks/com
You will have everything needed in a small VM instance which will run on
your home desktop.


*Thank you!*


*Sincerely,*

*Leonid Fedotov*

Systems Architect - Professional Services

lfedotov@hortonworks.com

office: +1 855 846 7866 ext 292

mobile: +1 650 430 1673

On Tue, Nov 4, 2014 at 11:28 AM, Tim Dunphy <bl...@gmail.com> wrote:

> Hey all,
>
>  I want to setup an environment where I can teach myself hadoop. Usually
> the way I'll handle this is to grab a machine off the Amazon free tier and
> setup whatever software I want.
>
> However I realize that Hadoop is a memory intensive, big data solution. So
> what I'm wondering is, would a t2.micro instance be sufficient for setting
> up a cluster of hadoop nodes with the intention of learning it? To keep
> things running longer in the free tier I would either setup however many
> nodes as I want and keep them stopped when I'm not actively using them. Or
> just setup a few nodes with a few different accounts (with a different
> gmail address for each one.. easy enough to do).
>
> Failing that, what are some other free/cheap solutions for setting up a
> hadoop learning environment?
>
> Thanks,
> Tim
>
> --
> GPG me!!
>
> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>
>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Hadoop Learning Environment

Posted by Jim Colestock <jc...@ramblingredneck.com>.
Hello Tim, 

Horton and Cloudera both offer VM’s (Including Virtual box, which is free) you can pull down to play with, if you’re looking just for something small to get you started.  i’m partial to the horton works one myself. 

Hope that help. 

JC



> On Nov 4, 2014, at 2:28 PM, Tim Dunphy <bl...@gmail.com> wrote:
> 
> Hey all,
> 
>  I want to setup an environment where I can teach myself hadoop. Usually the way I'll handle this is to grab a machine off the Amazon free tier and setup whatever software I want. 
> 
> However I realize that Hadoop is a memory intensive, big data solution. So what I'm wondering is, would a t2.micro instance be sufficient for setting up a cluster of hadoop nodes with the intention of learning it? To keep things running longer in the free tier I would either setup however many nodes as I want and keep them stopped when I'm not actively using them. Or just setup a few nodes with a few different accounts (with a different gmail address for each one.. easy enough to do).
> 
> Failing that, what are some other free/cheap solutions for setting up a hadoop learning environment?
> 
> Thanks,
> Tim
> 
> -- 
> GPG me!!
> 
> gpg --keyserver pool.sks-keyservers.net <http://pool.sks-keyservers.net/> --recv-keys F186197B
> 


Re: Hadoop Learning Environment

Posted by Leonid Fedotov <lf...@hortonworks.com>.
Tim,
download Sandbox from http://hortonworks/com
You will have everything needed in a small VM instance which will run on
your home desktop.


*Thank you!*


*Sincerely,*

*Leonid Fedotov*

Systems Architect - Professional Services

lfedotov@hortonworks.com

office: +1 855 846 7866 ext 292

mobile: +1 650 430 1673

On Tue, Nov 4, 2014 at 11:28 AM, Tim Dunphy <bl...@gmail.com> wrote:

> Hey all,
>
>  I want to setup an environment where I can teach myself hadoop. Usually
> the way I'll handle this is to grab a machine off the Amazon free tier and
> setup whatever software I want.
>
> However I realize that Hadoop is a memory intensive, big data solution. So
> what I'm wondering is, would a t2.micro instance be sufficient for setting
> up a cluster of hadoop nodes with the intention of learning it? To keep
> things running longer in the free tier I would either setup however many
> nodes as I want and keep them stopped when I'm not actively using them. Or
> just setup a few nodes with a few different accounts (with a different
> gmail address for each one.. easy enough to do).
>
> Failing that, what are some other free/cheap solutions for setting up a
> hadoop learning environment?
>
> Thanks,
> Tim
>
> --
> GPG me!!
>
> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>
>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Hadoop Learning Environment

Posted by Jim Colestock <jc...@ramblingredneck.com>.
Hello Tim, 

Hortonworks and Cloudera both offer VMs (including for VirtualBox, which is free) that you can pull down to play with, if you’re just looking for something small to get you started. I’m partial to the Hortonworks one myself.

Hope that helps.

JC





Re: Hadoop Learning Environment

Posted by Leonid Fedotov <lf...@hortonworks.com>.
Tim,
download the Sandbox from http://hortonworks.com
You will have everything needed in a small VM instance which will run on
your home desktop.


*Thank you!*


*Sincerely,*

*Leonid Fedotov*

Systems Architect - Professional Services

lfedotov@hortonworks.com

office: +1 855 846 7866 ext 292

mobile: +1 650 430 1673



Re: Hadoop Learning Environment

Posted by jay vyas <ja...@gmail.com>.
Hi Tim. I'd suggest using Apache Bigtop for this.

Bigtop integrates the Hadoop ecosystem into a single upstream distribution,
packages everything, and curates smoke tests along with Vagrant and Docker
recipes for deployment.
Also, we curate a blueprint Hadoop application (BigPetStore) which you can
easily build yourself and run to generate, process, and visualize data
across the big data ecosystem.

You can also easily deploy Bigtop onto EC2 if you want to pay for it.






-- 
jay vyas
