Posted to user@whirr.apache.org by John Conwell <jo...@iamjohn.me> on 2011/10/05 18:38:10 UTC

AMIs to use when creating hadoop cluster with whirr

I'm having stability issues (data nodes constantly failing under very little
load) on the hadoop clusters I'm creating, and I'm trying to figure out the
best practice for creating the most stable hadoop environment on EC2.

In order to run the cdh install and config scripts, I'm
setting whirr.hadoop-install-function to install_cdh_hadoop, and
whirr.hadoop-configure-function to configure_cdh_hadoop.  But I'm using a
plain-jane Ubuntu amd64 AMI (ami-da0cf8b3).  Should I also be using the
cloudera AMIs as well as the cloudera install and config scripts?

Are there any best practices for how to set up a cloudera distribution of
hadoop on EC2?
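
For reference, the relevant part of my whirr .properties looks roughly like this
(cluster name, instance counts and hardware id below are placeholders, not my real
values):

whirr.cluster-name=myhadoopcluster
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,5 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.image-id=us-east-1/ami-da0cf8b3
whirr.hardware-id=m1.large
whirr.hadoop-install-function=install_cdh_hadoop
whirr.hadoop-configure-function=configure_cdh_hadoop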

-- 

Thanks,
John C

Re: AMIs to use when creating hadoop cluster with whirr

Posted by John Conwell <jo...@iamjohn.me>.
sure thing!

On Wed, Oct 5, 2011 at 1:32 PM, Andrei Savu <sa...@gmail.com> wrote:

> I understand. From my point of view this is a bug we should fix. Can you
> open an issue?
>
> On Wed, Oct 5, 2011 at 11:25 PM, John Conwell <jo...@iamjohn.me> wrote:
>
>> I thought about that, but the hadoop-site.xml created by whirr has some of
>> the info needed, but its not the full set of xml elements that get written
>> to the *-site.xml files on the hadoop cluster.   For example whirr sets *
>> mapred.reduce.tasks* based on the number task trackers, which is vital
>> for the job configuration to have.  But the hadoop-size.xml doesnt have this
>> value.  It only has the core properties needed to allow you to use the ssh
>> proxy to interact with the name node and job tracker
>>
>>
>>
>> On Wed, Oct 5, 2011 at 1:11 PM, Andrei Savu <sa...@gmail.com>wrote:
>>
>>> The files are also created on the local machine in ~/.whirr/cluster-name/
>>> so it shouldn't be that hard. The only tricky part is to match the Hadoop
>>> version from my point of view.
>>>
>>> On Wed, Oct 5, 2011 at 11:01 PM, John Conwell <jo...@iamjohn.me> wrote:
>>>
>>>> This whole scenario does bring up the question about how people handle
>>>> this kind of scenario.  To me the beauty of whirr is that it means I can
>>>> spin up and down hadoop clusters on the fly when my workflow demands it.  If
>>>> a task gets q'd up that needs mapreduce, I spin up a cluster, solve my
>>>> problem, gather my data, kill the cluster, workflow goes on.
>>>>
>>>> But if my workflow requires the contents of three little files located
>>>> on a different machine, in a different cluster, and possible a different
>>>> cloud vendor, that really puts a damper on the whimsical on-the-flyness of
>>>> creating hadoop resources only when needed.  I'm curious how other people
>>>> are handling this scenario.
>>>>
>>>>
>>>> On Wed, Oct 5, 2011 at 12:45 PM, Andrei Savu <sa...@gmail.com>wrote:
>>>>
>>>>> Awesome! I'm glad we figured this out, I was getting worried that we
>>>>> have a critical bug.
>>>>>
>>>>> On Wed, Oct 5, 2011 at 10:40 PM, John Conwell <jo...@iamjohn.me> wrote:
>>>>>
>>>>>> Ok...I think I figured it out.  This email thread made me take a look
>>>>>> at how I'm kicking off my hadoop job.  My hadoop driver, the class that
>>>>>> links a bunch of jobs together in a workflow, is on a different machine than
>>>>>> the cluster that hadoop is running on.  This means when I create a new
>>>>>> Configuration() object it, it tries to load the default hadoop values from
>>>>>> the class path, but since the driver isnt running on the hadoop cluster and
>>>>>> doesnt have access to the hadoop cluster's configuration files, it just uses
>>>>>> the default vales...config for suck.
>>>>>>
>>>>>> So I copied the *-site.xml files from my namenode over to the machine
>>>>>> my hadoop job driver was running from and put it in the class path, and
>>>>>> shazam...it picked up the hadoop config that whirr created for me.  yay!
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 5, 2011 at 10:49 AM, Andrei Savu <sa...@gmail.com>wrote:
>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 5, 2011 at 8:41 PM, John Conwell <jo...@iamjohn.me>wrote:
>>>>>>>
>>>>>>>> It looks like hadoop is reading default configuration values from
>>>>>>>> somewhere and using them, and not reading from
>>>>>>>> the /usr/lib/hadoop/conf/*-site.xml files.
>>>>>>>>
>>>>>>>
>>>>>>> If you are running CDH the config files are in:
>>>>>>>
>>>>>>> HADOOP=hadoop-${HADOOP_VERSION:-0.20}
>>>>>>> HADOOP_CONF_DIR=/etc/$HADOOP/conf.dist
>>>>>>>
>>>>>>> See https://github.com/apache/whirr/blob/trunk/services/cdh/src/main/resources/functions/configure_cdh_hadoop.sh
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Thanks,
>>>>>> John C
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Thanks,
>>>> John C
>>>>
>>>>
>>>
>>
>>
>> --
>>
>> Thanks,
>> John C
>>
>>
>


-- 

Thanks,
John C

Re: AMIs to use when creating hadoop cluster with whirr

Posted by Andrei Savu <sa...@gmail.com>.
Any type should work. We can change it later.

On Thu, Oct 6, 2011 at 12:07 AM, John Conwell <jo...@iamjohn.me> wrote:

> Do you guys want it logged as a bug, feature, improvement?  Does it matter?
>
>
> On Wed, Oct 5, 2011 at 1:32 PM, Andrei Savu <sa...@gmail.com> wrote:
>
>> I understand. From my point of view this is a bug we should fix. Can you
>> open an issue?
>>
>> On Wed, Oct 5, 2011 at 11:25 PM, John Conwell <jo...@iamjohn.me> wrote:
>>
>>> I thought about that, but the hadoop-site.xml created by whirr has some
>>> of the info needed, but its not the full set of xml elements that get
>>> written to the *-site.xml files on the hadoop cluster.   For example whirr
>>> sets *mapred.reduce.tasks* based on the number task trackers, which is
>>> vital for the job configuration to have.  But the hadoop-size.xml doesnt
>>> have this value.  It only has the core properties needed to allow you to use
>>> the ssh proxy to interact with the name node and job tracker
>>>
>>>
>>>
>>> On Wed, Oct 5, 2011 at 1:11 PM, Andrei Savu <sa...@gmail.com>wrote:
>>>
>>>> The files are also created on the local machine in
>>>> ~/.whirr/cluster-name/ so it shouldn't be that hard. The only tricky part is
>>>> to match the Hadoop version from my point of view.
>>>>
>>>> On Wed, Oct 5, 2011 at 11:01 PM, John Conwell <jo...@iamjohn.me> wrote:
>>>>
>>>>> This whole scenario does bring up the question about how people handle
>>>>> this kind of scenario.  To me the beauty of whirr is that it means I can
>>>>> spin up and down hadoop clusters on the fly when my workflow demands it.  If
>>>>> a task gets q'd up that needs mapreduce, I spin up a cluster, solve my
>>>>> problem, gather my data, kill the cluster, workflow goes on.
>>>>>
>>>>> But if my workflow requires the contents of three little files located
>>>>> on a different machine, in a different cluster, and possible a different
>>>>> cloud vendor, that really puts a damper on the whimsical on-the-flyness of
>>>>> creating hadoop resources only when needed.  I'm curious how other people
>>>>> are handling this scenario.
>>>>>
>>>>>
>>>>> On Wed, Oct 5, 2011 at 12:45 PM, Andrei Savu <sa...@gmail.com>wrote:
>>>>>
>>>>>> Awesome! I'm glad we figured this out, I was getting worried that we
>>>>>> have a critical bug.
>>>>>>
>>>>>> On Wed, Oct 5, 2011 at 10:40 PM, John Conwell <jo...@iamjohn.me>wrote:
>>>>>>
>>>>>>> Ok...I think I figured it out.  This email thread made me take a look
>>>>>>> at how I'm kicking off my hadoop job.  My hadoop driver, the class that
>>>>>>> links a bunch of jobs together in a workflow, is on a different machine than
>>>>>>> the cluster that hadoop is running on.  This means when I create a new
>>>>>>> Configuration() object it, it tries to load the default hadoop values from
>>>>>>> the class path, but since the driver isnt running on the hadoop cluster and
>>>>>>> doesnt have access to the hadoop cluster's configuration files, it just uses
>>>>>>> the default vales...config for suck.
>>>>>>>
>>>>>>> So I copied the *-site.xml files from my namenode over to the machine
>>>>>>> my hadoop job driver was running from and put it in the class path, and
>>>>>>> shazam...it picked up the hadoop config that whirr created for me.  yay!
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 5, 2011 at 10:49 AM, Andrei Savu <sa...@gmail.com>wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 5, 2011 at 8:41 PM, John Conwell <jo...@iamjohn.me>wrote:
>>>>>>>>
>>>>>>>>> It looks like hadoop is reading default configuration values from
>>>>>>>>> somewhere and using them, and not reading from
>>>>>>>>> the /usr/lib/hadoop/conf/*-site.xml files.
>>>>>>>>>
>>>>>>>>
>>>>>>>> If you are running CDH the config files are in:
>>>>>>>>
>>>>>>>> HADOOP=hadoop-${HADOOP_VERSION:-0.20}
>>>>>>>> HADOOP_CONF_DIR=/etc/$HADOOP/conf.dist
>>>>>>>>
>>>>>>>> See https://github.com/apache/whirr/blob/trunk/services/cdh/src/main/resources/functions/configure_cdh_hadoop.sh
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Thanks,
>>>>>>> John C
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Thanks,
>>>>> John C
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> Thanks,
>>> John C
>>>
>>>
>>
>
>
> --
>
> Thanks,
> John C
>
>

Re: AMIs to use when creating hadoop cluster with whirr

Posted by John Conwell <jo...@iamjohn.me>.
Do you guys want it logged as a bug, a feature, or an improvement?  Does it matter?

On Wed, Oct 5, 2011 at 1:32 PM, Andrei Savu <sa...@gmail.com> wrote:

> I understand. From my point of view this is a bug we should fix. Can you
> open an issue?
>
> On Wed, Oct 5, 2011 at 11:25 PM, John Conwell <jo...@iamjohn.me> wrote:
>
>> I thought about that, but the hadoop-site.xml created by whirr has some of
>> the info needed, but its not the full set of xml elements that get written
>> to the *-site.xml files on the hadoop cluster.   For example whirr sets *
>> mapred.reduce.tasks* based on the number task trackers, which is vital
>> for the job configuration to have.  But the hadoop-size.xml doesnt have this
>> value.  It only has the core properties needed to allow you to use the ssh
>> proxy to interact with the name node and job tracker
>>
>>
>>
>> On Wed, Oct 5, 2011 at 1:11 PM, Andrei Savu <sa...@gmail.com>wrote:
>>
>>> The files are also created on the local machine in ~/.whirr/cluster-name/
>>> so it shouldn't be that hard. The only tricky part is to match the Hadoop
>>> version from my point of view.
>>>
>>> On Wed, Oct 5, 2011 at 11:01 PM, John Conwell <jo...@iamjohn.me> wrote:
>>>
>>>> This whole scenario does bring up the question about how people handle
>>>> this kind of scenario.  To me the beauty of whirr is that it means I can
>>>> spin up and down hadoop clusters on the fly when my workflow demands it.  If
>>>> a task gets q'd up that needs mapreduce, I spin up a cluster, solve my
>>>> problem, gather my data, kill the cluster, workflow goes on.
>>>>
>>>> But if my workflow requires the contents of three little files located
>>>> on a different machine, in a different cluster, and possible a different
>>>> cloud vendor, that really puts a damper on the whimsical on-the-flyness of
>>>> creating hadoop resources only when needed.  I'm curious how other people
>>>> are handling this scenario.
>>>>
>>>>
>>>> On Wed, Oct 5, 2011 at 12:45 PM, Andrei Savu <sa...@gmail.com>wrote:
>>>>
>>>>> Awesome! I'm glad we figured this out, I was getting worried that we
>>>>> have a critical bug.
>>>>>
>>>>> On Wed, Oct 5, 2011 at 10:40 PM, John Conwell <jo...@iamjohn.me> wrote:
>>>>>
>>>>>> Ok...I think I figured it out.  This email thread made me take a look
>>>>>> at how I'm kicking off my hadoop job.  My hadoop driver, the class that
>>>>>> links a bunch of jobs together in a workflow, is on a different machine than
>>>>>> the cluster that hadoop is running on.  This means when I create a new
>>>>>> Configuration() object it, it tries to load the default hadoop values from
>>>>>> the class path, but since the driver isnt running on the hadoop cluster and
>>>>>> doesnt have access to the hadoop cluster's configuration files, it just uses
>>>>>> the default vales...config for suck.
>>>>>>
>>>>>> So I copied the *-site.xml files from my namenode over to the machine
>>>>>> my hadoop job driver was running from and put it in the class path, and
>>>>>> shazam...it picked up the hadoop config that whirr created for me.  yay!
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 5, 2011 at 10:49 AM, Andrei Savu <sa...@gmail.com>wrote:
>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 5, 2011 at 8:41 PM, John Conwell <jo...@iamjohn.me>wrote:
>>>>>>>
>>>>>>>> It looks like hadoop is reading default configuration values from
>>>>>>>> somewhere and using them, and not reading from
>>>>>>>> the /usr/lib/hadoop/conf/*-site.xml files.
>>>>>>>>
>>>>>>>
>>>>>>> If you are running CDH the config files are in:
>>>>>>>
>>>>>>> HADOOP=hadoop-${HADOOP_VERSION:-0.20}
>>>>>>> HADOOP_CONF_DIR=/etc/$HADOOP/conf.dist
>>>>>>>
>>>>>>> See https://github.com/apache/whirr/blob/trunk/services/cdh/src/main/resources/functions/configure_cdh_hadoop.sh
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Thanks,
>>>>>> John C
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Thanks,
>>>> John C
>>>>
>>>>
>>>
>>
>>
>> --
>>
>> Thanks,
>> John C
>>
>>
>


-- 

Thanks,
John C

Re: AMIs to use when creating hadoop cluster with whirr

Posted by Andrei Savu <sa...@gmail.com>.
I understand. From my point of view this is a bug we should fix. Can you
open an issue?

On Wed, Oct 5, 2011 at 11:25 PM, John Conwell <jo...@iamjohn.me> wrote:

> I thought about that, but the hadoop-site.xml created by whirr has some of
> the info needed, but its not the full set of xml elements that get written
> to the *-site.xml files on the hadoop cluster.   For example whirr sets *
> mapred.reduce.tasks* based on the number task trackers, which is vital for
> the job configuration to have.  But the hadoop-size.xml doesnt have this
> value.  It only has the core properties needed to allow you to use the ssh
> proxy to interact with the name node and job tracker
>
>
>
> On Wed, Oct 5, 2011 at 1:11 PM, Andrei Savu <sa...@gmail.com> wrote:
>
>> The files are also created on the local machine in ~/.whirr/cluster-name/
>> so it shouldn't be that hard. The only tricky part is to match the Hadoop
>> version from my point of view.
>>
>> On Wed, Oct 5, 2011 at 11:01 PM, John Conwell <jo...@iamjohn.me> wrote:
>>
>>> This whole scenario does bring up the question about how people handle
>>> this kind of scenario.  To me the beauty of whirr is that it means I can
>>> spin up and down hadoop clusters on the fly when my workflow demands it.  If
>>> a task gets q'd up that needs mapreduce, I spin up a cluster, solve my
>>> problem, gather my data, kill the cluster, workflow goes on.
>>>
>>> But if my workflow requires the contents of three little files located on
>>> a different machine, in a different cluster, and possible a different cloud
>>> vendor, that really puts a damper on the whimsical on-the-flyness of
>>> creating hadoop resources only when needed.  I'm curious how other people
>>> are handling this scenario.
>>>
>>>
>>> On Wed, Oct 5, 2011 at 12:45 PM, Andrei Savu <sa...@gmail.com>wrote:
>>>
>>>> Awesome! I'm glad we figured this out, I was getting worried that we
>>>> have a critical bug.
>>>>
>>>> On Wed, Oct 5, 2011 at 10:40 PM, John Conwell <jo...@iamjohn.me> wrote:
>>>>
>>>>> Ok...I think I figured it out.  This email thread made me take a look
>>>>> at how I'm kicking off my hadoop job.  My hadoop driver, the class that
>>>>> links a bunch of jobs together in a workflow, is on a different machine than
>>>>> the cluster that hadoop is running on.  This means when I create a new
>>>>> Configuration() object it, it tries to load the default hadoop values from
>>>>> the class path, but since the driver isnt running on the hadoop cluster and
>>>>> doesnt have access to the hadoop cluster's configuration files, it just uses
>>>>> the default vales...config for suck.
>>>>>
>>>>> So I copied the *-site.xml files from my namenode over to the machine
>>>>> my hadoop job driver was running from and put it in the class path, and
>>>>> shazam...it picked up the hadoop config that whirr created for me.  yay!
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Oct 5, 2011 at 10:49 AM, Andrei Savu <sa...@gmail.com>wrote:
>>>>>
>>>>>>
>>>>>> On Wed, Oct 5, 2011 at 8:41 PM, John Conwell <jo...@iamjohn.me> wrote:
>>>>>>
>>>>>>> It looks like hadoop is reading default configuration values from
>>>>>>> somewhere and using them, and not reading from
>>>>>>> the /usr/lib/hadoop/conf/*-site.xml files.
>>>>>>>
>>>>>>
>>>>>> If you are running CDH the config files are in:
>>>>>>
>>>>>> HADOOP=hadoop-${HADOOP_VERSION:-0.20}
>>>>>> HADOOP_CONF_DIR=/etc/$HADOOP/conf.dist
>>>>>>
>>>>>> See https://github.com/apache/whirr/blob/trunk/services/cdh/src/main/resources/functions/configure_cdh_hadoop.sh
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Thanks,
>>>>> John C
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> Thanks,
>>> John C
>>>
>>>
>>
>
>
> --
>
> Thanks,
> John C
>
>

Re: AMIs to use when creating hadoop cluster with whirr

Posted by John Conwell <jo...@iamjohn.me>.
I thought about that, but the hadoop-site.xml created by whirr has some of
the info needed; it's not the full set of xml elements that get written to
the *-site.xml files on the hadoop cluster.  For example whirr sets
*mapred.reduce.tasks* based on the number of task trackers, which is vital
for the job configuration to have.  But the hadoop-site.xml doesn't have this
value.  It only has the core properties needed to allow you to use the ssh
proxy to interact with the name node and job tracker.
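
So if I drove everything off that file alone, I'd have to patch the missing pieces
in by hand in the driver.  A rough sketch of what that looks like (the job name and
the reduce count are made up, not values whirr would pick):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class PatchedDriver {
  public static void main(String[] args) throws Exception {
    // assumes whirr's hadoop-site.xml (proxy, fs.default.name, mapred.job.tracker)
    // is already on the classpath; it does NOT carry cluster-derived settings
    Configuration conf = new Configuration();
    Job job = new Job(conf, "workflow-step");
    // hadoop-site.xml doesn't include mapred.reduce.tasks, so set it by hand
    job.setNumReduceTasks(10);  // made-up number; whirr derives it from the tasktracker count
    // ... mapper, reducer, input and output paths go here ...
  }
}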


On Wed, Oct 5, 2011 at 1:11 PM, Andrei Savu <sa...@gmail.com> wrote:

> The files are also created on the local machine in ~/.whirr/cluster-name/
> so it shouldn't be that hard. The only tricky part is to match the Hadoop
> version from my point of view.
>
> On Wed, Oct 5, 2011 at 11:01 PM, John Conwell <jo...@iamjohn.me> wrote:
>
>> This whole scenario does bring up the question about how people handle
>> this kind of scenario.  To me the beauty of whirr is that it means I can
>> spin up and down hadoop clusters on the fly when my workflow demands it.  If
>> a task gets q'd up that needs mapreduce, I spin up a cluster, solve my
>> problem, gather my data, kill the cluster, workflow goes on.
>>
>> But if my workflow requires the contents of three little files located on
>> a different machine, in a different cluster, and possible a different cloud
>> vendor, that really puts a damper on the whimsical on-the-flyness of
>> creating hadoop resources only when needed.  I'm curious how other people
>> are handling this scenario.
>>
>>
>> On Wed, Oct 5, 2011 at 12:45 PM, Andrei Savu <sa...@gmail.com>wrote:
>>
>>> Awesome! I'm glad we figured this out, I was getting worried that we have
>>> a critical bug.
>>>
>>> On Wed, Oct 5, 2011 at 10:40 PM, John Conwell <jo...@iamjohn.me> wrote:
>>>
>>>> Ok...I think I figured it out.  This email thread made me take a look at
>>>> how I'm kicking off my hadoop job.  My hadoop driver, the class that links a
>>>> bunch of jobs together in a workflow, is on a different machine than the
>>>> cluster that hadoop is running on.  This means when I create a new
>>>> Configuration() object it, it tries to load the default hadoop values from
>>>> the class path, but since the driver isnt running on the hadoop cluster and
>>>> doesnt have access to the hadoop cluster's configuration files, it just uses
>>>> the default vales...config for suck.
>>>>
>>>> So I copied the *-site.xml files from my namenode over to the machine my
>>>> hadoop job driver was running from and put it in the class path, and
>>>> shazam...it picked up the hadoop config that whirr created for me.  yay!
>>>>
>>>>
>>>>
>>>> On Wed, Oct 5, 2011 at 10:49 AM, Andrei Savu <sa...@gmail.com>wrote:
>>>>
>>>>>
>>>>> On Wed, Oct 5, 2011 at 8:41 PM, John Conwell <jo...@iamjohn.me> wrote:
>>>>>
>>>>>> It looks like hadoop is reading default configuration values from
>>>>>> somewhere and using them, and not reading from
>>>>>> the /usr/lib/hadoop/conf/*-site.xml files.
>>>>>>
>>>>>
>>>>> If you are running CDH the config files are in:
>>>>>
>>>>> HADOOP=hadoop-${HADOOP_VERSION:-0.20}
>>>>> HADOOP_CONF_DIR=/etc/$HADOOP/conf.dist
>>>>>
>>>>> See https://github.com/apache/whirr/blob/trunk/services/cdh/src/main/resources/functions/configure_cdh_hadoop.sh
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Thanks,
>>>> John C
>>>>
>>>>
>>>
>>
>>
>> --
>>
>> Thanks,
>> John C
>>
>>
>


-- 

Thanks,
John C

Re: AMIs to use when creating hadoop cluster with whirr

Posted by Andrei Savu <sa...@gmail.com>.
The files are also created on the local machine in ~/.whirr/cluster-name/, so
it shouldn't be that hard. The only tricky part, from my point of view, is
matching the Hadoop version.
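
Something along these lines on the driver side should be enough to pick that file
up (the cluster name is a placeholder, and this is only a sketch):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class WhirrClientConf {
  public static Configuration load() {
    Configuration conf = new Configuration();
    // hadoop-site.xml written by whirr on the machine that launched the cluster
    conf.addResource(new Path(System.getProperty("user.home")
        + "/.whirr/myhadoopcluster/hadoop-site.xml"));
    return conf;
  }
}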

On Wed, Oct 5, 2011 at 11:01 PM, John Conwell <jo...@iamjohn.me> wrote:

> This whole scenario does bring up the question about how people handle this
> kind of scenario.  To me the beauty of whirr is that it means I can spin up
> and down hadoop clusters on the fly when my workflow demands it.  If a task
> gets q'd up that needs mapreduce, I spin up a cluster, solve my problem,
> gather my data, kill the cluster, workflow goes on.
>
> But if my workflow requires the contents of three little files located on a
> different machine, in a different cluster, and possible a different cloud
> vendor, that really puts a damper on the whimsical on-the-flyness of
> creating hadoop resources only when needed.  I'm curious how other people
> are handling this scenario.
>
>
> On Wed, Oct 5, 2011 at 12:45 PM, Andrei Savu <sa...@gmail.com>wrote:
>
>> Awesome! I'm glad we figured this out, I was getting worried that we have
>> a critical bug.
>>
>> On Wed, Oct 5, 2011 at 10:40 PM, John Conwell <jo...@iamjohn.me> wrote:
>>
>>> Ok...I think I figured it out.  This email thread made me take a look at
>>> how I'm kicking off my hadoop job.  My hadoop driver, the class that links a
>>> bunch of jobs together in a workflow, is on a different machine than the
>>> cluster that hadoop is running on.  This means when I create a new
>>> Configuration() object it, it tries to load the default hadoop values from
>>> the class path, but since the driver isnt running on the hadoop cluster and
>>> doesnt have access to the hadoop cluster's configuration files, it just uses
>>> the default vales...config for suck.
>>>
>>> So I copied the *-site.xml files from my namenode over to the machine my
>>> hadoop job driver was running from and put it in the class path, and
>>> shazam...it picked up the hadoop config that whirr created for me.  yay!
>>>
>>>
>>>
>>> On Wed, Oct 5, 2011 at 10:49 AM, Andrei Savu <sa...@gmail.com>wrote:
>>>
>>>>
>>>> On Wed, Oct 5, 2011 at 8:41 PM, John Conwell <jo...@iamjohn.me> wrote:
>>>>
>>>>> It looks like hadoop is reading default configuration values from
>>>>> somewhere and using them, and not reading from
>>>>> the /usr/lib/hadoop/conf/*-site.xml files.
>>>>>
>>>>
>>>> If you are running CDH the config files are in:
>>>>
>>>> HADOOP=hadoop-${HADOOP_VERSION:-0.20}
>>>> HADOOP_CONF_DIR=/etc/$HADOOP/conf.dist
>>>>
>>>> See https://github.com/apache/whirr/blob/trunk/services/cdh/src/main/resources/functions/configure_cdh_hadoop.sh
>>>>
>>>
>>>
>>> --
>>>
>>> Thanks,
>>> John C
>>>
>>>
>>
>
>
> --
>
> Thanks,
> John C
>
>

Re: AMIs to use when creating hadoop cluster with whirr

Posted by John Conwell <jo...@iamjohn.me>.
This whole scenario does bring up the question of how people handle this
kind of situation.  To me the beauty of whirr is that it means I can spin up
and down hadoop clusters on the fly when my workflow demands it.  If a task
gets q'd up that needs mapreduce, I spin up a cluster, solve my problem,
gather my data, kill the cluster, workflow goes on.

But if my workflow requires the contents of three little files located on a
different machine, in a different cluster, and possibly a different cloud
vendor, that really puts a damper on the whimsical on-the-flyness of
creating hadoop resources only when needed.  I'm curious how other people
are handling this scenario.


On Wed, Oct 5, 2011 at 12:45 PM, Andrei Savu <sa...@gmail.com> wrote:

> Awesome! I'm glad we figured this out, I was getting worried that we have a
> critical bug.
>
> On Wed, Oct 5, 2011 at 10:40 PM, John Conwell <jo...@iamjohn.me> wrote:
>
>> Ok...I think I figured it out.  This email thread made me take a look at
>> how I'm kicking off my hadoop job.  My hadoop driver, the class that links a
>> bunch of jobs together in a workflow, is on a different machine than the
>> cluster that hadoop is running on.  This means when I create a new
>> Configuration() object it, it tries to load the default hadoop values from
>> the class path, but since the driver isnt running on the hadoop cluster and
>> doesnt have access to the hadoop cluster's configuration files, it just uses
>> the default vales...config for suck.
>>
>> So I copied the *-site.xml files from my namenode over to the machine my
>> hadoop job driver was running from and put it in the class path, and
>> shazam...it picked up the hadoop config that whirr created for me.  yay!
>>
>>
>>
>> On Wed, Oct 5, 2011 at 10:49 AM, Andrei Savu <sa...@gmail.com>wrote:
>>
>>>
>>> On Wed, Oct 5, 2011 at 8:41 PM, John Conwell <jo...@iamjohn.me> wrote:
>>>
>>>> It looks like hadoop is reading default configuration values from
>>>> somewhere and using them, and not reading from
>>>> the /usr/lib/hadoop/conf/*-site.xml files.
>>>>
>>>
>>> If you are running CDH the config files are in:
>>>
>>> HADOOP=hadoop-${HADOOP_VERSION:-0.20}
>>> HADOOP_CONF_DIR=/etc/$HADOOP/conf.dist
>>>
>>> See https://github.com/apache/whirr/blob/trunk/services/cdh/src/main/resources/functions/configure_cdh_hadoop.sh
>>>
>>
>>
>> --
>>
>> Thanks,
>> John C
>>
>>
>


-- 

Thanks,
John C

Re: AMIs to use when creating hadoop cluster with whirr

Posted by John Conwell <jo...@iamjohn.me>.
No...just critical stupidity on my part :)

On Wed, Oct 5, 2011 at 12:45 PM, Andrei Savu <sa...@gmail.com> wrote:

> Awesome! I'm glad we figured this out, I was getting worried that we have a
> critical bug.
>
> On Wed, Oct 5, 2011 at 10:40 PM, John Conwell <jo...@iamjohn.me> wrote:
>
>> Ok...I think I figured it out.  This email thread made me take a look at
>> how I'm kicking off my hadoop job.  My hadoop driver, the class that links a
>> bunch of jobs together in a workflow, is on a different machine than the
>> cluster that hadoop is running on.  This means when I create a new
>> Configuration() object it, it tries to load the default hadoop values from
>> the class path, but since the driver isnt running on the hadoop cluster and
>> doesnt have access to the hadoop cluster's configuration files, it just uses
>> the default vales...config for suck.
>>
>> So I copied the *-site.xml files from my namenode over to the machine my
>> hadoop job driver was running from and put it in the class path, and
>> shazam...it picked up the hadoop config that whirr created for me.  yay!
>>
>>
>>
>> On Wed, Oct 5, 2011 at 10:49 AM, Andrei Savu <sa...@gmail.com>wrote:
>>
>>>
>>> On Wed, Oct 5, 2011 at 8:41 PM, John Conwell <jo...@iamjohn.me> wrote:
>>>
>>>> It looks like hadoop is reading default configuration values from
>>>> somewhere and using them, and not reading from
>>>> the /usr/lib/hadoop/conf/*-site.xml files.
>>>>
>>>
>>> If you are running CDH the config files are in:
>>>
>>> HADOOP=hadoop-${HADOOP_VERSION:-0.20}
>>> HADOOP_CONF_DIR=/etc/$HADOOP/conf.dist
>>>
>>> See https://github.com/apache/whirr/blob/trunk/services/cdh/src/main/resources/functions/configure_cdh_hadoop.sh
>>>
>>
>>
>> --
>>
>> Thanks,
>> John C
>>
>>
>


-- 

Thanks,
John C

Re: AMIs to use when creating hadoop cluster with whirr

Posted by Andrei Savu <sa...@gmail.com>.
Awesome! I'm glad we figured this out; I was getting worried that we had a
critical bug.

On Wed, Oct 5, 2011 at 10:40 PM, John Conwell <jo...@iamjohn.me> wrote:

> Ok...I think I figured it out.  This email thread made me take a look at
> how I'm kicking off my hadoop job.  My hadoop driver, the class that links a
> bunch of jobs together in a workflow, is on a different machine than the
> cluster that hadoop is running on.  This means when I create a new
> Configuration() object it, it tries to load the default hadoop values from
> the class path, but since the driver isnt running on the hadoop cluster and
> doesnt have access to the hadoop cluster's configuration files, it just uses
> the default vales...config for suck.
>
> So I copied the *-site.xml files from my namenode over to the machine my
> hadoop job driver was running from and put it in the class path, and
> shazam...it picked up the hadoop config that whirr created for me.  yay!
>
>
>
> On Wed, Oct 5, 2011 at 10:49 AM, Andrei Savu <sa...@gmail.com>wrote:
>
>>
>> On Wed, Oct 5, 2011 at 8:41 PM, John Conwell <jo...@iamjohn.me> wrote:
>>
>>> It looks like hadoop is reading default configuration values from
>>> somewhere and using them, and not reading from
>>> the /usr/lib/hadoop/conf/*-site.xml files.
>>>
>>
>> If you are running CDH the config files are in:
>>
>> HADOOP=hadoop-${HADOOP_VERSION:-0.20}
>> HADOOP_CONF_DIR=/etc/$HADOOP/conf.dist
>>
>> See https://github.com/apache/whirr/blob/trunk/services/cdh/src/main/resources/functions/configure_cdh_hadoop.sh
>>
>
>
> --
>
> Thanks,
> John C
>
>

Re: AMIs to use when creating hadoop cluster with whirr

Posted by John Conwell <jo...@iamjohn.me>.
Ok...I think I figured it out.  This email thread made me take a look at how
I'm kicking off my hadoop job.  My hadoop driver, the class that links a
bunch of jobs together in a workflow, is on a different machine than the
cluster that hadoop is running on.  This means when I create a new
Configuration() object, it tries to load the default hadoop values from
the class path, but since the driver isn't running on the hadoop cluster and
doesn't have access to the hadoop cluster's configuration files, it just uses
the default values...config for suck.

So I copied the *-site.xml files from my namenode over to the machine my
hadoop job driver was running from, put them on the class path, and
shazam...it picked up the hadoop config that whirr created for me.  yay!
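
If anyone else hits this, a quick way to sanity-check that the copied files are the
ones actually being picked up (throwaway class, the name is arbitrary):

import java.net.URL;

public class ConfCheck {
  public static void main(String[] args) {
    // prints where core-site.xml is loaded from on the driver's classpath;
    // null means hadoop silently falls back to its built-in defaults
    URL coreSite = ConfCheck.class.getClassLoader().getResource("core-site.xml");
    System.out.println("core-site.xml found at: " + coreSite);
  }
}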



On Wed, Oct 5, 2011 at 10:49 AM, Andrei Savu <sa...@gmail.com> wrote:

>
> On Wed, Oct 5, 2011 at 8:41 PM, John Conwell <jo...@iamjohn.me> wrote:
>
>> It looks like hadoop is reading default configuration values from
>> somewhere and using them, and not reading from
>> the /usr/lib/hadoop/conf/*-site.xml files.
>>
>
> If you are running CDH the config files are in:
>
> HADOOP=hadoop-${HADOOP_VERSION:-0.20}
> HADOOP_CONF_DIR=/etc/$HADOOP/conf.dist
>
> See https://github.com/apache/whirr/blob/trunk/services/cdh/src/main/resources/functions/configure_cdh_hadoop.sh
>


-- 

Thanks,
John C

Re: AMIs to use when creating hadoop cluster with whirr

Posted by Andrei Savu <sa...@gmail.com>.
On Wed, Oct 5, 2011 at 8:41 PM, John Conwell <jo...@iamjohn.me> wrote:

> It looks like hadoop is reading default configuration values from somewhere
> and using them, and not reading from the /usr/lib/hadoop/conf/*-site.xml
> files.
>

If you are running CDH the config files are in:

HADOOP=hadoop-${HADOOP_VERSION:-0.20}
HADOOP_CONF_DIR=/etc/$HADOOP/conf.dist


See https://github.com/apache/whirr/blob/trunk/services/cdh/src/main/resources/functions/configure_cdh_hadoop.sh

Re: AMIs to use when creating hadoop cluster with whirr

Posted by John Conwell <jo...@iamjohn.me>.
Ok, I just realized something that could be the issue.  I looked at the
different hadoop site xml files (core, hdfs, mapred) and compared their
values to the job configuration reported by the hadoop UI for a job that
ran.  And the values in the site xml files do not correspond with the job
configuration values.  For example, the /usr/lib/hadoop/conf/mapred-site.xml
has mapred.reduce.tasks set to 12, but if I look at the job configuration
for any of my completed jobs, this value is set to 1.

It looks like hadoop is reading default configuration values from somewhere
and using them, and not reading from the /usr/lib/hadoop/conf/*-site.xml
files.

Any ideas?
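
In case it helps, this is how I'm dumping the values that actually get resolved when
I create a Configuration (just a throwaway debugging class):

import org.apache.hadoop.conf.Configuration;

public class DumpJobConf {
  public static void main(String[] args) {
    // new Configuration() loads *-site.xml from the classpath; if the files aren't
    // there it falls back to the built-in defaults (e.g. mapred.reduce.tasks=1)
    Configuration conf = new Configuration();
    System.out.println("mapred.reduce.tasks = " + conf.get("mapred.reduce.tasks"));
    System.out.println("mapred.job.tracker  = " + conf.get("mapred.job.tracker"));
    System.out.println("fs.default.name     = " + conf.get("fs.default.name"));
  }
}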


On Wed, Oct 5, 2011 at 10:25 AM, Andrei Savu <sa...@gmail.com> wrote:

> From here:
> http://developer.yahoo.com/hadoop/tutorial/module7.html
>
> "With multiple racks of servers, RPC timeouts may become more frequent.
> The NameNode takes a continual census of DataNodes and their health via
> heartbeat messages sent every few seconds. A similar timeout mechanism
> exists on the MapReduce side with the JobTracker. With many racks of
> machines, they may force one another to timeout because the master node is
> not handling them fast enough. The following options increase the number of
> threads on the master machine dedicated to handling RPC's from slave nodes:"
> ( I think this is also true for AWS)
>
> Proposed solution is:
>
>   <property>
>     <name>dfs.namenode.handler.count</name>
>     <value>40</value>
>   </property>
>   <property>
>     <name>mapred.job.tracker.handler.count</name>
>     <value>40</value>
>   </property>
>
>
> You can do this in Whirr by specifying:
>
> hadoop-dfs.dfs.namenode.handler.count=40
> hadoop-mapreduce.mapred.job.tracker.handler.count=40
>
> in the .properties file.
>
> Let me know if this works for you. We should probably use something like
> this by default.
>
> -- Andrei Savu
>
>
> On Wed, Oct 5, 2011 at 8:15 PM, Andrei Savu <sa...@gmail.com> wrote:
>
>> Looks like a network congestion issue to me. I don't know how to do this
>> but I would try to increase the heartbeat timeout.
>>
>> Tom any ideas? Have you seen this before on aws?
>>
>> I don't think there is something wrong with the AMI, I suspect there is
>> something wrong with the Hadoop configuration.
>>
>>
>> On Wednesday, October 5, 2011, John Conwell wrote:
>>
>>> It starts with hadoop reporting bocks of data being 'lost', then
>>> individual data nodes stop responding, the individual data nodes get taken
>>> off line, then jobs get killed, then data nodes come back on line and the
>>> data blocks get replicated back out the correct replication factor.
>>>
>>> The end result are about 80% of the time, my hadoop jobs get killed
>>> because some task fails 3 times in a row, but about an hour after the job
>>> gets killed, all data nodes are back online and all data is fully
>>> replicated.
>>>
>>> Before I go rat holing down "why are my data nodes going down", I want to
>>> cover the easy scenarios like "oh yea...your totally misconfigured.  You
>>> should use ABC ami with the cloudera install and config scripts".  Basically
>>> validate if there are any best practices for setting up a cloudera
>>> distribution of hadoop on EC2.
>>>
>>> I know cloudera has created their own AMIs.  Should I be using them?
>>>  Does it matter?
>>>
>>>
>>>
>>> On Wed, Oct 5, 2011 at 9:43 AM, Andrei Savu <sa...@gmail.com>wrote:
>>>
>>>> What do you mean by failing? Is the Hadoop daemon shutting down or the
>>>> machine as a whole?
>>>>
>>>> On Wednesday, October 5, 2011, John Conwell wrote:
>>>>
>>>>> I'm having stability issues (data nodes constantly failing under very
>>>>> little load) on the hadoop clusters I'm creating, and I'm trying to figure
>>>>> out the best practice for creating the most stable hadoop environment on
>>>>> EC2.
>>>>>
>>>>> In order to run the cdh install and config scripts, I'm
>>>>> setting whirr.hadoop-install-function to install_cdh_hadoop, and
>>>>> whirr.hadoop-configure-function to configure_cdh_hadoop.  But I'm using a
>>>>> plain jane ubuntu amd64 ami (ami-da0cf8b3).  Should I also be using the
>>>>> cloudera AMIs as well as the cloudera install and config scripts.
>>>>>
>>>>> Are they any best practices for how to setup a cloudera distribution of
>>>>> hadoop on EC2?
>>>>>
>>>>> --
>>>>>
>>>>> Thanks,
>>>>> John C
>>>>>
>>>>>
>>>>
>>>> --
>>>> -- Andrei Savu / andreisavu.ro
>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> Thanks,
>>> John C
>>>
>>>
>>
>> --
>> -- Andrei Savu / andreisavu.ro
>>
>>
>


-- 

Thanks,
John C

Re: AMIs to use when creating hadoop cluster with whirr

Posted by Andrei Savu <sa...@gmail.com>.
From here:
http://developer.yahoo.com/hadoop/tutorial/module7.html

"With multiple racks of servers, RPC timeouts may become more frequent. The
NameNode takes a continual census of DataNodes and their health via
heartbeat messages sent every few seconds. A similar timeout mechanism
exists on the MapReduce side with the JobTracker. With many racks of
machines, they may force one another to timeout because the master node is
not handling them fast enough. The following options increase the number of
threads on the master machine dedicated to handling RPC's from slave nodes:"
(I think this is also true for AWS)

Proposed solution is:


  <property>
    <name>dfs.namenode.handler.count</name>
    <value>40</value>
  </property>
  <property>
    <name>mapred.job.tracker.handler.count</name>
    <value>40</value>
  </property>


You can do this in Whirr by specifying:

hadoop-dfs.dfs.namenode.handler.count=40
hadoop-mapreduce.mapred.job.tracker.handler.count=40

in the .properties file.

Let me know if this works for you. We should probably use something like
this by default.

-- Andrei Savu

On Wed, Oct 5, 2011 at 8:15 PM, Andrei Savu <sa...@gmail.com> wrote:

> Looks like a network congestion issue to me. I don't know how to do this
> but I would try to increase the heartbeat timeout.
>
> Tom any ideas? Have you seen this before on aws?
>
> I don't think there is something wrong with the AMI, I suspect there is
> something wrong with the Hadoop configuration.
>
>
> On Wednesday, October 5, 2011, John Conwell wrote:
>
>> It starts with hadoop reporting bocks of data being 'lost', then
>> individual data nodes stop responding, the individual data nodes get taken
>> off line, then jobs get killed, then data nodes come back on line and the
>> data blocks get replicated back out the correct replication factor.
>>
>> The end result are about 80% of the time, my hadoop jobs get killed
>> because some task fails 3 times in a row, but about an hour after the job
>> gets killed, all data nodes are back online and all data is fully
>> replicated.
>>
>> Before I go rat holing down "why are my data nodes going down", I want to
>> cover the easy scenarios like "oh yea...your totally misconfigured.  You
>> should use ABC ami with the cloudera install and config scripts".  Basically
>> validate if there are any best practices for setting up a cloudera
>> distribution of hadoop on EC2.
>>
>> I know cloudera has created their own AMIs.  Should I be using them?  Does
>> it matter?
>>
>>
>>
>> On Wed, Oct 5, 2011 at 9:43 AM, Andrei Savu <sa...@gmail.com>wrote:
>>
>>> What do you mean by failing? Is the Hadoop daemon shutting down or the
>>> machine as a whole?
>>>
>>> On Wednesday, October 5, 2011, John Conwell wrote:
>>>
>>>> I'm having stability issues (data nodes constantly failing under very
>>>> little load) on the hadoop clusters I'm creating, and I'm trying to figure
>>>> out the best practice for creating the most stable hadoop environment on
>>>> EC2.
>>>>
>>>> In order to run the cdh install and config scripts, I'm
>>>> setting whirr.hadoop-install-function to install_cdh_hadoop, and
>>>> whirr.hadoop-configure-function to configure_cdh_hadoop.  But I'm using a
>>>> plain jane ubuntu amd64 ami (ami-da0cf8b3).  Should I also be using the
>>>> cloudera AMIs as well as the cloudera install and config scripts.
>>>>
>>>> Are they any best practices for how to setup a cloudera distribution of
>>>> hadoop on EC2?
>>>>
>>>> --
>>>>
>>>> Thanks,
>>>> John C
>>>>
>>>>
>>>
>>> --
>>> -- Andrei Savu / andreisavu.ro
>>>
>>>
>>
>>
>> --
>>
>> Thanks,
>> John C
>>
>>
>
> --
> -- Andrei Savu / andreisavu.ro
>
>

Re: AMIs to use when creating hadoop cluster with whirr

Posted by Andrei Savu <sa...@gmail.com>.
Looks like a network congestion issue to me. I don't know how to do this but
I would try to increase the heartbeat timeout.

Tom any ideas? Have you seen this before on aws?

I don't think there is something wrong with the AMI, I suspect there is
something wrong with the Hadoop configuration.

On Wednesday, October 5, 2011, John Conwell wrote:

> It starts with hadoop reporting bocks of data being 'lost', then individual
> data nodes stop responding, the individual data nodes get taken off line,
> then jobs get killed, then data nodes come back on line and the data blocks
> get replicated back out the correct replication factor.
>
> The end result are about 80% of the time, my hadoop jobs get killed because
> some task fails 3 times in a row, but about an hour after the job gets
> killed, all data nodes are back online and all data is fully replicated.
>
> Before I go rat holing down "why are my data nodes going down", I want to
> cover the easy scenarios like "oh yea...your totally misconfigured.  You
> should use ABC ami with the cloudera install and config scripts".  Basically
> validate if there are any best practices for setting up a cloudera
> distribution of hadoop on EC2.
>
> I know cloudera has created their own AMIs.  Should I be using them?  Does
> it matter?
>
>
>
> On Wed, Oct 5, 2011 at 9:43 AM, Andrei Savu <savu.andrei@gmail.com<javascript:_e({}, 'cvml', 'savu.andrei@gmail.com');>
> > wrote:
>
>> What do you mean by failing? Is the Hadoop daemon shutting down or the
>> machine as a whole?
>>
>> On Wednesday, October 5, 2011, John Conwell wrote:
>>
>>> I'm having stability issues (data nodes constantly failing under very
>>> little load) on the hadoop clusters I'm creating, and I'm trying to figure
>>> out the best practice for creating the most stable hadoop environment on
>>> EC2.
>>>
>>> In order to run the cdh install and config scripts, I'm
>>> setting whirr.hadoop-install-function to install_cdh_hadoop, and
>>> whirr.hadoop-configure-function to configure_cdh_hadoop.  But I'm using a
>>> plain jane ubuntu amd64 ami (ami-da0cf8b3).  Should I also be using the
>>> cloudera AMIs as well as the cloudera install and config scripts.
>>>
>>> Are they any best practices for how to setup a cloudera distribution of
>>> hadoop on EC2?
>>>
>>> --
>>>
>>> Thanks,
>>> John C
>>>
>>>
>>
>> --
>> -- Andrei Savu / andreisavu.ro
>>
>>
>
>
> --
>
> Thanks,
> John C
>
>

-- 
-- Andrei Savu / andreisavu.ro

Re: AMIs to use when creating hadoop cluster with whirr

Posted by John Conwell <jo...@iamjohn.me>.
It starts with hadoop reporting blocks of data being 'lost', then individual
data nodes stop responding, the individual data nodes get taken off line,
then jobs get killed, then data nodes come back on line and the data blocks
get replicated back out to the correct replication factor.

The end result is that about 80% of the time my hadoop jobs get killed because
some task fails 3 times in a row, but about an hour after the job gets
killed, all data nodes are back online and all data is fully replicated.

Before I go rat-holing down "why are my data nodes going down", I want to
cover the easy scenarios like "oh yeah...you're totally misconfigured.  You
should use ABC ami with the cloudera install and config scripts".  Basically I
want to validate whether there are any best practices for setting up a cloudera
distribution of hadoop on EC2.

I know cloudera has created their own AMIs.  Should I be using them?  Does
it matter?



On Wed, Oct 5, 2011 at 9:43 AM, Andrei Savu <sa...@gmail.com> wrote:

> What do you mean by failing? Is the Hadoop daemon shutting down or the
> machine as a whole?
>
> On Wednesday, October 5, 2011, John Conwell wrote:
>
>> I'm having stability issues (data nodes constantly failing under very
>> little load) on the hadoop clusters I'm creating, and I'm trying to figure
>> out the best practice for creating the most stable hadoop environment on
>> EC2.
>>
>> In order to run the cdh install and config scripts, I'm
>> setting whirr.hadoop-install-function to install_cdh_hadoop, and
>> whirr.hadoop-configure-function to configure_cdh_hadoop.  But I'm using a
>> plain jane ubuntu amd64 ami (ami-da0cf8b3).  Should I also be using the
>> cloudera AMIs as well as the cloudera install and config scripts.
>>
>> Are they any best practices for how to setup a cloudera distribution of
>> hadoop on EC2?
>>
>> --
>>
>> Thanks,
>> John C
>>
>>
>
> --
> -- Andrei Savu / andreisavu.ro
>
>


-- 

Thanks,
John C

Re: AMIs to use when creating hadoop cluster with whirr

Posted by Andrei Savu <sa...@gmail.com>.
What do you mean by failing? Is the Hadoop daemon shutting down or the
machine as a whole?

On Wednesday, October 5, 2011, John Conwell wrote:

> I'm having stability issues (data nodes constantly failing under very
> little load) on the hadoop clusters I'm creating, and I'm trying to figure
> out the best practice for creating the most stable hadoop environment on
> EC2.
>
> In order to run the cdh install and config scripts, I'm
> setting whirr.hadoop-install-function to install_cdh_hadoop, and
> whirr.hadoop-configure-function to configure_cdh_hadoop.  But I'm using a
> plain jane ubuntu amd64 ami (ami-da0cf8b3).  Should I also be using the
> cloudera AMIs as well as the cloudera install and config scripts.
>
> Are they any best practices for how to setup a cloudera distribution of
> hadoop on EC2?
>
> --
>
> Thanks,
> John C
>
>

-- 
-- Andrei Savu / andreisavu.ro