Posted to user@whirr.apache.org by Benjamin Clark <be...@daltonclark.com> on 2011/03/16 17:54:58 UTC

aws 64-bit c1.xlarge problems

I have been using the whirr 0.4 branch to launch clusters of c1.medium Amazon Linux machines (whirr.image-id=us-east-1/ami-d59d6bbc, which was the default for new Amazon Linux instances a few days ago) with good success.  I took the default hadoop-ec2.properties recipe and modified it slightly to suit my needs.  I'm now trying with basically the same properties file, but when I use

whirr.hardware-id=c1.xlarge

and then either this (from the recipe):
# Ubuntu 10.04 LTS Lucid. See http://alestic.com/
whirr.image-id=us-east-1/ami-da0cf8b3

or this:
# Amazon linux 64-bit, default as of 3/11:
whirr.image-id=us-east-1/ami-8e1fece7

I get a failure to install the right public key, so that I can't log into the name node (or any other node, for that matter).
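
The failing check is just plain ssh with the key from the config below (a sketch, with a placeholder for whatever address whirr prints for the namenode; 'ubuntu' and 'root' fail the same way):

    # try to log in with the key whirr was supposed to install (address is a placeholder)
    ssh -i ~/.ssh/id_rsa-formyhadoop ec2-user@<namenode-address>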


My whole config file is this:

whirr.cluster-name=bhcL4
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,4 hadoop-datanode+hadoop-tasktracker
whirr.hadoop-install-function=install_cdh_hadoop
whirr.hadoop-configure-function=configure_cdh_hadoop
whirr.provider=aws-ec2
whirr.identity=...
whirr.credential=...
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa-formyhadoop
whirr.public-key-file=${sys:user.home}/.ssh/id_rsa-formyhadoop.pub
whirr.hardware-id=c1.xlarge
#whirr.hardware-id=c1.medium
# Ubuntu 10.04 LTS Lucid. See http://alestic.com/
whirr.image-id=us-east-1/ami-da0cf8b3
# Amazon linux as of 3/11:
#whirr.image-id=us-east-1/ami-8e1fece7
# If you choose a different location, make sure whirr.image-id is updated too
whirr.location-id=us-east-1d
hadoop-hdfs.dfs.permissions=false
hadoop-hdfs.dfs.replication=2



Am I doing something wrong here?  I tried with both whirr.location-id=us-east-1d and whirr.location-id=us-east-1.
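
For reference, I'm launching the stock way (assuming the file above is saved as hadoop-ec2.properties):

    bin/whirr launch-cluster --config hadoop-ec2.properties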

Re: aws 64-bit c1.xlarge problems

Posted by Andrei Savu <sa...@gmail.com>.
Thanks for sharing. I'm thinking about defining a set of supported
AMIs / OSes for Whirr and testing them all when building a new release.


Re: aws 64-bit c1.xlarge problems

Posted by Benjamin Clark <be...@daltonclark.com>.
I read the manual and figured a bit more of this out.  Amazon may change the defaults in their console without an announcement, but they document what they're doing here: http://aws.amazon.com/amazon-linux-ami/

The /media/ephemeral0 mount is for the Amazon Linux instances that have S3-backed, non-durable storage.  It seems as if the EBS-backed ones have no non-durable storage by default, while the S3-backed ones do, but in that eccentric location (eccentric relative to what everybody else does on Amazon).  So if we like the S3-backed instances, we can hack the install_cdh_hadoop.sh script by adding

    rm -rf /mnt
    ln -s /media/ephemeral0 /mnt

or, for the EBS-backed ones, we can write something that spins up an EBS volume per node and attaches it (a sketch of what that would involve is below).
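
Roughly, that would mean something like this per node (an untested sketch using the ec2-api-tools; the size, zone, ids and device name are all placeholders):

    # create a volume in the node's zone and attach it (ids are hypothetical)
    ec2-create-volume --size 100 --availability-zone us-east-1d
    ec2-attach-volume vol-xxxxxxxx -i i-xxxxxxxx -d /dev/sdf
    # then, on the node itself, put a filesystem on it and mount it where hdfs expects space
    mkfs -t ext3 /dev/sdf
    mount /dev/sdf /mnt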

Is there any experience among users as to which will be more stable and perform better?  I've got the S3-backed one working, so I'll use that and just bake it off against the Alestic/Ubuntu setup that now also works for me, unless there's a compelling case for the EBS-backed approach.

--Ben




Re: aws 64-bit c1.xlarge problems

Posted by Benjamin Clark <be...@daltonclark.com>.
Andrei,

The release candidate code does work.  Perhaps something is different relative to the patched Frankenstein build I was using, or perhaps I had some local corruption or a config problem.

By default it sets everything up as whatever my local user is, and the whirr.cluster-user override works as well.

In any case, at the rate AWS seems to be changing the configuration of 'Amazon Linux', perhaps it's less useful than I thought.  Last week the default AMIs in the console had a bunch of spare disk space on the /media/ephemeral0 partition, which I could symlink /mnt to in the install_cdh_hadoop.sh script, and then HDFS would have a decent amount of space.  Now there is no such thing, so I suppose I would have to launch an EBS volume per node and mount that.  This is now tipping over into the "too much trouble" zone for me.  And in the meantime I got all my native stuff (hadoop-lzo and R/Rhipe) working on Ubuntu, so I think I'm going to use the Alestic image from the recipe for a while.  If there's an obvious candidate up there for "reasonably modern Red Hat-derivative AMI from a source on the good lists that behaves well," I'd like to know what it is.  By 'reasonably modern' I mean having a default Python >= 2.5.

I liked the old custom of having /mnt be a separate partition of a decent size.  I hope this is just a glitch with AWS.  I suspect the space may actually still be there, because jclouds/whirr shows (e.g.) in its output:
volumes=[[id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false]]
So theoretically the disk space is still there on those non-boot, non-durable devices, but I cannot mount them.
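
For what it's worth, if those devices really were attached, I would expect the usual steps to bring them into service (a sketch; on some kernels they show up as /dev/xvdb instead of /dev/sdb):

    # format and mount one of the advertised non-durable devices (sketch)
    mkfs -t ext3 /dev/sdb
    mount /dev/sdb /mnt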


I also tried the cluster AMI, because I am intrigued by the possibilities for good performance.  Sounds great for Hadoop, doesn't it?  But it won't even start the nodes, giving this:

Configuring template
Unexpected error while starting 1 nodes, minimum 1 nodes for [hadoop-namenode, hadoop-jobtracker] of cluster bhcLA
java.util.concurrent.ExecutionException: org.jclouds.http.HttpResponseException: command: POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 failed with response: HTTP/1.1 400 Bad Request; content: [Non-Windows AMIs with a virtualization type of 'hvm' currently may only be used with Cluster Compute instance types.]
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
	at java.util.concurrent.FutureTask.get(FutureTask.java:83)
	at org.apache.whirr.cluster.actions.BootstrapClusterAction$StartupProcess.waitForOutcomes(BootstrapClusterAction.java:307)
	at org.apache.whirr.cluster.actions.BootstrapClusterAction$StartupProcess.call(BootstrapClusterAction.java:260)
	at org.apache.whirr.cluster.actions.BootstrapClusterAction$StartupProcess.call(BootstrapClusterAction.java:221)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:680)
Caused by: org.jclouds.http.HttpResponseException: command: POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 failed with response: HTTP/1.1 400 Bad Request; content: [Non-Windows AMIs with a virtualization type of 'hvm' currently may only be used with Cluster Compute instance types.]
	at org.jclouds.aws.handlers.ParseAWSErrorFromXmlContent.handleError(ParseAWSErrorFromXmlContent.java:75)

There must be something a bit more involved about specifying cluster instances in the Amazon API, perhaps not (yet) supported by jclouds?  I'm afraid I don't need this enough right now to justify digging further.


Anyway, thanks for all your help and advice on this.

--Ben




Re: aws 64-bit c1.xlarge problems

Posted by Andrei Savu <sa...@gmail.com>.
Strange! I will try your properties file tomorrow.

If you want to try again, you can find the artifacts for 0.4.0 RC1 here:
http://people.apache.org/~asavu/whirr-0.4.0-incubating-candidate-1


Re: aws 64-bit c1.xlarge problems

Posted by Benjamin Clark <be...@daltonclark.com>.
Andrei,

Thanks for looking at this.  Unfortunately it does not seem to work.

Using the Amazon Linux 64-bit AMI, whether with no whirr.cluster-user or with it set to 'ben' or anything else, I get this:

1) SshException on node us-east-1/i-62de280d:
org.jclouds.ssh.SshException: ec2-user@72.44.35.254:22: Error connecting to session.
	at org.jclouds.ssh.jsch.JschSshClient.propagate(JschSshClient.java:252)
	at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:206)
	at org.jclouds.compute.callables.RunScriptOnNodeAsInitScriptUsingSsh.call(RunScriptOnNodeAsInitScriptUsingSsh.java:90)
	at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:70)
 
So it doesn't seem to be honoring that property, and it's definitely not allowing me to log in to any node as 'ben', 'ec2-user' or 'root'.

The Ubuntu AMI from the recipes continues to work fine.

Here's the full config file I'm using.  I grabbed the recipe from trunk and put my stuff back in, to make sure I'm not missing a new setting:

whirr.cluster-name=bhcTL
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,2 hadoop-datanode+hadoop-tasktracker
whirr.hadoop-install-function=install_cdh_hadoop
whirr.hadoop-configure-function=configure_cdh_hadoop
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa-hkey
whirr.public-key-file=${sys:user.home}/.ssh/id_rsa-hkey.pub
whirr.cluster-user=ben
# Amazon linux 32-bit--works
#whirr.hardware-id=c1.medium
#whirr.image-id=us-east-1/ami-d59d6bbc
# Ubuntu 10.04 LTS Lucid. See http://alestic.com/ -- works
#whirr.hardware-id=c1.xlarge
#whirr.image-id=us-east-1/ami-da0cf8b3
# Amazon linux 64-bit as of 3/11:--doesn't work
whirr.hardware-id=c1.xlarge
whirr.image-id=us-east-1/ami-8e1fece7
#Cluster compute --doesn't work
#whirr.hardward-id=cc1.4xlarge
#whirr.image-id=us-east-1/ami-321eed5b
whirr.location-id=us-east-1d
hadoop-hdfs.dfs.permissions=false
hadoop-hdfs.dfs.replication=2


--Ben




On Mar 17, 2011, at 1:08 PM, Andrei Savu wrote:

> Ben,  could you give it one more try using the current trunk?
> 
> You can specify the user by setting the option whirr.cluster-user
> (defaults to current system user).
> 
> On Wed, Mar 16, 2011 at 11:23 PM, Benjamin Clark <be...@daltonclark.com> wrote:
>> Andrei,
>> 
>> Thanks.
>> 
>> After patching with 158, it launches fine as me on that Ubuntu image from the recipe (i.e. on my client machine I am 'ben', so now the AWS user that has sudo, and that I can log in as, is also 'ben'), so that looks good.
>> 
>> But it's now doing this with amazon linux (ami-da0cf8b3, which was the default 64-bit ami a few days ago, and may still be) during launch:
>> 
>> 1) SshException on node us-east-1/i-b2678ddd:
>> org.jclouds.ssh.SshException: ben@50.16.96.211:22: Error connecting to session.
>>        at org.jclouds.ssh.jsch.JschSshClient.propagate(JschSshClient.java:252)
>>        at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:206)
>>        at org.jclouds.compute.callables.RunScriptOnNodeAsInitScriptUsingSsh.call(RunScriptOnNodeAsInitScriptUsingSsh.java:90)
>>        at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:70)
>>        at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:45)
>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> 
>> So it seems as if the key part of jclouds authentication setup is still failing for the amazon linux/ec2-user scenario, i.e. trying to set up as the local user, but failing.
>> 
>> Is there a property for the user it launches as?  Or does it just do whichever user you are locally, instead of ec2-user/ubuntu/root, depending on the default, as before?
>> 
>> I can switch to ubuntu, but I have a fair amount of native code setup in my custom scripts and would prefer to stick with a redhattish version if possible.
>> 
>> Looking ahead, I want to benchmark plain old 64-bit instances against cluster instances, to see if the allegedly improved networking gives us a boost, and the available ones I see are Suse and Amazon linux.  When I switch to the amazon linux one, like so:
>> 
>> whirr.hardward-id=cc1.4xlarge
>> whirr.image-id=us-east-1/ami-321eed5b
>> 
>> I get a different problem:
>> 
>> Exception in thread "main" java.util.NoSuchElementException: hardwares don't support any images: [biggest=false, fastest=false, imageName=null, imageDescription=Amazon Linux AMI x86_64 HVM EBS EXT4, imageId=us-east-1/ami-321eed5b, imageVersion=ext4, location=[id=us-east-1, scope=REGION, description=us-east-1, parent=aws-ec2, iso3166Codes=[US-VA], metadata={}], minCores=0.0, minRam=0, osFamily=unrecognized, osName=null, osDescription=amazon/amzn-hvm-ami-2011.02.1-beta.x86_64-ext4, osVersion=, osArch=hvm, os64Bit=true, hardwareId=m1.small]
>> [[id=cc1.4xlarge, providerId=cc1.4xlarge, name=null, processors=[[cores=4.0, speed=4.0], [cores=4.0, speed=4.0]], ram=23552, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=840.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=840.0, device=/dev/sdc, durable=false, isBootDevice=false]], supportsI
>> 
>> but I imagine that if using cluster instances is going to be possible, support for amazon linux will be needed.
>> 
>> --Ben
>> 
>> 
>> On Mar 16, 2011, at 4:07 PM, Andrei Savu wrote:
>> 
>>> I've seen something similar while testing Whirr: WHIRR-264 [0]. We are
>>> going to commit WHIRR-158 [1] tomorrow and it should fix the problem
>>> you are seeing. We should be able to restart the vote for the 0.4.0
>>> release after fixing this issue.
>>> 
>>> [0] https://issues.apache.org/jira/browse/WHIRR-264
>>> [1] https://issues.apache.org/jira/browse/WHIRR-158
>>> 
>>> -- Andrei Savu / andreisavu.ro
>>> 
>>> On Wed, Mar 16, 2011 at 6:54 PM, Benjamin Clark <be...@daltonclark.com> wrote:
>>>> I have been using whirr 0.4 branch to launch clusters of c1.medium amazon linux machines (whirr.image-id=us-east-1/ami-d59d6bbc, which was the default for new amazon linux instances, a few days ago) with good success.  I took the default hadoop-ec2.properties recipe and modified it slightly to suit my needs.  I'm now trying with basically the same properties file, but when I use
>>>> 
>>>> whirr.hardware-id=c1.xlarge
>>>> 
>>>> and then either this (from the recipe)
>>>> # Ubuntu 10.04 LTS Lucid. See http://alestic.com/
>>>> whirr.image-id=us-east-1/ami-da0cf8b3
>>>> 
>>>> or this:
>>>> # Amazon linux 64-bit, default as of 3/11:
>>>> whirr.image-id=us-east-1/ami-8e1fece7
>>>> 
>>>> I get a a failure to install the right public key, so that I can't log into the name node (or any other nodes, for that matter).
>>>> 
>>>> 
>>>> My whole config file is this:
>>>> 
>>>> whirr.cluster-name=bhcL4
>>>> whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,4 hadoop-datanode+hadoop-tasktracker
>>>> whirr.hadoop-install-function=install_cdh_hadoop
>>>> whirr.hadoop-configure-function=configure_cdh_hadoop
>>>> whirr.provider=aws-ec2
>>>> whirr.identity=...
>>>> whirr.credential=...
>>>> whirr.private-key-file=${sys:user.home}/.ssh/id_rsa-formyhadoop
>>>> whirr.public-key-file=${sys:user.home}/.ssh/id_rsa-formyhadoop.pub
>>>> whirr.hardware-id=c1.xlarge
>>>> #whirr.hardware-id=c1.medium
>>>> # Ubuntu 10.04 LTS Lucid. See http://alestic.com/
>>>> whirr.image-id=us-east-1/ami-da0cf8b3
>>>> # Amazon linux as of 3/11:
>>>> #whirr.image-id=us-east-1/ami-8e1fece7
>>>> # If you choose a different location, make sure whirr.image-id is updated too
>>>> whirr.location-id=us-east-1d
>>>> hadoop-hdfs.dfs.permissions=false
>>>> hadoop-hdfs.dfs.replication=2
>>>> 
>>>> 
>>>> 
>>>> Am I doing something wrong here?  I tried with whirr.location-id=us-east-1d and whirr.location-id=us-east-1
>> 
>> 


Re: aws 64-bit c1.xlarge problems

Posted by Andrei Savu <sa...@gmail.com>.
Ben, could you give it one more try using the current trunk?

You can specify the user by setting the option whirr.cluster-user
(defaults to current system user).
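
For example (hypothetical username; pick anything that doesn't already exist on the image):

whirr.cluster-user=hadoop

As I understand it, Whirr then creates that account on every node, installs your whirr.public-key-file for it, and grants it sudo, rather than deriving the name from whoever runs the launch command.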


Re: aws 64-bit c1.xlarge problems

Posted by Benjamin Clark <be...@daltonclark.com>.
Andrei,

Thanks.

After patching with 158, it launches fine as me on that Ubuntu image from the recipe (on my client machine I am 'ben', so the aws user that has sudo, and that I can log in as, is now also 'ben'), so that looks good.

But it's now doing this with amazon linux (ami-8e1fece7, which was the default 64-bit ami a few days ago, and may still be) during launch:

1) SshException on node us-east-1/i-b2678ddd:
org.jclouds.ssh.SshException: ben@50.16.96.211:22: Error connecting to session.
	at org.jclouds.ssh.jsch.JschSshClient.propagate(JschSshClient.java:252)
	at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:206)
	at org.jclouds.compute.callables.RunScriptOnNodeAsInitScriptUsingSsh.call(RunScriptOnNodeAsInitScriptUsingSsh.java:90)
	at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:70)
	at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:45)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

So it seems as if the public-key part of the jclouds authentication setup is still failing in the amazon linux/ec2-user scenario, i.e. it tries to set up the local user but fails.
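
One sanity check worth doing by hand (hypothetical commands; substitute a real node address, this one is taken from the error above):

ssh -i ~/.ssh/id_rsa-formyhadoop ec2-user@50.16.96.211   # the image's stock account
ssh -i ~/.ssh/id_rsa-formyhadoop ben@50.16.96.211        # the account Whirr should have created

Whichever login succeeds tells you which account, if any, the public key actually landed in.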

Is there a property for the user it launches as?  Or does it just use whichever user you are logged in as locally, instead of ec2-user/ubuntu/root depending on the image default, as before?

I can switch to ubuntu, but I have a fair amount of native code setup in my custom scripts and would prefer to stick with a redhattish version if possible.

Looking ahead, I want to benchmark plain old 64-bit instances against cluster instances, to see if the allegedly improved networking gives us a boost, and the cluster-instance AMIs I see available are Suse and Amazon linux.  When I switch to the amazon linux one, like so:

whirr.hardware-id=cc1.4xlarge
whirr.image-id=us-east-1/ami-321eed5b

I get a different problem:

Exception in thread "main" java.util.NoSuchElementException: hardwares don't support any images: [biggest=false, fastest=false, imageName=null, imageDescription=Amazon Linux AMI x86_64 HVM EBS EXT4, imageId=us-east-1/ami-321eed5b, imageVersion=ext4, location=[id=us-east-1, scope=REGION, description=us-east-1, parent=aws-ec2, iso3166Codes=[US-VA], metadata={}], minCores=0.0, minRam=0, osFamily=unrecognized, osName=null, osDescription=amazon/amzn-hvm-ami-2011.02.1-beta.x86_64-ext4, osVersion=, osArch=hvm, os64Bit=true, hardwareId=m1.small]
[[id=cc1.4xlarge, providerId=cc1.4xlarge, name=null, processors=[[cores=4.0, speed=4.0], [cores=4.0, speed=4.0]], ram=23552, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=840.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=840.0, device=/dev/sdc, durable=false, isBootDevice=false]], supportsI

but I imagine that if using cluster instances is going to be possible, support for amazon linux will be needed.  (The error above shows the HVM image parsed with osFamily=unrecognized, which is presumably why jclouds can't match it to the cc1.4xlarge hardware.)

--Ben




Re: aws 64-bit c1.xlarge problems

Posted by Andrei Savu <sa...@gmail.com>.
I've seen something similar while testing Whirr: WHIRR-264 [0]. We are
going to commit WHIRR-158 [1] tomorrow and it should fix the problem
you are seeing. We should be able to restart the vote for the 0.4.0
release after fixing this issue.

[0] https://issues.apache.org/jira/browse/WHIRR-264
[1] https://issues.apache.org/jira/browse/WHIRR-158

-- Andrei Savu / andreisavu.ro
