You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@whirr.apache.org by Benjamin Clark <be...@daltonclark.com> on 2011/03/12 04:56:58 UTC

hadoop config property override problem

Yes, thanks Tom, that works.

Many features and design changes in this branch are much appreciated, especially the config property override in the whirr config file.

I found one problem, at least in the 0.4 branch.  If you override a property with a comma-separated list, for example:

hadoop-common.io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec

then what actually shows up in core-site.xml is surrounded by brackets and has spaces between the elements of the list, so

[org.apache.hadoop.io.compress.GzipCodec, org.apache.hadoop.io.compress.DefaultCodec, org.apache.hadoop.io.compress.BZip2Codec, com.hadoop.compression.lzo.LzoCodec, com.hadoop.compression.lzo.LzopCodec]

Hadoop is not chomping or trimming that kind of thing, so you need to remove the spaces and brackets to get that to work.  It's easy enough to patch the file, deploy to the slaves and restart, but I'm wondering if that's accounted for anywhere.  I scanned CHANGES.txt in trunk and I don't see it.

--Ben


On Mar 10, 2011, at 5:41 PM, Tom White wrote:

> On Thu, Mar 10, 2011 at 1:19 PM, Benjamin Clark <be...@daltonclark.com> wrote:
>> Thank you both, Tom and Andrei.   Now that I know where the faq is, I hope to bother you less with things that are documented!
>> 
>> I think I should be all set with customization, but I need to build.  BUILD.txt says 'mvn clean install' or mvn package -Ppackage.  I can do that, and mvn reports success but then I try to use whirr-cli-0.4.0-incubating.jar as the jar, and the manifest has no main class (OK, I can supply 'org.apache.whirr.cli.Main'  if I need to), and in any case the jar is not a fat jar, as I see all the publicly distributed versions are.  It looks in the poms as if you have the maven assembly plugins trying to make a fat jar, but it doesn't seem to be doing it for me.
> 
> Whirr no longer produces a shaded (fat) JAR as of 0.4.0 and trunk, so
> perhaps it is working. Try bin/whirr and it should list the roles for
> you.
> 
> Cheers
> Tom
> 
>> 
>> What am I doing wrong?
>> 
>> I'm doing this on a Mac like so:
>> 
>> $ ruby --version
>> ruby 1.8.7 (2010-08-16 patchlevel 302) [i686-darwin10]
>> $ mvn --version
>> Apache Maven 3.0.2 (r1056850; 2011-01-08 19:58:10-0500)
>> Java version: 1.6.0_24, vendor: Apple Inc.
>> Java home: /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
>> Default locale: en_US, platform encoding: MacRoman
>> OS name: "mac os x", version: "10.6.6", arch: "x86_64", family: "mac"
>> $ java -version
>> java version "1.6.0_24"
>> Java(TM) SE Runtime Environment (build 1.6.0_24-b07-334-10M3326)
>> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02-334, mixed mode)
>> 
>> 
>> 
>> 
>> Now I think my only problem is that
>> On Mar 10, 2011, at 2:35 PM, Andrei Savu wrote:
>> 
>>> Starting with the upcoming 0.4.0 release Whirr is no longer using S3
>>> for storing the install and configure scripts. You can grab the
>>> scripts from:
>>> 
>>> ${WHIRR_HOME}/services/${SERVICE_NAME}/src/main/resources/functions/{install,configure}_SERVICE.sh
>>> 
>>> It's also easier to customize the scripts. You just need to place your
>>> version in ${WHIRR_HOME}/functions (I believe it should have the same
>>> name).
>>> 
>>> From the 0.4.0 FAQ: "If you want to change the scripts then you can
>>> place a modified copy of the
>>> scripts in a _functions_ directory in Whirr's installation directory. The
>>> original versions of the scripts can be found in _functions_ directories in the
>>> source trees."
>>> 
>>> -- Andrei Savu / andreisavu.ro
>>> 
>>> On Thu, Mar 10, 2011 at 9:27 PM, Benjamin Clark <be...@daltonclark.com> wrote:
>>>> So if we grab install_cdh_hadoop.sh from the source tree, and put a customized version in our own bucket, and set whirr.run-url-base to the root of that bucket, it should work, even in 4.0 and after?
>>>> 
>>>> Based on the FAQ I tried a few of these to attempt to verify I was on the right track:
>>>> 
>>>> wget http://whirr.s3.amazonaws.com/install_cdh_hadoop
>>>> wget http://whirr.s3.amazonaws.com/0.4/install_cdh_hadoop
>>>> wget http://whirr.s3.amazonaws.com/0.4.0/install_cdh_hadoop
>>>> wget http://whirr.s3.amazonaws.com/install_cdh_hadoop.sh
>>>> wget http://whirr.s3.amazonaws.com/0.4/install_cdh_hadoop.sh
>>>> wget http://whirr.s3.amazonaws.com/0.4.0/install_cdh_hadoop.sh
>>>> 
>>>> but all give 404s.
>>>> 
>>>> 
>>>> 
>>>> On Mar 10, 2011, at 12:01 AM, Tom White wrote:
>>>> 
>>>>> Sorry I missed this thread. On 0.4.0 and later
>>>>> 
>>>>> whirr.hadoop-install-runurl=cloudera/cdh/install
>>>>> whirr.hadoop-configure-runurl=cloudera/cdh/post-configure
>>>>> 
>>>>> changes to
>>>>> 
>>>>> whirr.hadoop-install-function=install_cdh_hadoop
>>>>> whirr.hadoop-configure-function=configure_cdh_hadoop
>>>>> 
>>>>> Cheers,
>>>>> Tom
>>>>> 
>>>>> On Wed, Mar 9, 2011 at 8:23 AM, Sebastian Schoenherr
>>>>> <se...@uibk.ac.at> wrote:
>>>>>> Hi Saptarshi,
>>>>>> I tried to execute my working whirr 0.3.0 configuration (identical to your
>>>>>> property file, using cloudera scripts) on branch-0.4 and the same issues
>>>>>> arised for me.  Unfortunately I'm not sure yet why it's not working with
>>>>>> branch-0.4. Is using branch-0.3 an option for you?
>>>>>> Any other guesses?
>>>>>> cheers,
>>>>>> sebastian
>>>>>> 
>>>>>> 
>>>>>> On 08.03.2011 05:41, Saptarshi Guha wrote:
>>>>>>> 
>>>>>>> once again, i've changed the secret identity ..
>>>>>>> 
>>>>>>> On Mon, Mar 7, 2011 at 8:41 PM, Saptarshi Guha
>>>>>>> <sa...@revolutionanalytics.com>  wrote:
>>>>>>>> 
>>>>>>>> Hello
>>>>>>>> 
>>>>>>>> No such luck on my end This is my script file, you can test that the
>>>>>>>> scripts download. But when I log in
>>>>>>>> hadoop version is. (I pulled the latest git). Also my scripts (you can
>>>>>>>> confirm if you download) echo a small line to files in /tmp.
>>>>>>>> They are not being created.
>>>>>>>> 
>>>>>>>> Hadoop 0.20.2
>>>>>>>> Subversion
>>>>>>>> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20
>>>>>>>> -r 911707
>>>>>>>> Compiled by chrisdo on Fri Feb 19 08:07:34 UTC 2010
>>>>>>>> 
>>>>>>>> whirr.cluster-name=revotesting2
>>>>>>>> whirr.service-name=hadoop
>>>>>>>> whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,2
>>>>>>>> hadoop-datanode+hadoop-tasktracker
>>>>>>>> whirr.provider=aws-ec2
>>>>>>>> whirr.identity= AKIAJH5JBSI5KJ7YZQ6A
>>>>>>>> whirr.credential= b/kqLJAHOdRA4L30n7Zt8Edz383B1ARtPI3wiyD6
>>>>>>>> whirr.location-id=us-east-1
>>>>>>>> whirr.hardware-id=c1.xlarge
>>>>>>>> whirr.run-url-base=http://ml.stat.purdue.edu/whirr-scripts/
>>>>>>>> whirr.hadoop-install-runurl=cloudera/cdh/install
>>>>>>>> whirr.hadoop-configure-runurl=cloudera/cdh/post-configure
>>>>>>>> 
>>>>>>>> ## Rightscales CentOS AMI
>>>>>>>> 
>>>>>>>> ##http://support.rightscale.com/18-Release_Notes/02-AMI/RightImages_Release_Notes
>>>>>>>> jclouds.ec2.ami-owners=411009282317
>>>>>>>> whirr.image-id=us-east-1/ami-ccb35ea5
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Mon, Mar 7, 2011 at 9:59 AM, Benjamin Clark<be...@daltonclark.com>
>>>>>>>>  wrote:
>>>>>>>>> 
>>>>>>>>> In my experience you need
>>>>>>>>> 
>>>>>>>>> whirr.run-url-base=http://name-of-my-bucket-with-customized-scripts/
>>>>>>>>> whirr.hadoop-install-runurl=cloudera/cdh/install
>>>>>>>>> 
>>>>>>>>> and then you *also* need to have a copy of sun/java/install in that same
>>>>>>>>> bucket.
>>>>>>>>> 
>>>>>>>>> And both of those scripts need to be public-readable.
>>>>>>>>> 
>>>>>>>>> So in the end you should be able to do
>>>>>>>>> curl
>>>>>>>>> http://name-of-my-bucket-with-customized-scripts.s3.amazonaws.com/cloudera/cdh/install
>>>>>>>>> 
>>>>>>>>> and
>>>>>>>>> curl
>>>>>>>>> http://name-of-my-bucket-with-customized-scripts.s3.amazonaws.com/sun/java/install
>>>>>>>>> 
>>>>>>>>> Even if you haven't customizied sun/java/install, it needs to be there.
>>>>>>>>> 
>>>>>>>>> If you do all that, the scripts will run and you will have the versions
>>>>>>>>> you asked for.
>>>>>>>>> 
>>>>>>>>> `hadoop version` on the name node then says, in my case:
>>>>>>>>> 
>>>>>>>>> Hadoop 0.20.2-CDH3B4
>>>>>>>>> 
>>>>>>>>> On Mar 7, 2011, at 12:41 PM, Saptarshi Guha wrote:
>>>>>>>>> 
>>>>>>>>>> Hi,
>>>>>>>>>> Fixed the security slip up. Did the hadoop version thing and got this
>>>>>>>>>> 
>>>>>>>>>> [root@domU-12-31-39-0B-CC-41 ~]# hadoop version
>>>>>>>>>> Hadoop 0.20.2
>>>>>>>>>> Subversion
>>>>>>>>>> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20
>>>>>>>>>> -r 911707
>>>>>>>>>> Compiled by chrisdo on Fri Feb 19 08:07:34 UTC 2010
>>>>>>>>>> 
>>>>>>>>>> So i guess its not CDH.
>>>>>>>>>> 
>>>>>>>>>> Thanks
>>>>>>>>>> Saptarshi
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Mon, Mar 7, 2011 at 9:22 AM, Saptarshi Guha
>>>>>>>>>> <sa...@revolutionanalytics.com>  wrote:
>>>>>>>>>>> 
>>>>>>>>>>> dear me! thanks, will do right away.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Mar 7, 2011 at 1:46 AM, Sebastian Schoenherr
>>>>>>>>>>> <se...@uibk.ac.at>  wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi Saptarshi,
>>>>>>>>>>>> Try to execute "hadoop version" on your namenode, if the output is
>>>>>>>>>>>> Hadoop
>>>>>>>>>>>> 0.20.2-CDH3B4, the current cloudera distribution has been installed.
>>>>>>>>>>>> Btw, I would recommend to set your current Access Key ID and Secret
>>>>>>>>>>>> Key
>>>>>>>>>>>> inactive, since you posted it in your prop file.
>>>>>>>>>>>> cheers
>>>>>>>>>>>> sebastian
>>>>>>>>>>>> 
>>>>>>>>>>>> On 06.03.2011 06:54, Saptarshi Guha wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I did git clone of the latest whirr and copied cloudera scripts into
>>>>>>>>>>>>> the script directory (copied over
>>>>>>>>>>>>> from whirr-0.3-incubating).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> My properties file is at the end of this email.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> However, I don't think the scripts are being run because the
>>>>>>>>>>>>> jobtracker
>>>>>>>>>>>>> is the default Apache hadoop jobtracker and not the cloudera
>>>>>>>>>>>>> jobtracker.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Have i missed something?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks in advance
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Saptarshi
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ## Properties
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> whirr.cluster-name=revotesting
>>>>>>>>>>>>> whirr.service-name=hadoop
>>>>>>>>>>>>> whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,2
>>>>>>>>>>>>> hadoop-datanode+hadoop-tasktracker
>>>>>>>>>>>>> whirr.provider=aws-ec2
>>>>>>>>>>>>> whirr.identity= AKIAI3FUFFXAPYLE7CJA
>>>>>>>>>>>>> whirr.credential= 2Yq3Ar2HSxK/hbwZHs6aN6yrh0yfGNSPTpVw3t2n
>>>>>>>>>>>>> whirr.location-id=us-east-1
>>>>>>>>>>>>> whirr.hardware-id=c1.xlarge
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ## Rightscales CentOS AMI
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> http://support.rightscale.com/18-Release_Notes/02-AMI/RightImages_Release_Notes
>>>>>>>>>>>>> jclouds.ec2.ami-owners=411009282317
>>>>>>>>>>>>> whirr.image-id=us-east-1/ami-ccb35ea5
>>>>>>>>>>>>> 
>>>>>>>>>>>>> whirr.hadoop-install-runurl=cloudera/cdh/install
>>>>>>>>>>>>> whirr.hadoop-configure-runurl=cloudera/cdh/post-configure
>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>>