Posted to common-dev@hadoop.apache.org by Merto Mertek <ma...@gmail.com> on 2012/02/13 11:26:30 UTC

Developing and deploying hadoop

I am interested in some general tips on how to develop and deploy new
versions of Hadoop. I've been trying to compile a new version of Hadoop
and place the new jar in the cluster's lib folder; however, it was not
picked up even though the classpath was explicitly set to the lib
folder. I am interested in the following questions:

a) How do you deploy a new version? Do you just copy the newly compiled
jar file to the lib folder on all nodes?
b) Should I just do a fresh compile or build a full release ('ant' vs.
'ant tar')?
c) How do you develop and deploy Hadoop locally, and how remotely? For
deploying builds, do you use your own shell scripts or tools like
Ant/Maven?
d) What is the purpose of the folder $HADOOP_HOME/share/hadoop?


Any other tips are welcome.

Thank you

Re: Developing and deploying hadoop

Posted by Eric Yang <er...@gmail.com>.
a) Standard practice is to keep the data directory independent of the
program directory.  For example, if the software is installed in
/opt/hadoop/hadoop-1.0, the data may be located in /var/hadoop.  When a
new version is available for deployment, it can be deployed to
/opt/hadoop/hadoop-2.0 and use the same /var/hadoop directory for
data.
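
A minimal sketch of that layout, assuming a "current" symlink is used
to switch versions (the symlink name and paths are illustrative, not
something Hadoop mandates):

    # unpack the new release next to the old one
    tar xzf hadoop-2.0.tar.gz -C /opt/hadoop
    # atomically repoint the symlink that HADOOP_HOME refers to
    ln -sfn /opt/hadoop/hadoop-2.0 /opt/hadoop/current
    # data stays put; only the config points at it, e.g. in
    # hdfs-site.xml: dfs.data.dir=/var/hadoop/dfs/data

Rolling back is then just repointing the symlink at the old directory.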

b) It is best to use "ant binary" with the dozen or so other switches
that are documented in the Hadoop wiki,
http://wiki.apache.org/hadoop/HowToRelease.  This keeps the program
files small by not shipping documentation and source to every node.
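
For example, an invocation along these lines (the exact switches vary
by branch, so the wiki page above is authoritative; the version string
here is made up):

    # build a binary-only tarball, skipping docs and source
    ant -Dversion=1.0.1-custom -Dcompile.native=true binary

This drops a binary tarball under build/ that is suitable for pushing
to the nodes.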

c) There are a couple of deployment systems, like Ambari, Cloudera
Manager, HMS and IBM BigInsights.  Most of them are free to use for up
to 50 nodes.  pdsh with shell scripts works too; in fact, the largest
clusters are deployed with ssh and scp.
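
A rough sketch of the ssh/scp approach (hostnames and paths are
illustrative, and passwordless ssh to every node is assumed):

    # push the binary tarball to each host in the slaves file
    for h in $(cat /opt/hadoop/current/conf/slaves); do
      scp hadoop-1.0.1-bin.tar.gz "$h":/tmp/
      ssh "$h" 'tar xzf /tmp/hadoop-1.0.1-bin.tar.gz -C /opt/hadoop'
    done

pdsh collapses the loop into one command once you have a hosts file,
e.g. "pdsh -w ^hosts tar xzf ...".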

d) PREFIX/share/hadoop was introduced in 0.20.204.0.  The design was
to map closely to the Filesystem Hierarchy Standard, where
platform-independent files are stored in /usr/share.  This design
enables dependent projects to cross-reference the classpath using a
relative path.  For example, HBase may refer to the Hadoop jar files
via PREFIX/share/hadoop/*.jar.  Some projects have adopted this
design, and we hope more projects will switch to this convention.
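
For instance, a dependent project's startup script might assemble its
classpath like this (paths illustrative; the bare wildcard form of a
classpath entry needs Java 6 or later):

    HADOOP_PREFIX=/opt/hadoop/current
    # pick up all Hadoop jars via the shared, version-independent path
    CLASSPATH="$CLASSPATH:$HADOOP_PREFIX/share/hadoop/*"
    export CLASSPATH

so the script keeps working when the Hadoop version underneath
changes.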

regards,
Eric

On Wed, Feb 29, 2012 at 6:56 PM, Merto Mertek <ma...@gmail.com> wrote:
> I would be glad to hear what your development cycle is and how you
> deploy new features to the production cluster.  Do you deploy with
> bash scripts and rsync, Ant, Maven, or some other automation tool? I
> would be thankful if you could point me to any resource describing
> best practices for developing, deploying and automating Java
> projects in a Unix/Linux environment.
>
> thanks..
>
>
> On 13 February 2012 11:26, Merto Mertek <ma...@gmail.com> wrote:
>
>> I am interested in some general tips on how to develop and deploy new
>> versions of Hadoop. I've been trying to compile a new version of Hadoop
>> and place the new jar in the cluster's lib folder; however, it was not
>> picked up even though the classpath was explicitly set to the lib
>> folder. I am interested in the following questions:
>>
>> a) How do you deploy a new version? Do you just copy the newly compiled
>> jar file to the lib folder on all nodes?
>> b) Should I just do a fresh compile or build a full release ('ant' vs.
>> 'ant tar')?
>> c) How do you develop and deploy Hadoop locally, and how remotely? For
>> deploying builds, do you use your own shell scripts or tools like
>> Ant/Maven?
>> d) What is the purpose of the folder $HADOOP_HOME/share/hadoop?
>>
>>
>> Any other tips are welcome.
>>
>> Thank you
>>

Re: Developing and deploying hadoop

Posted by Roman Shaposhnik <rv...@apache.org>.
Hi!

One way to deploy Hadoop in a more formal environment is to do it
via the Bigtop distribution. Bigtop provides packages and Puppet
deployment code for most Linux distributions. We try to make the
experience of deploying Hadoop as seamless as possible, since our goal
can be summarized as "trying to be the Ubuntu of Hadoop-based big data
management platforms". More info is available here:
    https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop
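
As a taste of what that looks like on an RPM-based system (package
names differ between Bigtop releases, so treat these as illustrative
and follow the wiki page above for your release):

    # after adding the Bigtop repo file for your distribution
    yum install hadoop hadoop-namenode hadoop-datanode
    service hadoop-namenode start

Debian/Ubuntu users do the equivalent with apt-get.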

We're also an Apache incubating project. That means -- if
we don't quite scratch your itch yet -- join the fun and help
us make a 100% Apache Hadoop-based big data management
platform a reality.

Thanks,
Roman.

On Wed, Feb 29, 2012 at 6:56 PM, Merto Mertek <ma...@gmail.com> wrote:
> I would be glad to hear what your development cycle is and how you
> deploy new features to the production cluster.  Do you deploy with
> bash scripts and rsync, Ant, Maven, or some other automation tool? I
> would be thankful if you could point me to any resource describing
> best practices for developing, deploying and automating Java
> projects in a Unix/Linux environment.
>
> thanks..
>
>
> On 13 February 2012 11:26, Merto Mertek <ma...@gmail.com> wrote:
>
>> I am interested in some general tips on how to develop and deploy new
>> versions of Hadoop. I've been trying to compile a new version of Hadoop
>> and place the new jar in the cluster's lib folder; however, it was not
>> picked up even though the classpath was explicitly set to the lib
>> folder. I am interested in the following questions:
>>
>> a) How do you deploy a new version? Do you just copy the newly compiled
>> jar file to the lib folder on all nodes?
>> b) Should I just do a fresh compile or build a full release ('ant' vs.
>> 'ant tar')?
>> c) How do you develop and deploy Hadoop locally, and how remotely? For
>> deploying builds, do you use your own shell scripts or tools like
>> Ant/Maven?
>> d) What is the purpose of the folder $HADOOP_HOME/share/hadoop?
>>
>>
>> Any other tips are welcome.
>>
>> Thank you
>>

Re: Developing and deploying hadoop

Posted by Merto Mertek <ma...@gmail.com>.
I would be glad to hear what your development cycle is and how you
deploy new features to the production cluster.  Do you deploy with
bash scripts and rsync, Ant, Maven, or some other automation tool? I
would be thankful if you could point me to any resource describing
best practices for developing, deploying and automating Java projects
in a Unix/Linux environment.

thanks..


On 13 February 2012 11:26, Merto Mertek <ma...@gmail.com> wrote:

> I am interested in some general tips on how to develop and deploy new
> versions of Hadoop. I've been trying to compile a new version of Hadoop
> and place the new jar in the cluster's lib folder; however, it was not
> picked up even though the classpath was explicitly set to the lib
> folder. I am interested in the following questions:
>
> a) How do you deploy a new version? Do you just copy the newly compiled
> jar file to the lib folder on all nodes?
> b) Should I just do a fresh compile or build a full release ('ant' vs.
> 'ant tar')?
> c) How do you develop and deploy Hadoop locally, and how remotely? For
> deploying builds, do you use your own shell scripts or tools like
> Ant/Maven?
> d) What is the purpose of the folder $HADOOP_HOME/share/hadoop?
>
>
> Any other tips are welcome.
>
> Thank you
>