Posted to common-dev@hadoop.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2008/07/25 20:33:31 UTC

[jira] Created: (HADOOP-3835) Develop scripts to create rpm package to facilitate deployment of hadoop on Linux machines

Develop scripts to create rpm package to facilitate deployment of hadoop on Linux machines
------------------------------------------------------------------------------------------

                 Key: HADOOP-3835
                 URL: https://issues.apache.org/jira/browse/HADOOP-3835
             Project: Hadoop Core
          Issue Type: Improvement
          Components: build
            Reporter: dhruba borthakur
            Priority: Minor


A rpm-like packing scheme to package and then install hadoop binaries is very helpful, especially when the number of machines in the cluster is huge. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3835) Develop scripts to create rpm package to facilitate deployment of hadoop on Linux machines

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617757#action_12617757 ] 

Steve Loughran commented on HADOOP-3835:
----------------------------------------

Thinking about this a bit more, a big choice point is whether the hadoop-site.xml file gets distributed as an RPM, or is managed in other ways. If it is also pushed out by RPM, then it should be a separate RPM that Apache don't distribute themselves, but let people push out. It should also be stored in the filesystem layout in a way that keeps the people who use the RPM happy. This is somewhat controversial.

1. The Filesystem Hierarchy Standard [ http://www.pathname.com/fhs/pub/fhs-2.3.html ] says conf files should go into /var/opt/, so there should be a directory /var/opt/hadoop containing hadoop configuration files, which can be customised on a per-machine basis.

2. If you are managing a datacentre, you don't want things customised on a per-machine basis, as that creates problems. You want a single configuration for the entire cluster. And you don't want to mount /var/opt/hadoop over NFS, as that just creates a SPOF in the cluster.
That argues for putting the configuration files into /opt/hadoop along with everything else, and somehow pushing that configuration out.

It is a lot easier to do #1; letting people downstream do their own RPMs is powerful, but complicated.
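
As a rough sketch of option #1, a site-local configuration package can be tiny. Everything below is illustrative; the package name, version, and file list are assumptions, not anything Apache would ship:

    # hadoop-site-conf.spec -- hypothetical, site-built configuration package
    Name:      hadoop-site-conf
    Version:   0.1
    Release:   1
    Summary:   Site-local Hadoop configuration files
    License:   Proprietary
    Group:     Applications/System
    BuildArch: noarch
    Source0:   hadoop-site.xml

    %description
    Per-site hadoop-site.xml, installed under /var/opt/hadoop.

    %install
    mkdir -p %{buildroot}/var/opt/hadoop
    install -m 644 %{SOURCE0} %{buildroot}/var/opt/hadoop/hadoop-site.xml

    %files
    # config(noreplace) preserves local edits on upgrade; the incoming
    # copy is written alongside as hadoop-site.xml.rpmnew instead
    %config(noreplace) /var/opt/hadoop/hadoop-site.xml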

see also: http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/release/doc/creating_and_editing_rpms.pdf


[jira] Commented: (HADOOP-3835) Develop scripts to create rpm package to facilitate deployment of hadoop on Linux machines

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617414#action_12617414 ] 

Steve Loughran commented on HADOOP-3835:
----------------------------------------

As someone who has done RPMs for other OSS projects, I should warn that this is a major undertaking:

* you need to understand all the rules of where to place things in a well-managed Linux system, or rpmlint will complain.
* you need to select which Linux distros to support, and create real or virtual machines running each of them.
* to test, you need to automate RPM upload, installation, and then operation.
* the RPM upgrade model is, well, weird: your install scripts get run before the previous version is uninstalled (see the scriptlet sketch after this list).
* You need to decide if farm configuration goes into the RPMs, in which case you have to give downstream people the tools to create their own custom RPMs. If not, you need to make it easy to push out the configuration as a separate RPM (or do configuration in some other manner).
* Don't try doing anything clever like stopping Hadoop during uninstall, because it is very hard to make that work in all situations.
* You have to be prepared to field all the support calls related to installs, upgrades, alien installs on Debian, etc.
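
To make that upgrade ordering concrete, here is a minimal sketch of the scriptlets involved. The comments describe stock RPM behaviour; the package contents are omitted:

    # During 'rpm -U new.rpm' the sequence is:
    #   new %pre -> new files installed -> new %post
    #   -> old %preun -> old files removed -> old %postun
    # $1 is the number of instances of the package that will remain
    # installed once the current phase completes.

    %pre
    if [ "$1" -ge 2 ]; then
        # Upgrade: the previous version is still fully installed here.
        :
    fi

    %preun
    if [ "$1" -eq 0 ]; then
        # Final erase, not an upgrade. Even here, avoid anything clever
        # like stopping services.
        :
    fi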

I do think RPMs are useful, especially for big farm rollouts, but they are also surprisingly hard work, as in "full time for some weeks" kind of hard work.

Some example code to build RPMs and test their installation over ssh:

http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/release

The .spec file (prior to being pushed through Ant property expansion):
http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/release/metadata/rpm/smartfrog.spec?view=markup


[jira] Commented: (HADOOP-3835) Develop scripts to create rpm package to facilitate deployment of hadoop on Linux machines

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617804#action_12617804 ] 

dhruba borthakur commented on HADOOP-3835:
------------------------------------------

My thinking is that all (core and contrib) jars, docs, javadocs, bin, lib, etc. get packaged into a single rpm. This gets installed in a default location, probably /var/opt/hadoop. This path can be overridden at rpm install time. This package does not have any configuration files.

Then there will be a separate rpm package that contains the configuration (hadoop*.xml, metrics.properties, log4j.properties, and all files in the conf directory). This will be installed by default at /var/opt/hadoop/conf.

No NFS mounting is necessary. These two packages will most likely be installed on local directories on each cluster machine. The installation of a new package will not start/stop services. It is possible that an "install" might check to see if any hadoop processes are running, and if so, refuse to install. This will be for Red Hat Linux. Once the scripts are made public, anyone can extend them to work on other Linux distributions.
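
A sketch of what the relocatable path and the install-time check could look like in the binary package's spec file. The Prefix: tag is what lets rpm --prefix override the default location; the process pattern matched below is an assumption, purely illustrative:

    # Fragment of a hypothetical hadoop-core.spec
    Prefix: /var/opt/hadoop

    %pre
    # Refuse to install or upgrade while Hadoop daemons appear to be running.
    # pgrep -f matches against the full command line; a non-zero exit from
    # %pre aborts the transaction.
    if pgrep -f 'org.apache.hadoop' >/dev/null 2>&1; then
        echo "Hadoop processes are running; stop them before installing." >&2
        exit 1
    fi

Overriding the default at install time would then look like 'rpm -i --prefix=/usr/local/hadoop hadoop-core-<version>.rpm'; note that relocation only works if every path in %files sits under the declared Prefix.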
