Posted to user@accumulo.apache.org by Scott Roberts <sc...@jhu.edu> on 2012/02/13 05:12:25 UTC

Accumulo 1.3.5 configuration issues with pre-existing Hadoop Rocks+ cluster

All,

I am running into an interesting issue while attempting to start Accumulo on a Rocks+ cluster: only the Accumulo master will start, and I am requesting assistance to get it up and running.  I believe this is occurring for two reasons: first, Accumulo does not appear to leverage the HADOOP_CONF_DIR environment variable; second, although Accumulo is being started under a specific uid with shared SSH keys in place, it is not SSHing to the other servers to start and stop its processes.

Setup:

Rocks+ 1.0.6 "Big Data" cluster consisting of a head node named "frontend" and three compute nodes named compute-0-0, compute-0-1, and compute-0-2.  The frontend runs the namenode and jobtracker instances. All three compute nodes run the datanode, tasktracker, and zookeeper instances. HDFS and Zookeeper physically store their data under /state/partition1/apache-hdfs and /state/partition1/apache-zk on the compute nodes.  Accumulo is configured and stored under /state/partition1/accumulo on all four servers, with the frontend designated as the master and the three compute nodes as slaves.

The HDFS configuration files are stored under /opt/apache/hadoop/conf/apache-hdfs/hdfs, the MapReduce files under /opt/apache/hadoop/conf/apache-mr/mapreduce, and the Zookeeper files under /opt/apache/zookeeper/conf.  As per Hadoop 0.20 "best practices", the appropriate configuration is selected by setting the HADOOP_CONF_DIR environment variable.  For example, if I am going to run a MapReduce job, I export HADOOP_CONF_DIR=/opt/apache/hadoop/conf/apache-mr/mapreduce.
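For concreteness, a typical job submission against that configuration looks roughly like this (the examples jar name is illustrative; substitute whatever job jar you actually run):

export HADOOP_CONF_DIR=/opt/apache/hadoop/conf/apache-mr/mapreduce
# bin/hadoop honors HADOOP_CONF_DIR and reads the JobTracker settings from that directory
hadoop jar /opt/apache/hadoop/hadoop-*-examples.jar pi 10 1000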

I've tested HDFS, MapReduce, and Zookeeper so I know all three work properly across the cluster provided I set the Hadoop configuration variable.

Results with Accumulo:

I set HADOOP_CONF_DIR in accumulo-env.sh, but Accumulo doesn't appear to leverage that variable.  When I run "accumulo init", it thinks my HDFS home is file:///, which is not correct, and it fails to initialize.  If I point HADOOP_HOME at the MapReduce or HDFS configuration directories, Accumulo complains that Hadoop 0.20 is required to run.  If I copy my HDFS configuration files to /opt/apache/hadoop/conf, then Accumulo seems to init properly (it finds hdfs://localhost:8900).  However, when I start it with bin/start-all.sh, no errors appear but Accumulo never starts on the compute nodes.  The only process that seems to start is the master process on the frontend, and I am able to access the Web page at localhost:50095, where it reports that all services except HDFS are down.
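For reference, the URI Accumulo reports here comes from the fs.default.name property in whichever core-site.xml it finds on its classpath; if none is found, Hadoop defaults to file:///.  A sketch of the relevant property, using the host and port mentioned above:

<!-- core-site.xml: the default filesystem Accumulo should see -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:8900</value>
</property>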

I'm open to suggestions.  Thanks in advance.

Re: Accumulo 1.3.5 configuration issues with pre-existing Hadoop Rocks+ cluster

Posted by Scott Roberts <sc...@jhu.edu>.
I set HADOOP_HOME to /opt/apache/hadoop/conf/apache-mr in accumulo-env.sh.  Technically, yes, Hadoop is installed to /opt/apache/hadoop.  However, during the Hadoop setup process for Rocks+, you name each Hadoop configuration; in my case, HDFS is called "apache-hdfs" and MapReduce is called "apache-mr".  Rocks+ then creates configuration files under /opt/apache/hadoop/conf/apache-hdfs and /opt/apache/hadoop/conf/apache-mr, respectively, and copies that configuration out to the grid.

Here are the exact commands I issued to get Accumulo up and running on my Rocks+ test grid:

rocks add hadoop hdfs name="apache-hdfs" provider="apache" namenodes="frontend" datanodes="compute-0-0 compute-0-1 compute-0-2"

rocks sync hadoop name=apache-hdfs

rocks create hadoop name=apache-hdfs

rocks start hadoop name=apache-hdfs

 
rocks add hadoop mapreduce name="apache-mr" provider="apache" jobtrackers="frontend" tasktrackers="compute-0-0 compute-0-1 compute-0-2" requires=apache-hdfs

rocks sync hadoop name=apache-mr

rocks create hadoop name=apache-mr

rocks start hadoop name=apache-mr


rocks add hadoop zookeeper name=apache-zk provider="apache" quorum-servers="compute-0-0 compute-0-1 compute-0-2"

rocks sync hadoop zookeeper name=apache-zk  

rocks create hadoop name=apache-zk

rocks start hadoop name=apache-zk


Extract Accumulo to /share/apps (which is an NFS export to all compute nodes, a.k.a. shared directory). Edit files as follows:

masters: testcluster

slaves: compute-0-0 compute-0-1 compute-0-2

accumulo-site.xml: Add the zookeepers, e.g. <value>compute-0-0:2181,compute-0-1:2181,compute-0-2:2181</value> (written out as a full property block after the env settings below)

accumulo-env.sh:
export JAVA_HOME=/usr/java/latest
export HADOOP_HOME=/opt/apache/hadoop/conf/apache-mr
export ACCUMULO_HOME=/state/partition1/accumulo
export ZOOKEEPER_HOME=/opt/apache/zookeeper
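
For reference, the zookeepers entry above written out as a complete property block (this assumes the stock instance.zookeeper.host property from the example accumulo-site.xml; only that property is shown):

<property>
  <name>instance.zookeeper.host</name>
  <value>compute-0-0:2181,compute-0-1:2181,compute-0-2:2181</value>
</property>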

Now, we need to get Accumulo copied to /state/partition1.  We also need to chown everything to the hdfs:hadoop user/group, since that is the uid under which it will run:

chown -R hdfs:hadoop /share/apps/accumulo

cp -rp /share/apps/accumulo /state/partition1

rocks run host 'cp -rp /share/apps/accumulo /state/partition1'

Add Hadoop symlinks:
cd /opt/apache/hadoop/conf/apache-mr;ln -s /opt/apache/hadoop/bin;ln -s /opt/apache/hadoop/lib;ln -s mapreduce conf;for i in `ls /opt/apache/hadoop/*.jar`;do ln -s $i;done

rocks run host 'cd /opt/apache/hadoop/conf/apache-mr;ln -s /opt/apache/hadoop/bin;ln -s /opt/apache/hadoop/lib;ln -s mapreduce conf;for i in `ls /opt/apache/hadoop/*.jar`;do ln -s $i;done'
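
The same symlink setup, broken out one command per line for readability (functionally identical to the one-liner above):

cd /opt/apache/hadoop/conf/apache-mr
ln -s /opt/apache/hadoop/bin        # Hadoop launcher scripts
ln -s /opt/apache/hadoop/lib        # Hadoop dependency jars
ln -s mapreduce conf                # Accumulo reads its Hadoop config from $HADOOP_HOME/conf
for jar in /opt/apache/hadoop/*.jar; do
    ln -s "$jar"                    # hadoop-core (and any other top-level jars) for the classpath
done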

su to the hdfs user on the frontend, cd to /state/partition1/accumulo, run:
bin/accumulo init
bin/start-all.sh
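
Optionally, a quick sanity check once start-all.sh returns (this assumes you remember the root password chosen during init):

bin/accumulo shell -u root
# inside the shell, running 'tables' should return without errors and list the built-in metadata table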

And you're done.  The only caveat is that the namenode Web page (on port 50070) is ONLY accessible via localhost on the frontend for security reasons.  So, while Accumulo will be able to obtain the stats since it is running under the hdfs uid, you won't get any response if you click the namenode link on the Accumulo home page unless you are running from the Web browser on the frontend.

Hope this helps.


Re: Accumulo 1.3.5 configuration issues with pre-existing Hadoop Rocks+ cluster

Posted by John W Vines <jo...@ugov.gov>.
Looking at what you did, I have a question about how Rocks+ Hadoop works. Did you set HADOOP_HOME, or did it set HADOOP_HOME to /opt/apache/hadoop/conf/apache-mr for you? Because looking at the symlinks you put in (as well as the Rocks+ RPMs), HADOOP_HOME should have just been /opt/apache/hadoop.

John


Re: Accumulo 1.3.5 configuration issues with pre-existing Hadoop Rocks+ cluster

Posted by Scott Roberts <sc...@jhu.edu>.
John,

Thanks for the quick reply, especially so late on a Sunday night!  I was able to resolve the issue by running this command on each compute node:

cd /opt/apache/hadoop/conf/apache-mr;ln -s /opt/apache/hadoop/bin;ln -s /opt/apache/hadoop/lib;ln -s mapreduce conf;for i in `ls /opt/apache/hadoop/*.jar`;do ln -s $i;done

WRT the slave nodes, the messages were always the same, except this time the tablet servers actually started on the compute nodes. E.g.:

Starting tablet servers and loggers ...... done
Starting logger on compute-0-2
Starting tablet server on compute-0-1
Starting tablet server on compute-0-0
Starting tablet server on compute-0-2
Starting logger on compute-0-1
Starting logger on compute-0-0
Starting master on frontend
Starting garbage collector on frontend
Starting monitor on frontend
Starting tracer on frontend

Cheers.




Re: Accumulo 1.3.5 configuration issues with pre-existing Hadoop Rocks+ cluster

Posted by John W Vines <jo...@ugov.gov>.


We do not leverage HADOOP_CONF_DIR in our scripts, but that's definitely something we should look into. We currently just expect $HADOOP_HOME and grab the libs out of that directory, as well as the config files from $HADOOP_HOME/conf. So copying your directory shouldn't be necessary, but you may have to put a symlink in place. I am creating a ticket to make us run better with installations of Hadoop that are not isolated to a single directory.

As for the slave nodes, you should see a message for each service start-all.sh starts when you run it. If you are not seeing it attempt to start tservers/loggers on your slave nodes, check the slaves file in $ACCUMULO_HOME/conf. If you are seeing messages about those services, go to one of those nodes and check the log files (start with the .out and .err files) to see if there's any more information there. If you have an error similar to your earlier error WRT the HDFS home, then you should make sure that the HADOOP_HOME changes you made are in effect on every node. As mentioned above, that is how we resolve our classpaths, and we are currently designed with that expectation. If you can't figure out the error, let us know. And if it does work, let us know anyway so we know what should be made clearer.
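
A minimal sketch of that check on one of the slave nodes (paths follow the layout from the original post; exact log file names vary by service and host):

cat /state/partition1/accumulo/conf/slaves        # should list compute-0-0 through compute-0-2
ls -lt /state/partition1/accumulo/logs/           # newest *.out / *.err files first
tail -n 50 /state/partition1/accumulo/logs/*.err  # stack traces from failed starts end up here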

John