Posted to user@hbase.apache.org by "Taylor, Ronald C" <ro...@pnl.gov> on 2010/09/20 22:14:41 UTC

RE: Guava*.jar use - hadoop-hbase communications failure - the hbase*.jar classes apparently not being found by Hadoop

Hello Ryan, Dave, other developers,

Have not fixed the problem. Here's where things stand:

1) As Ryan suggested, we have checked all the nodes to make sure that we copied over the hadoop-env.sh file with the HADOOP_CLASSPATH setting, set like so:

export HADOOP_CLASSPATH=/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.20100726.jar:/home/hbase/hbase/log4j-1.2.16.jar:/home/hbase/hbase/zookeeper-3.3.1.jar

Answer: yep, that was OK, the files are there. We also restarted Hadoop and Hbase again. No change - the program still fails on not finding the TableOutputFormat class.
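
As a further sanity check, here is a rough sketch of a tiny standalone class (the class name is made up for illustration) that could be compiled, put on HADOOP_CLASSPATH, and run on each node with "hadoop ClasspathCheck" to confirm the launcher's classpath really resolves the Hbase classes - keeping in mind that the ClassNotFoundException below is thrown in the task JVMs on the slaves, not in the client:

// Hypothetical check class - compile it, add the .class (or a jar of it) to
// HADOOP_CLASSPATH, and run on each node with:  hadoop ClasspathCheck
public class ClasspathCheck {
    public static void main(String[] args) {
        String[] names = {
            "org.apache.hadoop.hbase.HBaseConfiguration",
            "org.apache.hadoop.hbase.mapreduce.TableMapper",
            "org.apache.hadoop.hbase.mapreduce.TableOutputFormat" };
        for (String name : names) {
            try {
                Class.forName(name);
                System.out.println("FOUND    " + name);
            } catch (ClassNotFoundException e) {
                System.out.println("MISSING  " + name);
            }
        }
        // print the classpath this JVM actually received, for comparison
        // with the HADOOP_CLASSPATH export above
        System.out.println(System.getProperty("java.class.path"));
    }
}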

2) Following Dave's advice of avoiding the problem by not using TableOutputFormat (by skipping the Reducer stage), I tried a variant of that. I kept the Reducer stage in, but changed it to output to a file, instead of an Hbase table.

That did not work either. I tried running the new program from the hadoop acct and now get a msg (from the Mapper stage, I believe) saying that the hbase.mapreduce.TableMapper class cannot be found. So - it is not just the TableOutputFormat class - it is all the classes in the hbase*.jar file that are not being found.

Does this have anything to do with the guava*.jar file that Ryan mentioned, which (as far as I can tell) we don't have installed?

Obviously, we need more help.

In the meantime, as a stop-gap, I'm planning on writing our analysis programs this way:

1) extract data from the source Hbase table and store it in an HDFS file, with all data needed for analysis contained independently on each row - this task to be done by a non-MapReduce class that can access Hbase tables (a rough sketch of this step appears after this list)

2) call a MapReduce class that will process the file in parallel and return a new file (well, a directory of files which I'll combine into one) as output

3) write the contents of the new results file back into an Hbase table using another non-MapReduce class

I presume this will work, but again, obviously, it's not optimal and we need to resolve this issue so MapReduce classes can access Hbase tables directly on our cluster.
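
For step 1, here is a rough sketch of the kind of non-MapReduce export class I have in mind, assuming the same 0.89 client API used elsewhere in this thread; the class name and output path are made up, and the table/column names are the ones from my test program quoted further down:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ProteinTableExporter {

    public static void main(String[] args) throws IOException {

        Configuration conf = HBaseConfiguration.create();

        // scan only the column the analysis needs
        HTable table = new HTable(conf, "proteinTable");
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("proteinFields"), Bytes.toBytes("Protein_Ref_ID"));

        // one output line per row: rowkey <tab> value
        FileSystem fs = FileSystem.get(conf);
        FSDataOutputStream out = fs.create(new Path("/user/rtaylor/proteinTable_export.txt"));

        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                byte[] value = r.getValue(Bytes.toBytes("proteinFields"),
                                          Bytes.toBytes("Protein_Ref_ID"));
                // the proteinTable row keys look like binary ints in the test
                // program below; the value is assumed to be string-encoded -
                // adjust the decoding if that is not the case
                out.writeBytes(Bytes.toInt(r.getRow()) + "\t"
                        + (value == null ? "" : Bytes.toString(value)) + "\n");
            }
        } finally {
            scanner.close();
            out.close();
        }
    }
}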

Does anybody have any advice?
  Cheers,
   Ron

___________________________________________
Ronald Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory
902 Battelle Boulevard
P.O. Box 999, Mail Stop J4-33
Richland, WA  99352 USA
Office:  509-372-6568
Email: ronald.taylor@pnl.gov


-----Original Message-----
From: Buttler, David [mailto:buttler1@llnl.gov]
Sent: Monday, September 20, 2010 10:17 AM
To: user@hbase.apache.org; 'hbase-user@hadoop.apache.org'
Subject: RE: hadoop-hbase failure - could use some help, a class is apparently not being found by Hadoop

I find it is often faster to skip the reduce phase when updating rows in hbase (a trick I picked up from Ryan). Essentially, you read a row from hbase, do your processing, and write the row back to hbase.
The only time you would want to do the reduce phase is if there is some aggregation that you need, or if there is some output you want to skip (e.g. you have a Zipfian distribution and you want to ignore the low-count occurrences).
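
A minimal sketch of that map-only pattern, using the same TableMapReduceUtil calls as the test program quoted further down; the table name, column names, and the "processing" here are placeholders only:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class MapOnlyRowUpdater {

    // read a row, do the processing, emit a Put for the same row
    static class UpdateMapper extends TableMapper<ImmutableBytesWritable, Put> {
        @Override
        public void map(ImmutableBytesWritable row, Result values, Context context)
                throws IOException, InterruptedException {
            Put put = new Put(row.get());
            // placeholder "processing": stamp a flag column on every row
            put.add(Bytes.toBytes("resultFields"), Bytes.toBytes("processed"),
                    Bytes.toBytes(1));
            context.write(row, put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "MapOnlyRowUpdater");
        job.setJarByClass(MapOnlyRowUpdater.class);

        Scan scan = new Scan();
        TableMapReduceUtil.initTableMapperJob("proteinTable", scan,
                UpdateMapper.class, ImmutableBytesWritable.class, Put.class, job);
        // no reducer class: this call just wires up TableOutputFormat and the
        // output table, so the mapper's Puts go straight back to hbase
        TableMapReduceUtil.initTableReducerJob("proteinTable", null, job);
        job.setNumReduceTasks(0);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}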

Dave

-----Original Message-----
From: Taylor, Ronald C
Sent: Sunday, September 19, 2010 9:59 PM
To: 'Ryan Rawson'; user@hbase.apache.org; hbase-user@hadoop.apache.org
Cc: Taylor, Ronald C; 'Ronald Taylor'; Witteveen, Tim
Subject: RE: Guava*.jar use - hadoop-hbase failure - could use some help, a class is apparently not being found by Hadoop


Ryan,

Thanks for the quick feedback. I will check the other nodes on the cluster to see if they have been properly updated.

However, I am now really confused as to the use of the guava*.jar file that you talk about. This is the first time I've heard about it. I presume we are talking about a jar file packaging the Guava libraries from Google?

I cannot find this guava*.jar in either the /home/hadoop/hadoop directory or the /home/hbase/hbase directory, where the Hadoop and Hbase installs place the other *.jar files. I'm afraid that I don't even know where we should have downloaded it. Does it come with Hbase, or with Hadoop? Where should it have been placed after installation? Should I now download it - since we appear to be missing it - from here?
  http://code.google.com/p/guava-libraries/downloads/list

I Googled and found issue HBASE-2714 (Remove Guava as a client dependency, June 11 2010) here

http://www.mail-archive.com/issues@hbase.apache.org/msg00950.html  (see below, where I've included the text)

which appears to say that Hbase (at least *some* release of Hbase - does this include 0.89?) has a dependency on Guava, in order to run a MapReduce job over Hbase. But nothing on Guava is mentioned at

   http://hbase.apache.org/docs/r0.20.4/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath

(I cannot find anything in the Hbase 0.89 online documents on Guava, or on how to set CLASSPATH, or on what *.jar files to include so that I can use MapReduce with Hbase; the best guidance I can find is in this earlier document.)

So - I could really use further clarification in regard to Guava as to what I should be doing to set up Hbase-MapReduce work.

 Regards,
   Ron

%%%%%%%%%%%%%%%%%%%%%%%%

From

http://www.mail-archive.com/issues@hbase.apache.org/msg00950.html


Todd Lipcon commented on HBASE-2714:
------------------------------------

Why not?

In theory, the new TableMapReduceUtil.addDependencyJars should take care of shipping it in the distributedcache. Apparently it's not working?

ryan rawson commented on HBASE-2714:
------------------------------------

not everyone uses that mechanism to run map reduce jobs on hbase.  The standard for a long time was to add hbase.jar and zookeeper-3.2.2.jar to the hadoop classpath, thus not requiring every job include the hbase jars.

Todd Lipcon commented on HBASE-2714:
------------------------------------

Does this mean in general that we can't add more dependencies to the hbase client? I think instead we should make it easier to run hbase MR jobs *without* touching the Hadoop config (eg right now you have to restart MR to upgrade hbase, that's not going to fly for a lot of clusters)

stack commented on HBASE-2714:
------------------------------

So, we need to change our recommendations here:
http://hbase.apache.org/docs/r0.20.4/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath?


> Remove Guava as a client dependency
> -----------------------------------
>
>                 Key: HBASE-2714
>                 URL: https://issues.apache.org/jira/browse/HBASE-2714
>             Project: HBase
>          Issue Type: Improvement
>          Components: client
>            Reporter: Jeff Hammerbacher
>
> We shouldn't need Guava on the classpath to run a MapReduce job over HBase.


%%%%%%%%%%%%%%%%%%%%%%%%
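
For reference, the TableMapReduceUtil.addDependencyJars mechanism Todd mentions above would amount to one extra call in the job driver - sketched here against the main() of my ProteinCounter1 test program (quoted further down), and assuming that method is present in our 0.89 build:

        TableMapReduceUtil.initTableMapperJob("proteinTable", scan, ProteinMapper1.class,
                ImmutableBytesWritable.class, IntWritable.class, job);
        TableMapReduceUtil.initTableReducerJob("testTable", ProteinReducer1.class, job);

        // ship the job's dependency jars (hbase, zookeeper, guava, ...) through
        // the DistributedCache, so the task JVMs can load them even if
        // HADOOP_CLASSPATH on the tasktrackers does not list them
        TableMapReduceUtil.addDependencyJars(job);

        System.exit(job.waitForCompletion(true) ? 0 : 1);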



-----Original Message-----
From: Ryan Rawson [mailto:ryanobjc@gmail.com]
Sent: Sunday, September 19, 2010 12:45 AM
To: user@hbase.apache.org
Cc: hbase-user@hadoop.apache.org; Taylor, Ronald C
Subject: Re: hadoop-hbase failure - could use some help, a class is apparently not being found by Hadoop

hey,

looks like you've done all the right things... you might want to double check that all the 'slave' machines have the updated hadoop-env.sh and that the path referenced therein is present _on all the machines_.

You also need to include the guava*.jar as well.  the log4j is already included by mapred by default, so no need there.

-ryan



On Fri, Sep 17, 2010 at 4:19 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>
> Hi folks,
>
> Got a problem in basic Hadoop-Hbase communication. My small test
> program ProteinCounter1.java - shown in full below - reports out this
> error
>
>   java.lang.RuntimeException: java.lang.ClassNotFoundException:
> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>        at
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
>
> The full invocation and error msgs are shown at bottom.
>
> We are using Hadoop 0.20.2 with HBase 0.89.20100726 on a 24-node cluster. Hadoop and Hbase each appears to work fine separately. That is, I've created programs that run MapReduce on files, and programs that import data into Hbase tables and manipulate such. Both types of programs have gone quite smoothly.
>
> Now I want to combine the two - use MapReduce programs on data drawn from an Hbase table, with results placed back into an Hbase table.
>
> But my test program for such, as you see from the error msg, is not
> working. Apparently the
>   org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>  class is not found.
>
> However, I have added these paths, including the relevant Hbase *.jar, to HADOOP_CLASSPATH, so the missing class should have been found, as you can see:
>
>  export HADOOP_CLASSPATH=/home/hbase/hbase/conf:
> /home/hbase/hbase/hbase-0.89.20100726.jar:
> /home/rtaylor/HadoopWork/log4j-1.2.16.jar:
> /home/rtaylor/HadoopWork/zookeeper-3.3.1.jar
>
>  This change was made in the ../hadoop/conf/hadoop-env.sh file.
>
> I checked the manifest of /home/hbase/hbase/hbase-0.89.20100726.jar
> and
>    org/apache/hadoop/hbase/mapreduce/TableOutputFormat.class
>  is indeed present in that Hbase *.jar file.
>
> Also, I have restarted both Hbase and Hadoop after making this change.
>
> Don't understand why the TableOutputFormat class is not being found. Or is the error msg misleading, and something else is going wrong? I would very much appreciate any advice people have as to what is going wrong. Need to get this working very soon.
>
>   Regards,
>     Ron T.
>
> ___________________________________________
> Ronald Taylor, Ph.D.
> Computational Biology & Bioinformatics Group Pacific Northwest
> National Laboratory
> 902 Battelle Boulevard
> P.O. Box 999, Mail Stop J4-33
> Richland, WA  99352 USA
> Office:  509-372-6568
> Email: ronald.taylor@pnl.gov
>
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> %%%%%%%%%%%%
>
> contents of the "ProteinCounter1.java" file:
>
>
>
> //  to compile
> // javac ProteinCounter1.java
> // jar cf ProteinCounterTest.jar  *.class
>
> // to run
> //   hadoop jar ProteinCounterTest.jar ProteinCounter1
>
>
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.io.IntWritable;
>
> import java.util.*;
> import java.io.*;
> import org.apache.hadoop.hbase.*;
> import org.apache.hadoop.hbase.client.*;
> import org.apache.hadoop.hbase.io.*;
> import org.apache.hadoop.hbase.util.*;
> import org.apache.hadoop.hbase.mapreduce.*;
>
>
> // %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
> /**
>  * counts the number of times each protein appears in the proteinTable
>  *
>  */
> public class ProteinCounter1 {
>
>
>    static class ProteinMapper1 extends
> TableMapper<ImmutableBytesWritable, IntWritable> {
>
>        private int numRecords = 0;
>        private static final IntWritable one = new IntWritable(1);
>
>        @Override
>            public void map(ImmutableBytesWritable row, Result values,
> Context context) throws IOException {
>
>            // retrieve the value of proteinID, which is the row key
> for each protein in the proteinTable
>            ImmutableBytesWritable proteinID_Key = new
> ImmutableBytesWritable(row.get());
>            try {
>                context.write(proteinID_Key, one);
>            } catch (InterruptedException e) {
>                throw new IOException(e);
>            }
>            numRecords++;
>            if ((numRecords % 100) == 0) {
>                context.setStatus("mapper processed " + numRecords + "
> proteinTable records so far");
>            }
>        }
>    }
>
>    public static class ProteinReducer1 extends
> TableReducer<ImmutableBytesWritable,
>                                               IntWritable,
> ImmutableBytesWritable> {
>
>        public void reduce(ImmutableBytesWritable proteinID_key,
> Iterable<IntWritable> values,
>                            Context context)
>            throws IOException, InterruptedException {
>            int sum = 0;
>            for (IntWritable val : values) {
>                sum += val.get();
>            }
>
>            Put put = new Put(proteinID_key.get());
>            put.add(Bytes.toBytes("resultFields"),
> Bytes.toBytes("total"), Bytes.toBytes(sum));
>            System.out.println(String.format("stats : proteinID_key :
> %d, count : %d",
>
> Bytes.toInt(proteinID_key.get()), sum));
>            context.write(proteinID_key, put);
>        }
>    }
>
>    public static void main(String[] args) throws Exception {
>
>        org.apache.hadoop.conf.Configuration conf;
>           conf = org.apache.hadoop.hbase.HBaseConfiguration.create();
>
>        Job job = new Job(conf, "HBaseTest_Using_ProteinCounter");
>        job.setJarByClass(ProteinCounter1.class);
>
>        org.apache.hadoop.hbase.client.Scan scan = new Scan();
>
>        String colFamilyToUse = "proteinFields";
>        String fieldToUse = "Protein_Ref_ID";
>
>        // retrieve this one column from the specified family
>        scan.addColumn(Bytes.toBytes(colFamilyToUse),
> Bytes.toBytes(fieldToUse));
>
>           org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter
> filterToUse =
>                 new
> org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter();
>        scan.setFilter(filterToUse);
>
>        TableMapReduceUtil.initTableMapperJob("proteinTable", scan,
> ProteinMapper1.class,
>                              ImmutableBytesWritable.class,
>                                              IntWritable.class, job);
>        TableMapReduceUtil.initTableReducerJob("testTable",
> ProteinReducer1.class, job);
>        System.exit(job.waitForCompletion(true) ? 0 : 1);
>    }
> }
>
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> %%%%%%%
>
>
> session output:
>
> [rtaylor@h01 Hadoop]$ javac ProteinCounter1.java
>
> [rtaylor@h01 Hadoop]$ jar cf ProteinCounterTest.jar  *.class
>
> [rtaylor@h01 Hadoop]$ hadoop jar ProteinCounterTest.jar
> ProteinCounter1
>
> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.system.dir;  Ignoring.
> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.map.tasks.maximum;  Ignoring.
> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.reduce.tasks.maximum;  Ignoring.
> 10/09/17 15:46:18 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.system.dir;  Ignoring.
> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.map.tasks.maximum;  Ignoring.
> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.reduce.tasks.maximum;  Ignoring.
> 10/09/17 15:46:18 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.name.dir;  Ignoring.
> 10/09/17 15:46:18 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.data.dir;  Ignoring.
> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.system.dir;  Ignoring.
> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.map.tasks.maximum;  Ignoring.
> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.reduce.tasks.maximum;  Ignoring.
> 10/09/17 15:46:19 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.name.dir;  Ignoring.
> 10/09/17 15:46:19 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.data.dir;  Ignoring.
> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.system.dir;  Ignoring.
> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.map.tasks.maximum;  Ignoring.
> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.reduce.tasks.maximum;  Ignoring.
> 10/09/17 15:46:19 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.name.dir;  Ignoring.
> 10/09/17 15:46:19 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.data.dir;  Ignoring.
> 10/09/17 15:46:19 INFO zookeeper.ZooKeeperWrapper: Reconnecting to
> zookeeper
> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
> environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14
> GMT
> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
> environment:host.name=h01.emsl.pnl.gov
> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
> environment:java.version=1.6.0_21
> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
> environment:java.home=/usr/java/jdk1.6.0_21/jre
> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
> environment:java.class.path=/home/hadoop/hadoop/bin/../conf:/usr/java/
> default/lib/tools.jar:/home/hadoop/hadoop/bin/..:/home/hadoop/hadoop/b
> in/../hadoop-0.20.2-core.jar:/home/hadoop/hadoop/bin/../lib/commons-cl
> i-1.2.jar:/home/hadoop/hadoop/bin/../lib/commons-codec-1.3.jar:/home/h
> adoop/hadoop/bin/../lib/commons-el-1.0.jar:/home/hadoop/hadoop/bin/../
> lib/commons-httpclient-3.0.1.jar:/home/hadoop/hadoop/bin/../lib/common
> s-logging-1.0.4.jar:/home/hadoop/hadoop/bin/../lib/commons-logging-api
> -1.0.4.jar:/home/hadoop/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/
> hadoop/hadoop/bin/../lib/core-3.1.1.jar:/home/hadoop/hadoop/bin/../lib
> /hsqldb-1.8.0.10.jar:/home/hadoop/hadoop/bin/../lib/jasper-compiler-5.
> 5.12.jar:/home/hadoop/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/hom
> e/hadoop/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/hadoop/hadoop/bin/..
> /lib/jetty-6.1.14.jar:/home/hadoop/hadoop/bin/../lib/jetty-util-6.1.14
> .jar:/home/hadoop/hadoop/bin/../lib/junit-3.8.1.jar:/home/hadoop/hadoo
> p/bin/../lib/kfs-0.2.2.jar:/home/hadoop/hadoop/bin/../lib/log4j-1.2.15
> .jar:/home/hadoop/hadoop/bin/../lib/mockito-all-1.8.0.jar:/home/hadoop
> /hadoop/bin/../lib/oro-2.0.8.jar:/home/hadoop/hadoop/bin/../lib/servle
> t-api-2.5-6.1.14.jar:/home/hadoop/hadoop/bin/../lib/slf4j-api-1.4.3.ja
> r:/home/hadoop/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/hadoop/
> hadoop/bin/../lib/xmlenc-0.52.jar:/home/hadoop/hadoop/bin/../lib/jsp-2
> .1/jsp-2.1.jar:/home/hadoop/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar:
> /home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.20100726.jar:/home
> /rtaylor/HadoopWork/log4j-1.2.16.jar:/home/rtaylor/HadoopWork/zookeepe
> r-3.3.1.jar
> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
> environment:java.library.path=/home/hadoop/hadoop/bin/../lib/native/Li
> nux-i386-32
> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
> environment:java.io.tmpdir=/tmp
> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
> environment:java.compiler=<NA>
> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
> environment:os.name=Linux
> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
> environment:os.arch=i386
> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
> environment:os.version=2.6.18-194.11.1.el5
> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
> environment:user.name=rtaylor
> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
> environment:user.home=/home/rtaylor
> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
> environment:user.dir=/home/rtaylor/HadoopWork/Hadoop
> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Initiating client
> connection,
> connectString=h05:2182,h04:2182,h03:2182,h02:2182,h10:2182,h09:2182,h0
> 8:2182,h07:2182,h06:2182 sessionTimeout=60000
> watcher=org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper@dcb03b
> 10/09/17 15:46:19 INFO zookeeper.ClientCnxn: Opening socket connection
> to server h04/192.168.200.24:2182
> 10/09/17 15:46:19 INFO zookeeper.ClientCnxn: Socket connection
> established to h04/192.168.200.24:2182, initiating session
> 10/09/17 15:46:19 INFO zookeeper.ClientCnxn: Session establishment
> complete on server h04/192.168.200.24:2182, sessionid =
> 0x22b21c04c330002, negotiated timeout = 60000
> 10/09/17 15:46:20 INFO mapred.JobClient: Running job:
> job_201009171510_0004
> 10/09/17 15:46:21 INFO mapred.JobClient:  map 0% reduce 0%
>
> 10/09/17 15:46:27 INFO mapred.JobClient: Task Id :
> attempt_201009171510_0004_m_000002_0, Status : FAILED
> java.lang.RuntimeException: java.lang.ClassNotFoundException:
> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>        at
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
>        at
> org.apache.hadoop.mapreduce.JobContext.getOutputFormatClass(JobContext
> .java:193)
>        at org.apache.hadoop.mapred.Task.initialize(Task.java:413)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:288)
>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> Caused by: java.lang.ClassNotFoundException:
> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>        at
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>        at java.lang.Class.forName0(Native Method)
>        at java.lang.Class.forName(Class.java:247)
>        at
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java
> :762)
>        at
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
>        ... 4 more
>
> 10/09/17 15:46:33 INFO mapred.JobClient: Task Id :
> attempt_201009171510_0004_r_000051_0, Status : FAILED
> java.lang.RuntimeException: java.lang.ClassNotFoundException:
> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>        at
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
>        at
> org.apache.hadoop.mapreduce.JobContext.getOutputFormatClass(JobContext
> .java:193)
>        at org.apache.hadoop.mapred.Task.initialize(Task.java:413)
>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:354)
>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>        at
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>        at java.lang.Class.forName0(Native Method)
>        at java.lang.Class.forName(Class.java:247)
>        at
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java
> :762)
>        at
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
>        ... 4 more
>
> I terminated the program here via <Control><C>, since the error msgs were simply repeating.
>
> [rtaylor@h01 Hadoop]$
>

Re: hadoop-hbase communications failure - the hbase*.jar classes apparently not being found by Hadoop

Posted by Ryan Rawson <ry...@gmail.com>.
Hey,

The guava*.jar is used by part of the HBase code.  If you end up with
'class not found com.google.commons...' you will need it.

From looking at the available info, things should work.  Usually the
first class that we fail to find is HBaseConfiguration in a job, so
perhaps the jar file doesn't contain the class you think... One of the
reasons why it might be confusing is that there are both an
org.apache.hadoop.hbase.mapreduce and an org.apache.hadoop.hbase.mapred
package, each containing TableInputFormat/TableOutputFormat classes.

Without more info you're going to have to dig a bit more.  Check to
make sure the mapreduce job is running with the classpath you think it
should have (should be in the logs), md5sum the jars to make sure
things are not corrupted, other basic troubleshooting steps.
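
A rough sketch of one way to surface that in the task logs: run a small probe job over any plain text input, with a mapper (made up here purely for illustration) whose setup() reports what the child JVM can actually see:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical probe: plug into any small TextInputFormat job so the task
// logs show the classpath the child JVMs really get.
public class ClasspathProbeMapper
        extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

    @Override
    protected void setup(Context context) {
        System.err.println("task java.class.path = "
                + System.getProperty("java.class.path"));
        for (String name : new String[] {
                "org.apache.hadoop.hbase.HBaseConfiguration",
                "org.apache.hadoop.hbase.mapreduce.TableOutputFormat" }) {
            try {
                Class.forName(name);
                System.err.println("task can load    " + name);
            } catch (ClassNotFoundException e) {
                System.err.println("task CANNOT load " + name);
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context) {
        // no-op: only setup() matters for the probe
    }
}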

Good luck!
-ryan


On Mon, Sep 20, 2010 at 4:01 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>
> Forgot to ask - I haven't heard back from anybody directly answering my question on the guava*.jar file.
>
> So - is such a file needed for MapReduce access to Hbase tables, or - one less worry - can I rule out the absence of that file as the cause of our problem here?
> Ron
>
> -----Original Message-----
> From: Taylor, Ronald C
> Sent: Monday, September 20, 2010 3:58 PM
> To: 'Ryan Rawson'
> Cc: hbase-user@hadoop.apache.org; user@hbase.apache.org; Witteveen, Tim; Taylor, Ronald C
> Subject: hadoop-hbase communications failure - the hbase*.jar classes apparently not being found by Hadoop
>
>
> Hi Ryan,
>
> Here are our answers:
>
>> What version of java is hadoop running under?  We are compiling our HBase jars using java6, so that is another source of potential incompatibilities...
>
> Answer:
>
> rpm -qa | egrep "java|jdk"
> jdk-1.6.0_21-fcs
> sun-javadb-docs-10.5.3-0.2
> sun-javadb-common-10.5.3-0.2
> sun-javadb-client-10.5.3-0.2
> sun-javadb-demo-10.5.3-0.2
> sun-javadb-javadoc-10.5.3-0.2
> sun-javadb-core-10.5.3-0.2
>
>> Do you have any custom changes to any of the bin/* scripts in hadoop?
>
> Answer: Nope. As far as we (my colleague Tim Witteveen and me) can remember, we made no changes at all.
>
>
>> What else can you tell us about your environment?
>
> Answer: It is
>            Redhat 5.5 x86_64
>
> I note that we used JDK 1.6.0 - so we should be OK there, I suppose?
> Ron
>
>
> -----Original Message-----
> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> Sent: Monday, September 20, 2010 2:11 PM
> To: Taylor, Ronald C
> Cc: hbase-user@hadoop.apache.org; user@hbase.apache.org; Witteveen, Tim
> Subject: Re: Pastebin page - hadoop-hbase communications failure - the hbase*.jar classes apparently not being found by Hadoop
>
Ok that looks good.  Sometimes when you successively build and chain classpaths you can accidentally overwrite the previous ones.  But we are looking fine here.
>
> What version of java is hadoop running under?  We are compiling our HBase jars using java6, so that is another source of potential incompatibilities...
>
> Do you have any custom changes to any of the bin/* scripts in hadoop?
>
> What else can you tell us about your environment?
>
>
> On Mon, Sep 20, 2010 at 2:00 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>>
>>
>> Found it -
>> http://pastebin.com/SfFYSLJy
>>
>>
>> -----Original Message-----
>> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
>> Sent: Monday, September 20, 2010 1:50 PM
>> To: Taylor, Ronald C
>> Cc: hbase-user@hadoop.apache.org; user@hbase.apache.org;
>> buttler1@llnl.gov; Ronald Taylor; Witteveen, Tim
>> Subject: Re: Guava*.jar use - hadoop-hbase communications failure -
>> the hbase*.jar classes apparently not being found by Hadoop
>>
>> Hey,
>>
>> yes, the symlink is a pretty good way to be able to upgrade in place easily.  But still, normally those other jars are in another subdir so their full path should be:
>> /home/hbase/hbase/lib/log4j-1.2.16.jar
>>
>> the hbase scripts rely on those paths to build the classpath, so don't rearrange the dir layout too much.
>>
>> As for the pastebin, you will need to send us your direct link; since so many people post and there isn't really a good search system, it's generally preferred to send the direct link to your pastebin.  If you ever interact with us on IRC, this is also how we get big dumps done as well.
>>
>> Thanks!
>> -ryan
>>
>> On Mon, Sep 20, 2010 at 1:38 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>>> Ryan,
>>>
>>> The hbase*.jar is in the root hbase directory (at /home/hbase/hbase). Now, that is a symbolic link on all the nodes (as you can see below), but that should not matter, right?
>>>
>>> [rtaylor@h01 hbase]$ pwd
>>> /home/hbase
>>> [rtaylor@h01 hbase]$ ls -l
>>> lrwxrwxrwx  1 root  hadoop    19 Aug 26 08:47 hbase ->
>>> hbase-0.89.20100726 drwxr-xr-x  9 hbase hadoop  4096 Sep 18 22:54
>>> hbase-0.89.20100726
>>> [rtaylor@h01 hbase]$
>>>
>>>
>>> Anyhoo, I just put the hbase-env.sh file on pastebin.com. Please take a look. I posted it under the title:
>>>
>>> "Ronald Taylor / hadoop-env.sh file - HBase-Hadoop hbase*.jar problem"
>>>
>>> This is the first time I've used pastebin.com, so hopefully I uploaded properly. Please let me know if not.
>>>
>>> I don't think I misspelled anything on the HADOOP_CLASSPATH line (I just verified file existence based on those spellings, see below "ls" listings), but very happy to have an expert take a look.
>>>  Ron
>>>
>>>
>>> export
>>> HADOOP_CLASSPATH=/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.
>>> 2
>>> 0100726.jar:/home/hbase/hbase/log4j-1.2.16.jar:/home/hbase/hbase/zook
>>> e
>>> eper-3.3.1.jar
>>>
>>>
>>> [rtaylor@h01 conf]$ ls /home/hbase/hbase/conf
>>> hadoop-metrics.properties  hbase-default_with_RT_mods.xml
>>> hbase-env.sh    hbase-site.xml.psuedo-distributed.template
>>> regionservers hbase-default_ORIG.xml     hbase-default.xml
>>> hbase-site.xml  log4j.properties tohtml.xsl
>>>
>>> [rtaylor@h01 conf]$ ls /home/hbase/hbase/hbase-0.89.20100726.jar
>>> /home/hbase/hbase/hbase-0.89.20100726.jar
>>> [rtaylor@h01 conf]$
>>>
>>> [rtaylor@h01 conf]$ ls /home/hbase/hbase/log4j-1.2.16.jar
>>> /home/hbase/hbase/log4j-1.2.16.jar
>>> [rtaylor@h01 conf]$
>>>
>>> [rtaylor@h01 conf]$ ls /home/hbase/hbase/zookeeper-3.3.1.jar
>>> /home/hbase/hbase/zookeeper-3.3.1.jar
>>> [rtaylor@h01 conf]$
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
>>> Sent: Monday, September 20, 2010 1:17 PM
>>> To: Taylor, Ronald C
>>> Cc: user@hbase.apache.org; hbase-user@hadoop.apache.org;
>>> buttler1@llnl.gov; Ronald Taylor; Witteveen, Tim
>>> Subject: Re: Guava*.jar use - hadoop-hbase communications failure -
>>> the hbase*.jar classes apparently not being found by Hadoop
>>>
>>> Hey,
>>>
>>> If you could, perhaps you could paste up your hadoop-env.sh on pastebin.com?  That would help... sometimes I have made errors in the bash shell trickery, and it probably would help to get more eyes checking it out.
>>>
>>> Normally in the stock hbase distro the Hbase JAR is in the root hbase dir, and the other jars in the lib/ sub directory, am I correct to assume you've moved the jars around a bit?
>>>
>>> Good luck,
>>> -ryan
>>>
>>> On Mon, Sep 20, 2010 at 1:14 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>>>>
>>>> Hello Ryan, Dave, other developers,
>>>>
>>>> Have not fixed the problem. Here's where things stand:
>>>>
>>>> 1) As Ryan suggested, we have checked all the nodes to make sure that we copied over the hadoop-env.sh file with the HADOOP_CLASSPATH setting, set like so:
>>>>
>>>> export
>>>> HADOOP_CLASSPATH=/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.
>>>> 2
>>>> 0100726.jar:
>>>> /home/hbase/hbase/log4j-1.2.16.jar:/home/hbase/hbase/zookeeper-3.3.1.
>>>> j
>>>> ar
>>>>
>>>> Answer: yep, that was OK, the files are there. We also restarted Hadoop and Hbase again. No change - program still fails on not finding  the TableOutputFormat class.
>>>>
>>>> 2) Following Dave's advice of avoiding the problem by not using TableOutputFormat (by skipping the Reducer stage), I tried a variant of that. I kept the Reducer stage in, but changed it to output to a file, instead of an Hbase table.
>>>>
>>>> That did not work either. I tried running the new program from the hadoop acct and now get a msg (from the Mapper stage, I believe) saying that the hbase.mapreduce.TableMapper class cannot be found. So - it is not just TableOutputFormat class - it is all the classes in the hbase*.jar file that are not being found.
>>>>
>>>> Does this have anything to do with the guava*.jar file that Ryan mentioned, which (as far as I can tell) we don't have installed?
>>>>
>>>> Obviously, we need more help.
>>>>
>>>> In the meantime, as a stop-gap, I'm planning on writing our analysis programs this way:
>>>>
>>>> 1) extract data from the source Hbase table and store in an HDFS
>>>> file, all data needed for analysis contained independently on each
>>>> row - this task to be done by a non-MapReduce class that can access
>>>> Hbase tables
>>>>
>>>> 2) call an MapReduce class that will process the file in parallel
>>>> and return an new file (well, a directory of files which I'll
>>>> combine into
>>>> one) as output
>>>>
>>>> 3) write the contents of the new results file back into an Hbase
>>>> table using another non-MapReduce class
>>>>
>>>> I presume this will work, but again, obviously, it's not optimal and we need to resolve this issue so MapReduce classes can access Hbase tables directly on our cluster.
>>>>
>>>> Does anybody have any advice?
>>>>  Cheers,
>>>>   Ron
>>>>
>>>> ___________________________________________
>>>> Ronald Taylor, Ph.D.
>>>> Computational Biology & Bioinformatics Group Pacific Northwest
>>>> National Laboratory
>>>> 902 Battelle Boulevard
>>>> P.O. Box 999, Mail Stop J4-33
>>>> Richland, WA  99352 USA
>>>> Office:  509-372-6568
>>>> Email: ronald.taylor@pnl.gov
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Buttler, David [mailto:buttler1@llnl.gov]
>>>> Sent: Monday, September 20, 2010 10:17 AM
>>>> To: user@hbase.apache.org; 'hbase-user@hadoop.apache.org'
>>>> Subject: RE: hadoop-hbase failure - could use some help, a class is
>>>> apparently not being found by Hadoop
>>>>
>>>> I find it is often faster to skip the reduce phase when updating rows in hbase.  (A trick I picked up from Ryan) Essentially, you read a row from hbase, do your processing, and write the row back to hbase.
>>>> The only time you would want to do the reduce phase is if there is some aggregation that you need, or if there is some output you want to skip (e.g. you have a zipfian distribution and you want to ignore the low count occurrences).
>>>>
>>>> Dave
>>>>
>>>> -----Original Message-----
>>>> From: Taylor, Ronald C
>>>> Sent: Sunday, September 19, 2010 9:59 PM
>>>> To: 'Ryan Rawson'; user@hbase.apache.org;
>>>> hbase-user@hadoop.apache.org
>>>> Cc: Taylor, Ronald C; 'Ronald Taylor'; Witteveen, Tim
>>>> Subject: RE: Guava*.jar use - hadoop-hbase failure - could use some
>>>> help, a class is apparently not being found by Hadoop
>>>>
>>>>
>>>> Ryan,
>>>>
>>>> Thanks for the quick feedback. I will check the other nodes on the cluster to see if they have been properly updated.
>>>>
>>>> However, I am now really confused as to use of the guava*.jar file that you talk about. This is the first time I've heard about this. I presume we are talking about a jar file packaging the guava libraries from Google?
>>>>
>>>> I cannot find this guava*.jar in either the /home/hadoop/hadoop directory or in the /home/hbase/hbase directories, where the Hadoop and Hbase installs place the other *.jar files. I'm afraid that I don't even know where we should have downloaded it. Does it come with Hbase, or with Hadoop? Where should it have been placed, after installation? Should I now download it - since we appear to be missing it - from here?
>>>>  http://code.google.com/p/guava-libraries/downloads/list
>>>>
>>>> I Googled and found issue HBASE-2714 (Remove Guava as a client
>>>> dependency, June 11 2010) here
>>>>
>>>> http://www.mail-archive.com/issues@hbase.apache.org/msg00950.html
>>>> (see below, where I've included the text)
>>>>
>>>> which appears to say that Hbase (at least *some* release of Hbase -
>>>> does this include 0.89?) has a dependency on Guava, in order to run
>>>> a MapReduce job over Hbase. But nothing on Guava is mentioned at
>>>>
>>>>
>>>> http://hbase.apache.org/docs/r0.20.4/api/org/apache/hadoop/hbase/map
>>>> r
>>>> e
>>>> duce/package-summary.html#classpath
>>>>
>>>> (I cannot find anything in the Hbase 0.89 online documents on Guava
>>>> or in how to set CLASSPATH or in what *.jar files to include so I
>>>> can use MapReduce with Hbase; the best guidance I can find is in
>>>> this earlier
>>>> document.)
>>>>
>>>> So - I could really use further clarification in regard to Guava as to what I should be doing to set up Hbase-MapReduce work.
>>>>
>>>>  Regards,
>>>>   Ron
>>>>
>>>> %%%%%%%%%%%%%%%%%%%%%%%%
>>>>
>>>> From
>>>>
>>>> http://www.mail-archive.com/issues@hbase.apache.org/msg00950.html
>>>>
>>>>
>>>> Todd Lipcon commented on HBASE-2714:
>>>> ------------------------------------
>>>>
>>>> Why not?
>>>>
>>>> In theory, the new TableMapReduceUtil.addDependencyJars should take care of shipping it in the distributedcache. Apparently it's not working?
>>>>
>>>> ryan rawson commented on HBASE-2714:
>>>> ------------------------------------
>>>>
>>>> not everyone uses that mechanism to run map reduce jobs on hbase.  The standard for a long time was to add hbase.jar and zookeeper-3.2.2.jar to the hadoop classpath, thus not requiring every job include the hbase jars.
>>>>
>>>> Todd Lipcon commented on HBASE-2714:
>>>> ------------------------------------
>>>>
>>>> Does this mean in general that we can't add more dependencies to the
>>>> hbase client? I think instead we should make it easier to run hbase
>>>> MR jobs *without* touching the Hadoop config (eg right now you have
>>>> to restart MR to upgrade hbase, that's not going to fly for a lot of
>>>> clusters)
>>>>
>>>> stack commented on HBASE-2714:
>>>> ------------------------------
>>>>
>>>> So, we need to change our recommendations here:
>>>> http://hbase.apache.org/docs/r0.20.4/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath?
>>>>
>>>>
>>>>> Remove Guava as a client dependency
>>>>> -----------------------------------
>>>>>
>>>>>                 Key: HBASE-2714
>>>>>                 URL:
>>>>> https://issues.apache.org/jira/browse/HBASE-2714
>>>>>             Project: HBase
>>>>>          Issue Type: Improvement
>>>>>          Components: client
>>>>>            Reporter: Jeff Hammerbacher
>>>>>
>>>>> We shouldn't need Guava on the classpath to run a MapReduce job over HBase.
>>>>
>>>>
>>>> %%%%%%%%%%%%%%%%%%%%%%%%
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
>>>> Sent: Sunday, September 19, 2010 12:45 AM
>>>> To: user@hbase.apache.org
>>>> Cc: hbase-user@hadoop.apache.org; Taylor, Ronald C
>>>> Subject: Re: hadoop-hbase failure - could use some help, a class is
>>>> apparently not being found by Hadoop
>>>>
>>>> hey,
>>>>
>>>> looks like you've done all the right things... you might want to double check that all the 'slave' machines have the updated hadoop-env.sh and that the path referenced therein is present _on all the machines_.
>>>>
>>>> You also need to include the guava*.jar as well.  the log4j is already included by mapred by default, so no need there.
>>>>
>>>> -ryan
>>>>
>>>>
>>>>
>>>> On Fri, Sep 17, 2010 at 4:19 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>>>>>
>>>>> Hi folks,
>>>>>
>>>>> Got a problem in basic Hadoop-Hbase communication. My small test
>>>>> program ProteinCounter1.java - shown in full below - reports out
>>>>> this error
>>>>>
>>>>>   java.lang.RuntimeException: java.lang.ClassNotFoundException:
>>>>> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>>>>        at
>>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:80
>>>>> 9
>>>>> )
>>>>>
>>>>> The full invocation and error msgs are shown at bottom.
>>>>>
>>>>> We are using Hadoop 20.2 with HBase0.89.2010726 on a 24-node cluster. Hadoop and Hbase each appears to work fine separately. That is, I've created programs that run MapReduce on files, and programs that import data into Hbase tables and manipulate such. Both types of programs have gone quite smoothly.
>>>>>
>>>>> Now I want to combine the two - use MapReduce programs on data drawn from an Hbase table, with results placed back into an Hbase table.
>>>>>
>>>>> But my test program for such, as you see from the error msg, is not
>>>>> working. Apparently the
>>>>>   org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>>>>  class is not found.
>>>>>
>>>>> However, I have added these paths, including the relevant Hbase *.jar, to HADOOP_CLASSPATH, so the missing class should have been found, as you can see:
>>>>>
>>>>>  export HADOOP_CLASSPATH=/home/hbase/hbase/conf:
>>>>> /home/hbase/hbase/hbase-0.89.20100726.jar:
>>>>> /home/rtaylor/HadoopWork/log4j-1.2.16.jar:
>>>>> /home/rtaylor/HadoopWork/zookeeper-3.3.1.jar
>>>>>
>>>>>  This change was made in the ../hadoop/conf/hadoop-env.sh file.
>>>>>
>>>>> I checked the manifest of /home/hbase/hbase/hbase-0.89.20100726.jar
>>>>> and
>>>>>    org/apache/hadoop/hbase/mapreduce/TableOutputFormat.class
>>>>>  is indeed present that Hbase *.jar file.
>>>>>
>>>>> Also, I have restarted both Hbase and Hadoop after making this change.
>>>>>
>>>>> Don't understand why the TableOutputFormat class is not being found. Or is the error msg misleading, and something else is going wrong? I would very much appreciate any advice people have as to what is going wrong. Need to get this working very soon.
>>>>>
>>>>>   Regards,
>>>>>     Ron T.
>>>>>
>>>>> ___________________________________________
>>>>> Ronald Taylor, Ph.D.
>>>>> Computational Biology & Bioinformatics Group Pacific Northwest
>>>>> National Laboratory
>>>>> 902 Battelle Boulevard
>>>>> P.O. Box 999, Mail Stop J4-33
>>>>> Richland, WA  99352 USA
>>>>> Office:  509-372-6568
>>>>> Email: ronald.taylor@pnl.gov
>>>>>
>>>>>
>>>>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>>>> %
>>>>> %
>>>>> %
>>>>> %%%%%%%%%%%%
>>>>>
>>>>> contents of the "ProteinCounter1.java" file:
>>>>>
>>>>>
>>>>>
>>>>> //  to compile
>>>>> // javac ProteinCounter1.java
>>>>> // jar cf ProteinCounterTest.jar  *.class
>>>>>
>>>>> // to run
>>>>> //   hadoop jar ProteinCounterTest.jar ProteinCounter1
>>>>>
>>>>>
>>>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>>>> import org.apache.hadoop.conf.Configuration;
>>>>> import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
>>>>> import org.apache.hadoop.mapreduce.Job; import
>>>>> org.apache.hadoop.io.IntWritable;
>>>>>
>>>>> import java.util.*;
>>>>> import java.io.*;
>>>>> import org.apache.hadoop.hbase.*;
>>>>> import org.apache.hadoop.hbase.client.*; import
>>>>> org.apache.hadoop.hbase.io.*; import
>>>>> org.apache.hadoop.hbase.util.*; import
>>>>> org.apache.hadoop.hbase.mapreduce.*;
>>>>>
>>>>>
>>>>> // %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>>>>
>>>>> /**
>>>>>  * counts the number of times each protein appears in the
>>>>> proteinTable
>>>>>  *
>>>>>  */
>>>>> public class ProteinCounter1 {
>>>>>
>>>>>
>>>>>    static class ProteinMapper1 extends
>>>>> TableMapper<ImmutableBytesWritable, IntWritable> {
>>>>>
>>>>>        private int numRecords = 0;
>>>>>        private static final IntWritable one = new IntWritable(1);
>>>>>
>>>>>        @Override
>>>>>            public void map(ImmutableBytesWritable row, Result
>>>>> values, Context context) throws IOException {
>>>>>
>>>>>            // retrieve the value of proteinID, which is the row key
>>>>> for each protein in the proteinTable
>>>>>            ImmutableBytesWritable proteinID_Key = new
>>>>> ImmutableBytesWritable(row.get());
>>>>>            try {
>>>>>                context.write(proteinID_Key, one);
>>>>>            } catch (InterruptedException e) {
>>>>>                throw new IOException(e);
>>>>>            }
>>>>>            numRecords++;
>>>>>            if ((numRecords % 100) == 0) {
>>>>>                context.setStatus("mapper processed " + numRecords + "
>>>>> proteinTable records so far");
>>>>>            }
>>>>>        }
>>>>>    }
>>>>>
>>>>>    public static class ProteinReducer1 extends
>>>>> TableReducer<ImmutableBytesWritable,
>>>>>                                               IntWritable,
>>>>> ImmutableBytesWritable> {
>>>>>
>>>>>        public void reduce(ImmutableBytesWritable proteinID_key,
>>>>> Iterable<IntWritable> values,
>>>>>                            Context context)
>>>>>            throws IOException, InterruptedException {
>>>>>            int sum = 0;
>>>>>            for (IntWritable val : values) {
>>>>>                sum += val.get();
>>>>>            }
>>>>>
>>>>>            Put put = new Put(proteinID_key.get());
>>>>>            put.add(Bytes.toBytes("resultFields"),
>>>>> Bytes.toBytes("total"), Bytes.toBytes(sum));
>>>>>            System.out.println(String.format("stats : proteinID_key :
>>>>> %d, count : %d",
>>>>>
>>>>> Bytes.toInt(proteinID_key.get()), sum));
>>>>>            context.write(proteinID_key, put);
>>>>>        }
>>>>>    }
>>>>>
>>>>>    public static void main(String[] args) throws Exception {
>>>>>
>>>>>        org.apache.hadoop.conf.Configuration conf;
>>>>>           conf =
>>>>> org.apache.hadoop.hbase.HBaseConfiguration.create();
>>>>>
>>>>>        Job job = new Job(conf, "HBaseTest_Using_ProteinCounter");
>>>>>        job.setJarByClass(ProteinCounter1.class);
>>>>>
>>>>>        org.apache.hadoop.hbase.client.Scan scan = new Scan();
>>>>>
>>>>>        String colFamilyToUse = "proteinFields";
>>>>>        String fieldToUse = "Protein_Ref_ID";
>>>>>
>>>>>        // retreive this one column from the specified family
>>>>>        scan.addColumn(Bytes.toBytes(colFamilyToUse),
>>>>> Bytes.toBytes(fieldToUse));
>>>>>
>>>>>           org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter
>>>>> filterToUse =
>>>>>                 new
>>>>> org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter();
>>>>>        scan.setFilter(filterToUse);
>>>>>
>>>>>        TableMapReduceUtil.initTableMapperJob("proteinTable", scan,
>>>>> ProteinMapper1.class,
>>>>>                              ImmutableBytesWritable.class,
>>>>>                                              IntWritable.class,
>>>>> job);
>>>>>        TableMapReduceUtil.initTableReducerJob("testTable",
>>>>> ProteinReducer1.class, job);
>>>>>        System.exit(job.waitForCompletion(true) ? 0 : 1);
>>>>>    }
>>>>> }
>>>>>
>>>>>
>>>>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>>>> %
>>>>>
>>
>

RE: hadoop-hbase communications failure - the hbase*.jar classes apparently not being found by Hadoop

Posted by "Taylor, Ronald C" <ro...@pnl.gov>.
Forgot to ask - I haven't heard back from anybody directly answering my question on the guava*.jar file.

So - is such a file needed for MapReduce access to Hbase tables, or - one less worry - can I rule out the absence of that file as the cause of our problem here?
Ron

-----Original Message-----
From: Taylor, Ronald C
Sent: Monday, September 20, 2010 3:58 PM
To: 'Ryan Rawson'
Cc: hbase-user@hadoop.apache.org; user@hbase.apache.org; Witteveen, Tim; Taylor, Ronald C
Subject: hadoop-hbase communications failure - the hbase*.jar classes apparently not being found by Hadoop


Hi Ryan,

Here are our answers:

> What version of java is hadoop running under?  We are compiling our HBase jars using java6, so that is another source of potential incompatibilities...

Answer:

rpm -qa | egrep "java|jdk"
jdk-1.6.0_21-fcs
sun-javadb-docs-10.5.3-0.2
sun-javadb-common-10.5.3-0.2
sun-javadb-client-10.5.3-0.2
sun-javadb-demo-10.5.3-0.2
sun-javadb-javadoc-10.5.3-0.2
sun-javadb-core-10.5.3-0.2

> Do you have any custom changes to any of the bin/* scripts in hadoop?

Answer: Nope. As far as we (my colleague Tim Witteveen and me) can remember, we made no changes at all.


> What else can you tell us about your environment?

Answer: It is
            Redhat 5.5 x86_64

I note that we used JDK 1.6.0 - so we should be OK there, I suppose?
Ron


-----Original Message-----
From: Ryan Rawson [mailto:ryanobjc@gmail.com]
Sent: Monday, September 20, 2010 2:11 PM
To: Taylor, Ronald C
Cc: hbase-user@hadoop.apache.org; user@hbase.apache.org; Witteveen, Tim
Subject: Re: Pastebin page - hadoop-hbase communications failure - the hbase*.jar classes apparently not being found by Hadoop

Ok that looks good.  Sometimes when you successively build and chain classpaths you can accidentally overwrite the previous ones.  But we are looking fine here.

What version of java is hadoop running under?  We are compiling our HBase jars using java6, so that is another source of potential incompatibilities...

Do you have any custom changes to any of the bin/* scripts in hadoop?

What else can you tell us about your environment?


On Mon, Sep 20, 2010 at 2:00 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>
>
> Found it -
> http://pastebin.com/SfFYSLJy
>
>
> -----Original Message-----
> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> Sent: Monday, September 20, 2010 1:50 PM
> To: Taylor, Ronald C
> Cc: hbase-user@hadoop.apache.org; user@hbase.apache.org;
> buttler1@llnl.gov; Ronald Taylor; Witteveen, Tim
> Subject: Re: Guava*.jar use - hadoop-hbase communications failure -
> the hbase*.jar classes apparently not being found by Hadoop
>
> Hey,
>
> yes, the symlink is a pretty good way to be able to inplace upgrade easily.  But still, normally those other jars are in another subdir so their full path should be:
> /home/hbase/hbase/lib/log4j-1.2.16.jar
>
> the hbase scripts rely on those paths to build the classpath, so dont rearrange the dir layout too much.
>
> As for the pastebin you will need to send us your direct link, since so many people post and there isnt really good searching systems, its generally preferred to send the direct link to your pastebin.  If you ever interact with us on IRC this is also how we get big dumps done as well.
>
> Thanks!
> -ryan
>
> On Mon, Sep 20, 2010 at 1:38 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>> Ryan,
>>
>> The hbase*.jar is in the root hbase directory (at /home/hbase/hbase). Now, that is symbolic link on all the nodes (as you can see below), but that should not matter, right?
>>
>> lor@h01 hbase]$ pwd
>> /home/hbase
>> [rtaylor@h01 hbase]$ ls -l
>> lrwxrwxrwx  1 root  hadoop    19 Aug 26 08:47 hbase ->
>> hbase-0.89.20100726 drwxr-xr-x  9 hbase hadoop  4096 Sep 18 22:54
>> hbase-0.89.20100726
>> [rtaylor@h01 hbase]$
>>
>>
>> Anyhoo, I just put the hbase-env.sh file on pastebin.com. Please take a look. I posted it under the title:
>>
>> "Ronald Taylor / hadoop-env.sh file - HBase-Hadoop hbase*.jar problem"
>>
>> This is the first time I've used pastebin.com, so hopefully I uploaded properly. Please let me know if not.
>>
>> I don't think I mispelled anything on the HADOOP_CLASSPATH line (I just verified file existence based on those spellings, see below "ls" listings), but very happy to have an expert take a look.
>>  Ron
>>
>>
>> export
>> HADOOP_CLASSPATH=/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.
>> 2
>> 0100726.jar:/home/hbase/hbase/log4j-1.2.16.jar:/home/hbase/hbase/zook
>> e
>> eper-3.3.1.jar
>>
>>
>> [rtaylor@h01 conf]$ ls /home/hbase/hbase/conf
>> hadoop-metrics.properties  hbase-default_with_RT_mods.xml
>> hbase-env.sh    hbase-site.xml.psuedo-distributed.template
>> regionservers hbase-default_ORIG.xml     hbase-default.xml
>> hbase-site.xml  log4j.properties tohtml.xsl
>>
>> [rtaylor@h01 conf]$ ls /home/hbase/hbase/hbase-0.89.20100726.jar
>> /home/hbase/hbase/hbase-0.89.20100726.jar
>> [rtaylor@h01 conf]$
>>
>> [rtaylor@h01 conf]$ ls /home/hbase/hbase/log4j-1.2.16.jar
>> /home/hbase/hbase/log4j-1.2.16.jar
>> [rtaylor@h01 conf]$
>>
>> [rtaylor@h01 conf]$ ls /home/hbase/hbase/zookeeper-3.3.1.jar
>> /home/hbase/hbase/zookeeper-3.3.1.jar
>> [rtaylor@h01 conf]$
>>
>>
>>
>> -----Original Message-----
>> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
>> Sent: Monday, September 20, 2010 1:17 PM
>> To: Taylor, Ronald C
>> Cc: user@hbase.apache.org; hbase-user@hadoop.apache.org;
>> buttler1@llnl.gov; Ronald Taylor; Witteveen, Tim
>> Subject: Re: Guava*.jar use - hadoop-hbase communications failure -
>> the hbase*.jar classes apparently not being found by Hadoop
>>
>> Hey,
>>
>> If you could, perhaps you could paste up your hadoop-env.sh on pastebin.com?  That would help... sometimes I have made errors in the bash shell trickery, and it probably would help to get more eyes checking it out.
>>
>> Normally in the stock hbase distro the Hbase JAR is in the root hbase dir, and the other jars in the lib/ sub directory, am I correct to assume you've moved the jars around a bit?
>>
>> Good luck,
>> -ryan
>>
>> On Mon, Sep 20, 2010 at 1:14 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>>>
>>> Hello Ryan, Dave, other developers,
>>>
>>> Have not fixed the problem. Here's where things stand:
>>>
>>> 1) As Ryan suggested, we have checked all the nodes to make sure that we copied over the hadoop-env.sh file with the HADOOP_CLASSPATH setting, set like so:
>>>
>>> export
>>> HADOOP_CLASSPATH=/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.
>>> 2
>>> 0100726.jar:
>>> /home/hbase/hbase/log4j-1.2.16.jar:/home/hbase/hbase/zookeeper-3.3.1.
>>> j
>>> ar
>>>
>>> Answer: yep, that was OK, the files are there. We also restarted Hadoop and Hbase again. No change - program still fails on not finding  the TableOutputFormat class.
>>>
>>> 2) Following Dave's advice of avoiding the problem by not using TableOutputFormat (by skipping the Reducer stage), I tried a variant of that. I kept the Reducer stage in, but changed it to output to a file, instead of an Hbase table.
>>>
>>> That did not work either. I tried running the new program from the hadoop acct and now get a msg (from the Mapper stage, I believe) saying that the hbase.mapreduce.TableMapper class cannot be found. So - it is not just TableOutputFormat class - it is all the classes in the hbase*.jar file that are not being found.
>>>
>>> Does this have anything to do with the guava*.jar file that Ryan mentioned, which (as far as I can tell) we don't have installed?
>>>
>>> Obviously, we need more help.
>>>
>>> In the meantime, as a stop-gap, I'm planning on writing our analysis programs this way:
>>>
>>> 1) extract data from the source Hbase table and store in an HDFS
>>> file, all data needed for analysis contained independently on each
>>> row - this task to be done by a non-MapReduce class that can access
>>> Hbase tables
>>>
>>> 2) call an MapReduce class that will process the file in parallel
>>> and return an new file (well, a directory of files which I'll
>>> combine into
>>> one) as output
>>>
>>> 3) write the contents of the new results file back into an Hbase
>>> table using another non-MapReduce class
>>>
>>> I presume this will work, but again, obviously, it's not optimal and we need to resolve this issue so MapReduce classes can access Hbase tables directly on our cluster.
>>>
>>> Does anybody have any advice?
>>>  Cheers,
>>>   Ron
>>>
>>> ___________________________________________
>>> Ronald Taylor, Ph.D.
>>> Computational Biology & Bioinformatics Group Pacific Northwest
>>> National Laboratory
>>> 902 Battelle Boulevard
>>> P.O. Box 999, Mail Stop J4-33
>>> Richland, WA  99352 USA
>>> Office:  509-372-6568
>>> Email: ronald.taylor@pnl.gov
>>>
>>>
>>> -----Original Message-----
>>> From: Buttler, David [mailto:buttler1@llnl.gov]
>>> Sent: Monday, September 20, 2010 10:17 AM
>>> To: user@hbase.apache.org; 'hbase-user@hadoop.apache.org'
>>> Subject: RE: hadoop-hbase failure - could use some help, a class is
>>> apparently not being found by Hadoop
>>>
>>> I find it is often faster to skip the reduce phase when updating rows in hbase.  (A trick I picked up from Ryan) Essentially, you read a row from hbase, do your processing, and write the row back to hbase.
>>> The only time you would want to do the reduce phase is if there is some aggregation that you need, or if there is some output you want to skip (e.g. you have a zipfian distribution and you want to ignore the low count occurrences).
>>>
>>> Dave
>>>
>>> -----Original Message-----
>>> From: Taylor, Ronald C
>>> Sent: Sunday, September 19, 2010 9:59 PM
>>> To: 'Ryan Rawson'; user@hbase.apache.org;
>>> hbase-user@hadoop.apache.org
>>> Cc: Taylor, Ronald C; 'Ronald Taylor'; Witteveen, Tim
>>> Subject: RE: Guava*.jar use - hadoop-hbase failure - could use some
>>> help, a class is apparently not being found by Hadoop
>>>
>>>
>>> Ryan,
>>>
>>> Thanks for the quick feedback. I will check the other nodes on the cluster to see if they have been properly updated.
>>>
>>> However, I am now really confused as to use of the guava*.jar file that you talk about. This is the first time I've heard about this. I presume we are talking about a jar file packaging the guava libraries from Google?
>>>
>>> I cannot find this guava*.jar in either the /home/hadoop/hadoop directory or in the /home/hbase/hbase directories, where the Hadoop and Hbase installs place the other *.jar files. I'm afraid that I don't even know where we should have downloaded it. Does it come with Hbase, or with Hadoop? Where should it have been placed, after installation? Should I now download it - since we appear to be missing it - from here?
>>>  http://code.google.com/p/guava-libraries/downloads/list
>>>
>>> I Googled and found issue HBASE-2714 (Remove Guava as a client
>>> dependency, June 11 2010) here
>>>
>>> http://www.mail-archive.com/issues@hbase.apache.org/msg00950.html
>>> (see below, where I've included the text)
>>>
>>> which appears to say that Hbase (at least *some* release of Hbase -
>>> does this include 0.89?) has a dependency on Guava, in order to run
>>> a MapReduce job over Hbase. But nothing on Guava is mentioned at
>>>
>>>
>>> http://hbase.apache.org/docs/r0.20.4/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath
>>>
>>> (I cannot find anything in the Hbase 0.89 online documents on Guava
>>> or in how to set CLASSPATH or in what *.jar files to include so I
>>> can use MapReduce with Hbase; the best guidance I can find is in
>>> this earlier
>>> document.)
>>>
>>> So - I could really use further clarification in regard to Guava as to what I should be doing to set up Hbase-MapReduce work.
>>>
>>>  Regards,
>>>   Ron
>>>
>>> %%%%%%%%%%%%%%%%%%%%%%%%
>>>
>>> From
>>>
>>> http://www.mail-archive.com/issues@hbase.apache.org/msg00950.html
>>>
>>>
>>> Todd Lipcon commented on HBASE-2714:
>>> ------------------------------------
>>>
>>> Why not?
>>>
>>> In theory, the new TableMapReduceUtil.addDependencyJars should take care of shipping it in the distributedcache. Apparently it's not working?
>>>
>>> ryan rawson commented on HBASE-2714:
>>> ------------------------------------
>>>
>>> not everyone uses that mechanism to run map reduce jobs on hbase.  The standard for a long time was to add hbase.jar and zookeeper-3.2.2.jar to the hadoop classpath, thus not requiring every job include the hbase jars.
>>>
>>> Todd Lipcon commented on HBASE-2714:
>>> ------------------------------------
>>>
>>> Does this mean in general that we can't add more dependencies to the
>>> hbase client? I think instead we should make it easier to run hbase
>>> MR jobs *without* touching the Hadoop config (eg right now you have
>>> to restart MR to upgrade hbase, that's not going to fly for a lot of
>>> clusters)
>>>
>>> stack commented on HBASE-2714:
>>> ------------------------------
>>>
>>> So, we need to change our recommendations here:
>>> http://hbase.apache.org/docs/r0.20.4/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath?
>>>
>>>
>>>> Remove Guava as a client dependency
>>>> -----------------------------------
>>>>
>>>>                 Key: HBASE-2714
>>>>                 URL:
>>>> https://issues.apache.org/jira/browse/HBASE-2714
>>>>             Project: HBase
>>>>          Issue Type: Improvement
>>>>          Components: client
>>>>            Reporter: Jeff Hammerbacher
>>>>
>>>> We shouldn't need Guava on the classpath to run a MapReduce job over HBase.
>>>
>>>
>>> %%%%%%%%%%%%%%%%%%%%%%%%
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
>>> Sent: Sunday, September 19, 2010 12:45 AM
>>> To: user@hbase.apache.org
>>> Cc: hbase-user@hadoop.apache.org; Taylor, Ronald C
>>> Subject: Re: hadoop-hbase failure - could use some help, a class is
>>> apparently not being found by Hadoop
>>>
>>> hey,
>>>
>>> looks like you've done all the right things... you might want to double check that all the 'slave' machines have the updated hadoop-env.sh and that the path referenced therein is present _on all the machines_.
>>>
>>> You also need to include the guava*.jar as well.  the log4j is already included by mapred by default, so no need there.
>>>
>>> -ryan
>>>
>>>
>>>
>>> On Fri, Sep 17, 2010 at 4:19 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>>>>
>>>> Hi folks,
>>>>
>>>> Got a problem in basic Hadoop-Hbase communication. My small test
>>>> program ProteinCounter1.java - shown in full below - reports out
>>>> this error
>>>>
>>>>   java.lang.RuntimeException: java.lang.ClassNotFoundException:
>>>> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>>>        at
>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
>>>>
>>>> The full invocation and error msgs are shown at bottom.
>>>>
>>>> We are using Hadoop 0.20.2 with HBase 0.89.20100726 on a 24-node cluster. Hadoop and HBase each appear to work fine separately. That is, I've created programs that run MapReduce on files, and programs that import data into HBase tables and manipulate them. Both types of programs have gone quite smoothly.
>>>>
>>>> Now I want to combine the two - use MapReduce programs on data drawn from an Hbase table, with results placed back into an Hbase table.
>>>>
>>>> But my test program for such, as you see from the error msg, is not
>>>> working. Apparently the
>>>>   org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>>>  class is not found.
>>>>
>>>> However, I have added these paths, including the relevant Hbase *.jar, to HADOOP_CLASSPATH, so the missing class should have been found, as you can see:
>>>>
>>>>  export HADOOP_CLASSPATH=/home/hbase/hbase/conf:
>>>> /home/hbase/hbase/hbase-0.89.20100726.jar:
>>>> /home/rtaylor/HadoopWork/log4j-1.2.16.jar:
>>>> /home/rtaylor/HadoopWork/zookeeper-3.3.1.jar
>>>>
>>>>  This change was made in the ../hadoop/conf/hadoop-env.sh file.
>>>>
>>>> I checked the manifest of /home/hbase/hbase/hbase-0.89.20100726.jar
>>>> and
>>>>    org/apache/hadoop/hbase/mapreduce/TableOutputFormat.class
>>>>  is indeed present in that Hbase *.jar file.
>>>>
>>>> Also, I have restarted both Hbase and Hadoop after making this change.
>>>>
>>>> Don't understand why the TableOutputFormat class is not being found. Or is the error msg misleading, and something else is going wrong? I would very much appreciate any advice people have as to what is going wrong. Need to get this working very soon.
>>>>
>>>>   Regards,
>>>>     Ron T.
>>>>
>>>> ___________________________________________
>>>> Ronald Taylor, Ph.D.
>>>> Computational Biology & Bioinformatics Group Pacific Northwest
>>>> National Laboratory
>>>> 902 Battelle Boulevard
>>>> P.O. Box 999, Mail Stop J4-33
>>>> Richland, WA  99352 USA
>>>> Office:  509-372-6568
>>>> Email: ronald.taylor@pnl.gov
>>>>
>>>>
>>>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>>> %
>>>> %
>>>> %
>>>> %%%%%%%%%%%%
>>>>
>>>> contents of the "ProteinCounter1.java" file:
>>>>
>>>>
>>>>
>>>> //  to compile
>>>> // javac ProteinCounter1.java
>>>> // jar cf ProteinCounterTest.jar  *.class
>>>>
>>>> // to run
>>>> //   hadoop jar ProteinCounterTest.jar ProteinCounter1
>>>>
>>>>
>>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>>> import org.apache.hadoop.conf.Configuration;
>>>> import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
>>>> import org.apache.hadoop.mapreduce.Job; import
>>>> org.apache.hadoop.io.IntWritable;
>>>>
>>>> import java.util.*;
>>>> import java.io.*;
>>>> import org.apache.hadoop.hbase.*;
>>>> import org.apache.hadoop.hbase.client.*; import
>>>> org.apache.hadoop.hbase.io.*; import
>>>> org.apache.hadoop.hbase.util.*; import
>>>> org.apache.hadoop.hbase.mapreduce.*;
>>>>
>>>>
>>>> // %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>>>
>>>> /**
>>>>  * counts the number of times each protein appears in the
>>>> proteinTable
>>>>  *
>>>>  */
>>>> public class ProteinCounter1 {
>>>>
>>>>
>>>>    static class ProteinMapper1 extends
>>>> TableMapper<ImmutableBytesWritable, IntWritable> {
>>>>
>>>>        private int numRecords = 0;
>>>>        private static final IntWritable one = new IntWritable(1);
>>>>
>>>>        @Override
>>>>            public void map(ImmutableBytesWritable row, Result
>>>> values, Context context) throws IOException {
>>>>
>>>>            // retrieve the value of proteinID, which is the row key
>>>> for each protein in the proteinTable
>>>>            ImmutableBytesWritable proteinID_Key = new
>>>> ImmutableBytesWritable(row.get());
>>>>            try {
>>>>                context.write(proteinID_Key, one);
>>>>            } catch (InterruptedException e) {
>>>>                throw new IOException(e);
>>>>            }
>>>>            numRecords++;
>>>>            if ((numRecords % 100) == 0) {
>>>>                context.setStatus("mapper processed " + numRecords + "
>>>> proteinTable records so far");
>>>>            }
>>>>        }
>>>>    }
>>>>
>>>>    public static class ProteinReducer1 extends
>>>> TableReducer<ImmutableBytesWritable,
>>>>                                               IntWritable,
>>>> ImmutableBytesWritable> {
>>>>
>>>>        public void reduce(ImmutableBytesWritable proteinID_key,
>>>> Iterable<IntWritable> values,
>>>>                            Context context)
>>>>            throws IOException, InterruptedException {
>>>>            int sum = 0;
>>>>            for (IntWritable val : values) {
>>>>                sum += val.get();
>>>>            }
>>>>
>>>>            Put put = new Put(proteinID_key.get());
>>>>            put.add(Bytes.toBytes("resultFields"),
>>>> Bytes.toBytes("total"), Bytes.toBytes(sum));
>>>>            System.out.println(String.format("stats : proteinID_key :
>>>> %d, count : %d",
>>>>
>>>> Bytes.toInt(proteinID_key.get()), sum));
>>>>            context.write(proteinID_key, put);
>>>>        }
>>>>    }
>>>>
>>>>    public static void main(String[] args) throws Exception {
>>>>
>>>>        org.apache.hadoop.conf.Configuration conf;
>>>>           conf =
>>>> org.apache.hadoop.hbase.HBaseConfiguration.create();
>>>>
>>>>        Job job = new Job(conf, "HBaseTest_Using_ProteinCounter");
>>>>        job.setJarByClass(ProteinCounter1.class);
>>>>
>>>>        org.apache.hadoop.hbase.client.Scan scan = new Scan();
>>>>
>>>>        String colFamilyToUse = "proteinFields";
>>>>        String fieldToUse = "Protein_Ref_ID";
>>>>
>>>>        // retreive this one column from the specified family
>>>>        scan.addColumn(Bytes.toBytes(colFamilyToUse),
>>>> Bytes.toBytes(fieldToUse));
>>>>
>>>>           org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter
>>>> filterToUse =
>>>>                 new
>>>> org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter();
>>>>        scan.setFilter(filterToUse);
>>>>
>>>>        TableMapReduceUtil.initTableMapperJob("proteinTable", scan,
>>>> ProteinMapper1.class,
>>>>                              ImmutableBytesWritable.class,
>>>>                                              IntWritable.class,
>>>> job);
>>>>        TableMapReduceUtil.initTableReducerJob("testTable",
>>>> ProteinReducer1.class, job);
>>>>        System.exit(job.waitForCompletion(true) ? 0 : 1);
>>>>    }
>>>> }
>>>>
>>>>
>>>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>>> %
>>>>
>
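
(One quick way to check, on any single node, whether the class named in
the ClassNotFoundException quoted above is visible to the hadoop launcher
at all - a sketch only, relying on the fact that bin/hadoop will try to
run an arbitrary class name given as its command:

    hadoop org.apache.hadoop.hbase.mapreduce.TableOutputFormat

If the class is on the classpath, the command fails complaining about a
missing main() method; if it is not, it fails with the same
ClassNotFoundException / NoClassDefFoundError the job reports.)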

hadoop-hbase communications failure - the hbase*.jar classes apparently not being found by Hadoop

Posted by "Taylor, Ronald C" <ro...@pnl.gov>.
Hi Ryan,

Here are our answers:

> What version of java is hadoop running under?  We are compiling our HBase jars using java6, so that is another source of potential incompatibilities...

Answer:

rpm -qa | egrep "java|jdk"
jdk-1.6.0_21-fcs
sun-javadb-docs-10.5.3-0.2
sun-javadb-common-10.5.3-0.2
sun-javadb-client-10.5.3-0.2
sun-javadb-demo-10.5.3-0.2
sun-javadb-javadoc-10.5.3-0.2
sun-javadb-core-10.5.3-0.2

> Do you have any custom changes to any of the bin/* scripts in hadoop?

Answer: Nope. As far as we (my colleague Tim Witteveen and I) can remember, we made no changes at all.


> What else can you tell us about your environment?

Answer: It is
            Red Hat 5.5, x86_64

I note that we are using JDK 1.6.0, so we should be OK there, I suppose?
Ron
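
(The rpm listing above shows which JDK packages are installed, not
necessarily which JVM the Hadoop daemons were started with. A quick way
to confirm both - a sketch only, assuming the Hadoop install under
/home/hadoop/hadoop mentioned elsewhere in this thread - is:

    # which JVM hadoop-env.sh points Hadoop at
    grep JAVA_HOME /home/hadoop/hadoop/conf/hadoop-env.sh

    # which java binary the running TaskTracker was actually launched with
    ps -ef | grep TaskTracker | grep -v grep

If both report a 1.6 JVM on every node, an incompatibility with the
java6-built HBase jars can be ruled out.)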


-----Original Message-----
From: Ryan Rawson [mailto:ryanobjc@gmail.com]
Sent: Monday, September 20, 2010 2:11 PM
To: Taylor, Ronald C
Cc: hbase-user@hadoop.apache.org; user@hbase.apache.org; Witteveen, Tim
Subject: Re: Pastebin page - hadoop-hbase communications failure - the hbase*.jar classes apparently not being found by Hadoop

Ok that looks good.  Sometimes when you successively build and chain classpaths you can accidentally overwrite the previous ones.  But we are looking fine here.

What version of java is hadoop running under?  We are compiling our HBase jars using java6, so that is another source of potential incompatibilities...

Do you have any custom changes to any of the bin/* scripts in hadoop?

What else can you tell us about your environment?


On Mon, Sep 20, 2010 at 2:00 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>
>
> Found it -
> http://pastebin.com/SfFYSLJy
>
>
> -----Original Message-----
> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> Sent: Monday, September 20, 2010 1:50 PM
> To: Taylor, Ronald C
> Cc: hbase-user@hadoop.apache.org; user@hbase.apache.org;
> buttler1@llnl.gov; Ronald Taylor; Witteveen, Tim
> Subject: Re: Guava*.jar use - hadoop-hbase communications failure -
> the hbase*.jar classes apparently not being found by Hadoop
>
> Hey,
>
> yes, the symlink is a pretty good way to be able to inplace upgrade easily.  But still, normally those other jars are in another subdir so their full path should be:
> /home/hbase/hbase/lib/log4j-1.2.16.jar
>
> the hbase scripts rely on those paths to build the classpath, so dont rearrange the dir layout too much.
>
> As for the pastebin you will need to send us your direct link, since so many people post and there isnt really good searching systems, its generally preferred to send the direct link to your pastebin.  If you ever interact with us on IRC this is also how we get big dumps done as well.
>
> Thanks!
> -ryan
>
> On Mon, Sep 20, 2010 at 1:38 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>> Ryan,
>>
>> The hbase*.jar is in the root hbase directory (at /home/hbase/hbase). Now, that is symbolic link on all the nodes (as you can see below), but that should not matter, right?
>>
>> lor@h01 hbase]$ pwd
>> /home/hbase
>> [rtaylor@h01 hbase]$ ls -l
>> lrwxrwxrwx  1 root  hadoop    19 Aug 26 08:47 hbase ->
>> hbase-0.89.20100726 drwxr-xr-x  9 hbase hadoop  4096 Sep 18 22:54
>> hbase-0.89.20100726
>> [rtaylor@h01 hbase]$
>>
>>
>> Anyhoo, I just put the hadoop-env.sh file on pastebin.com. Please take a look. I posted it under the title:
>>
>> "Ronald Taylor / hadoop-env.sh file - HBase-Hadoop hbase*.jar problem"
>>
>> This is the first time I've used pastebin.com, so hopefully I uploaded properly. Please let me know if not.
>>
>> I don't think I misspelled anything on the HADOOP_CLASSPATH line (I just verified file existence based on those spellings, see below "ls" listings), but very happy to have an expert take a look.
>>  Ron
>>
>>
>> export HADOOP_CLASSPATH=/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.20100726.jar:/home/hbase/hbase/log4j-1.2.16.jar:/home/hbase/hbase/zookeeper-3.3.1.jar
>>
>>
>> [rtaylor@h01 conf]$ ls /home/hbase/hbase/conf
>> hadoop-metrics.properties  hbase-default_with_RT_mods.xml
>> hbase-env.sh    hbase-site.xml.psuedo-distributed.template
>> regionservers hbase-default_ORIG.xml     hbase-default.xml
>> hbase-site.xml  log4j.properties
>> tohtml.xsl
>>
>> [rtaylor@h01 conf]$ ls /home/hbase/hbase/hbase-0.89.20100726.jar
>> /home/hbase/hbase/hbase-0.89.20100726.jar
>> [rtaylor@h01 conf]$
>>
>> [rtaylor@h01 conf]$ ls /home/hbase/hbase/log4j-1.2.16.jar
>> /home/hbase/hbase/log4j-1.2.16.jar
>> [rtaylor@h01 conf]$
>>
>> [rtaylor@h01 conf]$ ls /home/hbase/hbase/zookeeper-3.3.1.jar
>> /home/hbase/hbase/zookeeper-3.3.1.jar
>> [rtaylor@h01 conf]$
>>
>>
>>
>> -----Original Message-----
>> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
>> Sent: Monday, September 20, 2010 1:17 PM
>> To: Taylor, Ronald C
>> Cc: user@hbase.apache.org; hbase-user@hadoop.apache.org;
>> buttler1@llnl.gov; Ronald Taylor; Witteveen, Tim
>> Subject: Re: Guava*.jar use - hadoop-hbase communications failure -
>> the hbase*.jar classes apparently not being found by Hadoop
>>
>> Hey,
>>
>> If you could, perhaps you could paste up your hadoop-env.sh on pastebin.com?  That would help... sometimes I have made errors in the bash shell trickery, and it probably would help to get more eyes checking it out.
>>
>> Normally in the stock hbase distro the Hbase JAR is in the root hbase dir, and the other jars in the lib/ sub directory, am I correct to assume you've moved the jars around a bit?
>>
>> Good luck,
>> -ryan
>>
>> On Mon, Sep 20, 2010 at 1:14 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>>>
>>> Hello Ryan, Dave, other developers,
>>>
>>> Have not fixed the problem. Here's where things stand:
>>>
>>> 1) As Ryan suggested, we have checked all the nodes to make sure that we copied over the hadoop-env.sh file with the HADOOP_CLASSPATH setting, set like so:
>>>
>>> export HADOOP_CLASSPATH=/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.20100726.jar:/home/hbase/hbase/log4j-1.2.16.jar:/home/hbase/hbase/zookeeper-3.3.1.jar
>>>
>>> Answer: yep, that was OK, the files are there. We also restarted Hadoop and Hbase again. No change - program still fails on not finding  the TableOutputFormat class.
>>>
>>> 2) Following Dave's advice of avoiding the problem by not using TableOutputFormat (by skipping the Reducer stage), I tried a variant of that. I kept the Reducer stage in, but changed it to output to a file, instead of an Hbase table.
>>>
>>> That did not work either. I tried running the new program from the hadoop acct and now get a msg (from the Mapper stage, I believe) saying that the hbase.mapreduce.TableMapper class cannot be found. So - it is not just TableOutputFormat class - it is all the classes in the hbase*.jar file that are not being found.
>>>
>>> Does this have anything to do with the guava*.jar file that Ryan mentioned, which (as far as I can tell) we don't have installed?
>>>
>>> Obviously, we need more help.
>>>
>>> In the meantime, as a stop-gap, I'm planning on writing our analysis programs this way:
>>>
>>> 1) extract data from the source Hbase table and store in an HDFS
>>> file, all data needed for analysis contained independently on each
>>> row - this task to be done by a non-MapReduce class that can access
>>> Hbase tables
>>>
>>> 2) call a MapReduce class that will process the file in parallel
>>> and return a new file (well, a directory of files, which I'll
>>> combine into one) as output
>>>
>>> 3) write the contents of the new results file back into an Hbase
>>> table using another non-MapReduce class
>>>
>>> I presume this will work, but again, obviously, it's not optimal and we need to resolve this issue so MapReduce classes can access Hbase tables directly on our cluster.
>>>
>>> Does anybody have any advice?
>>>  Cheers,
>>>   Ron
>>>
>>> ___________________________________________
>>> Ronald Taylor, Ph.D.
>>> Computational Biology & Bioinformatics Group Pacific Northwest
>>> National Laboratory
>>> 902 Battelle Boulevard
>>> P.O. Box 999, Mail Stop J4-33
>>> Richland, WA  99352 USA
>>> Office:  509-372-6568
>>> Email: ronald.taylor@pnl.gov
>>>
>>>
>>> -----Original Message-----
>>> From: Buttler, David [mailto:buttler1@llnl.gov]
>>> Sent: Monday, September 20, 2010 10:17 AM
>>> To: user@hbase.apache.org; 'hbase-user@hadoop.apache.org'
>>> Subject: RE: hadoop-hbase failure - could use some help, a class is
>>> apparently not being found by Hadoop
>>>
>>> I find it is often faster to skip the reduce phase when updating rows in hbase.  (A trick I picked up from Ryan) Essentially, you read a row from hbase, do your processing, and write the row back to hbase.
>>> The only time you would want to do the reduce phase is if there is some aggregation that you need, or if there is some output you want to skip (e.g. you have a zipfian distribution and you want to ignore the low count occurrences).
>>>
>>> Dave
>>>
>>> -----Original Message-----
>>> From: Taylor, Ronald C
>>> Sent: Sunday, September 19, 2010 9:59 PM
>>> To: 'Ryan Rawson'; user@hbase.apache.org;
>>> hbase-user@hadoop.apache.org
>>> Cc: Taylor, Ronald C; 'Ronald Taylor'; Witteveen, Tim
>>> Subject: RE: Guava*.jar use - hadoop-hbase failure - could use some
>>> help, a class is apparently not being found by Hadoop
>>>
>>>
>>> Ryan,
>>>
>>> Thanks for the quick feedback. I will check the other nodes on the cluster to see if they have been properly updated.
>>>
>>> However, I am now really confused as to use of the guava*.jar file that you talk about. This is the first time I've heard about this. I presume we are talking about a jar file packaging the guava libraries from Google?
>>>
>>> I cannot find this guava*.jar in either the /home/hadoop/hadoop directory or in the /home/hbase/hbase directories, where the Hadoop and Hbase installs place the other *.jar files. I'm afraid that I don't even know where we should have downloaded it. Does it come with Hbase, or with Hadoop? Where should it have been placed, after installation? Should I now download it - since we appear to be missing it - from here?
>>>  http://code.google.com/p/guava-libraries/downloads/list
>>>
>>> I Googled and found issue HBASE-2714 (Remove Guava as a client
>>> dependency, June 11 2010) here
>>>
>>> http://www.mail-archive.com/issues@hbase.apache.org/msg00950.html
>>> (see below, where I've included the text)
>>>
>>> which appears to say that Hbase (at least *some* release of Hbase -
>>> does this include 0.89?) has a dependency on Guava, in order to run
>>> a MapReduce job over Hbase. But nothing on Guava is mentioned at
>>>
>>>
>>> http://hbase.apache.org/docs/r0.20.4/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath
>>>
>>> (I cannot find anything in the Hbase 0.89 online documents on Guava
>>> or in how to set CLASSPATH or in what *.jar files to include so I
>>> can use MapReduce with Hbase; the best guidance I can find is in
>>> this earlier
>>> document.)
>>>
>>> So - I could really use further clarification in regard to Guava as to what I should be doing to set up Hbase-MapReduce work.
>>>
>>>  Regards,
>>>   Ron
>>>
>>> %%%%%%%%%%%%%%%%%%%%%%%%
>>>
>>> From
>>>
>>> http://www.mail-archive.com/issues@hbase.apache.org/msg00950.html
>>>
>>>
>>> Todd Lipcon commented on HBASE-2714:
>>> ------------------------------------
>>>
>>> Why not?
>>>
>>> In theory, the new TableMapReduceUtil.addDependencyJars should take care of shipping it in the distributedcache. Apparently it's not working?
>>>
>>> ryan rawson commented on HBASE-2714:
>>> ------------------------------------
>>>
>>> not everyone uses that mechanism to run map reduce jobs on hbase.  The standard for a long time was to add hbase.jar and zookeeper-3.2.2.jar to the hadoop classpath, thus not requiring every job include the hbase jars.
>>>
>>> Todd Lipcon commented on HBASE-2714:
>>> ------------------------------------
>>>
>>> Does this mean in general that we can't add more dependencies to the
>>> hbase client? I think instead we should make it easier to run hbase
>>> MR jobs *without* touching the Hadoop config (eg right now you have
>>> to restart MR to upgrade hbase, that's not going to fly for a lot of
>>> clusters)
>>>
>>> stack commented on HBASE-2714:
>>> ------------------------------
>>>
>>> So, we need to change our recommendations here:
>>> http://hbase.apache.org/docs/r0.20.4/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath?
>>>
>>>
>>>> Remove Guava as a client dependency
>>>> -----------------------------------
>>>>
>>>>                 Key: HBASE-2714
>>>>                 URL:
>>>> https://issues.apache.org/jira/browse/HBASE-2714
>>>>             Project: HBase
>>>>          Issue Type: Improvement
>>>>          Components: client
>>>>            Reporter: Jeff Hammerbacher
>>>>
>>>> We shouldn't need Guava on the classpath to run a MapReduce job over HBase.
>>>
>>>
>>> %%%%%%%%%%%%%%%%%%%%%%%%
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
>>> Sent: Sunday, September 19, 2010 12:45 AM
>>> To: user@hbase.apache.org
>>> Cc: hbase-user@hadoop.apache.org; Taylor, Ronald C
>>> Subject: Re: hadoop-hbase failure - could use some help, a class is
>>> apparently not being found by Hadoop
>>>
>>> hey,
>>>
>>> looks like you've done all the right things... you might want to double check that all the 'slave' machines have the updated hadoop-env.sh and that the path referenced therein is present _on all the machines_.
>>>
>>> You also need to include the guava*.jar as well.  the log4j is already included by mapred by default, so no need there.
>>>
>>> -ryan
>>>
>>>
>>>
>>> On Fri, Sep 17, 2010 at 4:19 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>>>>
>>>> Hi folks,
>>>>
>>>> Got a problem in basic Hadoop-Hbase communication. My small test
>>>> program ProteinCounter1.java - shown in full below - reports out
>>>> this error
>>>>
>>>>   java.lang.RuntimeException: java.lang.ClassNotFoundException:
>>>> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>>>        at
>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
>>>>
>>>> The full invocation and error msgs are shown at bottom.
>>>>
>>>> We are using Hadoop 0.20.2 with HBase 0.89.20100726 on a 24-node cluster. Hadoop and HBase each appear to work fine separately. That is, I've created programs that run MapReduce on files, and programs that import data into HBase tables and manipulate them. Both types of programs have gone quite smoothly.
>>>>
>>>> Now I want to combine the two - use MapReduce programs on data drawn from an Hbase table, with results placed back into an Hbase table.
>>>>
>>>> But my test program for such, as you see from the error msg, is not
>>>> working. Apparently the
>>>>   org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>>>  class is not found.
>>>>
>>>> However, I have added these paths, including the relevant Hbase *.jar, to HADOOP_CLASSPATH, so the missing class should have been found, as you can see:
>>>>
>>>>  export HADOOP_CLASSPATH=/home/hbase/hbase/conf:
>>>> /home/hbase/hbase/hbase-0.89.20100726.jar:
>>>> /home/rtaylor/HadoopWork/log4j-1.2.16.jar:
>>>> /home/rtaylor/HadoopWork/zookeeper-3.3.1.jar
>>>>
>>>>  This change was made in the ../hadoop/conf/hadoop-env.sh file.
>>>>
>>>> I checked the manifest of /home/hbase/hbase/hbase-0.89.20100726.jar
>>>> and
>>>>    org/apache/hadoop/hbase/mapreduce/TableOutputFormat.class
>>>>  is indeed present in that Hbase *.jar file.
>>>>
>>>> Also, I have restarted both Hbase and Hadoop after making this change.
>>>>
>>>> Don't understand why the TableOutputFormat class is not being found. Or is the error msg misleading, and something else is going wrong? I would very much appreciate any advice people have as to what is going wrong. Need to get this working very soon.
>>>>
>>>>   Regards,
>>>>     Ron T.
>>>>
>>>> ___________________________________________
>>>> Ronald Taylor, Ph.D.
>>>> Computational Biology & Bioinformatics Group Pacific Northwest
>>>> National Laboratory
>>>> 902 Battelle Boulevard
>>>> P.O. Box 999, Mail Stop J4-33
>>>> Richland, WA  99352 USA
>>>> Office:  509-372-6568
>>>> Email: ronald.taylor@pnl.gov
>>>>
>>>>
>>>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>>> %
>>>> %
>>>> %
>>>> %%%%%%%%%%%%
>>>>
>>>> contents of the "ProteinCounter1.java" file:
>>>>
>>>>
>>>>
>>>> //  to compile
>>>> // javac ProteinCounter1.java
>>>> // jar cf ProteinCounterTest.jar  *.class
>>>>
>>>> // to run
>>>> //   hadoop jar ProteinCounterTest.jar ProteinCounter1
>>>>
>>>>
>>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>>> import org.apache.hadoop.conf.Configuration;
>>>> import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
>>>> import org.apache.hadoop.mapreduce.Job; import
>>>> org.apache.hadoop.io.IntWritable;
>>>>
>>>> import java.util.*;
>>>> import java.io.*;
>>>> import org.apache.hadoop.hbase.*;
>>>> import org.apache.hadoop.hbase.client.*; import
>>>> org.apache.hadoop.hbase.io.*; import
>>>> org.apache.hadoop.hbase.util.*; import
>>>> org.apache.hadoop.hbase.mapreduce.*;
>>>>
>>>>
>>>> // %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>>>
>>>> /**
>>>>  * counts the number of times each protein appears in the
>>>> proteinTable
>>>>  *
>>>>  */
>>>> public class ProteinCounter1 {
>>>>
>>>>
>>>>    static class ProteinMapper1 extends
>>>> TableMapper<ImmutableBytesWritable, IntWritable> {
>>>>
>>>>        private int numRecords = 0;
>>>>        private static final IntWritable one = new IntWritable(1);
>>>>
>>>>        @Override
>>>>            public void map(ImmutableBytesWritable row, Result
>>>> values, Context context) throws IOException {
>>>>
>>>>            // retrieve the value of proteinID, which is the row key
>>>> for each protein in the proteinTable
>>>>            ImmutableBytesWritable proteinID_Key = new
>>>> ImmutableBytesWritable(row.get());
>>>>            try {
>>>>                context.write(proteinID_Key, one);
>>>>            } catch (InterruptedException e) {
>>>>                throw new IOException(e);
>>>>            }
>>>>            numRecords++;
>>>>            if ((numRecords % 100) == 0) {
>>>>                context.setStatus("mapper processed " + numRecords + "
>>>> proteinTable records so far");
>>>>            }
>>>>        }
>>>>    }
>>>>
>>>>    public static class ProteinReducer1 extends
>>>> TableReducer<ImmutableBytesWritable,
>>>>                                               IntWritable,
>>>> ImmutableBytesWritable> {
>>>>
>>>>        public void reduce(ImmutableBytesWritable proteinID_key,
>>>> Iterable<IntWritable> values,
>>>>                            Context context)
>>>>            throws IOException, InterruptedException {
>>>>            int sum = 0;
>>>>            for (IntWritable val : values) {
>>>>                sum += val.get();
>>>>            }
>>>>
>>>>            Put put = new Put(proteinID_key.get());
>>>>            put.add(Bytes.toBytes("resultFields"),
>>>> Bytes.toBytes("total"), Bytes.toBytes(sum));
>>>>            System.out.println(String.format("stats : proteinID_key :
>>>> %d, count : %d",
>>>>
>>>> Bytes.toInt(proteinID_key.get()), sum));
>>>>            context.write(proteinID_key, put);
>>>>        }
>>>>    }
>>>>
>>>>    public static void main(String[] args) throws Exception {
>>>>
>>>>        org.apache.hadoop.conf.Configuration conf;
>>>>           conf =
>>>> org.apache.hadoop.hbase.HBaseConfiguration.create();
>>>>
>>>>        Job job = new Job(conf, "HBaseTest_Using_ProteinCounter");
>>>>        job.setJarByClass(ProteinCounter1.class);
>>>>
>>>>        org.apache.hadoop.hbase.client.Scan scan = new Scan();
>>>>
>>>>        String colFamilyToUse = "proteinFields";
>>>>        String fieldToUse = "Protein_Ref_ID";
>>>>
>>>>        // retreive this one column from the specified family
>>>>        scan.addColumn(Bytes.toBytes(colFamilyToUse),
>>>> Bytes.toBytes(fieldToUse));
>>>>
>>>>           org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter
>>>> filterToUse =
>>>>                 new
>>>> org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter();
>>>>        scan.setFilter(filterToUse);
>>>>
>>>>        TableMapReduceUtil.initTableMapperJob("proteinTable", scan,
>>>> ProteinMapper1.class,
>>>>                              ImmutableBytesWritable.class,
>>>>                                              IntWritable.class,
>>>> job);
>>>>        TableMapReduceUtil.initTableReducerJob("testTable",
>>>> ProteinReducer1.class, job);
>>>>        System.exit(job.waitForCompletion(true) ? 0 : 1);
>>>>    }
>>>> }
>>>>
>>>>
>>>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>>> %
>>>>
>

Re: Pastebin page - hadoop-hbase communications failure - the hbase*.jar classes apparently not being found by Hadoop

Posted by Ryan Rawson <ry...@gmail.com>.
Ok that looks good.  Sometimes when you successively build and chain
classpaths you can accidentally overwrite the previous ones.  But we are
looking fine here.

What version of java is hadoop running under?  We are compiling our
HBase jars using java6, so that is another source of potential
incompatibilities...

Do you have any custom changes to any of the bin/* scripts in hadoop?

What else can you tell us about your environment?
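
(One pattern that avoids the accidental-overwrite problem described above
is to build HADOOP_CLASSPATH up incrementally - assign it once, then only
ever append - and to loop over HBase's lib/ directory so nothing, including
the guava jar mentioned earlier in the thread, gets left out. A sketch
only, assuming the stock layout with the dependency jars under
/home/hbase/hbase/lib/:

    # in hadoop-env.sh on every node
    export HADOOP_CLASSPATH=/home/hbase/hbase/conf
    export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/home/hbase/hbase/hbase-0.89.20100726.jar
    for jar in /home/hbase/hbase/lib/*.jar; do
        export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${jar}
    done

As long as every later line uses the ${HADOOP_CLASSPATH}: prefix, it
cannot silently drop the HBase entries added before it.)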


On Mon, Sep 20, 2010 at 2:00 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>
>
> Found it -
> http://pastebin.com/SfFYSLJy
>
>
> -----Original Message-----
> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> Sent: Monday, September 20, 2010 1:50 PM
> To: Taylor, Ronald C
> Cc: hbase-user@hadoop.apache.org; user@hbase.apache.org; buttler1@llnl.gov; Ronald Taylor; Witteveen, Tim
> Subject: Re: Guava*.jar use - hadoop-hbase communications failure - the hbase*.jar classes apparently not being found by Hadoop
>
> Hey,
>
> yes, the symlink is a pretty good way to be able to inplace upgrade easily.  But still, normally those other jars are in another subdir so their full path should be:
> /home/hbase/hbase/lib/log4j-1.2.16.jar
>
> the hbase scripts rely on those paths to build the classpath, so dont rearrange the dir layout too much.
>
> As for the pastebin you will need to send us your direct link, since so many people post and there isnt really good searching systems, its generally preferred to send the direct link to your pastebin.  If you ever interact with us on IRC this is also how we get big dumps done as well.
>
> Thanks!
> -ryan
>
> On Mon, Sep 20, 2010 at 1:38 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>> Ryan,
>>
>> The hbase*.jar is in the root hbase directory (at /home/hbase/hbase). Now, that is symbolic link on all the nodes (as you can see below), but that should not matter, right?
>>
>> lor@h01 hbase]$ pwd
>> /home/hbase
>> [rtaylor@h01 hbase]$ ls -l
>> lrwxrwxrwx  1 root  hadoop    19 Aug 26 08:47 hbase ->
>> hbase-0.89.20100726 drwxr-xr-x  9 hbase hadoop  4096 Sep 18 22:54
>> hbase-0.89.20100726
>> [rtaylor@h01 hbase]$
>>
>>
>> Anyhoo, I just put the hadoop-env.sh file on pastebin.com. Please take a look. I posted it under the title:
>>
>> "Ronald Taylor / hadoop-env.sh file - HBase-Hadoop hbase*.jar problem"
>>
>> This is the first time I've used pastebin.com, so hopefully I uploaded properly. Please let me know if not.
>>
>> I don't think I misspelled anything on the HADOOP_CLASSPATH line (I just verified file existence based on those spellings, see below "ls" listings), but very happy to have an expert take a look.
>>  Ron
>>
>>
>> export HADOOP_CLASSPATH=/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.20100726.jar:/home/hbase/hbase/log4j-1.2.16.jar:/home/hbase/hbase/zookeeper-3.3.1.jar
>>
>>
>> [rtaylor@h01 conf]$ ls /home/hbase/hbase/conf
>> hadoop-metrics.properties  hbase-default_with_RT_mods.xml
>> hbase-env.sh    hbase-site.xml.psuedo-distributed.template
>> regionservers hbase-default_ORIG.xml     hbase-default.xml
>> hbase-site.xml  log4j.properties                            tohtml.xsl
>>
>> [rtaylor@h01 conf]$ ls /home/hbase/hbase/hbase-0.89.20100726.jar
>> /home/hbase/hbase/hbase-0.89.20100726.jar
>> [rtaylor@h01 conf]$
>>
>> [rtaylor@h01 conf]$ ls /home/hbase/hbase/log4j-1.2.16.jar
>> /home/hbase/hbase/log4j-1.2.16.jar
>> [rtaylor@h01 conf]$
>>
>> [rtaylor@h01 conf]$ ls /home/hbase/hbase/zookeeper-3.3.1.jar
>> /home/hbase/hbase/zookeeper-3.3.1.jar
>> [rtaylor@h01 conf]$
>>
>>
>>
>> -----Original Message-----
>> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
>> Sent: Monday, September 20, 2010 1:17 PM
>> To: Taylor, Ronald C
>> Cc: user@hbase.apache.org; hbase-user@hadoop.apache.org;
>> buttler1@llnl.gov; Ronald Taylor; Witteveen, Tim
>> Subject: Re: Guava*.jar use - hadoop-hbase communications failure -
>> the hbase*.jar classes apparently not being found by Hadoop
>>
>> Hey,
>>
>> If you could, perhaps you could paste up your hadoop-env.sh on pastebin.com?  That would help... sometimes I have made errors in the bash shell trickery, and it probably would help to get more eyes checking it out.
>>
>> Normally in the stock hbase distro the Hbase JAR is in the root hbase dir, and the other jars in the lib/ sub directory, am I correct to assume you've moved the jars around a bit?
>>
>> Good luck,
>> -ryan
>>
>> On Mon, Sep 20, 2010 at 1:14 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>>>
>>> Hello Ryan, Dave, other developers,
>>>
>>> Have not fixed the problem. Here's where things stand:
>>>
>>> 1) As Ryan suggested, we have checked all the nodes to make sure that we copied over the hadoop-env.sh file with the HADOOP_CLASSPATH setting, set like so:
>>>
>>> export HADOOP_CLASSPATH=/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.20100726.jar:/home/hbase/hbase/log4j-1.2.16.jar:/home/hbase/hbase/zookeeper-3.3.1.jar
>>>
>>> Answer: yep, that was OK, the files are there. We also restarted Hadoop and Hbase again. No change - program still fails on not finding  the TableOutputFormat class.
>>>
>>> 2) Following Dave's advice of avoiding the problem by not using TableOutputFormat (by skipping the Reducer stage), I tried a variant of that. I kept the Reducer stage in, but changed it to output to a file, instead of an Hbase table.
>>>
>>> That did not work either. I tried running the new program from the hadoop acct and now get a msg (from the Mapper stage, I believe) saying that the hbase.mapreduce.TableMapper class cannot be found. So - it is not just TableOutputFormat class - it is all the classes in the hbase*.jar file that are not being found.
>>>
>>> Does this have anything to do with the guava*.jar file that Ryan mentioned, which (as far as I can tell) we don't have installed?
>>>
>>> Obviously, we need more help.
>>>
>>> In the meantime, as a stop-gap, I'm planning on writing our analysis programs this way:
>>>
>>> 1) extract data from the source Hbase table and store in an HDFS
>>> file, all data needed for analysis contained independently on each
>>> row - this task to be done by a non-MapReduce class that can access
>>> Hbase tables
>>>
>>> 2) call a MapReduce class that will process the file in parallel and
>>> return a new file (well, a directory of files, which I'll combine
>>> into one) as output
>>>
>>> 3) write the contents of the new results file back into an Hbase
>>> table using another non-MapReduce class
>>>
>>> I presume this will work, but again, obviously, it's not optimal and we need to resolve this issue so MapReduce classes can access Hbase tables directly on our cluster.
>>>
>>> Does anybody have any advice?
>>>  Cheers,
>>>   Ron
>>>
>>> ___________________________________________
>>> Ronald Taylor, Ph.D.
>>> Computational Biology & Bioinformatics Group Pacific Northwest
>>> National Laboratory
>>> 902 Battelle Boulevard
>>> P.O. Box 999, Mail Stop J4-33
>>> Richland, WA  99352 USA
>>> Office:  509-372-6568
>>> Email: ronald.taylor@pnl.gov
>>>
>>>
>>> -----Original Message-----
>>> From: Buttler, David [mailto:buttler1@llnl.gov]
>>> Sent: Monday, September 20, 2010 10:17 AM
>>> To: user@hbase.apache.org; 'hbase-user@hadoop.apache.org'
>>> Subject: RE: hadoop-hbase failure - could use some help, a class is
>>> apparently not being found by Hadoop
>>>
>>> I find it is often faster to skip the reduce phase when updating rows in hbase.  (A trick I picked up from Ryan) Essentially, you read a row from hbase, do your processing, and write the row back to hbase.
>>> The only time you would want to do the reduce phase is if there is some aggregation that you need, or if there is some output you want to skip (e.g. you have a zipfian distribution and you want to ignore the low count occurrences).
>>>
>>> Dave
>>>
>>> -----Original Message-----
>>> From: Taylor, Ronald C
>>> Sent: Sunday, September 19, 2010 9:59 PM
>>> To: 'Ryan Rawson'; user@hbase.apache.org;
>>> hbase-user@hadoop.apache.org
>>> Cc: Taylor, Ronald C; 'Ronald Taylor'; Witteveen, Tim
>>> Subject: RE: Guava*.jar use - hadoop-hbase failure - could use some
>>> help, a class is apparently not being found by Hadoop
>>>
>>>
>>> Ryan,
>>>
>>> Thanks for the quick feedback. I will check the other nodes on the cluster to see if they have been properly updated.
>>>
>>> However, I am now really confused as to use of the guava*.jar file that you talk about. This is the first time I've heard about this. I presume we are talking about a jar file packaging the guava libraries from Google?
>>>
>>> I cannot find this guava*.jar in either the /home/hadoop/hadoop directory or in the /home/hbase/hbase directories, where the Hadoop and Hbase installs place the other *.jar files. I'm afraid that I don't even know where we should have downloaded it. Does it come with Hbase, or with Hadoop? Where should it have been placed, after installation? Should I now download it - since we appear to be missing it - from here?
>>>  http://code.google.com/p/guava-libraries/downloads/list
>>>
>>> I Googled and found issue HBASE-2714 (Remove Guava as a client
>>> dependency, June 11 2010) here
>>>
>>> http://www.mail-archive.com/issues@hbase.apache.org/msg00950.html
>>> (see below, where I've included the text)
>>>
>>> which appears to say that Hbase (at least *some* release of Hbase -
>>> does this include 0.89?) has a dependency on Guava, in order to run a
>>> MapReduce job over Hbase. But nothing on Guava is mentioned at
>>>
>>>
>>> http://hbase.apache.org/docs/r0.20.4/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath
>>>
>>> (I cannot find anything in the Hbase 0.89 online documents on Guava
>>> or in how to set CLASSPATH or in what *.jar files to include so I can
>>> use MapReduce with Hbase; the best guidance I can find is in this
>>> earlier
>>> document.)
>>>
>>> So - I could really use further clarification in regard to Guava as to what I should be doing to set up Hbase-MapReduce work.
>>>
>>>  Regards,
>>>   Ron
>>>
>>> %%%%%%%%%%%%%%%%%%%%%%%%
>>>
>>> From
>>>
>>> http://www.mail-archive.com/issues@hbase.apache.org/msg00950.html
>>>
>>>
>>> Todd Lipcon commented on HBASE-2714:
>>> ------------------------------------
>>>
>>> Why not?
>>>
>>> In theory, the new TableMapReduceUtil.addDependencyJars should take care of shipping it in the distributedcache. Apparently it's not working?
>>>
>>> ryan rawson commented on HBASE-2714:
>>> ------------------------------------
>>>
>>> not everyone uses that mechanism to run map reduce jobs on hbase.  The standard for a long time was to add hbase.jar and zookeeper-3.2.2.jar to the hadoop classpath, thus not requiring every job include the hbase jars.
>>>
>>> Todd Lipcon commented on HBASE-2714:
>>> ------------------------------------
>>>
>>> Does this mean in general that we can't add more dependencies to the
>>> hbase client? I think instead we should make it easier to run hbase
>>> MR jobs *without* touching the Hadoop config (eg right now you have
>>> to restart MR to upgrade hbase, that's not going to fly for a lot of
>>> clusters)
>>>
>>> stack commented on HBASE-2714:
>>> ------------------------------
>>>
>>> So, we need to change our recommendations here:
>>> http://hbase.apache.org/docs/r0.20.4/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath?
>>>
>>>
>>>> Remove Guava as a client dependency
>>>> -----------------------------------
>>>>
>>>>                 Key: HBASE-2714
>>>>                 URL:
>>>> https://issues.apache.org/jira/browse/HBASE-2714
>>>>             Project: HBase
>>>>          Issue Type: Improvement
>>>>          Components: client
>>>>            Reporter: Jeff Hammerbacher
>>>>
>>>> We shouldn't need Guava on the classpath to run a MapReduce job over HBase.
>>>
>>>
>>> %%%%%%%%%%%%%%%%%%%%%%%%
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
>>> Sent: Sunday, September 19, 2010 12:45 AM
>>> To: user@hbase.apache.org
>>> Cc: hbase-user@hadoop.apache.org; Taylor, Ronald C
>>> Subject: Re: hadoop-hbase failure - could use some help, a class is
>>> apparently not being found by Hadoop
>>>
>>> hey,
>>>
>>> looks like you've done all the right things... you might want to double check that all the 'slave' machines have the updated hadoop-env.sh and that the path referenced therein is present _on all the machines_.
>>>
>>> You also need to include the guava*.jar as well.  the log4j is already included by mapred by default, so no need there.
>>>
>>> -ryan
>>>
>>>
>>>
>>> On Fri, Sep 17, 2010 at 4:19 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>>>>
>>>> Hi folks,
>>>>
>>>> Got a problem in basic Hadoop-Hbase communication. My small test
>>>> program ProteinCounter1.java - shown in full below - reports out
>>>> this error
>>>>
>>>>   java.lang.RuntimeException: java.lang.ClassNotFoundException:
>>>> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>>>        at
>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
>>>>
>>>> The full invocation and error msgs are shown at bottom.
>>>>
>>>> We are using Hadoop 0.20.2 with HBase 0.89.20100726 on a 24-node cluster. Hadoop and HBase each appear to work fine separately. That is, I've created programs that run MapReduce on files, and programs that import data into HBase tables and manipulate them. Both types of programs have gone quite smoothly.
>>>>
>>>> Now I want to combine the two - use MapReduce programs on data drawn from an Hbase table, with results placed back into an Hbase table.
>>>>
>>>> But my test program for such, as you see from the error msg, is not
>>>> working. Apparently the
>>>>   org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>>>  class is not found.
>>>>
>>>> However, I have added these paths, including the relevant Hbase *.jar, to HADOOP_CLASSPATH, so the missing class should have been found, as you can see:
>>>>
>>>>  export HADOOP_CLASSPATH=/home/hbase/hbase/conf:
>>>> /home/hbase/hbase/hbase-0.89.20100726.jar:
>>>> /home/rtaylor/HadoopWork/log4j-1.2.16.jar:
>>>> /home/rtaylor/HadoopWork/zookeeper-3.3.1.jar
>>>>
>>>>  This change was made in the ../hadoop/conf/hadoop-env.sh file.
>>>>
>>>> I checked the manifest of /home/hbase/hbase/hbase-0.89.20100726.jar
>>>> and
>>>>    org/apache/hadoop/hbase/mapreduce/TableOutputFormat.class
>>>>  is indeed present that Hbase *.jar file.
>>>>
>>>> Also, I have restarted both Hbase and Hadoop after making this change.
>>>>
>>>> Don't understand why the TableOutputFormat class is not being found. Or is the error msg misleading, and something else is going wrong? I would very much appreciate any advice people have as to what is going wrong. Need to get this working very soon.
>>>>
>>>>   Regards,
>>>>     Ron T.
>>>>
>>>> ___________________________________________
>>>> Ronald Taylor, Ph.D.
>>>> Computational Biology & Bioinformatics Group Pacific Northwest
>>>> National Laboratory
>>>> 902 Battelle Boulevard
>>>> P.O. Box 999, Mail Stop J4-33
>>>> Richland, WA  99352 USA
>>>> Office:  509-372-6568
>>>> Email: ronald.taylor@pnl.gov
>>>>
>>>>
>>>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>>> %
>>>> %
>>>> %%%%%%%%%%%%
>>>>
>>>> contents of the "ProteinCounter1.java" file:
>>>>
>>>>
>>>>
>>>> //  to compile
>>>> // javac ProteinCounter1.java
>>>> // jar cf ProteinCounterTest.jar  *.class
>>>>
>>>> // to run
>>>> //   hadoop jar ProteinCounterTest.jar ProteinCounter1
>>>>
>>>>
>>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>>> import org.apache.hadoop.conf.Configuration;
>>>> import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
>>>> import org.apache.hadoop.mapreduce.Job; import
>>>> org.apache.hadoop.io.IntWritable;
>>>>
>>>> import java.util.*;
>>>> import java.io.*;
>>>> import org.apache.hadoop.hbase.*;
>>>> import org.apache.hadoop.hbase.client.*; import
>>>> org.apache.hadoop.hbase.io.*; import org.apache.hadoop.hbase.util.*;
>>>> import org.apache.hadoop.hbase.mapreduce.*;
>>>>
>>>>
>>>> // %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>>>
>>>> /**
>>>>  * counts the number of times each protein appears in the
>>>> proteinTable
>>>>  *
>>>>  */
>>>> public class ProteinCounter1 {
>>>>
>>>>
>>>>    static class ProteinMapper1 extends
>>>> TableMapper<ImmutableBytesWritable, IntWritable> {
>>>>
>>>>        private int numRecords = 0;
>>>>        private static final IntWritable one = new IntWritable(1);
>>>>
>>>>        @Override
>>>>            public void map(ImmutableBytesWritable row, Result
>>>> values, Context context) throws IOException {
>>>>
>>>>            // retrieve the value of proteinID, which is the row key
>>>> for each protein in the proteinTable
>>>>            ImmutableBytesWritable proteinID_Key = new
>>>> ImmutableBytesWritable(row.get());
>>>>            try {
>>>>                context.write(proteinID_Key, one);
>>>>            } catch (InterruptedException e) {
>>>>                throw new IOException(e);
>>>>            }
>>>>            numRecords++;
>>>>            if ((numRecords % 100) == 0) {
>>>>                context.setStatus("mapper processed " + numRecords + "
>>>> proteinTable records so far");
>>>>            }
>>>>        }
>>>>    }
>>>>
>>>>    public static class ProteinReducer1 extends
>>>> TableReducer<ImmutableBytesWritable,
>>>>                                               IntWritable,
>>>> ImmutableBytesWritable> {
>>>>
>>>>        public void reduce(ImmutableBytesWritable proteinID_key,
>>>> Iterable<IntWritable> values,
>>>>                            Context context)
>>>>            throws IOException, InterruptedException {
>>>>            int sum = 0;
>>>>            for (IntWritable val : values) {
>>>>                sum += val.get();
>>>>            }
>>>>
>>>>            Put put = new Put(proteinID_key.get());
>>>>            put.add(Bytes.toBytes("resultFields"),
>>>> Bytes.toBytes("total"), Bytes.toBytes(sum));
>>>>            System.out.println(String.format("stats : proteinID_key :
>>>> %d, count : %d",
>>>>
>>>> Bytes.toInt(proteinID_key.get()), sum));
>>>>            context.write(proteinID_key, put);
>>>>        }
>>>>    }
>>>>
>>>>    public static void main(String[] args) throws Exception {
>>>>
>>>>        org.apache.hadoop.conf.Configuration conf;
>>>>           conf =
>>>> org.apache.hadoop.hbase.HBaseConfiguration.create();
>>>>
>>>>        Job job = new Job(conf, "HBaseTest_Using_ProteinCounter");
>>>>        job.setJarByClass(ProteinCounter1.class);
>>>>
>>>>        org.apache.hadoop.hbase.client.Scan scan = new Scan();
>>>>
>>>>        String colFamilyToUse = "proteinFields";
>>>>        String fieldToUse = "Protein_Ref_ID";
>>>>
>>>>        // retreive this one column from the specified family
>>>>        scan.addColumn(Bytes.toBytes(colFamilyToUse),
>>>> Bytes.toBytes(fieldToUse));
>>>>
>>>>           org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter
>>>> filterToUse =
>>>>                 new
>>>> org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter();
>>>>        scan.setFilter(filterToUse);
>>>>
>>>>        TableMapReduceUtil.initTableMapperJob("proteinTable", scan,
>>>> ProteinMapper1.class,
>>>>                              ImmutableBytesWritable.class,
>>>>                                              IntWritable.class,
>>>> job);
>>>>        TableMapReduceUtil.initTableReducerJob("testTable",
>>>> ProteinReducer1.class, job);
>>>>        System.exit(job.waitForCompletion(true) ? 0 : 1);
>>>>    }
>>>> }
>>>>
>>>>
>>>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>>>
>

Pastebin page - hadoop-hbase communications failure - the hbase*.jar classes apparently not being found by Hadoop

Posted by "Taylor, Ronald C" <ro...@pnl.gov>.

Found it -
http://pastebin.com/SfFYSLJy
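
(For completeness: the HBASE-2714 discussion quoted earlier in this thread
mentions TableMapReduceUtil.addDependencyJars as an alternative that ships
the jars containing the job's configured classes with the job itself, via
the DistributedCache, instead of editing hadoop-env.sh on every node. A
minimal, untested sketch - one extra line in the existing ProteinCounter1
main(), and only if the 0.89 client jar actually exposes
addDependencyJars(Job):

    TableMapReduceUtil.initTableMapperJob("proteinTable", scan,
        ProteinMapper1.class, ImmutableBytesWritable.class, IntWritable.class, job);
    TableMapReduceUtil.initTableReducerJob("testTable", ProteinReducer1.class, job);

    // ship the jars that the job's mapper/reducer/format classes come from
    // to the tasks via the DistributedCache
    TableMapReduceUtil.addDependencyJars(job);

    System.exit(job.waitForCompletion(true) ? 0 : 1);

Note this only helps if the submitting client has the HBase and ZooKeeper
jars on its own classpath, since that is where the method locates each
class's containing jar; whether guava is picked up automatically in 0.89
would need to be verified.)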


-----Original Message-----
From: Ryan Rawson [mailto:ryanobjc@gmail.com]
Sent: Monday, September 20, 2010 1:50 PM
To: Taylor, Ronald C
Cc: hbase-user@hadoop.apache.org; user@hbase.apache.org; buttler1@llnl.gov; Ronald Taylor; Witteveen, Tim
Subject: Re: Guava*.jar use - hadoop-hbase communications failure - the hbase*.jar classes apparently not being found by Hadoop

Hey,

yes, the symlink is a pretty good way to be able to inplace upgrade easily.  But still, normally those other jars are in another subdir so their full path should be:
/home/hbase/hbase/lib/log4j-1.2.16.jar

the hbase scripts rely on those paths to build the classpath, so dont rearrange the dir layout too much.

As for the pastebin you will need to send us your direct link, since so many people post and there isnt really good searching systems, its generally preferred to send the direct link to your pastebin.  If you ever interact with us on IRC this is also how we get big dumps done as well.

Thanks!
-ryan

On Mon, Sep 20, 2010 at 1:38 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
> Ryan,
>
> The hbase*.jar is in the root hbase directory (at /home/hbase/hbase). Now, that is symbolic link on all the nodes (as you can see below), but that should not matter, right?
>
> lor@h01 hbase]$ pwd
> /home/hbase
> [rtaylor@h01 hbase]$ ls -l
> lrwxrwxrwx  1 root  hadoop    19 Aug 26 08:47 hbase ->
> hbase-0.89.20100726 drwxr-xr-x  9 hbase hadoop  4096 Sep 18 22:54
> hbase-0.89.20100726
> [rtaylor@h01 hbase]$
>
>
> Anyhoo, I just put the hadoop-env.sh file on pastebin.com. Please take a look. I posted it under the title:
>
> "Ronald Taylor / hadoop-env.sh file - HBase-Hadoop hbase*.jar problem"
>
> This is the first time I've used pastebin.com, so hopefully I uploaded properly. Please let me know if not.
>
> I don't think I misspelled anything on the HADOOP_CLASSPATH line (I just verified file existence based on those spellings, see below "ls" listings), but very happy to have an expert take a look.
>  Ron
>
>
> export HADOOP_CLASSPATH=/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.20100726.jar:/home/hbase/hbase/log4j-1.2.16.jar:/home/hbase/hbase/zookeeper-3.3.1.jar
>
>
> [rtaylor@h01 conf]$ ls /home/hbase/hbase/conf
> hadoop-metrics.properties  hbase-default_with_RT_mods.xml
> hbase-env.sh    hbase-site.xml.psuedo-distributed.template
> regionservers hbase-default_ORIG.xml     hbase-default.xml
> hbase-site.xml  log4j.properties                            tohtml.xsl
>
> [rtaylor@h01 conf]$ ls /home/hbase/hbase/hbase-0.89.20100726.jar
> /home/hbase/hbase/hbase-0.89.20100726.jar
> [rtaylor@h01 conf]$
>
> [rtaylor@h01 conf]$ ls /home/hbase/hbase/log4j-1.2.16.jar
> /home/hbase/hbase/log4j-1.2.16.jar
> [rtaylor@h01 conf]$
>
> [rtaylor@h01 conf]$ ls /home/hbase/hbase/zookeeper-3.3.1.jar
> /home/hbase/hbase/zookeeper-3.3.1.jar
> [rtaylor@h01 conf]$
>
>
>
> -----Original Message-----
> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> Sent: Monday, September 20, 2010 1:17 PM
> To: Taylor, Ronald C
> Cc: user@hbase.apache.org; hbase-user@hadoop.apache.org;
> buttler1@llnl.gov; Ronald Taylor; Witteveen, Tim
> Subject: Re: Guava*.jar use - hadoop-hbase communications failure -
> the hbase*.jar classes apparently not being found by Hadoop
>
> Hey,
>
> If you could, perhaps you could paste up your hadoop-env.sh on pastebin.com?  That would help... sometimes I have made errors in the bash shell trickery, and it probably would help to get more eyes checking it out.
>
> Normally in the stock hbase distro the Hbase JAR is in the root hbase dir, and the other jars in the lib/ sub directory, am I correct to assume you've moved the jars around a bit?
>
> Good luck,
> -ryan
>
> On Mon, Sep 20, 2010 at 1:14 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>>
>> Hello Ryan, Dave, other developers,
>>
>> Have not fixed the problem. Here's where things stand:
>>
>> 1) As Ryan suggested, we have checked all the nodes to make sure that we copied over the hadoop-env.sh file with the HADOOP_CLASSPATH setting, set like so:
>>
>> export
>> HADOOP_CLASSPATH=/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.
>> 2
>> 0100726.jar:
>> /home/hbase/hbase/log4j-1.2.16.jar:/home/hbase/hbase/zookeeper-3.3.1.
>> j
>> ar
>>
>> Answer: yep, that was OK, the files are there. We also restarted Hadoop and Hbase again. No change - program still fails on not finding  the TableOutputFormat class.
>>
>> 2) Following Dave's advice of avoiding the problem by not using TableOutputFormat (by skipping the Reducer stage), I tried a variant of that. I kept the Reducer stage in, but changed it to output to a file, instead of an Hbase table.
>>
>> That did not work either. I tried running the new program from the hadoop acct and now get a msg (from the Mapper stage, I believe) saying that the hbase.mapreduce.TableMapper class cannot be found. So - it is not just TableOutputFormat class - it is all the classes in the hbase*.jar file that are not being found.
>>
>> Does this have anything to do with the guava*.jar file that Ryan mentioned, which (as far as I can tell) we don't have installed?
>>
>> Obviously, we need more help.
>>
>> In the meantime, as a stop-gap, I'm planning on writing our analysis programs this way:
>>
>> 1) extract data from the source Hbase table and store it in an HDFS file, with all the data needed for analysis contained independently on each row - this task to be done by a non-MapReduce class that can access Hbase tables (a sketch of this step follows just below)
>>
>> 2) call a MapReduce class that will process the file in parallel and return a new file (well, a directory of files which I'll combine into one) as output
>>
>> 3) write the contents of the new results file back into an Hbase table using another non-MapReduce class
>>
>> I presume this will work, but again, obviously, it's not optimal and we need to resolve this issue so MapReduce classes can access Hbase tables directly on our cluster.
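
For concreteness, step 1 of that plan - reading the source table with the plain client API and writing one self-contained line per row to HDFS, no MapReduce involved - might look roughly like the sketch below. Everything here is illustrative only: the table, family, qualifier, and output path are placeholders borrowed from the test program quoted further down in the thread, not a verified recipe.

import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ProteinTableExporter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "proteinTable");            // source table (placeholder)
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("proteinFields"), Bytes.toBytes("Protein_Ref_ID"));

        FileSystem fs = FileSystem.get(conf);
        Path out = new Path("/user/rtaylor/protein_export.txt");    // HDFS output path (placeholder)
        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(fs.create(out, true)));

        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result row : scanner) {
                byte[] val = row.getValue(Bytes.toBytes("proteinFields"), Bytes.toBytes("Protein_Ref_ID"));
                // one self-contained line per row: rowkey <TAB> value
                writer.write(Bytes.toString(row.getRow()) + "\t" + (val == null ? "" : Bytes.toString(val)));
                writer.newLine();
            }
        } finally {
            scanner.close();
            writer.close();
        }
    }
}

Step 3 would be the mirror image: read the results file line by line and issue Put calls against the target table with the same client API.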
>>
>> Does anybody have any advice?
>>  Cheers,
>>   Ron
>>
>> ___________________________________________
>> Ronald Taylor, Ph.D.
>> Computational Biology & Bioinformatics Group Pacific Northwest
>> National Laboratory
>> 902 Battelle Boulevard
>> P.O. Box 999, Mail Stop J4-33
>> Richland, WA  99352 USA
>> Office:  509-372-6568
>> Email: ronald.taylor@pnl.gov
>>
>>
>> -----Original Message-----
>> From: Buttler, David [mailto:buttler1@llnl.gov]
>> Sent: Monday, September 20, 2010 10:17 AM
>> To: user@hbase.apache.org; 'hbase-user@hadoop.apache.org'
>> Subject: RE: hadoop-hbase failure - could use some help, a class is
>> apparently not being found by Hadoop
>>
>> I find it is often faster to skip the reduce phase when updating rows in hbase.  (A trick I picked up from Ryan) Essentially, you read a row from hbase, do your processing, and write the row back to hbase.
>> The only time you would want to do the reduce phase is if there is some aggregation that you need, or if there is some output you want to skip (e.g. you have a zipfian distribution and you want to ignore the low count occurrences).
>>
>> Dave
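
To make the map-only approach Dave describes concrete, here is a rough sketch in the style of the 0.89-era API used elsewhere in this thread. Each row is processed in map() and written straight back through a plain HTable, so there is no reducer and TableOutputFormat is not involved at all. The table and column names are placeholders, and this is a sketch of the pattern rather than a tested program.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class MapOnlyUpdate {

    static class UpdateMapper extends TableMapper<NullWritable, NullWritable> {

        private HTable table;

        @Override
        protected void setup(Context context) throws IOException {
            // open our own handle on the target table (placeholder name);
            // hbase-site.xml still has to be visible on the task classpath
            table = new HTable(HBaseConfiguration.create(), "proteinTable");
        }

        @Override
        public void map(ImmutableBytesWritable row, Result values, Context context)
            throws IOException {
            // placeholder per-row processing: mark the row, then write it straight back
            Put put = new Put(row.get());
            put.add(Bytes.toBytes("resultFields"), Bytes.toBytes("processed"), Bytes.toBytes(1));
            table.put(put);
        }

        @Override
        protected void cleanup(Context context) throws IOException {
            table.flushCommits();
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "MapOnlyUpdate");
        job.setJarByClass(MapOnlyUpdate.class);
        TableMapReduceUtil.initTableMapperJob("proteinTable", new Scan(),
            UpdateMapper.class, NullWritable.class, NullWritable.class, job);
        job.setOutputFormatClass(NullOutputFormat.class);  // nothing goes through the normal output path
        job.setNumReduceTasks(0);                          // map-only, as Dave suggests
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}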
>>
>> -----Original Message-----
>> From: Taylor, Ronald C
>> Sent: Sunday, September 19, 2010 9:59 PM
>> To: 'Ryan Rawson'; user@hbase.apache.org;
>> hbase-user@hadoop.apache.org
>> Cc: Taylor, Ronald C; 'Ronald Taylor'; Witteveen, Tim
>> Subject: RE: Guava*.jar use - hadoop-hbase failure - could use some
>> help, a class is apparently not being found by Hadoop
>>
>>
>> Ryan,
>>
>> Thanks for the quick feedback. I will check the other nodes on the cluster to see if they have been properly updated.
>>
>> However, I am now really confused as to the use of the guava*.jar file that you talk about. This is the first time I've heard about this. I presume we are talking about a jar file packaging the Guava libraries from Google?
>>
>> I cannot find this guava*.jar in either the /home/hadoop/hadoop directory or in the /home/hbase/hbase directories, where the Hadoop and Hbase installs place the other *.jar files. I'm afraid that I don't even know where we should have downloaded it. Does it come with Hbase, or with Hadoop? Where should it have been placed, after installation? Should I now download it - since we appear to be missing it - from here?
>>  http://code.google.com/p/guava-libraries/downloads/list
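
One quick check before downloading anything: if the HBase release bundles Guava at all, it would normally sit under the HBase lib/ directory, so something like

  ls /home/hbase/hbase/lib/ | grep -i guava

should show whether it is already there.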
>>
>> I Googled and found issue HBASE-2714 (Remove Guava as a client
>> dependency, June 11 2010) here
>>
>> http://www.mail-archive.com/issues@hbase.apache.org/msg00950.html
>> (see below, where I've included the text)
>>
>> which appears to say that Hbase (at least *some* release of Hbase -
>> does this include 0.89?) has a dependency on Guava, in order to run a
>> MapReduce job over Hbase. But nothing on Guava is mentioned at
>>
>>
>> http://hbase.apache.org/docs/r0.20.4/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath
>>
>> (I cannot find anything in the Hbase 0.89 online documents on Guava
>> or in how to set CLASSPATH or in what *.jar files to include so I can
>> use MapReduce with Hbase; the best guidance I can find is in this
>> earlier
>> document.)
>>
>> So - I could really use further clarification in regard to Guava as to what I should be doing to set up Hbase-MapReduce work.
>>
>>  Regards,
>>   Ron
>>
>> %%%%%%%%%%%%%%%%%%%%%%%%
>>
>> From
>>
>> http://www.mail-archive.com/issues@hbase.apache.org/msg00950.html
>>
>>
>> Todd Lipcon commented on HBASE-2714:
>> ------------------------------------
>>
>> Why not?
>>
>> In theory, the new TableMapReduceUtil.addDependencyJars should take care of shipping it in the distributedcache. Apparently it's not working?
>>
>> ryan rawson commented on HBASE-2714:
>> ------------------------------------
>>
>> not everyone uses that mechanism to run map reduce jobs on hbase.  The standard for a long time was to add hbase.jar and zookeeper-3.2.2.jar to the hadoop classpath, thus not requiring every job include the hbase jars.
>>
>> Todd Lipcon commented on HBASE-2714:
>> ------------------------------------
>>
>> Does this mean in general that we can't add more dependencies to the
>> hbase client? I think instead we should make it easier to run hbase
>> MR jobs *without* touching the Hadoop config (eg right now you have
>> to restart MR to upgrade hbase, that's not going to fly for a lot of
>> clusters)
>>
>> stack commented on HBASE-2714:
>> ------------------------------
>>
>> So, we need to change our recommendations here:
>> http://hbase.apache.org/docs/r0.20.4/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath?
>>
>>
>>> Remove Guava as a client dependency
>>> -----------------------------------
>>>
>>>                 Key: HBASE-2714
>>>                 URL:
>>> https://issues.apache.org/jira/browse/HBASE-2714
>>>             Project: HBase
>>>          Issue Type: Improvement
>>>          Components: client
>>>            Reporter: Jeff Hammerbacher
>>>
>>> We shouldn't need Guava on the classpath to run a MapReduce job over HBase.
>>
>>
>> %%%%%%%%%%%%%%%%%%%%%%%%
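
For reference, the TableMapReduceUtil.addDependencyJars call that Todd mentions above is made from the job driver once the job is otherwise configured. In the test program quoted further down in this thread it would amount to one extra line before the job is submitted - a sketch only, and whether it actually ships everything needed on this particular release is exactly what the JIRA comments are debating:

    TableMapReduceUtil.initTableReducerJob("testTable", ProteinReducer1.class, job);
    // per Todd's comment, this should ship the HBase-side jars (Guava included) with the job via the distributed cache
    TableMapReduceUtil.addDependencyJars(job);
    System.exit(job.waitForCompletion(true) ? 0 : 1);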
>>
>>
>>
>> -----Original Message-----
>> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
>> Sent: Sunday, September 19, 2010 12:45 AM
>> To: user@hbase.apache.org
>> Cc: hbase-user@hadoop.apache.org; Taylor, Ronald C
>> Subject: Re: hadoop-hbase failure - could use some help, a class is
>> apparently not being found by Hadoop
>>
>> hey,
>>
>> looks like you've done all the right things... you might want to double check that all the 'slave' machines have the updated hadoop-env.sh and that the path referenced therein is present _on all the machines_.
>>
>> You also need to include the guava*.jar as well.  the log4j is already included by mapred by default, so no need there.
>>
>> -ryan
>>
>>
>>
>> On Fri, Sep 17, 2010 at 4:19 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>>>
>>> Hi folks,
>>>
>>> Got a problem in basic Hadoop-Hbase communication. My small test
>>> program ProteinCounter1.java - shown in full below - reports out
>>> this error
>>>
>>>   java.lang.RuntimeException: java.lang.ClassNotFoundException:
>>> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>>        at
>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809
>>> )
>>>
>>> The full invocation and error msgs are shown at bottom.
>>>
>>> We are using Hadoop 0.20.2 with HBase 0.89.20100726 on a 24-node cluster. Hadoop and Hbase each appear to work fine separately. That is, I've created programs that run MapReduce on files, and programs that import data into Hbase tables and manipulate them. Both types of programs have gone quite smoothly.
>>>
>>> Now I want to combine the two - use MapReduce programs on data drawn from an Hbase table, with results placed back into an Hbase table.
>>>
>>> But my test program for such, as you see from the error msg, is not
>>> working. Apparently the
>>>   org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>>  class is not found.
>>>
>>> However, I have added these paths, including the relevant Hbase *.jar, to HADOOP_CLASSPATH, so the missing class should have been found, as you can see:
>>>
>>>  export HADOOP_CLASSPATH=/home/hbase/hbase/conf:
>>> /home/hbase/hbase/hbase-0.89.20100726.jar:
>>> /home/rtaylor/HadoopWork/log4j-1.2.16.jar:
>>> /home/rtaylor/HadoopWork/zookeeper-3.3.1.jar
>>>
>>>  This change was made in the ../hadoop/conf/hadoop-env.sh file.
>>>
>>> I checked the manifest of /home/hbase/hbase/hbase-0.89.20100726.jar
>>> and
>>>    org/apache/hadoop/hbase/mapreduce/TableOutputFormat.class
>>>  is indeed present in that Hbase *.jar file.
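
For anyone repeating that manifest check from the shell, something along these lines does it:

   jar tf /home/hbase/hbase/hbase-0.89.20100726.jar | grep TableOutputFormat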
>>>
>>> Also, I have restarted both Hbase and Hadoop after making this change.
>>>
>>> Don't understand why the TableOutputFormat class is not being found. Or is the error msg misleading, and something else is going wrong? I would very much appreciate any advice people have as to what is going wrong. Need to get this working very soon.
>>>
>>>   Regards,
>>>     Ron T.
>>>
>>> ___________________________________________
>>> Ronald Taylor, Ph.D.
>>> Computational Biology & Bioinformatics Group Pacific Northwest
>>> National Laboratory
>>> 902 Battelle Boulevard
>>> P.O. Box 999, Mail Stop J4-33
>>> Richland, WA  99352 USA
>>> Office:  509-372-6568
>>> Email: ronald.taylor@pnl.gov
>>>
>>>
>>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>> %
>>> %
>>> %%%%%%%%%%%%
>>>
>>> contents of the "ProteinCounter1.java" file:
>>>
>>>
>>>
>>> //  to compile
>>> // javac ProteinCounter1.java
>>> // jar cf ProteinCounterTest.jar  *.class
>>>
>>> // to run
>>> //   hadoop jar ProteinCounterTest.jar ProteinCounter1
>>>
>>>
>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>> import org.apache.hadoop.conf.Configuration;
>>> import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
>>> import org.apache.hadoop.mapreduce.Job; import
>>> org.apache.hadoop.io.IntWritable;
>>>
>>> import java.util.*;
>>> import java.io.*;
>>> import org.apache.hadoop.hbase.*;
>>> import org.apache.hadoop.hbase.client.*; import
>>> org.apache.hadoop.hbase.io.*; import org.apache.hadoop.hbase.util.*;
>>> import org.apache.hadoop.hbase.mapreduce.*;
>>>
>>>
>>> // %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>>
>>> /**
>>>  * counts the number of times each protein appears in the
>>> proteinTable
>>>  *
>>>  */
>>> public class ProteinCounter1 {
>>>
>>>
>>>    static class ProteinMapper1 extends
>>> TableMapper<ImmutableBytesWritable, IntWritable> {
>>>
>>>        private int numRecords = 0;
>>>        private static final IntWritable one = new IntWritable(1);
>>>
>>>        @Override
>>>            public void map(ImmutableBytesWritable row, Result
>>> values, Context context) throws IOException {
>>>
>>>            // retrieve the value of proteinID, which is the row key
>>> for each protein in the proteinTable
>>>            ImmutableBytesWritable proteinID_Key = new
>>> ImmutableBytesWritable(row.get());
>>>            try {
>>>                context.write(proteinID_Key, one);
>>>            } catch (InterruptedException e) {
>>>                throw new IOException(e);
>>>            }
>>>            numRecords++;
>>>            if ((numRecords % 100) == 0) {
>>>                context.setStatus("mapper processed " + numRecords + "
>>> proteinTable records so far");
>>>            }
>>>        }
>>>    }
>>>
>>>    public static class ProteinReducer1 extends
>>> TableReducer<ImmutableBytesWritable,
>>>                                               IntWritable,
>>> ImmutableBytesWritable> {
>>>
>>>        public void reduce(ImmutableBytesWritable proteinID_key,
>>> Iterable<IntWritable> values,
>>>                            Context context)
>>>            throws IOException, InterruptedException {
>>>            int sum = 0;
>>>            for (IntWritable val : values) {
>>>                sum += val.get();
>>>            }
>>>
>>>            Put put = new Put(proteinID_key.get());
>>>            put.add(Bytes.toBytes("resultFields"),
>>> Bytes.toBytes("total"), Bytes.toBytes(sum));
>>>            System.out.println(String.format("stats : proteinID_key :
>>> %d, count : %d",
>>>
>>> Bytes.toInt(proteinID_key.get()), sum));
>>>            context.write(proteinID_key, put);
>>>        }
>>>    }
>>>
>>>    public static void main(String[] args) throws Exception {
>>>
>>>        org.apache.hadoop.conf.Configuration conf;
>>>           conf =
>>> org.apache.hadoop.hbase.HBaseConfiguration.create();
>>>
>>>        Job job = new Job(conf, "HBaseTest_Using_ProteinCounter");
>>>        job.setJarByClass(ProteinCounter1.class);
>>>
>>>        org.apache.hadoop.hbase.client.Scan scan = new Scan();
>>>
>>>        String colFamilyToUse = "proteinFields";
>>>        String fieldToUse = "Protein_Ref_ID";
>>>
>>>        // retrieve this one column from the specified family
>>>        scan.addColumn(Bytes.toBytes(colFamilyToUse),
>>> Bytes.toBytes(fieldToUse));
>>>
>>>           org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter
>>> filterToUse =
>>>                 new
>>> org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter();
>>>        scan.setFilter(filterToUse);
>>>
>>>        TableMapReduceUtil.initTableMapperJob("proteinTable", scan,
>>> ProteinMapper1.class,
>>>                              ImmutableBytesWritable.class,
>>>                                              IntWritable.class,
>>> job);
>>>        TableMapReduceUtil.initTableReducerJob("testTable",
>>> ProteinReducer1.class, job);
>>>        System.exit(job.waitForCompletion(true) ? 0 : 1);
>>>    }
>>> }
>>>
>>>
>>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>>

Re: Guava*.jar use - hadoop-hbase communications failure - the hbase*.jar classes apparently not being found by Hadoop

Posted by Ryan Rawson <ry...@gmail.com>.
Hey,

yes, the symlink is a pretty good way to be able to upgrade in place
easily.  But still, normally those other jars are in another subdir, so
their full path should be:
/home/hbase/hbase/lib/log4j-1.2.16.jar

the hbase scripts rely on those paths to build the classpath, so don't
rearrange the dir layout too much.

As for the pastebin, you will need to send us your direct link - since
so many people post and there isn't really a good search system, it's
generally preferred to send the direct link to your paste.  If you ever
interact with us on IRC, this is also how we get big dumps done as well.

Thanks!
-ryan

On Mon, Sep 20, 2010 at 1:38 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
> Ryan,
>
> The hbase*.jar is in the root hbase directory (at /home/hbase/hbase). Now, that is a symbolic link on all the nodes (as you can see below), but that should not matter, right?
>
> lor@h01 hbase]$ pwd
> /home/hbase
> [rtaylor@h01 hbase]$ ls -l
> lrwxrwxrwx  1 root  hadoop    19 Aug 26 08:47 hbase -> hbase-0.89.20100726
> drwxr-xr-x  9 hbase hadoop  4096 Sep 18 22:54 hbase-0.89.20100726
> [rtaylor@h01 hbase]$
>
>
> Anyhoo, I just put the hadoop-env.sh file on pastebin.com. Please take a look. I posted it under the title:
>
> "Ronald Taylor / hadoop-env.sh file - HBase-Hadoop hbase*.jar problem"
>
> This is the first time I've used pastebin.com, so hopefully I uploaded properly. Please let me know if not.
>
> I don't think I misspelled anything on the HADOOP_CLASSPATH line (I just verified file existence based on those spellings, see the "ls" listings below), but I'm very happy to have an expert take a look.
>  Ron
>
>
> export HADOOP_CLASSPATH=/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.20100726.jar:/home/hbase/hbase/log4j-1.2.16.jar:/home/hbase/hbase/zookeeper-3.3.1.jar
>
>
> [rtaylor@h01 conf]$ ls /home/hbase/hbase/conf
> hadoop-metrics.properties  hbase-default_with_RT_mods.xml  hbase-env.sh    hbase-site.xml.psuedo-distributed.template  regionservers
> hbase-default_ORIG.xml     hbase-default.xml               hbase-site.xml  log4j.properties                            tohtml.xsl
>
> [rtaylor@h01 conf]$ ls /home/hbase/hbase/hbase-0.89.20100726.jar
> /home/hbase/hbase/hbase-0.89.20100726.jar
> [rtaylor@h01 conf]$
>
> [rtaylor@h01 conf]$ ls /home/hbase/hbase/log4j-1.2.16.jar
> /home/hbase/hbase/log4j-1.2.16.jar
> [rtaylor@h01 conf]$
>
> [rtaylor@h01 conf]$ ls /home/hbase/hbase/zookeeper-3.3.1.jar
> /home/hbase/hbase/zookeeper-3.3.1.jar
> [rtaylor@h01 conf]$
>
>
>
> -----Original Message-----
> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> Sent: Monday, September 20, 2010 1:17 PM
> To: Taylor, Ronald C
> Cc: user@hbase.apache.org; hbase-user@hadoop.apache.org; buttler1@llnl.gov; Ronald Taylor; Witteveen, Tim
> Subject: Re: Guava*.jar use - hadoop-hbase communications failure - the hbase*.jar classes apparently not being found by Hadoop
>
> Hey,
>
> If you could, perhaps you could paste up your hadoop-env.sh on pastebin.com?  That would help... sometimes I have made errors in the bash shell trickery, and it probably would help to get more eyes checking it out.
>
> Normally in the stock hbase distro the Hbase JAR is in the root hbase dir, and the other jars in the lib/ sub directory, am I correct to assume you've moved the jars around a bit?
>
> Good luck,
> -ryan
>
> On Mon, Sep 20, 2010 at 1:14 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>>
>> Hello Ryan, Dave, other developers,
>>
>> Have not fixed the problem. Here's where things stand:
>>
>> 1) As Ryan suggested, we have checked all the nodes to make sure that we copied over the hadoop-env.sh file with the HADOOP_CLASSPATH setting, set like so:
>>
>> export
>> HADOOP_CLASSPATH=/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.2
>> 0100726.jar:
>> /home/hbase/hbase/log4j-1.2.16.jar:/home/hbase/hbase/zookeeper-3.3.1.j
>> ar
>>
>> Answer: yep, that was OK, the files are there. We also restarted Hadoop and Hbase again. No change - program still fails on not finding  the TableOutputFormat class.
>>
>> 2) Following Dave's advice of avoiding the problem by not using TableOutputFormat (by skipping the Reducer stage), I tried a variant of that. I kept the Reducer stage in, but changed it to output to a file, instead of an Hbase table.
>>
>> That did not work either. I tried running the new program from the hadoop acct and now get a msg (from the Mapper stage, I believe) saying that the hbase.mapreduce.TableMapper class cannot be found. So - it is not just TableOutputFormat class - it is all the classes in the hbase*.jar file that are not being found.
>>
>> Does this have anything to do with the guava*.jar file that Ryan mentioned, which (as far as I can tell) we don't have installed?
>>
>> Obviously, we need more help.
>>
>> In the meantime, as a stop-gap, I'm planning on writing our analysis programs this way:
>>
>> 1) extract data from the source Hbase table and store in an HDFS file,
>> all data needed for analysis contained independently on each row -
>> this task to be done by a non-MapReduce class that can access Hbase
>> tables
>>
>> 2) call a MapReduce class that will process the file in parallel and
>> return a new file (well, a directory of files which I'll combine into
>> one) as output
>>
>> 3) write the contents of the new results file back into an Hbase table
>> using another non-MapReduce class
>>
>> I presume this will work, but again, obviously, it's not optimal and we need to resolve this issue so MapReduce classes can access Hbase tables directly on our cluster.
>>
>> Does anybody have any advice?
>>  Cheers,
>>   Ron
>>
>> ___________________________________________
>> Ronald Taylor, Ph.D.
>> Computational Biology & Bioinformatics Group Pacific Northwest
>> National Laboratory
>> 902 Battelle Boulevard
>> P.O. Box 999, Mail Stop J4-33
>> Richland, WA  99352 USA
>> Office:  509-372-6568
>> Email: ronald.taylor@pnl.gov
>>
>>
>> -----Original Message-----
>> From: Buttler, David [mailto:buttler1@llnl.gov]
>> Sent: Monday, September 20, 2010 10:17 AM
>> To: user@hbase.apache.org; 'hbase-user@hadoop.apache.org'
>> Subject: RE: hadoop-hbase failure - could use some help, a class is
>> apparently not being found by Hadoop
>>
>> I find it is often faster to skip the reduce phase when updating rows in hbase.  (A trick I picked up from Ryan) Essentially, you read a row from hbase, do your processing, and write the row back to hbase.
>> The only time you would want to do the reduce phase is if there is some aggregation that you need, or if there is some output you want to skip (e.g. you have a zipfian distribution and you want to ignore the low count occurrences).
>>
>> Dave
>>
>> -----Original Message-----
>> From: Taylor, Ronald C
>> Sent: Sunday, September 19, 2010 9:59 PM
>> To: 'Ryan Rawson'; user@hbase.apache.org; hbase-user@hadoop.apache.org
>> Cc: Taylor, Ronald C; 'Ronald Taylor'; Witteveen, Tim
>> Subject: RE: Guava*.jar use - hadoop-hbase failure - could use some
>> help, a class is apparently not being found by Hadoop
>>
>>
>> Ryan,
>>
>> Thanks for the quick feedback. I will check the other nodes on the cluster to see if they have been properly updated.
>>
>> However, I am now really confused as to the use of the guava*.jar file that you talk about. This is the first time I've heard about this. I presume we are talking about a jar file packaging the Guava libraries from Google?
>>
>> I cannot find this guava*.jar in either the /home/hadoop/hadoop directory or in the /home/hbase/hbase directories, where the Hadoop and Hbase installs place the other *.jar files. I'm afraid that I don't even know where we should have downloaded it. Does it come with Hbase, or with Hadoop? Where should it have been placed, after installation? Should I now download it - since we appear to be missing it - from here?
>>  http://code.google.com/p/guava-libraries/downloads/list
>>
>> I Googled and found issue HBASE-2714 (Remove Guava as a client
>> dependency, June 11 2010) here
>>
>> http://www.mail-archive.com/issues@hbase.apache.org/msg00950.html
>> (see below, where I've included the text)
>>
>> which appears to say that Hbase (at least *some* release of Hbase -
>> does this include 0.89?) has a dependency on Guava, in order to run a
>> MapReduce job over Hbase. But nothing on Guava is mentioned at
>>
>>
>> http://hbase.apache.org/docs/r0.20.4/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath
>>
>> (I cannot find anything in the Hbase 0.89 online documents on Guava or
>> in how to set CLASSPATH or in what *.jar files to include so I can use
>> MapReduce with Hbase; the best guidance I can find is in this earlier
>> document.)
>>
>> So - I could really use further clarification in regard to Guava as to what I should be doing to set up Hbase-MapReduce work.
>>
>>  Regards,
>>   Ron
>>
>> %%%%%%%%%%%%%%%%%%%%%%%%
>>
>> From
>>
>> http://www.mail-archive.com/issues@hbase.apache.org/msg00950.html
>>
>>
>> Todd Lipcon commented on HBASE-2714:
>> ------------------------------------
>>
>> Why not?
>>
>> In theory, the new TableMapReduceUtil.addDependencyJars should take care of shipping it in the distributedcache. Apparently it's not working?
>>
>> ryan rawson commented on HBASE-2714:
>> ------------------------------------
>>
>> not everyone uses that mechanism to run map reduce jobs on hbase.  The standard for a long time was to add hbase.jar and zookeeper-3.2.2.jar to the hadoop classpath, thus not requiring every job include the hbase jars.
>>
>> Todd Lipcon commented on HBASE-2714:
>> ------------------------------------
>>
>> Does this mean in general that we can't add more dependencies to the
>> hbase client? I think instead we should make it easier to run hbase MR
>> jobs *without* touching the Hadoop config (eg right now you have to
>> restart MR to upgrade hbase, that's not going to fly for a lot of
>> clusters)
>>
>> stack commented on HBASE-2714:
>> ------------------------------
>>
>> So, we need to change our recommendations here:
>> http://hbase.apache.org/docs/r0.20.4/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath?
>>
>>
>>> Remove Guava as a client dependency
>>> -----------------------------------
>>>
>>>                 Key: HBASE-2714
>>>                 URL: https://issues.apache.org/jira/browse/HBASE-2714
>>>             Project: HBase
>>>          Issue Type: Improvement
>>>          Components: client
>>>            Reporter: Jeff Hammerbacher
>>>
>>> We shouldn't need Guava on the classpath to run a MapReduce job over HBase.
>>
>>
>> %%%%%%%%%%%%%%%%%%%%%%%%
>>
>>
>>
>> -----Original Message-----
>> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
>> Sent: Sunday, September 19, 2010 12:45 AM
>> To: user@hbase.apache.org
>> Cc: hbase-user@hadoop.apache.org; Taylor, Ronald C
>> Subject: Re: hadoop-hbase failure - could use some help, a class is
>> apparently not being found by Hadoop
>>
>> hey,
>>
>> looks like you've done all the right things... you might want to double check that all the 'slave' machines have the updated hadoop-env.sh and that the path referenced therein is present _on all the machines_.
>>
>> You also need to include the guava*.jar as well.  the log4j is already included by mapred by default, so no need there.
>>
>> -ryan
>>
>>
>>
>> On Fri, Sep 17, 2010 at 4:19 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>>>
>>> Hi folks,
>>>
>>> Got a problem in basic Hadoop-Hbase communication. My small test
>>> program ProteinCounter1.java - shown in full below - reports out this
>>> error
>>>
>>>   java.lang.RuntimeException: java.lang.ClassNotFoundException:
>>> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>>        at
>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
>>>
>>> The full invocation and error msgs are shown at bottom.
>>>
>>> We are using Hadoop 0.20.2 with HBase 0.89.20100726 on a 24-node cluster. Hadoop and Hbase each appear to work fine separately. That is, I've created programs that run MapReduce on files, and programs that import data into Hbase tables and manipulate them. Both types of programs have gone quite smoothly.
>>>
>>> Now I want to combine the two - use MapReduce programs on data drawn from an Hbase table, with results placed back into an Hbase table.
>>>
>>> But my test program for such, as you see from the error msg, is not
>>> working. Apparently the
>>>   org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>>  class is not found.
>>>
>>> However, I have added these paths, including the relevant Hbase *.jar, to HADOOP_CLASSPATH, so the missing class should have been found, as you can see:
>>>
>>>  export HADOOP_CLASSPATH=/home/hbase/hbase/conf:
>>> /home/hbase/hbase/hbase-0.89.20100726.jar:
>>> /home/rtaylor/HadoopWork/log4j-1.2.16.jar:
>>> /home/rtaylor/HadoopWork/zookeeper-3.3.1.jar
>>>
>>>  This change was made in the ../hadoop/conf/hadoop-env.sh file.
>>>
>>> I checked the manifest of /home/hbase/hbase/hbase-0.89.20100726.jar
>>> and
>>>    org/apache/hadoop/hbase/mapreduce/TableOutputFormat.class
>>>  is indeed present in that Hbase *.jar file.
>>>
>>> Also, I have restarted both Hbase and Hadoop after making this change.
>>>
>>> Don't understand why the TableOutputFormat class is not being found. Or is the error msg misleading, and something else is going wrong? I would very much appreciate any advice people have as to what is going wrong. Need to get this working very soon.
>>>
>>>   Regards,
>>>     Ron T.
>>>
>>> ___________________________________________
>>> Ronald Taylor, Ph.D.
>>> Computational Biology & Bioinformatics Group Pacific Northwest
>>> National Laboratory
>>> 902 Battelle Boulevard
>>> P.O. Box 999, Mail Stop J4-33
>>> Richland, WA  99352 USA
>>> Office:  509-372-6568
>>> Email: ronald.taylor@pnl.gov
>>>
>>>
>>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>> %
>>> %%%%%%%%%%%%
>>>
>>> contents of the "ProteinCounter1.java" file:
>>>
>>>
>>>
>>> //  to compile
>>> // javac ProteinCounter1.java
>>> // jar cf ProteinCounterTest.jar  *.class
>>>
>>> // to run
>>> //   hadoop jar ProteinCounterTest.jar ProteinCounter1
>>>
>>>
>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>> import org.apache.hadoop.conf.Configuration;
>>> import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
>>> import org.apache.hadoop.mapreduce.Job; import
>>> org.apache.hadoop.io.IntWritable;
>>>
>>> import java.util.*;
>>> import java.io.*;
>>> import org.apache.hadoop.hbase.*;
>>> import org.apache.hadoop.hbase.client.*; import
>>> org.apache.hadoop.hbase.io.*; import org.apache.hadoop.hbase.util.*;
>>> import org.apache.hadoop.hbase.mapreduce.*;
>>>
>>>
>>> // %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>>
>>> /**
>>>  * counts the number of times each protein appears in the
>>> proteinTable
>>>  *
>>>  */
>>> public class ProteinCounter1 {
>>>
>>>
>>>    static class ProteinMapper1 extends
>>> TableMapper<ImmutableBytesWritable, IntWritable> {
>>>
>>>        private int numRecords = 0;
>>>        private static final IntWritable one = new IntWritable(1);
>>>
>>>        @Override
>>>            public void map(ImmutableBytesWritable row, Result values,
>>> Context context) throws IOException {
>>>
>>>            // retrieve the value of proteinID, which is the row key
>>> for each protein in the proteinTable
>>>            ImmutableBytesWritable proteinID_Key = new
>>> ImmutableBytesWritable(row.get());
>>>            try {
>>>                context.write(proteinID_Key, one);
>>>            } catch (InterruptedException e) {
>>>                throw new IOException(e);
>>>            }
>>>            numRecords++;
>>>            if ((numRecords % 100) == 0) {
>>>                context.setStatus("mapper processed " + numRecords + "
>>> proteinTable records so far");
>>>            }
>>>        }
>>>    }
>>>
>>>    public static class ProteinReducer1 extends
>>> TableReducer<ImmutableBytesWritable,
>>>                                               IntWritable,
>>> ImmutableBytesWritable> {
>>>
>>>        public void reduce(ImmutableBytesWritable proteinID_key,
>>> Iterable<IntWritable> values,
>>>                            Context context)
>>>            throws IOException, InterruptedException {
>>>            int sum = 0;
>>>            for (IntWritable val : values) {
>>>                sum += val.get();
>>>            }
>>>
>>>            Put put = new Put(proteinID_key.get());
>>>            put.add(Bytes.toBytes("resultFields"),
>>> Bytes.toBytes("total"), Bytes.toBytes(sum));
>>>            System.out.println(String.format("stats : proteinID_key :
>>> %d, count : %d",
>>>
>>> Bytes.toInt(proteinID_key.get()), sum));
>>>            context.write(proteinID_key, put);
>>>        }
>>>    }
>>>
>>>    public static void main(String[] args) throws Exception {
>>>
>>>        org.apache.hadoop.conf.Configuration conf;
>>>           conf = org.apache.hadoop.hbase.HBaseConfiguration.create();
>>>
>>>        Job job = new Job(conf, "HBaseTest_Using_ProteinCounter");
>>>        job.setJarByClass(ProteinCounter1.class);
>>>
>>>        org.apache.hadoop.hbase.client.Scan scan = new Scan();
>>>
>>>        String colFamilyToUse = "proteinFields";
>>>        String fieldToUse = "Protein_Ref_ID";
>>>
>>        // retrieve this one column from the specified family
>>>        scan.addColumn(Bytes.toBytes(colFamilyToUse),
>>> Bytes.toBytes(fieldToUse));
>>>
>>>           org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter
>>> filterToUse =
>>>                 new
>>> org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter();
>>>        scan.setFilter(filterToUse);
>>>
>>>        TableMapReduceUtil.initTableMapperJob("proteinTable", scan,
>>> ProteinMapper1.class,
>>>                              ImmutableBytesWritable.class,
>>>                                              IntWritable.class, job);
>>>        TableMapReduceUtil.initTableReducerJob("testTable",
>>> ProteinReducer1.class, job);
>>>        System.exit(job.waitForCompletion(true) ? 0 : 1);
>>>    }
>>> }
>>>
>>>
>>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>> %
>>> %%%%%%%
>>>
>>>
>>> session output:
>>>
>>> [rtaylor@h01 Hadoop]$ javac ProteinCounter1.java
>>>
>>> [rtaylor@h01 Hadoop]$ jar cf ProteinCounterTest.jar  *.class
>>>
>>> [rtaylor@h01 Hadoop]$ hadoop jar ProteinCounterTest.jar
>>> ProteinCounter1
>>>
>>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
>>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
>>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.system.dir;  Ignoring.
>>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.map.tasks.maximum;  Ignoring.
>>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.reduce.tasks.maximum;  Ignoring.
>>> 10/09/17 15:46:18 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
>>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
>>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.system.dir;  Ignoring.
>>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.map.tasks.maximum;  Ignoring.
>>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.reduce.tasks.maximum;  Ignoring.
>>> 10/09/17 15:46:18 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.name.dir;  Ignoring.
>>> 10/09/17 15:46:18 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.data.dir;  Ignoring.
>>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
>>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
>>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.system.dir;  Ignoring.
>>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.map.tasks.maximum;  Ignoring.
>>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.reduce.tasks.maximum;  Ignoring.
>>> 10/09/17 15:46:19 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.name.dir;  Ignoring.
>>> 10/09/17 15:46:19 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.data.dir;  Ignoring.
>>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
>>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
>>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.system.dir;  Ignoring.
>>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.map.tasks.maximum;  Ignoring.
>>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.reduce.tasks.maximum;  Ignoring.
>>> 10/09/17 15:46:19 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.name.dir;  Ignoring.
>>> 10/09/17 15:46:19 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.data.dir;  Ignoring.
>>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeperWrapper: Reconnecting to
>>> zookeeper
>>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>>> environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14
>>> GMT
>>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>>> environment:host.name=h01.emsl.pnl.gov
>>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>>> environment:java.version=1.6.0_21
>>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
>>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>>> environment:java.home=/usr/java/jdk1.6.0_21/jre
>>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>>> environment:java.class.path=/home/hadoop/hadoop/bin/../conf:/usr/java
>>> /
>>> default/lib/tools.jar:/home/hadoop/hadoop/bin/..:/home/hadoop/hadoop/
>>> b
>>> in/../hadoop-0.20.2-core.jar:/home/hadoop/hadoop/bin/../lib/commons-c
>>> l
>>> i-1.2.jar:/home/hadoop/hadoop/bin/../lib/commons-codec-1.3.jar:/home/
>>> h
>>> adoop/hadoop/bin/../lib/commons-el-1.0.jar:/home/hadoop/hadoop/bin/..
>>> /
>>> lib/commons-httpclient-3.0.1.jar:/home/hadoop/hadoop/bin/../lib/commo
>>> n
>>> s-logging-1.0.4.jar:/home/hadoop/hadoop/bin/../lib/commons-logging-ap
>>> i
>>> -1.0.4.jar:/home/hadoop/hadoop/bin/../lib/commons-net-1.4.1.jar:/home
>>> /
>>> hadoop/hadoop/bin/../lib/core-3.1.1.jar:/home/hadoop/hadoop/bin/../li
>>> b
>>> /hsqldb-1.8.0.10.jar:/home/hadoop/hadoop/bin/../lib/jasper-compiler-5.
>>> 5.12.jar:/home/hadoop/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/ho
>>> m
>>> e/hadoop/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/hadoop/hadoop/bin/..
>>> /lib/jetty-6.1.14.jar:/home/hadoop/hadoop/bin/../lib/jetty-util-6.1.1
>>> 4
>>> .jar:/home/hadoop/hadoop/bin/../lib/junit-3.8.1.jar:/home/hadoop/hado
>>> o
>>> p/bin/../lib/kfs-0.2.2.jar:/home/hadoop/hadoop/bin/../lib/log4j-1.2.1
>>> 5
>>> .jar:/home/hadoop/hadoop/bin/../lib/mockito-all-1.8.0.jar:/home/hadoo
>>> p
>>> /hadoop/bin/../lib/oro-2.0.8.jar:/home/hadoop/hadoop/bin/../lib/servl
>>> e
>>> t-api-2.5-6.1.14.jar:/home/hadoop/hadoop/bin/../lib/slf4j-api-1.4.3.j
>>> a
>>> r:/home/hadoop/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/hadoop
>>> /
>>> hadoop/bin/../lib/xmlenc-0.52.jar:/home/hadoop/hadoop/bin/../lib/jsp-
>>> 2
>>> .1/jsp-2.1.jar:/home/hadoop/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar:
>>> /home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.20100726.jar:/hom
>>> e
>>> /rtaylor/HadoopWork/log4j-1.2.16.jar:/home/rtaylor/HadoopWork/zookeep
>>> e
>>> r-3.3.1.jar
>>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>>> environment:java.library.path=/home/hadoop/hadoop/bin/../lib/native/L
>>> i
>>> nux-i386-32
>>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>>> environment:java.io.tmpdir=/tmp
>>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>>> environment:java.compiler=<NA>
>>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>>> environment:os.name=Linux
>>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>>> environment:os.arch=i386
>>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>>> environment:os.version=2.6.18-194.11.1.el5
>>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>>> environment:user.name=rtaylor
>>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>>> environment:user.home=/home/rtaylor
>>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>>> environment:user.dir=/home/rtaylor/HadoopWork/Hadoop
>>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Initiating client
>>> connection,
>>> connectString=h05:2182,h04:2182,h03:2182,h02:2182,h10:2182,h09:2182,h
>>> 0
>>> 8:2182,h07:2182,h06:2182 sessionTimeout=60000
>>> watcher=org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper@dcb03b
>>> 10/09/17 15:46:19 INFO zookeeper.ClientCnxn: Opening socket
>>> connection to server h04/192.168.200.24:2182
>>> 10/09/17 15:46:19 INFO zookeeper.ClientCnxn: Socket connection
>>> established to h04/192.168.200.24:2182, initiating session
>>> 10/09/17 15:46:19 INFO zookeeper.ClientCnxn: Session establishment
>>> complete on server h04/192.168.200.24:2182, sessionid =
>>> 0x22b21c04c330002, negotiated timeout = 60000
>>> 10/09/17 15:46:20 INFO mapred.JobClient: Running job:
>>> job_201009171510_0004
>>> 10/09/17 15:46:21 INFO mapred.JobClient:  map 0% reduce 0%
>>>
>>> 10/09/17 15:46:27 INFO mapred.JobClient: Task Id :
>>> attempt_201009171510_0004_m_000002_0, Status : FAILED
>>> java.lang.RuntimeException: java.lang.ClassNotFoundException:
>>> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>>        at
>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
>>>        at
>>> org.apache.hadoop.mapreduce.JobContext.getOutputFormatClass(JobContex
>>> t
>>> .java:193)
>>>        at org.apache.hadoop.mapred.Task.initialize(Task.java:413)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:288)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> Caused by: java.lang.ClassNotFoundException:
>>> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>>        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>        at java.security.AccessController.doPrivileged(Native Method)
>>>        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>>        at
>>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>>>        at java.lang.Class.forName0(Native Method)
>>>        at java.lang.Class.forName(Class.java:247)
>>>        at
>>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.jav
>>> a
>>> :762)
>>>        at
>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
>>>        ... 4 more
>>>
>>> 10/09/17 15:46:33 INFO mapred.JobClient: Task Id :
>>> attempt_201009171510_0004_r_000051_0, Status : FAILED
>>> java.lang.RuntimeException: java.lang.ClassNotFoundException:
>>> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>>        at
>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
>>>        at
>>> org.apache.hadoop.mapreduce.JobContext.getOutputFormatClass(JobContex
>>> t
>>> .java:193)
>>>        at org.apache.hadoop.mapred.Task.initialize(Task.java:413)
>>>        at
>>> org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:354)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>> Caused by: java.lang.ClassNotFoundException:
>>> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>>        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>        at java.security.AccessController.doPrivileged(Native Method)
>>>        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>>        at
>>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>>>        at java.lang.Class.forName0(Native Method)
>>>        at java.lang.Class.forName(Class.java:247)
>>>        at
>>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.jav
>>> a
>>> :762)
>>>        at
>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
>>>        ... 4 more
>>>
>>> I terminated the program here via <Control><C>, since the error msgs were simply repeating.
>>>
>>> [rtaylor@h01 Hadoop]$
>>>
>>
>

RE: Guava*.jar use - hadoop-hbase communications failure - the hbase*.jar classes apparently not being found by Hadoop

Posted by "Taylor, Ronald C" <ro...@pnl.gov>.
Ryan,

The hbase*.jar is in the root hbase directory (at /home/hbase/hbase). Now, that is a symbolic link on all the nodes (as you can see below), but that should not matter, right?

lor@h01 hbase]$ pwd
/home/hbase
[rtaylor@h01 hbase]$ ls -l
lrwxrwxrwx  1 root  hadoop    19 Aug 26 08:47 hbase -> hbase-0.89.20100726
drwxr-xr-x  9 hbase hadoop  4096 Sep 18 22:54 hbase-0.89.20100726
[rtaylor@h01 hbase]$


Anyhoo, I just put the hadoop-env.sh file on pastebin.com. Please take a look. I posted it under the title:

"Ronald Taylor / hadoop-env.sh file - HBase-Hadoop hbase*.jar problem"

This is the first time I've used pastebin.com, so hopefully I uploaded properly. Please let me know if not.

I don't think I misspelled anything on the HADOOP_CLASSPATH line (I just verified file existence based on those spellings, see the "ls" listings below), but I'm very happy to have an expert take a look.
 Ron


export HADOOP_CLASSPATH=/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.20100726.jar:/home/hbase/hbase/log4j-1.2.16.jar:/home/hbase/hbase/zookeeper-3.3.1.jar


[rtaylor@h01 conf]$ ls /home/hbase/hbase/conf
hadoop-metrics.properties  hbase-default_with_RT_mods.xml  hbase-env.sh    hbase-site.xml.psuedo-distributed.template  regionservers
hbase-default_ORIG.xml     hbase-default.xml               hbase-site.xml  log4j.properties                            tohtml.xsl

[rtaylor@h01 conf]$ ls /home/hbase/hbase/hbase-0.89.20100726.jar
/home/hbase/hbase/hbase-0.89.20100726.jar
[rtaylor@h01 conf]$

[rtaylor@h01 conf]$ ls /home/hbase/hbase/log4j-1.2.16.jar
/home/hbase/hbase/log4j-1.2.16.jar
[rtaylor@h01 conf]$

[rtaylor@h01 conf]$ ls /home/hbase/hbase/zookeeper-3.3.1.jar
/home/hbase/hbase/zookeeper-3.3.1.jar
[rtaylor@h01 conf]$



-----Original Message-----
From: Ryan Rawson [mailto:ryanobjc@gmail.com]
Sent: Monday, September 20, 2010 1:17 PM
To: Taylor, Ronald C
Cc: user@hbase.apache.org; hbase-user@hadoop.apache.org; buttler1@llnl.gov; Ronald Taylor; Witteveen, Tim
Subject: Re: Guava*.jar use - hadoop-hbase communications failure - the hbase*.jar classes apparently not being found by Hadoop

Hey,

If you could, perhaps you could paste up your hadoop-env.sh on pastebin.com?  That would help... sometimes I have made errors in the bash shell trickery, and it probably would help to get more eyes checking it out.

Normally in the stock hbase distro the Hbase JAR is in the root hbase dir, and the other jars in the lib/ sub directory, am I correct to assume you've moved the jars around a bit?

Good luck,
-ryan

On Mon, Sep 20, 2010 at 1:14 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>
> Hello Ryan, Dave, other developers,
>
> Have not fixed the problem. Here's where things stand:
>
> 1) As Ryan suggested, we have checked all the nodes to make sure that we copied over the hadoop-env.sh file with the HADOOP_CLASSPATH setting, set like so:
>
> export
> HADOOP_CLASSPATH=/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.2
> 0100726.jar:
> /home/hbase/hbase/log4j-1.2.16.jar:/home/hbase/hbase/zookeeper-3.3.1.j
> ar
>
> Answer: yep, that was OK, the files are there. We also restarted Hadoop and Hbase again. No change - program still fails on not finding  the TableOutputFormat class.
>
> 2) Following Dave's advice of avoiding the problem by not using TableOutputFormat (by skipping the Reducer stage), I tried a variant of that. I kept the Reducer stage in, but changed it to output to a file, instead of an Hbase table.
>
> That did not work either. I tried running the new program from the hadoop acct and now get a msg (from the Mapper stage, I believe) saying that the hbase.mapreduce.TableMapper class cannot be found. So - it is not just TableOutputFormat class - it is all the classes in the hbase*.jar file that are not being found.
>
> Does this have anything to do with the guava*.jar file that Ryan mentioned, which (as far as I can tell) we don't have installed?
>
> Obviously, we need more help.
>
> In the meantime, as a stop-gap, I'm planning on writing our analysis programs this way:
>
> 1) extract data from the source Hbase table and store in an HDFS file,
> all data needed for analysis contained independently on each row -
> this task to be done by a non-MapReduce class that can access Hbase
> tables
>
> 2) call a MapReduce class that will process the file in parallel and
> return a new file (well, a directory of files which I'll combine into
> one) as output
>
> 3) write the contents of the new results file back into an Hbase table
> using another non-MapReduce class
>
> I presume this will work, but again, obviously, it's not optimal and we need to resolve this issue so MapReduce classes can access Hbase tables directly on our cluster.
>
> Does anybody have any advice?
>  Cheers,
>   Ron
>
> ___________________________________________
> Ronald Taylor, Ph.D.
> Computational Biology & Bioinformatics Group Pacific Northwest
> National Laboratory
> 902 Battelle Boulevard
> P.O. Box 999, Mail Stop J4-33
> Richland, WA  99352 USA
> Office:  509-372-6568
> Email: ronald.taylor@pnl.gov
>
>
> -----Original Message-----
> From: Buttler, David [mailto:buttler1@llnl.gov]
> Sent: Monday, September 20, 2010 10:17 AM
> To: user@hbase.apache.org; 'hbase-user@hadoop.apache.org'
> Subject: RE: hadoop-hbase failure - could use some help, a class is
> apparently not being found by Hadoop
>
> I find it is often faster to skip the reduce phase when updating rows in hbase.  (A trick I picked up from Ryan) Essentially, you read a row from hbase, do your processing, and write the row back to hbase.
> The only time you would want to do the reduce phase is if there is some aggregation that you need, or if there is some output you want to skip (e.g. you have a zipfian distribution and you want to ignore the low count occurrences).
>
> Dave
>
> -----Original Message-----
> From: Taylor, Ronald C
> Sent: Sunday, September 19, 2010 9:59 PM
> To: 'Ryan Rawson'; user@hbase.apache.org; hbase-user@hadoop.apache.org
> Cc: Taylor, Ronald C; 'Ronald Taylor'; Witteveen, Tim
> Subject: RE: Guava*.jar use - hadoop-hbase failure - could use some
> help, a class is apparently not being found by Hadoop
>
>
> Ryan,
>
> Thanks for the quick feedback. I will check the other nodes on the cluster to see if they have been properly updated.
>
> However, I am now really confused as to the use of the guava*.jar file that you talk about. This is the first time I've heard about this. I presume we are talking about a jar file packaging the Guava libraries from Google?
>
> I cannot find this guava*.jar in either the /home/hadoop/hadoop directory or in the /home/hbase/hbase directories, where the Hadoop and Hbase installs place the other *.jar files. I'm afraid that I don't even know where we should have downloaded it. Does it come with Hbase, or with Hadoop? Where should it have been placed, after installation? Should I now download it - since we appear to be missing it - from here?
>  http://code.google.com/p/guava-libraries/downloads/list
>
> I Googled and found issue HBASE-2714 (Remove Guava as a client
> dependency, June 11 2010) here
>
> http://www.mail-archive.com/issues@hbase.apache.org/msg00950.html
> (see below, where I've included the text)
>
> which appears to say that Hbase (at least *some* release of Hbase -
> does this include 0.89?) has a dependency on Guava, in order to run a
> MapReduce job over Hbase. But nothing on Guava is mentioned at
>
>
> http://hbase.apache.org/docs/r0.20.4/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath
>
> (I cannot find anything in the Hbase 0.89 online documents on Guava or
> in how to set CLASSPATH or in what *.jar files to include so I can use
> MapReduce with Hbase; the best guidance I can find is in this earlier
> document.)
>
> So - I could really use further clarification in regard to Guava as to what I should be doing to set up Hbase-MapReduce work.
>
>  Regards,
>   Ron
>
> %%%%%%%%%%%%%%%%%%%%%%%%
>
> From
>
> http://www.mail-archive.com/issues@hbase.apache.org/msg00950.html
>
>
> Todd Lipcon commented on HBASE-2714:
> ------------------------------------
>
> Why not?
>
> In theory, the new TableMapReduceUtil.addDependencyJars should take care of shipping it in the distributedcache. Apparently it's not working?
>
> ryan rawson commented on HBASE-2714:
> ------------------------------------
>
> not everyone uses that mechanism to run map reduce jobs on hbase.  The standard for a long time was to add hbase.jar and zookeeper-3.2.2.jar to the hadoop classpath, thus not requiring every job include the hbase jars.
>
> Todd Lipcon commented on HBASE-2714:
> ------------------------------------
>
> Does this mean in general that we can't add more dependencies to the
> hbase client? I think instead we should make it easier to run hbase MR
> jobs *without* touching the Hadoop config (eg right now you have to
> restart MR to upgrade hbase, that's not going to fly for a lot of
> clusters)
>
> stack commented on HBASE-2714:
> ------------------------------
>
> So, we need to change our recommendations here:
> http://hbase.apache.org/docs/r0.20.4/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath?
>
>
>> Remove Guava as a client dependency
>> -----------------------------------
>>
>>                 Key: HBASE-2714
>>                 URL: https://issues.apache.org/jira/browse/HBASE-2714
>>             Project: HBase
>>          Issue Type: Improvement
>>          Components: client
>>            Reporter: Jeff Hammerbacher
>>
>> We shouldn't need Guava on the classpath to run a MapReduce job over HBase.
>
>
> %%%%%%%%%%%%%%%%%%%%%%%%
>
>
>
> -----Original Message-----
> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> Sent: Sunday, September 19, 2010 12:45 AM
> To: user@hbase.apache.org
> Cc: hbase-user@hadoop.apache.org; Taylor, Ronald C
> Subject: Re: hadoop-hbase failure - could use some help, a class is
> apparently not being found by Hadoop
>
> hey,
>
> looks like you've done all the right things... you might want to double check that all the 'slave' machines have the updated hadoop-env.sh and that the path referenced therein is present _on all the machines_.
>
> You also need to include the guava*.jar as well.  the log4j is already included by mapred by default, so no need there.
>
> -ryan
>
>
>
> On Fri, Sep 17, 2010 at 4:19 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>>
>> Hi folks,
>>
>> Got a problem in basic Hadoop-Hbase communication. My small test
>> program ProteinCounter1.java - shown in full below - reports out this
>> error
>>
>>   java.lang.RuntimeException: java.lang.ClassNotFoundException:
>> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>        at
>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
>>
>> The full invocation and error msgs are shown at bottom.
>>
>> We are using Hadoop 0.20.2 with HBase 0.89.20100726 on a 24-node cluster. Hadoop and Hbase each appear to work fine separately. That is, I've created programs that run MapReduce on files, and programs that import data into Hbase tables and manipulate them. Both types of programs have gone quite smoothly.
>>
>> Now I want to combine the two - use MapReduce programs on data drawn from an Hbase table, with results placed back into an Hbase table.
>>
>> But my test program for such, as you see from the error msg, is not
>> working. Apparently the
>>   org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>  class is not found.
>>
>> However, I have added these paths, including the relevant Hbase *.jar, to HADOOP_CLASSPATH, so the missing class should have been found, as you can see:
>>
>>  export HADOOP_CLASSPATH=/home/hbase/hbase/conf:
>> /home/hbase/hbase/hbase-0.89.20100726.jar:
>> /home/rtaylor/HadoopWork/log4j-1.2.16.jar:
>> /home/rtaylor/HadoopWork/zookeeper-3.3.1.jar
>>
>>  This change was made in the ../hadoop/conf/hadoop-env.sh file.
>>
>> I checked the manifest of /home/hbase/hbase/hbase-0.89.20100726.jar
>> and
>>    org/apache/hadoop/hbase/mapreduce/TableOutputFormat.class
>>  is indeed present in that Hbase *.jar file.
>>
>> Also, I have restarted both Hbase and Hadoop after making this change.
>>
>> Don't understand why the TableOutputFormat class is not being found. Or is the error msg misleading, and something else is going wrong? I would very much appreciate any advice people have as to what is going wrong. Need to get this working very soon.
>>
>>   Regards,
>>     Ron T.
>>
>> ___________________________________________
>> Ronald Taylor, Ph.D.
>> Computational Biology & Bioinformatics Group Pacific Northwest
>> National Laboratory
>> 902 Battelle Boulevard
>> P.O. Box 999, Mail Stop J4-33
>> Richland, WA  99352 USA
>> Office:  509-372-6568
>> Email: ronald.taylor@pnl.gov
>>
>>
>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>
>> contents of the "ProteinCounter1.java" file:
>>
>>
>>
>> //  to compile
>> // javac ProteinCounter1.java
>> // jar cf ProteinCounterTest.jar  *.class
>>
>> // to run
>> //   hadoop jar ProteinCounterTest.jar ProteinCounter1
>>
>>
>> import org.apache.hadoop.hbase.HBaseConfiguration;
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
>> import org.apache.hadoop.mapreduce.Job; import
>> org.apache.hadoop.io.IntWritable;
>>
>> import java.util.*;
>> import java.io.*;
>> import org.apache.hadoop.hbase.*;
>> import org.apache.hadoop.hbase.client.*; import
>> org.apache.hadoop.hbase.io.*; import org.apache.hadoop.hbase.util.*;
>> import org.apache.hadoop.hbase.mapreduce.*;
>>
>>
>> // %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>
>> /**
>>  * counts the number of times each protein appears in the
>> proteinTable
>>  *
>>  */
>> public class ProteinCounter1 {
>>
>>
>>    static class ProteinMapper1 extends
>> TableMapper<ImmutableBytesWritable, IntWritable> {
>>
>>        private int numRecords = 0;
>>        private static final IntWritable one = new IntWritable(1);
>>
>>        @Override
>>            public void map(ImmutableBytesWritable row, Result values,
>> Context context) throws IOException {
>>
>>            // retrieve the value of proteinID, which is the row key
>> for each protein in the proteinTable
>>            ImmutableBytesWritable proteinID_Key = new
>> ImmutableBytesWritable(row.get());
>>            try {
>>                context.write(proteinID_Key, one);
>>            } catch (InterruptedException e) {
>>                throw new IOException(e);
>>            }
>>            numRecords++;
>>            if ((numRecords % 100) == 0) {
>>                context.setStatus("mapper processed " + numRecords + "
>> proteinTable records so far");
>>            }
>>        }
>>    }
>>
>>    public static class ProteinReducer1 extends
>> TableReducer<ImmutableBytesWritable,
>>                                               IntWritable,
>> ImmutableBytesWritable> {
>>
>>        public void reduce(ImmutableBytesWritable proteinID_key,
>> Iterable<IntWritable> values,
>>                            Context context)
>>            throws IOException, InterruptedException {
>>            int sum = 0;
>>            for (IntWritable val : values) {
>>                sum += val.get();
>>            }
>>
>>            Put put = new Put(proteinID_key.get());
>>            put.add(Bytes.toBytes("resultFields"),
>> Bytes.toBytes("total"), Bytes.toBytes(sum));
>>            System.out.println(String.format("stats : proteinID_key :
>> %d, count : %d",
>>
>> Bytes.toInt(proteinID_key.get()), sum));
>>            context.write(proteinID_key, put);
>>        }
>>    }
>>
>>    public static void main(String[] args) throws Exception {
>>
>>        org.apache.hadoop.conf.Configuration conf;
>>           conf = org.apache.hadoop.hbase.HBaseConfiguration.create();
>>
>>        Job job = new Job(conf, "HBaseTest_Using_ProteinCounter");
>>        job.setJarByClass(ProteinCounter1.class);
>>
>>        org.apache.hadoop.hbase.client.Scan scan = new Scan();
>>
>>        String colFamilyToUse = "proteinFields";
>>        String fieldToUse = "Protein_Ref_ID";
>>
>>        // retrieve this one column from the specified family
>>        scan.addColumn(Bytes.toBytes(colFamilyToUse),
>> Bytes.toBytes(fieldToUse));
>>
>>           org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter
>> filterToUse =
>>                 new
>> org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter();
>>        scan.setFilter(filterToUse);
>>
>>        TableMapReduceUtil.initTableMapperJob("proteinTable", scan,
>> ProteinMapper1.class,
>>                              ImmutableBytesWritable.class,
>>                                              IntWritable.class, job);
>>        TableMapReduceUtil.initTableReducerJob("testTable",
>> ProteinReducer1.class, job);
>>        System.exit(job.waitForCompletion(true) ? 0 : 1);
>>    }
>> }
>>
>>
>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>
>>
>> session output:
>>
>> [rtaylor@h01 Hadoop]$ javac ProteinCounter1.java
>>
>> [rtaylor@h01 Hadoop]$ jar cf ProteinCounterTest.jar  *.class
>>
>> [rtaylor@h01 Hadoop]$ hadoop jar ProteinCounterTest.jar
>> ProteinCounter1
>>
>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.system.dir;  Ignoring.
>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.map.tasks.maximum;  Ignoring.
>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.reduce.tasks.maximum;  Ignoring.
>> 10/09/17 15:46:18 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.system.dir;  Ignoring.
>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.map.tasks.maximum;  Ignoring.
>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.reduce.tasks.maximum;  Ignoring.
>> 10/09/17 15:46:18 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.name.dir;  Ignoring.
>> 10/09/17 15:46:18 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.data.dir;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.system.dir;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.map.tasks.maximum;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.reduce.tasks.maximum;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.name.dir;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.data.dir;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.system.dir;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.map.tasks.maximum;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.reduce.tasks.maximum;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.name.dir;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.data.dir;  Ignoring.
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeperWrapper: Reconnecting to
>> zookeeper
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14
>> GMT
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:host.name=h01.emsl.pnl.gov
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:java.version=1.6.0_21
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:java.home=/usr/java/jdk1.6.0_21/jre
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:java.class.path=/home/hadoop/hadoop/bin/../conf:/usr/java/
>> default/lib/tools.jar:/home/hadoop/hadoop/bin/..:/home/hadoop/hadoop/b
>> in/../hadoop-0.20.2-core.jar:/home/hadoop/hadoop/bin/../lib/commons-cl
>> i-1.2.jar:/home/hadoop/hadoop/bin/../lib/commons-codec-1.3.jar:/home/h
>> adoop/hadoop/bin/../lib/commons-el-1.0.jar:/home/hadoop/hadoop/bin/../
>> lib/commons-httpclient-3.0.1.jar:/home/hadoop/hadoop/bin/../lib/common
>> s-logging-1.0.4.jar:/home/hadoop/hadoop/bin/../lib/commons-logging-api
>> -1.0.4.jar:/home/hadoop/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/
>> hadoop/hadoop/bin/../lib/core-3.1.1.jar:/home/hadoop/hadoop/bin/../lib
>> /hsqldb-1.8.0.10.jar:/home/hadoop/hadoop/bin/../lib/jasper-compiler-5.
>> 5.12.jar:/home/hadoop/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/hom
>> e/hadoop/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/hadoop/hadoop/bin/..
>> /lib/jetty-6.1.14.jar:/home/hadoop/hadoop/bin/../lib/jetty-util-6.1.14
>> .jar:/home/hadoop/hadoop/bin/../lib/junit-3.8.1.jar:/home/hadoop/hadoo
>> p/bin/../lib/kfs-0.2.2.jar:/home/hadoop/hadoop/bin/../lib/log4j-1.2.15
>> .jar:/home/hadoop/hadoop/bin/../lib/mockito-all-1.8.0.jar:/home/hadoop
>> /hadoop/bin/../lib/oro-2.0.8.jar:/home/hadoop/hadoop/bin/../lib/servle
>> t-api-2.5-6.1.14.jar:/home/hadoop/hadoop/bin/../lib/slf4j-api-1.4.3.ja
>> r:/home/hadoop/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/hadoop/
>> hadoop/bin/../lib/xmlenc-0.52.jar:/home/hadoop/hadoop/bin/../lib/jsp-2
>> .1/jsp-2.1.jar:/home/hadoop/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar:
>> /home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.20100726.jar:/home
>> /rtaylor/HadoopWork/log4j-1.2.16.jar:/home/rtaylor/HadoopWork/zookeepe
>> r-3.3.1.jar
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:java.library.path=/home/hadoop/hadoop/bin/../lib/native/Linux-i386-32
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:java.io.tmpdir=/tmp
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:java.compiler=<NA>
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:os.name=Linux
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:os.arch=i386
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:os.version=2.6.18-194.11.1.el5
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:user.name=rtaylor
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:user.home=/home/rtaylor
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:user.dir=/home/rtaylor/HadoopWork/Hadoop
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Initiating client
>> connection,
>> connectString=h05:2182,h04:2182,h03:2182,h02:2182,h10:2182,h09:2182,h08:2182,h07:2182,h06:2182 sessionTimeout=60000
>> watcher=org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper@dcb03b
>> 10/09/17 15:46:19 INFO zookeeper.ClientCnxn: Opening socket
>> connection to server h04/192.168.200.24:2182
>> 10/09/17 15:46:19 INFO zookeeper.ClientCnxn: Socket connection
>> established to h04/192.168.200.24:2182, initiating session
>> 10/09/17 15:46:19 INFO zookeeper.ClientCnxn: Session establishment
>> complete on server h04/192.168.200.24:2182, sessionid =
>> 0x22b21c04c330002, negotiated timeout = 60000
>> 10/09/17 15:46:20 INFO mapred.JobClient: Running job:
>> job_201009171510_0004
>> 10/09/17 15:46:21 INFO mapred.JobClient:  map 0% reduce 0%
>>
>> 10/09/17 15:46:27 INFO mapred.JobClient: Task Id :
>> attempt_201009171510_0004_m_000002_0, Status : FAILED
>> java.lang.RuntimeException: java.lang.ClassNotFoundException:
>> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>        at
>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
>>        at
>> org.apache.hadoop.mapreduce.JobContext.getOutputFormatClass(JobContex
>> t
>> .java:193)
>>        at org.apache.hadoop.mapred.Task.initialize(Task.java:413)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:288)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>        at
>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>>        at java.lang.Class.forName0(Native Method)
>>        at java.lang.Class.forName(Class.java:247)
>>        at
>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.jav
>> a
>> :762)
>>        at
>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
>>        ... 4 more
>>
>> 10/09/17 15:46:33 INFO mapred.JobClient: Task Id :
>> attempt_201009171510_0004_r_000051_0, Status : FAILED
>> java.lang.RuntimeException: java.lang.ClassNotFoundException:
>> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>        at
>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
>>        at
>> org.apache.hadoop.mapreduce.JobContext.getOutputFormatClass(JobContex
>> t
>> .java:193)
>>        at org.apache.hadoop.mapred.Task.initialize(Task.java:413)
>>        at
>> org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:354)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>        at
>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>>        at java.lang.Class.forName0(Native Method)
>>        at java.lang.Class.forName(Class.java:247)
>>        at
>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.jav
>> a
>> :762)
>>        at
>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
>>        ... 4 more
>>
>> I terminated the program here via <Control><C>, since the error msgs were simply repeating.
>>
>> [rtaylor@h01 Hadoop]$
>>
>

Re: Guava*.jar use - hadoop-hbase communications failure - the hbase*.jar classes apparently not being found by Hadoop

Posted by Ryan Rawson <ry...@gmail.com>.
Hey,

If you could, perhaps paste up your hadoop-env.sh on pastebin.com?
That would help... I have sometimes made errors in the bash shell
trickery myself, and it would help to get more eyes checking it out.

Normally in the stock hbase distro the Hbase JAR is in the root hbase
dir, and the other jars are in the lib/ subdirectory. Am I correct to
assume you've moved the jars around a bit?

Good luck,
-ryan
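
One concrete thing worth ruling out while looking over hadoop-env.sh: the export quoted just below appears to contain a space after the second colon ("...hbase-0.89.20100726.jar: /home/hbase/hbase/log4j..."). That may only be an artifact of mail line wrapping, but if the space is literally in the file, the shell ends the variable's value at that point and the rest of the classpath is dropped. The whole value has to be a single unbroken token, e.g. (same paths as quoted below):

export HADOOP_CLASSPATH=/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.20100726.jar:/home/hbase/hbase/log4j-1.2.16.jar:/home/hbase/hbase/zookeeper-3.3.1.jar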

On Mon, Sep 20, 2010 at 1:14 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>
> Hello Ryan, Dave, other developers,
>
> Have not fixed the problem. Here's where things stand:
>
> 1) As Ryan suggested, we have checked all the nodes to make sure that we copied over the hadoop-env.sh file with the HADOOP_CLASSPATH setting, set like so:
>
> export HADOOP_CLASSPATH=/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.20100726.jar: /home/hbase/hbase/log4j-1.2.16.jar:/home/hbase/hbase/zookeeper-3.3.1.jar
>
> Answer: yep, that was OK, the files are there. We also restarted Hadoop and Hbase again. No change - program still fails on not finding  the TableOutputFormat class.
>
> 2) Following Dave's advice of avoiding the problem by not using TableOutputFormat (by skipping the Reducer stage), I tried a variant of that. I kept the Reducer stage in, but changed it to output to a file, instead of an Hbase table.
>
> That did not work either. I tried running the new program from the hadoop account and now get a msg (from the Mapper stage, I believe) saying that the hbase.mapreduce.TableMapper class cannot be found. So - it is not just the TableOutputFormat class - none of the classes in the hbase*.jar file are being found.
>
> Does this have anything to do with the guava*.jar file that Ryan mentioned, which (as far as I can tell) we don't have installed?
>
> Obviously, we need more help.
>
> In the meantime, as a stop-gap, I'm planning on writing our analysis programs this way:
>
> 1) extract data from the source Hbase table and store in an HDFS file, all data needed for analysis contained independently on each row - this task to be done by a non-MapReduce class that can access Hbase tables
>
> 2) call a MapReduce class that will process the file in parallel and return a new file (well, a directory of files which I'll combine into one) as output
>
> 3) write the contents of the new results file back into an Hbase table using another non-MapReduce class
>
> I presume this will work, but again, obviously, it's not optimal and we need to resolve this issue so MapReduce classes can access Hbase tables directly on our cluster.
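
For step 1 of that stop-gap plan, a minimal sketch of a plain (non-MapReduce) exporter follows. It reuses the proteinTable / proteinFields / Protein_Ref_ID names from the ProteinCounter1 example elsewhere in this thread; the output path and the tab-separated line format are assumptions, not anything from the original posts.

import java.io.IOException;
import java.io.PrintWriter;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

/** Step 1 sketch: scan proteinTable and write one self-contained line per row to HDFS. */
public class ProteinTableExporter {

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();

        // pull the same single column that ProteinCounter1 scans
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("proteinFields"), Bytes.toBytes("Protein_Ref_ID"));

        HTable table = new HTable(conf, "proteinTable");
        FileSystem fs = FileSystem.get(conf);
        // output path is an assumption - point it wherever the MapReduce step expects its input
        PrintWriter out = new PrintWriter(fs.create(new Path("/user/rtaylor/proteinTable_dump.txt")));

        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                // rowkey <TAB> Protein_Ref_ID, one row per line
                String key = Bytes.toString(r.getRow());
                String refId = Bytes.toString(r.getValue(Bytes.toBytes("proteinFields"),
                                                         Bytes.toBytes("Protein_Ref_ID")));
                out.println(key + "\t" + refId);
            }
        } finally {
            scanner.close();
            out.close();
        }
    }
}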
>
> Does anybody have any advice?
>  Cheers,
>   Ron
>
> ___________________________________________
> Ronald Taylor, Ph.D.
> Computational Biology & Bioinformatics Group
> Pacific Northwest National Laboratory
> 902 Battelle Boulevard
> P.O. Box 999, Mail Stop J4-33
> Richland, WA  99352 USA
> Office:  509-372-6568
> Email: ronald.taylor@pnl.gov
>
>
> -----Original Message-----
> From: Buttler, David [mailto:buttler1@llnl.gov]
> Sent: Monday, September 20, 2010 10:17 AM
> To: user@hbase.apache.org; 'hbase-user@hadoop.apache.org'
> Subject: RE: hadoop-hbase failure - could use some help, a class is apparently not being found by Hadoop
>
> I find it is often faster to skip the reduce phase when updating rows in hbase.  (A trick I picked up from Ryan) Essentially, you read a row from hbase, do your processing, and write the row back to hbase.
> The only time you would want to do the reduce phase is if there is some aggregation that you need, or if there is some output you want to skip (e.g. you have a zipfian distribution and you want to ignore the low count occurrences).
>
> Dave
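
A minimal sketch of the map-only pattern Dave describes, reusing the table and column-family names from this thread: the mapper reads a row, does its processing, and emits a Put directly, so TableOutputFormat writes the row back without any reducer. Passing null to initTableReducerJob just to wire up the output format, and setting zero reduce tasks, is an assumption worth double-checking against the 0.89 API.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

/** Map-only job: read each row, process it, and write the updated row straight back. */
public class MapOnlyUpdater {

    static class UpdateMapper extends TableMapper<ImmutableBytesWritable, Put> {
        @Override
        public void map(ImmutableBytesWritable row, Result values, Context context)
                throws IOException, InterruptedException {
            // do the per-row processing here, then emit a Put for the same row
            Put put = new Put(row.get());
            put.add(Bytes.toBytes("resultFields"), Bytes.toBytes("processed"), Bytes.toBytes(1));
            context.write(row, put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "MapOnlyUpdater");
        job.setJarByClass(MapOnlyUpdater.class);

        Scan scan = new Scan();
        TableMapReduceUtil.initTableMapperJob("proteinTable", scan,
                UpdateMapper.class, ImmutableBytesWritable.class, Put.class, job);
        // wires up TableOutputFormat for the target table; null = no reducer class
        TableMapReduceUtil.initTableReducerJob("proteinTable", null, job);
        job.setNumReduceTasks(0);   // map-only: Puts go straight to the table

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}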
>
> -----Original Message-----
> From: Taylor, Ronald C
> Sent: Sunday, September 19, 2010 9:59 PM
> To: 'Ryan Rawson'; user@hbase.apache.org; hbase-user@hadoop.apache.org
> Cc: Taylor, Ronald C; 'Ronald Taylor'; Witteveen, Tim
> Subject: RE: Guava*.jar use - hadoop-hbase failure - could use some help, a class is apparently not being found by Hadoop
>
>
> Ryan,
>
> Thanks for the quick feedback. I will check the other nodes on the cluster to see if they have been properly updated.
>
> However, I am now really confused as to use of the guava*.jar file that you talk about. This is the first time I've heard about this. I presume we are talking about a jar file packaging the guava libraries from Google?
>
> I cannot find this guava*.jar in either the /home/hadoop/hadoop directory or the /home/hbase/hbase directory, where the Hadoop and Hbase installs place the other *.jar files. I'm afraid that I don't even know where we should have downloaded it. Does it come with Hbase, or with Hadoop? Where should it have been placed after installation? Should I now download it - since we appear to be missing it - from here?
>  http://code.google.com/p/guava-libraries/downloads/list
>
> I Googled and found issue HBASE-2714 (Remove Guava as a client dependency, June 11 2010) here
>
> http://www.mail-archive.com/issues@hbase.apache.org/msg00950.html  (see below, where I've included the text)
>
> which appears to say that Hbase (at least *some* release of Hbase - does this include 0.89?) has a dependency on Guava, in order to run a MapReduce job over Hbase. But nothing on Guava is mentioned at
>
>   http://hbase.apache.org/docs/r0.20.4/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath
>
> (I cannot find anything in the Hbase 0.89 online documents on Guava, on how to set CLASSPATH, or on which *.jar files to include so I can use MapReduce with Hbase; the best guidance I can find is in this earlier document.)
>
> So - I could really use further clarification in regard to Guava as to what I should be doing to set up Hbase-MapReduce work.
>
>  Regards,
>   Ron
>
> %%%%%%%%%%%%%%%%%%%%%%%%
>
> From
>
> http://www.mail-archive.com/issues@hbase.apache.org/msg00950.html
>
>
> Todd Lipcon commented on HBASE-2714:
> ------------------------------------
>
> Why not?
>
> In theory, the new TableMapReduceUtil.addDependencyJars should take care of shipping it in the distributedcache. Apparently it's not working?
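
For reference, the mechanism Todd describes amounts to a one-line addition to the job driver. This is a sketch only - it assumes addDependencyJars(Job) is present in the 0.89.20100726 client, which is worth verifying:

// Added to ProteinCounter1.main(), after the initTableMapperJob/initTableReducerJob
// calls and before job.waitForCompletion(); TableMapReduceUtil is already imported there.
TableMapReduceUtil.addDependencyJars(job);  // ships the hbase, zookeeper and guava jars
                                            // to the tasks via the distributed cache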
>
> ryan rawson commented on HBASE-2714:
> ------------------------------------
>
> not everyone uses that mechanism to run map reduce jobs on hbase.  The standard for a long time was to add hbase.jar and zookeeper-3.2.2.jar to the hadoop classpath, thus not requiring every job include the hbase jars.
>
> Todd Lipcon commented on HBASE-2714:
> ------------------------------------
>
> Does this mean in general that we can't add more dependencies to the hbase client? I think instead we should make it easier to run hbase MR jobs *without* touching the Hadoop config (eg right now you have to restart MR to upgrade hbase, that's not going to fly for a lot of clusters)
>
> stack commented on HBASE-2714:
> ------------------------------
>
> So, we need to change our recommendations here:
> http://hbase.apache.org/docs/r0.20.4/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath?
>
>
>> Remove Guava as a client dependency
>> -----------------------------------
>>
>>                 Key: HBASE-2714
>>                 URL: https://issues.apache.org/jira/browse/HBASE-2714
>>             Project: HBase
>>          Issue Type: Improvement
>>          Components: client
>>            Reporter: Jeff Hammerbacher
>>
>> We shouldn't need Guava on the classpath to run a MapReduce job over HBase.
>
>
> %%%%%%%%%%%%%%%%%%%%%%%%
>
>
>
> -----Original Message-----
> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> Sent: Sunday, September 19, 2010 12:45 AM
> To: user@hbase.apache.org
> Cc: hbase-user@hadoop.apache.org; Taylor, Ronald C
> Subject: Re: hadoop-hbase failure - could use some help, a class is apparently not being found by Hadoop
>
> hey,
>
> looks like you've done all the right things... you might want to double check that all the 'slave' machines have the updated hadoop-env.sh and that the path referenced therein is present _on all the machines_.
>
> You also need to include the guava*.jar as well.  the log4j is already included by mapred by default, so no need there.
>
> -ryan
>
>
>
> On Fri, Sep 17, 2010 at 4:19 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>>
>> Hi folks,
>>
>> Got a problem in basic Hadoop-Hbase communication. My small test
>> program ProteinCounter1.java - shown in full below - reports out this
>> error
>>
>>   java.lang.RuntimeException: java.lang.ClassNotFoundException:
>> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>        at
>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
>>
>> The full invocation and error msgs are shown at bottom.
>>
>> We are using Hadoop 0.20.2 with HBase 0.89.20100726 on a 24-node cluster. Hadoop and Hbase each appear to work fine separately. That is, I've created programs that run MapReduce on files, and programs that import data into Hbase tables and manipulate them. Both types of programs have gone quite smoothly.
>>
>> Now I want to combine the two - use MapReduce programs on data drawn from an Hbase table, with results placed back into an Hbase table.
>>
>> But my test program for such, as you see from the error msg, is not
>> working. Apparently the
>>   org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>  class is not found.
>>
>> However, I have added these paths, including the relevant Hbase *.jar, to HADOOP_CLASSPATH, so the missing class should have been found, as you can see:
>>
>>  export HADOOP_CLASSPATH=/home/hbase/hbase/conf:
>> /home/hbase/hbase/hbase-0.89.20100726.jar:
>> /home/rtaylor/HadoopWork/log4j-1.2.16.jar:
>> /home/rtaylor/HadoopWork/zookeeper-3.3.1.jar
>>
>>  This change was made in the ../hadoop/conf/hadoop-env.sh file.
>>
>> I checked the manifest of /home/hbase/hbase/hbase-0.89.20100726.jar
>> and
>>    org/apache/hadoop/hbase/mapreduce/TableOutputFormat.class
>>  is indeed present in that Hbase *.jar file.
>>
>> Also, I have restarted both Hbase and Hadoop after making this change.
>>
>> I don't understand why the TableOutputFormat class is not being found. Or is the error msg misleading, and something else is going wrong? I would very much appreciate any advice on what might be the cause. I need to get this working very soon.
>>
>>   Regards,
>>     Ron T.
>>
>> ___________________________________________
>> Ronald Taylor, Ph.D.
>> Computational Biology & Bioinformatics Group Pacific Northwest
>> National Laboratory
>> 902 Battelle Boulevard
>> P.O. Box 999, Mail Stop J4-33
>> Richland, WA  99352 USA
>> Office:  509-372-6568
>> Email: ronald.taylor@pnl.gov
>>
>>
>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>> %%%%%%%%%%%%
>>
>> contents of the "ProteinCounter1.java" file:
>>
>>
>>
>> //  to compile
>> // javac ProteinCounter1.java
>> // jar cf ProteinCounterTest.jar  *.class
>>
>> // to run
>> //   hadoop jar ProteinCounterTest.jar ProteinCounter1
>>
>>
>> import org.apache.hadoop.hbase.HBaseConfiguration;
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
>> import org.apache.hadoop.mapreduce.Job; import
>> org.apache.hadoop.io.IntWritable;
>>
>> import java.util.*;
>> import java.io.*;
>> import org.apache.hadoop.hbase.*;
>> import org.apache.hadoop.hbase.client.*; import
>> org.apache.hadoop.hbase.io.*; import org.apache.hadoop.hbase.util.*;
>> import org.apache.hadoop.hbase.mapreduce.*;
>>
>>
>> // %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>
>> /**
>>  * counts the number of times each protein appears in the proteinTable
>>  *
>>  */
>> public class ProteinCounter1 {
>>
>>
>>    static class ProteinMapper1 extends
>> TableMapper<ImmutableBytesWritable, IntWritable> {
>>
>>        private int numRecords = 0;
>>        private static final IntWritable one = new IntWritable(1);
>>
>>        @Override
>>            public void map(ImmutableBytesWritable row, Result values,
>> Context context) throws IOException {
>>
>>            // retrieve the value of proteinID, which is the row key
>> for each protein in the proteinTable
>>            ImmutableBytesWritable proteinID_Key = new
>> ImmutableBytesWritable(row.get());
>>            try {
>>                context.write(proteinID_Key, one);
>>            } catch (InterruptedException e) {
>>                throw new IOException(e);
>>            }
>>            numRecords++;
>>            if ((numRecords % 100) == 0) {
>>                context.setStatus("mapper processed " + numRecords + "
>> proteinTable records so far");
>>            }
>>        }
>>    }
>>
>>    public static class ProteinReducer1 extends
>> TableReducer<ImmutableBytesWritable,
>>                                               IntWritable,
>> ImmutableBytesWritable> {
>>
>>        public void reduce(ImmutableBytesWritable proteinID_key,
>> Iterable<IntWritable> values,
>>                            Context context)
>>            throws IOException, InterruptedException {
>>            int sum = 0;
>>            for (IntWritable val : values) {
>>                sum += val.get();
>>            }
>>
>>            Put put = new Put(proteinID_key.get());
>>            put.add(Bytes.toBytes("resultFields"),
>> Bytes.toBytes("total"), Bytes.toBytes(sum));
>>            System.out.println(String.format("stats : proteinID_key :
>> %d, count : %d",
>>
>> Bytes.toInt(proteinID_key.get()), sum));
>>            context.write(proteinID_key, put);
>>        }
>>    }
>>
>>    public static void main(String[] args) throws Exception {
>>
>>        org.apache.hadoop.conf.Configuration conf;
>>           conf = org.apache.hadoop.hbase.HBaseConfiguration.create();
>>
>>        Job job = new Job(conf, "HBaseTest_Using_ProteinCounter");
>>        job.setJarByClass(ProteinCounter1.class);
>>
>>        org.apache.hadoop.hbase.client.Scan scan = new Scan();
>>
>>        String colFamilyToUse = "proteinFields";
>>        String fieldToUse = "Protein_Ref_ID";
>>
>>        // retrieve this one column from the specified family
>>        scan.addColumn(Bytes.toBytes(colFamilyToUse),
>> Bytes.toBytes(fieldToUse));
>>
>>           org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter
>> filterToUse =
>>                 new
>> org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter();
>>        scan.setFilter(filterToUse);
>>
>>        TableMapReduceUtil.initTableMapperJob("proteinTable", scan,
>> ProteinMapper1.class,
>>                              ImmutableBytesWritable.class,
>>                                              IntWritable.class, job);
>>        TableMapReduceUtil.initTableReducerJob("testTable",
>> ProteinReducer1.class, job);
>>        System.exit(job.waitForCompletion(true) ? 0 : 1);
>>    }
>> }
>>
>>
>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>> %%%%%%%
>>
>>
>> session output:
>>
>> [rtaylor@h01 Hadoop]$ javac ProteinCounter1.java
>>
>> [rtaylor@h01 Hadoop]$ jar cf ProteinCounterTest.jar  *.class
>>
>> [rtaylor@h01 Hadoop]$ hadoop jar ProteinCounterTest.jar
>> ProteinCounter1
>>
>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.system.dir;  Ignoring.
>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.map.tasks.maximum;  Ignoring.
>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.reduce.tasks.maximum;  Ignoring.
>> 10/09/17 15:46:18 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.system.dir;  Ignoring.
>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.map.tasks.maximum;  Ignoring.
>> 10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.reduce.tasks.maximum;  Ignoring.
>> 10/09/17 15:46:18 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.name.dir;  Ignoring.
>> 10/09/17 15:46:18 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.data.dir;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.system.dir;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.map.tasks.maximum;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.reduce.tasks.maximum;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.name.dir;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.data.dir;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.system.dir;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.map.tasks.maximum;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.reduce.tasks.maximum;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.name.dir;  Ignoring.
>> 10/09/17 15:46:19 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.data.dir;  Ignoring.
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeperWrapper: Reconnecting to
>> zookeeper
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14
>> GMT
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:host.name=h01.emsl.pnl.gov
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:java.version=1.6.0_21
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:java.home=/usr/java/jdk1.6.0_21/jre
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:java.class.path=/home/hadoop/hadoop/bin/../conf:/usr/java/
>> default/lib/tools.jar:/home/hadoop/hadoop/bin/..:/home/hadoop/hadoop/b
>> in/../hadoop-0.20.2-core.jar:/home/hadoop/hadoop/bin/../lib/commons-cl
>> i-1.2.jar:/home/hadoop/hadoop/bin/../lib/commons-codec-1.3.jar:/home/h
>> adoop/hadoop/bin/../lib/commons-el-1.0.jar:/home/hadoop/hadoop/bin/../
>> lib/commons-httpclient-3.0.1.jar:/home/hadoop/hadoop/bin/../lib/common
>> s-logging-1.0.4.jar:/home/hadoop/hadoop/bin/../lib/commons-logging-api
>> -1.0.4.jar:/home/hadoop/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/
>> hadoop/hadoop/bin/../lib/core-3.1.1.jar:/home/hadoop/hadoop/bin/../lib
>> /hsqldb-1.8.0.10.jar:/home/hadoop/hadoop/bin/../lib/jasper-compiler-5.
>> 5.12.jar:/home/hadoop/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/hom
>> e/hadoop/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/hadoop/hadoop/bin/..
>> /lib/jetty-6.1.14.jar:/home/hadoop/hadoop/bin/../lib/jetty-util-6.1.14
>> .jar:/home/hadoop/hadoop/bin/../lib/junit-3.8.1.jar:/home/hadoop/hadoo
>> p/bin/../lib/kfs-0.2.2.jar:/home/hadoop/hadoop/bin/../lib/log4j-1.2.15
>> .jar:/home/hadoop/hadoop/bin/../lib/mockito-all-1.8.0.jar:/home/hadoop
>> /hadoop/bin/../lib/oro-2.0.8.jar:/home/hadoop/hadoop/bin/../lib/servle
>> t-api-2.5-6.1.14.jar:/home/hadoop/hadoop/bin/../lib/slf4j-api-1.4.3.ja
>> r:/home/hadoop/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/hadoop/
>> hadoop/bin/../lib/xmlenc-0.52.jar:/home/hadoop/hadoop/bin/../lib/jsp-2
>> .1/jsp-2.1.jar:/home/hadoop/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar:
>> /home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.20100726.jar:/home
>> /rtaylor/HadoopWork/log4j-1.2.16.jar:/home/rtaylor/HadoopWork/zookeepe
>> r-3.3.1.jar
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:java.library.path=/home/hadoop/hadoop/bin/../lib/native/Li
>> nux-i386-32
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:java.io.tmpdir=/tmp
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:java.compiler=<NA>
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:os.name=Linux
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:os.arch=i386
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:os.version=2.6.18-194.11.1.el5
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:user.name=rtaylor
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:user.home=/home/rtaylor
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
>> environment:user.dir=/home/rtaylor/HadoopWork/Hadoop
>> 10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Initiating client
>> connection,
>> connectString=h05:2182,h04:2182,h03:2182,h02:2182,h10:2182,h09:2182,h0
>> 8:2182,h07:2182,h06:2182 sessionTimeout=60000
>> watcher=org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper@dcb03b
>> 10/09/17 15:46:19 INFO zookeeper.ClientCnxn: Opening socket connection
>> to server h04/192.168.200.24:2182
>> 10/09/17 15:46:19 INFO zookeeper.ClientCnxn: Socket connection
>> established to h04/192.168.200.24:2182, initiating session
>> 10/09/17 15:46:19 INFO zookeeper.ClientCnxn: Session establishment
>> complete on server h04/192.168.200.24:2182, sessionid =
>> 0x22b21c04c330002, negotiated timeout = 60000
>> 10/09/17 15:46:20 INFO mapred.JobClient: Running job:
>> job_201009171510_0004
>> 10/09/17 15:46:21 INFO mapred.JobClient:  map 0% reduce 0%
>>
>> 10/09/17 15:46:27 INFO mapred.JobClient: Task Id :
>> attempt_201009171510_0004_m_000002_0, Status : FAILED
>> java.lang.RuntimeException: java.lang.ClassNotFoundException:
>> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>        at
>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
>>        at
>> org.apache.hadoop.mapreduce.JobContext.getOutputFormatClass(JobContext
>> .java:193)
>>        at org.apache.hadoop.mapred.Task.initialize(Task.java:413)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:288)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>        at
>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>>        at java.lang.Class.forName0(Native Method)
>>        at java.lang.Class.forName(Class.java:247)
>>        at
>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java
>> :762)
>>        at
>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
>>        ... 4 more
>>
>> 10/09/17 15:46:33 INFO mapred.JobClient: Task Id :
>> attempt_201009171510_0004_r_000051_0, Status : FAILED
>> java.lang.RuntimeException: java.lang.ClassNotFoundException:
>> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>        at
>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
>>        at
>> org.apache.hadoop.mapreduce.JobContext.getOutputFormatClass(JobContext
>> .java:193)
>>        at org.apache.hadoop.mapred.Task.initialize(Task.java:413)
>>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:354)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>        at
>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>>        at java.lang.Class.forName0(Native Method)
>>        at java.lang.Class.forName(Class.java:247)
>>        at
>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java
>> :762)
>>        at
>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
>>        ... 4 more
>>
>> I terminated the program here via <Control><C>, since the error msgs were simply repeating.
>>
>> [rtaylor@h01 Hadoop]$
>>
>