Posted to user@hbase.apache.org by Royston Sellman <ro...@googlemail.com> on 2012/04/10 11:58:40 UTC

Not a host:port issue

We have been running M-R jobs successfully on Hadoop v1 and HBase 0.93-SNAPSHOT (built from trunk) using the HBase Java API. We recently updated our Hadoop and HBase installations to the latest code from the source repositories.

We now have a working Hadoop 1.0.2 cluster with HBase (trunk) and ZooKeeper (3.3.3) running. Standard M-R jobs run fine. The HBase shell works fine (you can read and write to tables).

However, when we try to run M-R jobs that use the HBase API, we get the following error:

  [sshexec] Exception in thread "main"
  [sshexec] java.lang.IllegalArgumentException: Not a host:port pair: �[][][]
  [sshexec] 5086@namenode[][]namenode,60000,1334048759631
  [sshexec]          at org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:60)
  [sshexec]          at org.apache.hadoop.hbase.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:63)
  [sshexec]          at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:352)
  [sshexec]          at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:90)
  [sshexec]          at uk.org.cse.utility.HBaseUtility.CreateHBaseTable(HBaseUtility.java:41)
  [sshexec]          at uk.org.cse.ingestion.SampleUploader.main(SampleUploader.java:198)
  [sshexec]          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  [sshexec]          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  [sshexec]          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  [sshexec]          at java.lang.reflect.Method.invoke(Method.java:597)
  [sshexec]          at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

NOTE: The "[]" strings represent unknown-character symbols in the program output.
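
For context, the failing call is nothing exotic. Roughly, our HBaseUtility.CreateHBaseTable helper does the following (a simplified sketch; the table and family names are placeholders, and the exception is thrown in the HBaseAdmin constructor before any table I/O happens):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  public class HBaseUtility {
      public static void CreateHBaseTable(String tableName, String family) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          // Dies here: the client reads the master znode from ZooKeeper
          // and cannot parse it -- "Not a host:port pair".
          HBaseAdmin admin = new HBaseAdmin(conf);
          if (!admin.tableExists(tableName)) {
              HTableDescriptor desc = new HTableDescriptor(tableName);
              desc.addFamily(new HColumnDescriptor(family));
              admin.createTable(desc);
          }
      }
  }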

 

After some searching we found that this problem can be caused by classpath problems (client/server jar version mismatches), so we removed all references to the HBase jars from HADOOP_CLASSPATH in hadoop-env.sh and added them using the backtick `hbase classpath` method described in the documentation (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath).

 

So we now launch an M-R job that uses the HBase API as follows:

 

HADOOP_CLASSPATH=`$HBASE_HOME/bin/hbase classpath` $HADOOP_INSTALL/bin/hadoop jar ${targetjardir2}/${jarfilename} ${targetclass} ${testargs}

 

One of our programs prints the current system classpath before running any M-R code. We captured the output from this program (in the pastebin below); it shows that the classpath contains lib jars for both Hadoop and HBase.
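
The dump itself is essentially the following few lines (a trivial sketch, not our exact helper):

  public class ClasspathDump {
      public static void main(String[] args) {
          // Print each entry of the JVM classpath on its own line.
          for (String entry : System.getProperty("java.class.path")
                  .split(java.io.File.pathSeparator)) {
              System.out.println(entry);
          }
      }
  }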

 

Please find our hadoop-env.sh, hbase-env.sh and complete program output in the following pastebins:

hadoop-env.sh: http://pastebin.com/3CQcHjds

hbase-env.sh: http://pastebin.com/49AdAzv7

Program output: http://pastebin.com/wbwEL9Li

Any help will be most gratefully received,

Royston


Re: Not a host:port issue

Posted by Royston Sellman <ro...@googlemail.com>.
On 12 Apr 2012, at 15:49, Stack wrote:

> On Thu, Apr 12, 2012 at 2:52 AM, Tom Wilcox <To...@cse.org.uk> wrote:
>> I have not confirmed this, but I suspect the old hbase client references were coming from the pig contributions to the classpath. I am wondering whether the hbase and zookeeper libs that were current at the time were bundled into the pig jar when we built it from source, so that the old hbase client was being found inside the pig jar, at the front of the classpath...?
>> 
> 
> The joys of classpath!

Amen to that! You think you've made friends with classpath, and then it whacks you round the back of the head.

Royston


Re: Not a host:port issue

Posted by Stack <st...@duboce.net>.
On Thu, Apr 12, 2012 at 2:52 AM, Tom Wilcox <To...@cse.org.uk> wrote:
> I have not confirmed this, but I suspect the old hbase client references were coming from the pig contributions to the classpath. I am wondering whether the hbase and zookeeper libs that were current at the time were bundled into the pig jar when we built it from source, so that the old hbase client was being found inside the pig jar, at the front of the classpath...?
>

The joys of classpath!

> WRT the DiskErrorException we mentioned in the previous email: that was caused by one of the key worker nodes having no disk space left. After deleting a few redundant files from each of our cluster nodes we were back up and running.
>

Good to hear you are up again.

St.Ack

RE: Not a host:port issue

Posted by Tom Wilcox <To...@cse.org.uk>.
Thanks Stack,

We have successfully restored our Hadoop/HBase cluster to a healthy state.

It seems that moving the pig and zookeeper references to the back of HADOOP_CLASSPATH, so that the backticked `hbase classpath` entries came first, resolved the issue.

I have not confirmed this, but I suspect the old hbase client references were coming from the pig contributions to the classpath. I am wondering whether the hbase and zookeeper libs that were current at the time were bundled into the pig jar when we built it from source, so that the old hbase client was being found inside the pig jar, at the front of the classpath...?

WRT the DiskErrorException we mentioned in the previous email: that was caused by one of the key worker nodes having no disk space left. After deleting a few redundant files from each of our cluster nodes we were back up and running.

Thanks for all your help.

Tom and Royston

-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: 12 April 2012 05:14
To: user@hbase.apache.org
Subject: Re: Not a host:port issue

On Wed, Apr 11, 2012 at 5:14 AM, Tom Wilcox <To...@cse.org.uk> wrote:
> 1) We removed all references to HADOOP_CLASSPATH in hadoop-env.sh and replaced them with the following, so that any initial HADOOP_CLASSPATH settings take precedence:
>
> # Extra Java CLASSPATH elements.  Optional.
> export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$ZOOKEEPER_INSTALL/*"
> export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$PIGDIR/*"
>

Above you are including a version that is probably different from hbase's, and it's being stuck ahead of ours on the classpath IIRC.

Not sure why this would give you the behavior you are seeing.  I'd have thought it'd have made no difference.  Could it be that your hbase is homed at different locations up in zk and you are picking up an old home because you are picking up an old config?  (It doesn't look so when I look at your pastebins -- you seem to have the same ensemble in each case w/ the same /zookeeper_data homedir.)  Different zk instances up for each test?  I'm a little baffled.


> 2) We ran the job with the following (so that HADOOP_CLASSPATH contained all the appropriate HBase API jars):
>
> HADOOP_CLASSPATH=`hbase classpath` hadoop jar SampleUploader.jar uk.org.cse.ingestion.SampleUploader sample.10.csv tomstable dat no
>
> We are now dealing with the following error:
>
> [sshexec] org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/hadoop1/distcache/-6735763131868259398_188156722_559071878/namenode/tmp/mapred/staging/hadoop1/.staging/job_201204111219_0013/libjars/hbase-0.95-SNAPSHOT.jar
>  [sshexec]     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
>  [sshexec]     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
>  [sshexec]     at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getLocalCache(TrackerDistributedCacheManager.java:172)
>  [sshexec]     at org.apache.hadoop.filecache.TaskDistributedCacheManager.setupCache(TaskDistributedCacheManager.java:187)
>  [sshexec]     at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1212)
>  [sshexec]     at java.security.AccessController.doPrivileged(Native Method)
>  [sshexec]     at javax.security.auth.Subject.doAs(Subject.java:396)
>  [sshexec]     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>  [sshexec]     at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1203)
>  [sshexec]     at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1118)
>  [sshexec]     at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2430)
>  [sshexec]     at java.lang.Thread.run(Thread.java:662)
>  [sshexec]
>

Are these dirs set up out on your cluster?

Google it.  There are a couple of possible explanations.

You might go review how to package a jar for mapreduce.  It can be a little tricky to get right.  Best to ship all of its dependencies in the job jar and keep your cluster CLASSPATH clean.  See the trick where the hbase mapreduce jobs pull the jars they need off the CLASSPATH, down in TableMapReduceUtil#addDependencyJars.  Perhaps also review the hbase story on mapreduce and CLASSPATHing:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath
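
For example, a driver along these lines (a sketch only -- the table name is borrowed from your earlier mails, and your real mapper/reducer setup will differ) ships the hbase and zookeeper jars with the job itself:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
  import org.apache.hadoop.mapreduce.Job;

  public class SampleUploaderDriver {
      public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          Job job = new Job(conf, "sample-uploader");
          job.setJarByClass(SampleUploaderDriver.class);
          // Configure reducer output to go to the named table (pass your
          // real TableReducer class instead of null if you have one).
          TableMapReduceUtil.initTableReducerJob("tomstable", null, job);
          // Ship hbase, zookeeper, etc. from the *client* CLASSPATH to the
          // cluster via the distributed cache, so the cluster CLASSPATH
          // can stay clean.
          TableMapReduceUtil.addDependencyJars(job);
          System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
  }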

Good luck lads,


St.Ack

Re: Not a host:port issue

Posted by Stack <st...@duboce.net>.
On Wed, Apr 11, 2012 at 5:14 AM, Tom Wilcox <To...@cse.org.uk> wrote:
> 1) We removed all references to HADOOP_CLASSPATH in hadoop-env.sh and replaced them with the following, so that any initial HADOOP_CLASSPATH settings take precedence:
>
> # Extra Java CLASSPATH elements.  Optional.
> export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$ZOOKEEPER_INSTALL/*"
> export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$PIGDIR/*"
>

Above you are including a version that is probably different from hbase's, and it's being stuck ahead of ours on the classpath IIRC.

Not sure why this would give you the behavior you are seeing.  I'd have thought it'd have made no difference.  Could it be that your hbase is homed at different locations up in zk and you are picking up an old home because you are picking up an old config?  (It doesn't look so when I look at your pastebins -- you seem to have the same ensemble in each case w/ the same /zookeeper_data homedir.)  Different zk instances up for each test?  I'm a little baffled.


> 2) We ran the job with the following (so that HADOOP_CLASSPATH contained all the appropriate HBase API jars):
>
> HADOOP_CLASSPATH=`hbase classpath` hadoop jar SampleUploader.jar uk.org.cse.ingestion.SampleUploader sample.10.csv tomstable dat no
>
> We are now dealing with the following error:
>
> [sshexec] org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/hadoop1/distcache/-6735763131868259398_188156722_559071878/namenode/tmp/mapred/staging/hadoop1/.staging/job_201204111219_0013/libjars/hbase-0.95-SNAPSHOT.jar
>  [sshexec]     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
>  [sshexec]     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
>  [sshexec]     at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getLocalCache(TrackerDistributedCacheManager.java:172)
>  [sshexec]     at org.apache.hadoop.filecache.TaskDistributedCacheManager.setupCache(TaskDistributedCacheManager.java:187)
>  [sshexec]     at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1212)
>  [sshexec]     at java.security.AccessController.doPrivileged(Native Method)
>  [sshexec]     at javax.security.auth.Subject.doAs(Subject.java:396)
>  [sshexec]     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>  [sshexec]     at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1203)
>  [sshexec]     at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1118)
>  [sshexec]     at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2430)
>  [sshexec]     at java.lang.Thread.run(Thread.java:662)
>  [sshexec]
>

Are these dirs set up out on your cluster?

Google it.  There are a couple of possible explanations.

You might go review how to package a jar for mapreduce.  It can be a little tricky to get right.  Best to ship all of its dependencies in the job jar and keep your cluster CLASSPATH clean.  See the trick where the hbase mapreduce jobs pull the jars they need off the CLASSPATH, down in TableMapReduceUtil#addDependencyJars.  Perhaps also review the hbase story on mapreduce and CLASSPATHing:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath

Good luck lads,


St.Ack

RE: Not a host:port issue

Posted by Tom Wilcox <To...@cse.org.uk>.
I forgot to mention that I am working with Royston on this issue. We have made some progress...

We have managed to get rid of our version/classpath issue by compiling the Java client class on the namenode with the HBase classpath and running it from there (however, we then get errors where client classes have not been distributed to the map nodes).

So if we compile and run on the namenode as follows:

[hadoop1@namenode src]$ javac uk/org/cse/ingestion/SampleUploader.java -cp `hbase classpath`
[hadoop1@namenode src]$ java -cp `hbase classpath` uk.org.cse.ingestion.SampleUploader sample.10.csv tomstable dat no 1>class_output 2>&1

The output from this can be found here: http://pastebin.com/jn5e7E2K

Note that, in contrast to the jar-based runs below (and our previous runs), we get the appropriate zookeeper version in the client output:

12/04/11 12:20:12 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.3-1240972, built on 02/06/2012 10:48 GMT

In contrast to this, if we build a jar from the same place on the namenode and deploy using hadoop as follows:

jar cf SampleUploader.jar src/uk
HADOOP_CLASSPATH=`hbase classpath` hadoop jar SampleUploader.jar uk.org.cse.ingestion.SampleUploader sample.10.csv tomstable dat no 1>jar_output 2>&1 

We get the following output: http://pastebin.com/xG98KfYe

The offending client output shows that we are using the old version of zookeeper:

12/04/11 12:03:25 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT

It seems that the "hadoop" command is introducing the older client API jars into the classpath in place of, or ahead of, the latest ones. Testing the successful class run above with the "hadoop" command in place of "java" gives the same error as with the jar:
 
HADOOP_CLASSPATH=`hbase classpath` hadoop uk.org.cse.ingestion.SampleUploader sample.10.csv tomstable dat no 1>class_output_with_hadoop 2>&1

The output (http://pastebin.com/8tfCqGgV) shows the incorrect zookeeper client in use.

Our hadoop-env.sh file has the following additions to the HADOOP_CLASSPATH: 

# Extra Java CLASSPATH elements.  Optional.
export HADOOP_CLASSPATH="$ZOOKEEPER_INSTALL/*:$HADOOP_CLASSPATH"
export HADOOP_CLASSPATH="$PIGDIR/*:$HADOOP_CLASSPATH"

Commenting out these lines and rerunning gets rid of the version errors! We are now getting what appears to be an unrelated local disk error from the job output, but our issue with the client version mismatch appears to have been resolved by two things:

1) We removed all references to HADOOP_CLASSPATH in hadoop-env.sh and replaced them with the following, so that any initial HADOOP_CLASSPATH settings take precedence:

# Extra Java CLASSPATH elements.  Optional.
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$ZOOKEEPER_INSTALL/*"
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$PIGDIR/*"

2) We ran the job with the following (so that HADOOP_CLASSPATH contained all the appropriate HBase API jars):

HADOOP_CLASSPATH=`hbase classpath` hadoop jar SampleUploader.jar uk.org.cse.ingestion.SampleUploader sample.10.csv tomstable dat no

We are now dealing with the following error:

[sshexec] org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/hadoop1/distcache/-6735763131868259398_188156722_559071878/namenode/tmp/mapred/staging/hadoop1/.staging/job_201204111219_0013/libjars/hbase-0.95-SNAPSHOT.jar
  [sshexec] 	at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
  [sshexec] 	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
  [sshexec] 	at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getLocalCache(TrackerDistributedCacheManager.java:172)
  [sshexec] 	at org.apache.hadoop.filecache.TaskDistributedCacheManager.setupCache(TaskDistributedCacheManager.java:187)
  [sshexec] 	at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1212)
  [sshexec] 	at java.security.AccessController.doPrivileged(Native Method)
  [sshexec] 	at javax.security.auth.Subject.doAs(Subject.java:396)
  [sshexec] 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
  [sshexec] 	at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1203)
  [sshexec] 	at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1118)
  [sshexec] 	at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2430)
  [sshexec] 	at java.lang.Thread.run(Thread.java:662)
  [sshexec]

Thanks,
Tom



-----Original Message-----
From: Tom Wilcox [mailto:Tom.Wilcox@cse.org.uk] 
Sent: 11 April 2012 10:36
To: user@hbase.apache.org
Subject: RE: Not a host:port issue

I am not sure how to confirm which version of the HBase API the client is using. Although we are referencing the HBase-0.95-SNAPSHOT and zookeeper 3.4.3 jars, we are still seeing the following message in the program output when building a job:

[sshexec] 12/04/11 10:35:13 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT

Note that it is stating the zookeeper version as 3.3.3, built in February 2011 (whereas our referenced jars were built yesterday).

How should I be using the HBase jars with our HBase Client Java program to ensure that it is the latest version (and how can I properly confirm this)?
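
One check we could add to the client -- just a sketch -- is to print where the JVM actually loaded the suspect classes from:

  public class WhichJar {
      public static void main(String[] args) {
          // Print the jar each class was actually loaded from; if these
          // point at old hbase/zookeeper jars, the classpath order is wrong.
          System.out.println(org.apache.zookeeper.ZooKeeper.class
                  .getProtectionDomain().getCodeSource().getLocation());
          System.out.println(org.apache.hadoop.hbase.client.HBaseAdmin.class
                  .getProtectionDomain().getCodeSource().getLocation());
      }
  }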

Thanks,
Tom

-----Original Message-----
From: Royston Sellman [mailto:royston.sellman@googlemail.com] 
Sent: 10 April 2012 18:38
To: user@hbase.apache.org
Subject: RE: Not a host:port issue

The CLASSPATH(S) are here: http://pastebin.com/wbwEL9Li
Looks to me like the client is 0.95-SNAPSHOT, as is our HBase server.
However I just noticed the client is built with ZK 3.4.3 but our ZK server is 3.3.3. Is there any incompatibility between those versions of ZK? (I'm going to make them the same but that will take a few minutes :)

Thanks,
Royston



-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: 10 April 2012 17:08
To: user@hbase.apache.org
Subject: Re: Not a host:port issue

On Tue, Apr 10, 2012 at 2:58 AM, Royston Sellman <ro...@googlemail.com> wrote:
>  [sshexec] java.lang.IllegalArgumentException: Not a host:port pair: 
>  [][][]
>

We changed how we persist names to zookeeper in 0.92.x.  It used to be a host:port but now is a ServerName which is host comma port comma startcode and all is prefixed with zk sequenceid.
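
For illustration only (not HBase code; the values are taken from the error at the top of this thread), the old-style parse finds no ':' in the new form:

  public class ZnodeFormatDemo {
      public static void main(String[] args) {
          // Pre-0.92 master znode payload: plain host:port.
          String oldForm = "namenode:60000";
          // 0.92+ payload: ServerName = host,port,startcode (with a
          // sequenceid prefix in the actual znode data).
          String newForm = "namenode,60000,1334048759631";

          System.out.println("old host = " + oldForm.split(":")[0]); // "namenode"
          // An old client doing host:port parsing sees no colon here and
          // throws IllegalArgumentException("Not a host:port pair: ...").
          System.out.println("':' index in new form = " + newForm.indexOf(':')); // -1
      }
  }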

It looks like your mapreduce job is using an old hbase client.  Is that possible?  Can you check its CLASSPATH?

St.Ack


RE: Not a host:port issue

Posted by Tom Wilcox <To...@cse.org.uk>.
I am not sure how to confirm which version of the HBase API the client is using. Although we are referencing the HBase-0.95-SNAPSHOT and zookeeper 3.4.3 jars, we are still seeing the following message in the program output when building a job:

[sshexec] 12/04/11 10:35:13 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT

Note that it is stating the zookeeper version as 3.3.3, built in February 2011 (whereas our referenced jars were built yesterday).

How should I be using the HBase jars with our HBase Client Java program to ensure that it is the latest version (and how can I properly confirm this)?

Thanks,
Tom

-----Original Message-----
From: Royston Sellman [mailto:royston.sellman@googlemail.com] 
Sent: 10 April 2012 18:38
To: user@hbase.apache.org
Subject: RE: Not a host:port issue

The CLASSPATH(S) are here: http://pastebin.com/wbwEL9Li
Looks to me like the client is 0.95-SNAPSHOT, as is our HBase server.
However I just noticed the client is built with ZK 3.4.3 but our ZK server is 3.3.3. Is there any incompatibility between those versions of ZK? (I'm going to make them the same but that will take a few minutes :)

Thanks,
Royston



-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: 10 April 2012 17:08
To: user@hbase.apache.org
Subject: Re: Not a host:port issue

On Tue, Apr 10, 2012 at 2:58 AM, Royston Sellman <ro...@googlemail.com> wrote:
>  [sshexec] java.lang.IllegalArgumentException: Not a host:port pair: 
>  [][][]
>

We changed how we persist names to zookeeper in 0.92.x.  It used to be a host:port but now is a ServerName which is host comma port comma startcode and all is prefixed with zk sequenceid.

It looks like your mapreduce job is using an old hbase client.  Is that possible?  Can you check its CLASSPATH?

St.Ack


RE: Not a host:port issue

Posted by Royston Sellman <ro...@googlemail.com>.
The CLASSPATH(S) are here: http://pastebin.com/wbwEL9Li
Looks to me like the client is 0.95-SNAPSHOT, as is our HBase server.
However I just noticed the client is built with ZK 3.4.3 but our ZK server is 3.3.3. Is there any incompatibility between those versions of ZK? (I'm going to make them the same but that will take a few minutes :)

Thanks,
Royston



-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: 10 April 2012 17:08
To: user@hbase.apache.org
Subject: Re: Not a host:port issue

On Tue, Apr 10, 2012 at 2:58 AM, Royston Sellman <ro...@googlemail.com> wrote:
>  [sshexec] java.lang.IllegalArgumentException: Not a host:port pair: 
>  [][][]
>

We changed how we persist names to zookeeper in 0.92.x.  It used to be a host:port but now is a ServerName which is host comma port comma startcode and all is prefixed with zk sequenceid.

It looks like your mapreduce job is using an old hbase client.  Is that possible?  Can you check its CLASSPATH?

St.Ack


Re: Not a host:port issue

Posted by Stack <st...@duboce.net>.
On Tue, Apr 10, 2012 at 2:58 AM, Royston Sellman
<ro...@googlemail.com> wrote:
>  [sshexec] java.lang.IllegalArgumentException: Not a host:port pair: �[][][]
>

We changed how we persist names to zookeeper in 0.92.x.  It used to be
a host:port but now is a ServerName which is host comma port comma
startcode and all is prefixed with zk sequenceid.

It looks like your mapreduce job is using an old hbase client.  Is
that possible?  Can you check its CLASSPATH?

St.Ack