Posted to user@whirr.apache.org by Frédéric Cons <fr...@gmail.com> on 2012/03/12 14:12:25 UTC

Hbase + ec2 deployment issue

Hi whirr users

I'm trying to deploy a small HBase cluster on EC2, and I'm hitting the
following network issue.

Here's my config file:

whirr.cluster-name=my-hbase-cluster
whirr.instance-templates=1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master,2 hadoop-datanode+hadoop-tasktracker+hbase-regionserver
hbase-site.dfs.replication=1
whirr.provider=aws-ec2
whirr.identity=<my_id>
whirr.credential=<my_cred>
whirr.hardware-id=m1.large
whirr.image-id=eu-west-1/ami-895069fd
whirr.location-id=eu-west-1
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa_whirr
whirr.public-key-file=${whirr.private-key-file}.pub
whirr.hbase.tarball.url=http://apache.cict.fr/hbase/hbase-0.90.4/hbase-0.90.4.tar.gz
whirr.hadoop.tarball.url=http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u1.tar.gz

So the configuration is pretty standard...

The problem is that the region servers can't talk to the master server,
because port 60000 (the HBase master RPC port, if I understand correctly)
does not seem to be open.

* Through telnet:
telnet ip-10-58-170-126.eu-west-1.compute.internal 60000
Trying 10.58.170.126...
telnet: Unable to connect to remote host: Connection refused

* In the region server log:

2012-03-12 12:53:20,889 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect to Master server at ip-10-58-170-126.eu-west-1.compute.internal:60000
2012-03-12 12:54:21,000 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to master. Retrying. Error was:
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
        at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
        at $Proxy5.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
        at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:1462)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1515)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.tryReportForDuty(HRegionServer.java:1499)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:572)
        at java.lang.Thread.run(Thread.java:662)


Note that other Hadoop / HBase related ports look fine: I can telnet from
the region server to the master on port 60010, for example.
The Hadoop logs on the region servers (which also act as datanodes /
tasktrackers) look fine.

The EC2 security group also looks fine: ports 1 - 65535 for TCP and UDP
seem to be open for the whole security group.

I'm using Whirr 0.7.1, and I have tried various Ubuntu AMIs and
HBase + Hadoop combinations.

Any idea what's going on here?

Best regards
Fred

Re: Hbase + ec2 deployment issue

Posted by Andrei Savu <sa...@gmail.com>.
Hi Frédéric,

That's really strange. Can you confirm the HBase master is running and
listening? Have you tried in a different Amazon location? Same behaviour?

I know Tom had a similar issue some time ago, and it seemed to be caused by
the Amazon account having too many keypairs or security groups (not sure
which).

Is there anything out of the ordinary? Can you use a different account?

-- Andrei Savu / andreisavu.ro


Re: Hbase + ec2 deployment issue

Posted by Ioannis Canellos <io...@gmail.com>.
Usually, when a port is blocked, the error you get is "connection timed
out", not "connection refused".
That makes me believe that either the service is not running or the client
is mistakenly attempting to connect to the wrong host.
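
The refused-versus-timeout distinction above can be checked from a region server with a few lines of Python. This is only a minimal sketch: the function name `probe` and the 3 second default timeout are choices made for this illustration, not anything taken from Whirr or HBase.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify one TCP connection attempt to host:port.

    Returns "open" if the connection succeeds, "refused" if the remote
    host actively rejects it (nothing is listening on that address and
    port), "timeout" if no answer arrives in time (typical of a packet
    being dropped by a firewall), and "error" for any other failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"
```

Calling `probe("ip-10-58-170-126.eu-west-1.compute.internal", 60000)` from a region server should mirror what telnet reports in the original post; a "refused" result points at the listener on the master rather than at the security group.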

-- 
Ioannis Canellos
FuseSource <http://fusesource.com>
Blog: http://iocanel.blogspot.com
Twitter: iocanel

Re: Hbase + ec2 deployment issue

Posted by Andrei Savu <sa...@gmail.com>.
2012/3/13 Frédéric Cons <fr...@gmail.com>

> Using the 'host' command from my computer gives the following output :
>
> host 46.137.24.14
> 14.24.137.46.in-addr.arpa domain name pointer
> ec2-46-137-24-14.eu-west-1.compute.amazonaws.com.
> host 46.51.133.246
> 246.133.51.46.in-addr.arpa domain name pointer
> ec2-46-51-133-246.eu-west-1.compute.amazonaws.com.
> host 79.125.35.234
> 234.35.125.79.in-addr.arpa domain name pointer
> ec2-79-125-35-234.eu-west-1.compute.amazonaws.com
>

Seems to be working as expected.


>
> Note that the regionservers do correctly try to find the master on its
> internal hostname (ip-10-227-58-207.eu-west-1.compute.internal, aka
> 10.227.58.207), and that a few telnet commands give me:
>
> telnet ip-10-227-58-207.eu-west-1.compute.internal 60000
> Trying 10.227.58.207...
> telnet: Unable to connect to remote host: Connection refused
>
> telnet ip-10-227-58-207.eu-west-1.compute.internal 60010
> Trying 10.227.58.207...
> Connected to ip-10-227-58-207.eu-west-1.compute.internal.
> Escape character is '^]'.
>
> Meanwhile on the master :
> fred@ip-10-227-58-207:~$ sudo netstat -npa | grep 60000
> tcp        0      0 127.0.1.1:60000         0.0.0.0:*               LISTEN      16502/java
> fred@ip-10-227-58-207:~$ sudo netstat -npa | grep 60010
> tcp        0      0 0.0.0.0:60010           0.0.0.0:*               LISTEN      16502/java
>
> (note the 127.0.1.1 vs 0.0.0.0 host address)
>
> And when I telnet these master ports from the master itself :
>
> telnet (localhost|127.0.0.1|127.0.1.1) 60010 are ok
>
> but only
>
> telnet 127.0.1.1 60000 is ok (meaning that even using 'localhost' does not
> work for port 60000)
>
> So the question is: why is there this discrepancy in local addresses for
> two different ports, but for the same process?
>

It may be a configuration / socket binding issue. Not sure if this is an
HBase bug or if we are doing something wrong.

I will take your recipe and try to start a cluster from my machine to check
if I see the same behaviour.
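
One possible cause of such a binding issue, offered here as a guess and not confirmed anywhere in this thread, is that the master binds its RPC port to whatever address its own hostname resolves to, and that the image's /etc/hosts maps the hostname to 127.0.1.1 (a common Debian/Ubuntu default). The sketch below checks a hosts-file style text for that situation. The helper names and the sample content are hypothetical; the real file on the master would need to be inspected.

```python
import ipaddress


def lookup_in_hosts(hosts_text, hostname):
    """Return the first address that hosts-file style text maps to hostname, or None."""
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if hostname in names:
            return address
    return None


def is_loopback(address):
    """True for any address in 127.0.0.0/8 (or ::1)."""
    return ipaddress.ip_address(address).is_loopback


# Hypothetical /etc/hosts content, assumed for illustration only.
SAMPLE_HOSTS = """\
127.0.0.1 localhost
127.0.1.1 ip-10-227-58-207
"""

address = lookup_in_hosts(SAMPLE_HOSTS, "ip-10-227-58-207")
if address is not None and is_loopback(address):
    print("hostname resolves to loopback address", address)
```

If the hostname does resolve to a loopback address on the master, a service that binds to "its own" address would be reachable only from the master itself, which would be consistent with the netstat output quoted above.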


> I guess it's more an hbase question than a whirr one, but if anyone here
> has a hint, I'd love to hear it :)
>
> Regards
> Fred

Re: Hbase + ec2 deployment issue

Posted by Frédéric Cons <fr...@gmail.com>.
Using the 'host' command from my computer gives the following output :

host 46.137.24.14
14.24.137.46.in-addr.arpa domain name pointer
ec2-46-137-24-14.eu-west-1.compute.amazonaws.com.
host 46.51.133.246
246.133.51.46.in-addr.arpa domain name pointer
ec2-46-51-133-246.eu-west-1.compute.amazonaws.com.
host 79.125.35.234
234.35.125.79.in-addr.arpa domain name pointer
ec2-79-125-35-234.eu-west-1.compute.amazonaws.com

Note that the regionservers do correctly try to find the master on its
internal hostname (ip-10-227-58-207.eu-west-1.compute.internal, aka
10.227.58.207), and that a few telnet commands give me:

telnet ip-10-227-58-207.eu-west-1.compute.internal 60000
Trying 10.227.58.207...
telnet: Unable to connect to remote host: Connection refused

telnet ip-10-227-58-207.eu-west-1.compute.internal 60010
Trying 10.227.58.207...
Connected to ip-10-227-58-207.eu-west-1.compute.internal.
Escape character is '^]'.

Meanwhile on the master :
fred@ip-10-227-58-207:~$ sudo netstat -npa | grep 60000
tcp        0      0 127.0.1.1:60000         0.0.0.0:*               LISTEN      16502/java
fred@ip-10-227-58-207:~$ sudo netstat -npa | grep 60010
tcp        0      0 0.0.0.0:60010           0.0.0.0:*               LISTEN      16502/java

(note the 127.0.1.1 vs 0.0.0.0 host address)

And when I telnet these master ports from the master itself :

telnet (localhost|127.0.0.1|127.0.1.1) 60010 all work

but only

telnet 127.0.1.1 60000 works (meaning that even using 'localhost' does not
work for port 60000)

So the question is: why is there this discrepancy in local addresses for two
different ports, but for the same process?
I guess it's more an hbase question than a whirr one, but if anyone here
has a hint, I'd love to hear it :)

Regards
Fred
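
The netstat check in the message above can be automated. The following is a rough sketch that scans `netstat -npa` style output for TCP listeners bound to a loopback address, which remote hosts cannot reach. The function name `loopback_listeners` is made up for this example, and the sample text simply repeats the two lines reported above.

```python
import ipaddress


def loopback_listeners(netstat_output):
    """Return (address, port) pairs for IPv4 TCP listeners bound to loopback."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state, pid/program
        if len(fields) < 6 or fields[0] != "tcp" or fields[5] != "LISTEN":
            continue
        address, _, port = fields[3].rpartition(":")
        if ipaddress.ip_address(address).is_loopback:
            found.append((address, int(port)))
    return found


# The two listener lines reported in the message above.
NETSTAT_SAMPLE = """\
tcp        0      0 127.0.1.1:60000         0.0.0.0:*               LISTEN      16502/java
tcp        0      0 0.0.0.0:60010           0.0.0.0:*               LISTEN      16502/java
"""

for address, port in loopback_listeners(NETSTAT_SAMPLE):
    print("port", port, "is only reachable locally via", address)
```

A listener on 0.0.0.0 accepts connections on every interface, whereas one on 127.0.1.1 accepts them only when the client targets exactly that address, which matches the telnet results above.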



Re: Hbase + ec2 deployment issue

Posted by Andrei Savu <sa...@gmail.com>.
Frédéric, can you perform reverse DNS queries from your local machine for
the public IP addresses of the VMs started in Amazon?


Re: Hbase + ec2 deployment issue

Posted by Frédéric Cons <fr...@gmail.com>.
Hi Andrei,
I tried another AWS account and another region; still no luck...
The master process is running (looping on 'waiting for region servers
to check in' messages).
As I managed to make it work 2 weeks ago, I also suspect it is a weird AWS
account issue.
I'll update this thread if I finally find a solution.
Thank you
Fred


Re: Hbase + ec2 deployment issue

Posted by Andrei Savu <sa...@gmail.com>.
This is what the recipe we use for integration testing looks like:
https://github.com/andreisavu/whirr/blob/trunk/services/hbase/src/test/resources/whirr-hbase-0.90-test.properties

If you specify only the location-id and no image-id, Whirr should be able to
find the right image for you.
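
As a rough sketch only (assuming the rest of the configuration quoted below
stays unchanged, and not verified against a running cluster), that would mean
dropping the whirr.image-id line and keeping the location:

whirr.provider=aws-ec2
whirr.hardware-id=m1.large
# whirr.image-id is intentionally left out so that Whirr selects an image itself
whirr.location-id=eu-west-1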

2012/3/12 Frédéric Cons <fr...@gmail.com>

> Hi whirr users
>
> I'm trying to deploy a small hbase cluster on ec2, and I'm hitting the
> following network issue
>
> Here's my config file :
>
> whirr.cluster-name=my-hbase-cluster
> whirr.instance-templates=1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master,2 hadoop-datanode+hadoop-tasktracker+hbase-regionserver
> hbase-site.dfs.replication=1
> whirr.provider=aws-ec2
> whirr.identity=<my_id>
> whirr.credential=<my_cred>
> whirr.hardware-id=m1.large
> whirr.image-id=eu-west-1/ami-895069fd
> whirr.location-id=eu-west-1
> whirr.private-key-file=${sys:user.home}/.ssh/id_rsa_whirr
> whirr.public-key-file=${whirr.private-key-file}.pub
> whirr.hbase.tarball.url=http://apache.cict.fr/hbase/hbase-0.90.4/hbase-0.90.4.tar.gz
> whirr.hadoop.tarball.url=http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u1.tar.gz
>
> So the configuration is pretty standard...
>
> The problem is: the region servers can't talk to the master server,
> because port 60000 does not seem to be open (the HBase master RPC port, if
> I understand correctly)
>
> * Through telnet:
> telnet ip-10-58-170-126.eu-west-1.compute.internal 60000
> Trying 10.58.170.126...
> telnet: Unable to connect to remote host: Connection refused
>
> * In the region server log :
>
> 2012-03-12 12:53:20,889 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect to
> Master server at ip-10-58-170-126.eu-west-1.compute.internal:60000
> 2012-03-12 12:54:21,000 WARN
> org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to
> master. Retrying. Error was:
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>         at
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
>         at
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
>         at
> org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
>         at
> org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
>         at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>         at $Proxy5.getProtocolVersion(Unknown Source)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
>         at
> org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
>         at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:1462)
>         at
> org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1515)
>         at
> org.apache.hadoop.hbase.regionserver.HRegionServer.tryReportForDuty(HRegionServer.java:1499)
>         at
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:572)
>         at java.lang.Thread.run(Thread.java:662)
>
>
> Note that other Hadoop / HBase related ports look fine: I can telnet from
> the region server to the master on port 60010, for example.
> The Hadoop logs on the region servers (which also act as datanodes /
> tasktrackers) look fine.
>
> The EC2 security group also looks fine: ports 1 - 65535 for TCP and UDP
> seem to be open for the whole security group.
>
> I'm using Whirr 0.7.1, and have tried various Ubuntu AMIs and HBase + Hadoop
> combinations.
>
> Any idea what's going on here?
>
> Best regards
> Fred