Posted to user@hbase.apache.org by Michael Segel <mi...@hotmail.com> on 2010/06/01 20:13:57 UTC

Weird problem on one of my clouds...

Ok, 

I didn't set this cloud up, one of my admins did based on the work I did for the other clouds.

When we start up hbase, it looks ok. 
When I enter the hbase shell, I get the shell prompt, and when I enter status, it reports that all region servers are up.

When I type list, I get "10/06/01 13:09:56 INFO ipc.HbaseRPC: Server at /10.8.239.185:60020 could not be reached after 1 tries, giving up."

Ok... when I try tools zk_dump I get the following:
 Region servers:
    - 127.0.0.1:60020
    - 10.8.239.89:60020
    - 127.0.0.1:60020
    - 10.8.239.87:60020
    - 10.8.239.95:60020
    - 127.0.0.1:60020
    - 10.8.239.93:60020
    - 10.8.239.91:60020
    - 127.0.0.1:60020
    - 127.0.0.1:60020
    - 127.0.0.1:60020
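Output like the listing above can be scanned for loopback registrations mechanically. This is a minimal sketch, assuming each entry has the IPv4 "- host:port" shape shown in this zk_dump; the class name LoopbackScan is hypothetical and not part of HBase.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class LoopbackScan {
    // Returns the "host:port" entries from zk_dump output whose host is a
    // loopback address. Assumes IPv4-style "    - 10.8.239.89:60020" lines.
    public static List<String> loopbackEntries(List<String> lines) {
        List<String> bad = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (!line.startsWith("- ")) {
                continue; // skip headers such as "Region servers:"
            }
            String entry = line.substring(2).trim();
            int colon = entry.lastIndexOf(':');
            String host = colon >= 0 ? entry.substring(0, colon) : entry;
            if (isLoopback(host)) {
                bad.add(entry);
            }
        }
        return bad;
    }

    // For an IP literal, getByName only parses the string (no DNS lookup).
    // A name that fails to resolve is treated as "not loopback".
    static boolean isLoopback(String host) {
        try {
            return InetAddress.getByName(host).isLoopbackAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> dump = List.of(
            " Region servers:",
            "    - 127.0.0.1:60020",
            "    - 10.8.239.89:60020",
            "    - 127.0.0.1:60020");
        System.out.println(loopbackEntries(dump)); // prints [127.0.0.1:60020, 127.0.0.1:60020]
    }
}
```

Any entry it reports is a region server that advertised an address no other node can reach.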


This doesn't make sense.
Why would some of my region servers show up as localhost?
All of the servers have the same set of configuration files.

Thoughts?

Thx

-Mike


 		 	   		  

Re: Weird problem on one of my clouds...

Posted by Stack <st...@duboce.net>.
On Tue, Jun 1, 2010 at 11:51 AM, Hegner, Travis <TH...@trilliumit.com> wrote:
> First instinct is that you have a name resolution issue.
>
> Best Bet: set up forward/reverse DNS.

Yeah.  You need both directions (Fixed in TRUNK -- you should be able
to run with just forward in TRUNK)

> Whether it's causing your issue or not is disputable, but I think most people will agree that having a proper name/address resolution system in place for hadoop/hbase is essential.
>
+1

St.Ack

RE: Weird problem on one of my clouds...

Posted by "Hegner, Travis" <TH...@trilliumit.com>.
First instinct is that you have a name resolution issue.

Best Bet: set up forward/reverse DNS.
More difficult: the hosts file on every machine should map every other machine.
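The hosts-file option can at least be checked mechanically. A minimal sketch, assuming the usual "IP name [aliases...]" layout; the class HostsFileCheck and the host names rs1..rs3 are hypothetical, and keeping only the first mapping per name is a simplification of how resolvers read the file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HostsFileCheck {
    // Parses /etc/hosts-style text into name -> IP, ignoring comments.
    // Simplification: the first line that mentions a name wins.
    public static Map<String, String> parse(String hostsText) {
        Map<String, String> byName = new HashMap<>();
        for (String raw : hostsText.split("\n")) {
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) continue;
            String[] fields = line.split("\\s+");
            for (int i = 1; i < fields.length; i++) {
                byName.putIfAbsent(fields[i].toLowerCase(), fields[0]);
            }
        }
        return byName;
    }

    // Simple textual loopback test: IPv4 127.x.x.x or IPv6 ::1.
    static boolean isLoopback(String ip) {
        return ip.startsWith("127.") || ip.equals("::1");
    }

    // Reports every cluster host that is missing from the hosts text or
    // that is mapped to a loopback address.
    public static List<String> problems(String hostsText, List<String> clusterHosts) {
        Map<String, String> byName = parse(hostsText);
        List<String> out = new ArrayList<>();
        for (String host : clusterHosts) {
            String ip = byName.get(host.toLowerCase());
            if (ip == null) {
                out.add(host + ": no entry");
            } else if (isLoopback(ip)) {
                out.add(host + ": mapped to loopback " + ip);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String hosts = "127.0.0.1   localhost rs1\n"
                     + "10.8.239.89 rs2\n";
        System.out.println(problems(hosts, List.of("rs1", "rs2", "rs3")));
        // prints [rs1: mapped to loopback 127.0.0.1, rs3: no entry]
    }
}
```

Feeding it each machine's own /etc/hosts along with the full list of cluster hostnames shows which machines would misreport themselves or fail to find their peers.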

Setting up BIND on your master takes almost no resources and isn't too difficult. I've never checked with the list whether that is recommended; if it isn't, list, yell at me now. I have a tutorial on my website on how to set up an authoritative DNS server running on Ubuntu: http://www.travishegner.com/2009/06/authoritative-dns.html

Whether it's causing your issue or not is disputable, but I think most people will agree that having a proper name/address resolution system in place for hadoop/hbase is essential.

Thanks,

Travis Hegner
http://www.travishegner.com/


Re: Multi-family bulk load/update..

Posted by Vidhyashankar Venkataraman <vi...@yahoo-inc.com>.
>> It doesn't currently, though it should be pretty easy to improve it.
It looked like it. I guess we would need to coordinate across HFiles belonging to different column families.

>> If you would like to work on it, though, feel free!
Right now, we don't have the time but we will let you guys know if we are going to :)

Cheers,
Vidhya


Re: Multi-family bulk load/update..

Posted by Todd Lipcon <to...@cloudera.com>.
On Tue, Jun 1, 2010 at 3:47 PM, Vidhyashankar Venkataraman <
vidhyash@yahoo-inc.com> wrote:

> Hi
>  I just noticed you guys have submitted a patch on bulk incremental upload.
> Sweet.
>
>  Can you let me know if the patch supports multi-family bulk updates? (It
> looks like it does not, wanted to check anyways)..
>

It doesn't currently, though it should be pretty easy to improve it. I hope
to do that in the coming weeks after this first cut is committed. If you
would like to work on it, though, feel free!

Thanks
-Todd



-- 
Todd Lipcon
Software Engineer, Cloudera

Multi-family bulk load/update..

Posted by Vidhyashankar Venkataraman <vi...@yahoo-inc.com>.
Hi
  I just noticed you guys have submitted a patch on bulk incremental upload. Sweet.

  Can you let me know if the patch supports multi-family bulk updates? (It looks like it does not; I wanted to check anyway.)

Thanks
Vidhya

Re: Weird problem on one of my clouds...

Posted by Todd Lipcon <to...@cloudera.com>.
Who wants to volunteer to write the "sanity check DNS across my cluster"
tool?
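One possible shape for such a tool, as a sketch only: the class DnsSanityCheck, its Resolver interface, and the example host names are hypothetical. It checks, per host, that the forward lookup does not land on loopback and that the reverse lookup gives the same name back, using the JVM's own resolver by default.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class DnsSanityCheck {
    // Pluggable lookups so the check can be exercised without real DNS.
    public interface Resolver {
        String forward(String host) throws UnknownHostException; // name -> IP
        String reverse(String ip) throws UnknownHostException;   // IP -> name
    }

    // Uses the JVM's resolver (hosts file and/or DNS, per the OS config).
    public static final Resolver SYSTEM = new Resolver() {
        public String forward(String host) throws UnknownHostException {
            return InetAddress.getByName(host).getHostAddress();
        }
        public String reverse(String ip) throws UnknownHostException {
            // Returns the IP text itself if the reverse lookup fails.
            return InetAddress.getByName(ip).getCanonicalHostName();
        }
    };

    static boolean isLoopback(String ip) {
        return ip.startsWith("127.") || ip.equals("::1") || ip.equals("0:0:0:0:0:0:0:1");
    }

    // Flags hosts that do not resolve, resolve to loopback, or whose reverse
    // lookup does not give back the same name. Assumes hosts are listed by
    // the name reverse DNS returns (e.g. fully qualified).
    public static List<String> check(List<String> hosts, Resolver r) {
        List<String> problems = new ArrayList<>();
        for (String host : hosts) {
            try {
                String ip = r.forward(host);
                if (isLoopback(ip)) {
                    problems.add(host + ": resolves to loopback " + ip);
                    continue;
                }
                String back = r.reverse(ip);
                if (!back.equalsIgnoreCase(host)) {
                    problems.add(host + ": " + ip + " reverses to " + back);
                }
            } catch (UnknownHostException e) {
                problems.add(host + ": does not resolve");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Usage (hypothetical names): java DnsSanityCheck rs1.example.com rs2.example.com
        List<String> problems = check(List.of(args), SYSTEM);
        problems.forEach(System.out::println);
        System.out.println(problems.isEmpty() ? "DNS looks sane" : problems.size() + " problem(s)");
    }
}
```

Because the failure in this thread came from a per-machine /etc/hosts, a check like this only helps if it runs on every node, not just the master.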

-Todd





-- 
Todd Lipcon
Software Engineer, Cloudera

RE: Weird problem on one of my clouds...

Posted by Michael Segel <mi...@hotmail.com>.

Actually within 5 mins of sending out this puzzle, I found the answer...

In the machine's /etc/hosts, there wasn't an entry for the machine's IP address, only the loopback localhost entry.
When we added the entry on those machines, everything worked.

I didn't check the logs, but I bet that you are correct.

Thanks!

-Mike

 		 	   		  

Re: Weird problem on one of my clouds...

Posted by Stack <st...@duboce.net>.
On Tue, Jun 1, 2010 at 11:13 AM, Michael Segel
<mi...@hotmail.com> wrote:
> Ok... when I try tools zk_dump I get the following:
>  Region servers:
>    - 127.0.0.1:60020
>    - 10.8.239.89:60020
>    - 127.0.0.1:60020
>    - 10.8.239.87:60020
>    - 10.8.239.95:60020
>    - 127.0.0.1:60020
>    - 10.8.239.93:60020
>    - 10.8.239.91:60020
>    - 127.0.0.1:60020
>    - 127.0.0.1:60020
>    - 127.0.0.1:60020
>
>
> This doesn't make sense.
> Why would some of my region servers show up saying that they are local hosts?

Because that's what it's getting for its machine name. See the head of
HRegionServer around line 242, where it does this to figure out who it is:

    machineName = DNS.getDefaultHost(
        conf.get("hbase.regionserver.dns.interface","default"),
        conf.get("hbase.regionserver.dns.nameserver","default"));
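Both settings above default to "default". As an assumption to verify against your Hadoop version's DNS class, a "default" interface appears to fall back to the JVM's own local host lookup, which on a typical Linux setup consults /etc/hosts before DNS. The sketch below (hypothetical class WhoAmI) prints what the JVM thinks the local host is, so it can be run on each region server to see which ones come back as 127.0.0.1.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class WhoAmI {
    // Shows the address a process would advertise and flags loopback.
    public static String describe(InetAddress addr) {
        String ip = addr.getHostAddress();
        if (addr.isLoopbackAddress()) {
            return ip + " (LOOPBACK: other nodes cannot reach this)";
        }
        return ip + " (ok)";
    }

    public static void main(String[] args) throws UnknownHostException {
        // Assumed to approximate what the "default" interface setting relies on.
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("hostname:  " + local.getHostName());
        System.out.println("canonical: " + local.getCanonicalHostName());
        System.out.println("address:   " + describe(local));
    }
}
```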

> All of the servers have the same set of configuration files.
>
If you look in the regionserver log, does it say that it's registered
on 127.0.0.1:60020?

St.Ack