Posted to user@hbase.apache.org by Bryan Beaudreault <bb...@hubspot.com> on 2015/05/22 19:17:59 UTC

DNS mismatch between master and regionserver causes doubly registered regionservers

In our system each server has two DNS names associated with it: one always
points to a private address and the other to a public or private address
depending on the context.

This issue did not show up in 0.94.x, but is showing up on my new 1.x
cluster.  Basically it goes like this:

1. Regionserver starts up and gets its hostname, which resolves to
`hostA.external` due to our /etc/hosts
2. Regionserver registers itself in zookeeper as `hostA.external`
3. Regionserver reports for duty to the HMaster, which re-resolves the
hostname via DNS and gets `hostA.internal`.
4. HMaster registers server as `hostA.internal`
5. Regionserver receives the RegionServerStartupResponse, which contains
`hostA.internal` and uses that for its RPCs
6. HMaster sees a ZNode with `hostA.external`, so thinks it is a
regionserver that hasn't checked in yet, and registers it.
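
One way to check for the mismatch in steps 1 and 3 is to compare the name a JVM reports for itself with the name a reverse lookup returns. The sketch below is a hypothetical diagnostic, not HBase code: the class `HostnameCheck` and its methods are made up for illustration, and it assumes the master derives the regionserver's name from a reverse lookup of its address. Run it with no arguments on the regionserver, and with the regionserver's IP as the argument on the master host.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;

// Hypothetical helper for illustration only; nothing here is an HBase API.
final class HostnameCheck {

    /** True when two views of a hostname differ, ignoring case and a trailing dot. */
    static boolean mismatch(String selfView, String peerView) {
        return !normalize(selfView).equals(normalize(peerView));
    }

    static String normalize(String host) {
        String h = host.trim().toLowerCase(Locale.ROOT);
        return h.endsWith(".") ? h.substring(0, h.length() - 1) : h;
    }

    public static void main(String[] args) throws UnknownHostException {
        if (args.length > 0) {
            // On the master host: reverse-resolve the regionserver's IP.
            String seenFromHere = InetAddress.getByName(args[0]).getCanonicalHostName();
            System.out.println("reverse lookup of " + args[0] + ": " + seenFromHere);
            return;
        }
        // On the regionserver: the name this JVM reports for itself
        // (on most systems /etc/hosts is consulted before DNS).
        InetAddress local = InetAddress.getLocalHost();
        String selfView = local.getHostName();
        // Reverse lookup of the same address, starting from the raw bytes
        // so that no cached hostname is reused.
        String reverseView =
            InetAddress.getByAddress(local.getAddress()).getCanonicalHostName();
        System.out.println("self-reported name: " + selfView);
        System.out.println("reverse lookup:     " + reverseView);
        System.out.println("mismatch:           " + mismatch(selfView, reverseView));
    }
}
```

If the name printed on the master host differs from the self-reported name on the regionserver, the sequence above is likely to apply.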

So I think the problem is that step #2 happens before step #5.  You can
clearly see this in the HRegionServer.java run() function.

In 0.94, the `createMyEphemeralNode` function was called within
`handleReportForDutyResponse`.  In 1.x, it happens within `run()` BEFORE
`handleReportForDutyResponse`.
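
To make the ordering problem concrete, here is a toy model of the 1.x startup sequence. It is only a sketch under assumptions: `ToyMaster`, `reportForDuty`, `scanZnodes` and `startup1x` are invented names, the sets stand in for ZooKeeper and the master's server list, and none of it is the actual HRegionServer or HMaster code.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model for illustration only; not HBase code.
final class DoubleRegistrationDemo {

    /** Stand-in for the master's view of live regionservers. */
    static final class ToyMaster {
        final Set<String> knownServers = new LinkedHashSet<>();

        // Steps 3-4: the master registers the name it resolved itself
        // and returns that name to the regionserver.
        String reportForDuty(String nameFromMasterPov) {
            knownServers.add(nameFromMasterPov);
            return nameFromMasterPov;
        }

        // Step 6: a znode with an unknown name is treated as a regionserver
        // that has not checked in yet, so it gets registered too.
        void scanZnodes(Set<String> znodes) {
            knownServers.addAll(znodes);
        }
    }

    /** 1.x ordering: the znode is created before the master's answer is known. */
    static Set<String> startup1x(String localName, String nameFromMasterPov) {
        Set<String> znodes = new LinkedHashSet<>();
        ToyMaster master = new ToyMaster();
        znodes.add(localName);                   // step 2
        master.reportForDuty(nameFromMasterPov); // steps 3-5
        master.scanZnodes(znodes);               // step 6
        return master.knownServers;
    }

    public static void main(String[] args) {
        // prints [hostA.internal, hostA.external]
        System.out.println(startup1x("hostA.external", "hostA.internal"));
    }
}
```

With identical names on both sides the set ends up with a single entry, which matches the observation that the problem only appears when the two lookups disagree.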


I can work around this by handling /etc/hosts specially for my
regionservers.  We have our /etc/hosts file set up like this for a reason,
but I think I can special case regionservers.

However, it seems like a bug that there are mechanisms built in for the
HMaster to determine the RegionServer hostname, but that these mechanisms
do not account for doubly-registered regionservers due to zookeeper and
hmaster mismatch.

I tried to create a JIRA for this, but either my username no longer has
permissions for creating, or I can't find the place to create them
anymore.  Any help?
https://issues.apache.org/jira/secure/ViewProfile.jspa?name=bbeaudreault

Re: DNS mismatch between master and regionserver causes doubly registered regionservers

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Thank you guys for the help. I'm reading through the comments now to try to
get a handle on why this changed.

Looking forward to seeing HBASE-12954 in CDH5.4.3

On Fri, May 22, 2015 at 4:02 PM, Esteban Gutierrez <es...@cloudera.com>
wrote:

> Correct, but setting hbase.regionserver.hostname should be enough if I
> remember correctly. You also need to define hbase.master.hostname if you
> are using HBase 1.1.
>
> cheers,
> esteban.

Re: DNS mismatch between master and regionserver causes doubly registered regionservers

Posted by Esteban Gutierrez <es...@cloudera.com>.
Correct, but setting hbase.regionserver.hostname should be enough if I
remember correctly. You also need to define hbase.master.hostname if you
are using HBase 1.1.
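
As a rough illustration of the two properties named above, the sketch below renders them in the `<property>` layout used by hbase-site.xml. The property names come from this thread; the values `hostA.internal` and `master.internal`, the class `HostnameConfigSketch`, and the assumption that each property goes into the hbase-site.xml of the corresponding daemon are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: renders properties in hbase-site.xml layout.
// Values are placeholders; no XML escaping is done, which is fine for plain hostnames.
final class HostnameConfigSketch {

    static String toHbaseSiteXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n")
              .append("    <name>").append(e.getKey()).append("</name>\n")
              .append("    <value>").append(e.getValue()).append("</value>\n")
              .append("  </property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        // Assumed to go on the regionserver: the name the master would resolve for it.
        Map<String, String> regionserver = new LinkedHashMap<>();
        regionserver.put("hbase.regionserver.hostname", "hostA.internal");
        System.out.print(toHbaseSiteXml(regionserver));

        // Assumed to go on the master when running HBase 1.1.
        Map<String, String> master = new LinkedHashMap<>();
        master.put("hbase.master.hostname", "master.internal");
        System.out.print(toHbaseSiteXml(master));
    }
}
```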

cheers,
esteban.

--
Cloudera, Inc.


On Fri, May 22, 2015 at 12:55 PM, Bryan Beaudreault <
bbeaudreault@hubspot.com> wrote:

> Thanks Esteban.  So the idea is you set hbase.master.dns.* on the master
> side, and hbase.regionserver.hostname to a value matching what the master
> DNS server would return on the regionserver side?

Re: DNS mismatch between master and regionserver causes doubly registered regionservers

Posted by Ted Yu <yu...@gmail.com>.
Bryan:
HBASE-12954 introduced config for region server hostname.

The following added config for master hostname:
HBASE-13481 Master should respect master (old) DNS/bind related
configurations

I will link the above JIRA to HBASE-12954

Cheers

On Fri, May 22, 2015 at 12:55 PM, Bryan Beaudreault <
bbeaudreault@hubspot.com> wrote:

> Thanks Esteban.  So the idea is you set hbase.master.dns.* on the master
> side, and hbase.regionserver.hostname to a value matching what the master
> DNS server would return on the regionserver side?

Re: DNS mismatch between master and regionserver causes doubly registered regionservers

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Thanks Esteban.  So the idea is you set hbase.master.dns.* on the master
side, and hbase.regionserver.hostname to a value matching what the master
DNS server would return on the regionserver side?

On Fri, May 22, 2015 at 3:51 PM, Esteban Gutierrez <es...@cloudera.com>
wrote:

> Hi Bryan,
>
> The HBase 1.0.0 release won't honor the hbase.master.dns.* settings, so that
> might be causing the issue you mention. 1.1, however, includes
> HBASE-12954, so you can also solve the problem by specifying a custom
> hostname.
>
> thanks,
> esteban.

Re: DNS mismatch between master and regionserver causes doubly registered regionservers

Posted by Esteban Gutierrez <es...@cloudera.com>.
Hi Bryan,

The HBase 1.0.0 release won't honor the hbase.master.dns.* settings, so that
might be causing the issue you mention. 1.1, however, includes
HBASE-12954, so you can also solve the problem by specifying a custom
hostname.

thanks,
esteban.

--
Cloudera, Inc.


On Fri, May 22, 2015 at 12:23 PM, Bryan Beaudreault <
bbeaudreault@hubspot.com> wrote:

> Thank you Esteban.  I checked two different versions:
>
> - hbase-1.0.0-cdh5.4.0 (this is the version I use)
> - hbase-1.1.0.1 (just wanted to check the latest release)

Re: DNS mismatch between master and regionserver causes doubly registered regionservers

Posted by Bryan Beaudreault <bb...@hubspot.com>.
HBASE-12954 looks like it would solve my issue, but is not in cdh5.4.0.  I
also don't think it fixes what I think the real bug is -- it's more of a
workaround.

In terms of the actual bug, I think one of at least two possible solutions
should be considered:

1. Remove the support for hostnameFromMasterPOV in
HRegionServer#handleReportForDutyResponse

2. Move HRegionServer#createMyEphemeralNode in HRegionServer#run to *after* the
call to HRegionServer#handleReportForDutyResponse.  This way, any new
hostname returned by the HMaster would be reflected in the ZNodes created
in createMyEphemeralNode.

The latter seems like the better fix, since it doesn't remove any
functionality.  Of course, there might be historical reasons for this
ordering that I am not aware of.
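
A toy sketch of option 2, under the same caveats as any model: `startupReordered` and the sets standing in for ZooKeeper and the master's server list are invented for illustration and are not the real HRegionServer code. The point is only that creating the ephemeral node after the master's response lets the znode carry the name the master already knows.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model of the proposed reordering; not HBase code.
final class ReorderedStartupDemo {

    /**
     * Report for duty first, then create the znode using the name the
     * master handed back, and finally let the master scan the znodes.
     */
    static Set<String> startupReordered(String localName, String nameFromMasterPov) {
        Set<String> masterKnownServers = new LinkedHashSet<>();
        Set<String> znodes = new LinkedHashSet<>();

        // The master registers the name it resolved and returns it.
        masterKnownServers.add(nameFromMasterPov);

        // The regionserver adopts the returned name; localName is deliberately
        // unused from here on, and the znode is created only now.
        String effectiveName = nameFromMasterPov;
        znodes.add(effectiveName);

        // The master's znode scan finds only a name it already knows.
        masterKnownServers.addAll(znodes);
        return masterKnownServers;
    }

    public static void main(String[] args) {
        // prints [hostA.internal]
        System.out.println(startupReordered("hostA.external", "hostA.internal"));
    }
}
```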

bq. To my knowledge, latest release was 1.1.0. The release before that was
1.0.1

I went to http://www.apache.org/dyn/closer.cgi/hbase/, chose a mirror, and
chose the latest. http://mirror.metrocast.net/apache/hbase/

I just verified that the snippets of the run() function I've referred to
are pretty much identical in the latest stable,
http://mirror.metrocast.net/apache/hbase/stable/ (1.0.1.1) as well

On Fri, May 22, 2015 at 3:34 PM, Ted Yu <yu...@gmail.com> wrote:

> bq. hbase-1.1.0.1
>
> To my knowledge, latest release was 1.1.0. The release before that was
> 1.0.1
>
> Can you clarify ?
>
> Thanks

Re: DNS mismatch between master and regionserver causes doubly registered regionservers

Posted by Andrew Purtell <ap...@apache.org>.
There's always some delay between when release artifacts are sent onward to
the mirrors and when the announcements go out, for various reasons. We made
three patch releases this week. I have it on good authority the
announcements for the current crop of releases will go out this coming
Monday.


On Fri, May 22, 2015 at 1:25 PM, Sean Busbey <bu...@cloudera.com> wrote:

> On Fri, May 22, 2015 at 2:34 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > bq. hbase-1.1.0.1
> >
> > To my knowledge, latest release was 1.1.0. The release before that was
> > 1.0.1
> >
> > Can you clarify ?
> >
> > Thanks
> >
> >
> The 1.1.0.1 release votes all passed. I don't think the announcement has
> gone out yet because we were waiting for mirror propagation, but the
> artifacts likely show up places.
>
> --
> Sean

Re: DNS mismatch between master and regionserver causes doubly registered regionservers

Posted by Sean Busbey <bu...@cloudera.com>.
On Fri, May 22, 2015 at 2:34 PM, Ted Yu <yu...@gmail.com> wrote:

> bq. hbase-1.1.0.1
>
> To my knowledge, latest release was 1.1.0. The release before that was
> 1.0.1
>
> Can you clarify ?
>
> Thanks
>
>
The 1.1.0.1 release votes all passed. I don't think the announcement has
gone out yet because we were waiting for mirror propagation, but the
artifacts are likely already showing up in places.

-- 
Sean

Re: DNS mismatch between master and regionserver causes doubly registered regionservers

Posted by Ted Yu <yu...@gmail.com>.
bq. hbase-1.1.0.1

To my knowledge, the latest release was 1.1.0. The release before that was 1.0.1.

Can you clarify?

Thanks

On Fri, May 22, 2015 at 12:23 PM, Bryan Beaudreault <
bbeaudreault@hubspot.com> wrote:

> Thank you Esteban.  I checked two different versions:
>
> - hbase-1.0.0-cdh5.4.0 (this is the version I use)
> - hbase-1.1.0.1 (just wanted to check the latest release)

Re: DNS mismatch between master and regionserver causes doubly registered regionservers

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Thank you Esteban.  I checked two different versions:

- hbase-1.0.0-cdh5.4.0 (this is the version I use)
- hbase-1.1.0.1 (just wanted to check the latest release)

On Fri, May 22, 2015 at 3:13 PM, Esteban Gutierrez <es...@cloudera.com>
wrote:

> Hi Bryan,
>
> Could you please be more specific about the 1.x version that you are using?
> We have HBASE-13481 and HBASE-12954, so it depends on which version of 1.x
> you are using.
>
> Regarding your account issue, I have created an INFRA JIRA on your behalf
> to look into your account problem.
>
> thanks,
> esteban.
>
>
>
> --
> Cloudera, Inc.
>
>
> On Fri, May 22, 2015 at 10:17 AM, Bryan Beaudreault <
> bbeaudreault@hubspot.com> wrote:
>
> > In our system each server has 2 dns associated with it, one always points
> > to a private address and the other to public or private depending on the
> > context.
> >
> > This issue did not show up in 0.94.x, but is showing up on my new 1.x
> > cluster.  Basically it goes like this:
> >
> > 1. Regionserver starts up, gets its hostname which returns
> > `hostA.external` due to our /etc/hosts
> > 2. Regionserver registers itself in zookeeper as `hostA.external`
> > 3. Regionserver reports for duty to HMaster, which re-resolves the DNS
> > and returns `hostA.internal`.
> > 4. HMaster registers server as `hostA.internal`
> > 5. Regionserver receives the RegionServerStartupResponse, which contains
> > `hostA.internal` and uses that for its RPCs
> > 6. HMaster sees a ZNode with `hostA.external`, so thinks it is a
> > regionserver that hasn't checked in yet, and registers it.
> >
> > So I think the problem is that step #2 happens before step #5.  You can
> > clearly see this in the HRegionServer.java run() function.
> >
> > In 0.94, the `createMyEphemeralNode` function was called within
> > `handleReportForDutyResponse`.  In 1.x, it happens within `run()` BEFORE
> > `handleReportForDutyResponse`.
> >
> >
> > I can work around this by handling /etc/hosts specially for my
> > regionservers.  We have our /etc/hosts file set up like this for a reason,
> > but I think I can special case regionservers.
> >
> > However, it seems like a bug that there are mechanisms built in for the
> > HMaster to determine the RegionServer hostname, but that these mechanisms
> > do not account for doubly-registered regionservers due to zookeeper and
> > hmaster mismatch.
> >
> > I tried to create a JIRA for this, but either my username no longer has
> > permissions for creating, or I can't find the place to create them
> > anymore.  Any help?
> > https://issues.apache.org/jira/secure/ViewProfile.jspa?name=bbeaudreault
> >
>

Re: DNS mismatch between master and regionserver causes doubly registered regionservers

Posted by Esteban Gutierrez <es...@cloudera.com>.
Hi Bryan,

Could you please be more specific about the 1.x version that you are using?
We have HBASE-13481 and HBASE-12954, so it depends on which version of 1.x
you are using.

Regarding your account issue, I have created an INFRA JIRA on your behalf
to look into your account problem.

thanks,
esteban.



--
Cloudera, Inc.


On Fri, May 22, 2015 at 10:17 AM, Bryan Beaudreault <
bbeaudreault@hubspot.com> wrote:

> In our system each server has 2 dns associated with it, one always points
> to a private address and the other to public or private depending on the
> context.
>
> This issue did not show up in 0.94.x, but is showing up on my new 1.x
> cluster.  Basically it goes like this:
>
> 1. Regionserver starts up, gets its hostname which returns
> `hostA.external` due to our /etc/hosts
> 2. Regionserver registers itself in zookeeper as `hostA.external`
> 3. Regionserver reports for duty to HMaster, which re-resolves the DNS
> and returns `hostA.internal`.
> 4. HMaster registers server as `hostA.internal`
> 5. Regionserver receives the RegionServerStartupResponse, which contains
> `hostA.internal` and uses that for its RPCs
> 6. HMaster sees a ZNode with `hostA.external`, so thinks it is a
> regionserver that hasn't checked in yet, and registers it.
>
> So I think the problem is that step #2 happens before step #5.  You can
> clearly see this in the HRegionServer.java run() function.
>
> In 0.94, the `createMyEphemeralNode` function was called within
> `handleReportForDutyResponse`.  In 1.x, it happens within `run()` BEFORE
> `handleReportForDutyResponse`.
>
>
> I can work around this by handling /etc/hosts specially for my
> regionservers.  We have our /etc/hosts file set up like this for a reason,
> but I think I can special case regionservers.
>
> However, it seems like a bug that there are mechanisms built in for the
> HMaster to determine the RegionServer hostname, but that these mechanisms
> do not account for doubly-registered regionservers due to zookeeper and
> hmaster mismatch.
>
> I tried to create a JIRA for this, but either my username no longer has
> permissions for creating, or I can't find the place to create them
> anymore.  Any help?
> https://issues.apache.org/jira/secure/ViewProfile.jspa?name=bbeaudreault
>

Re: DNS mismatch between master and regionserver causes doubly registered regionservers

Posted by Stack <st...@duboce.net>.
On Fri, May 22, 2015 at 10:12 PM, Stack <st...@duboce.net> wrote:

> On Fri, May 22, 2015 at 10:17 AM, Bryan Beaudreault <
> bbeaudreault@hubspot.com> wrote:
>
>> In our system each server has 2 dns associated with it, one always points
>> to a private address and the other to public or private depending on the
>> context.
>>
>> This issue did not show up in 0.94.x, but is showing up on my new 1.x
>> cluster.  Basically it goes like this:
>>
>> 1. Regionserver starts up, gets its hostname which returns
>> `hostA.external` due to our /etc/hosts
>> 2. Regionserver registers itself in zookeeper as `hostA.external`
>> 3. Regionserver reports for duty to HMaster, which re-resolves the DNS
>> and returns `hostA.internal`.
>> 4. HMaster registers server as `hostA.internal`
>> 5. Regionserver receives the RegionServerStartupResponse, which contains
>> `hostA.internal` and uses that for its RPCs
>> 6. HMaster sees a ZNode with `hostA.external`, so thinks it is a
>> regionserver that hasn't checked in yet, and registers it.
>>
>> So I think the problem is that step #2 happens before step #5.  You can
>> clearly see this in the HRegionServer.java run() function.
>>
>>
> Yes. Looks like a regression.
>
> commit 10d336a51d3a5a2694f1898e52afa01dc9dc1798
> Author: rajeshbabu <ra...@unknown>
> Date:   Thu Oct 24 18:26:42 2013 +0000
>
>     HBASE-9593 Region server left in online servers list forever if it went
>     down after registering to master and before creating ephemeral node
>
>     git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1535479 13f79535-47bb-0310-9956-ffa450edef68
>
> The regionserver used to use the name given to it by the master, both when
> registering in zk and when heartbeating the master. We arrived at this
> approach after a lot of pain from double-registered regionservers caused by
> disagreements in naming between cluster nodes. The above commit changed the
> order and seems to have broken this facility.
>
> Will open issue to fix....
>

HBASE-13753.
St.Ack



> St.Ack
>
>
>> In 0.94, the `createMyEphemeralNode` function was called within
>> `handleReportForDutyResponse`.  In 1.x, it happens within `run()` BEFORE
>> `handleReportForDutyResponse`.
>>
>>
>> I can work around this by handling /etc/hosts specially for my
>> regionservers.  We have our /etc/hosts file set up like this for a reason,
>> but I think I can special case regionservers.
>>
>> However, it seems like a bug that there are mechanisms built in for the
>> HMaster to determine the RegionServer hostname, but that these mechanisms
>> do not account for doubly-registered regionservers due to zookeeper and
>> hmaster mismatch.
>>
>> I tried to create a JIRA for this, but either my username no longer has
>> permissions for creating, or I can't find the place to create them
>> anymore.  Any help?
>> https://issues.apache.org/jira/secure/ViewProfile.jspa?name=bbeaudreault
>>
>
>

Re: DNS mismatch between master and regionserver causes doubly registered regionservers

Posted by Stack <st...@duboce.net>.
On Fri, May 22, 2015 at 10:17 AM, Bryan Beaudreault <
bbeaudreault@hubspot.com> wrote:

> In our system each server has 2 dns associated with it, one always points
> to a private address and the other to public or private depending on the
> context.
>
> This issue did not show up in 0.94.x, but is showing up on my new 1.x
> cluster.  Basically it goes like this:
>
> 1. Regionserver starts up, gets its hostname which returns
> `hostA.external` due to our /etc/hosts
> 2. Regionserver registers itself in zookeeper as `hostA.external`
> 3. Regionserver reports for duty to HMaster, which re-resolves the DNS
> and returns `hostA.internal`.
> 4. HMaster registers server as `hostA.internal`
> 5. Regionserver receives the RegionServerStartupResponse, which contains
> `hostA.internal` and uses that for its RPCs
> 6. HMaster sees a ZNode with `hostA.external`, so thinks it is a
> regionserver that hasn't checked in yet, and registers it.
>
> So I think the problem is that step #2 happens before step #5.  You can
> clearly see this in the HRegionServer.java run() function.
>
>
Yes. Looks like a regression.

commit 10d336a51d3a5a2694f1898e52afa01dc9dc1798
Author: rajeshbabu <ra...@unknown>
Date:   Thu Oct 24 18:26:42 2013 +0000

    HBASE-9593 Region server left in online servers list forever if it went
    down after registering to master and before creating ephemeral node

    git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1535479 13f79535-47bb-0310-9956-ffa450edef68

The regionserver used to use the name given to it by the master, both when
registering in zk and when heartbeating the master. We arrived at this
approach after a lot of pain from double-registered regionservers caused by
disagreements in naming between cluster nodes. The above commit changed the
order and seems to have broken this facility.

Will open issue to fix....

St.Ack


> In 0.94, the `createMyEphemeralNode` function was called within
> `handleReportForDutyResponse`.  In 1.x, it happens within `run()` BEFORE
> `handleReportForDutyResponse`.
>
>
> I can work around this by handling /etc/hosts specially for my
> regionservers.  We have our /etc/hosts file set up like this for a reason,
> but I think I can special case regionservers.
>
> However, it seems like a bug that there are mechanisms built in for the
> HMaster to determine the RegionServer hostname, but that these mechanisms
> do not account for doubly-registered regionservers due to zookeeper and
> hmaster mismatch.
>
> I tried to create a JIRA for this, but either my username no longer has
> permissions for creating, or I can't find the place to create them
> anymore.  Any help?
> https://issues.apache.org/jira/secure/ViewProfile.jspa?name=bbeaudreault
>
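
The ordering problem described in this thread can be illustrated with a small
toy model. This is only a sketch and not HBase code: `ToyMaster`, `startup`,
and `znodeBeforeReport` are hypothetical names invented for the example, and
the "master" and "zookeeper" here are plain in-memory sets. It assumes, as
described above, that the master records whatever name it resolves for the
caller and also adopts any znode name it does not already know.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy model of the startup ordering discussed in this thread; NOT HBase code. */
final class DoubleRegistrationSketch {

    /** Hypothetical stand-in for the master's list of online servers. */
    static final class ToyMaster {
        final Set<String> onlineServers = new LinkedHashSet<>();

        /** The master registers the name it resolved itself and hands it back. */
        String reportForDuty(String nameSeenByMaster) {
            onlineServers.add(nameSeenByMaster);
            return nameSeenByMaster;
        }

        /** Any znode name the master does not know yet is treated as a new server. */
        void syncWithZooKeeper(Set<String> znodes) {
            onlineServers.addAll(znodes);
        }
    }

    /**
     * Runs one simulated regionserver startup and returns the master's view.
     *
     * @param localName         name the regionserver resolves for itself
     * @param nameSeenByMaster  name the master resolves for the same server
     * @param znodeBeforeReport true models the 1.x order, false the 0.94 order
     */
    static Set<String> startup(String localName, String nameSeenByMaster,
                               boolean znodeBeforeReport) {
        ToyMaster master = new ToyMaster();
        Set<String> znodes = new LinkedHashSet<>();
        if (znodeBeforeReport) {
            // 1.x order: the znode is created with the self-resolved name
            // before the master has had a chance to assign one.
            znodes.add(localName);
            master.reportForDuty(nameSeenByMaster);
        } else {
            // 0.94 order: the znode is created with the master-assigned name.
            String assigned = master.reportForDuty(nameSeenByMaster);
            znodes.add(assigned);
        }
        master.syncWithZooKeeper(znodes);
        return master.onlineServers;
    }

    public static void main(String[] args) {
        System.out.println("znode first:  "
                + startup("hostA.external", "hostA.internal", true));
        System.out.println("report first: "
                + startup("hostA.external", "hostA.internal", false));
    }
}
```

With mismatched names, the znode-first order leaves the toy master with two
entries for one server, while the report-first order leaves one. When both
sides resolve the same name, the two orders give the same result, which would
be consistent with the issue only appearing on hosts with two DNS names.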