Posted to dev@hbase.apache.org by Varun Sharma <va...@pinterest.com> on 2014/01/28 04:26:28 UTC

Balancer switch runs causing problems

We are seeing another issue: high read latency (p99 etc.) on one of
our read-heavy HBase clusters, which correlates with the balancer runs
every 5 minutes.

If there is no balancing to do, does the balancer only scan the table every
5 minutes, or does it do anything on top of that when the regions are
already balanced?

Varun

Re: Balancer switch runs causing problems

Posted by Stack <st...@duboce.net>.

/**
 * A janitor for the catalog tables.  Scans the <code>.META.</code> catalog
 * table on a period looking for unused regions to garbage collect.
 */
class CatalogJanitor extends Chore {
  private static final Log LOG =
    LogFactory.getLog(CatalogJanitor.class.getName());
  private final Server server;
  private final MasterServices services;
  private boolean enabled = true;

  CatalogJanitor(final Server server, final MasterServices services) {
    super(server.getServerName() + "-CatalogJanitor",
      server.getConfiguration().getInt("hbase.catalogjanitor.interval",
        300000),
....

Is it the above?
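
For readers following along, the snippet above reads the chore period from
"hbase.catalogjanitor.interval" and falls back to 300000 ms, which is 5
minutes. Below is a minimal sketch of that pattern in plain Java. It is not
HBase code: the class PeriodicChoreSketch and its methods are hypothetical,
java.util.Properties stands in for the HBase Configuration object, and a
ScheduledExecutorService stands in for Chore. Only the key name and the
default value come from the quoted source.

```java
import java.util.Properties;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a periodic chore; this is not HBase's Chore class.
class PeriodicChoreSketch {
  // Key and default value as they appear in the quoted CatalogJanitor constructor.
  static final String INTERVAL_KEY = "hbase.catalogjanitor.interval";
  static final int DEFAULT_INTERVAL_MS = 300000;

  // Imitates getInt(key, default) using plain java.util.Properties.
  static int intervalMillis(Properties conf) {
    String raw = conf.getProperty(INTERVAL_KEY);
    return raw == null ? DEFAULT_INTERVAL_MS : Integer.parseInt(raw.trim());
  }

  // Fires a counter at a fixed period for roughly runForMs, then reports the count.
  static int runFor(long periodMs, long runForMs) throws InterruptedException {
    AtomicInteger runs = new AtomicInteger();
    ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
    pool.scheduleAtFixedRate(runs::incrementAndGet, 0, periodMs, TimeUnit.MILLISECONDS);
    Thread.sleep(runForMs);
    pool.shutdownNow();
    return runs.get();
  }

  public static void main(String[] args) throws InterruptedException {
    Properties conf = new Properties();
    // With no override the default applies: 300000 ms is 5 minutes.
    System.out.println(
        TimeUnit.MILLISECONDS.toMinutes(intervalMillis(conf)) + " minutes by default");
    // Shortened period, used here only so the demo finishes quickly.
    conf.setProperty(INTERVAL_KEY, "50");
    System.out.println("runs in about 220 ms: " + runFor(intervalMillis(conf), 220));
  }
}
```

The point of the sketch is only that a task scheduled this way fires on its
own timer, independent of the balancer switch, which would be consistent
with .META. scans continuing every 5 minutes after the balancer is turned
off.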

Enable RPC DEBUG-level logging on that server around the time you see
the incident and check whether that gives you a clue.

St.Ack



On Mon, Jan 27, 2014 at 8:56 PM, Varun Sharma <va...@pinterest.com> wrote:

> But continue to see reads on META - no idea why ?
>
>
> On Mon, Jan 27, 2014 at 8:52 PM, Varun Sharma <va...@pinterest.com> wrote:
>
> > We are not seeing any balancer related logs btw anymore...
> >
> >
> > On Mon, Jan 27, 2014 at 8:23 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> >> Looking at the changes since release 0.94.7, I found:
> >>
> >> HBASE-8655 Backport to 94 - HBASE-8346(Prefetching .META. rows in case
> >> only
> >> when useCache is set to true)
> >> HBASE-8698 potential thread creation in MetaScanner.metaScan
> >>
> >> If possible, can you upgrade your cluster ?
> >>
> >> Cheers
> >>
> >>
> >> On Mon, Jan 27, 2014 at 8:02 PM, Ted Yu <yu...@gmail.com> wrote:
> >>
> >> > Do you see the following (from
> >> > HConnectionManager$HConnectionImplementation#locateRegionInMeta) ?
> >> >
> >> >             if (LOG.isDebugEnabled()) {
> >> >               LOG.debug("locateRegionInMeta parentTable=" +
> >> >                 Bytes.toString(parentTable) + ", metaLocation=" +
> >> >                 ((metaLocation == null)? "null": "{" + metaLocation +
> >> "}")
> >> > +
> >> >                 ", attempt=" + tries + " of " +
> >> >                 this.numRetries + " failed; retrying after sleep of "
> +
> >> >
> >> >
> >> > On Mon, Jan 27, 2014 at 7:51 PM, Varun Sharma <va...@pinterest.com>
> >> wrote:
> >> >
> >> >> Actually not sometimes but we are always seeing a large # of .META.
> >> reads
> >> >> every 5 minutes.
> >> >>
> >> >>
> >> >> On Mon, Jan 27, 2014 at 7:47 PM, Varun Sharma <va...@pinterest.com>
> >> >> wrote:
> >> >>
> >> >> > The default one with 0.94.7... - I dont see any of those logs. Also
> >> we
> >> >> > turned off the balancer switch - but looks like sometimes we still
> >> see a
> >> >> > large number of requests to .META. table every 5 minutes.
> >> >> >
> >> >> > Varun
> >> >> >
> >> >> >
> >> >> > On Mon, Jan 27, 2014 at 7:37 PM, Ted Yu <yu...@gmail.com>
> wrote:
> >> >> >
> >> >> >> In HMaster#balance(), we have (same for 0.94 and 0.96):
> >> >> >>
> >> >> >>         for (RegionPlan plan: plans) {
> >> >> >>           LOG.info("balance " + plan);
> >> >> >>
> >> >> >> Do you see such log in master log ?
> >> >> >>
> >> >> >>
> >> >> >> On Mon, Jan 27, 2014 at 7:26 PM, Varun Sharma <
> varun@pinterest.com>
> >> >> >> wrote:
> >> >> >>
> >> >> >> > We are seeing one other issue with high read latency (p99 etc.)
> on
> >> >> one
> >> >> >> of
> >> >> >> > our read heavy hbase clusters which is correlated with the
> >> balancer
> >> >> >> runs -
> >> >> >> > every 5 minutes.
> >> >> >> >
> >> >> >> > If there is no balancing to do, does the balancer only scan the
> >> table
> >> >> >> every
> >> >> >> > 5 minutes - does it do anything on top of that if the regions
> are
> >> >> >> balanced
> >> >> >> > ?
> >> >> >> >
> >> >> >> > Varun
> >> >> >> >
> >> >> >>
> >> >> >
> >> >> >
> >> >>
> >> >
> >> >
> >>
> >
> >
>

Re: Balancer switch runs causing problems

Posted by Varun Sharma <va...@pinterest.com>.

But we continue to see reads on .META. - no idea why.


On Mon, Jan 27, 2014 at 8:52 PM, Varun Sharma <va...@pinterest.com> wrote:

> We are not seeing any balancer related logs btw anymore...
>
>
> On Mon, Jan 27, 2014 at 8:23 PM, Ted Yu <yu...@gmail.com> wrote:
>
>> Looking at the changes since release 0.94.7, I found:
>>
>> HBASE-8655 Backport to 94 - HBASE-8346(Prefetching .META. rows in case
>> only
>> when useCache is set to true)
>> HBASE-8698 potential thread creation in MetaScanner.metaScan
>>
>> If possible, can you upgrade your cluster ?
>>
>> Cheers
>>
>>
>> On Mon, Jan 27, 2014 at 8:02 PM, Ted Yu <yu...@gmail.com> wrote:
>>
>> > Do you see the following (from
>> > HConnectionManager$HConnectionImplementation#locateRegionInMeta) ?
>> >
>> >             if (LOG.isDebugEnabled()) {
>> >               LOG.debug("locateRegionInMeta parentTable=" +
>> >                 Bytes.toString(parentTable) + ", metaLocation=" +
>> >                 ((metaLocation == null)? "null": "{" + metaLocation +
>> "}")
>> > +
>> >                 ", attempt=" + tries + " of " +
>> >                 this.numRetries + " failed; retrying after sleep of " +
>> >
>> >
>> > On Mon, Jan 27, 2014 at 7:51 PM, Varun Sharma <va...@pinterest.com>
>> wrote:
>> >
>> >> Actually not sometimes but we are always seeing a large # of .META.
>> reads
>> >> every 5 minutes.
>> >>
>> >>
>> >> On Mon, Jan 27, 2014 at 7:47 PM, Varun Sharma <va...@pinterest.com>
>> >> wrote:
>> >>
>> >> > The default one with 0.94.7... - I dont see any of those logs. Also
>> we
>> >> > turned off the balancer switch - but looks like sometimes we still
>> see a
>> >> > large number of requests to .META. table every 5 minutes.
>> >> >
>> >> > Varun
>> >> >
>> >> >
>> >> > On Mon, Jan 27, 2014 at 7:37 PM, Ted Yu <yu...@gmail.com> wrote:
>> >> >
>> >> >> In HMaster#balance(), we have (same for 0.94 and 0.96):
>> >> >>
>> >> >>         for (RegionPlan plan: plans) {
>> >> >>           LOG.info("balance " + plan);
>> >> >>
>> >> >> Do you see such log in master log ?
>> >> >>
>> >> >>
>> >> >> On Mon, Jan 27, 2014 at 7:26 PM, Varun Sharma <va...@pinterest.com>
>> >> >> wrote:
>> >> >>
>> >> >> > We are seeing one other issue with high read latency (p99 etc.) on
>> >> one
>> >> >> of
>> >> >> > our read heavy hbase clusters which is correlated with the
>> balancer
>> >> >> runs -
>> >> >> > every 5 minutes.
>> >> >> >
>> >> >> > If there is no balancing to do, does the balancer only scan the
>> table
>> >> >> every
>> >> >> > 5 minutes - does it do anything on top of that if the regions are
>> >> >> balanced
>> >> >> > ?
>> >> >> >
>> >> >> > Varun
>> >> >> >
>> >> >>
>> >> >
>> >> >
>> >>
>> >
>> >
>>
>
>

Re: Balancer switch runs causing problems

Posted by Varun Sharma <va...@pinterest.com>.

We are not seeing any balancer-related logs anymore, btw...


On Mon, Jan 27, 2014 at 8:23 PM, Ted Yu <yu...@gmail.com> wrote:

> Looking at the changes since release 0.94.7, I found:
>
> HBASE-8655 Backport to 94 - HBASE-8346(Prefetching .META. rows in case only
> when useCache is set to true)
> HBASE-8698 potential thread creation in MetaScanner.metaScan
>
> If possible, can you upgrade your cluster ?
>
> Cheers
>
>
> On Mon, Jan 27, 2014 at 8:02 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > Do you see the following (from
> > HConnectionManager$HConnectionImplementation#locateRegionInMeta) ?
> >
> >             if (LOG.isDebugEnabled()) {
> >               LOG.debug("locateRegionInMeta parentTable=" +
> >                 Bytes.toString(parentTable) + ", metaLocation=" +
> >                 ((metaLocation == null)? "null": "{" + metaLocation +
> "}")
> > +
> >                 ", attempt=" + tries + " of " +
> >                 this.numRetries + " failed; retrying after sleep of " +
> >
> >
> > On Mon, Jan 27, 2014 at 7:51 PM, Varun Sharma <va...@pinterest.com>
> wrote:
> >
> >> Actually not sometimes but we are always seeing a large # of .META.
> reads
> >> every 5 minutes.
> >>
> >>
> >> On Mon, Jan 27, 2014 at 7:47 PM, Varun Sharma <va...@pinterest.com>
> >> wrote:
> >>
> >> > The default one with 0.94.7... - I dont see any of those logs. Also we
> >> > turned off the balancer switch - but looks like sometimes we still
> see a
> >> > large number of requests to .META. table every 5 minutes.
> >> >
> >> > Varun
> >> >
> >> >
> >> > On Mon, Jan 27, 2014 at 7:37 PM, Ted Yu <yu...@gmail.com> wrote:
> >> >
> >> >> In HMaster#balance(), we have (same for 0.94 and 0.96):
> >> >>
> >> >>         for (RegionPlan plan: plans) {
> >> >>           LOG.info("balance " + plan);
> >> >>
> >> >> Do you see such log in master log ?
> >> >>
> >> >>
> >> >> On Mon, Jan 27, 2014 at 7:26 PM, Varun Sharma <va...@pinterest.com>
> >> >> wrote:
> >> >>
> >> >> > We are seeing one other issue with high read latency (p99 etc.) on
> >> one
> >> >> of
> >> >> > our read heavy hbase clusters which is correlated with the balancer
> >> >> runs -
> >> >> > every 5 minutes.
> >> >> >
> >> >> > If there is no balancing to do, does the balancer only scan the
> table
> >> >> every
> >> >> > 5 minutes - does it do anything on top of that if the regions are
> >> >> balanced
> >> >> > ?
> >> >> >
> >> >> > Varun
> >> >> >
> >> >>
> >> >
> >> >
> >>
> >
> >
>

Re: Balancer switch runs causing problems

Posted by Ted Yu <yu...@gmail.com>.

Looking at the changes since release 0.94.7, I found:

HBASE-8655 Backport to 94 - HBASE-8346(Prefetching .META. rows in case only
when useCache is set to true)
HBASE-8698 potential thread creation in MetaScanner.metaScan

If possible, can you upgrade your cluster ?

Cheers


On Mon, Jan 27, 2014 at 8:02 PM, Ted Yu <yu...@gmail.com> wrote:

> Do you see the following (from
> HConnectionManager$HConnectionImplementation#locateRegionInMeta) ?
>
>             if (LOG.isDebugEnabled()) {
>               LOG.debug("locateRegionInMeta parentTable=" +
>                 Bytes.toString(parentTable) + ", metaLocation=" +
>                 ((metaLocation == null)? "null": "{" + metaLocation + "}")
> +
>                 ", attempt=" + tries + " of " +
>                 this.numRetries + " failed; retrying after sleep of " +
>
>
> On Mon, Jan 27, 2014 at 7:51 PM, Varun Sharma <va...@pinterest.com> wrote:
>
>> Actually not sometimes but we are always seeing a large # of .META. reads
>> every 5 minutes.
>>
>>
>> On Mon, Jan 27, 2014 at 7:47 PM, Varun Sharma <va...@pinterest.com>
>> wrote:
>>
>> > The default one with 0.94.7... - I dont see any of those logs. Also we
>> > turned off the balancer switch - but looks like sometimes we still see a
>> > large number of requests to .META. table every 5 minutes.
>> >
>> > Varun
>> >
>> >
>> > On Mon, Jan 27, 2014 at 7:37 PM, Ted Yu <yu...@gmail.com> wrote:
>> >
>> >> In HMaster#balance(), we have (same for 0.94 and 0.96):
>> >>
>> >>         for (RegionPlan plan: plans) {
>> >>           LOG.info("balance " + plan);
>> >>
>> >> Do you see such log in master log ?
>> >>
>> >>
>> >> On Mon, Jan 27, 2014 at 7:26 PM, Varun Sharma <va...@pinterest.com>
>> >> wrote:
>> >>
>> >> > We are seeing one other issue with high read latency (p99 etc.) on
>> one
>> >> of
>> >> > our read heavy hbase clusters which is correlated with the balancer
>> >> runs -
>> >> > every 5 minutes.
>> >> >
>> >> > If there is no balancing to do, does the balancer only scan the table
>> >> every
>> >> > 5 minutes - does it do anything on top of that if the regions are
>> >> balanced
>> >> > ?
>> >> >
>> >> > Varun
>> >> >
>> >>
>> >
>> >
>>
>
>

Re: Balancer switch runs causing problems

Posted by Ted Yu <yu...@gmail.com>.

Do you see the following (from
HConnectionManager$HConnectionImplementation#locateRegionInMeta) ?

            if (LOG.isDebugEnabled()) {
              LOG.debug("locateRegionInMeta parentTable=" +
                Bytes.toString(parentTable) + ", metaLocation=" +
                ((metaLocation == null)? "null": "{" + metaLocation + "}") +
                ", attempt=" + tries + " of " +
                this.numRetries + " failed; retrying after sleep of " +


On Mon, Jan 27, 2014 at 7:51 PM, Varun Sharma <va...@pinterest.com> wrote:

> Actually not sometimes but we are always seeing a large # of .META. reads
> every 5 minutes.
>
>
> On Mon, Jan 27, 2014 at 7:47 PM, Varun Sharma <va...@pinterest.com> wrote:
>
> > The default one with 0.94.7... - I dont see any of those logs. Also we
> > turned off the balancer switch - but looks like sometimes we still see a
> > large number of requests to .META. table every 5 minutes.
> >
> > Varun
> >
> >
> > On Mon, Jan 27, 2014 at 7:37 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> >> In HMaster#balance(), we have (same for 0.94 and 0.96):
> >>
> >>         for (RegionPlan plan: plans) {
> >>           LOG.info("balance " + plan);
> >>
> >> Do you see such log in master log ?
> >>
> >>
> >> On Mon, Jan 27, 2014 at 7:26 PM, Varun Sharma <va...@pinterest.com>
> >> wrote:
> >>
> >> > We are seeing one other issue with high read latency (p99 etc.) on one
> >> of
> >> > our read heavy hbase clusters which is correlated with the balancer
> >> runs -
> >> > every 5 minutes.
> >> >
> >> > If there is no balancing to do, does the balancer only scan the table
> >> every
> >> > 5 minutes - does it do anything on top of that if the regions are
> >> balanced
> >> > ?
> >> >
> >> > Varun
> >> >
> >>
> >
> >
>

Re: Balancer switch runs causing problems

Posted by Varun Sharma <va...@pinterest.com>.

Actually, not just sometimes: we are always seeing a large number of .META.
reads every 5 minutes.
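
One way to back up an "every 5 minutes" observation is to check whether the
spike timestamps all land at the same offset within a 300000 ms period. The
following is a rough sketch, assuming the spike times have already been
extracted as non-negative epoch milliseconds. The class PeriodAlignment, its
methods, and the tolerance parameter are hypothetical names chosen for
illustration.

```java
// Hypothetical helper for checking whether events recur on a fixed period.
class PeriodAlignment {
  // Distance between two phases, measured around the circle of one period.
  static long circularDistance(long a, long b, long periodMs) {
    long d = Math.abs(a - b) % periodMs;
    return Math.min(d, periodMs - d);
  }

  // True if every timestamp sits within toleranceMs of the first timestamp's
  // phase inside the period. Timestamps are assumed to be non-negative.
  static boolean alignedToPeriod(long[] timestampsMs, long periodMs, long toleranceMs) {
    if (timestampsMs.length < 2) {
      return false; // a single event cannot show periodicity
    }
    long refPhase = timestampsMs[0] % periodMs;
    for (long ts : timestampsMs) {
      if (circularDistance(ts % periodMs, refPhase, periodMs) > toleranceMs) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    long period = 300000L; // 5 minutes in milliseconds
    long[] spikes = {1000L, 302000L, 600500L}; // made-up sample values
    System.out.println(alignedToPeriod(spikes, period, 5000L));
  }
}
```

If the spikes align with a 5 minute period but the balancer is off, that
points toward some other task running on the same period rather than the
balancer itself.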


On Mon, Jan 27, 2014 at 7:47 PM, Varun Sharma <va...@pinterest.com> wrote:

> The default one with 0.94.7... - I dont see any of those logs. Also we
> turned off the balancer switch - but looks like sometimes we still see a
> large number of requests to .META. table every 5 minutes.
>
> Varun
>
>
> On Mon, Jan 27, 2014 at 7:37 PM, Ted Yu <yu...@gmail.com> wrote:
>
>> In HMaster#balance(), we have (same for 0.94 and 0.96):
>>
>>         for (RegionPlan plan: plans) {
>>           LOG.info("balance " + plan);
>>
>> Do you see such log in master log ?
>>
>>
>> On Mon, Jan 27, 2014 at 7:26 PM, Varun Sharma <va...@pinterest.com>
>> wrote:
>>
>> > We are seeing one other issue with high read latency (p99 etc.) on one
>> of
>> > our read heavy hbase clusters which is correlated with the balancer
>> runs -
>> > every 5 minutes.
>> >
>> > If there is no balancing to do, does the balancer only scan the table
>> every
>> > 5 minutes - does it do anything on top of that if the regions are
>> balanced
>> > ?
>> >
>> > Varun
>> >
>>
>
>

Re: Balancer switch runs causing problems

Posted by Varun Sharma <va...@pinterest.com>.

The default one, with 0.94.7. I don't see any of those logs. Also, we
turned off the balancer switch, but it looks like we still sometimes see a
large number of requests to the .META. table every 5 minutes.

Varun


On Mon, Jan 27, 2014 at 7:37 PM, Ted Yu <yu...@gmail.com> wrote:

> In HMaster#balance(), we have (same for 0.94 and 0.96):
>
>         for (RegionPlan plan: plans) {
>           LOG.info("balance " + plan);
>
> Do you see such log in master log ?
>
>
> On Mon, Jan 27, 2014 at 7:26 PM, Varun Sharma <va...@pinterest.com> wrote:
>
> > We are seeing one other issue with high read latency (p99 etc.) on one of
> > our read heavy hbase clusters which is correlated with the balancer runs
> -
> > every 5 minutes.
> >
> > If there is no balancing to do, does the balancer only scan the table
> every
> > 5 minutes - does it do anything on top of that if the regions are
> balanced
> > ?
> >
> > Varun
> >
>

Re: Balancer switch runs causing problems

Posted by Ted Yu <yu...@gmail.com>.

In HMaster#balance(), we have (same for 0.94 and 0.96):

        for (RegionPlan plan: plans) {
          LOG.info("balance " + plan);

Do you see such a log entry in the master log?
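
Checking for that message can be done with a simple scan of the master log.
Here is a minimal sketch in Java, assuming the log lines have already been
read into a list and that each line carries the level followed by a
"logger: message" layout. The class BalanceLogScan is hypothetical, and the
sample lines are made up with placeholder plan text; only the "balance "
message prefix comes from the quoted loop.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the assumed line layout is "<time> <LEVEL> ... <logger>: <message>".
class BalanceLogScan {
  // Counts INFO lines that contain ": balance ", the prefix written by
  // LOG.info("balance " + plan) in the quoted loop.
  static long countBalanceLines(List<String> lines) {
    return lines.stream()
        .filter(line -> line.contains(" INFO ") && line.contains(": balance "))
        .count();
  }

  public static void main(String[] args) {
    // Made-up sample lines; the layout and the plan text are placeholders.
    List<String> sample = Arrays.asList(
        "2014-01-27 19:30:00,000 INFO  [master] master.HMaster: balance PLAN-TEXT-1",
        "2014-01-27 19:30:00,010 INFO  [master] master.HMaster: balance PLAN-TEXT-2",
        "2014-01-27 19:30:05,000 DEBUG [master] master.SomethingElse: unrelated message");
    System.out.println(countBalanceLines(sample));
  }
}
```

A count of zero around the latency spikes would suggest the balancer is not
moving regions at those times, so the cause is likely elsewhere.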


On Mon, Jan 27, 2014 at 7:26 PM, Varun Sharma <va...@pinterest.com> wrote:

> We are seeing one other issue with high read latency (p99 etc.) on one of
> our read heavy hbase clusters which is correlated with the balancer runs -
> every 5 minutes.
>
> If there is no balancing to do, does the balancer only scan the table every
> 5 minutes - does it do anything on top of that if the regions are balanced
> ?
>
> Varun
>

Re: Balancer switch runs causing problems

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.

Hi Varun,

Which balancer have you configured, and which version of HBase are you
using?


JM


2014-01-27 Varun Sharma <va...@pinterest.com>

> We are seeing one other issue with high read latency (p99 etc.) on one of
> our read heavy hbase clusters which is correlated with the balancer runs -
> every 5 minutes.
>
> If there is no balancing to do, does the balancer only scan the table every
> 5 minutes - does it do anything on top of that if the regions are balanced
> ?
>
> Varun
>