Posted to user@hbase.apache.org by Shawn Quinn <sq...@moxiegroup.com> on 2012/01/24 18:15:40 UTC

Possible ZooKeeper Connection Leak in TableOutputFormat

Hello,

Our application runs Map/Reduce tasks fairly frequently against HBase
(Cloudera distribution 0.90.4), and we're making use of the default
org.apache.hadoop.hbase.mapreduce.TableOutputFormat class for the reduce
step, which TableMapReduceUtil.initTableReducerJob() sets up.  We invoke
the Map/Reduce tasks via the standard Hadoop Job API, but they're all
triggered from the same virtual machine that stays running (so we aren't
shutting down the virtual machine after each job runs).  We've noticed
that we run out of ZooKeeper connections in this configuration, and
believe we've tracked the "leak" down to the TableOutputFormat class.
Specifically, that class does the following:

  public void setConf(Configuration otherConf) {
    this.conf = HBaseConfiguration.create(otherConf);
    String tableName = this.conf.get(OUTPUT_TABLE);
    String address = this.conf.get(QUORUM_ADDRESS);
    String serverClass = this.conf.get(REGION_SERVER_CLASS);
    String serverImpl = this.conf.get(REGION_SERVER_IMPL);
    try {
      if (address != null) {
        ZKUtil.applyClusterKeyToConf(this.conf, address);
      }
      if (serverClass != null) {
        this.conf.set(HConstants.REGION_SERVER_CLASS, serverClass);
        this.conf.set(HConstants.REGION_SERVER_IMPL, serverImpl);
      }
      this.table = new HTable(this.conf, tableName);
      this.table.setAutoFlush(false);
      LOG.info("Created table instance for "  + tableName);
    } catch(IOException e) {
      LOG.error(e);
    }
  }

I believe this was different in previous releases of HBase, but at some
point the code to clone the configuration object (the first line of that
method) was added.  Then, when that same method creates the HTable
instance, the HTable internally gets a new connection to ZooKeeper every
time (since the configuration instance is different).
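
To make the suspected mechanism concrete, here is a minimal, self-contained
sketch.  It does not use HBase at all: Config, Connection and both cache
classes are hypothetical stand-ins, and whether HBase 0.90.4 keys its
connection cache exactly this way is an assumption based on the behaviour
described above.  It only shows why a cache keyed by the configuration
object itself never gets a hit once every caller clones the configuration
first, while a cache keyed by a connection-relevant value (here just
hbase.zookeeper.quorum, as a simplification) does.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Objects;

/** Illustrative only: none of the nested types are real HBase classes. */
final class ConnectionCacheSketch {

    /** Hypothetical stand-in for a Hadoop/HBase Configuration. */
    static final class Config {
        final Map<String, String> props = new HashMap<>();

        Config() {}

        /** Copy constructor, playing the role of the clone made in setConf(). */
        Config(Config other) {
            props.putAll(other.props);
        }
    }

    /** Hypothetical stand-in for a ZooKeeper-backed connection. */
    static final class Connection {}

    /** Cache keyed by Config object identity: every clone is a cache miss. */
    static final class IdentityKeyedCache {
        private final Map<Config, Connection> cache = new IdentityHashMap<>();

        Connection get(Config conf) {
            Connection c = cache.get(conf);
            if (c == null) {
                c = new Connection(); // a new "ZooKeeper connection" is opened
                cache.put(conf, c);
            }
            return c;
        }

        int size() {
            return cache.size();
        }
    }

    /** Cache keyed by a connection-relevant setting: clones share one entry. */
    static final class ValueKeyedCache {
        private final Map<String, Connection> cache = new HashMap<>();

        Connection get(Config conf) {
            String key = Objects.toString(conf.props.get("hbase.zookeeper.quorum"), "");
            Connection c = cache.get(key);
            if (c == null) {
                c = new Connection();
                cache.put(key, c);
            }
            return c;
        }

        int size() {
            return cache.size();
        }
    }

    /** Simulates `jobs` job runs in one JVM; returns how many connections were opened. */
    static int connectionsOpened(boolean keyByValue, int jobs) {
        Config base = new Config();
        base.props.put("hbase.zookeeper.quorum", "zk1,zk2,zk3"); // made-up hosts
        IdentityKeyedCache byIdentity = new IdentityKeyedCache();
        ValueKeyedCache byValue = new ValueKeyedCache();
        for (int i = 0; i < jobs; i++) {
            Config clone = new Config(base); // each job clones the configuration
            if (keyByValue) {
                byValue.get(clone);
            } else {
                byIdentity.get(clone);
            }
        }
        return keyByValue ? byValue.size() : byIdentity.size();
    }

    public static void main(String[] args) {
        System.out.println("identity-keyed: " + connectionsOpened(false, 5)); // prints 5
        System.out.println("value-keyed:    " + connectionsOpened(true, 5));  // prints 1
    }
}
```

In this sketch the identity-keyed cache ends up holding one connection per
job run and never releases any of them, which matches the symptom of a
long-running virtual machine slowly exhausting its ZooKeeper connections.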

I believe I can get around this in my application by creating a custom
TableOutputFormat.  However, can anyone confirm if this is indeed a
problem, or if there is some other way to work around the default
TableOutputFormat class creating a new connection to ZooKeeper every time
it runs?

Thanks,

     -Shawn

Re: Possible ZooKeeper Connection Leak in TableOutputFormat

Posted by Shawn Quinn <sq...@moxiegroup.com>.
Thanks for the quick response, Ted.  It does look like HBASE-4508 would
resolve the issue I'm seeing, so I appreciate the pointer.  We're back up
and running OK for now with a custom TableOutputFormat (and a custom
TableInputFormat) to work around this issue, but when we're ready to move
to a later version of HBase we'll give that new mechanism a try.

Thanks!

      -Shawn

On Tue, Jan 24, 2012 at 2:39 PM, Ted Yu <yu...@gmail.com> wrote:

> Thanks for reporting this, Shawn.
>
> Do you want to try out HBASE-4508, which is in HBase 0.90.5?

Re: Possible ZooKeeper Connection Leak in TableOutputFormat

Posted by Ted Yu <yu...@gmail.com>.
Thanks for reporting this, Shawn.

Do you want to try out HBASE-4508, which is in HBase 0.90.5?
