Posted to user@hbase.apache.org by Jianshi Huang <ji...@gmail.com> on 2014/09/17 11:39:46 UTC

Error during HBaseAdmin.split: Exception: org.apache.hadoop.hbase.NotServingRegionException, What does that mean?

I constantly get the following error when I try to add splits to a table.

org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.NotServingRegionException):
org.apache.hadoop.hbase.NotServingRegionException: Region
grapple_vertices,cust|rval#7ffffeb7cffca280|1636500018299676757,1410945568484.e7743495366df3c82a8571b36c2bdac3.
is not online on lvshdc5dn0193.lvs.paypal.com,60020,1405014719359
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2676)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4095)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.splitRegion(HRegionServer.java:3818)
        at


But when I checked the region server (from HBase's web UI), the region is
actually listed there.

What does the error actually mean? How can I solve it?

Currently I'm adding splits single-threaded, and I want to make it
parallel. Is there anything I need to be careful about?

Here's the code for adding splits:

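  def addSplits(tableName: String, splitKeys: Seq[Array[Byte]]): Unit = {
    val admin = new HBaseAdmin(conn)

    try {
      val regions = admin.getTableRegions(tableName.getBytes("UTF8"))
      val regionStartKeys = regions.map(_.getStartKey)
      // Array[Byte] uses reference equality, so compare key contents
      // explicitly instead of relying on Seq.diff.
      val splits = splitKeys.filterNot { key =>
        regionStartKeys.exists(start => java.util.Arrays.equals(start, key))
      }

      splits.foreach { splitPoint =>
        // split() only requests the split; it returns before it completes.
        admin.split(tableName.getBytes("UTF8"), splitPoint)
      }
      // NOTE: important!
      admin.balancer()
    }
    finally {
      admin.close()
    }
  }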


Any help is appreciated.

-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/

Re: Error during HBaseAdmin.split: Exception: org.apache.hadoop.hbase.NotServingRegionException, What does that mean?

Posted by Jianshi Huang <ji...@gmail.com>.

Yes Esteban, there are very practical reasons to do the pre-split
dynamically.

Jianshi

On Thu, Sep 18, 2014 at 1:41 AM, Esteban Gutierrez <es...@cloudera.com>
wrote:

> Hi Jianshi,
>
> Is there any reason why you need to split dynamically the table? Users
> usually pre-split their tables with a specific number of splits or they
> pick a region split policy that fits their needs:
>
>
> https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/regionserver/DelimitedKeyPrefixRegionSplitPolicy.html
>
> https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.html
>
> https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.html
>
> https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/regionserver/KeyPrefixRegionSplitPolicy.html
>
> https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/regionserver/DisabledRegionSplitPolicy.html
>
> or they have the options to implement their own. See for some details
> http://hbase.apache.org/book/regions.arch.html#arch.region.split
>
> cheers,
> esteban.
>
>
> --
> Cloudera, Inc.
>
>
> On Wed, Sep 17, 2014 at 5:06 AM, Shahab Yunus <sh...@gmail.com>
> wrote:
>
> > Split is an async operation. When you call it and the call returns, it
> > does not mean that the region has been created yet.
> >
> > So either you wait for a while (using Thread.sleep) or check the number
> > of regions in a loop until it has increased to the value you want,
> > and then access the region. The former is not a good idea, though you can
> > try it out just to make sure that this is indeed the issue.
> >
> > What I am suggesting is something like (pseudo code):
> >
> > while (new#regions <= old#regions)
> > {
> >    new#regions = admin.getLatest#regions
> > }
> >
> > Regards,
> > Shahab
> >
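
The polling approach described in the quoted reply above could be sketched
in Scala roughly as follows. This is a minimal sketch, not part of the HBase
client: awaitRegionCount, currentRegionCount, timeoutMs and pollIntervalMs
are hypothetical names, and the region-count lookup is passed in as a
function so the helper does not assume any particular HBase API. It also
adds a timeout so the loop cannot spin forever if the split never completes.

```scala
import scala.annotation.tailrec

object SplitPolling {
  /** Polls `currentRegionCount` until it reaches `expected` or until
    * `timeoutMs` has elapsed. Returns true if the count was reached. */
  def awaitRegionCount(
      currentRegionCount: () => Int,
      expected: Int,
      timeoutMs: Long,
      pollIntervalMs: Long = 500L
  ): Boolean = {
    val deadline = System.currentTimeMillis() + timeoutMs

    @tailrec
    def loop(): Boolean =
      if (currentRegionCount() >= expected) true
      else if (System.currentTimeMillis() >= deadline) false
      else {
        // Back off briefly before asking again.
        Thread.sleep(pollIntervalMs)
        loop()
      }

    loop()
  }
}
```

With the HBaseAdmin calls already used in this thread, the lookup might be
something like () => admin.getTableRegions(tableName.getBytes("UTF8")).size,
with the starting count recorded before each admin.split(...) call. Counting
regions is only an approximation of "the split has finished", so treat the
result as a hint rather than a guarantee.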



-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/

Re: Error during HBaseAdmin.split: Exception: org.apache.hadoop.hbase.NotServingRegionException, What does that mean?

Posted by Esteban Gutierrez <es...@cloudera.com>.

Hi Jianshi,

Is there any reason why you need to split the table dynamically? Users
usually pre-split their tables with a specific number of splits, or they
pick a region split policy that fits their needs:

https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/regionserver/DelimitedKeyPrefixRegionSplitPolicy.html
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.html
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.html
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/regionserver/KeyPrefixRegionSplitPolicy.html
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/regionserver/DisabledRegionSplitPolicy.html

or they have the option to implement their own. For some details, see
http://hbase.apache.org/book/regions.arch.html#arch.region.split

cheers,
esteban.


--
Cloudera, Inc.



Re: Error during HBaseAdmin.split: Exception: org.apache.hadoop.hbase.NotServingRegionException, What does that mean?

Posted by Jianshi Huang <ji...@gmail.com>.

You rock, Ted. I would also add a synchronous addSplits; there's no
good reason multiple splits have to be done sequentially.

I also checked createTable. I traced the code to here and then lost track:

    executeCallable(new MasterCallable<Void>(getConnection()) {
      @Override
      public Void call() throws ServiceException {
        CreateTableRequest request =
            RequestConverter.buildCreateTableRequest(desc, splitKeys);
        master.createTable(null, request);
        return null;
      }
    });

So what happens in the handler for CreateTableRequest? Which part of the
code should I check?

Jianshi


On Thu, Sep 18, 2014 at 2:09 AM, Ted Yu <yu...@gmail.com> wrote:

> Jianshi:
> See HBASE-11608 Add synchronous split
>
> bq. createTable does something special?
>
> Yes. See this in HBaseAdmin:
>
>   public void createTable(final HTableDescriptor desc, byte [][] splitKeys)
>
> On Wed, Sep 17, 2014 at 10:58 AM, Jianshi Huang <ji...@gmail.com>
> wrote:
>
> > I see Shahab, async makes sense, but I prefer that the HBase client does
> > the retry for me, and let me specify a timeout parameter.
> >
> > One question, does that mean adding multiple splits into one region has to
> > be done sequentially? How can I add region splits in parallel? Does
> > createTable do something special?
> >
> >
> > Jianshi
> >
> >



-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/

Re: Error during HBaseAdmin.split: Exception: org.apache.hadoop.hbase.NotServingRegionException, What does that mean?

Posted by Jianshi Huang <ji...@gmail.com>.

Thanks Esteban for the suggestion.

For case 2), I don't think KeyPrefixRegionSplitPolicy will be enough: we're
constantly adding new types, so the number of types is unknown at the
beginning. When a new type of data arrives, we add pre-splits [type|00,
type|01, ..., type|FF] to the table. Data is ingested one type after another,
so without these automatic splits, ingestion would be too slow.
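
The per-type pre-splits listed above can be generated with a few lines of
Scala. This is only a sketch of the key layout described in this message:
TypeSplits and splitKeysFor are hypothetical names, and it assumes the bucket
is written as two uppercase hex digits after the "|" delimiter.

```scala
object TypeSplits {
  /** Builds the 256 split keys "<type>|00" .. "<type>|FF" as UTF-8 bytes. */
  def splitKeysFor(dataType: String): Seq[Array[Byte]] =
    (0 to 255).map { bucket =>
      // %02X renders the bucket as two uppercase hex digits, e.g. 10 -> "0A".
      f"$dataType|$bucket%02X".getBytes("UTF-8")
    }
}
```

The resulting sequence has the same type as the splitKeys argument of the
addSplits function posted at the start of this thread.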

For case 1), I thought about binning, but it makes scans in
TableInputFormat more complicated. I think automatic pre-splits can solve it,
so currently a sampling process is run to compute the splitKeys for every
timeseries data set to be ingested.

Jianshi


On Thu, Sep 18, 2014 at 3:19 AM, Esteban Gutierrez <es...@cloudera.com>
wrote:

> Thanks Jianshi for that helpful information,
>
> I think for use case 1) it depends on the data ingestion rate when the
> regions need to split. The synchronous split operation makes some sense
> there  if you want the regions to contain specific time ranges and/or
> number of records.
>
> For use case 2) I think is a good match for the KeyPrefixRegionSplitPolicy
> or DelimitedKeyPrefixRegionSplitPolicy. Since the regions will be split
> based on the <type> if type length is fixed or if the type is of varying
> length but delimited with |
>
> On a second thought, it might be even possible to solve 1) with those
> prefix based split policies if you use a prefix for your key that also
> varies monotonically or can be passed by the client when it has reached
> some threshold, e.g. after writing X billion data points, use prefix 001
> and next Y billion data rows use prefix 002 or something like that.
>
> cheers,
> esteban.
>
>
> --
> Cloudera, Inc.
>
>
> On Wed, Sep 17, 2014 at 11:53 AM, Jianshi Huang <ji...@gmail.com>
> wrote:
>
> > Hi Esteban,
> >
> > Two reasons to split dynamically,
> >
> > 1) I have a column family that stores timeseries data for mapreduce tasks,
> > and the rowkey is monotonically increasing to make scanning easier.
> >
> > 2) (a better reason), I'm storing multiple types of data in the same table,
> > and I have about 500TB of data in total. That's many billions of rows and
> > many thousands of regions. I want to make sure ingesting one type of data
> > won't touch every region which will cause a lot of fragments and merge
> > operations, the rowkey is designed as <type>|<hash>|<id>.
> >
> > So either way I would want a dynamic split in my design.
> >
> > Jianshi
> >
> >
> > On Thu, Sep 18, 2014 at 2:39 AM, Esteban Gutierrez <esteban@cloudera.com>
> > wrote:
> >
> > > Jianshi,
> > >
> > > The retry is not an expected behavior that the client should be doing. In
> > > fact you don't want your clients to issue admin operations to the cluster
> > > ;)
> > >
> > > Shahab's option is the best alternative by polling when the number of
> > > regions has changed in the table you want to modify the splits dynamically.
> > > The JIRA that Ted suggested requires modification in the core table
> > > operations to support sync operations and requires some major work to do it
> > > right. Ted's alternative to create the splits at table creation time is the
> > > best option if you can pre-split IMHO.
> > >
> > > If you could elaborate more on the practical reasons you mention to create
> > > synchronously those new regions that would be great for us. Maybe its
> > > related to multi-tenancy but I'm just guessing :)
> > >
> > > esteban.
> > >
> > >
> > > --
> > > Cloudera, Inc.
> > >
> > >



-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/

Re: Error during HBaseAdmin.split: Exception: org.apache.hadoop.hbase.NotServingRegionException, What does that mean?

Posted by Esteban Gutierrez <es...@cloudera.com>.

Thanks Jianshi for that helpful information,

I think for use case 1) it depends on the data ingestion rate when the
regions need to split. The synchronous split operation makes some sense
there if you want the regions to contain specific time ranges and/or a
specific number of records.

For use case 2), I think it is a good match for KeyPrefixRegionSplitPolicy
or DelimitedKeyPrefixRegionSplitPolicy, since the regions will be split
based on the <type>, either when the type length is fixed or when the type
is of varying length but delimited with |.
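
To illustrate the grouping idea behind a delimiter-based prefix policy, here
is a small Scala sketch. It is not HBase's implementation; PrefixGrouping and
delimitedPrefix are hypothetical names, and the sketch only shows which part
of a <type>|<hash>|<id> row key would act as the grouping prefix if the first
"|" is taken as the delimiter.

```scala
object PrefixGrouping {
  /** Returns the bytes before the first `delimiter`, or the whole key
    * when the delimiter does not occur. */
  def delimitedPrefix(rowKey: Array[Byte], delimiter: Byte = '|'.toByte): Array[Byte] = {
    val idx = rowKey.indexOf(delimiter)
    if (idx < 0) rowKey else rowKey.take(idx)
  }
}
```

Rows whose keys share the same prefix under this function are the ones such
a policy would aim to keep together in one region.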

On second thought, it might even be possible to solve 1) with those
prefix-based split policies if you use a prefix for your key that also
varies monotonically or can be passed by the client when it has reached
some threshold, e.g. after writing X billion data points, use prefix 001,
and for the next Y billion data rows use prefix 002, or something like that.

cheers,
esteban.


--
Cloudera, Inc.


On Wed, Sep 17, 2014 at 11:53 AM, Jianshi Huang <ji...@gmail.com>
wrote:

> Hi Esteban,
>
> Two reasons to split dynamically,
>
> 1) I have a column family that stores timeseries data for mapreduce tasks,
> and the rowkey is monotonically increasing to make scanning easier.
>
> 2) (a better reason), I'm storing multiple types of data in the same table,
> and I have about 500TB of data in total. That's many billions of rows and
> many thousands of regions. I want to make sure ingesting one type of data
> won't touch every region which will cause a lot of fragments and merge
> operations, the rowkey is designed as <type>|<hash>|<id>.
>
> So either way I would want a dynamic split in my design.
>
> Jianshi
>
>

Re: Error during HBaseAdmin.split: Exception: org.apache.hadoop.hbase.NotServingRegionException, What does that mean?

Posted by Jianshi Huang <ji...@gmail.com>.

Hi Esteban,

Two reasons to split dynamically,

1) I have a column family that stores timeseries data for mapreduce tasks,
and the rowkey is monotonically increasing to make scanning easier.

2) (a better reason) I'm storing multiple types of data in the same table,
and I have about 500TB of data in total. That's many billions of rows and
many thousands of regions. I want to make sure ingesting one type of data
won't touch every region, which would cause a lot of fragments and merge
operations. The rowkey is designed as <type>|<hash>|<id>.
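
As a rough illustration of the <type>|<hash>|<id> layout, the sketch below
builds such a row key in Scala. The hash scheme is an assumption made for the
example (the first byte of an MD5 digest of the id, written as two uppercase
hex digits); the thread does not say which hash is actually used. RowKeys and
rowKey are hypothetical names.

```scala
import java.security.MessageDigest

object RowKeys {
  /** Builds "<type>|<hash>|<id>" as UTF-8 bytes, where <hash> is an
    * assumed two-hex-digit bucket derived from the id. */
  def rowKey(dataType: String, id: String): Array[Byte] = {
    val digest = MessageDigest.getInstance("MD5").digest(id.getBytes("UTF-8"))
    // Mask to an unsigned value (0..255) before formatting as hex.
    val bucket = f"${digest(0) & 0xff}%02X"
    s"$dataType|$bucket|$id".getBytes("UTF-8")
  }
}
```

Because the type comes first, all rows of one type sort together, and the
hash bucket spreads writes for that type across its own range of regions.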

So either way I would want a dynamic split in my design.

Jianshi


On Thu, Sep 18, 2014 at 2:39 AM, Esteban Gutierrez <es...@cloudera.com>
wrote:

> Jianshi,
>
> The retry is not an expected behavior that the client should be doing. In
> fact you don't want your clients to issue admin operations to the cluster
> ;)
>
> Shahab's option is the best alternative by polling when the number of
> regions has changed in the table you want to modify the splits dynamically.
> The JIRA that Ted suggested requires modification in the core table
> operations to support sync operations and requires some major work to do it
> right. Ted's alternative to create the splits at table creation time is the
> best option if you can pre-split IMHO.
>
> If you could elaborate more on the practical reasons you mention to create
> synchronously those new regions that would be great for us. Maybe its
> related to multi-tenancy but I'm just guessing :)
>
> esteban.
>
>
> --
> Cloudera, Inc.
>
>



-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/

Re: Error during HBaseAdmin.split: Exception: org.apache.hadoop.hbase.NotServingRegionException, What does that mean?

Posted by Esteban Gutierrez <es...@cloudera.com>.

Jianshi,

Retrying is not something the client is expected to do. In fact, you don't
want your clients to issue admin operations to the cluster ;)

Shahab's option, polling until the number of regions in the table has
changed, is the best alternative if you want to modify the splits
dynamically. The JIRA that Ted suggested requires modifying the core table
operations to support synchronous operations, and it will take some major
work to do it right. Ted's alternative, creating the splits at table
creation time, is the best option if you can pre-split, IMHO.

If you could elaborate on the practical reasons you mention for creating
those new regions synchronously, that would be great for us. Maybe it's
related to multi-tenancy, but I'm just guessing :)

esteban.


--
Cloudera, Inc.


On Wed, Sep 17, 2014 at 11:09 AM, Ted Yu <yu...@gmail.com> wrote:

> Jianshi:
> See HBASE-11608 Add synchronous split
>
> bq. createTable does something special?
>
> Yes. See this in HBaseAdmin:
>
>   public void createTable(final HTableDescriptor desc, byte [][] splitKeys)
>

Re: Error during HBaseAdmin.split: Exception: org.apache.hadoop.hbase.NotServingRegionException, What does that mean?

Posted by Ted Yu <yu...@gmail.com>.

Jianshi:
See HBASE-11608 Add synchronous split

bq. createTable does something special?

Yes. See this in HBaseAdmin:

  public void createTable(final HTableDescriptor desc, byte [][] splitKeys)
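
For the pre-split route, all split keys are handed to createTable up front.
The following is a minimal Java sketch of preparing such a byte[][] array;
the PreSplit class and its methods are hypothetical helpers, not part of
HBase. It relies only on the fact that HBase orders row keys as unsigned
bytes, and the createTable call itself appears only in a comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PreSplit {
    /** Lexicographic comparison treating each byte as unsigned (0..255). */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    /** Rejects empty keys, then returns the keys sorted with duplicates dropped. */
    static byte[][] normalize(List<byte[]> keys) {
        List<byte[]> sorted = new ArrayList<>();
        for (byte[] k : keys) {
            if (k.length == 0) {
                throw new IllegalArgumentException("empty split key");
            }
            sorted.add(k);
        }
        sorted.sort(PreSplit::compareUnsigned);
        List<byte[]> unique = new ArrayList<>();
        for (byte[] k : sorted) {
            if (unique.isEmpty() || !Arrays.equals(unique.get(unique.size() - 1), k)) {
                unique.add(k);
            }
        }
        // The result could then be passed as: admin.createTable(desc, splitKeys)
        return unique.toArray(new byte[0][]);
    }
}
```

Cleaning the keys on the client side keeps problems such as duplicate split
points out of the create call, whatever validation the server performs.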

On Wed, Sep 17, 2014 at 10:58 AM, Jianshi Huang <ji...@gmail.com>
wrote:

> I see Shahab, async makes sense, but I prefer that the HBase client does
> the retry for me, and let me specify a timeout parameter.
>
> One question, does that mean adding multiple splits into one region has to
> be done sequentially? How can I add region splits in parallel? Does
> createTable does something special?
>
>
> Jianshi
>
>

Re: Error during HBaseAdmin.split: Exception: org.apache.hadoop.hbase.NotServingRegionException, What does that mean?

Posted by Jianshi Huang <ji...@gmail.com>.

I see, Shahab. Async makes sense, but I would prefer that the HBase client
do the retry for me and let me specify a timeout parameter.

One question: does that mean adding multiple splits to one region has to
be done sequentially? How can I add region splits in parallel? Does
createTable do something special?


Jianshi


On Wed, Sep 17, 2014 at 8:06 PM, Shahab Yunus <sh...@gmail.com>
wrote:

> Split is an async operation. When you call it, and the call returns, it
> does not mean that the region has been created yet.
>
> So either you wait for a while (using Thread.sleep) or check for the number
> of regions in a loop and until they have increased to the value you want
> and then access the region. The former is not a good idea, though you can
> try it out just to make sure that this is indeed the issue.
>
> What am I suggesting is something like (pseudo code):
>
> while(new#regions > old#regions)
> {
>    new#regions = admin.getLatest#regions
> }
>
> Regards,
> Shahab
>



-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/

Re: Error during HBaseAdmin.split: Exception: org.apache.hadoop.hbase.NotServingRegionException, What does that mean?

Posted by Shahab Yunus <sh...@gmail.com>.

Split is an asynchronous operation. When the call returns, it does not mean
that the new regions have been created yet.

So either you wait for a while (using Thread.sleep), or you check the number
of regions in a loop until it has increased to the value you want, and only
then access the region. The former is not a good idea, though you can try it
out just to confirm that this is indeed the issue.

What I am suggesting is something like (pseudo code):

while (newRegionCount <= oldRegionCount)
{
   newRegionCount = admin.getLatestRegionCount
}
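
The polling idea can be sketched in Java roughly as follows. This is a
minimal sketch, not an HBase API: SplitWaiter and awaitRegionCountAbove are
hypothetical names, and the region count comes from a caller-supplied
function so that the timeout logic stays independent of the client version.

```java
import java.util.function.IntSupplier;

final class SplitWaiter {
    /**
     * Polls regionCount until it exceeds baseline or timeoutMillis elapses.
     * Returns true if the increase was observed, false on timeout or interrupt.
     */
    static boolean awaitRegionCountAbove(IntSupplier regionCount, int baseline,
                                         long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (regionCount.getAsInt() > baseline) {
                return true; // the split has become visible
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: region count never increased
            }
            try {
                Thread.sleep(pollMillis); // back off between checks
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }
}
```

With the HBaseAdmin from the original post, the supplier could wrap
admin.getTableRegions(...).size(), converting the checked IOException into
an unchecked one. The next admin.split call would be issued only after this
method returns true.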

Regards,
Shahab

On Wed, Sep 17, 2014 at 5:39 AM, Jianshi Huang <ji...@gmail.com>
wrote:

> I constantly get the following errors when I tried to add splits to a
> table.
>
>
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.NotServingRegionException):
> org.apache.hadoop.hbase.NotServingRegionException: Region
> grapple_vertices,cust|rval#7ffffeb7cffca280|1636500018299676757,1410945568
> 484.e7743495366df3c82a8571b36c2bdac3. is not online on
> lvshdc5dn0193.lvs.paypal.com,60020,1405014719359
>         at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2676)
>         at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4095)
>         at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.splitRegion(HRegionServer.java:3818)
>         at
>
>
> But when I checked the region server (from hbase' webUI), the region is
> actually listed there.
>
> What does the error mean actually? How can I solve it?
>
> Currently I'm adding splits single-threaded, and I want to make it
> parallel, is there anything I need to be careful about?
>
> Here's the code for adding splits:
>
>   def addSplits(tableName: String, splitKeys: Seq[Array[Byte]]): Unit = {
>     val admin = new HBaseAdmin(conn)
>
>     try {
>       val regions = admin.getTableRegions(tableName.getBytes("UTF8"))
>       val regionStartKeys = regions.map(_.getStartKey)
>       val splits = splitKeys.diff(regionStartKeys)
>
>       splits.foreach { splitPoint =>
>         admin.split(tableName.getBytes("UTF8"), splitPoint)
>       }
>       // NOTE: important!
>       admin.balancer()
>     }
>     finally {
>       admin.close()
>     }
>   }
>
>
> Any help is appreciated.
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>