Posted to user@hbase.apache.org by Marc Limotte <ms...@gmail.com> on 2011/03/06 01:22:21 UTC

Hbck errors

We had an issue a day ago with some OOMEs on the region servers.  The
master shut down ok, but most of the RegionServers didn't, so we eventually
had to kill -9 them.  We brought it all back up and ran a major compaction to
change the hbase block size.  This seemed to work, but now we have an
inconsistency which is preventing bulk loads from continuing.

hbase hbck -details finds an inconsistency.  I tried -fix, but no help.
Chain of regions in table opx_ad_event_v2 is broken; edges does not contain
advertiser^BOpenX PSA^Acountry^BSerbia^Apublisher^BDSNR - Filesharing
ROW^Adomain^Bthejesperbay.com^Aadvertiser_tag^Bmmx.travel^Astarttime^B1295175600

hbck also notes that this region is offline:

11/03/05 23:33:16 DEBUG util.HBaseFsck: Region
opx_ad_event_v2,advertiser\x02OpenX PSA\x01country\x02Serbia\x01publisher\x02DSNR - Filesharing ROW\x01domain\x02thejesperbay.com\x01advertiser_tag\x02mmx.travel\x01starttime\x021295175600,1297185243218.6147d3696ba9db3a85e3afd08d0bc59a.
offline, split, parent, ignoring.

Looking in .META. I see that the region is indeed offline, and appears to be
split:

info:regioninfo                           timestamp=1299301154675 ...
OFFLINE => true,
info:splitA                                  timestamp=1299283401019
info:splitB                                  timestamp=1299283401019
(full .META. row below)

So, I'm guessing that it was in the midst of splitting and did not complete.

How can I recover from this situation?

thanks,
Marc

----------- .META. output ----------------

hbase(main):001:0> get '.META.' , "opx_ad_event_v2,advertiser\x02OpenX PSA\x01country\x02Serbia\x01publisher\x02DSNR - Filesharing ROW\x01domain\x02thejesperbay.com\x01advert""
COLUMN                                        CELL

 info:regioninfo                              timestamp=1299301154675, value=REGION => {NAME => 'opx_ad_event_v2,advertiser\x02OpenX PSA\x01country\x02Serbia\x01publisher\x02DSNR - Filesharing ROW\x01domain\x02thejesperbay.com\x01advertiser_tag\x02mmx.travel\x01starttime\x021295175600,1297185243218.6147d3696ba9db3a85e3afd08d0bc59a.', STARTKEY => 'advertiser\x02OpenX PSA\x01country\x02Serbia\x01publisher\x02DSNR - Filesharing ROW\x01domain\x02thejesperbay.com\x01advertiser_tag\x02mmx.travel\x01starttime\x021295175600', ENDKEY => 'advertiser\x02OpenX PSA\x01country\x02United Arab Emirates\x01publisher\x02www.sixbillionsecrets.com/\x01advertiser_tag\x02mmx.arts and entertainment\x01publisher_tag\x02mmx.arts and entertainment\x01starttime\x021295877600', ENCODED => 6147d3696ba9db3a85e3afd08d0bc59a, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'opx_ad_event_v2', FAMILIES => [{NAME => 'metrics', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'GZ', TTL => '2147483647', BLOCKSIZE => '1048576', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'topn', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'GZ', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '1048576', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}

 info:server                                  timestamp=1299181144063, value=ip-10-17-24-121.ec2.internal:60020

 info:serverstartcode                         timestamp=1299181144063, value=1299180905510

 info:splitA                                  timestamp=1299283401019, value=REGION => {NAME => 'opx_ad_event_v2,advertiser\x02OpenX PSA\x01country\x02Serbia\x01publisher\x02DSNR - Filesharing ROW\x01domain\x02thejesperbay.com\x01advertiser_tag\x02mmx.travel\x01starttime\x021295175600,1299283399612.3b278f1b0ea78af239409efc4f0b2a3d.', STARTKEY => 'advertiser\x02OpenX PSA\x01country\x02Serbia\x01publisher\x02DSNR - Filesharing ROW\x01domain\x02thejesperbay.com\x01advertiser_tag\x02mmx.travel\x01starttime\x021295175600', ENDKEY => 'advertiser\x02OpenX PSA\x01country\x02Taiwan\x01domain\x02kanzhongguo.com\x01advertiser_tag\x02mmx.arts and entertainment\x01publisher_tag\x02\x01starttime\x021296910800', ENCODED => 3b278f1b0ea78af239409efc4f0b2a3d, TABLE => {{NAME => 'opx_ad_event_v2', FAMILIES => [{NAME => 'metrics', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'GZ', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'topn', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'GZ', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}

 info:splitB                                  timestamp=1299283401019, value=REGION => {NAME => 'opx_ad_event_v2,advertiser\x02OpenX PSA\x01country\x02Taiwan\x01domain\x02kanzhongguo.com\x01advertiser_tag\x02mmx.arts and entertainment\x01publisher_tag\x02\x01starttime\x021296910800,1299283399612.9d41646202a363812d068792311d3a9b.', STARTKEY => 'advertiser\x02OpenX PSA\x01country\x02Taiwan\x01domain\x02kanzhongguo.com\x01advertiser_tag\x02mmx.arts and entertainment\x01publisher_tag\x02\x01starttime\x021296910800', ENDKEY => 'advertiser\x02OpenX PSA\x01country\x02United Arab Emirates\x01publisher\x02www.sixbillionsecrets.com/\x01advertiser_tag\x02mmx.arts and entertainment\x01publisher_tag\x02mmx.arts and entertainment\x01starttime\x021295877600', ENCODED => 9d41646202a363812d068792311d3a9b, TABLE => {{NAME => 'opx_ad_event_v2', FAMILIES => [{NAME => 'metrics', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'GZ', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'topn', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'GZ', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}

5 row(s) in 0.3960 seconds



------------ hbck -details output -------------
...
ERROR: Region hdfs://ip-10-17-5-253.ec2.internal:9000/hbase/opx_ad_event_v2/34a0ffe60da97431a809f0ffe8e5328a on HDFS, but not listed in META or deployed on any region server.
ERROR: Region hdfs://ip-10-17-5-253.ec2.internal:9000/hbase/opx_ad_event_v2/3b278f1b0ea78af239409efc4f0b2a3d on HDFS, but not listed in META or deployed on any region server.
11/03/05 23:33:16 DEBUG util.HBaseFsck: Region opx_ad_event_v2,advertiser\x02OpenX PSA\x01country\x02Serbia\x01publisher\x02DSNR - Filesharing ROW\x01domain\x02thejesperbay.com\x01advertiser_tag\x02mmx.travel\x01starttime\x021295175600,1297185243218.6147d3696ba9db3a85e3afd08d0bc59a. offline, split, parent, ignoring.
ERROR: Region hdfs://ip-10-17-5-253.ec2.internal:9000/hbase/opx_ad_event_v2/9d41646202a363812d068792311d3a9b on HDFS, but not listed in META or deployed on any region server.
Chain of regions in table opx_ad_event_v2 is broken; edges does not contain
advertiser^BOpenX PSA^Acountry^BSerbia^Apublisher^BDSNR - Filesharing
ROW^Adomain^Bthejesperbay.com^Aadvertiser_tag^Bmmx.travel^Astarttime^B1295175600
ERROR: Found inconsistency in table opx_ad_event_v2
Summary:
  -ROOT- is okay.
    Number of regions: 1
    Deployed on:  ip-10-17-24-121.ec2.internal:60020
  .META. is okay.
    Number of regions: 1
    Deployed on:  ip-10-17-5-252.ec2.internal:60020
...
Chain of regions in table opx_ad_event_v2 is broken; edges does not contain
advertiser^BOpenX PSA^Acountry^BSerbia^Apublisher^BDSNR - Filesharing
ROW^Adomain^Bthejesperbay.com^Aadvertiser_tag^Bmmx.travel^Astarttime^B1295175600
Table opx_ad_event_v2 is inconsistent.
    Number of regions: 1612
    Deployed on: ...
4 inconsistencies detected.
Status: INCONSISTENT

Re: Hbck errors

Posted by Stack <sa...@gmail.com>.
Are the daughter regions in the meta table?  Are both there?  Are they both online?

You could try enabling the parent region.




Re: Hbck errors

Posted by 茅旭峰 <m9...@gmail.com>.
Hi Marc,

Can you give some clues/links on how to manipulate .META. in
detail? For example, how to build an encoded region name to
fill the holes in .META., and how to assign the region that fills
a hole and then compact it. I also tried check_meta.rb, but it does
not work for me.
Any suggestions would be highly appreciated! Thanks in advance!

Mao Xu-Feng

On Mon, Mar 7, 2011 at 2:08 AM, Marc Limotte <ms...@gmail.com> wrote:

> We have resolved the issue.  Details follow as "lessons learned".
>
> Yes, Stack, the info:splitA/B columns followed, as part of the offlined
> parent's .META. row.  But the regions that splitA/B point to did not
> exist in .META.
>
> Also, after the original email, we checked in HDFS and found the parent
> region directory with 4 data files.  And also directories for each daughter
> region (with 4 small files each--- presumably references to the original).
>
> So, it looks like (not totally sure about the exact order, but something
> like):
>
>   1. split started on region A
>   2. region A was offlined
>   3. The daughter regions were created in HDFS with the reference files
>   4. .META. was updated for region A
>   5. **** server crashed
>
> So, the new daughter entries were never added to .META.
>
> We first tried to online region A with the shell command "assign",
> figuring that hbase would just find and split region A again.  This seemed
> to have no effect... not sure why, maybe because region A already had
> splitA/B entries?  Region A remained offline.  We also tried to force it to
> split region A, using the shell command "split".  Again no effect.
>
> Finally we tried to manually complete the split that had started.  Peter
> manually inserted the two daughter regions into .META.  We then tried to
> force a compact from the shell; this failed with an NSRE
> (NotServingRegionException).  So we onlined region A with the "assign"
> command, and it worked this time.  And now we seem to be up again: compact
> works, data loads work, hbck checks out!
>
> As a side note, hbck gave me some good feedback to help investigate the
> problem; although the "-fix" didn't help in this case.  It would be nice if
> there was a tool or shell command to create a region given name, hdfs-path,
> start and end keys.
>
> Also, check_meta.rb threw me off track, because it did not detect any holes
> when they did in fact exist.  This made me discount the most obvious
> scenario, since I believed there were no holes.  Looking at the source for
> bin/check_meta.rb, I see the issue:
>
> if oldHRI.isOffline() && Bytes.equals(oldHRI.getStartKey(),
> hri.getStartKey())
>  # Presume offlined parent
> elsif Bytes.equals(oldHRI.getEndKey(), hri.getStartKey())
>  # Start key of next matches end key of previous
> ...
>
> When checking for holes, it does not properly account for offline regions.
> The first condition doesn't apply because oldHRI.start != hri.start.  The
> second condition does apply (oldHRI.end = hri.start) and so it continues on
> thinking there is no "problem" here.  Instead, I think the second condition
> should be:
>
> ...
> elsif !oldHRI.isOffline() && Bytes.equals(oldHRI.getEndKey(),
> hri.getStartKey())
>  # Start key of next matches end key of previous
> ...
>
> Marc
>
>
> On Sun, Mar 6, 2011 at 9:18 AM, Stack <st...@duboce.net> wrote:
>
> > So, yeah Marc, what are the rows that follow the ones you post below?
> > Are they the info:splitA and info:splitB or something else?
> > Thanks,
> > St.Ack
> >
>
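
The check_meta.rb condition Marc quotes, and the change he proposes, can be tried out on a toy model. This is only a sketch in plain Ruby: the Region struct, the find_holes helper, and the one-letter keys are made up for illustration, and how the real script advances past an offlined parent is an assumption, not something taken from its source.

```ruby
# Toy stand-in for HRegionInfo: only the fields the check looks at.
Region = Struct.new(:name, :start_key, :end_key, :offline)

# Walk rows in .META. order and return the [previous, current] name pairs
# that do not chain together.
# strict: false mirrors the condition quoted from bin/check_meta.rb;
# strict: true adds the !isOffline() test proposed in the message.
def find_holes(regions, strict:)
  holes = []
  regions.each_cons(2) do |prev, cur|
    if prev.offline && prev.start_key == cur.start_key
      next # presume offlined parent followed by its first daughter
    elsif (!strict || !prev.offline) && prev.end_key == cur.start_key
      next # start key of next matches end key of previous
    else
      holes << [prev.name, cur.name]
    end
  end
  holes
end

# The state described in the thread, with made-up keys: the parent is
# offline and split, but neither daughter row exists in .META.
broken = [
  Region.new('before', 'a', 'b', false),
  Region.new('parent', 'b', 'd', true),
  Region.new('after',  'd', 'f', false)
]

# A completed split for comparison: parent row followed by both daughters.
healthy = [
  Region.new('before',    'a', 'b', false),
  Region.new('parent',    'b', 'd', true),
  Region.new('daughterA', 'b', 'c', false),
  Region.new('daughterB', 'c', 'd', false),
  Region.new('after',     'd', 'f', false)
]

p find_holes(broken, strict: false)  # original condition
p find_holes(broken, strict: true)   # proposed condition
p find_holes(healthy, strict: true)  # completed split
```

With the original condition the broken table reports no holes, because the offlined parent's end key still matches the next region's start key. With the extra offline test the parent/after pair is flagged, while the completed split stays clean.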

Re: Hbck errors

Posted by Adam Phelps <am...@opendns.com>.
On 3/21/11 10:13 PM, Stack wrote:
> On Mon, Mar 21, 2011 at 7:19 PM, Adam Phelps<am...@opendns.com>  wrote:
>> It looks like we've come up against a problem that looks identical to the
>> one you described.  How did you go about manually inserting the two child
>> regions?
>>
>
> You know the daughter regions because they should be listed when you
> look at the parent in .META.  It should have info:splitA and
> info:splitB columns with the daughters listed.  Take the encoded name
> of the daughters.  Look in hdfs.  Are the regions there?  If so,
> insert regions named the same as those in info:splitA and info:splitB.
> Use the parent region as the template when making the HRegionInfo.
>
> Poke around in the bin/*.rb scripts to see examples of reading an
> HRegionInfo, amending it, and inserting it into .META.

We attempted to do this via the "put" command in the shell; however, when
we then try to read the entry we get a VersionMismatchException:

ERROR: org.apache.hadoop.io.VersionMismatchException: null
Backtrace: VersionedWritable.java:46:in 
`org.apache.hadoop.io.VersionedWritable.readFields'
            HRegionInfo.java:625:in 
`org.apache.hadoop.hbase.HRegionInfo.readFields'
            Writables.java:105:in 
`org.apache.hadoop.hbase.util.Writables.getWritable'
            Writables.java:75:in 
`org.apache.hadoop.hbase.util.Writables.getWritable'
            Writables.java:119:in 
`org.apache.hadoop.hbase.util.Writables.getHRegionInfo'
            Writables.java:130:in 
`org.apache.hadoop.hbase.util.Writables.getHRegionInfoOrNull'

Looking through the code (we're using CDH3B4) it looks like the version 
in HRegionInfo is hardcoded to 0, whereas the version used by Put (in 
Put.java) is hardcoded to 1.
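
One possible explanation, which is an assumption and not something confirmed
in this thread: the trace above fails in VersionedWritable.readFields, which
reads the first byte of the stored value and compares it with the version the
class expects (0 for HRegionInfo, as noted above).  If the shell's put stored
the value as plain text, that first byte would be a printable character rather
than a version byte.  A toy Ruby illustration, with a hypothetical helper name
and made-up values:

```ruby
# Version byte that HRegionInfo expects, per the message above.
HREGIONINFO_VERSION = 0

# Hypothetical helper: mimics only the first step of the version check,
# i.e. compare the first byte of the stored cell value with the version.
def first_byte_is_version?(cell_value, expected = HREGIONINFO_VERSION)
  !cell_value.empty? && cell_value.getbyte(0) == expected
end

# A value written as text starts with the byte for 'R' (82), not 0.
text_value = "REGION => {NAME => 'example'}"   # made-up text value
# A serialized value would start with the version byte itself.
binary_value = "\x00rest-of-serialized-bytes"  # made-up binary value

puts first_byte_is_version?(text_value)
puts first_byte_is_version?(binary_value)
```

If that is what happened, the row would need to be written with a properly
serialized HRegionInfo rather than its printed form.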

Is there an alternative means of adding an entry for the child regions?
I've looked at the raw data in /hbase/.META., but it looks to be binary
data, so I'd rather not edit it that way if I can avoid it.

- Adam

Re: Hbck errors

Posted by Stack <st...@duboce.net>.
On Mon, Mar 21, 2011 at 7:19 PM, Adam Phelps <am...@opendns.com> wrote:
> It looks like we've come up against a problem that looks identical to the
> one you described.  How did you go about manually inserting the two child
> regions?
>

You know the daughter regions because they should be listed when you
look at the parent in .META.  It should have info:splitA and
info:splitB columns with the daughters listed.  Take the encoded name
of the daughters.  Look in hdfs.  Are the regions there?  If so,
insert regions named the same as those in info:splitA and info:splitB.
Use the parent region as the template when making the HRegionInfo.

Poke around in the bin/*.rb scripts to see examples of reading an
HRegionInfo, amending it, and inserting it into .META.
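
As a rough aid for the "take the encoded name, look in hdfs" step, here is a
small standalone Ruby sketch.  It assumes region names have the form
table,startkey,id.encoded. (as in the .META. dumps in this thread) and that
the HBase root directory is /hbase (as in the hbck output); region_dir is a
made-up helper name, not something from the bin/ scripts.

```ruby
# Hypothetical helper: derive the HDFS directory where a region's files
# would be expected, given the region NAME shown in info:splitA/splitB.
def region_dir(region_name, root = '/hbase')
  # The table name is everything before the first comma.
  table = region_name.split(',', 2).first
  # The encoded name sits between the last two dots; start keys may
  # themselves contain dots and commas, so parse from the end.
  encoded = region_name.chomp('.').rpartition('.').last
  unless encoded =~ /\A[0-9a-f]{32}\z/
    raise ArgumentError, "no encoded name found in #{region_name.inspect}"
  end
  "#{root}/#{table}/#{encoded}"
end

# Example with a daughter name taken from a .META. dump in this thread.
name = 'domains,1932334:2011/02/18/03:com.photobucket.i654,' \
       '1300387311068.3fbd783ab2a3de505fd5607748d82ec7.'
puts region_dir(name)
```

The printed path can then be checked with the usual hadoop fs -ls to see
whether the daughter directory is actually there.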

> Our current thought on fixing it is to use the hbase shell to remove the
> entries for the child regions and rewrite the region's entry such that
> OFFLINE => false and SPLIT => false (i.e., both currently true), but we're
> not sure if that's a good solution.
>

You could change the split flag to false and then try onlining the parent
again (try calling assign).  That might get it back up.  Before doing
this though, you should remove the daughters from hdfs if they are present
(see above for how to figure out the daughter regions, or go to the
regionserver that was hosting the parent and find the split message; it'll
list the daughters).

St.Ack

Re: Hbck errors

Posted by Adam Phelps <am...@opendns.com>.
On 3/6/11 10:08 AM, Marc Limotte wrote:
>     1. split started on region A
>     2. region A was offlined
>     3. The daughter regions were created in HDFS with the reference files
>     4. .META. was updated for region A
>     5. **** server crashed
>
> So, the new daughter entries were never added to .META.
>
> We first tried to online region A with the shell command "assign", figuring
> that hbase would just find and split region A again.  This seemed to have no
> effect... not sure why, maybe because region A already had splitA/B
> entries?  Region A remained offline.  We also tried to force it to split
> region A, using the shell command "split".  Again no effect.
>
> Finally we tried to manually complete the split that had started.  Peter
> manually inserted the two daughter regions into .META.  We then tried to
> force a compaction from the shell, but this failed with a
> NotServingRegionException (NSRE).  So we onlined region A with the "assign"
> command, and it worked this time.  And now we seem to be up again: compact
> works, data loads work, hbck checks out!

It looks like we've come up against a problem that looks identical to 
the one you described.  How did you go about manually inserting the two 
child regions?

Our current thought on fixing it is to use the hbase shell to remove the
entries for the child regions and rewrite the region's entry such that
OFFLINE => false and SPLIT => false (i.e., both currently true), but we're
not sure if that's a good solution.

*** .META. info for the problem region ***

domains,1932334:2011/02/18/03:com.photobucket.i654,1299792156289.3824e8b8310176b6f3c2a1d3f3e708dc.

  column=info:regioninfo, timestamp=1300387322414,
  value=REGION => {NAME => 'domains,1932334:2011/02/18/03:com.photobucket.i654,1299792156289.3824e8b8310176b6f3c2a1d3f3e708dc.',
  STARTKEY => '1932334:2011/02/18/03:com.photobucket.i654',
  ENDKEY => '1933201:2011/03/02/09:org.wikipedia.af',
  ENCODED => 3824e8b8310176b6f3c2a1d3f3e708dc, OFFLINE => true, SPLIT => true,
  TABLE => {{NAME => 'domains', FAMILIES => [{NAME => 'handling',
  BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'LZO',
  VERSIONS => '1', TTL => '1000000000', BLOCKSIZE => '65536',
  IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}

  column=info:server, timestamp=1300757176451,
  value=s8.sjc.opendns.com:60020

  column=info:serverstartcode, timestamp=1300757176451,
  value=1300752817197

  column=info:splitA, timestamp=1300387322414,
  value=REGION => {NAME => 'domains,1932334:2011/02/18/03:com.photobucket.i654,1300387311068.3fbd783ab2a3de505fd5607748d82ec7.',
  STARTKEY => '1932334:2011/02/18/03:com.photobucket.i654',
  ENDKEY => '1932968:2010/11/10/12:com.twitter',
  ENCODED => 3fbd783ab2a3de505fd5607748d82ec7,
  TABLE => {{NAME => 'domains', FAMILIES => [{NAME => 'handling',
  BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'LZO',
  VERSIONS => '1', TTL => '1000000000', BLOCKSIZE => '65536',
  IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}

  column=info:splitB, timestamp=1300387322414,
  value=REGION => {NAME => 'domains,1932968:2010/11/10/12:com.twitter,1300387311068.6e95a3361da531a57b5883014c047cdc.',
  STARTKEY => '1932968:2010/11/10/12:com.twitter',
  ENDKEY => '1933201:2011/03/02/09:org.wikipedia.af',
  ENCODED => 6e95a3361da531a57b5883014c047cdc,
  TABLE => {{NAME => 'domains', FAMILIES => [{NAME => 'handling',
  BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'LZO',
  VERSIONS => '1', TTL => '1000000000', BLOCKSIZE => '65536',
  IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}


- Adam

Re: Hbck errors

Posted by Marc Limotte <ms...@gmail.com>.
We have resolved the issue.  Details follow as "lessons learned".

Yes, Stack, the info:splitA/B columns followed, as part of the offlined
parent's .META. row.  But the regions that splitA/B point to *did not
exist* in .META.

Also, after the original email, we checked in HDFS and found the parent
region directory with 4 data files, along with a directory for each daughter
region (with 4 small files each, presumably references to the original).

So, it looks like (not totally sure about the exact order, but something
like):

   1. split started on region A
   2. region A was offlined
   3. The daughter regions were created in HDFS with the reference files
   4. .META. was updated for region A
   5. **** server crashed

So, the new daughter entries were never added to .META.

We first tried to online region A with the shell command "assign", figuring
that hbase would just find and split region A again.  This seemed to have no
effect... not sure why, maybe because region A already had splitA/B
entries?  Region A remained offline.  We also tried to force it to split
region A, using the shell command "split".  Again no effect.

Finally we tried to manually complete the split that had started.  Peter
manually inserted the two daughter regions into .META.  We then tried to
force a compaction from the shell, but this failed with a
NotServingRegionException (NSRE).  So we onlined region A with the "assign"
command, and it worked this time.  And now we seem to be up again: compact
works, data loads work, hbck checks out!

As a side note, hbck gave me some good feedback to help investigate the
problem, although "-fix" didn't help in this case.  It would be nice if
there were a tool or shell command to create a region given a name, hdfs
path, and start and end keys.

Also, check_meta.rb threw me off track, because it did not detect any holes
when they did in fact exist.  This made me discount the most obvious
scenario, since I believed there were no holes.  Looking at the source for
bin/check_meta.rb, I see the issue:

if oldHRI.isOffline() && Bytes.equals(oldHRI.getStartKey(), hri.getStartKey())
  # Presume offlined parent
elsif Bytes.equals(oldHRI.getEndKey(), hri.getStartKey())
  # Start key of next matches end key of previous
...

When checking for holes, it does not properly account for offline regions.
The first condition doesn't apply because oldHRI.start != hri.start.  The
second condition does apply (oldHRI.end = hri.start) and so it continues on
thinking there is no "problem" here.  Instead, I think the second condition
should be:

...
elsif !oldHRI.isOffline() && Bytes.equals(oldHRI.getEndKey(), hri.getStartKey())
  # Start key of next matches end key of previous
...
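
To make the effect of that change concrete, here is a minimal standalone Ruby
sketch of the hole check.  It is not the actual bin/check_meta.rb code: Region
is a hypothetical stand-in for HRegionInfo, find_holes is an invented name,
and keys are plain strings rather than byte arrays.  It assumes the regions
arrive sorted the way .META. sorts them, with an offlined parent directly
before its first daughter.

```ruby
# Hypothetical stand-in for HRegionInfo with only the fields the check uses.
Region = Struct.new(:name, :start_key, :end_key, :offline)

# Returns [previous_name, current_name] pairs where the key chain is broken.
def find_holes(regions)
  holes = []
  regions.each_cons(2) do |prev, cur|
    # An offlined parent is followed by a daughter with the same start key.
    parent_then_daughter = prev.offline && prev.start_key == cur.start_key
    # Proposed fix: only an online region may chain by end key == start key.
    contiguous = !prev.offline && prev.end_key == cur.start_key
    holes << [prev.name, cur.name] unless parent_then_daughter || contiguous
  end
  holes
end

# Made-up keys: parent A covers b..d and was split, but the daughter rows
# never reached .META., so an online region covering b..d is missing.
broken = [
  Region.new('before', 'a', 'b', false),
  Region.new('A',      'b', 'd', true),
  Region.new('after',  'd', 'e', false)
]
p find_holes(broken)
```

With the extra !offline test the pair A/after is reported as a hole; without
it, A's end key matching the next start key would hide the problem.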

Marc


On Sun, Mar 6, 2011 at 9:18 AM, Stack <st...@duboce.net> wrote:

> So, yeah Marc, what are the rows that follow the ones you post below?
> Are they the info:splitA and info:splitB or something else?
> Thanks,
> St.Ack
>

Re: Hbck errors

Posted by Stack <st...@duboce.net>.
So, yeah Marc, what are the rows that follow the ones you post below?
Are they the info:splitA and info:splitB or something else?
Thanks,
St.Ack
