Posted to user@hbase.apache.org by Marc Limotte <ms...@gmail.com> on 2011/03/06 01:22:21 UTC
Hbck errors
We had an issue a day ago with some OOMEs on the region servers. The
master shut down OK, but most of the RegionServers didn't, so we eventually
had to kill -9 them. We brought it all back up and ran a major compaction to
change the HBase block size. This seemed to work, but now we have an
inconsistency that is preventing bulk loads from continuing.
hbase hbck -details finds an inconsistency. I tried -fix, but it did not help.
Chain of regions in table opx_ad_event_v2 is broken; edges does not contain
advertiser^BOpenX PSA^Acountry^BSerbia^Apublisher^BDSNR - Filesharing
ROW^Adomain^Bthejesperbay.com^Aadvertiser_tag^Bmmx.travel^Astarttime^B1295175600
hbck also notes that this region is offline:
11/03/05 23:33:16 DEBUG util.HBaseFsck: Region opx_ad_event_v2,advertiser\x02OpenX PSA\x01country\x02Serbia\x01publisher\x02DSNR - Filesharing ROW\x01domain\x02thejesperbay.com\x01advertiser_tag\x02mmx.travel\x01starttime\x021295175600,1297185243218.6147d3696ba9db3a85e3afd08d0bc59a. offline, split, parent, ignoring.
Looking in .META. I see that the region is indeed offline, and appears to be
split:
info:regioninfo timestamp=1299301154675 ...
OFFLINE => true,
info:splitA timestamp=1299283401019
info:splitB timestamp=1299283401019
(full .META. row below)
So, I'm guessing that it was in the midst of splitting and did not complete.
How can I recover from this situation?
thanks,
Marc
----------- .META. output ----------------
hbase(main):001:0> get '.META.' , "opx_ad_event_v2,advertiser\x02OpenX PSA\x01country\x02Serbia\x01publisher\x02DSNR - Filesharing ROW\x01domain\x02thejesperbay.com\x01advert""
COLUMN CELL
info:regioninfo timestamp=1299301154675,
  value=REGION => {NAME => 'opx_ad_event_v2,advertiser\x02OpenX PSA\x01country\x02Serbia\x01publisher\x02DSNR - Filesharing ROW\x01domain\x02thejesperbay.com\x01advertiser_tag\x02mmx.travel\x01starttime\x021295175600,1297185243218.6147d3696ba9db3a85e3afd08d0bc59a.',
  STARTKEY => 'advertiser\x02OpenX PSA\x01country\x02Serbia\x01publisher\x02DSNR - Filesharing ROW\x01domain\x02thejesperbay.com\x01advertiser_tag\x02mmx.travel\x01starttime\x021295175600',
  ENDKEY => 'advertiser\x02OpenX PSA\x01country\x02United Arab Emirates\x01publisher\x02www.sixbillionsecrets.com/\x01advertiser_tag\x02mmx.arts and entertainment\x01publisher_tag\x02mmx.arts and entertainment\x01starttime\x021295877600',
  ENCODED => 6147d3696ba9db3a85e3afd08d0bc59a, OFFLINE => true, SPLIT => true,
  TABLE => {{NAME => 'opx_ad_event_v2', FAMILIES => [
  {NAME => 'metrics', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'GZ', TTL => '2147483647', BLOCKSIZE => '1048576', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
  {NAME => 'topn', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'GZ', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '1048576', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
info:server timestamp=1299181144063, value=ip-10-17-24-121.ec2.internal:60020
info:serverstartcode timestamp=1299181144063, value=1299180905510
info:splitA timestamp=1299283401019,
  value=REGION => {NAME => 'opx_ad_event_v2,advertiser\x02OpenX PSA\x01country\x02Serbia\x01publisher\x02DSNR - Filesharing ROW\x01domain\x02thejesperbay.com\x01advertiser_tag\x02mmx.travel\x01starttime\x021295175600,1299283399612.3b278f1b0ea78af239409efc4f0b2a3d.',
  STARTKEY => 'advertiser\x02OpenX PSA\x01country\x02Serbia\x01publisher\x02DSNR - Filesharing ROW\x01domain\x02thejesperbay.com\x01advertiser_tag\x02mmx.travel\x01starttime\x021295175600',
  ENDKEY => 'advertiser\x02OpenX PSA\x01country\x02Taiwan\x01domain\x02kanzhongguo.com\x01advertiser_tag\x02mmx.arts and entertainment\x01publisher_tag\x02\x01starttime\x021296910800',
  ENCODED => 3b278f1b0ea78af239409efc4f0b2a3d,
  TABLE => {{NAME => 'opx_ad_event_v2', FAMILIES => [
  {NAME => 'metrics', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'GZ', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
  {NAME => 'topn', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'GZ', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
info:splitB timestamp=1299283401019,
  value=REGION => {NAME => 'opx_ad_event_v2,advertiser\x02OpenX PSA\x01country\x02Taiwan\x01domain\x02kanzhongguo.com\x01advertiser_tag\x02mmx.arts and entertainment\x01publisher_tag\x02\x01starttime\x021296910800,1299283399612.9d41646202a363812d068792311d3a9b.',
  STARTKEY => 'advertiser\x02OpenX PSA\x01country\x02Taiwan\x01domain\x02kanzhongguo.com\x01advertiser_tag\x02mmx.arts and entertainment\x01publisher_tag\x02\x01starttime\x021296910800',
  ENDKEY => 'advertiser\x02OpenX PSA\x01country\x02United Arab Emirates\x01publisher\x02www.sixbillionsecrets.com/\x01advertiser_tag\x02mmx.arts and entertainment\x01publisher_tag\x02mmx.arts and entertainment\x01starttime\x021295877600',
  ENCODED => 9d41646202a363812d068792311d3a9b,
  TABLE => {{NAME => 'opx_ad_event_v2', FAMILIES => [
  {NAME => 'metrics', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'GZ', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
  {NAME => 'topn', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'GZ', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
5 row(s) in 0.3960 seconds
------------ hbck -details output -------------
...
ERROR: Region
hdfs://ip-10-17-5-253.ec2.internal:9000/hbase/opx_ad_event_v2/34a0ffe60da97431a809f0ffe8e5328a
on HDFS, but not listed in META or deployed on any region server.
ERROR: Region
hdfs://ip-10-17-5-253.ec2.internal:9000/hbase/opx_ad_event_v2/3b278f1b0ea78af239409efc4f0b2a3d
on HDFS, but not listed in META or deployed on any region server.
11/03/05 23:33:16 DEBUG util.HBaseFsck: Region opx_ad_event_v2,advertiser\x02OpenX PSA\x01country\x02Serbia\x01publisher\x02DSNR - Filesharing ROW\x01domain\x02thejesperbay.com\x01advertiser_tag\x02mmx.travel\x01starttime\x021295175600,1297185243218.6147d3696ba9db3a85e3afd08d0bc59a. offline, split, parent, ignoring.
ERROR: Region
hdfs://ip-10-17-5-253.ec2.internal:9000/hbase/opx_ad_event_v2/9d41646202a363812d068792311d3a9b
on HDFS, but not listed in META or deployed on any region server.
Chain of regions in table opx_ad_event_v2 is broken; edges does not contain
advertiser^BOpenX PSA^Acountry^BSerbia^Apublisher^BDSNR - Filesharing
ROW^Adomain^Bthejesperbay.com^Aadvertiser_tag^Bmmx.travel^Astarttime^B1295175600
ERROR: Found inconsistency in table opx_ad_event_v2
Summary:
-ROOT- is okay.
Number of regions: 1
Deployed on: ip-10-17-24-121.ec2.internal:60020
.META. is okay.
Number of regions: 1
Deployed on: ip-10-17-5-252.ec2.internal:60020
...
Chain of regions in table opx_ad_event_v2 is broken; edges does not contain
advertiser^BOpenX PSA^Acountry^BSerbia^Apublisher^BDSNR - Filesharing
ROW^Adomain^Bthejesperbay.com^Aadvertiser_tag^Bmmx.travel^Astarttime^B1295175600
Table opx_ad_event_v2 is inconsistent.
Number of regions: 1612
Deployed on: ...
4 inconsistencies detected.
Status: INCONSISTENT
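Reading the hbck errors against the parent's .META. row: two of the three regions reported as "on HDFS, but not listed in META" carry the same encoded names as the daughters in info:splitA and info:splitB. A minimal Ruby sketch of that cross-check follows. The `classify` helper and its state labels are hypothetical, made up for illustration, and the encoded names are copied by hand from the output above rather than read from a cluster.

```ruby
require "set"

# Encoded names hbck reported as "on HDFS, but not listed in META".
ORPHANS_ON_HDFS = Set[
  "34a0ffe60da97431a809f0ffe8e5328a",
  "3b278f1b0ea78af239409efc4f0b2a3d",
  "9d41646202a363812d068792311d3a9b",
]

# ENCODED values from the offline parent's info:splitA / info:splitB cells.
DAUGHTERS = {
  "info:splitA" => "3b278f1b0ea78af239409efc4f0b2a3d",
  "info:splitB" => "9d41646202a363812d068792311d3a9b",
}

# Hypothetical classification: a daughter that hbck found on HDFS without a
# .META. row suggests the split stopped before the daughters were registered.
def classify(daughters, orphans)
  daughters.map do |column, encoded|
    state = orphans.include?(encoded) ? :on_hdfs_missing_from_meta : :not_reported
    [column, state]
  end.to_h
end

p classify(DAUGHTERS, ORPHANS_ON_HDFS)
# Orphan directories that are not daughters of this parent.
p (ORPHANS_ON_HDFS - DAUGHTERS.values).to_a
```

The remaining directory, 34a0ffe60da97431a809f0ffe8e5328a, is not one of this parent's daughters, so it would have to be looked at separately.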
Re: Hbck errors
Posted by Stack <sa...@gmail.com>.
Are the daughter regions in the meta table? Are both there? Are they both online?
You could try enabling the parent region.
Re: Hbck errors
Posted by 茅旭峰 <m9...@gmail.com>.
Hi Marc,
Can you give some clues/links about how to manipulate .META.
in detail? For example, how to build an encoded region name for
filling the holes in .META., and how to assign the hole region and then
compact it. I also tried check_meta.rb, but it does not work for me.
Any suggestions would be highly appreciated! Thanks in advance!
Mao Xu-Feng
On Mon, Mar 7, 2011 at 2:08 AM, Marc Limotte <ms...@gmail.com> wrote:
> We have resolved the issue. Details follow as "lessons learned".
>
> Yes, Stack, the info:splitA/B columns followed, as part of the offlined
> parent's .META. row. But the regions that splitA/B point to did not
> exist in .META.
>
> Also, after the original email, we checked in HDFS and found the parent
> region directory with 4 data files, plus a directory for each daughter
> region (with 4 small files each, presumably references to the original).
>
> So, it looks like (not totally sure about the exact order, but something
> like):
>
> 1. split started on region A
> 2. region A was offlined
> 3. The daughter regions were created in HDFS with the reference files
> 4. .META. was updated for region A
> 5. **** server crashed
>
> So, the new daughter entries were never added to .META.
>
> We first tried to online region A with the shell command "assign", figuring
> that HBase would just find and split region A again. This seemed to have no
> effect... not sure why, maybe because region A already had splitA/B
> entries? Region A remained offline. We also tried to force it to split
> region A, using the shell command "split". Again, no effect.
>
> Finally we tried to manually complete the split that had started. Peter
> manually inserted the two daughter regions into .META. We then tried to
> force a compact from the shell, but this failed with an NSRE
> (NotServingRegionException). So we onlined region A with the "assign"
> command, and it worked this time. Now we seem to be up again: compact
> works, data loads work, hbck checks out!
>
> As a side note, hbck gave me some good feedback to help investigate the
> problem; although the "-fix" didn't help in this case. It would be nice if
> there was a tool or shell command to create a region given name, hdfs-path,
> start and end keys.
>
> Also, check_meta.rb threw me off track, because it did not detect any holes
> when they did in fact exist. This made me discount the most obvious
> scenario, since I believed there were no holes. Looking at the source for
> bin/check_meta.rb, I see the issue:
>
> if oldHRI.isOffline() && Bytes.equals(oldHRI.getStartKey(),
> hri.getStartKey())
> # Presume offlined parent
> elsif Bytes.equals(oldHRI.getEndKey(), hri.getStartKey())
> # Start key of next matches end key of previous
> ...
>
> When checking for holes, it does not properly account for offline regions.
> The first condition doesn't apply because oldHRI.start != hri.start. The
> second condition does apply (oldHRI.end = hri.start) and so it continues on
> thinking there is no "problem" here. Instead, I think the second condition
> should be:
>
> ...
> elsif !oldHRI.isOffline() && Bytes.equals(oldHRI.getEndKey(),
> hri.getStartKey())
> # Start key of next matches end key of previous
> ...
>
> Marc
>
>
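To illustrate the hole check Marc describes, here is a small self-contained Ruby sketch. It is not check_meta.rb and does not use the HBase classes: the `Region` struct and `find_holes` are hypothetical stand-ins. Instead of patching the `elsif`, it takes a slightly different route and drops offline regions before comparing neighbours. Keys are plain strings here, whereas the real script compares byte arrays with Bytes.equals.

```ruby
# Hypothetical stand-in for the HRegionInfo fields the check needs.
Region = Struct.new(:name, :start_key, :end_key, :offline)

# Return the end keys after which the chain of online regions breaks.
# Offline regions (such as a split parent) are dropped first, so a parent
# whose daughters never reached .META. cannot hide the gap.
def find_holes(regions)
  online = regions.reject(&:offline).sort_by(&:start_key)
  online.each_cons(2)
        .reject { |prev, nxt| prev.end_key == nxt.start_key }
        .map { |prev, _nxt| prev.end_key }
end

before = [
  Region.new("r1", "a", "f", false),
  Region.new("parent", "f", "p", true),   # offline split parent
  Region.new("r3", "p", "z", false),
]
after = before + [
  Region.new("daughterA", "f", "k", false),
  Region.new("daughterB", "k", "p", false),
]

p find_holes(before)  # the gap starts at the parent's start key, "f"
p find_holes(after)   # no gaps once both daughters are listed
```

With the parent skipped, the gap is reported at the parent's start key, which is the same key hbck names in its "edges does not contain" message.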
> On Sun, Mar 6, 2011 at 9:18 AM, Stack <st...@duboce.net> wrote:
>
> > So, yeah Marc, what are the rows that follow the ones you post below?
> > Are they the info:splitA and info:splitB or something else?
> > Thanks,
> > St.Ack
Re: Hbck errors
Posted by Adam Phelps <am...@opendns.com>.
On 3/21/11 10:13 PM, Stack wrote:
> On Mon, Mar 21, 2011 at 7:19 PM, Adam Phelps<am...@opendns.com> wrote:
>> It looks like we've come up against a problem that looks identical to the
>> one you described. How did you go about manually inserting the two child
>> regions?
>>
>
> You know the daughter regions because they should be listed when you
> look at the parent in .META. It should have info:splitA and
> info:splitB columns with the daughters listed. Take the encoded name
> of the daughters. Look in hdfs. Are the regions there? If so,
> insert regions named the same as those in info:splitA and info:splitB.
> Take the parent region for the template making the HRegionInfo.
>
> Poke around in bin/*rb scripts to see examples of reading HRegionInfo,
> amending it, and insert into .META.
We attempted to do this via the "put" command in the shell; however, when
we then try to read the entry we get a VersionMismatchException:
ERROR: org.apache.hadoop.io.VersionMismatchException: null
Backtrace: VersionedWritable.java:46:in `org.apache.hadoop.io.VersionedWritable.readFields'
HRegionInfo.java:625:in `org.apache.hadoop.hbase.HRegionInfo.readFields'
Writables.java:105:in `org.apache.hadoop.hbase.util.Writables.getWritable'
Writables.java:75:in `org.apache.hadoop.hbase.util.Writables.getWritable'
Writables.java:119:in `org.apache.hadoop.hbase.util.Writables.getHRegionInfo'
Writables.java:130:in `org.apache.hadoop.hbase.util.Writables.getHRegionInfoOrNull'
Looking through the code (we're using CDH3B4), it looks like the version
in HRegionInfo is hardcoded to 0, whereas the version used by Put (in
Put.java) is hardcoded to 1.
Is there an alternative means of adding an entry for the child regions?
I've looked at the raw data in /hbase/.META., but it looks to be
binary data, so I'd rather avoid editing it that way if at all
possible.
- Adam
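One possible reading of that backtrace, offered as an assumption rather than a confirmed diagnosis: VersionedWritable.readFields reads a single leading byte and compares it with the expected version, so if the value stored by "put" was the printable "REGION => {...}" text rather than a serialized HRegionInfo, the first byte would be the letter "R" and the check would fail. The Ruby sketch below only models that one-byte check. `read_version!` and `VersionMismatch` are hypothetical names, and the expected version of 0 is the value Adam reports for HRegionInfo.

```ruby
# Version Adam reports as hardcoded in HRegionInfo (taken from his message).
HREGIONINFO_VERSION = 0

class VersionMismatch < StandardError; end

# Simplified model of a versioned read: the first byte must equal the
# expected version, otherwise the value is rejected.
def read_version!(bytes, expected = HREGIONINFO_VERSION)
  found = bytes.getbyte(0)
  unless found == expected
    raise VersionMismatch, "expected #{expected}, found #{found.inspect}"
  end
  found
end

serialized = [0].pack("C") + "remaining serialized fields"  # leading version byte
as_text    = "REGION => {NAME => '...'}"                    # text as the shell prints it

read_version!(serialized)   # accepted
begin
  read_version!(as_text)
rescue VersionMismatch => e
  puts e.message            # the byte found is 82, the ASCII code of "R"
end
```

If that is what happened, it would also explain why Stack points at the bin/*rb scripts, which serialize the HRegionInfo before writing it.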
Re: Hbck errors
Posted by Stack <st...@duboce.net>.
On Mon, Mar 21, 2011 at 7:19 PM, Adam Phelps <am...@opendns.com> wrote:
> It looks like we've come up against a problem that looks identical to the
> one you described. How did you go about manually inserting the two child
> regions?
>
You know the daughter regions because they should be listed when you
look at the parent in .META. It should have info:splitA and
info:splitB columns with the daughters listed. Take the encoded name
of the daughters. Look in hdfs. Are the regions there? If so,
insert regions named the same as those in info:splitA and info:splitB.
Take the parent region for the template making the HRegionInfo.
Poke around in bin/*rb scripts to see examples of reading HRegionInfo,
amending it, and insert into .META.
> Our current thought on fixing it is to use the hbase shell to remove the
> entries for the child regions and rewrite the region's entry such that
> OFFLINE => false and SPLIT => false (i.e. both currently true) but we're not
> sure if that's a good solution.
>
You could change the split flag to false and then try onlining the parent
again (try calling assign). That might get it back up. Before doing
this, though, you should remove the daughters from HDFS if they are present
(see above for how to figure out the daughter regions, or go to the
regionserver that was hosting the parent and find the split message; it'll
list the daughters).
St.Ack
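Stack's steps start from the daughters' encoded names. Judging only by the output in this thread, a region name has the form <table>,<start key>,<region id>.<encoded name>. and the region's directory in HDFS is <hbase root>/<table>/<encoded name>. Under that assumption, a rough Ruby sketch for pulling the pieces apart could look like this. `parse_region_name` and `region_dir` are hypothetical helpers, not HBase APIs, and the sample name is Marc's info:splitA daughter.

```ruby
# Parts of a region name of the form
#   <table>,<start key>,<region id>.<encoded name>.
RegionName = Struct.new(:table, :start_key, :region_id, :encoded)

# The start key may itself contain commas or dots, so the table is taken up
# to the first comma, the region id after the last comma, and the encoded
# name after the last dot.
def parse_region_name(name)
  body = name.chomp(".")                       # drop the trailing dot
  dot = body.rindex(".")
  raise ArgumentError, "no encoded name in #{name.inspect}" if dot.nil?
  encoded = body[(dot + 1)..-1]
  prefix = body[0...dot]                       # <table>,<start key>,<region id>
  first = prefix.index(",")
  last = prefix.rindex(",")
  raise ArgumentError, "malformed region name" if first.nil? || first == last
  RegionName.new(prefix[0...first], prefix[(first + 1)...last],
                 prefix[(last + 1)..-1], encoded)
end

# Directory where the region's files are expected, by the layout seen in hbck.
def region_dir(hbase_root, region)
  "#{hbase_root}/#{region.table}/#{region.encoded}"
end

# Single-quoted so the \x01 and \x02 escapes stay as printed by the shell.
SPLIT_A_NAME =
  'opx_ad_event_v2,advertiser\x02OpenX PSA\x01country\x02Serbia\x01publisher\x02DSNR - Filesharing ROW' \
  '\x01domain\x02thejesperbay.com\x01advertiser_tag\x02mmx.travel\x01starttime\x021295175600,' \
  '1299283399612.3b278f1b0ea78af239409efc4f0b2a3d.'

daughter = parse_region_name(SPLIT_A_NAME)
puts daughter.encoded
puts region_dir("/hbase", daughter)
```

The resulting path ends in the same table and encoded-name components as the directory hbck reported as "on HDFS, but not listed in META", which is the directory to check before inserting the daughter into .META.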
Re: Hbck errors
Posted by Adam Phelps <am...@opendns.com>.
On 3/6/11 10:08 AM, Marc Limotte wrote:
> 1. split started on region A
> 2. region A was offlined
> 3. The daughter regions were created in HDFS with the reference files
> 4. .META. was updated for region A
> 5. **** server crashed
>
> So, the new daughter entries were never added to .META.
>
> We first tried to online region A with the shell command "assign", figuring
> that HBase would just find and split region A again. This seemed to have no
> effect... not sure why, maybe because region A already had splitA/B
> entries? Region A remained offline. We also tried to force it to split
> region A, using the shell command "split". Again no effect.
>
> Finally we tried to manually complete the split that had started. Peter
> manually inserted the two daughter regions into .META. We then tried to
> force a compact from the shell, this failed with a NSRE. So we onlined
> region A with the "assign" command-- it worked this time. And now we seem
> to be up again, compact works, data loads work, hbck checks out!
It looks like we've come up against a problem that looks identical to
the one you described. How did you go about manually inserting the two
child regions?
Our current thought on fixing it is to use the hbase shell to remove the
entries for the child regions and rewrite the region's entry such that
OFFLINE => false and SPLIT => false (i.e. both currently true), but we're
not sure if that's a good solution.
*** .META. info for the problem region ***
row: domains,1932334:2011/02/18/03:com.photobucket.i654,1299792156289.3824e8b8310176b6f3c2a1d3f3e708dc.

column=info:regioninfo, timestamp=1300387322414, value=REGION => {NAME =>
  'domains,1932334:2011/02/18/03:com.photobucket.i654,1299792156289.3824e8b8310176b6f3c2a1d3f3e708dc.',
  STARTKEY => '1932334:2011/02/18/03:com.photobucket.i654',
  ENDKEY => '1933201:2011/03/02/09:org.wikipedia.af',
  ENCODED => 3824e8b8310176b6f3c2a1d3f3e708dc, OFFLINE => true, SPLIT => true,
  TABLE => {{NAME => 'domains', FAMILIES => [{NAME => 'handling',
  BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'LZO',
  VERSIONS => '1', TTL => '1000000000', BLOCKSIZE => '65536',
  IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}

column=info:server, timestamp=1300757176451, value=s8.sjc.opendns.com:60020

column=info:serverstartcode, timestamp=1300757176451, value=1300752817197

column=info:splitA, timestamp=1300387322414, value=REGION => {NAME =>
  'domains,1932334:2011/02/18/03:com.photobucket.i654,1300387311068.3fbd783ab2a3de505fd5607748d82ec7.',
  STARTKEY => '1932334:2011/02/18/03:com.photobucket.i654',
  ENDKEY => '1932968:2010/11/10/12:com.twitter',
  ENCODED => 3fbd783ab2a3de505fd5607748d82ec7,
  TABLE => {{NAME => 'domains', FAMILIES => [{NAME => 'handling',
  BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'LZO',
  VERSIONS => '1', TTL => '1000000000', BLOCKSIZE => '65536',
  IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}

column=info:splitB, timestamp=1300387322414, value=REGION => {NAME =>
  'domains,1932968:2010/11/10/12:com.twitter,1300387311068.6e95a3361da531a57b5883014c047cdc.',
  STARTKEY => '1932968:2010/11/10/12:com.twitter',
  ENDKEY => '1933201:2011/03/02/09:org.wikipedia.af',
  ENCODED => 6e95a3361da531a57b5883014c047cdc,
  TABLE => {{NAME => 'domains', FAMILIES => [{NAME => 'handling',
  BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'LZO',
  VERSIONS => '1', TTL => '1000000000', BLOCKSIZE => '65536',
  IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
- Adam
Re: Hbck errors
Posted by Marc Limotte <ms...@gmail.com>.
We have resolved the issue. Details follow as "lessons learned".
Yes, Stack, the info:splitA/B columns followed, as part of the offlined
parent's .META. row. But the regions that splitA/B point to *did not
exist* in .META.
Also, after the original email, we checked in HDFS and found the parent
region directory with 4 data files, and also directories for each daughter
region (with 4 small files each, presumably references to the original).
So, it looks like (not totally sure about the exact order, but something
like):
1. split started on region A
2. region A was offlined
3. The daughter regions were created in HDFS with the reference files
4. .META. was updated for region A
5. **** server crashed
So, the new daughter entries were never added to .META.
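
To make that state concrete, here is a minimal Ruby sketch of the condition we ended up with: a parent marked offline and split whose splitA/splitB daughters have no rows of their own. It does not talk to HBase; the meta hash is a hypothetical in-memory stand-in for .META., and the region names 'A', 'B', etc. are made up for illustration.

```ruby
# Hypothetical in-memory stand-in for .META.: region name => row columns.
# A real check would scan .META. instead; nothing here calls HBase.
def dangling_split_parents(meta)
  meta.select do |_name, row|
    daughters = [row[:split_a], row[:split_b]].compact
    # Parent is offline and split, but at least one daughter has no row.
    row[:offline] && row[:split] && daughters.any? { |d| !meta.key?(d) }
  end.keys
end

meta = {
  'A'  => { offline: true,  split: true, split_a: 'A1', split_b: 'A2' }, # crashed mid-split
  'B'  => { offline: true,  split: true, split_a: 'B1', split_b: 'B2' }, # completed split
  'B1' => { offline: false, split: false },
  'B2' => { offline: false, split: false }
}

p dangling_split_parents(meta)
# => ["A"]
```

Region A is reported because neither A1 nor A2 has a row, which matches what we saw; region B is not, because both of its daughters are present.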
We first tried to online region A with the shell command "assign", figuring
that hbase would just find and split region A again. This seemed to have no
effect; not sure why, maybe because region A already had splitA/B
entries? Region A remained offline. We also tried to force a split of
region A using the shell command "split". Again, no effect.
Finally, we tried to manually complete the split that had started. Peter
manually inserted the two daughter regions into .META. We then tried to
force a compaction from the shell; this failed with an NSRE
(NotServingRegionException). So we onlined region A with the "assign"
command, and it worked this time. Now we seem to be up again: compact
works, data loads work, and hbck checks out!
As a side note, hbck gave me some good feedback to help investigate the
problem; although the "-fix" didn't help in this case. It would be nice if
there was a tool or shell command to create a region given name, hdfs-path,
start and end keys.
Also, check_meta.rb threw me off track, because it did not detect any holes
when they did in fact exist. This made me discount the most obvious
scenario, since I believed there were no holes. Looking at the source for
bin/check_meta.rb, I see the issue:
  if oldHRI.isOffline() && Bytes.equals(oldHRI.getStartKey(), hri.getStartKey())
    # Presume offlined parent
  elsif Bytes.equals(oldHRI.getEndKey(), hri.getStartKey())
    # Start key of next matches end key of previous
  ...
When checking for holes, it does not properly account for offline regions.
The first condition doesn't apply because oldHRI.start != hri.start. The
second condition does apply (oldHRI.end == hri.start), so it continues on
thinking there is no "problem" here. Instead, I think the second condition
should also require that the previous region is not offline:
  ...
  elsif !oldHRI.isOffline() && Bytes.equals(oldHRI.getEndKey(), hri.getStartKey())
    # Start key of next matches end key of previous
  ...
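
To show the difference between the two conditions without a cluster, here is a small self-contained Ruby sketch. It is not the actual check_meta.rb: Region is a hypothetical stand-in for HRegionInfo, keys are plain strings compared with ==, and it assumes that any pair matching neither condition is reported as a hole (the part of the script elided by "..." above is not reproduced here).

```ruby
# Hypothetical stand-in for HRegionInfo; only the fields the check uses.
Region = Struct.new(:start_key, :end_key, :offline)

# Returns the [previous, next] pairs that would be flagged as holes.
# guard_offline: false mirrors the quoted condition;
# guard_offline: true adds the proposed "previous region is not offline" test.
def find_holes(regions, guard_offline:)
  holes = []
  regions.each_cons(2) do |old, cur|
    if old.offline && old.start_key == cur.start_key
      # Presume offlined parent followed by its first daughter
    elsif (!guard_offline || !old.offline) && old.end_key == cur.start_key
      # Start key of next matches end key of previous
    else
      holes << [old, cur] # assumption: everything else counts as a hole
    end
  end
  holes
end

# Offlined parent whose daughters never made it into .META.
broken = [Region.new('a', 'c', true), Region.new('c', 'd', false)]
# Completed split: parent, both daughters, then the next region.
healthy = [Region.new('a', 'c', true), Region.new('a', 'b', false),
           Region.new('b', 'c', false), Region.new('c', 'd', false)]

puts find_holes(broken, guard_offline: false).size  # => 0 (hole missed)
puts find_holes(broken, guard_offline: true).size   # => 1 (hole reported)
puts find_holes(healthy, guard_offline: true).size  # => 0
```

With the extra guard, the offlined parent no longer "covers" the key range its missing daughters should have covered, while a completed split still passes.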
Marc
On Sun, Mar 6, 2011 at 9:18 AM, Stack <st...@duboce.net> wrote:
> So, yeah Marc, what are the rows that follow the ones you post below?
> Are they the info:splitA and info:splitB or something else?
> Thanks,
> St.Ack
Re: Hbck errors
Posted by Stack <st...@duboce.net>.
So, yeah Marc, what are the rows that follow the ones you post below?
Are they the info:splitA and info:splitB or something else?
Thanks,
St.Ack