Posted to user@hbase.apache.org by 茅旭峰 <m9...@gmail.com> on 2011/04/16 05:52:52 UTC

Re: Hbck errors

Hi Marc,

Can you give some pointers or links on how to manipulate .META. in
detail? For example, how to build an encoded region name in order to
fill the holes in .META., and how to assign the missing region and then
compact it. I also tried check_meta.rb, but it did not work for me.
Any suggestions would be highly appreciated. Thanks in advance!

Mao Xu-Feng

On Mon, Mar 7, 2011 at 2:08 AM, Marc Limotte <ms...@gmail.com> wrote:

> We have resolved the issue.  Details follow as "lessons learned".
>
> Yes, Stack, the info:splitA/B columns followed, as part of the offlined
> parent's .META. row.  But the regions that splitA/B point to *did not
> exist* in .META.
>
> Also, after the original email, we checked in HDFS and found the parent
> region directory with 4 data files, along with a directory for each daughter
> region (with 4 small files each, presumably references to the original).
>
> So, it looks like (not totally sure about the exact order, but something
> like):
>
>   1. split started on region A
>   2. region A was offlined
>   3. The daughter regions were created in HDFS with the reference files
>   4. .META. was updated for region A
>   5. **** server crashed
>
> So, the new daughter entries were never added to .META.
>
> We first tried to online region A with the shell command "assign",
> figuring that hbase would just find and split region A again.  This seemed
> to have no effect... not sure why, maybe because region A already had
> splitA/B entries?  Region A remained offline.  We also tried to force it to
> split region A, using the shell command "split".  Again no effect.
>
> Finally we tried to manually complete the split that had started.  Peter
> manually inserted the two daughter regions into .META.  We then tried to
> force a compaction from the shell; this failed with a NotServingRegionException
> (NSRE).  So we onlined region A with the "assign" command, and it worked this
> time.  Now we seem to be up again: compact works, data loads work, hbck
> checks out!
>
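
A note on the encoded region name that such a manual insert needs. This
assumes the 0.90 region name layout <table>,<start key>,<region id>.<encoded>.
and that the encoded part is the MD5 hex digest of the bytes before the
trailing ".<encoded>." suffix. That is an assumption, not something confirmed
in this thread. A minimal Ruby sketch under that assumption follows; the helper
names are hypothetical and are not part of HBase:

```ruby
require 'digest'

# ASSUMPTION: the encoded name is the MD5 hex digest of the bytes of
# "<table>,<start key>,<region id>". Verify against an existing .META. row
# before relying on it.
def encoded_region_name(table, start_key, region_id)
  prefix = "#{table},#{start_key},#{region_id}".b  # treat keys as raw bytes
  Digest::MD5.hexdigest(prefix)
end

# Full region name as it would appear as the .META. row key
# (hypothetical helper, same assumption as above).
def full_region_name(table, start_key, region_id)
  encoded = encoded_region_name(table, start_key, region_id)
  "#{table},#{start_key},#{region_id}".b + ".#{encoded}."
end

# Illustrative values only; the table and key here are made up.
puts full_region_name('t1', "row\x02a", 1299283399612).inspect
```

Running this against a region that already exists in .META. and comparing
the result with its ENCODED value would confirm or refute the assumption
before any row is written.
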
> As a side note, hbck gave me some good feedback to help investigate the
> problem, although "-fix" didn't help in this case.  It would be nice if
> there were a tool or shell command to create a region given a name, hdfs-path,
> start and end keys.
>
> Also, check_meta.rb threw me off track, because it did not detect any holes
> when they did in fact exist.  This made me discount the most obvious
> scenario, since I believed there were no holes.  Looking at the source for
> bin/check_meta.rb, I see the issue:
>
> if oldHRI.isOffline() && Bytes.equals(oldHRI.getStartKey(), hri.getStartKey())
>   # Presume offlined parent
> elsif Bytes.equals(oldHRI.getEndKey(), hri.getStartKey())
>   # Start key of next matches end key of previous
> ...
>
> When checking for holes, it does not properly account for offline regions.
> The first condition doesn't apply because oldHRI.start != hri.start.  The
> second condition does apply (oldHRI.end == hri.start), so the script continues
> on, thinking there is no "problem" here.  Instead, I think the second condition
> should be:
>
> ...
> elsif !oldHRI.isOffline() && Bytes.equals(oldHRI.getEndKey(), hri.getStartKey())
>   # Start key of next matches end key of previous
> ...
>
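
The effect of the proposed extra condition can be checked without a
cluster. The following is a minimal sketch in plain Ruby, not the real
check_meta.rb: Region is a hypothetical stand-in for HRegionInfo, keys are
plain strings, and rows are assumed to arrive in .META. order.

```ruby
# Hypothetical stand-in for HRegionInfo; only the fields the check needs.
Region = Struct.new(:name, :start_key, :end_key, :offline)

# Walk regions in .META. order and return [previous, current] name pairs
# where the chain is broken, using the condition proposed above.
def find_holes(regions)
  holes = []
  prev = nil
  regions.each do |cur|
    if prev
      if prev.offline && prev.start_key == cur.start_key
        # Presume offlined parent followed by its first daughter
      elsif !prev.offline && prev.end_key == cur.start_key
        # Start key of next matches end key of previous
      else
        holes << [prev.name, cur.name]
      end
    end
    prev = cur
  end
  holes
end

# Completed split: parent P is offline, daughters A and B are present.
healthy = [
  Region.new('X', '',  'a', false),
  Region.new('P', 'a', 'c', true),
  Region.new('A', 'a', 'b', false),
  Region.new('B', 'b', 'c', false),
  Region.new('N', 'c', 'd', false)
]

# Interrupted split: parent P is offline, daughters never reached .META.
broken = [healthy[0], healthy[1], healthy[4]]

p find_holes(healthy)
p find_holes(broken)
```

Under this condition any offline region whose successor starts at a
different key is reported, so offline regions that are not split parents
would also be flagged.
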
> Marc
>
>
> On Sun, Mar 6, 2011 at 9:18 AM, Stack <st...@duboce.net> wrote:
>
> > So, yeah Marc, what are the rows that follow the ones you post below?
> > Are they the info:splitA and info:splitB or something else?
> > Thanks,
> > St.Ack
> >
> > On Sat, Mar 5, 2011 at 4:22 PM, Marc Limotte <ms...@gmail.com> wrote:
> > > We had an issue a day ago with some OOMEs on the region servers.  The
> > > master shut down ok, but most of the RegionServers didn't, so we
> > > eventually had to kill -9 them.  We brought it all back up and ran a major
> > > compaction to change the hbase block size.  This seemed to work, but now we
> > > have an inconsistency which is preventing bulk loads from continuing.
> > >
> > > hbase hbck -details finds an inconsistency.  I tried -fix, but no help.
> > > Chain of regions in table opx_ad_event_v2 is broken; edges does not contain
> > > advertiser^BOpenX PSA^Acountry^BSerbia^Apublisher^BDSNR - Filesharing ROW^Adomain^Bthejesperbay.com^Aadvertiser_tag^Bmmx.travel^Astarttime^B1295175600
> > >
> > > hbck also notes that this region is offline:
> > >
> > > 11/03/05 23:33:16 DEBUG util.HBaseFsck: Region
> > > opx_ad_event_v2,advertiser\x02OpenX PSA\x01country\x02Serbia\x01publisher\x02DSNR - Filesharing ROW\x01domain\x02thejesperbay.com\x01advertiser_tag\x02mmx.travel\x01starttime\x021295175600,1297185243218.6147d3696ba9db3a85e3afd08d0bc59a.
> > > offline, split, parent, ignoring.
> > >
> > > Looking in .META. I see that the region is indeed offline, and appears to
> > > be split:
> > >
> > > info:regioninfo                           timestamp=1299301154675 ...
> > > OFFLINE => true,
> > > info:splitA                                  timestamp=1299283401019
> > > info:splitB                                  timestamp=1299283401019
> > > (full .META. row below)
> > >
> > > So, I'm guessing that it was in the midst of splitting and did not
> > > complete.
> > >
> > > How can I recover from this situation?
> > >
> > > thanks,
> > > Marc
> > >
> > > ----------- .META. output ----------------
> > >
> > > hbase(main):001:0> get '.META.' , "opx_ad_event_v2,advertiser\x02OpenX PSA\x01country\x02Serbia\x01publisher\x02DSNR - Filesharing ROW\x01domain\x02thejesperbay.com\x01advert""
> > > COLUMN                  CELL
> > >  info:regioninfo        timestamp=1299301154675, value=REGION => {NAME =>
> > >    'opx_ad_event_v2,advertiser\x02OpenX PSA\x01country\x02Serbia\x01publisher\x02DSNR - Filesharing ROW\x01domain\x02thejesperbay.com\x01advertiser_tag\x02mmx.travel\x01starttime\x021295175600,1297185243218.6147d3696ba9db3a85e3afd08d0bc59a.',
> > >    STARTKEY => 'advertiser\x02OpenX PSA\x01country\x02Serbia\x01publisher\x02DSNR - Filesharing ROW\x01domain\x02thejesperbay.com\x01advertiser_tag\x02mmx.travel\x01starttime\x021295175600',
> > >    ENDKEY => 'advertiser\x02OpenX PSA\x01country\x02United Arab Emirates\x01publisher\x02www.sixbillionsecrets.com/\x01advertiser_tag\x02mmx.arts and entertainment\x01publisher_tag\x02mmx.arts and entertainment\x01starttime\x021295877600',
> > >    ENCODED => 6147d3696ba9db3a85e3afd08d0bc59a, OFFLINE => true, SPLIT => true,
> > >    TABLE => {{NAME => 'opx_ad_event_v2', FAMILIES => [
> > >    {NAME => 'metrics', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'GZ', TTL => '2147483647', BLOCKSIZE => '1048576', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
> > >    {NAME => 'topn', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'GZ', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '1048576', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
> > >  info:server            timestamp=1299181144063, value=ip-10-17-24-121.ec2.internal:60020
> > >  info:serverstartcode   timestamp=1299181144063, value=1299180905510
> > >  info:splitA            timestamp=1299283401019, value=REGION => {NAME =>
> > >    'opx_ad_event_v2,advertiser\x02OpenX PSA\x01country\x02Serbia\x01publisher\x02DSNR - Filesharing ROW\x01domain\x02thejesperbay.com\x01advertiser_tag\x02mmx.travel\x01starttime\x021295175600,1299283399612.3b278f1b0ea78af239409efc4f0b2a3d.',
> > >    STARTKEY => 'advertiser\x02OpenX PSA\x01country\x02Serbia\x01publisher\x02DSNR - Filesharing ROW\x01domain\x02thejesperbay.com\x01advertiser_tag\x02mmx.travel\x01starttime\x021295175600',
> > >    ENDKEY => 'advertiser\x02OpenX PSA\x01country\x02Taiwan\x01domain\x02kanzhongguo.com\x01advertiser_tag\x02mmx.arts and entertainment\x01publisher_tag\x02\x01starttime\x021296910800',
> > >    ENCODED => 3b278f1b0ea78af239409efc4f0b2a3d,
> > >    TABLE => {{NAME => 'opx_ad_event_v2', FAMILIES => [
> > >    {NAME => 'metrics', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'GZ', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
> > >    {NAME => 'topn', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'GZ', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
> > >  info:splitB            timestamp=1299283401019, value=REGION => {NAME =>
> > >    'opx_ad_event_v2,advertiser\x02OpenX PSA\x01country\x02Taiwan\x01domain\x02kanzhongguo.com\x01advertiser_tag\x02mmx.arts and entertainment\x01publisher_tag\x02\x01starttime\x021296910800,1299283399612.9d41646202a363812d068792311d3a9b.',
> > >    STARTKEY => 'advertiser\x02OpenX PSA\x01country\x02Taiwan\x01domain\x02kanzhongguo.com\x01advertiser_tag\x02mmx.arts and entertainment\x01publisher_tag\x02\x01starttime\x021296910800',
> > >    ENDKEY => 'advertiser\x02OpenX PSA\x01country\x02United Arab Emirates\x01publisher\x02www.sixbillionsecrets.com/\x01advertiser_tag\x02mmx.arts and entertainment\x01publisher_tag\x02mmx.arts and entertainment\x01starttime\x021295877600',
> > >    ENCODED => 9d41646202a363812d068792311d3a9b,
> > >    TABLE => {{NAME => 'opx_ad_event_v2', FAMILIES => [
> > >    {NAME => 'metrics', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'GZ', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
> > >    {NAME => 'topn', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'GZ', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
> > >
> > > 5 row(s) in 0.3960 seconds
> > >
> > >
> > >
> > > ------------ hbck -details output -------------
> > > ...
> > > ERROR: Region hdfs://ip-10-17-5-253.ec2.internal:9000/hbase/opx_ad_event_v2/34a0ffe60da97431a809f0ffe8e5328a
> > > on HDFS, but not listed in META or deployed on any region server.
> > > ERROR: Region hdfs://ip-10-17-5-253.ec2.internal:9000/hbase/opx_ad_event_v2/3b278f1b0ea78af239409efc4f0b2a3d
> > > on HDFS, but not listed in META or deployed on any region server.
> > > 11/03/05 23:33:16 DEBUG util.HBaseFsck: Region
> > > opx_ad_event_v2,advertiser\x02OpenX PSA\x01country\x02Serbia\x01publisher\x02DSNR - Filesharing ROW\x01domain\x02thejesperbay.com\x01advertiser_tag\x02mmx.travel\x01starttime\x021295175600,1297185243218.6147d3696ba9db3a85e3afd08d0bc59a.
> > > offline, split, parent, ignoring.
> > > ERROR: Region hdfs://ip-10-17-5-253.ec2.internal:9000/hbase/opx_ad_event_v2/9d41646202a363812d068792311d3a9b
> > > on HDFS, but not listed in META or deployed on any region server.
> > > Chain of regions in table opx_ad_event_v2 is broken; edges does not contain
> > > advertiser^BOpenX PSA^Acountry^BSerbia^Apublisher^BDSNR - Filesharing ROW^Adomain^Bthejesperbay.com^Aadvertiser_tag^Bmmx.travel^Astarttime^B1295175600
> > > ERROR: Found inconsistency in table opx_ad_event_v2
> > > Summary:
> > >  -ROOT- is okay.
> > >    Number of regions: 1
> > >    Deployed on:  ip-10-17-24-121.ec2.internal:60020
> > >  .META. is okay.
> > >    Number of regions: 1
> > >    Deployed on:  ip-10-17-5-252.ec2.internal:60020
> > > ...
> > > Chain of regions in table opx_ad_event_v2 is broken; edges does not contain
> > > advertiser^BOpenX PSA^Acountry^BSerbia^Apublisher^BDSNR - Filesharing ROW^Adomain^Bthejesperbay.com^Aadvertiser_tag^Bmmx.travel^Astarttime^B1295175600
> > > Table opx_ad_event_v2 is inconsistent.
> > >    Number of regions: 1612
> > >    Deployed on: ...
> > > 4 inconsistencies detected.
> > > Status: INCONSISTENT
> > >
> >
>