Posted to user@hbase.apache.org by Chris Waterson <wa...@maubi.net> on 2012/12/09 05:30:55 UTC

hbase corruption - missing region files in HDFS

Hello!  I've gotten myself into trouble where I'm missing files on HDFS that HBase thinks ought to be there.  In particular, running "hbase hbck" yields the messages below: two regions are "not deployed on any region server" (because there is no file in HDFS for the region), and "there is a hole in the region chain".

(FWIW, I suspect that this problem is due to a recent incident where we ran the cluster out of disk space.)

I'm running 0.92.1, and have been staggering around trying to figure out what procedure I ought to use to correct the problem, but my Google-fu is too poor to have yielded results.  Any pointers would be appreciated!

thanks,
chris




ERROR: Region referrers,com.free-hdwallpapers.www/wallpapers/animals/mici/595718.jpg|com.free-hdwallpapers.www/wallpaper/animals/husky/270579,1354964606745.0c54fe59c58ddd6b34042ec98171bff7. not deployed on any region server.
ERROR: Region referrers,com.free-hdwallpapers.www/wallpapers/anime/mici/78285.jpg|com.free-hdwallpapers.www/wallpaper/anime/wolf-furry/90641,1354964606745.d2451e8db0f2b9546cc42c6d260a2ab8. not deployed on any region server.
ERROR: There is a hole in the region chain between com.free-hdwallpapers.www/wallpapers/animals/mici/595718.jpg|com.free-hdwallpapers.www/wallpaper/animals/husky/270579 and com.free-hdwallpapers.www/wallpapers/entertainment/mici/11840.jpg|com.free-hdwallpapers.www/wallpaper/entertainment/new-moon-bella-and-edward/12951.  You need to create a new regioninfo and region dir in hdfs to plug the hole.


Re: hbase corruption - missing region files in HDFS

Posted by Chris Waterson <wa...@maubi.net>.
You bet; see below.  It's a Scala script, and will run as-is if you've got Scala installed and the "hbase" command on your PATH (the script calls "hbase classpath").  It should be easy to translate to Java, however.

chris




#!/bin/sh
# Re-run this file under Scala with the HBase jars on the classpath.
exec scala -cp "$(hbase classpath)" "$0" "$@"
!#

// Creates "/tmp/hfile.dat" on the local filesystem: a valid, empty HFile.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.io.hfile.HFile

object HFileTool {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration
    val path = new Path("file:///tmp/hfile.dat")
    // Open a writer on the local path and close it without appending
    // any key/values; the result is a valid HFile that holds no data.
    val writer = HFile.getWriterFactory(conf).createWriter(path.getFileSystem(conf), path)
    writer.close()
  }
}
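
The script above only writes the file locally; it still has to be copied to the HDFS path that HBase expects. A minimal sketch of that step follows, assuming the stock "hadoop fs" shell is available. The function name plan_placeholder is made up for illustration, and it only prints the commands so they can be reviewed before anything is run against a live cluster.

```shell
#!/bin/sh
# Sketch: print (not run) the commands that would copy a local empty HFile
# to an HDFS path that HBase reports as missing.
plan_placeholder() {
  # $1: local empty HFile, e.g. /tmp/hfile.dat written by the script above
  # $2: absolute HDFS path named in the FileNotFoundException
  src=$1
  dest=$2
  # Assumption: depending on the Hadoop release, "-mkdir" may need "-p"
  # to create missing parent directories.
  printf 'hadoop fs -mkdir %s\n' "$(dirname "$dest")"
  printf 'hadoop fs -put %s %s\n' "$src" "$dest"
}
```

Calling plan_placeholder with /tmp/hfile.dat and one missing path prints two lines, a mkdir for the parent directory and a put for the file, which can then be checked and run by hand.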



Re: hbase corruption - missing region files in HDFS

Posted by Tom Brown <to...@gmail.com>.
Chris,

I really appreciate your detailed fix description!  I've run into
similar problems (due to old hardware and bad sectors) and could never
figure out how to fix a broken table. Hbck always seemed to just make
things worse until I would give up and recreate the table.

Can you publish your utility that you used to create valid/empty HFiles?

--Tom


Re: hbase corruption - missing region files in HDFS

Posted by Kevin O'dell <ke...@cloudera.com>.
Chris,

Thank you for the very descriptive update.


-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera

Re: hbase corruption - missing region files in HDFS

Posted by Chris Waterson <wa...@maubi.net>.
Well, I upgraded to 0.92.2, since the version I was running on (0.92.1) didn't have those options for "hbck".

That helped.

It took me a while to realize that I had to make the root filesystem writable so that "hbck -repair" could create a directory for itself.  Once that was done, it at least ran through to completion.

But the problem persisted in that there were blocks in META that didn't exist on the filesystem.  One poor region server was assigned the sad task of attempting to open the non-existent directory, which it slavishly reattempted again and again, filling its log with FileNotFoundException stack traces.

For example,

2012-12-09 00:14:33,315 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=referrers,com.free-hdwallpapers.www/wallpapers/animals/mici/595718.jpg|com.free-hdwallpapers.www/wallpaper/animals/husky/270579,1354964606745.0c54fe59c58ddd6b34042ec98171bff7.
java.io.FileNotFoundException: File does not exist: /hbase/referrers/2cb553c74d52ddcbf31940f6c7128c63/main/33f1fd9efb944c4e982ba719cd7dde84
etc., etc.

In particular, the directory above "/hbase/referrers/2cb553...c63" simply did not exist at all in HDFS. 

So I took matters into my own hands: I created the missing "/hbase/referrers/2cb553...c63" directory and its subdirectory "main", and put a zero-length file "33f1fd...e84" in it.  This changed the firehose of exceptions from FileNotFoundException to CorruptHFileException.

So, I wrote a small program to emit a valid, empty HFile, and proceeded to place a copy at each path in HDFS for which a FileNotFoundException was being thrown.  After creating three or four of them, the exceptions stopped.
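
Finding those paths by eye gets tedious when the log is a firehose. A rough sketch for pulling them out is below; it assumes the log lines look like the FileNotFoundException excerpt above, and the function name missing_hfile_paths is made up for illustration.

```shell
#!/bin/sh
# Sketch: list the distinct HDFS paths that a region server log reports as
# missing. Assumes the "File does not exist: <path>" message format shown
# in the log excerpt above.
missing_hfile_paths() {
  # $1: path to a region server log file (location varies by install)
  sed -n 's/^.*java\.io\.FileNotFoundException: File does not exist: *\([^ ]*\).*$/\1/p' "$1" |
    sort -u
}
```

Each line of output is one path that needs a placeholder file. Since the region server retries the open again and again, the same path shows up many times in the log, hence the "sort -u".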

I then ran "hbck -repair" again, and upon completion it declared victory.

Again, I suspect that I got myself into this problem because I ran a machine out of disk space.  It's likely that most folks are more clever than me, and so this problem hasn't arisen before. :)






Re: hbase corruption - missing region files in HDFS

Posted by Kevin O'dell <ke...@cloudera.com>.
Can you run "hbase hbck -fixMeta -fixAssignments"?

This should assign those regions to region servers and fix the hole.
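
To tell whether a repair attempt actually helped, it can be useful to compare the number of errors hbck reports before and after. A small sketch, assuming the report marks problems with lines starting "ERROR:" as in the output quoted in this thread; the function name hbck_error_count and the file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: count the "ERROR:" lines in a saved hbck report so that the
# state before and after a repair attempt can be compared.
hbck_error_count() {
  # $1: file holding captured "hbase hbck" output (hypothetical file name)
  grep -c '^ERROR:' "$1"
}

# Possible usage (not run here):
#   hbase hbck > before.txt 2>&1
#   hbase hbck -fixMeta -fixAssignments
#   hbase hbck > after.txt 2>&1
#   hbck_error_count before.txt; hbck_error_count after.txt
```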


-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera