Posted to user@hbase.apache.org by Ellimilial K <el...@googlemail.com> on 2015/02/03 19:04:15 UTC

HBase all files corrupt / missing blocks

We have recently experienced some issues with our namenodes in an HA
arrangement and had to recreate the namenode metadata from a backup, while
some new data had been pushed to the region servers in the meantime. We're
on HBase 0.98.6.

After launching the cluster again, we realised that we're missing
~8000 of ~190000 blocks. Looking at the fsck output, we can see the following
for what looks like a continuous stream of regions:

/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/35186fe43fed47989ddb4ace3648b109:
MISSING 1 blocks of total size 929610 B...
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/bd41ca895f3749188c08dd2e540bc127:
CORRUPT blockpool BP-2037521063-<IP>-1418127576413 block blk_1076077966
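
To get an overview of which regions and column families such lines touch, the fsck output can be grouped by path. The following is a minimal sketch, not a tested recovery tool: the function name `summarize_fsck` and the regular expression are illustrative assumptions, and the path layout `/hbase/data/<namespace>/<table>/<region>/<family>/<hfile>` is inferred from the sample lines above.

```python
import re
from collections import defaultdict

# One entry = an HBase store-file path, a colon, then MISSING or CORRUPT.
# \s* lets the status sit on the same line or, as wrapped here, on the next one.
ENTRY_RE = re.compile(r"(/hbase/data/\S+?):\s*(MISSING|CORRUPT)\b")


def summarize_fsck(text):
    """Group flagged files as {(region, family): {hfile: {statuses}}}."""
    summary = defaultdict(lambda: defaultdict(set))
    for match in ENTRY_RE.finditer(text):
        path, status = match.groups()
        parts = path.split("/")
        # Assumed layout: /hbase/data/<namespace>/<table>/<region>/<family>/<hfile>
        if len(parts) != 8:
            continue
        region, family, hfile = parts[5:8]
        summary[(region, family)][hfile].add(status)
    return summary


# The two sample entries quoted in the message, unchanged.
SAMPLE = """\
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/35186fe43fed47989ddb4ace3648b109:
MISSING 1 blocks of total size 929610 B...
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/bd41ca895f3749188c08dd2e540bc127:
CORRUPT blockpool BP-2037521063-<IP>-1418127576413 block blk_1076077966
"""

if __name__ == "__main__":
    for (region, family), files in sorted(summarize_fsck(SAMPLE).items()):
        print(region, family, {name: sorted(flags) for name, flags in files.items()})
```

Run against the full fsck output, a summary like this would show whether the damage is confined to a few regions or spread across the table.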

I did not want to run fsck -delete. hbck complains because the files
cannot be allocated to region servers, and reports missing blocks.

The total size of this table is circa 22 TB on HDFS and recreating it would
be quite a drag (pushing it from our previous HBase cluster took about a
month). Is there any known way of dealing with such a situation?

Mateusz Kaczyński

Re: HBase all files corrupt / missing blocks

Posted by Ellimilial K <el...@googlemail.com>.
Hi Esteban,

I believe the upgrade went fine, i.e. the stack worked for a couple of days
until the main namenode died yesterday (possibly timed out on GC?). The
backup one then died (or did not roll), complaining about out-of-sync errors
from the journalnodes. When I restarted the journalnodes, both namenodes
started reporting no valid fsimage. At that point I tried namenode -recover,
to no avail. Finally I restored a previously backed-up snapshot of the dfs
name directory from a couple of hours earlier, and at this point it started
reporting missing / corrupted blocks. Sorry for the non-HBase'y digression.

Thanks,
Mateusz

On 3 February 2015 at 22:45, Esteban Gutierrez <es...@cloudera.com> wrote:

> Hi Mateusz,
>
> That's interesting. Did you start the NN with the right fsimage after the
> upgrade? That might also explain this.
>
> cheers,
> esteban.

Re: HBase all files corrupt / missing blocks

Posted by Esteban Gutierrez <es...@cloudera.com>.
Hi Mateusz,

That's interesting. Did you start the NN with the right fsimage after the
upgrade? That might also explain this.
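
A toy model of why an out-of-date fsimage could produce this kind of report is sketched below. It is an illustration under stated assumptions, not a description of what happened on this cluster: the file names, block ids and the `compare` helper are hypothetical, and the assumed activity after the backup (old store files rewritten and their blocks removed, new files written) is not confirmed anywhere in this thread.

```python
def compare(namespace, datanode_blocks):
    """Compare a namenode view (file -> block ids) with the blocks datanodes hold."""
    referenced = set().union(*namespace.values())
    missing = {}
    for name, blocks in namespace.items():
        lost = blocks - datanode_blocks
        if lost:
            missing[name] = lost
    # Blocks present on disk that no file in this namespace refers to.
    unreferenced = datanode_blocks - referenced
    return missing, unreferenced


# Hypothetical namespace at the time the name directory was backed up.
backup_namespace = {"hfile_a": {1}, "hfile_b": {2}, "hfile_c": {3}}

# Assumed activity after the backup: hfile_a and hfile_b were rewritten into
# hfile_d (block 4) and their old blocks removed; hfile_e (block 5) is new.
blocks_on_datanodes = {3, 4, 5}

missing, unreferenced = compare(backup_namespace, blocks_on_datanodes)

if __name__ == "__main__":
    print("files with missing blocks:", missing)
    print("blocks unknown to the restored namespace:", unreferenced)
```

In this model the restored metadata still lists files whose blocks are gone, while the newer blocks are not attached to any file it knows about.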

cheers,
esteban.


--
Cloudera, Inc.


On Tue, Feb 3, 2015 at 2:26 PM, Ellimilial K <el...@googlemail.com>
wrote:

> That's quite horrible, oh well, thanks for the help!
>
> Yes, positive. We started having issues with the HA quorum a couple of days
> after the migration. HBase has constantly been taking ~200 requests a
> second via Stargate, and things seemed to work fine.
>
> Mateusz

Re: HBase all files corrupt / missing blocks

Posted by Ellimilial K <el...@googlemail.com>.
That's quite horrible, oh well, thanks for the help!

Yes, positive. We started having issues with the HA quorum a couple of days
after the migration. HBase has constantly been taking ~200 requests a
second via Stargate, and things seemed to work fine.

Mateusz

On 3 February 2015 at 22:11, Jean-Marc Spaggiari <je...@spaggiari.org>
wrote:

> Those files and related data are most probably lost. I don't see any
> other option than deleting them.
>
> Are you sure those blocks were not missing before the migration? Did you
> have any crash during the migration process?
>
> JM

Re: HBase all files corrupt / missing blocks

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Those files and related data are most probably lost. I don't see any
other option than deleting them.

Are you sure those blocks were not missing before the migration? Did you
have any crash during the migration process?

JM

2015-02-03 13:14 GMT-08:00 Ellimilial K <el...@googlemail.com>:

> Thank you for the responses!
>
> @Jean-Marc
> This comes from fsck /. I see a flood of those, at least hundreds of them;
> for this particular region:
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/18c428413d7b4a89959911c9112a6eb9:
> CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block
> blk_1076062948
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/18c428413d7b4a89959911c9112a6eb9:
> MISSING 1 blocks of total size 52243482 B..
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/49b265ba5c7942b0b8e2b788fd9d7362:
> CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block
> blk_1076077963
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/49b265ba5c7942b0b8e2b788fd9d7362:
> MISSING 1 blocks of total size 6181 B...
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/ef3fc67a835b451aa7d18094ea141451:
> CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block
> blk_1076062891
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/ef3fc67a835b451aa7d18094ea141451:
> MISSING 1 blocks of total size 11747149 B..
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/fedeb8062c454238bf1d1112b0f80b4b:
> CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block
> blk_1076077964
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/fedeb8062c454238bf1d1112b0f80b4b:
> MISSING 1 blocks of total size 10431742 B..
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/35186fe43fed47989ddb4ace3648b109:
> CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block
> blk_1076062900
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/35186fe43fed47989ddb4ace3648b109:
> MISSING 1 blocks of total size 929610 B...
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/bd41ca895f3749188c08dd2e540bc127:
> CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block
> blk_1076077966
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/bd41ca895f3749188c08dd2e540bc127:
> MISSING 1 blocks of total size 119139 B.........
> (...) ending with:
> ..........Status: CORRUPT
>  Total size: 23155170955674 B (Total open files size: 1577 B)
>  Total dirs: 21232
>  Total files: 33311
>  Total symlinks: 0 (Files currently being written: 61)
>  Total blocks (validated): 199618 (avg. block size 115997409 B) (Total open file blocks (not validated): 19)
>   ********************************
>   CORRUPT FILES: 8245
>   MISSING BLOCKS: 8245
>   MISSING SIZE: 162010861748 B
>   CORRUPT BLOCKS:  8245
>   ********************************
>  Minimally replicated blocks: 191373 (95.86961 %)
>  Over-replicated blocks: 3241 (1.6236011 %)
>  Under-replicated blocks: 0 (0.0 %)
>  Mis-replicated blocks: 0 (0.0 %)
>  Default replication factor: 3
>  Average block replication: 2.916185
>  Corrupt blocks: 8245
>  Missing replicas: 0 (0.0 %)
>  Number of data-nodes: 17
>  Number of racks: 1
>
> There are 8 files in directories within
> hbase/data/default/table/ffa95306f599dbff99497e71841724fe, so I imagine 6/8
> are affected.
> The size of the missing blocks ranges from 2 KB up to ~70 MB. The table
> concerned had ~3500 regions. All datanodes are up and look like they report
> correctly, so unfortunately there is no replica lying around.
>
> @esteban I double checked: the volumes seem fine, and the total HDFS size also
> looks unchanged. Datanodes look fine. It is a single cluster (i.e. no
> cluster replication, if I'm answering the question?), freshly after an
> upgrade to 0.98 from 0.94 (or CDH 4.7 to 5.3), with HDFS replication set to
> 3.
>
> Many thanks,
> Mateusz
>
> On 3 February 2015 at 20:30, Esteban Gutierrez <es...@cloudera.com>
> wrote:
>
> > Hi Mateusz,
> >
> > As JMS mentioned, it is very likely the data is lost, but that type of
> > corruption is usually due to some DNs being down or data volumes removed for
> > some reason. Have you tried to recover that data from those DNs first?
> >
> > From "for what looks like a continuous stream of regions" it sounds like you
> > had a single replica configured for HBase. Is that the case?
> >
> > esteban.
> >
> > --
> > Cloudera, Inc.
> >
> >
> > On Tue, Feb 3, 2015 at 12:04 PM, Jean-Marc Spaggiari <
> > jean-marc@spaggiari.org> wrote:
> >
> > > Hi Mateusz,
> > >
> > > Data from this HFile is most probably lost. Is the block also reported
> > > as missing by fsck? Do you have any datanode down which might contain this
> > > block? How big is this HFile? 929610 bytes only? If so, one option might
> > > just be to delete this HFile.
> > >
> > > How many HFiles are within this region?
> > >
> > > JM
>

Re: HBase all files corrupt / missing blocks

Posted by Ellimilial K <el...@googlemail.com>.
Thank you for the responses!

@Jean-Marc
This comes from fsck /. I see a flood of those, running into at least the
hundreds. For this particular region:
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/18c428413d7b4a89959911c9112a6eb9:
CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block
blk_1076062948
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/18c428413d7b4a89959911c9112a6eb9:
MISSING 1 blocks of total size 52243482 B..
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/49b265ba5c7942b0b8e2b788fd9d7362:
CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block
blk_1076077963
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/49b265ba5c7942b0b8e2b788fd9d7362:
MISSING 1 blocks of total size 6181 B...
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/ef3fc67a835b451aa7d18094ea141451:
CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block
blk_1076062891
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/ef3fc67a835b451aa7d18094ea141451:
MISSING 1 blocks of total size 11747149 B..
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/fedeb8062c454238bf1d1112b0f80b4b:
CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block
blk_1076077964
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/fedeb8062c454238bf1d1112b0f80b4b:
MISSING 1 blocks of total size 10431742 B..
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/35186fe43fed47989ddb4ace3648b109:
CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block
blk_1076062900
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/35186fe43fed47989ddb4ace3648b109:
MISSING 1 blocks of total size 929610 B...
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/bd41ca895f3749188c08dd2e540bc127:
CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block
blk_1076077966
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/bd41ca895f3749188c08dd2e540bc127:
MISSING 1 blocks of total size 119139 B.........
(...) ending with:
..........Status: CORRUPT
 Total size: 23155170955674 B (Total open files size: 1577 B)
 Total dirs: 21232
 Total files: 33311
 Total symlinks: 0 (Files currently being written: 61)
 Total blocks (validated): 199618 (avg. block size 115997409 B) (Total open file blocks (not validated): 19)
  ********************************
  CORRUPT FILES: 8245
  MISSING BLOCKS: 8245
  MISSING SIZE: 162010861748 B
  CORRUPT BLOCKS:  8245
  ********************************
 Minimally replicated blocks: 191373 (95.86961 %)
 Over-replicated blocks: 3241 (1.6236011 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 2.916185
 Corrupt blocks: 8245
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 17
 Number of racks: 1
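
As a quick sanity check on the summary above, the ratios can be recomputed
from the reported counts. This is only a sketch using the figures pasted
from fsck; the variable names are made up for illustration. It suggests the
missing blocks are about 4.1 % of all blocks by count but under 1 % by
size, i.e. the missing blocks average roughly 19.6 MB against a
cluster-wide average of about 116 MB.

```python
# Figures copied from the fsck summary above.
total_blocks = 199618
minimally_replicated = 191373
missing_blocks = 8245
missing_bytes = 162010861748
total_bytes = 23155170955674

# Share of blocks that still have at least one replica.
healthy_pct = 100.0 * minimally_replicated / total_blocks

# Share of blocks (by count) and of data (by size) that is missing.
missing_pct_by_count = 100.0 * missing_blocks / total_blocks
missing_pct_by_size = 100.0 * missing_bytes / total_bytes

# Average size of a missing block, in bytes.
avg_missing_block_bytes = missing_bytes / missing_blocks

print(f"healthy blocks:          {healthy_pct:.5f} %")
print(f"missing blocks by count: {missing_pct_by_count:.2f} %")
print(f"missing data by size:    {missing_pct_by_size:.2f} %")
print(f"avg missing block size:  {avg_missing_block_bytes / 1e6:.1f} MB")
```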

There are 8 files in the directories within
hbase/data/default/table/ffa95306f599dbff99497e71841724fe, so I imagine 6
of the 8 are affected.
The size of the missing blocks ranges from 2 KB up to ~70 MB. The table
concerned had ~3500 regions. All datanodes are up and look like they report
correctly, so unfortunately there is no replica lying around.
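
To count affected HFiles per region and column family without reading the
listing by eye, the fsck output can be grouped by path. The following is a
rough sketch, not a tested tool: the function name is hypothetical, and it
assumes the path layout seen in the listing above
(/hbase/data/<namespace>/<table>/<region>/<family>/<hfile>). It tolerates
the line wrapping that the mail client introduced after the colon.

```python
import re
from collections import defaultdict

# Matches "<path>: CORRUPT ..." or "<path>: MISSING ...", whether or not
# the line was wrapped after the colon.
ENTRY = re.compile(r"(/hbase/data/\S+):\s+(CORRUPT|MISSING)\b")


def affected_hfiles(fsck_text):
    """Map (region, column_family) -> set of HFile names flagged by fsck."""
    result = defaultdict(set)
    for path, _status in ENTRY.findall(fsck_text):
        parts = path.strip("/").split("/")
        # Assumed layout: hbase/data/<namespace>/<table>/<region>/<family>/<hfile>
        if len(parts) != 7:
            continue
        region, family, hfile = parts[4], parts[5], parts[6]
        result[(region, family)].add(hfile)
    return dict(result)


# Two entries for the same HFile, copied from the listing above.
sample = """\
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/35186fe43fed47989ddb4ace3648b109:
CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block
blk_1076062900
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/35186fe43fed47989ddb4ace3648b109:
MISSING 1 blocks of total size 929610 B...
"""

for (region, family), hfiles in sorted(affected_hfiles(sample).items()):
    print(region, family, len(hfiles))
```

Because an HFile flagged as both CORRUPT and MISSING is stored in a set, it
is counted once per family.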

@esteban I double-checked: the volumes seem fine, and the total HDFS size
also looks unchanged. Datanodes look fine. It is a single cluster (i.e. no
cluster replication, if that answers the question), fresh from an upgrade
from 0.94 to 0.98 (CDH 4.7 to 5.3), with HDFS replication set to 3.

Many thanks,
Mateusz


Re: HBase all files corrupt / missing blocks

Posted by Esteban Gutierrez <es...@cloudera.com>.
Hi Mateusz,

As JMS mentioned, it is very likely the data is lost, but that type of
corruption is usually due to some DNs being down or data volumes having
been removed for some reason. Have you tried to recover that data from
those DNs first?

From "for what looks like a continuous stream of regions" it sounds like
you had a single replica configured for HBase. Is that the case?
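
One way to check whether a replica still exists on a datanode's disks is to
search its data directories for the block files named in the fsck output.
The sketch below is only an illustration under stated assumptions: the
function name is hypothetical, and it relies on datanodes storing block
data as blk_<id> with a checksum file named blk_<id>_<genstamp>.meta. The
directory to scan would be whatever dfs.datanode.data.dir points to on each
datanode. Finding a file only shows that a copy is on disk; getting the
namenode to accept it again is a separate matter.

```python
import os


def find_block_files(data_dir, block_ids):
    """Return sorted paths under data_dir that belong to the given block IDs."""
    wanted = {f"blk_{block_id}" for block_id in block_ids}
    hits = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name.endswith(".meta"):
                # blk_<id>_<genstamp>.meta -> blk_<id>
                base = name[: -len(".meta")].rsplit("_", 1)[0]
            else:
                base = name
            if base in wanted:
                hits.append(os.path.join(root, name))
    return sorted(hits)


if __name__ == "__main__":
    # Block IDs taken from the fsck output in this thread; the directory is
    # a placeholder, not a path from the original cluster.
    for path in find_block_files("/path/to/dfs/data", [1076077966, 1076062900]):
        print(path)
```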

esteban.

--
Cloudera, Inc.



Re: HBase all files corrupt / missing blocks

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Mateusz,

Data from this HFile is most probably lost. Is the block also reported as
missing by fsck? Do you have any datanode down which might contain this
block? How big is this HFile? Only 929610 bytes? If so, one option might
just be to delete this HFile.

How many HFiles are within this region?

JM
