Posted to user@hbase.apache.org by kiran <ki...@gmail.com> on 2014/02/27 16:53:48 UTC

Hbase data loss scenario

Hi All,

We have been experiencing severe data loss issues for the past few hours.
There are some weird things going on in the cluster. We were unable to
locate the data even in HDFS.

HBase version: 0.94.1

Here are the weird things that are going on:

1) A table which was once 1TB has now become 170GB, with many regions
which were once 7GB now down to a few MBs. We have no clue what is
happening at all.

2) The table is splitting (100 regions have become 200 regions), even
though ours is ConstantSizeRegionSplitPolicy with a region size of 20GB.
I don't know why it is even splitting.

3) The HDFS namenode dump, which we periodically back up, is decreasing
in size.

4) And there is a region chain with start keys and end keys as follows
(I can't copy-paste the exact thing). For example:

K1.xxx K2.xyz
K2.xyz K3.xyz,138798010000.xyp
K3.xyz,138798010000.xyp K4.xyq

I have never seen weird start and end keys like this. We also suspect a
failed split of a region of around 20GB. We have looked at the logs many
times but were unable to get any sense out of them. Please help us out;
we can't afford data loss.

Yesterday there was a cluster crash involving the root region, but we
thought we had successfully restored it. Things didn't go that way...
there has been consistent data loss after that.


-- 
Thank you
Kiran Sarvabhotla

-----Even a correct decision is wrong when it is taken late

Re: Hbase data loss scenario

Posted by Andrew Purtell <ap...@apache.org>.
On Fri, Feb 28, 2014 at 1:15 AM, kiran <ki...@gmail.com> wrote:

> Also, initially we thought it was human error, that someone might have
> deleted HDFS dirs under some regions. But surprisingly, only some columns
> in a column family for a row in the region are lost. If someone had
> deleted an entire dir, then the entire column family for those region
> rows should be lost, since HBase has store files for each column family.
>

No, you can delete everything out from underneath HBase, yet if it was in
the process of writing files when the deletion happened, the files under
construction will show up when closed, with the directory structure
recreated as needed. If there is only an occasional store file here and
there remaining, this is actually suggestive of a deletion of HBase data
at the HDFS level by some external process or user.

The NameNode audit records should tell you what actions were taken
involving HBase data directories, by which user, and at what time.
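
For example, something along these lines will pull the suspect entries
out of the audit log (a minimal sketch: it assumes the stock audit line
format with cmd= and src= fields and the default /hbase root directory,
so adjust the path and filter for your installation):

import java.io.BufferedReader;
import java.io.FileReader;

public class HBaseAuditGrep {
    public static void main(String[] args) throws Exception {
        // args[0]: path to the NameNode's hdfs-audit.log (location varies by install).
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        String line;
        while ((line = in.readLine()) != null) {
            // Each audit entry records ugi=, ip=, cmd= and src= for a namespace
            // operation; deletes under the HBase root directory are the suspects.
            if (line.contains("cmd=delete") && line.contains("src=/hbase")) {
                System.out.println(line);
            }
        }
        in.close();
    }
}

The ugi= field on each matching line tells you which user issued the
delete.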
-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Hbase data loss scenario

Posted by kiran <ki...@gmail.com>.
Hi Jean,

This is the scenario I am talking about... Let me know if everything is
OK with this region chain...

Region name, start key, end key:

RAW,GpsgQ,1393477054705.defb006868d8191e76a2ae7e9d203419.
  start: GpsgQ
  end:   G7stL

RAW,G7stL,1393477054705.0d123f246312f937e930fc76fc8d4b9c.
  start: G7stL
  end:   HCuv

RAW,HCuv,1393490697926.d1ea022c6fd534aaf89139ca0726cce5.
  start: HCuv
  end:   HCuv,1377721814384.87a47003ddb6f0e18a1b735c89bf8ac3.,1378197588060.bcae64eb2788208ab723eb2ad4f5925f.

RAW,HCuv,1377721814384.87a47003ddb6f0e18a1b735c89bf8ac3.,1378197588060.bcae64eb2788208ab723eb2ad4f5925f.,1393484971849.a71d77c695705d07a121340c07a1cc0c.
  start: HCuv,1377721814384.87a47003ddb6f0e18a1b735c89bf8ac3.,1378197588060.bcae64eb2788208ab723eb2ad4f5925f.
  end:   eE08

The size in HDFS of the region with hash
"d1ea022c6fd534aaf89139ca0726cce5" is 600KB.

Also, initially we thought it was human error, that someone might have
deleted HDFS dirs under some regions. But surprisingly, only some columns
in a column family for a row in the region are lost. If someone had
deleted an entire dir, then the entire column family for those region
rows should be lost, since HBase has store files for each column family.

We also have rows with millions of columns in the table, and some of
them are present while some of them are lost... and it happened in some
regions, not across all the table's regions. All other tables in the
cluster are fine.




-- 
Thank you
Kiran Sarvabhotla

-----Even a correct decision is wrong when it is taken late

Re: Hbase data loss scenario

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Kiran,

Two things:

1) Is there any reason for you to use such an old HBase version? Any
chance to migrate to a more recent one? 0.94.17 is out.
2) What do you mean by "I have never seen weird start and end keys like
this"? I don't see anything wrong with what you described. What do your
keys look like? Can you do a get with the key being "K3.xyz,138798010000.xyp"?
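
Something like this should do it (a minimal sketch against the 0.94
client API; "your_table" is a placeholder for the affected table's name):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckRow {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "your_table"); // placeholder table name
        try {
            // Fetch the row whose key looks suspicious.
            Get get = new Get(Bytes.toBytes("K3.xyz,138798010000.xyp"));
            Result result = table.get(get);
            System.out.println(result.isEmpty() ? "row not found" : result);
        } finally {
            table.close();
        }
    }
}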

JM



Re: Hbase data loss scenario

Posted by kiran <ki...@gmail.com>.
Adding to that, there are many regions with 0MB size that still have the
CFs specified in the table...




-- 
Thank you
Kiran Sarvabhotla

-----Even a correct decision is wrong when it is taken late

Re: Hbase data loss scenario

Posted by lars hofhansl <la...@apache.org>.
Sounds like lost HFiles. I'd start digging through the HDFS logs.
Which version of Hadoop/HDFS are you using? Any power outages recently?




Re: Hbase data loss scenario

Posted by kiran <ki...@gmail.com>.
Hi Lars,

This is very mysterious for us... Some rows are completely deleted; some
of them are partially deleted, with one CF present and the other CF
partially present or missing.

The regions which were once around 7GB have turned into a few KBs, so it
is a major data loss for us... and we suspect they also underwent splits.
But splitting is our assumption... we are not sure...





-- 
Thank you
Kiran Sarvabhotla

-----Even a correct decision is wrong when it is taken late

Re: Hbase data loss scenario

Posted by lars hofhansl <la...@apache.org>.
HDFS size can vary and go down (when a compaction happens, and later the
old files are collected and deleted). So the size in HDFS is not a good
measure; unless you lost rows, there's no reason to worry.

Can you quantify "consistent data loss"? Did you count rows before and
after? Can you access any data at all?
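
HBase ships a MapReduce row counter
(org.apache.hadoop.hbase.mapreduce.RowCounter) for big tables; for a
quick client-side count, a minimal sketch against the 0.94 API
("your_table" is a placeholder) would be:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;

public class CountRows {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "your_table"); // placeholder table name
        try {
            Scan scan = new Scan();
            scan.setFilter(new FirstKeyOnlyFilter()); // first KV per row suffices for counting
            scan.setCaching(1000);                    // fewer RPC round trips
            ResultScanner scanner = table.getScanner(scan);
            long rows = 0;
            for (Result r : scanner) {
                rows++;
            }
            scanner.close();
            System.out.println("rows: " + rows);
        } finally {
            table.close();
        }
    }
}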

-- Lars




Re: Hbase data loss scenario

Posted by Stack <st...@duboce.net>.
On Thu, Feb 27, 2014 at 10:51 AM, kiran <ki...@gmail.com> wrote:

> Is there any place where HDFS command history is stored, along the lines
> of .bash_history in a shell? The regions for the table have increased by
> about 100 overnight (from 120 to 211)... I suspect that something is
> wrong on the HBase side...
>
>
See namenode log.
St.Ack

Re: Hbase data loss scenario

Posted by kiran <ki...@gmail.com>.
Is there any place where HDFS command history is stored, along the lines
of .bash_history in a shell? The regions for the table have increased by
about 100 overnight (from 120 to 211)... I suspect that something is
wrong on the HBase side...
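
For reference, a minimal sketch of checking the region count and the
split settings actually in effect (0.94 client API; "your_table" is a
placeholder for the table name):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckSplits {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            // Current number of regions for the table.
            System.out.println("regions: "
                + admin.getTableRegions(Bytes.toBytes("your_table")).size());
            // Split settings the cluster is actually running with.
            System.out.println("split policy: "
                + conf.get("hbase.regionserver.region.split.policy"));
            System.out.println("max filesize: "
                + conf.get("hbase.hregion.max.filesize"));
        } finally {
            admin.close();
        }
    }
}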




-- 
Thank you
Kiran Sarvabhotla

-----Even a correct decision is wrong when it is taken late

Re: Hbase data loss scenario

Posted by kiran <ki...@gmail.com>.
The TTL setting is Integer.MAX_VALUE, so that should not be the problem.





-- 
Thank you
Kiran Sarvabhotla

-----Even a correct decision is wrong when it is taken late

Re: Hbase data loss scenario

Posted by Jimmy Xiang <jx...@cloudera.com>.
Hi Kiran,

Can you check your table's TTL setting? Is it possible that the data
expired and was purged?
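
A quick way to check (a minimal sketch against the 0.94 client API;
"your_table" is a placeholder for the table name):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckTtl {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            HTableDescriptor desc =
                admin.getTableDescriptor(Bytes.toBytes("your_table")); // placeholder name
            // Print the TTL (in seconds) of every column family.
            for (HColumnDescriptor cf : desc.getFamilies()) {
                System.out.println(cf.getNameAsString() + " TTL=" + cf.getTimeToLive());
            }
        } finally {
            admin.close();
        }
    }
}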

Thanks,
Jimmy




Re: Hbase data loss scenario

Posted by Stack <st...@duboce.net>.
Anything in your logs that might give you a clue?  Master logs?  HDFS
NameNode logs?
St.Ack

