Posted to dev@hbase.apache.org by Tim Robertson <ti...@gmail.com> on 2016/10/12 15:02:31 UTC

Data loss in MOB snapshot and clone?

Hi devs,
[Had a quick chat with Lars G. about this and before opening a Jira I
thought I'd raise it here first]

We have just experienced data loss in HBase 1.0.0-cdh5.4.10.

Before I dig into this further, I'd like to just ask if anyone has seen
this before?

The initial state was a table (tim_test) built with MOB support, holding a
few tens of millions of rows and tens of billions of cells.

I wanted to rename the table to get this into production and did so as
follows:

  snapshot 'tim_test', 'tim_test-snapshot'
  clone_snapshot 'tim_test-snapshot', 'prod_b_map'

At this stage the application all looked good, and so I continued with:

  delete_snapshot 'tim_test-snapshot'
  disable 'tim_test'
  drop 'tim_test'

Then things went... awry and data just started dropping out in the app.
Before long, all MOB data was seemingly gone.

The references in the new table MOB folder appear to point to the source
table (e.g.
/hbase/mobdir/data/default/prod_b_map/ba42a2e8e9b669d9fc85bdfeed2f5f2a/EPSG_4326/tim_test=14bf5f1737ac65c34615ed97c0b7de06-d41d8cd98f00b204e9800998ecf8427e20161006ff8baa70d21f408caefe8ae6318dfba2).

The RS logs are full of ERRORs like:

2016-10-12 15:19:14,640 ERROR org.apache.hadoop.hbase.regionserver.HStore:
The mob file
d41d8cd98f00b204e9800998ecf8427e20161006b59865f80e604781a79ebfa2ddd66b48
could not be found in the locations
[hdfs://ha-nn/hbase/mobdir/data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326,
hdfs://ha-nn/hbase/archive/data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326]
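To make the relationship between the reference name and the logged locations easier to follow, here is a minimal Python sketch. It assumes the reference name has the shape <table>=<region>-<hfile>, as the example path above suggests, and that the table lives in the "default" namespace; the helper names parse_link_name and candidate_locations are hypothetical and not part of HBase.

```python
from typing import NamedTuple


class HFileLinkName(NamedTuple):
    table: str
    region: str
    hfile: str


def parse_link_name(name: str) -> HFileLinkName:
    """Split '<table>=<region>-<hfile>' into its three parts."""
    table, sep, rest = name.rpartition("=")
    if not sep:
        raise ValueError(f"not a link name: {name!r}")
    region, sep, hfile = rest.partition("-")
    if not sep or not table or not region or not hfile:
        raise ValueError(f"not a link name: {name!r}")
    return HFileLinkName(table, region, hfile)


def candidate_locations(root: str, link: HFileLinkName, family: str) -> list[str]:
    """Directories that would be searched, mirroring the two in the log message."""
    # Assumption: the table is in the "default" namespace, as in the paths above.
    suffix = f"data/default/{link.table}/{link.region}/{family}"
    return [f"{root}/mobdir/{suffix}", f"{root}/archive/{suffix}"]


link = parse_link_name(
    "tim_test=14bf5f1737ac65c34615ed97c0b7de06-"
    "d41d8cd98f00b204e9800998ecf8427e20161006ff8baa70d21f408caefe8ae6318dfba2"
)
for path in candidate_locations("hdfs://ha-nn/hbase", link, "EPSG_4326"):
    print(path)
```

Both printed directories belong to the source table (tim_test), which is why dropping that table matters for the clone.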

What I don't know is:
 1) Was this running a background task to copy the MOB data when the
    snapshot was cloned, and I just deleted the source before the copy was
    complete?
 2) Or does "snapshot and clone" just reference the source MOB data until
    a (?) change?
 3) Or does snapshot and clone just not support MOB?

Can anyone shed some light on this easily before I dig into it please?

While this situation exists (at least in 1.0.0) might it be good to get
info about data loss for MOB tables into the snapshot clone docs?

Thanks,
Tim

Re: Data loss in MOB snapshot and clone?

Posted by Tim Robertson <ti...@gmail.com>.
Thanks Jingcheng

You are probably better placed to describe the true problem than me, so
please do create the issue.  I'll try and find time next week to offer a
unit test unless someone gets to it first.





On Fri, Oct 14, 2016 at 12:47 PM, Du, Jingcheng <ji...@intel.com>
wrote:

> Hi Tim,
>
> This does look like an issue. I'll file a Jira to fix it.
> Some MOB hfiles that are still being flushed are missed during snapshotting.
> As a temporary workaround, you can run flush 'tablename' before running
> snapshot 'tablename', 'snapshotname'. This avoids the issue. Thanks again
> for your findings.
>
> Regards,
> Jingcheng
>
> -----Original Message-----
> From: Tim Robertson [mailto:timrobertson100@gmail.com]
> Sent: Friday, October 14, 2016 1:54 PM
> To: dev@hbase.apache.org
> Subject: Re: Data loss in MOB snapshot and clone?
>
> Thanks for trying that Jingcheng
>
> I'll get time to do some testing next week on this and see if I can come
> up with a reproducible test.
> I can confirm that for non-MOB data it is all fine, and fields below the
> MOB threshold were not lost in the original process.
>
> Cheers,
> Tim
>
> On Thu, Oct 13, 2016 at 5:31 PM, Du, Jingcheng <ji...@intel.com>
> wrote:
>
> > Hi Tim,
> >
> > Normally after the snapshot is cloned/restored, there will be a .link
> > directory (the format is .link-{hfileName}) in the archive directory
> > of the table for both MOB and non-MOB tables, and the hfile
> > {hfileName} will be archived to the same directory as the .link
> > directory. The hfile won't be deleted by the file cleaner if the .link
> > directory is not empty, which means the hfile is still referenced by
> > others. The HFileLinkCleaner and SnapshotHFileCleaner cleaners
> > guarantee this.
> >
> > I did the same test based on the code in HBase master for both mob and
> > non-mob tables, and data are not lost.
> >
> > Tim, would you mind trying the steps for normal tables to see if the
> > data will be lost? Just one row is enough for the table. Thanks a lot.
> >
> > Regards,
> > Jingcheng
> >
> > -----Original Message-----
> > From: Tim Robertson [mailto:timrobertson100@gmail.com]
> > Sent: Thursday, October 13, 2016 4:48 PM
> > To: dev@hbase.apache.org
> > Subject: Re: Data loss in MOB snapshot and clone?
> >
> > Thanks Jingcheng
> >
> > > > Yes, it just references the source MOB data until MOB compaction.
> > >
> > > Based on that, I think this really is a critical bug. It allowed the
> > > MOBs to be deleted before that happened, resulting in broken
> > > references and data loss. Or am I misunderstanding you?
> >
> >
> >
> > On Thu, Oct 13, 2016 at 9:45 AM, Du, Jingcheng
> > <ji...@intel.com>
> > wrote:
> >
> > > Hi Tim,
> > >
> > > > > was this running a background task to copy the MOB data when the
> > > > > snapshot was cloned and I just deleted the source before the copy
> > > > > was complete?
> > > > The MOB data can be copied when MOB compaction happens. But the MOB
> > > > files should not be deleted, even if they have not been copied and
> > > > the source table has been deleted. The archive cleaner should keep
> > > > them until all the references are gone. Let me check the code again.
> > > >
> > > > > when running "snapshot and clone" it just references the source
> > > > > MOB data until a (?) change?
> > > > Yes, it just references the source MOB data until MOB compaction.
> > > >
> > > > > snapshot and clone just doesn't support MOB?
> > > > It is supported.
> > >
> > > Regards,
> > > Jingcheng
> > >
> > > -----Original Message-----
> > > From: Tim Robertson [mailto:timrobertson100@gmail.com]
> > > Sent: Thursday, October 13, 2016 1:56 AM
> > > To: dev@hbase.apache.org
> > > Subject: Re: Data loss in MOB snapshot and clone?
> > >
> > > Thanks - well it is now on the CDH community forum too.
> > >
> > > > Jonathan Hsieh pretty much described what I see in his comment on
> > > > HBASE-12332:
> > > > https://issues.apache.org/jira/browse/HBASE-12332?focusedCommentId=14241478&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14241478
> > >
> > >
> > >
> > > > On Wed, Oct 12, 2016 at 7:51 PM, Huaxiang Sun <hs...@cloudera.com> wrote:
> > >
> > > > > Hi Tim,
> > > > >
> > > > > Having read more details, this may not be related to the issue we
> > > > > fixed (which was MOB compaction related).
> > > > > I am doing a similar test to see if I can reproduce it.
> > > >
> > > > Thanks,
> > > > Huaxiang
> > > > > > On Oct 12, 2016, at 10:29 AM, Tim Robertson <ti...@gmail.com> wrote:
> > > > >
> > > > > Thanks Ted, Huaxiang
> > > > >
> > > > > I'll move this to a Cloudera forum and comment back here if it
> > > > > appears unrelated.
> > > > >
> > > > > > On Wed, Oct 12, 2016 at 7:24 PM, Huaxiang Sun <hsun@cloudera.com> wrote:
> > > > >
> > > > > >> By the way, I forgot the forum link:
> > > > > >> http://community.cloudera.com/
> > > > >>
> > > > >> Thanks,
> > > > >> Huaxiang
> > > > >>
> > > > > >>> On Oct 12, 2016, at 10:10 AM, Huaxiang Sun <hsun@cloudera.com> wrote:
> > > > >>>
> > > > >>> Hi Tim,
> > > > >>>
> > > > > >>>   I believe that this runs into an issue specific to the Cloudera
> > > > > >>> release, which we fixed recently. For details, could you discuss
> > > > > >>> it in the CDH forum? Copy me (hsun@cloudera.com) in the forum so I
> > > > > >>> can explain more there.
> > > > >>>
> > > > >>>   Thanks,
> > > > >>>   Huaxiang
> > > > >>>
> > > > > >>>> On Oct 12, 2016, at 8:13 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > >>>>
> > > > > >>>> Have you looked at HBASE-16578?
> > > > >>>>
> > > > >>>> Cheers
> > > > >>>>
> > > > >>>>> On Oct 12, 2016, at 8:02 AM, Tim Robertson <
> > > > timrobertson100@gmail.com <ma...@gmail.com>
> > > > >> <mailto:timrobertson100@gmail.com
> > > > >> <ma...@gmail.com>>>
> > > > wrote:
> > > > >>>>>
> > > > >>>>> Hi devs,
> > > > >>>>> [Had a quick chat with Lars G. about this and before opening
> > > > >>>>> a Jira I thought I'd raise it here first]
> > > > >>>>>
> > > > >>>>> We have just experienced data loss in HBase 1.0.0-cdh5.4.10.
> > > > >>>>>
> > > > >>>>> Before I dig into this further, I'd like to just ask if
> > > > >>>>> anyone has
> > > > seen
> > > > >>>>> this before?
> > > > >>>>>
> > > > >>>>> The initial state was a table (tim_test) built with MOB
> > > > >>>>> support and a
> > > > >> few
> > > > >>>>> 10's million rows and 10's billions of cells.
> > > > >>>>>
> > > > >>>>> I wanted to rename the table to get this into production and
> > > > >>>>> did so
> > > > as
> > > > >>>>> follows:
> > > > >>>>>
> > > > >>>>> snapshot 'tim_test', 'tim_test-snapshot'
> > > > >>>>> clone_snapshot 'tim_test-snapshot', 'prod_b_map'
> > > > >>>>>
> > > > >>>>> At this stage the application all looked good, and so I
> > > > >>>>> continued
> > > > with:
> > > > >>>>>
> > > > >>>>> delete_snapshot 'tim_test-snapshot'
> > > > >>>>> disable 'tim_test'
> > > > >>>>> drop ‘tim_test’
> > > > >>>>>
> > > > >>>>> Then things went... awry and data just started dropping out
> > > > >>>>> in the
> > > > app.
> > > > >>>>> Before long, all MOB data seemingly is gone.
> > > > >>>>>
> > > > >>>>> The references in the new table MOB folder appear to point
> > > > >>>>> to the
> > > > >> source
> > > > >>>>> table (e.g.
> > > > >>>>> /hbase/mobdir/data/default/prod_b_map/ba42a2e8e9b669d9fc85bd
> > > > >>>>> fe
> > > > >>>>> ed
> > > > >>>>> 2f5f
> > > > >> 2a/EPSG_4326/tim_test=14bf5f1737ac65c34615ed97c0b7de06-
> > > > >> d41d8cd98f00b204e9800998ecf8427e20161006ff8baa70d21f408caefe
> > > > 8ae6318dfba2).
> > > > >>>>>
> > > > >>>>> The RS logs full of ERROR like:
> > > > >>>>>
> > > > >>>>> 2016-10-12 15:19:14,640 ERROR org.apache.hadoop.hbase.
> > > > >> regionserver.HStore:
> > > > >>>>> The mob file
> > > > >>>>> d41d8cd98f00b204e9800998ecf8427e20161006b59865f80e604781a79e
> > > > >> bfa2ddd66b48
> > > > >>>>> could not be found in the locations
> > > > >>>>> [hdfs://ha-nn/hbase/mobdir/data/default/tim_test/
> > > > >> 14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326
> > > > >> <hdfs://ha-nn/hbase/mobdir/
> > > > <hdfs://ha-nn/hbase/mobdir/>
> > > > >> data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_432
> > > > >> 6>
> > > > >> ,
> > > > >>>>> hdfs://ha-nn/hbase/archive/data/default/tim_test/
> > > > <hdfs://ha-nn/hbase/archive/data/default/tim_test/>
> > > > >> 14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326]
> > > > <hdfs://ha-nn/hbase/archive/
> > > > >> data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_432
> > > > >> 6]
> > > > >> >
> > > > >>>>>
> > > > >>>>> What I don't know is:
> > > > >>>>> 1) was this running a background task to copy the MOB data
> > > > >>>>> when the snapshot was cloned and I just deleted the source
> > > > >>>>> before the copy was complete?
> > > > >>>>> - or
> > > > >>>>> 2) when running "snapshot and clone" it just references the
> > > > >>>>> source
> > > > MOB
> > > > >>>>> data until a (?) change?
> > > > >>>>> 3) snapshot and clone just doesn't support MOB?
> > > > >>>>>
> > > > >>>>> Can anyone shed some light on this easily before I dig into
> > > > >>>>> it
> > > > please?
> > > > >>>>>
> > > > >>>>> While this situation exists (at least in 1.0.0) might it be
> > > > >>>>> good to
> > > > get
> > > > >>>>> info about data loss for MOB tables into the snapshot clone
> docs?
> > > > >>>>>
> > > > >>>>> Thanks,
> > > > >>>>> Tim
> > > >
> > > >
> > >
> >
>

RE: Data loss in MOB snapshot and clone?

Posted by "Du, Jingcheng" <ji...@intel.com>.
Hi Tim,

This does look like an issue. I'll file a Jira to fix it.
Some MOB hfiles that are still being flushed are missed during snapshotting.
As a temporary workaround, you can run flush 'tablename' before running snapshot 'tablename', 'snapshotname'. This avoids the issue. Thanks again for your findings.
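As an illustration of that ordering, the following Python sketch only builds the HBase shell commands for the snapshot-and-clone steps with the flush placed first. The helper name rename_commands is hypothetical, and the table and snapshot names are taken from the original report; the sketch deliberately stops before any delete_snapshot or drop step.

```python
def rename_commands(source: str, snapshot: str, target: str) -> list[str]:
    """Build HBase shell commands for snapshot-and-clone, flushing first."""
    return [
        f"flush '{source}'",  # workaround: flush before taking the snapshot
        f"snapshot '{source}', '{snapshot}'",
        f"clone_snapshot '{snapshot}', '{target}'",
    ]


print("\n".join(rename_commands("tim_test", "tim_test-snapshot", "prod_b_map")))
```

The printed lines could be pasted into an HBase shell session; whether the flush fully avoids the problem on a given release is something to verify on a test table first.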

Regards,
Jingcheng

Re: Data loss in MOB snapshot and clone?

Posted by Tim Robertson <ti...@gmail.com>.
Thanks for trying that Jingcheng

I'll get time to do some testing next week on this and see if I can come up
with a reproducible test.
I can confirm that for non-MOB data it is all fine, and fields below the MOB
threshold were not lost in the original process.

Cheers,
Tim

RE: Data loss in MOB snapshot and clone?

Posted by "Du, Jingcheng" <ji...@intel.com>.
Hi Tim,

Normally after the snapshot is cloned/restored, there will be a .link directory (the format is .link-{hfileName}) in the archive directory of the table for both MOB and non-MOB tables, and the hfile {hfileName} will be archived to the same directory as the .link directory.
The hfile won't be deleted by the file cleaner if the .link directory is not empty, which means the hfile is still referenced by others. The HFileLinkCleaner and SnapshotHFileCleaner cleaners guarantee this.
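The rule described above can be sketched in a few lines of Python. This is only an illustration of the check, not the HBase cleaner code: it uses a local temporary directory in place of HDFS, and the function name is_deletable and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def is_deletable(archived_hfile: Path) -> bool:
    """Return True only when no back-reference remains for this archived hfile."""
    link_dir = archived_hfile.parent / f".link-{archived_hfile.name}"
    # A missing or empty .link directory means nothing references the file.
    return not (link_dir.is_dir() and any(link_dir.iterdir()))


with tempfile.TemporaryDirectory() as tmp:
    family_dir = Path(tmp) / "archive" / "tim_test" / "region" / "EPSG_4326"
    family_dir.mkdir(parents=True)
    hfile = family_dir / "hfile1"
    hfile.touch()
    print(is_deletable(hfile))  # no .link directory yet

    link_dir = family_dir / ".link-hfile1"
    link_dir.mkdir()
    (link_dir / "backref-from-prod_b_map").touch()
    print(is_deletable(hfile))  # a back-reference now exists
```

Under this rule, a cloned table keeps the source files alive only if the back-reference was actually written, so a file with no .link entry has no such protection.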

I did the same test based on the code in HBase master for both mob and non-mob tables, and data are not lost.

Tim, would you mind trying the steps for normal tables to see if the data will be lost? Just one row is enough for the table. Thanks a lot.

Regards,
Jingcheng


Re: Data loss in MOB snapshot and clone?

Posted by Tim Robertson <ti...@gmail.com>.
Thanks Jingcheng

Yes, it just references the source MOB data until MOB compaction.

Based on that, I think this really is a critical bug.  It allowed the MOBs
to be deleted before that happened, and thus broken references and data
loss.  Or am I misunderstanding you please?




RE: Data loss in MOB snapshot and clone?

Posted by "Du, Jingcheng" <ji...@intel.com>.
Hi Tim,

> was this running a background task to copy the MOB data when the snapshot was cloned and I just deleted the source before the copy was complete?
The MOB data can be copied when MOB compaction happens. But the MOB files should not be deleted, even if they have not been copied yet and the source table has been deleted. The archive cleaner should keep them until all the references are gone. Let me check the code again.

> when running "snapshot and clone" it just references the source MOB data until a (?) change?
Yes, it just references the source MOB data until MOB compaction.

> snapshot and clone just doesn't support MOB?
It does support MOB.
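
To make the "it just references the source MOB data" point concrete, here is a small sketch that splits the link name from the original post and rebuilds the two directories named in the RS error. It assumes the link name has the form `<table>=<region>-<file>` in the default namespace, which is what the example in the original post looks like; `MobLink`, `parse_link_name` and `candidate_locations` are hypothetical names for illustration, not HBase APIs.

```python
from typing import List, NamedTuple


class MobLink(NamedTuple):
    table: str
    region: str
    file_name: str


def parse_link_name(link_name: str) -> MobLink:
    """Split '<table>=<region>-<file>' into its parts (default namespace only)."""
    table, _, rest = link_name.partition("=")
    region, _, file_name = rest.partition("-")
    if not (table and region and file_name):
        raise ValueError(f"not a link name: {link_name!r}")
    return MobLink(table, region, file_name)


def candidate_locations(root: str, family: str, link: MobLink) -> List[str]:
    """Directories searched for the referenced file: live MOB dir, then archive."""
    suffix = f"data/default/{link.table}/{link.region}/{family}"
    return [f"{root}/mobdir/{suffix}", f"{root}/archive/{suffix}"]


if __name__ == "__main__":
    name = (
        "tim_test=14bf5f1737ac65c34615ed97c0b7de06-"
        "d41d8cd98f00b204e9800998ecf8427e20161006ff8baa70d21f408caefe8ae6318dfba2"
    )
    for location in candidate_locations(
        "hdfs://ha-nn/hbase", "EPSG_4326", parse_link_name(name)
    ):
        print(location)
```

Both directories belong to tim_test, not prod_b_map, so the clone can only be read for as long as the source table's MOB files survive in one of those two places.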

Regards,
Jingcheng


Re: Data loss in MOB snapshot and clone?

Posted by Tim Robertson <ti...@gmail.com>.
Thanks - well it is now on the CDH community forum too.

Jonathan Hsieh pretty much described what I see in his comment on
HBASE-12332
https://issues.apache.org/jira/browse/HBASE-12332?focusedCommentId=14241478&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14241478




Re: Data loss in MOB snapshot and clone?

Posted by Huaxiang Sun <hs...@cloudera.com>.
Hi Tim,

Having read more of the details, this may not be related to the issue we fixed (which was MOB compaction related).
I am doing a similar test to see if I can reproduce it.

Thanks,
Huaxiang


Re: Data loss in MOB snapshot and clone?

Posted by Tim Robertson <ti...@gmail.com>.
Thanks Ted, Huaxiang

I'll move this to a Cloudera forum and comment back here if it appears
unrelated.


Re: Data loss in MOB snapshot and clone?

Posted by Huaxiang Sun <hs...@cloudera.com>.
By the way, I forgot the forum link: http://community.cloudera.com

Thanks,
Huaxiang

> On Oct 12, 2016, at 10:10 AM, Huaxiang Sun <hs...@cloudera.com> wrote:
> 
> Hi Tim,
> 
>    I believe that it runs into an issue which is specific to cloudera release we fixed recently. For details, could you discuss it in cdh forum?
> Copy me(hsun@cloudera.com <ma...@cloudera.com>) in the forum so I can explain more there.
> 
>    Thanks,
>    Huaxiang
> 
>> On Oct 12, 2016, at 8:13 AM, Ted Yu <yuzhihong@gmail.com <ma...@gmail.com>> wrote:
>> 
>> Have you looked at HBASE-16578 ?
>> 
>> Cheers
>> 
>>> On Oct 12, 2016, at 8:02 AM, Tim Robertson <timrobertson100@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> Hi devs,
>>> [Had a quick chat with Lars G. about this and before opening a Jira I
>>> thought I'd raise it here first]
>>> 
>>> We have just experienced data loss in HBase 1.0.0-cdh5.4.10.
>>> 
>>> Before I dig into this further, I'd like to just ask if anyone has seen
>>> this before?
>>> 
>>> The initial state was a table (tim_test) built with MOB support and a few
>>> 10's million rows and 10's billions of cells.
>>> 
>>> I wanted to rename the table to get this into production and did so as
>>> follows:
>>> 
>>> snapshot 'tim_test', 'tim_test-snapshot'
>>> clone_snapshot 'tim_test-snapshot', 'prod_b_map'
>>> 
>>> At this stage the application all looked good, and so I continued with:
>>> 
>>> delete_snapshot 'tim_test-snapshot'
>>> disable 'tim_test'
>>> drop 'tim_test'
>>> 
>>> Then things went... awry and data just started dropping out in the app.
>>> Before long, all MOB data seemingly is gone.
>>> 
>>> The references in the new table MOB folder appear to point to the source
>>> table (e.g.
>>> /hbase/mobdir/data/default/prod_b_map/ba42a2e8e9b669d9fc85bdfeed2f5f2a/EPSG_4326/tim_test=14bf5f1737ac65c34615ed97c0b7de06-d41d8cd98f00b204e9800998ecf8427e20161006ff8baa70d21f408caefe8ae6318dfba2).
>>> 
>>> The RS logs full of ERROR like:
>>> 
>>> 2016-10-12 15:19:14,640 ERROR org.apache.hadoop.hbase.regionserver.HStore:
>>> The mob file
>>> d41d8cd98f00b204e9800998ecf8427e20161006b59865f80e604781a79ebfa2ddd66b48
>>> could not be found in the locations
>>> [hdfs://ha-nn/hbase/mobdir/data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326,
>>> hdfs://ha-nn/hbase/archive/data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326]
>>> 
>>> What I don't know is:
>>> 1) was this running a background task to copy the MOB data when the
>>> snapshot was cloned and I just deleted the source before the copy was
>>> complete?
>>> - or
>>> 2) when running "snapshot and clone" it just references the source MOB
>>> data until a (?) change?
>>> 3) snapshot and clone just doesn't support MOB?
>>> 
>>> Can anyone shed some light on this easily before I dig into it please?
>>> 
>>> While this situation exists (at least in 1.0.0) might it be good to get
>>> info about data loss for MOB tables into the snapshot clone docs?
>>> 
>>> Thanks,
>>> Tim
> 


Re: Data loss in MOB snapshot and clone?

Posted by Huaxiang Sun <hs...@cloudera.com>.
Hi Tim,

   I believe you have run into an issue that is specific to the Cloudera release and that we fixed recently. For details, could you discuss it in the CDH forum?
Copy me (hsun@cloudera.com) in the forum so I can explain more there.

   Thanks,
   Huaxiang

> On Oct 12, 2016, at 8:13 AM, Ted Yu <yu...@gmail.com> wrote:
> 
> Have you looked at HBASE-16578 ?
> 
> Cheers
> 
>> On Oct 12, 2016, at 8:02 AM, Tim Robertson <ti...@gmail.com> wrote:
>> 
>> Hi devs,
>> [Had a quick chat with Lars G. about this and before opening a Jira I
>> thought I'd raise it here first]
>> 
>> We have just experienced data loss in HBase 1.0.0-cdh5.4.10.
>> 
>> Before I dig into this further, I'd like to just ask if anyone has seen
>> this before?
>> 
>> The initial state was a table (tim_test) built with MOB support and a few
>> 10's million rows and 10's billions of cells.
>> 
>> I wanted to rename the table to get this into production and did so as
>> follows:
>> 
>> snapshot 'tim_test', 'tim_test-snapshot'
>> clone_snapshot 'tim_test-snapshot', 'prod_b_map'
>> 
>> At this stage the application all looked good, and so I continued with:
>> 
>> delete_snapshot 'tim_test-snapshot'
>> disable 'tim_test'
>> drop 'tim_test'
>> 
>> Then things went... awry and data just started dropping out in the app.
>> Before long, all MOB data seemingly is gone.
>> 
>> The references in the new table MOB folder appear to point to the source
>> table (e.g.
>> /hbase/mobdir/data/default/prod_b_map/ba42a2e8e9b669d9fc85bdfeed2f5f2a/EPSG_4326/tim_test=14bf5f1737ac65c34615ed97c0b7de06-d41d8cd98f00b204e9800998ecf8427e20161006ff8baa70d21f408caefe8ae6318dfba2).
>> 
>> The RS logs full of ERROR like:
>> 
>> 2016-10-12 15:19:14,640 ERROR org.apache.hadoop.hbase.regionserver.HStore:
>> The mob file
>> d41d8cd98f00b204e9800998ecf8427e20161006b59865f80e604781a79ebfa2ddd66b48
>> could not be found in the locations
>> [hdfs://ha-nn/hbase/mobdir/data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326,
>> hdfs://ha-nn/hbase/archive/data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326]
>> 
>> What I don't know is:
>> 1) was this running a background task to copy the MOB data when the
>> snapshot was cloned and I just deleted the source before the copy was
>> complete?
>> - or
>> 2) when running "snapshot and clone" it just references the source MOB
>> data until a (?) change?
>> 3) snapshot and clone just doesn't support MOB?
>> 
>> Can anyone shed some light on this easily before I dig into it please?
>> 
>> While this situation exists (at least in 1.0.0) might it be good to get
>> info about data loss for MOB tables into the snapshot clone docs?
>> 
>> Thanks,
>> Tim


Re: Data loss in MOB snapshot and clone?

Posted by Ted Yu <yu...@gmail.com>.
Have you looked at HBASE-16578 ?

Cheers

> On Oct 12, 2016, at 8:02 AM, Tim Robertson <ti...@gmail.com> wrote:
> 
> Hi devs,
> [Had a quick chat with Lars G. about this and before opening a Jira I
> thought I'd raise it here first]
> 
> We have just experienced data loss in HBase 1.0.0-cdh5.4.10.
> 
> Before I dig into this further, I'd like to just ask if anyone has seen
> this before?
> 
> The initial state was a table (tim_test) built with MOB support and a few
> 10's million rows and 10's billions of cells.
> 
> I wanted to rename the table to get this into production and did so as
> follows:
> 
>  snapshot 'tim_test', 'tim_test-snapshot'
>  clone_snapshot 'tim_test-snapshot', 'prod_b_map'
> 
> At this stage the application all looked good, and so I continued with:
> 
>  delete_snapshot 'tim_test-snapshot'
>  disable 'tim_test'
>  drop 'tim_test'
> 
> Then things went... awry and data just started dropping out in the app.
> Before long, all MOB data seemingly is gone.
> 
> The references in the new table MOB folder appear to point to the source
> table (e.g.
> /hbase/mobdir/data/default/prod_b_map/ba42a2e8e9b669d9fc85bdfeed2f5f2a/EPSG_4326/tim_test=14bf5f1737ac65c34615ed97c0b7de06-d41d8cd98f00b204e9800998ecf8427e20161006ff8baa70d21f408caefe8ae6318dfba2).
> 
> The RS logs full of ERROR like:
> 
> 2016-10-12 15:19:14,640 ERROR org.apache.hadoop.hbase.regionserver.HStore:
> The mob file
> d41d8cd98f00b204e9800998ecf8427e20161006b59865f80e604781a79ebfa2ddd66b48
> could not be found in the locations
> [hdfs://ha-nn/hbase/mobdir/data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326,
> hdfs://ha-nn/hbase/archive/data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326]
> 
> What I don't know is:
> 1) was this running a background task to copy the MOB data when the
> snapshot was cloned and I just deleted the source before the copy was
> complete?
> - or
> 2) when running "snapshot and clone" it just references the source MOB
> data until a (?) change?
> 3) snapshot and clone just doesn't support MOB?
> 
> Can anyone shed some light on this easily before I dig into it please?
> 
> While this situation exists (at least in 1.0.0) might it be good to get
> info about data loss for MOB tables into the snapshot clone docs?
> 
> Thanks,
> Tim
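
The MOB reference quoted in this thread has a file name of the form
<table>=<region>-<file>, where the table part names the source table
(tim_test) rather than the clone (prod_b_map). As a minimal sketch, assuming
that naming holds for every such link file, the Python below splits a link
name into its parts and flags links that point at a table other than the one
that owns the directory. The regex and the helper names parse_link_name and
points_outside are hypothetical illustrations inferred from the quoted path,
not HBase APIs.

```python
import re

# Assumed link-name shape, inferred from the path quoted in this thread:
#   <table>=<region>-<file>
# This is an illustration only, not the pattern HBase itself uses.
LINK_NAME = re.compile(
    r"^(?P<table>[^=/]+)=(?P<region>[0-9a-f]+)-(?P<file>[0-9a-f]+)$"
)


def parse_link_name(path):
    """Return (table, region, file) for a link-style name, else None."""
    # Only the last path component carries the link name.
    name = path.rstrip("/").rsplit("/", 1)[-1]
    match = LINK_NAME.match(name)
    if match is None:
        return None
    return match.group("table"), match.group("region"), match.group("file")


def points_outside(path, own_table):
    """True if the path is a link whose target table differs from own_table."""
    parsed = parse_link_name(path)
    return parsed is not None and parsed[0] != own_table


if __name__ == "__main__":
    example = (
        "/hbase/mobdir/data/default/prod_b_map/"
        "ba42a2e8e9b669d9fc85bdfeed2f5f2a/EPSG_4326/"
        "tim_test=14bf5f1737ac65c34615ed97c0b7de06-"
        "d41d8cd98f00b204e9800998ecf8427e20161006ff8baa70d21f408caefe8ae6318dfba2"
    )
    print(parse_link_name(example))
    print(points_outside(example, "prod_b_map"))
```

Running this over a listing of the cloned table's MOB directory before
dropping the source table would show whether the clone still depends on the
source table's files. It cannot tell whether those files will survive the
drop, only that the dependency exists.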