Posted to user@accumulo.apache.org by pa...@aim.com on 2012/07/05 15:52:57 UTC

Recovering Tables from HDFS



users@accumulo,


I need help understanding whether one could recover or back up tables by taking their files stored in HDFS and reattaching them to tablet servers, even though this would mean losing information from recent mutations and write-ahead logs. The documentation on recovery focuses on the failure of a tablet server, but in the event of a master failure or another situation where the tablet servers cannot be used, it would be helpful to know whether the files in HDFS can be used for recovery.


Thanks,


Patrick Lynch
 
 

Re: Recovering Tables from HDFS

Posted by Patrick Lynch <pa...@aim.com>.
Keith and Adam,


Your solution worked, thank you both for your help. Everyone on my end working with Accumulo wants to say how impressed they are with it overall!


Keep up the good work,
Patrick



-----Original Message-----
From: Keith Turner <ke...@deenlo.com>
To: user <us...@accumulo.apache.org>
Sent: Thu, Jul 5, 2012 10:30 am
Subject: Re: Recovering Tables from HDFS


Patrick,

In the comments of ACCUMULO-456, I outlined a procedure for doing
this in 1.4.

By default cloning a table will flush anything in memory.

Keith

On Thu, Jul 5, 2012 at 10:13 AM, Adam Fuchs <af...@apache.org> wrote:
> Hi Patrick,
>
> The short answer is yes, but there are a few caveats:
> 1. As you said, information that is sitting in the in-memory map and in the
> write-ahead log will not be in those files. You can periodically call flush
> (Connector.getTableOperations().flush(...)) to guarantee that your data has
> made it into the RFiles.
> 2. Old data that has been deleted may reappear. RFiles can span multiple
> tablets, which happens when tablets split. Often, one of the tablets
> compacts, getting rid of delete keys. However, the file that holds the
> original data is still in HDFS because it is referenced by another tablet
> (or because it has not yet been garbage collected). If you're using Accumulo
> in an append-only fashion, then this will not be a problem.
> 3. For the same reasons as #2, if you're doing any aggregation you might run
> into counts being incorrect.
>
> You might also check out the table cloning feature introduced in 1.4 as a
> means for backing up a table:
> http://accumulo.apache.org/1.4/user_manual/Table_Configuration.html#Cloning_Tables
>
> Cheers,
> Adam
>
>
> On Thu, Jul 5, 2012 at 9:52 AM, <pa...@aim.com> wrote:
>>
>> users@accumulo,
>>
>> I need help understanding if one could recover or backup tables by taking
>> their files stored in HDFS and reattaching them to tablet servers, even
>> though this would mean the loss of information from recent mutations and
>> write ahead logs. The documentation on recovery is focused on the failure of
>> a tablet server, but, in the event of a failure of the master or other
>> situation where the tablet servers cannot be utilized, it would be
>> beneficial to know whether the files in HDFS can be used for recovery.
>>
>> Thanks,
>>
>> Patrick Lynch
>
>

 

Re: Recovering Tables from HDFS

Posted by Keith Turner <ke...@deenlo.com>.
Patrick,

In the comments of ACCUMULO-456, I outlined a procedure for doing
this in 1.4.

By default cloning a table will flush anything in memory.

Keith

On Thu, Jul 5, 2012 at 10:13 AM, Adam Fuchs <af...@apache.org> wrote:
> Hi Patrick,
>
> The short answer is yes, but there are a few caveats:
> 1. As you said, information that is sitting in the in-memory map and in the
> write-ahead log will not be in those files. You can periodically call flush
> (Connector.getTableOperations().flush(...)) to guarantee that your data has
> made it into the RFiles.
> 2. Old data that has been deleted may reappear. RFiles can span multiple
> tablets, which happens when tablets split. Often, one of the tablets
> compacts, getting rid of delete keys. However, the file that holds the
> original data is still in HDFS because it is referenced by another tablet
> (or because it has not yet been garbage collected). If you're using Accumulo
> in an append-only fashion, then this will not be a problem.
> 3. For the same reasons as #2, if you're doing any aggregation you might run
> into counts being incorrect.
>
> You might also check out the table cloning feature introduced in 1.4 as a
> means for backing up a table:
> http://accumulo.apache.org/1.4/user_manual/Table_Configuration.html#Cloning_Tables
>
> Cheers,
> Adam
>
>
> On Thu, Jul 5, 2012 at 9:52 AM, <pa...@aim.com> wrote:
>>
>> users@accumulo,
>>
>> I need help understanding if one could recover or backup tables by taking
>> their files stored in HDFS and reattaching them to tablet servers, even
>> though this would mean the loss of information from recent mutations and
>> write ahead logs. The documentation on recovery is focused on the failure of
>> a tablet server, but, in the event of a failure of the master or other
>> situation where the tablet servers cannot be utilized, it would be
>> beneficial to know whether the files in HDFS can be used for recovery.
>>
>> Thanks,
>>
>> Patrick Lynch
>
>

Re: Recovering Tables from HDFS

Posted by pa...@aim.com.
Adam, 


Thanks for the quick response. Now that I understand the caveats, I would like to know how this would actually be done.


Patrick




-----Original Message-----
From: Adam Fuchs <af...@apache.org>
To: user <us...@accumulo.apache.org>
Sent: Thu, Jul 5, 2012 10:13 am
Subject: Re: Recovering Tables from HDFS


Hi Patrick,


The short answer is yes, but there are a few caveats:
1. As you said, information that is sitting in the in-memory map and in the write-ahead log will not be in those files. You can periodically call flush (Connector.getTableOperations().flush(...)) to guarantee that your data has made it into the RFiles.
2. Old data that has been deleted may reappear. RFiles can span multiple tablets, which happens when tablets split. Often, one of the tablets compacts, getting rid of delete keys. However, the file that holds the original data is still in HDFS because it is referenced by another tablet (or because it has not yet been garbage collected). If you're using Accumulo in an append-only fashion, then this will not be a problem.
3. For the same reasons as #2, if you're doing any aggregation you might run into counts being incorrect.


You might also check out the table cloning feature introduced in 1.4 as a means for backing up a table: http://accumulo.apache.org/1.4/user_manual/Table_Configuration.html#Cloning_Tables


Cheers,
Adam



On Thu, Jul 5, 2012 at 9:52 AM,  <pa...@aim.com> wrote:



users@accumulo,


I need help understanding if one could recover or backup tables by taking their files stored in HDFS and reattaching them to tablet servers, even though this would mean the loss of information from recent mutations and write ahead logs. The documentation on recovery is focused on the failure of a tablet server, but, in the event of a failure of the master or other situation where the tablet servers cannot be utilized, it would be beneficial to know whether the files in HDFS can be used for recovery.


Thanks,


Patrick Lynch
 
 



 


Re: Recovering Tables from HDFS

Posted by Adam Fuchs <af...@apache.org>.
Hi Patrick,

The short answer is yes, but there are a few caveats:
1. As you said, information that is sitting in the in-memory map and in the
write-ahead log will not be in those files. You can periodically call flush
(Connector.getTableOperations().flush(...)) to guarantee that your data has
made it into the RFiles.
2. Old data that has been deleted may reappear. RFiles can span multiple
tablets, which happens when tablets split. Often, one of the tablets
compacts, getting rid of delete keys. However, the file that holds the
original data is still in HDFS because it is referenced by another tablet
(or because it has not yet been garbage collected). If you're using
Accumulo in an append-only fashion, then this will not be a problem.
3. For the same reasons as #2, if you're doing any aggregation you might
run into counts being incorrect.
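[Editor's note: the two caveats above can be illustrated with a toy model.
The following Python sketch is invented for illustration only; the names
and structures are not the Accumulo API. It models a tablet's read path as
a merge of files plus an in-memory map, with a "DELETE" sentinel standing
in for a delete tombstone.]

```python
def merged_view(files, memory):
    """Toy read path: merge file batches and the in-memory map.
    Later batches win; a "DELETE" value suppresses the key."""
    view = {}
    for batch in files + [memory]:   # oldest first, memory newest
        view.update(batch)
    return {k: v for k, v in view.items() if v != "DELETE"}

# A tablet wrote f0 and then split; both children still reference f0.
# Child tablet A later deleted "row1" and major-compacted, so its newest
# file (fA_new) contains neither "row1" nor the tombstone -- but f0
# survives in HDFS because tablet B still references it (or the garbage
# collector has not yet removed it).
f0       = {"row1": "v1", "row9": "v9"}   # pre-split shared file
fA_new   = {"row2": "v2"}                 # tablet A after compaction
memory_A = {"row3": "v3"}                 # in-memory map, not yet flushed

# The live tablet A reads correctly: compaction dropped row1 for good.
live = merged_view([fA_new], memory_A)
assert "row1" not in live

# Recovery from raw HDFS files grabs every file, but memory and the
# write-ahead log are gone.
recovered = merged_view([f0, fA_new], {})
assert "row3" not in recovered            # caveat 1: unflushed data lost
assert recovered["row1"] == "v1"          # caveat 2: deleted data reappears
```

Caveat 3 follows from the same mechanism: if a combiner/aggregator had already folded values from f0 into a compacted result, re-reading f0 alongside that result would count the old values twice.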

You might also check out the table cloning feature introduced in 1.4 as a
means for backing up a table:
http://accumulo.apache.org/1.4/user_manual/Table_Configuration.html#Cloning_Tables
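[Editor's note: the cloning behavior described above, including Keith's
point that cloning flushes memory by default, can be sketched with the
same kind of toy model. This is invented illustration, not the Accumulo
API; the real operation copies file *references* in the metadata table
rather than rewriting data, which is why it is cheap.]

```python
class ToyTable:
    """Toy table: a list of immutable file batches plus an in-memory map."""
    def __init__(self):
        self.files = []      # oldest first
        self.memory = {}

    def write(self, key, value):
        self.memory[key] = value

    def flush(self):
        # Persist the in-memory map as a new file batch.
        if self.memory:
            self.files.append(dict(self.memory))
            self.memory.clear()

    def clone(self):
        # Cloning flushes first, then snapshots references to the
        # existing files; no data is copied or rewritten.
        self.flush()
        snapshot = ToyTable()
        snapshot.files = list(self.files)
        return snapshot

    def read(self):
        view = {}
        for batch in self.files + [self.memory]:
            view.update(batch)
        return {k: v for k, v in view.items() if v != "DELETE"}

t = ToyTable()
t.write("row1", "v1")
backup = t.clone()           # row1 was only in memory; the flush captured it
t.write("row1", "DELETE")    # the source table keeps changing
assert backup.read() == {"row1": "v1"}   # the clone is unaffected
assert t.read() == {}
```

Because the clone shares files with the source until either side compacts, it serves as a consistent point-in-time backup without the delete-resurrection caveat that raw HDFS file recovery has.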

Cheers,
Adam


On Thu, Jul 5, 2012 at 9:52 AM, <pa...@aim.com> wrote:

>    users@accumulo,
>
>  I need help understanding if one could recover or backup tables by
> taking their files stored in HDFS and reattaching them to tablet servers,
> even though this would mean the loss of information from recent mutations
> and write ahead logs. The documentation on recovery is focused on the
> failure of a tablet server, but, in the event of a failure of the master or
> other situation where the tablet servers cannot be utilized, it would be
> beneficial to know whether the files in HDFS can be used for recovery.
>
>  Thanks,
>
>  Patrick Lynch
>