Posted to notifications@accumulo.apache.org by "Keith Turner (JIRA)" <ji...@apache.org> on 2013/01/08 15:22:12 UTC
[jira] [Commented] (ACCUMULO-942) accumulo should be more resilient in the face of NN failures
[ https://issues.apache.org/jira/browse/ACCUMULO-942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546912#comment-13546912 ]
Keith Turner commented on ACCUMULO-942:
---------------------------------------
Tables can be exported in 1.5, so a user could do the following:
* Clone tables (to snapshot them)
* Export tables (this basically exports the metadata table entries related to each table)
* Copy the export metadata and the HDFS snapshot metadata for backup.
After doing this, a user should have a consistent snapshot of Accumulo and HDFS metadata.
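The clone and export steps above could be scripted roughly as follows. This is only a sketch: it builds Accumulo shell command strings (clonetable, offline, exporttable) and does not run them. The function name, the "_bak" suffix, and the export directory layout are made up for illustration, and taking the clone offline before exporting is an assumption based on the 1.5 export feature requiring an offline table. Copying the export output and the HDFS snapshot metadata is left to the operator.

```python
def backup_commands(tables, export_root, suffix="_bak"):
    """Return Accumulo shell commands that clone, offline, and export each table.

    Hypothetical helper: the suffix and directory layout are illustrative only.
    """
    commands = []
    root = export_root.rstrip("/")
    for table in tables:
        clone = table + suffix
        export_dir = root + "/" + clone
        # Clone the table so the live table stays online (the "snapshot" step).
        commands.append("clonetable {} {}".format(table, clone))
        # Assumption: the table being exported must be offline.
        commands.append("offline {}".format(clone))
        # Export the clone's metadata into a per-table directory.
        commands.append("exporttable -t {} {}".format(clone, export_dir))
    return commands


if __name__ == "__main__":
    # Example table names and path are placeholders.
    for line in backup_commands(["table1", "table2"], "/backup/export"):
        print(line)
```

The printed lines could be pasted into the Accumulo shell or fed to it from a file; the copy of the export directories would happen afterwards.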
> accumulo should be more resilient in the face of NN failures
> ------------------------------------------------------------
>
> Key: ACCUMULO-942
> URL: https://issues.apache.org/jira/browse/ACCUMULO-942
> Project: Accumulo
> Issue Type: Bug
> Components: tserver
> Reporter: Eric Newton
> Assignee: Eric Newton
> Priority: Critical
>
> We experienced a NN failure on a large cluster. The edit log was written to a RAIDed file system, but data sent to the edit log was still lost. We suspect the drivers made promises they did not keep.
> This left Accumulo in a slightly corrupt state: a few references to files that were missing.
> Also, we have attempted to have backup images of HDFS archived for disaster recovery. This has not been helpful because Accumulo needs a highly consistent set of metadata, and a slightly older version of the file system confuses it.
> One defense is to use snapshots. However, this works at the table level, and it is hard to coordinate with the HDFS snapshot.
> Another approach is to leave a short history of the files in the !METADATA table. The Google paper hints at keeping historical information:
> {quote}
> We also store secondary information in the METADATA table, including a log of all events pertaining to each tablet (such as when a server begins serving it). This information is helpful for debugging and performance analysis.
> {quote}
> I think it would also be helpful for disaster recovery. It may require the GC to be more sensitive to historical information about compactions.
> Alternatively, we should start looking into high-availability NNs and bookkeeper high-performance logging.