Posted to issues@hbase.apache.org by "Jonathan Hsieh (JIRA)" <ji...@apache.org> on 2012/12/12 21:04:20 UTC

[jira] [Commented] (HBASE-7339) Splitting a hfilelink causes region servers to go down.

    [ https://issues.apache.org/jira/browse/HBASE-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530267#comment-13530267 ] 

Jonathan Hsieh commented on HBASE-7339:
---------------------------------------

This was encountered when testing online snapshots, but will affect offline snapshots as well.

Suggested solutions:
1) Make opening the hfile-link daughter reference more robust by falling back to treating the file as a reference if treating it as a link fails.  Hacky but "should" work.
2) Make the regexes used to tell references and hfilelinks apart stricter so that we can differentiate them. Hacky, not sure if it will work.
3) Change the daughter reference file name to be more robust.  Currently '<hfile>.<parentregion>', maybe change to '<hfile>@<parentregion>'. This would then allow '<hfile>-<region>-<table>@<parentregion>' to be interpreted correctly.  This is the "right way" but breaks compatibility.
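
A rough sketch of how option 3 could parse names, in Python for brevity. The LINK pattern and the parse_name helper are hypothetical and simplified; they are not HBase's actual regexes or API. The sketch assumes '@' can never appear in an hfile, region, or table name.

```python
import re

# Assumed, simplified link pattern (not HBase's real regex):
# <hfile>-<region>-<table>, where table names may contain '.'
LINK = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)-([A-Za-z0-9_][A-Za-z0-9_.\-]*)$")


def parse_name(name):
    """Classify a store file name under the proposed '@' naming scheme."""
    # The last '@' separates the parent region from whatever is referenced.
    target, sep, parent = name.rpartition("@")
    if not sep:
        # No '@' present: this is not a daughter reference.
        target, parent = name, None
    m = LINK.match(target)
    return {
        "is_reference": parent is not None,
        "parent_region": parent,
        "is_link": m is not None,
        "link": m.groups() if m else None,  # (hfile, region, table)
    }


# Example names below are made up for illustration.
split_link = parse_name("abc123-def456-mytable@0123parent")
plain_link = parse_name("abc123-def456-my.table")
plain_ref = parse_name("abc123@0123parent")
```

Because the separator is outside the assumed table-name alphabet, a table name containing '.' no longer collides with the reference suffix.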

Other follow-ons -- ideally we would be more robust by quarantining a bad region or its hfiles/link files once it has killed a few nodes in the cluster.
                
> Splitting a hfilelink causes region servers to go down.
> -------------------------------------------------------
>
>                 Key: HBASE-7339
>                 URL: https://issues.apache.org/jira/browse/HBASE-7339
>             Project: HBase
>          Issue Type: Sub-task
>          Components: snapshots
>    Affects Versions: hbase-6055
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>            Priority: Blocker
>             Fix For: hbase-6055
>
>
> Steps:
> - Have a single-region table with 15 hfiles in it.
> - Snapshot it.
> - Clone the snapshot.
> - region post-open task attempts to compact region.  policy does not compact all files. (default seems to be 10)
> - after compaction we have hfile links and real hfiles mixed.
> - it starts splitting
> - creating split references, opening daughters fails
> - hfile links are "split", creating hfile link daughter refs.  <<hfile>-<region>-<table>>.<parentregion>
> - these "split" hfile links are interpreted as hfile links with table <table>.<parentregion>
> - Since this is after the splitting PONR, this aborts the server.  It then spreads to the next server.
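
The misinterpretation in the steps above can be reproduced with a small Python sketch. The pattern here is a simplified stand-in, assuming table names may contain '.'; it is not the regex HBase actually uses, and the file names are invented.

```python
import re

# Hypothetical, simplified hfile link pattern: <hfile>-<region>-<table>.
# Assumption: '.' is legal inside a table name.
LINK = re.compile(
    r"^(?P<hfile>[0-9a-f]+)-(?P<region>[0-9a-f]+)-"
    r"(?P<table>[A-Za-z0-9_][A-Za-z0-9_.\-]*)$"
)


def parse_link(name):
    """Return the link fields as a dict, or None if the name is not a link."""
    m = LINK.match(name)
    return m.groupdict() if m else None


link = "abc123-def456-mytable"      # made-up hfile link name
parent_region = "0123parent"        # made-up parent region name

# Splitting appends '.<parentregion>' to build the daughter reference name.
daughter_ref = link + "." + parent_region

# The daughter reference still matches the link pattern, and the parent
# region ends up folded into the table name.
parsed = parse_link(daughter_ref)
print(parsed["table"])
```

Under these assumptions the daughter reference is indistinguishable from a link to a table named '<table>.<parentregion>', which is why the lookup of the linked file fails.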
