Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2009/02/23 23:32:02 UTC

[jira] Created: (HBASE-1212) merge tool expects regions all have different sequence ids

merge tool expects regions all have different sequence ids
----------------------------------------------------------

                 Key: HBASE-1212
                 URL: https://issues.apache.org/jira/browse/HBASE-1212
             Project: Hadoop HBase
          Issue Type: Bug
            Reporter: stack


Currently, when merging two regions, the merge tool compares their sequence ids. If they are the same, it decrements one. It needs to do this because, on region open, store files are keyed by their sequence id; if two are the same, one will erase the other.
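A minimal sketch of why duplicate sequence ids are a problem, assuming (for illustration only) that region open indexes store files in a `TreeMap` keyed by sequence id. The `SeqIdCollision` class and the file names are hypothetical, not HBase's actual code. The second `put` with an equal key silently replaces the first file; decrementing one id first, as the merge tool does, keeps both.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: not the actual HBase store-opening code.
public class SeqIdCollision {

    // Index store files by sequence id, as region open might.
    static TreeMap<Long, String> indexBySeqId(long[] seqIds, String[] files) {
        TreeMap<Long, String> byId = new TreeMap<>();
        for (int i = 0; i < seqIds.length; i++) {
            // put() replaces any existing entry with an equal key
            byId.put(seqIds[i], files[i]);
        }
        return byId;
    }

    public static void main(String[] args) {
        String[] files = {"regionA/fileA", "regionB/fileB"};

        // Same sequence id: only one file survives.
        Map<Long, String> collided = indexBySeqId(new long[] {42L, 42L}, files);
        System.out.println(collided); // {42=regionB/fileB}

        // The merge tool's workaround: decrement one of the ids first.
        Map<Long, String> kept = indexBySeqId(new long[] {41L, 42L}, files);
        System.out.println(kept); // {41=regionA/fileA, 42=regionB/fileB}
    }
}
```

With the aggregating hfile format, the second call is exactly the step that is no longer possible without rewriting the file.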

Well, with the move to the aggregating hfile format, the sequence id is written when the file is created, and it's no longer written into an aside file but as metadata at the end of the file itself. Changing the sequence id is no longer an option.

This issue is about figuring out a solution for the rare case where two store files have the same sequence id AND we want to merge their two regions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-1212) merge tool expects regions all have different sequence ids

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1212:
-------------------------

    Fix Version/s: 0.20.0

Moving to 0.20.0



[jira] Updated: (HBASE-1212) merge tool expects regions all have different sequence ids

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1212:
-------------------------

    Fix Version/s:     (was: 0.20.0)

Thinking about it, this event should be extremely rare. Sequence ids increase monotonically within a running regionserver. Across a cluster, two files of the same family would have to end with the same sequence id. Then what's the likelihood that, of all the regions on the cluster, these are the two to be merged? (Merge is a little-used tool to date.)
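To see how a tie can arise at all, here is a minimal sketch that models each regionserver's sequence id as an independent, monotonically increasing counter. The `RegionServerLog` class is hypothetical and simplified (for example, it starts every counter at zero). Ids are strictly increasing within one server, but nothing coordinates them across servers, so two servers that have each logged the same number of edits end at the same id.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model: one sequence id counter per regionserver.
public class PerServerSeqIds {

    static class RegionServerLog {
        private final AtomicLong seqId = new AtomicLong(0);

        // Each logged edit gets the next id; ids never repeat on this server.
        long append() {
            return seqId.incrementAndGet();
        }
    }

    // Log the given number of edits and return the last id assigned,
    // i.e. the id a store file flushed at that point would end with.
    static long flushAfter(RegionServerLog log, int edits) {
        long last = 0;
        for (int i = 0; i < edits; i++) {
            last = log.append();
        }
        return last;
    }

    public static void main(String[] args) {
        long a = flushAfter(new RegionServerLog(), 1000);
        long b = flushAfter(new RegionServerLog(), 1000);
        // Independent counters can coincide across the cluster.
        System.out.println(a == b); // true
    }
}
```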

To fix this, we would need to look at the content of the two files and judge which should come before the other, i.e. which has the most recent edits. Maybe we could do something basic, like letting the larger file prevail over the smaller. Once we'd figured out which file to bring to the fore, we would need to rewrite the hfile so we can change its sequence id. Since we're rewriting at least one of the files anyway, we might as well compact them.
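The size-based tie-break suggested above could look roughly like this. It is a sketch only: `StoreFileInfo` is a hypothetical holder class, "the larger file is newer" is the heuristic from the comment rather than a guarantee, and the actual hfile rewrite or compaction is out of scope. The code only decides the order and picks a sequence id a compacted output could carry (assumed here to be the maximum of its inputs).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the size-based tie-break; not HBase code.
public class SizeTieBreak {

    static final class StoreFileInfo {
        final String name;
        final long seqId;
        final long sizeBytes;

        StoreFileInfo(String name, long seqId, long sizeBytes) {
            this.name = name;
            this.seqId = seqId;
            this.sizeBytes = sizeBytes;
        }
    }

    // Oldest first: by sequence id, and on a tie treat the larger file
    // as the newer one (a heuristic, not a guarantee).
    static final Comparator<StoreFileInfo> OLDEST_FIRST =
        Comparator.comparingLong((StoreFileInfo f) -> f.seqId)
                  .thenComparingLong(f -> f.sizeBytes);

    static List<StoreFileInfo> order(List<StoreFileInfo> files) {
        List<StoreFileInfo> sorted = new ArrayList<>(files);
        sorted.sort(OLDEST_FIRST);
        return sorted;
    }

    // A file compacted from all inputs holds all their edits, so it can
    // be stamped with the largest input sequence id.
    static long seqIdForCompacted(List<StoreFileInfo> files) {
        long max = Long.MIN_VALUE;
        for (StoreFileInfo f : files) {
            max = Math.max(max, f.seqId);
        }
        return max;
    }

    public static void main(String[] args) {
        List<StoreFileInfo> files = List.of(
            new StoreFileInfo("regionA/big", 42L, 64_000_000L),
            new StoreFileInfo("regionB/small", 42L, 1_000_000L));
        for (StoreFileInfo f : order(files)) {
            System.out.println(f.name); // regionB/small, then regionA/big
        }
        System.out.println(seqIdForCompacted(files)); // 42
    }
}
```

Compacting sidesteps the ordering question for the output file itself; the comparator only matters for deciding which edits win while merging the two inputs.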

We could move to modification times. That should simplify this sequence id story, but it wouldn't remove this issue: we'd still have to figure out which store file to favor if two happened to have the same mod time.
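A sketch of the modification-time alternative, using hypothetical (name, mod time) pairs passed in directly rather than read from a filesystem. Ordering by mod time is a one-line comparator, but, as the comment notes, equal mod times leave the order undecided, so the sketch also reports whether such a tie exists.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; mod times are supplied, not read from a filesystem.
public class ModTimeOrdering {

    static final class FileStamp {
        final String name;
        final long modTimeMillis;

        FileStamp(String name, long modTimeMillis) {
            this.name = name;
            this.modTimeMillis = modTimeMillis;
        }
    }

    static final Comparator<FileStamp> BY_MOD_TIME =
        Comparator.comparingLong((FileStamp f) -> f.modTimeMillis);

    // True if two files share a mod time, i.e. the ordering is ambiguous
    // and some other rule must decide which file to favor.
    static boolean hasTie(List<FileStamp> files) {
        Set<Long> seen = new HashSet<>();
        for (FileStamp f : files) {
            if (!seen.add(f.modTimeMillis)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<FileStamp> files = new ArrayList<>(List.of(
            new FileStamp("regionA/f1", 1_235_000_000_000L),
            new FileStamp("regionB/f2", 1_235_000_000_000L)));
        // List.sort is stable, so tied files simply keep their input order;
        // that is an accident of the input, not a decision about recency.
        files.sort(BY_MOD_TIME);
        System.out.println(hasTie(files)); // true
    }
}
```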

In Bigtable, Chubby owns the storefiles/sstables. Maybe that's where we should go, so we don't have sequence ids anymore?

Moving out of 0.20.0 because this issue is rare and the amount of work to address it is large.


[jira] Commented: (HBASE-1212) merge tool expects regions all have different sequence ids

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701645#action_12701645 ] 

stack commented on HBASE-1212:
------------------------------

I like jgray's idea of using modtime instead of an edit number.


[jira] Commented: (HBASE-1212) merge tool expects regions all have different sequence ids

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688552#action_12688552 ] 

Nitay Joffe commented on HBASE-1212:
------------------------------------

As I noted on HBASE-1274, we can try using the file names to do the sequencing:

bq. What about using something that sorts lexicographically, and sorting the dir.listFiles we use when grabbing the store files? Then, when we move around two files with the same sequence id to do a merge, we can rename the actual files.
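One way to read the file-name suggestion, sketched under assumptions: the name format below (a zero-padded sequence id plus a suffix) is hypothetical, not what HBase actually uses. Zero-padding makes plain lexicographic order of a directory listing agree with numeric order, and renaming one of two files that share an id, by giving it a suffix that sorts first, puts the tie in a definite order without touching the hfile's metadata.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical naming scheme; not HBase's actual store file names.
public class LexicographicNames {

    // Zero-pad to 19 digits (enough for any non-negative long) so that
    // string order matches numeric order.
    static String nameFor(long seqId, String suffix) {
        return String.format("%019d.%s", seqId, suffix);
    }

    // Sort a directory listing by name, as a store could do with the
    // entries returned when listing the family directory.
    static List<String> sortedListing(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        // Unpadded, "9" would sort after "10"; padded names do not.
        System.out.println(sortedListing(List.of(nameFor(10, "a"), nameFor(9, "a"))));

        // Two merged files with the same id: the one renamed with suffix
        // "a" sorts first, so its edits are treated as older.
        System.out.println(sortedListing(List.of(nameFor(42, "b"), nameFor(42, "a"))));
    }
}
```

The rename is cheap compared with rewriting an hfile, which is the appeal of this approach; the open question from the thread, deciding which of the two files should sort first, remains.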
