Posted to dev@hbase.apache.org by "Gary Helmling (JIRA)" <ji...@apache.org> on 2009/10/30 23:18:01 UTC

[jira] Created: (HBASE-1949) KeyValue expiration by Time-to-Live during major compaction is broken

KeyValue expiration by Time-to-Live during major compaction is broken
---------------------------------------------------------------------

                 Key: HBASE-1949
                 URL: https://issues.apache.org/jira/browse/HBASE-1949
             Project: Hadoop HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 0.20.1
            Reporter: Gary Helmling


During a major compaction on a region in a column family with a configured TTL, it looks like all KeyValues in a row after the first expired KeyValue are skipped and thrown out of the newly written file (regardless of whether they would have been expired or not).

The StoreScanner is skipping to the next row, even when other columns with non-expired timestamps exist.  Unless I'm misunderstanding it, it seems like it should just seek to the next column instead.  I discovered this when altering a table to lower the TTL for a column family and force the expiration of some data, which led to the entire row being expired in some instances.
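
The difference between the two behaviors can be illustrated with a small stand-alone model.  This is only a sketch and not HBase code: the Cell class, the Code enum and the compactRow method are hypothetical names invented for illustration, and the model assumes the cells of a row are sorted by column, newest timestamp first within each column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TtlCompactionSketch {
    /** Simplified stand-in for a KeyValue within a single row (hypothetical). */
    static final class Cell {
        final String column;
        final long timestamp;
        Cell(String column, long timestamp) {
            this.column = column;
            this.timestamp = timestamp;
        }
    }

    /** Hypothetical match codes for this sketch; not the real HBase enum. */
    enum Code { INCLUDE, NEXT_COLUMN, NEXT_ROW }

    /** Code for one cell; onExpired is what the matcher answers for an expired cell. */
    static Code match(Cell cell, long oldestLiveTimestamp, Code onExpired) {
        return cell.timestamp < oldestLiveTimestamp ? onExpired : Code.INCLUDE;
    }

    /** Returns the cells of one row that survive, as "column@timestamp" strings. */
    static List<String> compactRow(List<Cell> row, long oldestLiveTimestamp, Code onExpired) {
        List<String> kept = new ArrayList<>();
        String skippedColumn = null;
        for (Cell cell : row) {
            if (cell.column.equals(skippedColumn)) {
                continue; // older versions of a column we already moved past
            }
            Code code = match(cell, oldestLiveTimestamp, onExpired);
            if (code == Code.NEXT_ROW) {
                break; // abandons every remaining cell in the row
            }
            if (code == Code.NEXT_COLUMN) {
                skippedColumn = cell.column; // older versions of this column are expired too
                continue;
            }
            kept.add(cell.column + "@" + cell.timestamp);
        }
        return kept;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        long ttlMillis = 1_000L;
        List<Cell> row = Arrays.asList(
            new Cell("a", now - 10),
            new Cell("b", now - 500),
            new Cell("b", now - 5_000), // the only expired cell
            new Cell("c", now - 20));
        // Skipping to the next row: column c is lost although it is not expired.
        System.out.println(compactRow(row, now - ttlMillis, Code.NEXT_ROW));
        // Seeking to the next column: only the expired version of b is dropped.
        System.out.println(compactRow(row, now - ttlMillis, Code.NEXT_COLUMN));
    }
}
```

With the sample row, the first call keeps only a@99990 and b@99500, while the second also keeps c@99980.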


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-1949) KeyValue expiration by Time-to-Live during major compaction is broken

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Helmling updated HBASE-1949:
---------------------------------

    Attachment: HBASE-1949-trunk.patch

Same patch for ScanQueryMatcher.match() return value and extra unit test, applied against current trunk.



[jira] Updated: (HBASE-1949) KeyValue expiration by Time-to-Live during major compaction is broken

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Helmling updated HBASE-1949:
---------------------------------

    Fix Version/s: 0.21.0
                   0.20.2
     Release Note: Fixed expiration of individual column values within rows via the column family time-to-live configuration.  Previously, all column values following the first expired value in a row would be truncated.  In practice, this might only be seen when lowering the TTL configuration on a column family with existing data.
           Status: Patch Available  (was: Open)

I've been running the "v2" patch (applied against the 0.20.1 release) in my development setup for a couple of days with correct expiration behavior.  I should have the patched version deployed and tested against my live data early next week, at which point I should be able to completely verify the fix.

This could definitely use a good review by someone more familiar with the compaction process.  The actual code changes are very minor and the new and existing tests all pass.  But the changes eliminate early exits from the KeyValue iteration on rows in two places, so it would be good to assess any performance impact from the change.



[jira] Updated: (HBASE-1949) KeyValue expiration by Time-to-Live during major compaction is broken

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Helmling updated HBASE-1949:
---------------------------------

    Attachment:     (was: ttl_expire-0.20.patch)



[jira] Updated: (HBASE-1949) KeyValue expiration by Time-to-Live during major compaction is broken

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Helmling updated HBASE-1949:
---------------------------------

    Attachment:     (was: HBASE-0.20.patch)



[jira] Assigned: (HBASE-1949) KeyValue expiration by Time-to-Live during major compaction is broken

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray reassigned HBASE-1949:
------------------------------------

    Assignee: Gary Helmling



[jira] Updated: (HBASE-1949) KeyValue expiration by Time-to-Live during major compaction is broken

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1949:
-------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Thanks for the patch, Gary.  Applied to branch and trunk.



[jira] Updated: (HBASE-1949) KeyValue expiration by Time-to-Live during major compaction is broken

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Helmling updated HBASE-1949:
---------------------------------

    Attachment: HBASE-1949-0.20.patch

Okay, one more time, actually including the issue number in the filename.  JIRA really needs a "rename file" feature.



[jira] Commented: (HBASE-1949) KeyValue expiration by Time-to-Live during major compaction is broken

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774034#action_12774034 ] 

Jonathan Gray commented on HBASE-1949:
--------------------------------------

Reviewed patch.  Looks great.  Thanks Gary.

+1 for commit



[jira] Updated: (HBASE-1949) KeyValue expiration by Time-to-Live during major compaction is broken

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Helmling updated HBASE-1949:
---------------------------------

    Attachment: HBASE-1949-v2-trunk.patch

This is an updated version of the patch against trunk, which adds the fix for QueryMatcher.match() and a couple of extra tests in TestQueryMatcher to confirm the fix.



[jira] Updated: (HBASE-1949) KeyValue expiration by Time-to-Live during major compaction is broken

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Helmling updated HBASE-1949:
---------------------------------

    Attachment: HBASE-0.20.patch

This patch is the same as the previous ttl_expire-0.20.patch, just updated to the current hbase-0.20 branch and renamed for consistency.  I'll remove the other to avoid confusion.



[jira] Updated: (HBASE-1949) KeyValue expiration by Time-to-Live during major compaction is broken

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Helmling updated HBASE-1949:
---------------------------------

    Attachment: HBASE-1949-v2-0.20.patch

The previous version of this patch missed an additional case in QueryMatcher.match() -- called from ScanFileGetScan -- which would exit early on a row for get requests when the first expired KeyValue was encountered.  This would not actually remove data (like the previous occurrence) but would mask existing data from the client.

This version adds a change for the QueryMatcher.match() instance to return MatchCode.SKIP instead of MatchCode.NEXT in order to keep processing any following kvs.
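
As a rough illustration of why the return value matters on the get path, here is a minimal stand-alone model.  It is a sketch under stated assumptions, not the patched code: the MatchCode enum below only borrows the names SKIP and NEXT from the comment above, and the get method and its list of timestamps are hypothetical simplifications of a row's KeyValues in scan order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GetMatchSketch {
    /** Stand-in enum; only the names SKIP and NEXT come from the comment above. */
    enum MatchCode { INCLUDE, SKIP, NEXT }

    /**
     * Reads one row, given the timestamps of its kvs in scan order (hypothetical model).
     * onExpired is the code the matcher answers for a kv older than the TTL cutoff.
     */
    static List<Long> get(List<Long> timestamps, long oldestLiveTimestamp, MatchCode onExpired) {
        List<Long> result = new ArrayList<>();
        for (long ts : timestamps) {
            MatchCode code = ts < oldestLiveTimestamp ? onExpired : MatchCode.INCLUDE;
            if (code == MatchCode.NEXT) {
                break; // early exit: the following kvs are never examined
            }
            if (code == MatchCode.SKIP) {
                continue; // drop this kv only and keep processing
            }
            result.add(ts);
        }
        return result;
    }

    public static void main(String[] args) {
        // The middle kv is past its TTL; the last one is still live.
        List<Long> stored = Arrays.asList(990L, 100L, 980L);
        System.out.println(get(stored, 500L, MatchCode.NEXT)); // [990]
        System.out.println(get(stored, 500L, MatchCode.SKIP)); // [990, 980]
        System.out.println(stored.size());                     // 3, nothing was removed
    }
}
```

In this model the stored list is never modified, so the live kv after the expired one is only hidden from the read result, not deleted.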



[jira] Updated: (HBASE-1949) KeyValue expiration by Time-to-Live during major compaction is broken

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Helmling updated HBASE-1949:
---------------------------------

    Attachment: ttl_expire-0.20.patch

The attached patch (against the 0.20 branch) changes the ScanQueryMatcher.match() return value in this case to just seek to the next column and adds a test for this case.
