Posted to issues@hbase.apache.org by "Liyin Tang (JIRA)" <ji...@apache.org> on 2011/03/14 19:33:29 UTC
[jira] Created: (HBASE-3636) a bug about deciding whether this key
is a new key for the ROWCOL bloomfilter
a bug about deciding whether this key is a new key for the ROWCOL bloomfilter
-----------------------------------------------------------------------------
Key: HBASE-3636
URL: https://issues.apache.org/jira/browse/HBASE-3636
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: Liyin Tang
When the ROWCOL bloom filter needs to decide whether a key is new,
it calls the matchingRowColumn function, which compares the timestamp offset of this kv with that of the last kv.
But when checking the timestamp offset, it does not subtract the starting offset of the KeyValue itself.
For example, suppose two KeyValue objects have the same row key and column key but come from different storefiles. It is highly likely that these two KeyValue objects have different offset values, so their timestamp offsets are totally different. They are then regarded as new keys and added to the bloom filter.
So after compaction, the key count of the bloom filter increases immediately, to almost the number of entries.
The solution is straightforward: compare the relative timestamp offset, which is the timestamp offset - key_value offset.
This may also explain this jira: https://issues.apache.org/jira/browse/HBASE-3007
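To illustrate the description above, here is a minimal sketch in Java. It is not the HBase code and not the attached patch: the Cell class, its simplified key layout (row/column bytes followed by an 8-byte timestamp), and the names cellAt, matchesAbsolute, matchesRelative and countNewKeys are all hypothetical. They only show why comparing absolute timestamp offsets treats identical row/column keys as different when the cells sit at different offsets in their buffers.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

class RowColMatchSketch {

    /** Hypothetical simplified cell: row/column bytes followed by an 8-byte timestamp. */
    static final class Cell {
        final byte[] buf;     // backing buffer; the cell does not have to start at 0
        final int offset;     // where this cell starts inside buf
        final int rowColLen;  // length of the row/column part of the key

        Cell(byte[] buf, int offset, int rowColLen) {
            this.buf = buf;
            this.offset = offset;
            this.rowColLen = rowColLen;
        }

        /** Absolute position of the timestamp inside buf. */
        int timestampOffset() {
            return offset + rowColLen;
        }
    }

    /** Builds a cell whose row/column key starts at the given offset of a fresh buffer. */
    static Cell cellAt(int offset, String rowCol) {
        byte[] key = rowCol.getBytes(StandardCharsets.UTF_8);
        byte[] buf = new byte[offset + key.length + 8];
        System.arraycopy(key, 0, buf, offset, key.length);
        return new Cell(buf, offset, key.length);
    }

    static boolean sameRowColBytes(Cell a, Cell b) {
        return Arrays.equals(
            Arrays.copyOfRange(a.buf, a.offset, a.offset + a.rowColLen),
            Arrays.copyOfRange(b.buf, b.offset, b.offset + b.rowColLen));
    }

    /** Buggy check: compares absolute timestamp offsets. */
    static boolean matchesAbsolute(Cell a, Cell b) {
        return a.timestampOffset() == b.timestampOffset() && sameRowColBytes(a, b);
    }

    /** Fixed check: subtracts each cell's own offset before comparing. */
    static boolean matchesRelative(Cell a, Cell b) {
        int aRel = a.timestampOffset() - a.offset;
        int bRel = b.timestampOffset() - b.offset;
        return aRel == bRel && sameRowColBytes(a, b);
    }

    /** Counts the cells that would be added as new keys, given a match function. */
    static int countNewKeys(List<Cell> sorted, BiPredicate<Cell, Cell> matches) {
        int count = 0;
        Cell last = null;
        for (Cell c : sorted) {
            if (last == null || !matches.test(last, c)) {
                count++;
            }
            last = c;
        }
        return count;
    }

    public static void main(String[] args) {
        // Same row/column key at three different offsets, as if read from different storefiles.
        List<Cell> cells = Arrays.asList(
            cellAt(0, "row1/colA"), cellAt(37, "row1/colA"), cellAt(5, "row1/colA"));
        System.out.println(countNewKeys(cells, RowColMatchSketch::matchesAbsolute)); // 3
        System.out.println(countNewKeys(cells, RowColMatchSketch::matchesRelative)); // 1
    }
}
```

With the absolute comparison every entry is counted as a new key, which mirrors the reported symptom of the key count approaching the number of entries. With the relative comparison the three entries collapse to one key.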
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3636) a bug about deciding whether this
key is a new key for the ROWCOL bloomfilter
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006580#comment-13006580 ]
stack commented on HBASE-3636:
------------------------------
Do you have a patch, please, Liyin Tang? Thanks.
[jira] Updated: (HBASE-3636) a bug about deciding whether this key
is a new key for the ROWCOL bloomfilter
Posted by "Liyin Tang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Liyin Tang updated HBASE-3636:
------------------------------
Attachment: HBASE_3636[r1081520].patch
This patch fixes the problem by changing the compareRowColumn function.
It also adds a unit test to verify the fix.
Please review.
[jira] Commented: (HBASE-3636) a bug about deciding whether this
key is a new key for the ROWCOL bloomfilter
Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006633#comment-13006633 ]
Nicolas Spiegelberg commented on HBASE-3636:
--------------------------------------------
+1. Good job, Liyin!
[jira] Commented: (HBASE-3636) a bug about deciding whether this
key is a new key for the ROWCOL bloomfilter
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007868#comment-13007868 ]
Hudson commented on HBASE-3636:
-------------------------------
Integrated in HBase-TRUNK #1792 (See [https://hudson.apache.org/hudson/job/HBase-TRUNK/1792/])
[jira] Resolved: (HBASE-3636) a bug about deciding whether this key
is a new key for the ROWCOL bloomfilter
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack resolved HBASE-3636.
--------------------------
Resolution: Fixed
Fix Version/s: 0.90.2
Hadoop Flags: [Reviewed]
Committed to branch and trunk. Thank you for the patch Liyin.
[jira] Updated: (HBASE-3636) a bug about deciding whether this key
is a new key for the ROWCOL bloomfilter
Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicolas Spiegelberg updated HBASE-3636:
---------------------------------------
Comment: was deleted
(was: duplicate of HBASE-3)
[jira] Closed: (HBASE-3636) a bug about deciding whether this key
is a new key for the ROWCOL bloomfilter
Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicolas Spiegelberg closed HBASE-3636.
--------------------------------------
duplicate of HBASE-3