Posted to dev@hbase.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2010/02/25 07:48:27 UTC

[jira] Created: (HBASE-2265) HFile and Memstore should maintain minimum and maximum timestamps

HFile and Memstore should maintain minimum and maximum timestamps
-----------------------------------------------------------------

                 Key: HBASE-2265
                 URL: https://issues.apache.org/jira/browse/HBASE-2265
             Project: Hadoop HBase
          Issue Type: Improvement
          Components: regionserver
            Reporter: Todd Lipcon


In order to fix HBASE-1485 and HBASE-29, it would be very helpful to have HFile and Memstore track their maximum and minimum timestamps. This has the following nice properties:

- for a straight Get, if an entry has already been found with timestamp X, and X >= HFile.maxTimestamp, the HFile doesn't need to be checked. Thus, the current fast behavior of get can be maintained for those who use strictly increasing timestamps, while those who sometimes write out of order still get "correct" behavior.
- for a scan, the "latest timestamp" of the storage can be used to decide which cell wins, even if the timestamps of the cells are equal. In essence, rather than comparing timestamps alone, you can compare tuples of (row timestamp, storage.max_timestamp).
- in general, min_timestamp(storage A) >= max_timestamp(storage B) if storage A was flushed after storage B.
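
The Get rule in the first bullet can be sketched roughly as follows. This is only an illustration under stated assumptions: `MaxTimestampPruning`, `Storage`, and `latestTimestamp` are hypothetical names, not HBase classes, and each storage is reduced to its maximum timestamp plus the timestamp of the one requested cell (if present).

```java
import java.util.List;

// Hypothetical illustration; none of these names exist in HBase.
final class MaxTimestampPruning {

    static final long NOT_FOUND = Long.MIN_VALUE;

    /** Stand-in for an HFile or Memstore that tracks its largest cell timestamp. */
    static final class Storage {
        final String name;
        final long maxTimestamp;
        final Long cellTimestamp; // timestamp of the requested cell here, or null if absent

        Storage(String name, long maxTimestamp, Long cellTimestamp) {
            this.name = name;
            this.maxTimestamp = maxTimestamp;
            this.cellTimestamp = cellTimestamp;
        }
    }

    /**
     * Walks the storages newest first and returns the newest timestamp found
     * for the requested cell. Names of storages that had to be read are
     * appended to {@code consulted}.
     */
    static long latestTimestamp(List<Storage> newestFirst, List<String> consulted) {
        long best = NOT_FOUND;
        for (Storage s : newestFirst) {
            // If we already hold timestamp X and X >= maxTimestamp of this
            // storage, nothing in it can be newer, so skip it.
            if (best != NOT_FOUND && best >= s.maxTimestamp) {
                continue;
            }
            consulted.add(s.name);
            if (s.cellTimestamp != null && s.cellTimestamp > best) {
                best = s.cellTimestamp;
            }
        }
        return best;
    }
}
```

With strictly increasing timestamps the first hit prunes every older storage; with out-of-order writes the older storages are still read, which is slower but returns the newest cell.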

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-2265) HFile and Memstore should maintain minimum and maximum timestamps

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838251#action_12838251 ] 

ryan rawson commented on HBASE-2265:
------------------------------------

I'm not sure this will help make gets better. There are 2 get cases:

- get a single column for a row.  In this case, if timestamps are written out of order, we don't know which hfile to start with.  Let's say we start with the 'newest' one, and it has TS[1]. Does the fact that an older file has start < TS[1] < end mean we should consult this file?  I suppose if end < TS[1] (thus the keyvalue we already got is newer than anything in that file), we'd know there is nothing newer and we could conclusively rule that file out.  If TS[1] was < beginning of a file, we'd have to consider the file.  With a big spread of timestamps and keys, we wouldn't get much of an optimization.

- for a complete column family get, we'll have to touch every file, every time. This is because you are never sure if the next file contains another key/value for the result.  A bloom filter would help here.

As for the scan, we already know which files are 'newer'.  However, during a compaction, this information is collapsed, and we end up with the duplicate key/values sitting next to each other.  We might be able to create an invariant that during compaction the 'newer' one comes first. The compaction might be able to help straighten this out, since I think we do minor compactions 'in order', with older files first. Seems like a tricky bit.

Generally the ideal solution would involve no change to the KeyValue serialization format (and hence possibly requiring a store-file rewrite).

[jira] Commented: (HBASE-2265) HFile and Memstore should maintain minimum and maximum timestamps

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838210#action_12838210 ] 

Todd Lipcon commented on HBASE-2265:
------------------------------------

It may actually be sufficient to just store the max timestamp and not the min. I haven't really thought of a great use for min.

[jira] Commented: (HBASE-2265) HFile and Memstore should maintain minimum and maximum timestamps

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838382#action_12838382 ] 

Todd Lipcon commented on HBASE-2265:
------------------------------------

bq. With a big spread of timestamps and keys, we wouldn't get much of an optimization

Exactly. If users are writing out of order, they cannot take advantage of the optimization of culling older storage. As you mentioned, bloom filters help here. For users who are writing in order, the performance should be identical to today's. I think this is exactly what we want.

bq. for a complete column family get, we'll have to touch every file, every time. This is because you are never sure if the next file contains another key/value for the result. A bloom filter would help here

Yep, and this is exactly what I would expect. Why should a column family get _not_ touch all of the files?

bq. However, during a compaction, this information is collapsed, and we end up with the duplicate key/values sitting next to each other. We might be able to cause/create an invariant that during compaction the 'newer' one comes first

It's probably worth getting consensus, but I think it would be acceptable behavior to only retain the keyval from the newest storage when the timestamps are equal. That is, if I write A:ts=1, B:ts=2, C:ts=3, D:ts=3, E:ts=3, and want to retain "latest 3", I'd end up getting writes A, B, and E.
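
A rough sketch of that retention rule, under stated assumptions: `NewestStorageWins`, `Cell`, `storageRank`, and `retainLatest` are hypothetical names, not HBase APIs, and `storageRank` is an assumed stand-in for how recent the cell's storage is (the role the issue proposes for storage.max_timestamp). Cells are ordered by the tuple (timestamp, storageRank), newest first, and only the first cell per timestamp is kept.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names exist in HBase.
final class NewestStorageWins {

    /** One version of a single column, tagged with the recency of its storage. */
    static final class Cell {
        final String value;
        final long timestamp;
        final long storageRank; // assumption: larger means the storage is newer

        Cell(String value, long timestamp, long storageRank) {
            this.value = value;
            this.timestamp = timestamp;
            this.storageRank = storageRank;
        }
    }

    /** Keeps at most maxVersions cells; on equal timestamps the newest storage wins. */
    static List<String> retainLatest(List<Cell> cells, int maxVersions) {
        List<Cell> sorted = new ArrayList<>(cells);
        // Order by (timestamp, storageRank), both descending.
        sorted.sort((a, b) -> {
            int byTimestamp = Long.compare(b.timestamp, a.timestamp);
            return byTimestamp != 0 ? byTimestamp : Long.compare(b.storageRank, a.storageRank);
        });

        List<String> kept = new ArrayList<>();
        boolean first = true;
        long lastTimestamp = 0L;
        for (Cell c : sorted) {
            if (kept.size() == maxVersions) {
                break;
            }
            if (!first && c.timestamp == lastTimestamp) {
                continue; // same timestamp from an older storage: dropped
            }
            kept.add(c.value);
            lastTimestamp = c.timestamp;
            first = false;
        }
        return kept;
    }
}
```

For the writes above, with each write assumed to land in a newer storage than the one before it, `retainLatest(cells, 3)` keeps E, B, and A (newest first) and drops C and D.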

bq. Generally the ideal solution would involve no change to the KeyValue serialization format

I agree, and I think this can be done using only the existing metadata fields without any change per-keyvalue.
