Posted to dev@hbase.apache.org by "Jonathan Gray (JIRA)" <ji...@apache.org> on 2009/03/15 08:08:50 UTC

[jira] Created: (HBASE-1263) Optimize for single-version families

Optimize for single-version families
------------------------------------

                 Key: HBASE-1263
                 URL: https://issues.apache.org/jira/browse/HBASE-1263
             Project: Hadoop HBase
          Issue Type: New Feature
          Components: regionserver
            Reporter: Jonathan Gray
             Fix For: 0.20.0


As some of us have been discussing, allowing the client to manually set the timestamp of a put breaks the general semantics of versioning, and I'd like to see it removed as part of HBASE-880 (a more appropriate place to debate that).

However, one trick being used when you don't want the overhead of versions on a frequently updated column (old versions are only cleared on compactions, even if max versions is set to 1) was to use the same timestamp. Since that creates an identical key, the put just overwrites the value instead of creating a new version.
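
To make the same-timestamp trick concrete, here is a toy sketch, not HBase code: a memcache-like TreeMap sorted by row, then column, then newest timestamp first. CellKey and SameTimestampTrick are hypothetical names. Reusing a timestamp produces an identical key, so the put replaces the value; a fresh timestamp adds another version that would linger until a compaction.

```java
import java.util.Comparator;
import java.util.TreeMap;

/** Hypothetical, simplified cell key (row, column, timestamp); not HBase's KeyValue. */
final class CellKey {
    final String row;
    final String column;
    final long timestamp;

    CellKey(String row, String column, long timestamp) {
        this.row = row;
        this.column = column;
        this.timestamp = timestamp;
    }

    /** Row, then column, then newest timestamp first. */
    static final Comparator<CellKey> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.column.compareTo(b.column);
        if (c != 0) return c;
        return Long.compare(b.timestamp, a.timestamp);
    };
}

final class SameTimestampTrick {
    /** Writes one put per timestamp into a memcache-like sorted map and returns the cell count. */
    static int cellsAfterPuts(long... timestamps) {
        TreeMap<CellKey, Long> memcache = new TreeMap<>(CellKey.ORDER);
        long value = 0;
        for (long ts : timestamps) {
            // Same (row, column, timestamp) -> identical key -> value overwritten in place.
            // New timestamp -> new key -> one more version kept.
            memcache.put(new CellKey("row1", "counters:hits", ts), ++value);
        }
        return memcache.size();
    }

    public static void main(String[] args) {
        System.out.println(cellsAfterPuts(100L, 100L, 100L)); // 1
        System.out.println(cellsAfterPuts(100L, 101L, 102L)); // 3
    }
}
```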

It's a very common use case, and this hack is being used as part of the committed increment ops from HBASE-868/HBASE-1252. Rather than a special optimization for counters, this issue is for an optimization on single-version families that never stores more than one version of a column.
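
A rough sketch of why this matters for counters, assuming a toy model in which compactions are not simulated and timestamps come from a hypothetical local clock: a versioned column accumulates one cell per increment, while a single-version family that keys cells by column alone keeps exactly one. CounterStorage and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy comparison of two ways to store a hot counter; neither is HBase code. */
final class CounterStorage {
    /** Versioned column: each increment writes a new timestamped cell; old ones linger until compaction. */
    static int versionedCells(int increments) {
        TreeMap<Long, Long> versions = new TreeMap<>(); // timestamp -> counter value
        long clock = 1_000L;                            // hypothetical, monotonically increasing
        for (int i = 0; i < increments; i++) {
            long current = versions.isEmpty() ? 0L : versions.lastEntry().getValue();
            versions.put(clock++, current + 1);
        }
        return versions.size();
    }

    /** Single-version family: cells are keyed by column only, so each increment replaces the cell. */
    static Map<String, Long> singleVersionStore(int increments) {
        Map<String, Long> cells = new HashMap<>(); // column -> value, no timestamp in the key
        for (int i = 0; i < increments; i++) {
            cells.merge("counters:hits", 1L, Long::sum);
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(versionedCells(1000));     // 1000
        System.out.println(singleVersionStore(1000)); // {counters:hits=1000}
    }
}
```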

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Commented: (HBASE-1263) Optimize for single-version families

Posted by Ryan Rawson <ry...@gmail.com>.
We should measure the efficiency gains at run time and compare them with the
additional code complexity.

With the HotSpot JIT, the performance loss might not be as high as we think.

-ryan

On Sat, Mar 14, 2009 at 11:14 PM, Jonathan Gray (JIRA) <ji...@apache.org> wrote:

>
> [ https://issues.apache.org/jira/browse/HBASE-1263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682131#action_12682131 ]
>
> Jonathan Gray commented on HBASE-1263:
> --------------------------------------
>
> One idea would be to create a special KeyValue comparator that looked at
> row and column only and ignored timestamp.
>
> Stack, it still seems pretty clunky to have different KV comparators that
> stores must be aware of; at the least, it means lots of branchy code at the
> beginning. We also talked about potentially allowing descending order or
> custom comparators... would there be a "simple" way to make the comparator
> an additional and optional family setting?

[jira] Commented: (HBASE-1263) Optimize for single-version families

Posted by "Erik Holstad (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682761#action_12682761 ] 

Erik Holstad commented on HBASE-1263:
-------------------------------------

I totally agree that we should not have a system that has the same timestamp in multiple places; that would break the whole model
and would make early-out impossible when we are doing it based on time.
So if we go along with deleting entries from the memcache, we could just let Delete(ts) delete itself too if it finds that version in
the memcache. Doing it like that means we get no overhead from multiple versions and no extra deletes hanging around for no use.
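
A minimal sketch of the Delete(ts) behaviour described above, under simplifying assumptions: puts and delete markers live in two hypothetical maps, and "already flushed" is represented only by the version being absent from memory. MemcacheDeletes is a made-up class, not HBase's Memcache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy memcache holding puts and delete markers; not HBase's Memcache class. */
final class MemcacheDeletes {
    // column -> (timestamp -> value)
    private final Map<String, TreeMap<Long, String>> puts = new HashMap<>();
    // column -> timestamps carrying a delete marker
    private final Map<String, TreeSet<Long>> deleteMarkers = new HashMap<>();

    void put(String column, long ts, String value) {
        puts.computeIfAbsent(column, c -> new TreeMap<>()).put(ts, Objects.requireNonNull(value));
    }

    /** Delete(ts): cancel a matching in-memory version outright, otherwise keep a marker. */
    void delete(String column, long ts) {
        TreeMap<Long, String> versions = puts.get(column);
        if (versions != null && versions.remove(ts) != null) {
            // Exact version still in the memcache: the put and the delete both disappear.
            return;
        }
        // The version may already be in a flushed store file, so the delete must be kept.
        deleteMarkers.computeIfAbsent(column, c -> new TreeSet<>()).add(ts);
    }

    int putCount() {
        return puts.values().stream().mapToInt(TreeMap::size).sum();
    }

    int markerCount() {
        return deleteMarkers.values().stream().mapToInt(TreeSet::size).sum();
    }

    public static void main(String[] args) {
        MemcacheDeletes memcache = new MemcacheDeletes();
        memcache.put("f:q", 100L, "v1");
        memcache.delete("f:q", 100L); // version found in memory: nothing left behind
        memcache.delete("f:q", 50L);  // version not in memory: marker kept
        System.out.println(memcache.putCount() + " puts, " + memcache.markerCount() + " marker(s)"); // 0 puts, 1 marker(s)
    }
}
```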



[jira] Commented: (HBASE-1263) Optimize for single-version families

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683636#action_12683636 ] 

Jonathan Gray commented on HBASE-1263:
--------------------------------------

Good point, stack.  I knew there was something preventing us from doing exotic comparators :)



[jira] Updated: (HBASE-1263) Optimize for single-version families

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1263:
-------------------------

    Fix Version/s:     (was: 0.20.0)

Moving this out; it's a non-critical feature. If it shows up before we cut 0.20.0, we'll include it.



[jira] Commented: (HBASE-1263) Optimize for single-version families

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683609#action_12683609 ] 

stack commented on HBASE-1263:
------------------------------

We already have the notion of a comparator that ignores timestamps (needed to count versions) and a comparator that ignores type so we can see if something has been deleted (though the type in the key is different). It would be easy enough to specify a comparator with no timestamp on a family.
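
A sketch of the two comparators mentioned above, using a hypothetical Kv class instead of KeyValue and ignoring the type field: FULL orders by row, column and newest timestamp; IGNORE_TS drops the timestamp. Comparing neighbours with IGNORE_TS is one way to count versions, and using it as a family's comparator turns the store into a single-version one. FamilyComparators and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical simplified cell; not HBase's KeyValue. */
final class Kv {
    final String row;
    final String column;
    final long ts;
    final String value;

    Kv(String row, String column, long ts, String value) {
        this.row = row;
        this.column = column;
        this.ts = ts;
        this.value = value;
    }
}

final class FamilyComparators {
    /** Row, then column; timestamp ignored. */
    static final Comparator<Kv> IGNORE_TS = (a, b) -> {
        int c = a.row.compareTo(b.row);
        return c != 0 ? c : a.column.compareTo(b.column);
    };

    /** Row, then column, then newest timestamp first. */
    static final Comparator<Kv> FULL = (a, b) -> {
        int c = IGNORE_TS.compare(a, b);
        return c != 0 ? c : Long.compare(b.ts, a.ts);
    };

    /** Versions per column in a FULL-sorted list: neighbours equal under IGNORE_TS share a column. */
    static List<Integer> versionCounts(List<Kv> sortedByFull) {
        List<Integer> counts = new ArrayList<>();
        Kv previous = null;
        for (Kv kv : sortedByFull) {
            if (previous != null && IGNORE_TS.compare(previous, kv) == 0) {
                int last = counts.size() - 1;
                counts.set(last, counts.get(last) + 1);
            } else {
                counts.add(1);
            }
            previous = kv;
        }
        return counts;
    }

    /** Stores puts under a family comparator; IGNORE_TS keeps only the last-written cell per column. */
    static TreeMap<Kv, String> store(Comparator<Kv> familyComparator, List<Kv> puts) {
        TreeMap<Kv, String> cells = new TreeMap<>(familyComparator);
        for (Kv kv : puts) {
            cells.remove(kv); // evict any equal key so the stored key carries the new timestamp
            cells.put(kv, kv.value);
        }
        return cells;
    }

    static List<Kv> samplePuts() {
        return Arrays.asList(
            new Kv("r1", "f:a", 1L, "x"),
            new Kv("r1", "f:a", 2L, "y"),
            new Kv("r1", "f:b", 1L, "z"));
    }

    public static void main(String[] args) {
        List<Kv> sorted = new ArrayList<>(samplePuts());
        sorted.sort(FULL);
        System.out.println(versionCounts(sorted));                   // [2, 1]
        System.out.println(store(FULL, samplePuts()).size());        // 3
        System.out.println(store(IGNORE_TS, samplePuts()).values()); // [y, z]
    }
}
```

With IGNORE_TS the map keeps the last-written cell for a column, which is the newest one only if puts arrive in timestamp order.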

Descending sort ain't going to happen. The whole HBase system has to use the same basic comparator, or else things fall apart when regions split and get entries in .META. (We can't have .META. ordered lexicographically sometimes and reverse-lexicographically at other times, all in the one table.)
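
The .META. point can be illustrated with a toy catalog, assuming (as a simplification) that a row is located by finding the greatest region start key that is not after it. Region names and keys are made up, and MetaOrderSketch is hypothetical. The same floor lookup run against a reverse-ordered map returns a different, wrong region.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

/** Toy .META.-style catalog: region start key -> region name. Not HBase's catalog code. */
final class MetaOrderSketch {
    /** Three regions: ["", "g"), ["g", "p"), ["p", end). */
    static TreeMap<String, String> catalog(Comparator<String> order) {
        TreeMap<String, String> meta = new TreeMap<>(order);
        meta.put("", "region-A");
        meta.put("g", "region-B");
        meta.put("p", "region-C");
        return meta;
    }

    /** Region whose start key is the greatest key not after the row, in the catalog's own order. */
    static String locate(TreeMap<String, String> meta, String row) {
        Map.Entry<String, String> e = meta.floorEntry(row);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        TreeMap<String, String> lexicographic = catalog(Comparator.naturalOrder());
        TreeMap<String, String> reversed = catalog(Collections.reverseOrder());
        System.out.println(locate(lexicographic, "kiwi")); // region-B
        System.out.println(locate(reversed, "kiwi"));      // region-C: wrong region
    }
}
```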


