Posted to issues@hbase.apache.org by "He Yongqiang (Created) (JIRA)" <ji...@apache.org> on 2012/03/29 20:02:22 UTC

[jira] [Created] (HBASE-5674) add support in HBase to overwrite hbase timestamp to a version number during major compaction

add support in HBase to overwrite hbase timestamp to a version number during major compaction
---------------------------------------------------------------------------------------------

                 Key: HBASE-5674
                 URL: https://issues.apache.org/jira/browse/HBASE-5674
             Project: HBase
          Issue Type: Improvement
            Reporter: He Yongqiang
            Assignee: He Yongqiang


Right now, a millisecond-level timestamp is attached to every record.
In our case, we only need a version number (mostly it will just be zero). A millisecond timestamp is too heavy to carry. We should add support for overwriting it with zero during major compaction.
KVs that have not yet been through a major compaction will keep using the system timestamp. This should be configurable, so that we do not interfere when the HBase timestamp is specified by the application.
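
To make the proposal concrete, here is a minimal sketch of what a compaction-time rewrite could look like. It is not taken from any patch on this issue. SketchCell, VersionNumberRewriter, and the enabled flag are hypothetical names, and the numbering scheme (oldest version becomes 0, newer versions count upward) is an assumption chosen so that a cell with a single version ends up with 0 and the relative order of versions is preserved.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative stand-in for a stored cell; NOT the real HBase KeyValue class. */
final class SketchCell {
    final String row;
    final String column;
    final long timestamp; // epoch millis before the rewrite, version number after
    final String value;

    SketchCell(String row, String column, long timestamp, String value) {
        this.row = row;
        this.column = column;
        this.timestamp = timestamp;
        this.value = value;
    }

    boolean sameCoordinates(SketchCell other) {
        return row.equals(other.row) && column.equals(other.column);
    }
}

/** Hypothetical compaction-time rewrite; every name here is an assumption. */
final class VersionNumberRewriter {
    /**
     * Expects cells grouped by (row, column) with the newest version first.
     * When enabled is false the input is copied unchanged, which is the
     * behavior wanted when the application supplies its own timestamps.
     */
    static List<SketchCell> rewrite(List<SketchCell> sorted, boolean enabled) {
        if (!enabled) {
            return new ArrayList<>(sorted);
        }
        List<SketchCell> out = new ArrayList<>(sorted.size());
        int start = 0;
        while (start < sorted.size()) {
            // Find the end of the run of versions that share row and column.
            int end = start;
            while (end < sorted.size()
                    && sorted.get(start).sameCoordinates(sorted.get(end))) {
                end++;
            }
            // Newest version gets (count - 1), oldest gets 0.
            long version = end - start - 1;
            for (int i = start; i < end; i++) {
                SketchCell c = sorted.get(i);
                out.add(new SketchCell(c.row, c.column, version--, c.value));
            }
            start = end;
        }
        return out;
    }

    public static void main(String[] args) {
        List<SketchCell> cells = new ArrayList<>();
        cells.add(new SketchCell("r1", "f:a", 1333044142000L, "new"));
        cells.add(new SketchCell("r1", "f:a", 1333044141000L, "old"));
        cells.add(new SketchCell("r2", "f:a", 1333044143123L, "only"));
        for (SketchCell c : rewrite(cells, true)) {
            System.out.println(c.row + " " + c.column + " " + c.timestamp + " " + c.value);
        }
    }
}
```

Because the rewritten values are small numbers, any later write that carries a system timestamp still sorts as newer than the compacted versions, which is the property the description relies on.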

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5674) add support in HBase to overwrite hbase timestamp to a version number during major compaction

Posted by "He Yongqiang (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242115#comment-13242115 ] 

He Yongqiang commented on HBASE-5674:
-------------------------------------

Okay. Now I need to publicly admit my lack of a sense of humor. :)

Here is the real problem:
In our use case, the space the data occupies *really* matters. We need to find everything we can do to bring the size down as much as possible. We do not want to bring in LZMA or bzip2 compression, as they are really slow. In my simple test, 41MB of data was reduced to 32MB after I rewrote the HBase Long timestamp to zero. The 8-byte Long timestamp is heavy because it is a binary system timestamp, which makes it very hard to compress (MemstoreTS is also a Long timestamp, but there is no problem with it as it will eventually be zero). And if you look at how that data is used, most applications do not use it at all when it is system generated (not specified by the application). A good reason to make this configurable is that some applications do specify it. In that case, HBase cannot modify that data. But the many other applications that do not care about this data should not suffer from this problem if data size really matters to them.
I think this could benefit other community members, as they may hit this problem when they want to decrease their data size.
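
A rough way to see the effect in isolation is sketched below. This is not the 41MB test mentioned above. It only compresses a bare array of 8-byte values with java.util.zip.Deflater, once with synthetic millisecond timestamps and once with zeros. The class name, the one-day window, and the random spread of timestamps are assumptions for illustration. Real HFile blocks interleave timestamps with keys and values, so the actual ratio will differ.

```java
import java.nio.ByteBuffer;
import java.util.Random;
import java.util.zip.Deflater;

final class TimestampCompressionDemo {
    /** Serializes each long as 8 big-endian bytes, like a fixed-width timestamp field. */
    static byte[] toBytes(long[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        return buf.array();
    }

    /** Returns the number of bytes produced by DEFLATE at the default level. */
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] chunk = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(chunk);
        }
        deflater.end();
        return total;
    }

    /** Synthetic timestamps spread over one day from an arbitrary starting point. */
    static long[] syntheticTimestamps(int n, long seed) {
        Random rnd = new Random(seed);
        long base = 1333044142000L; // arbitrary epoch-millisecond value
        long[] ts = new long[n];
        for (int i = 0; i < n; i++) {
            ts[i] = base + rnd.nextInt(86_400_000);
        }
        return ts;
    }

    public static void main(String[] args) {
        int n = 100_000;
        byte[] real = toBytes(syntheticTimestamps(n, 42L));
        byte[] zeros = toBytes(new long[n]);
        System.out.println("raw bytes:           " + real.length);
        System.out.println("deflated timestamps: " + deflatedSize(real));
        System.out.println("deflated zeros:      " + deflatedSize(zeros));
    }
}
```

The high-order bytes of nearby timestamps repeat, but the low-order bytes are effectively noise to a general-purpose compressor, while a run of zeros collapses to almost nothing.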


                

       

[jira] [Commented] (HBASE-5674) add support in HBase to overwrite hbase timestamp to a version number during major compaction

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242505#comment-13242505 ] 

stack commented on HBASE-5674:
------------------------------

@He np.  Thanks for the background.  On a slightly related note, I was going to ask if you'd been following Matt's work over in HBASE-4676.  I thought you'd be interested in the compression factor he gets over there.
                

        

[jira] [Commented] (HBASE-5674) add support in HBase to overwrite hbase timestamp to a version number during major compaction

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241544#comment-13241544 ] 

stack commented on HBASE-5674:
------------------------------

bq. A millisecond timestamp is too heavy to carry. 

For whom?

Can you not just have your client specify timestamp of 0?
                

        

[jira] [Commented] (HBASE-5674) add support in HBase to overwrite hbase timestamp to a version number during major compaction

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242085#comment-13242085 ] 

stack commented on HBASE-5674:
------------------------------

bq. I hope this can be done in open source hbase, and can be pluggable.

Can you do your research w/o requiring that your 'researchy' code be committed to core?  Most of us working on hbase are trying to make it a hardcore production worthy platform.  'Pluggable' and 'research', at least on first blush, sound like distractions from the project objective.

But maybe the research aligns with where hbase is trying to go.  What's your research on?

Thanks.
                

        

[jira] [Commented] (HBASE-5674) add support in HBase to overwrite hbase timestamp to a version number during major compaction

Posted by "Matt Corgan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242460#comment-13242460 ] 

Matt Corgan commented on HBASE-5674:
------------------------------------

I've been brainstorming something similar as a follow-on to HBASE-4676.  The more similar timestamps you have in a block, the smaller the encoded version.  Most people doing a simple, flat table with 1 version of each cell don't care about the timestamps.  They're only needed to pick the latest cell.  If all timestamps in an HFile are the same then they will encode down to nothing.

One possibility is to have an option "flattenTimestamps" where you grab t=currentTimeMillis() at the beginning of a flush and overwrite all timestamps with it.  To support multiple versions of a cell, you could use t-1, t-2, etc (as long as they don't go all the way back to the previous hfile's timestamp).
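
The "flattenTimestamps" idea could be sketched roughly as follows. This is an illustration only, not code from HBASE-4676 or any patch. FlatCell, flatten, and the previousFileTime parameter are hypothetical names, and the input is assumed to be grouped by cell key with the newest version first.

```java
import java.util.ArrayList;
import java.util.List;

final class FlattenTimestampsSketch {
    /** Illustrative cell: a cell key plus a timestamp. NOT HBase's KeyValue. */
    static final class FlatCell {
        final String key;
        final long timestamp;

        FlatCell(String key, long timestamp) {
            this.key = key;
            this.timestamp = timestamp;
        }
    }

    /**
     * Overwrites every timestamp with flushTime, using flushTime-1, flushTime-2, ...
     * for older versions of the same cell. Fails if a rewritten timestamp would
     * reach previousFileTime, since that would blur the ordering between files.
     */
    static List<FlatCell> flatten(List<FlatCell> cells, long flushTime, long previousFileTime) {
        List<FlatCell> out = new ArrayList<>(cells.size());
        String currentKey = null;
        long next = flushTime;
        for (FlatCell c : cells) {
            if (!c.key.equals(currentKey)) {
                currentKey = c.key;
                next = flushTime; // newest version of each cell gets t
            }
            if (next <= previousFileTime) {
                throw new IllegalStateException(
                        "too many versions of " + c.key + " to fit above the previous file");
            }
            out.add(new FlatCell(c.key, next));
            next--; // older versions get t-1, t-2, ...
        }
        return out;
    }

    public static void main(String[] args) {
        long t = System.currentTimeMillis(); // grabbed once at the start of the flush
        List<FlatCell> cells = new ArrayList<>();
        cells.add(new FlatCell("row1/f:a", t - 20));
        cells.add(new FlatCell("row1/f:a", t - 500));
        cells.add(new FlatCell("row2/f:a", t - 130));
        for (FlatCell c : flatten(cells, t, t - 60_000)) {
            System.out.println(c.key + " -> " + c.timestamp);
        }
    }
}
```

With one version per cell, every timestamp in the file becomes the same value, which is the case that "will encode down to nothing".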
                

        

[jira] [Commented] (HBASE-5674) add support in HBase to overwrite hbase timestamp to a version number during major compaction

Posted by "He Yongqiang (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242650#comment-13242650 ] 

He Yongqiang commented on HBASE-5674:
-------------------------------------

Thanks Matt and stack for the point out of 4676. Yeah, we are very very interested in the work that is going on HBase-4767.
                

        

[jira] [Commented] (HBASE-5674) add support in HBase to overwrite hbase timestamp to a version number during major compaction

Posted by "He Yongqiang (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242028#comment-13242028 ] 

He Yongqiang commented on HBASE-5674:
-------------------------------------

bq. For whom?

For our 'researchy' project...

bq. Can you not just have your client specify timestamp of 0?

I hope this can be done in open source hbase, and can be pluggable. 
                

        

[jira] [Issue Comment Edited] (HBASE-5674) add support in HBase to overwrite hbase timestamp to a version number during major compaction

Posted by "Mikhail Bautin (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242650#comment-13242650 ] 

Mikhail Bautin edited comment on HBASE-5674 at 4/6/12 9:15 PM:
---------------------------------------------------------------

Thanks Matt and stack for the point out of 4676. Yeah, we are very very interested in the work that is going on HBase-4676.
                
      was (Author: he yongqiang):
    Thanks Matt and stack for the point out of 4676. Yeah, we are very very interested in the work that is going on HBase-4767.
                  

        

[jira] [Commented] (HBASE-5674) add support in HBase to overwrite hbase timestamp to a version number during major compaction

Posted by "He Yongqiang (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242094#comment-13242094 ] 

He Yongqiang commented on HBASE-5674:
-------------------------------------

I used the term 'researchy' because it was used that way in an email thread; see http://osdir.com/ml/general/2012-03/msg52707.html. I have no idea how this term came up.

bq. Most of us working on hbase are trying to make it a hardcore production worthy platform. 'Pluggable' and 'research', at least on first blush, sound like distractions from the project objective.
So are you referring to this as conflicting with your 'hardcore production worthy platform' goal?
                

        

[jira] [Commented] (HBASE-5674) add support in HBase to overwrite hbase timestamp to a version number during major compaction

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242102#comment-13242102 ] 

stack commented on HBASE-5674:
------------------------------

I didn't get the 'reference'.  Sorry, that email went over my head.

bq. So are you referring to this as conflicting with your 'hardcore production worthy platform' goal?

First, it's not 'my' goal.  Check out the notes from the recent HBase PMC meeting:  http://blogs.apache.org/hbase/.

That said, I thought it a different kinda 'researchy' that was being referred to.  I like the 'research' you fellas are at.

(Pardon me.  I did not read the name on the issue before responding.  All I saw was the short description asking for an 'odd', little-substantiated behavior, and I asked a question.  Even on your comeback, I failed to check who it came from and was innocently reacting to what I thought was a request that hbase become a dumping ground for plugins and research.)

 
                

        

[jira] [Commented] (HBASE-5674) add support in HBase to overwrite hbase timestamp to a version number during major compaction

Posted by "Scott Chen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242851#comment-13242851 ] 

Scott Chen commented on HBASE-5674:
-----------------------------------

{quote}
The timestamp takes 8 bytes for every column.
{quote}

I meant to say 
the timestamp takes 8 bytes for every keyvalue.
                

        

[jira] [Commented] (HBASE-5674) add support in HBase to overwrite hbase timestamp to a version number during major compaction

Posted by "Scott Chen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242850#comment-13242850 ] 

Scott Chen commented on HBASE-5674:
-----------------------------------

> Can you not just have your client specify timestamp of 0?

We still need the timestamp when the write was happening for versioning.
But after major compaction we don't need the exact timestamp.
We only need to know that its version is old.

The timestamp takes 8 bytes for every column.
And the last digits of these timestamps are very random so it cannot be compressed well.
In our tests, the space it consumed can be quite significant.

I believe that in a lot of use cases, these millisecond-resolution timestamps may not be useful after compaction.
It would be nice to have the ability to remove them to save some space.

                