
Posted to dev@hbase.apache.org by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2008/09/24 01:01:44 UTC

[jira] Created: (HBASE-899) Support for specifying a timestamp and numVersions on a per-column basis

Support for specifying a timestamp and numVersions on a per-column basis
------------------------------------------------------------------------

                 Key: HBASE-899
                 URL: https://issues.apache.org/jira/browse/HBASE-899
             Project: Hadoop HBase
          Issue Type: New Feature
            Reporter: Doğacan Güney


This is just an idea, and it may be better to wait until after the planned API changes. But I think it would be useful to support fetching different timestamps and versions for different columns.

Example:

If a row has 2 columns, "col1:" and "col2:", I want to be able to ask (at scan or read time, it doesn't matter) for 2 versions of "col1:" (maybe even between timestamps t1 and t2) but only 1 version of "col2:". This would be especially handy if, during an MR job, you have to read 2 versions of a small column but do not want the overhead of reading 2 versions of every other column too.

(Also, the mechanism is already there. I mean, making the changes to support a per-column timestamp/numVersions is ridiculously easy :)
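
To make the request concrete, here is a minimal sketch of what a per-column read specification could look like. It is plain Python over an in-memory row, not the HBase client API; the names ColumnSpec, select_versions and read_row are hypothetical, and the timestamps stand in for t1 and t2.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, str]  # (timestamp, value)


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical per-column read options (not an HBase class)."""
    max_versions: int = 1
    min_ts: Optional[int] = None  # inclusive lower bound, None = unbounded
    max_ts: Optional[int] = None  # inclusive upper bound, None = unbounded


def select_versions(cells: List[Cell], spec: ColumnSpec) -> List[Cell]:
    """Return the newest cells of one column that satisfy spec."""
    in_range = [
        (ts, value)
        for ts, value in cells
        if (spec.min_ts is None or ts >= spec.min_ts)
        and (spec.max_ts is None or ts <= spec.max_ts)
    ]
    in_range.sort(key=lambda cell: cell[0], reverse=True)  # newest first
    return in_range[: spec.max_versions]


def read_row(row: Dict[str, List[Cell]],
             specs: Dict[str, ColumnSpec]) -> Dict[str, List[Cell]]:
    """Apply each column's own spec; columns without a spec are not read."""
    return {col: select_versions(row.get(col, []), spec)
            for col, spec in specs.items()}


row = {
    "col1:": [(10, "a"), (20, "b"), (30, "c")],
    "col2:": [(15, "x"), (25, "y")],
}
specs = {
    "col1:": ColumnSpec(max_versions=2, min_ts=5, max_ts=25),  # t1=5, t2=25
    "col2:": ColumnSpec(max_versions=1),
}
result = read_row(row, specs)
# col1: -> [(20, "b"), (10, "a")], col2: -> [(25, "y")]
```

The point of the sketch is only that the version count and timestamp range travel with the column name rather than with the whole read.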

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-899) Support for specifying a timestamp and numVersions on a per-column basis

Posted by "Doğacan Güney (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634489#action_12634489 ] 

Doğacan Güney commented on HBASE-899:
-------------------------------------

> Although in general what this request is asking for is to move some overhead of culling results from client side to server side. In general is that a good idea? Region servers are quite busy.

I am just worried about having to pass large amounts of data over RPC, only to consistently discard it. It seems... a bit wasteful :D

And, if HBase intends to support a row-wide timestamp range and numVersions, I just don't see how doing it per-column would be any more difficult or slower. A many-column read will already be done in a read-one-column, merge-result-into-the-rest kind of way. So, while reading one column, the region server just checks what the user specified for that column. (Or maybe I am missing something :)
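
A rough sketch of that argument, under the assumption that the server really does read a row one column at a time: the only change from a row-wide limit is a per-column lookup inside the loop. This is illustrative Python, not region server code; read_row_columnwise and cells_shipped are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, str]  # (timestamp, value)


def read_row_columnwise(row: Dict[str, List[Cell]],
                        default_versions: int,
                        overrides: Optional[Dict[str, int]] = None
                        ) -> Dict[str, List[Cell]]:
    """Read one column at a time and merge it into the result.

    overrides maps a column name to its own version limit; any column
    not listed falls back to the row-wide default_versions.
    """
    overrides = overrides or {}
    result: Dict[str, List[Cell]] = {}
    for column, cells in row.items():
        limit = overrides.get(column, default_versions)  # the per-column check
        newest_first = sorted(cells, key=lambda cell: cell[0], reverse=True)
        result[column] = newest_first[:limit]
    return result


def cells_shipped(result: Dict[str, List[Cell]]) -> int:
    """Number of cells that would cross the RPC boundary."""
    return sum(len(cells) for cells in result.values())


row = {
    "col1:": [(1, "a1"), (2, "a2"), (3, "a3")],
    "col2:": [(1, "b1"), (2, "b2"), (3, "b3")],
    "col3:": [(1, "c1"), (2, "c2"), (3, "c3")],
}

# Row-wide: the client wants 2 versions of col1:, so it must ask for 2 of everything.
row_wide = read_row_columnwise(row, default_versions=2)

# Per-column: 2 versions of col1:, 1 version of everything else.
per_column = read_row_columnwise(row, default_versions=1, overrides={"col1:": 2})

print(cells_shipped(row_wide), cells_shipped(per_column))  # prints "6 4"
```

In this toy row the per-column request ships 4 cells instead of 6; the gap grows with the number and size of the columns that only need one version.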




[jira] Commented: (HBASE-899) Support for specifying a timestamp and numVersions on a per-column basis

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634477#action_12634477 ] 

Andrew Purtell commented on HBASE-899:
--------------------------------------

Can this be handled with filters? For example, by making a FilterSet that ANDs its terms, then adding to the set a filter that selects col1 using a modified ColumnValueFilter that has comparison operators for timestamps, and then adding a (new) VersionFilter that only allows a specified number of versions through?
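
The composition described above might be sketched as follows. These are stand-in Python classes written for illustration, not the HBase filter API: AndFilterSet, ColumnTimestampFilter and VersionFilter are hypothetical names, and the sketch assumes cells are offered to the filter newest first. It covers only the col1 half of the request; reading 1 version of col2 as well would need a second set combined with OR.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, int, str]  # (column, timestamp, value)


class ColumnTimestampFilter:
    """Accept cells of one column whose timestamp lies in [min_ts, max_ts]."""

    def __init__(self, column: str, min_ts: int, max_ts: int) -> None:
        self.column = column
        self.min_ts = min_ts
        self.max_ts = max_ts

    def accept(self, column: str, ts: int, value: str) -> bool:
        return column == self.column and self.min_ts <= ts <= self.max_ts


class VersionFilter:
    """Accept at most max_versions cells per column (stateful)."""

    def __init__(self, max_versions: int) -> None:
        self.max_versions = max_versions
        self.seen: Dict[str, int] = {}

    def accept(self, column: str, ts: int, value: str) -> bool:
        count = self.seen.get(column, 0)
        if count >= self.max_versions:
            return False
        self.seen[column] = count + 1
        return True


class AndFilterSet:
    """Accept a cell only if every member filter accepts it.

    all() short-circuits, so a stateful filter placed later in the list
    only counts cells that the earlier filters already let through.
    """

    def __init__(self, *filters) -> None:
        self.filters = filters

    def accept(self, column: str, ts: int, value: str) -> bool:
        return all(f.accept(column, ts, value) for f in self.filters)


cells: List[Cell] = [
    ("col1:", 30, "c"),
    ("col1:", 20, "b"),
    ("col1:", 10, "a"),
    ("col2:", 25, "y"),
]
filter_set = AndFilterSet(ColumnTimestampFilter("col1:", 5, 25), VersionFilter(2))
kept = [cell for cell in cells if filter_set.accept(*cell)]
# kept == [("col1:", 20, "b"), ("col1:", 10, "a")]
```

Note that the order of the terms matters here: if the version filter ran first, the out-of-range cell at timestamp 30 would use up one of the two allowed versions.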

Although in general what this request is asking for is to move some overhead of culling results from client side to server side. In general is that a good idea? Region servers are quite busy.


[jira] Commented: (HBASE-899) Support for specifying a timestamp and numVersions on a per-column basis

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634522#action_12634522 ] 

Jim Kellerman commented on HBASE-899:
-------------------------------------

Once we have HBASE-847 and HBASE-52 in place this should not be difficult to add.

We also need to factor in HBASE-861. Is it a bug or not?

