
Posted to issues@hbase.apache.org by "He Yongqiang (Created) (JIRA)" <ji...@apache.org> on 2012/02/22 22:59:48 UTC

[jira] [Created] (HBASE-5457) add inline index in data block for data which are not clustered together

add inline index in data block for data which are not clustered together
------------------------------------------------------------------------

                 Key: HBASE-5457
                 URL: https://issues.apache.org/jira/browse/HBASE-5457
             Project: HBase
          Issue Type: New Feature
            Reporter: He Yongqiang


As we went through our data schema, we found that we have one large column family that just duplicates data from another column family. It is only a reorganization of that data, clustered differently from the original column family so that another type of query can be served efficiently.

If we compare this second column family with a similar situation in MySQL, it plays the role of an index in MySQL. So if we can add an inline block index on the required columns, the second column family is no longer needed.
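One way to picture the proposed inline index is the minimal Java sketch below. It is a hypothetical in-memory model, not HBase or HFile code, and every name in it (`InlineIndexedBlock`, `Cell`, `scanByTs`) is illustrative. It assumes a block keeps its cells in qualifier order, as the data is stored today, and adds an array of cell positions sorted by ts, much as a MySQL secondary index points back at rows. A ts-range query binary-searches that array instead of reading a duplicated column family.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model, NOT an HBase API: cells kept in qualifier order (the data),
// plus an inline index of cell positions sorted by ts (the "secondary index").
class InlineIndexedBlock {
    record Cell(String name, long ts, String value) {}

    private final Cell[] cells;   // primary order: (name, ts)
    private final int[] tsIndex;  // positions into cells, ordered by ts

    InlineIndexedBlock(List<Cell> input) {
        Cell[] sorted = input.toArray(new Cell[0]);
        Arrays.sort(sorted, Comparator.comparing(Cell::name).thenComparingLong(Cell::ts));
        Integer[] pos = new Integer[sorted.length];
        for (int i = 0; i < pos.length; i++) pos[i] = i;
        // Sort positions by the ts of the cell they point at; the cells themselves do not move.
        Arrays.sort(pos, Comparator.comparingLong((Integer p) -> sorted[p].ts()));
        cells = sorted;
        tsIndex = Arrays.stream(pos).mapToInt(Integer::intValue).toArray();
    }

    // Cells with minTs <= ts < maxTs, in ts order, located by binary search on the index.
    List<Cell> scanByTs(long minTs, long maxTs) {
        List<Cell> out = new ArrayList<>();
        for (int i = lowerBound(minTs); i < tsIndex.length && cells[tsIndex[i]].ts() < maxTs; i++) {
            out.add(cells[tsIndex[i]]);
        }
        return out;
    }

    // First slot in tsIndex whose cell has ts >= target.
    private int lowerBound(long target) {
        int lo = 0, hi = tsIndex.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cells[tsIndex[mid]].ts() < target) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        InlineIndexedBlock block = new InlineIndexedBlock(List.of(
                new Cell("a", 30, "v1"), new Cell("b", 10, "v2"), new Cell("c", 20, "v3")));
        System.out.println(block.scanByTs(10, 25));
    }
}
```

In this model the index costs one `int` per cell rather than a second copy of every value, which is the saving the proposal is after.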

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5457) add inline index in data block for data which are not clustered together

Posted by "He Yongqiang (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214915#comment-13214915 ] 

He Yongqiang commented on HBASE-5457:
-------------------------------------

@stack, we haven't thought about that in much detail, but we can start the discussion with an example.

Let's say there is one column family, and it contains only one type of column, whose name is a combination of 'string' and 'ts'. So the data is sorted by 'string' first, but one query wants the data sorted by ts instead.
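A short Java sketch of why this layout serves one query but not the other. It assumes, hypothetically, that the column name is the UTF-8 string followed by an 8-byte big-endian ts, and that ts is non-negative so its byte order matches its numeric order. HBase compares qualifiers as unsigned bytes; `Arrays.compareUnsigned` reproduces that ordering here. The class and method names are illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical layout: qualifier = UTF-8 string bytes followed by an 8-byte big-endian ts.
// Assumes ts >= 0 so that unsigned byte order on the ts bytes matches numeric order.
class CompositeQualifier {
    // Unsigned lexicographic byte comparison, the order HBase uses for qualifiers.
    static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    static byte[] encode(String s, long ts) {
        byte[] str = s.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(str.length + Long.BYTES).put(str).putLong(ts).array();
    }

    // The ts is always the last 8 bytes, whatever the string length.
    static long decodeTs(byte[] qualifier) {
        return ByteBuffer.wrap(qualifier, qualifier.length - Long.BYTES, Long.BYTES).getLong();
    }

    // ts values in the order their qualifiers would sit within one row.
    static List<Long> storedTsOrder(String[] strings, long[] tss) {
        TreeSet<byte[]> row = new TreeSet<>(UNSIGNED);
        for (int i = 0; i < strings.length; i++) row.add(encode(strings[i], tss[i]));
        List<Long> out = new ArrayList<>();
        for (byte[] q : row) out.add(decodeTs(q));
        return out;
    }

    public static void main(String[] args) {
        // prints [2, 3, 1]: grouped by string, not ordered by ts
        System.out.println(storedTsOrder(new String[] {"b", "a", "a"}, new long[] {1, 3, 2}));
    }
}
```

A query for "all entries of one string" reads a contiguous run, while a query ordered by ts has to visit every entry and re-sort.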
                
[jira] [Commented] (HBASE-5457) add inline index in data block for data which are not clustered together

Posted by "He Yongqiang (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215002#comment-13215002 ] 

He Yongqiang commented on HBASE-5457:
-------------------------------------

@lars, in today's implementation we actually create another column family and reorganize the column name to be 'ts' followed by 'string', so the data in this new column family is sorted by ts. We then redirect the query to the second column family. But this approach duplicates data.
Without the second column family, we can search within the row once we have found it, but that requires scanning all the data under the target row key, which hurts CPU.
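The trade-off described above can be modelled in a few lines of Java. This is a hypothetical in-memory sketch, not HBase client code: `DualFamilyRow` keeps one row as two sorted maps, family `d` keyed by `string|ts` and family `t` keyed by `ts|string`, and assumes non-negative ts values zero-padded to 19 digits so that string order matches numeric order. `cellsExamined` is a crude stand-in for CPU cost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of one row, not HBase client code.
// Family "d": qualifier string|ts (original). Family "t": qualifier ts|string (the re-org copy).
class DualFamilyRow {
    final TreeMap<String, String> d = new TreeMap<>();
    final TreeMap<String, String> t = new TreeMap<>();
    int cellsExamined; // crude proxy for CPU spent reading cells

    // Zero-pad to 19 digits so string order equals numeric order; assumes ts >= 0.
    static String pad(long ts) { return String.format("%019d", ts); }

    // Every write lands in both families: this is the duplicated data.
    void put(String s, long ts, String value) {
        d.put(s + "|" + pad(ts), value);
        t.put(pad(ts) + "|" + s, value);
    }

    // With family "t": one sorted range read touches only the cells in [minTs, maxTs).
    List<String> rangeViaSecondFamily(long minTs, long maxTs) {
        List<String> out = new ArrayList<>();
        for (String v : t.subMap(pad(minTs), true, pad(maxTs), false).values()) {
            cellsExamined++;
            out.add(v);
        }
        return out;
    }

    // Without family "t": examine every cell of the row in "d", keep matches, re-sort by ts.
    List<String> rangeViaFullRowScan(long minTs, long maxTs) {
        TreeMap<String, String> hits = new TreeMap<>();
        for (Map.Entry<String, String> e : d.entrySet()) {
            cellsExamined++;
            String q = e.getKey();
            int sep = q.lastIndexOf('|');
            long ts = Long.parseLong(q.substring(sep + 1));
            if (ts >= minTs && ts < maxTs) hits.put(pad(ts) + "|" + q.substring(0, sep), e.getValue());
        }
        return new ArrayList<>(hits.values());
    }

    public static void main(String[] args) {
        DualFamilyRow row = new DualFamilyRow();
        row.put("b", 1, "v1");
        row.put("a", 3, "v2");
        row.put("c", 2, "v3");
        row.put("a", 9, "v4");
        System.out.println(row.rangeViaSecondFamily(1, 4) + " examined " + row.cellsExamined); // [v1, v3, v2] examined 3
        row.cellsExamined = 0;
        System.out.println(row.rangeViaFullRowScan(1, 4) + " examined " + row.cellsExamined);  // [v1, v3, v2] examined 4
    }
}
```

Both reads return the same values: the second family pays for every write twice but reads only the cells in range, while the single-family scan writes once but examines every cell in the row.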
                
[jira] [Commented] (HBASE-5457) add inline index in data block for data which are not clustered together

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214126#comment-13214126 ] 

stack commented on HBASE-5457:
------------------------------

bq. So if we can add inline block index on required columns, the second column family then is not needed.

What would this look like, He?
                
[jira] [Commented] (HBASE-5457) add inline index in data block for data which are not clustered together

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214922#comment-13214922 ] 

Lars Hofhansl commented on HBASE-5457:
--------------------------------------

@He, so you find the row and then search inside it with a ColumnRange or ColumnPrefix filter?
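The question can be illustrated without an HBase cluster. The Java model below only roughly mimics the kind of contiguous slice of a row's sorted qualifiers that a column prefix or column range restriction (HBase's `ColumnPrefixFilter` / `ColumnRangeFilter`) selects; it does not use the HBase filter classes, and the `string|ts` qualifier layout and all names are assumptions. A prefix or a (string, ts) range maps to one slice, but a ts-only condition matches cells scattered across the row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Simulation only: mimics how a column prefix or column range restriction picks one contiguous
// slice of a row's sorted qualifiers. No HBase classes are used; the string|ts layout is assumed.
class RowSliceModel {
    final TreeMap<String, String> row = new TreeMap<>();

    // Zero-padded ts so string order equals numeric order; assumes ts >= 0.
    static String qualifier(String s, long ts) { return s + "|" + String.format("%019d", ts); }

    // Prefix-style selection: every qualifier starting with "s|", one contiguous slice.
    List<String> byPrefix(String s) {
        String prefix = s + "|";
        return new ArrayList<>(row.subMap(prefix, true, prefix + Character.MAX_VALUE, false).values());
    }

    // Range-style selection from (s, minTs) inclusive to (s, maxTs) exclusive:
    // contiguous only because s is fixed.
    List<String> byStringAndTsRange(String s, long minTs, long maxTs) {
        return new ArrayList<>(row.subMap(qualifier(s, minTs), true, qualifier(s, maxTs), false).values());
    }

    // A ts-only condition across all strings: report where matches sit in qualifier order.
    // Gaps between positions mean no single qualifier range covers exactly these cells.
    List<Integer> positionsWithTsIn(long minTs, long maxTs) {
        List<Integer> positions = new ArrayList<>();
        int i = 0;
        for (String q : row.keySet()) {
            long ts = Long.parseLong(q.substring(q.lastIndexOf('|') + 1));
            if (ts >= minTs && ts < maxTs) positions.add(i);
            i++;
        }
        return positions;
    }

    public static void main(String[] args) {
        RowSliceModel m = new RowSliceModel();
        m.row.put(qualifier("a", 1), "a1");
        m.row.put(qualifier("a", 5), "a5");
        m.row.put(qualifier("b", 2), "b2");
        m.row.put(qualifier("b", 6), "b6");
        System.out.println(m.byPrefix("a"));                 // prints [a1, a5]
        System.out.println(m.byStringAndTsRange("b", 0, 5)); // prints [b2]
        System.out.println(m.positionsWithTsIn(1, 3));       // prints [0, 2]
    }
}
```

With qualifiers a|1, a|5, b|2, b|6, the cells with ts in [1, 3) sit at positions 0 and 2, so a range or prefix restriction cannot isolate them and the whole row still has to be read.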
                