Posted to issues@hbase.apache.org by "Mikhail Bautin (Created) (JIRA)" <ji...@apache.org> on 2011/12/06 02:23:40 UTC

[jira] [Created] (HBASE-4962) Optimize time range scans using a delete Bloom filter

Optimize time range scans using a delete Bloom filter
-----------------------------------------------------

                 Key: HBASE-4962
                 URL: https://issues.apache.org/jira/browse/HBASE-4962
             Project: HBase
          Issue Type: Improvement
            Reporter: Mikhail Bautin
            Assignee: Mikhail Bautin
            Priority: Minor


To speed up time range scans, we need to seek directly to the maximum timestamp of the requested range, instead of going to the first KV of the (row, column) pair and iterating from there. If we don't know the (row, column), e.g. if it is not specified in the query, we need to go to the end of the current row/column pair first, get a KV from there, and do another seek to (row', column', timerange_max) from there. We can only skip ahead to the timerange_max timestamp when we know that there are no DeleteColumn records at the top of that row/column with a higher timestamp. We can use another Bloom filter keyed on (row, column) to quickly find that out.
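
The seek decision described above can be sketched roughly as follows. This is a minimal illustration in Python, not HBase code (HBase itself is Java): the names BloomFilter, bloom_key, seek_timestamp and LATEST_TIMESTAMP are hypothetical, and the filter is assumed to hold one entry per (row, column) that has a DeleteColumn marker in the store file. Since a Bloom filter can return false positives but never false negatives, a "no" answer makes it safe to skip straight to timerange_max, while a "maybe" falls back to seeking to the top of the column.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with the index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(b"%d:" % i + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


# Hypothetical stand-in for "the top of the column": timestamps sort in
# descending order within a (row, column), so the largest value comes first.
LATEST_TIMESTAMP = 2**63 - 1


def bloom_key(row, column):
    # Length-prefix the row so (b"a", b"bc") and (b"ab", b"c") do not collide.
    return len(row).to_bytes(2, "big") + row + column


def seek_timestamp(delete_bloom, row, column, timerange_max):
    """Return the timestamp to seek to within (row, column)."""
    if delete_bloom.might_contain(bloom_key(row, column)):
        # A DeleteColumn marker may exist above timerange_max: start at the
        # top of the column so the scanner sees it.
        return LATEST_TIMESTAMP
    # The filter rules out any DeleteColumn marker: skip straight ahead.
    return timerange_max


# Record a DeleteColumn marker for row1/cf:a, then ask where to seek.
bloom = BloomFilter()
bloom.add(bloom_key(b"row1", b"cf:a"))
target = seek_timestamp(bloom, b"row1", b"cf:a", 1000)  # LATEST_TIMESTAMP
```

The fallback branch costs no more than the current behavior, so under these assumptions a false positive only loses the optimization for that column and never affects correctness.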

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (HBASE-4962) Optimize time range scans using a delete Bloom filter

Posted by "Mikhail Bautin (Assigned) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Bautin reassigned HBASE-4962:
-------------------------------------

    Assignee: Liyin Tang  (was: Mikhail Bautin)

Liyin: assigning this issue to you since you said you would work on this (correct me if I'm wrong). This is the "seek-to-timestamp" fix we were talking about. It will require either adding a (row, col) "delete-column" Bloom filter or adding another type of key to the existing "delete-family" Bloom filter.
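
For illustration only, the second option (one shared filter holding two kinds of keys) might look roughly like the Python sketch below. It assumes the existing delete-family filter is keyed on the row alone; the SharedDeleteBloom class and the one-byte DELETE_FAMILY / DELETE_COLUMN tags are hypothetical and do not reflect any actual HBase key format. The tag keeps a row-only key from colliding with a (row, column) key that happens to have the same bytes.

```python
import hashlib

DELETE_FAMILY = b"\x00"  # hypothetical tag: key is a row
DELETE_COLUMN = b"\x01"  # hypothetical tag: key is a (row, column) pair


class SharedDeleteBloom:
    """One Bloom filter holding both delete-family and delete-column keys."""

    def __init__(self, num_bits=4096, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def _add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def _might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))

    @staticmethod
    def _column_key(row, column):
        # Length-prefix the row so the row/column boundary is unambiguous.
        return DELETE_COLUMN + len(row).to_bytes(2, "big") + row + column

    def add_delete_family(self, row):
        self._add(DELETE_FAMILY + row)

    def add_delete_column(self, row, column):
        self._add(self._column_key(row, column))

    def might_have_delete_family(self, row):
        return self._might_contain(DELETE_FAMILY + row)

    def might_have_delete_column(self, row, column):
        return self._might_contain(self._column_key(row, column))
```

The trade-off in this sketch is that both key types share the same bits, so a shared filter would need to be sized for the combined key count to keep the false-positive rate comparable to two separate filters.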
                


[jira] [Resolved] (HBASE-4962) Optimize time range scans using a delete Bloom filter

Posted by "Mikhail Bautin (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Bautin resolved HBASE-4962.
-----------------------------------

    Resolution: Duplicate
    


[jira] [Updated] (HBASE-4962) Optimize time range scans using a delete Bloom filter

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Muthukkaruppan updated HBASE-4962:
-----------------------------------------

    Assignee: Pritam Damania  (was: Liyin Tang)
    
