Posted to issues@hbase.apache.org by "John Carrino (Created) (JIRA)" <ji...@apache.org> on 2011/11/17 20:03:51 UTC

[jira] [Created] (HBASE-4811) Support reverse Scan

Support reverse Scan
--------------------

                 Key: HBASE-4811
                 URL: https://issues.apache.org/jira/browse/HBASE-4811
             Project: HBase
          Issue Type: Improvement
          Components: client
    Affects Versions: 0.20.6
            Reporter: John Carrino


All the documentation I find about HBase says that if you want forward and reverse scans you should just build two tables, one ascending and one descending.  Is there a fundamental reason that HBase only supports forward Scan?  Supporting two tables seems like a lot of extra space overhead and coding overhead (to keep them in sync).

I assume this has been discussed before, but I can't find any discussion of it, or of why it would be infeasible.
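
For context, the two-table workaround is usually done by writing each row a second time under an inverted key, so that an ordinary forward scan of the second table returns rows in descending order. Below is a minimal sketch of that idea, assuming fixed-width row keys. The class and method names are made up for illustration and are not part of HBase; a sorted set stands in for the second table.

```java
import java.nio.charset.StandardCharsets;
import java.util.TreeSet;

public class InvertedKeyExample {
    // Flip every bit of a key. For keys of equal length, the unsigned
    // lexicographic order of the inverted keys is the reverse of the
    // order of the original keys.
    public static byte[] invert(byte[] key) {
        byte[] out = new byte[key.length];
        for (int i = 0; i < key.length; i++) {
            out[i] = (byte) ~key[i];
        }
        return out;
    }

    // Unsigned lexicographic byte comparison (the way HBase orders row keys).
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Stand-in for the "descending" table: a sorted set of inverted keys.
        TreeSet<byte[]> descendingTable = new TreeSet<>(InvertedKeyExample::compareUnsigned);
        for (String k : new String[] {"row1", "row2", "row3"}) {
            descendingTable.add(invert(k.getBytes(StandardCharsets.UTF_8)));
        }
        // A forward walk over the inverted keys visits the original keys in reverse.
        for (byte[] stored : descendingTable) {
            System.out.println(new String(invert(stored), StandardCharsets.UTF_8));
        }
    }
}
```

The cost being asked about is visible even here: every write has to be done twice, and the two copies have to be kept in sync.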


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4811) Support reverse Scan

Posted by "John Carrino (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155692#comment-13155692 ] 

John Carrino commented on HBASE-4811:
-------------------------------------

So looking at the region splitting code it looks like any scans that
are open on ranges that are split get a special exception type and
then just open a new scanner.  So we don't have to worry about reverse
iteration any more than forward with respect to splitting.

I think this might boil down to writing a reverse iterator
(HFileScanner) for HFile.
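
To illustrate the point about splits, here is a rough sketch of how a reverse scan could resume after such an exception, in the same way a forward scan would: remember the last row returned and re-open just below it. Everything here is a stand-in. `RegionSplitException` is a hypothetical name, and a sorted set plays the role of the table; none of this is HBase's actual client code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class ReopenOnSplitExample {
    // Hypothetical stand-in for the exception a scanner gets when its region splits.
    public static class RegionSplitException extends RuntimeException {}

    // Open a reverse iterator over rows strictly below 'upperExclusive'
    // (or over all rows when it is null).
    static Iterator<String> open(NavigableSet<String> rows, String upperExclusive) {
        NavigableSet<String> view =
            upperExclusive == null ? rows : rows.headSet(upperExclusive, false);
        return view.descendingIterator();
    }

    // Scan all rows in reverse, simulating one split after 'failAfter' rows.
    public static List<String> reverseScan(NavigableSet<String> rows, int failAfter) {
        List<String> out = new ArrayList<>();
        String last = null;
        boolean failed = false;
        Iterator<String> it = open(rows, null);
        while (true) {
            try {
                if (!failed && out.size() == failAfter) {
                    failed = true;
                    throw new RegionSplitException();
                }
                if (!it.hasNext()) return out;
                last = it.next();
                out.add(last);
            } catch (RegionSplitException e) {
                // Re-open a new scanner positioned just below the last row returned.
                it = open(rows, last);
            }
        }
    }

    public static void main(String[] args) {
        NavigableSet<String> rows = new TreeSet<>(List.of("a", "b", "c", "d"));
        System.out.println(reverseScan(rows, 2));
    }
}
```

The only direction-specific part is which side of the last row the new scanner is opened on.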

-jc



                

        

[jira] [Commented] (HBASE-4811) Support reverse Scan

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155426#comment-13155426 ] 

stack commented on HBASE-4811:
------------------------------

bq. Is there a fundamental reason that HBase only supports forward Scan?

Yes.  All the data is sorted in one direction only and all Scan objects are written to go in the data's 'natural' direction.  There is no native support for going backwards, whether it's reading from files 'backwards' or getting a view on our MemStore that gives a reverse-sorted view.

To make it work, you'd have to write a bunch of code and you'd be always going against the grain.

It used to come up the odd time in the early days, but versions of the above arguments would usually quiet the requests.

If you need more detail, ask.
                

        

[jira] [Commented] (HBASE-4811) Support reverse Scan

Posted by "John Carrino (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155519#comment-13155519 ] 

John Carrino commented on HBASE-4811:
-------------------------------------

Yeah, I'm not that familiar with the codebase, but I'd assume that in
order to get forward scans you'd have to have the data sorted. And
from what I understand it is internally stored as "sstables" or
HFiles. If you have it sorted to scan in one direction, it seems
pretty easy to go the other direction.  LevelDb uses ssTables and
supports reverse ranges.

The only thing that I could think of from the design (from a high
level) that might make it difficult to do reverse ranges is dealing
with splitting ranges when moving ranges from one region server to
another.

Just from a quick look at MemStore that you mention, it uses a
KeyValueSkipListSet under the covers that is a NavigableSet and
supports descendingSet and descendingIterator.
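
As a quick illustration of what `NavigableSet` offers here, the sketch below walks a skip-list set backwards from a starting key. It uses a plain `ConcurrentSkipListSet` of strings as a stand-in; it is not the MemStore code, and the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListSet;

public class DescendingSetExample {
    // Collect elements from highest to lowest, starting at or below 'startInclusive'.
    public static List<String> reverseFrom(NavigableSet<String> set, String startInclusive) {
        List<String> out = new ArrayList<>();
        // headSet(..., true) keeps everything <= start;
        // descendingIterator() then walks that view backwards.
        Iterator<String> it = set.headSet(startInclusive, true).descendingIterator();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> memstoreLike = new ConcurrentSkipListSet<>();
        memstoreLike.add("a");
        memstoreLike.add("b");
        memstoreLike.add("c");
        memstoreLike.add("d");
        System.out.println(reverseFrom(memstoreLike, "c"));
        // descendingSet() gives the same reverse view over the whole set.
        System.out.println(memstoreLike.descendingSet());
    }
}
```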

Also, to provide some context: the table we want to scan both ways is effectively an index. It will be relatively small, and we would like to pin it in memory as much as possible. It is also likely that this will run entirely on solid-state drives, so doing reverse reads won't be the perf hit it would be for spinny drives.


                

        

[jira] [Commented] (HBASE-4811) Support reverse Scan

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155573#comment-13155573 ] 

stack commented on HBASE-4811:
------------------------------

I'd suggest you spend more time w/ the code base to see how much effort a reverse scan would require. (Superficially, yes, our MemStore is a NavigableSet, but that is not what the client interacts with; ditto our sstable-like hfile thing.  IIRC leveldb counsels that the reverse range is going against the grain and at a minimum is much slower than the natural scan.)
                

        

[jira] [Updated] (HBASE-4811) Support reverse Scan

Posted by "Nicolas Spiegelberg (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Spiegelberg updated HBASE-4811:
---------------------------------------

    Comment: was deleted

(was: Yeah, I'm not that familiar with the codebase, but I'd assume that in
order to get forward scans you'd have to have the data sorted. And
from what I understand it is internally stored as "sstables" or
HFiles. If you have it sorted to scan in one direction, it seems
pretty easy to go the other direction.  LevelDb uses ssTables and
supports reverse ranges.

The only thing that I could think of from the design (from a high
level) that might make it difficult to do reverse ranges is dealing
with splitting ranges when moving ranges from one region server to
another.

Just from a quick look at MemStore that you mention, it uses a
KeyValueSkipListSet under the covers that is a NavigableSet and
supports descendingSet and descendingIterator.

-jc


On Tue, Nov 22, 2011 at 12:52 PM, stack (Commented) (JIRA)
)
    

        

[jira] [Commented] (HBASE-4811) Support reverse Scan

Posted by "John Carrino (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155961#comment-13155961 ] 

John Carrino commented on HBASE-4811:
-------------------------------------

Digging a little deeper, it appears that this was already planned for when the V2 HFile format was written.  The header of each block holds the offset of the previous block of the same type.  I think this is currently used to support efficient lookups when seeking to a location, but it could also easily be used for reverse scans.
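
A rough sketch of what I mean by using the previous-block offset: start at some block and follow the back pointers. The `Block` class, its field names, and the `-1` "no previous block" sentinel are all assumptions made for the example, not the real HFile V2 layout, and a map keyed by offset stands in for the file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PrevBlockWalkExample {
    // Hypothetical stand-in for a block: only the header field discussed above.
    public static final class Block {
        final long offset;
        final long prevOffsetSameType; // assumed: -1 when there is no earlier block
        final String payload;

        public Block(long offset, long prevOffsetSameType, String payload) {
            this.offset = offset;
            this.prevOffsetSameType = prevOffsetSameType;
            this.payload = payload;
        }
    }

    // Visit blocks from 'startOffset' back to the first one by following prev offsets.
    public static List<String> walkBackwards(Map<Long, Block> file, long startOffset) {
        List<String> out = new ArrayList<>();
        long off = startOffset;
        while (off >= 0) {
            Block b = file.get(off);
            if (b == null) {
                throw new IllegalStateException("no block at offset " + off);
            }
            out.add(b.payload);
            off = b.prevOffsetSameType;
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Long, Block> file = new HashMap<>();
        file.put(0L, new Block(0L, -1L, "block-a"));
        file.put(100L, new Block(100L, 0L, "block-b"));
        file.put(200L, new Block(200L, 100L, "block-c"));
        System.out.println(walkBackwards(file, 200L));
    }
}
```

Within each block the key/values would still have to be read back to front, which this sketch does not attempt.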
                

        

[jira] [Commented] (HBASE-4811) Support reverse Scan

Posted by "John Carrino (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155506#comment-13155506 ] 

John Carrino commented on HBASE-4811:
-------------------------------------

Yeah, I'm not that familiar with the codebase, but I'd assume that in
order to get forward scans you'd have to have the data sorted. And
from what I understand it is internally stored as "sstables" or
HFiles. If you have it sorted to scan in one direction, it seems
pretty easy to go the other direction.  LevelDb uses ssTables and
supports reverse ranges.

The only thing that I could think of from the design (from a high
level) that might make it difficult to do reverse ranges is dealing
with splitting ranges when moving ranges from one region server to
another.

Just from a quick look at MemStore that you mention, it uses a
KeyValueSkipListSet under the covers that is a NavigableSet and
supports descendingSet and descendingIterator.

-jc



                