Posted to issues@hbase.apache.org by "Xiang Li (JIRA)" <ji...@apache.org> on 2016/12/09 08:01:58 UTC

[jira] [Commented] (HBASE-9272) A parallel, unordered scanner

    [ https://issues.apache.org/jira/browse/HBASE-9272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734638#comment-15734638 ] 

Xiang Li commented on HBASE-9272:
---------------------------------

[~lhofhansl] I would like to continue working on this, if you are OK with that. I have also left a note on HBASE-1935 to check with Stack. If I understand correctly, HBASE-1935 and HBASE-9272 share the same goal: implementing a parallel scan. Please correct me if I am wrong.
Could you please share why you stopped working on it at the time? What were you unhappy with? Were there any technical obstacles or challenges that seemed hard to overcome? Feel free to comment on anything! Thanks!

> A parallel, unordered scanner
> -----------------------------
>
>                 Key: HBASE-9272
>                 URL: https://issues.apache.org/jira/browse/HBASE-9272
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Minor
>         Attachments: 9272-0.94-v2.txt, 9272-0.94-v3.txt, 9272-0.94-v4.txt, 9272-0.94.txt, 9272-trunk-v2.txt, 9272-trunk-v3.txt, 9272-trunk-v3.txt, 9272-trunk-v4.txt, 9272-trunk.txt, ParallelClientScanner.java, ParallelClientScanner.java
>
>
> The contract of ClientScanner is to return rows in sort order. That limits the order in which regions can be scanned.
> I propose a simple ParallelScanner that does not have this requirement: it queries regions in parallel and returns whatever comes back first.
> This is generally useful for scans that filter out a lot of data on the server, or in cases where the client can react very quickly to the returned data.
> I have a simple prototype. It doesn't handle errors correctly and might be a bit heavy on synchronization: it uses a BlockingQueue to hand data between the client using the scanner and the threads doing the scanning. It could also potentially starve some scanners long enough for them to time out at the server.
> On the plus side, it's only 130 lines of code. :)
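For illustration, here is a minimal sketch of the BlockingQueue hand-off described in the issue. It is not the attached ParallelClientScanner.java. The assumptions: each "region" is modelled as an in-memory list of row keys (a hypothetical stand-in for an HBase region scanner), the class name ParallelUnorderedScanner is hypothetical, and error handling is largely omitted. One worker per region pushes rows into a shared bounded queue, and the consumer takes rows in whatever order they arrive.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: regions are plain List<String>s, not real HBase regions. */
public class ParallelUnorderedScanner implements Iterator<String>, AutoCloseable {
    // Optional.empty() is the end-of-region marker; present values are rows.
    private final BlockingQueue<Optional<String>> queue;
    private final ExecutorService pool;
    private int remainingRegions;
    private String next; // row buffered by hasNext(), null if none

    public ParallelUnorderedScanner(List<List<String>> regions, int queueCapacity) {
        this.queue = new ArrayBlockingQueue<>(queueCapacity);
        this.remainingRegions = regions.size();
        this.pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        for (List<String> region : regions) {
            pool.submit(() -> {
                try {
                    for (String row : region) {
                        queue.put(Optional.of(row)); // blocks while the queue is full
                    }
                    queue.put(Optional.empty()); // signal that this region is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // close() was called
                }
            });
        }
    }

    @Override
    public boolean hasNext() {
        // Take items until we get a row or every region has signalled completion.
        while (next == null && remainingRegions > 0) {
            Optional<String> item;
            try {
                item = queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for rows", e);
            }
            if (item.isPresent()) {
                next = item.get();
            } else {
                remainingRegions--;
            }
        }
        return next != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String row = next;
        next = null;
        return row;
    }

    @Override
    public void close() {
        pool.shutdownNow(); // interrupts workers still blocked in put()
    }

    public static void main(String[] args) {
        List<List<String>> regions = List.of(
                List.of("a1", "a2", "a3"), List.of("b1", "b2"), List.of("c1"));
        List<String> rows = new ArrayList<>();
        try (ParallelUnorderedScanner scanner = new ParallelUnorderedScanner(regions, 4)) {
            while (scanner.hasNext()) {
                rows.add(scanner.next());
            }
        }
        Collections.sort(rows); // arrival order varies; sort only for stable output
        System.out.println(rows); // prints [a1, a2, a3, b1, b2, c1]
    }
}
```

Rows from a single region still arrive in order relative to each other, because each region has one producer and the queue is FIFO. Rows from different regions interleave arbitrarily. The bounded queue provides backpressure: a slow consumer leaves workers blocked in put(). In a real client, a blocked worker would stop fetching from its region scanner, which is one way the server-side starvation and timeout mentioned above could arise.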



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)