Posted to issues@hbase.apache.org by "Himanshu Vashishtha (JIRA)" <ji...@apache.org> on 2011/05/04 10:20:03 UTC
[jira] [Commented] (HBASE-3607) Cursor functionality for results generated by Coprocessors

    [ https://issues.apache.org/jira/browse/HBASE-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028663#comment-13028663 ] 

Himanshu Vashishtha commented on HBASE-3607:
--------------------------------------------

Here is the revised version of the approach. Some key features (thanks to Gary) are:
a) There is a client-side cursor object (CursorClient) that is instantiated per client request. Internally, instantiating it creates a server-side cursor on each region. These server-side cursors are registered within the CP environment, and registration returns an identifier, the cursorId. CursorClient holds a mapping from each region's reference row to its cursorId.
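
A minimal sketch of the client-side bookkeeping in (a). It assumes reference rows are raw byte[] keys and cursorIds are longs. CursorIdTable and its methods are hypothetical names, not classes from the patch. Because byte[] uses identity equality, the sketch uses a TreeMap ordered by unsigned byte comparison so that lookups match on content:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.TreeMap;

/**
 * Hypothetical sketch of CursorClient's mapping: one server-side cursorId
 * per region, keyed by that region's reference row. Not the patch's code.
 */
final class CursorIdTable {
    // byte[] has identity-based equals/hashCode, so keys are compared by content
    // in unsigned lexicographic order instead of being hashed.
    private final TreeMap<byte[], Long> cursorIdsByRow =
            new TreeMap<byte[], Long>(Arrays::compareUnsigned);

    /** Records the cursorId obtained when a region's server-side cursor was registered. */
    void put(byte[] regionReferenceRow, long cursorId) {
        cursorIdsByRow.put(regionReferenceRow.clone(), cursorId);
    }

    /** Returns the cursorId for a region, or null if that region has no open cursor. */
    Long get(byte[] regionReferenceRow) {
        return cursorIdsByRow.get(regionReferenceRow);
    }

    /** Forgets a region's cursor, e.g. once the server has deregistered it. */
    void remove(byte[] regionReferenceRow) {
        cursorIdsByRow.remove(regionReferenceRow);
    }

    int size() {
        return cursorIdsByRow.size();
    }

    /** Convenience helper for building row keys from strings. */
    static byte[] row(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

For example, after put(row("aaa"), 11L), a lookup with a freshly built array, get(row("aaa")), returns 11. A HashMap<byte[], Long> would return null here because it compares array identity.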

b) The existing CP RPC is reused. When invoking next(), each RPC must carry a different cursorId, i.e., each region must be sent the cursorId that is mapped to its reference row. This is achieved by instantiating an Exec object per call, with its arguments set to that region's cursorId.
Since a cursor can return null as a valid response (when it has exhausted its rows, or on an NSRE, i.e., a NotServingRegionException), MultiResponse accepts a null response and treats it as a valid one. (Please comment if this is too intrusive.)
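
The per-region fan-out and the null-tolerant merge in (b) can be sketched as follows. This is a simplified model, not the patch: NextCall is a hypothetical stand-in for an Exec object, region reference rows are plain Strings, and the responses are assumed to arrive as a map from region row to rows (or null):

```java
import java.util.*;

/** Hypothetical sketch: one next() call per region, each with its own cursorId. */
final class PerRegionNext {
    /** Stand-in for an Exec object aimed at one region; field names are illustrative. */
    static final class NextCall {
        final String regionRow;  // row used to route the call to its region
        final long cursorId;     // the cursor registered on that region

        NextCall(String regionRow, long cursorId) {
            this.regionRow = regionRow;
            this.cursorId = cursorId;
        }
    }

    /** Builds one call per region, each carrying the cursorId registered for that region. */
    static List<NextCall> buildCalls(SortedMap<String, Long> cursorIdsByRow) {
        List<NextCall> calls = new ArrayList<>();
        for (Map.Entry<String, Long> e : cursorIdsByRow.entrySet()) {
            calls.add(new NextCall(e.getKey(), e.getValue()));
        }
        return calls;
    }

    /**
     * Merges the per-region responses. A null response is a valid answer
     * (cursor exhausted, or region not serving): the region is recorded in
     * 'finished' instead of the whole batch being treated as a failure.
     */
    static List<String> merge(Map<String, List<String>> responses, Set<String> finished) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : responses.entrySet()) {
            if (e.getValue() == null) {
                finished.add(e.getKey());
            } else {
                rows.addAll(e.getValue());
            }
        }
        return rows;
    }
}
```

Treating null as data rather than as an error is what lets one exhausted region drop out without affecting the results gathered from the other regions in the same round.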

c) Server-side cursor objects are held under leases, and the lease manager is an attribute of the RegionEnvironment.
The corresponding attributes are defined in HConstants.
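
A rough model of the lease handling in (c). CursorLeases is a hypothetical class, not HBase's own lease manager; the lease period is a plain constructor argument here (the actual HConstants attribute names are not given in the comment), and the current time is passed in explicitly so the behavior is deterministic:

```java
import java.util.*;

/**
 * Hypothetical stand-in for the lease-managed cursor registry kept in the
 * RegionEnvironment. Time is passed in explicitly for determinism.
 */
final class CursorLeases<C> {
    private final long leasePeriodMillis;   // assumed to come from the HConstants settings
    private final Map<Long, C> cursors = new HashMap<>();
    private final Map<Long, Long> deadlines = new HashMap<>();
    private long nextId = 1;

    CursorLeases(long leasePeriodMillis) {
        this.leasePeriodMillis = leasePeriodMillis;
    }

    /** Registers a server-side cursor under a fresh lease and returns its cursorId. */
    long register(C cursor, long nowMillis) {
        long id = nextId++;
        cursors.put(id, cursor);
        deadlines.put(id, nowMillis + leasePeriodMillis);
        return id;
    }

    /** Returns the cursor and renews its lease, or null if unknown or its lease has run out. */
    C renew(long cursorId, long nowMillis) {
        Long deadline = deadlines.get(cursorId);
        if (deadline == null) {
            return null;                    // never registered, or already removed
        }
        if (deadline <= nowMillis) {
            deregister(cursorId);           // lease expired before this call arrived
            return null;
        }
        deadlines.put(cursorId, nowMillis + leasePeriodMillis);
        return cursors.get(cursorId);
    }

    /** Removes a cursor and its lease immediately. */
    void deregister(long cursorId) {
        cursors.remove(cursorId);
        deadlines.remove(cursorId);
    }

    /** Sweeps every cursor whose lease has run out; returns how many were removed. */
    int expire(long nowMillis) {
        int removed = 0;
        Iterator<Map.Entry<Long, Long>> it = deadlines.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> e = it.next();
            if (e.getValue() <= nowMillis) {
                cursors.remove(e.getKey());
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return cursors.size();
    }
}
```

With a 1000 ms lease, a cursor registered at t=0 and used at t=500 stays alive until t=1500. A client that simply stops calling next() lets its cursor expire, so abandoned cursors cannot accumulate on the region server.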

d) I used an NGram tester to test the design and implementation. It is modelled after the Google NGram dataset, and it should give an idea of how to use streaming results. The current test class creates 20 rows in 3 different regions and then reads them with a cursor batch size of 2.
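
The test's read pattern can be simulated in plain Java. Assumptions: each region's cursor returns at most batchSize rows per next() and returns null on the first call after its rows run out (consistent with (e) below); the 7/7/6 split of the 20 rows across the 3 regions is hypothetical, since the comment does not give it. BatchedRead and its members are illustrative names:

```java
import java.util.*;

/**
 * Hypothetical simulation of the test's read pattern: one cursor per region,
 * each next() returning at most batchSize rows, and null once exhausted.
 */
final class BatchedRead {
    /** In-memory stand-in for one region's server-side cursor. */
    static final class RegionCursor {
        private final List<String> rows;
        private int pos = 0;

        RegionCursor(List<String> rows) {
            this.rows = rows;
        }

        /** Returns up to batchSize rows, or null once no rows remain. */
        List<String> next(int batchSize) {
            if (pos >= rows.size()) {
                return null;
            }
            int end = Math.min(pos + batchSize, rows.size());
            List<String> batch = new ArrayList<>(rows.subList(pos, end));
            pos = end;
            return batch;
        }
    }

    static final class Result {
        final List<String> rows;
        final int nextCalls;

        Result(List<String> rows, int nextCalls) {
            this.rows = rows;
            this.nextCalls = nextCalls;
        }
    }

    /** Calls next() on every live cursor each round, dropping a cursor once it returns null. */
    static Result readAll(List<RegionCursor> cursors, int batchSize) {
        List<String> out = new ArrayList<>();
        List<RegionCursor> live = new ArrayList<>(cursors);
        int calls = 0;
        while (!live.isEmpty()) {
            Iterator<RegionCursor> it = live.iterator();
            while (it.hasNext()) {
                List<String> batch = it.next().next(batchSize);
                calls++;
                if (batch == null) {
                    it.remove();            // exhausted: stop calling this region
                } else {
                    out.addAll(batch);
                }
            }
        }
        return new Result(out, calls);
    }

    /** Builds n synthetic row keys such as "a-0", "a-1", ... */
    static List<String> rows(String prefix, int n) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            keys.add(prefix + "-" + i);
        }
        return keys;
    }
}
```

Under these assumptions a region holding n rows costs ceil(n/b) + 1 next() calls, so the hypothetical 7/7/6 split read with b = 2 takes (4+1) + (4+1) + (3+1) = 14 calls to return all 20 rows. A larger batch size cuts the number of round trips, much like scanner caching.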

e) If the result from any cursor is null for an iteration (which may be due to an exception, or because the cursor has exhausted the region's rows), the server-side cursor object is deregistered immediately. Therefore, the client doesn't need to call any close method per se. (I admit this is not the most intuitive behavior, though :) )
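
A minimal sketch of the self-deregistering behavior in (e), assuming each server-side cursor is backed by an Iterator<String>. SelfClosingCursors is a hypothetical name. Note that in this sketch, rows gathered in a batch that fails midway are discarded along with the cursor:

```java
import java.util.*;

/**
 * Hypothetical server-side registry whose cursors deregister themselves:
 * a cursor is removed the moment it yields null, so no close() is needed.
 */
final class SelfClosingCursors {
    private final Map<Long, Iterator<String>> open = new HashMap<>();
    private long nextId = 1;

    /** Registers a cursor over the given rows and returns its cursorId. */
    long register(Iterator<String> source) {
        long id = nextId++;
        open.put(id, source);
        return id;
    }

    /**
     * Returns up to batchSize rows (batchSize is assumed to be at least 1), or
     * null if the cursor is exhausted, unknown, or its source throws.
     * Exhaustion and failure also deregister the cursor.
     */
    List<String> next(long cursorId, int batchSize) {
        Iterator<String> source = open.get(cursorId);
        if (source == null) {
            return null;                  // never registered, or already deregistered
        }
        List<String> batch = new ArrayList<>();
        try {
            while (batch.size() < batchSize && source.hasNext()) {
                batch.add(source.next());
            }
        } catch (RuntimeException e) {
            open.remove(cursorId);        // failure: drop the cursor and any partial batch
            return null;
        }
        if (batch.isEmpty()) {
            open.remove(cursorId);        // exhausted: drop the cursor right away
            return null;
        }
        return batch;
    }

    boolean isOpen(long cursorId) {
        return open.containsKey(cursorId);
    }
}
```

Reading the rows "a", "b", "c" with a batch size of 2 yields [a, b], then [c], then null. After the null, the cursorId is no longer registered, which is why the client never has to call close().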


> Cursor functionality for results generated by Coprocessors
> ----------------------------------------------------------
>
>                 Key: HBASE-3607
>                 URL: https://issues.apache.org/jira/browse/HBASE-3607
>             Project: HBase
>          Issue Type: New Feature
>          Components: coprocessors
>            Reporter: Himanshu Vashishtha
>         Attachments: patch-2.txt
>
>
> I tried to come up with scanner-like functionality for results generated by coprocessors at the region level.
> This is just a PoC, and it would be good to have your comments on it.
> It supports both incremental and in-memory result sets. The attached patch has a test case for an incremental result (i.e., the client receives a cursorId from the CP core method, instantiates a cursor object, and iterates over the result set). The client can set a cache limit on the CursorCallable object to reduce the number of RPCs, just like with scanners.
> In its current state, it has some limitations too :), e.g., it is region-specific: one can instantiate and use a cursor on one region only (and that region is determined by the input row when instantiating the cursor). I will try to extend it so that it has at least sequential access to other regions, but as I said, I want the opinion of experts on whether this approach really makes sense.
> I have tested it with the built-in testing framework, on my laptop only.
> It will be good to copy the use case here in the description too:
> The scenario is that the test table has these row keys:
>   'aaa-123'
>   'aaa-456'
>   'abc-111'
>   'abd-111'
>   'abd-222'
> and I want to return:
>   ('aaa', 2)
>   ('abc', 1)
>   ('abd', 2)
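
The use case above can be sketched in plain Java. Beyond what the description states, it assumes that each region computes partial counts for its own rows and the client sums them, because a prefix such as "abd" may span a region boundary. PrefixCount is a hypothetical name, and the two-region split in main() is illustrative:

```java
import java.util.*;

/**
 * Sketch of the use case: count row keys by the prefix before the first '-'.
 * Assumption: regions produce partial counts and the client merges them.
 */
final class PrefixCount {
    /** Per-region step: counts keys by prefix; keys without '-' count under the whole key. */
    static SortedMap<String, Integer> countByPrefix(List<String> rowKeys) {
        SortedMap<String, Integer> counts = new TreeMap<>();
        for (String key : rowKeys) {
            int dash = key.indexOf('-');
            String prefix = dash >= 0 ? key.substring(0, dash) : key;
            counts.merge(prefix, 1, Integer::sum);
        }
        return counts;
    }

    /** Client-side step: sums partial counts coming from several regions. */
    static SortedMap<String, Integer> mergePartials(List<SortedMap<String, Integer>> partials) {
        SortedMap<String, Integer> total = new TreeMap<>();
        for (SortedMap<String, Integer> partial : partials) {
            partial.forEach((prefix, n) -> total.merge(prefix, n, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical split of the five example keys across two regions.
        List<String> region1 = List.of("aaa-123", "aaa-456", "abc-111", "abd-111");
        List<String> region2 = List.of("abd-222");
        SortedMap<String, Integer> result =
                mergePartials(List.of(countByPrefix(region1), countByPrefix(region2)));
        System.out.println(result); // {aaa=2, abc=1, abd=2}
    }
}
```

Without the client-side merge, the split above would report ('abd', 1) twice instead of ('abd', 2) once.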

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira