Posted to dev@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2009/12/01 03:31:25 UTC

[jira] Commented: (HBASE-2001) Coprocessors: Colocate arbitrary code with regions

    [ https://issues.apache.org/jira/browse/HBASE-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784016#action_12784016 ] 

Andrew Purtell commented on HBASE-2001:
---------------------------------------

{quote}
"Regions contain references to the coprocessor implementation classes associated with them."
Q: On the above, it's indeed the classes, not objects?  Can objects cross the split?  Not easily, anyway.
{quote}

When a region splits, new coprocessor object instances would be allocated on the daughters, one instance for each coprocessor class listed in the region metadata. This happens while each daughter is opening, and each new instance's onOpen method is invoked to give it a chance to initialize. Before that, the parent would be informed of the impending split via an onSplit invocation, and when the parent closes, the coprocessor's onClose method would be called so it can clean up. Managing the split beyond this would be the coprocessor's problem.
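To make the ordering concrete, here is a minimal sketch of that lifecycle. Only the callback names onOpen, onSplit and onClose come from the comment above; the CoprocessorSketch interface, the RegionSketch class, the method signatures and the "-a"/"-b" daughter names are assumptions made up for illustration and are not the attached HCoprocessor API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface; signatures are assumptions.
interface CoprocessorSketch {
    void onOpen(String regionName);
    void onSplit(String regionName);
    void onClose(String regionName);
}

// Stand-in for a region: it carries coprocessor classes, not shared objects.
final class RegionSketch {
    final String name;
    final List<Class<? extends CoprocessorSketch>> coprocessorClasses;
    final List<CoprocessorSketch> instances = new ArrayList<>();

    RegionSketch(String name, List<Class<? extends CoprocessorSketch>> classes) {
        this.name = name;
        this.coprocessorClasses = classes;
    }

    // Opening: allocate one fresh instance per listed class, then call onOpen.
    void open() {
        for (Class<? extends CoprocessorSketch> c : coprocessorClasses) {
            try {
                CoprocessorSketch cp = c.getDeclaredConstructor().newInstance();
                instances.add(cp);
                cp.onOpen(name);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot instantiate " + c.getName(), e);
            }
        }
    }

    // Split: notify the parent's instances, close the parent, then open
    // two daughters that allocate their own instances from the class list.
    List<RegionSketch> split() {
        for (CoprocessorSketch cp : instances) cp.onSplit(name);
        for (CoprocessorSketch cp : instances) cp.onClose(name);
        instances.clear();
        List<RegionSketch> daughters = new ArrayList<>();
        for (String suffix : new String[] {"-a", "-b"}) {
            RegionSketch daughter = new RegionSketch(name + suffix, coprocessorClasses);
            daughter.open();
            daughters.add(daughter);
        }
        return daughters;
    }
}

// Example coprocessor that records the callbacks it receives.
final class RecordingCoprocessor implements CoprocessorSketch {
    static final List<String> EVENTS = new ArrayList<>();
    public void onOpen(String regionName)  { EVENTS.add("open:" + regionName); }
    public void onSplit(String regionName) { EVENTS.add("split:" + regionName); }
    public void onClose(String regionName) { EVENTS.add("close:" + regionName); }
}
```

In this sketch no object crosses the split: each daughter gets its own instances, and anything the parent's instance wants to hand over would have to be arranged by the coprocessor itself.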

{quote}
Do we need both closing and pendingClose? [...]
{quote}

I found that state transition in the master code and copied it verbatim from a comment block. Actually coprocessors only go through three states: opening, open, closing. 

{quote}
Why no control over flush?  Maybe it would want to hold up a flush?  You think that too dangerous?
{quote}

I do think that is too dangerous. 

{quote}
Rather, should we use the Java events model, where one method gets all event types and the passed-in object says what the event is?  In the method, the first thing you do is check whether it's an event you are interested in.  That makes things easier to implement, especially if you are only implementing part of the functionality.  This model may not make sense in this context, though, or may be overkill (see java.util.EventObject and some of its implementations).
{quote}

I thought about that and have gone back and forth. An explicit interface is also self-documenting, while arcane gotchas can hide in event-specific detail. There's also the notion of using ASM to weave in policy enforcement, which could be easier if each callback is its own well-defined method. On the other hand, there's a lot of foo() { super(); } crap for each callback that a coprocessor does not care about. My current thinking is that the latter does not outweigh the former.
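For comparison, here is a rough sketch of the two styles side by side. All names in it (RegionEventType, RegionEvent, RegionEventListener, RegionCallbacks, RegionCallbacksAdapter, CallbackDispatcher) are hypothetical; only java.util.EventObject is a real JDK class. The no-op adapter is one conventional way to cut down the per-callback boilerplate and is an assumption of this sketch, not something proposed in the issue.

```java
import java.util.EventObject;

// Style 1: a single method receives every event and inspects its type.
enum RegionEventType { OPEN, CLOSE, SPLIT, COMPACT }

final class RegionEvent extends EventObject {
    final RegionEventType type;

    RegionEvent(Object source, RegionEventType type) {
        super(source);
        this.type = type;
    }
}

interface RegionEventListener {
    void onEvent(RegionEvent event);
}

// Style 2: an explicit interface with one well-defined method per callback.
interface RegionCallbacks {
    void onOpen();
    void onClose();
    void onSplit();
    void onCompact();
}

// No-op base class so an implementation overrides only what it cares about.
abstract class RegionCallbacksAdapter implements RegionCallbacks {
    public void onOpen() {}
    public void onClose() {}
    public void onSplit() {}
    public void onCompact() {}
}

// Bridges the two styles: routes a generic event to the explicit callback.
final class CallbackDispatcher {
    static void dispatch(RegionEvent event, RegionCallbacks target) {
        switch (event.type) {
            case OPEN:    target.onOpen();    break;
            case CLOSE:   target.onClose();   break;
            case SPLIT:   target.onSplit();   break;
            case COMPACT: target.onCompact(); break;
        }
    }
}
```

With the explicit style, each callback is a separate method that tooling can target individually, which is the property the ASM argument above relies on.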

By the way, I am thinking about using ASM to weave in CPU and memory accounting and limit enforcement as a generic code safety policy regardless.

{quote}
Will Coprocessors make for lots of new object instantiations?  It's going to be invoked on each Get and Scan.
{quote}

Not unless the coprocessor does it. 

{quote}
The logging interface seems odd.  Why define a new one?  Why not just use apache logging?
{quote}

The idea is that no I/O outside of the interface is allowed. There will be an additional verification step at classload time, implemented with ASM, that checks against a whitelist. Making the whitelist, to the extent possible, a single interface is a simplifying choice.
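As a rough illustration of the whitelist check only, the sketch below assumes the set of class names a coprocessor references has already been extracted from its bytecode (for example by an ASM visitor, which is not shown). WhitelistVerifier, its methods and the example class names are hypothetical, and this is not the planned ASM-based verifier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical classload-time check: every class a coprocessor references
// must appear on the whitelist, otherwise loading is refused.
final class WhitelistVerifier {
    private final Set<String> allowedClasses;

    WhitelistVerifier(Set<String> allowedClasses) {
        this.allowedClasses = allowedClasses;
    }

    // Returns the referenced class names that are not on the whitelist.
    List<String> violations(Iterable<String> referencedClasses) {
        List<String> bad = new ArrayList<>();
        for (String name : referencedClasses) {
            if (!allowedClasses.contains(name)) {
                bad.add(name);
            }
        }
        return bad;
    }

    // Throws if the coprocessor references anything outside the whitelist.
    void verify(String coprocessorClass, Iterable<String> referencedClasses) {
        List<String> bad = violations(referencedClasses);
        if (!bad.isEmpty()) {
            throw new SecurityException(
                coprocessorClass + " references non-whitelisted classes: " + bad);
        }
    }
}
```

If the whitelist is mostly a single host interface, a coprocessor that logs through that interface passes, while one that reaches for java.io.FileOutputStream or a logging library directly would be rejected at load time.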

{quote}
Should we be extracting an interface from Region so we can have a Region implementation and so your Coprocessor can have an implementation too?  We sort of did something like this with the "Incommon" interface we have for testing, which allows for implementations that run the same tests, now against the Region and then against the client side.  Extracting an 'official' Region interface sounds grand to me... would help with testing?
{quote}

That's a good idea. Should be a separate issue? 

{quote}
How does the PrivateStore persist?  Where?  What you thinking?
{quote}

One PrivateStore for each coprocessor would persist as an HFile+log in the region's store. Would be cloned into daughters on split. Would get periodic compaction whenever the store is compacted. The general idea is to do something less than manage a real table in a way that hooks in naturally with store management. I gave it a table interface but it could be just a bag of KVs if supporting multiple column families in a single HFile+log is too much trouble. 
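A minimal in-memory sketch of the "bag of KVs" variant, showing only the clone-on-split behavior. PrivateStoreSketch and its methods are hypothetical names; persistence as an HFile+log and compaction are deliberately not modeled.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-coprocessor private store: a sorted bag of key/values.
final class PrivateStoreSketch {
    private final TreeMap<String, byte[]> kvs = new TreeMap<>();

    // Stores a defensive copy so callers cannot mutate stored bytes.
    void put(String key, byte[] value) {
        kvs.put(key, value.clone());
    }

    // Returns a copy of the stored value, or null if the key is absent.
    byte[] get(String key) {
        byte[] value = kvs.get(key);
        return value == null ? null : value.clone();
    }

    int size() {
        return kvs.size();
    }

    // On split, each daughter receives its own full copy of the contents.
    PrivateStoreSketch cloneForDaughter() {
        PrivateStoreSketch copy = new PrivateStoreSketch();
        for (Map.Entry<String, byte[]> e : kvs.entrySet()) {
            copy.put(e.getKey(), e.getValue());
        }
        return copy;
    }
}
```

After the clone, a daughter's writes do not affect the parent's store or its sibling's.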



> Coprocessors: Colocate arbitrary code with regions
> --------------------------------------------------
>
>                 Key: HBASE-2001
>                 URL: https://issues.apache.org/jira/browse/HBASE-2001
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>         Attachments: asm-3.2-bin.zip, asm-transformations.pdf, org.apache.hadoop.hbase.HCoprocessor.java, org.apache.hadoop.hbase.HCoprocessor.pdf
>
>
> "Support arbitrary code that runs next to each region in a table. As regions split and move, coprocessor code should automatically move also."
> Use classloader which looks on HDFS.
> Associate a list of classes to load with each table. Put this in HRI so it inherits from table but can be changed on a per region basis (so then those region specific changes can be inherited by daughters).
> Not completely arbitrary code, should require implementation of an interface with callbacks for:
> * Open
> * Close
> * Split
> * Compact
> * (Multi)get and scanner next()
> * (Multi)put
> * (Multi)delete
> Add method to HRegionInterface for invoking coprocessor methods and retrieving results.  
> Add methods in o.a.h.h.regionserver or subpackage which implement convenience functions for coprocessor methods and consistent/controlled access to internals: store access, threading, persistent and ephemeral state, scratch storage, etc. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.