Posted to issues@hbase.apache.org by "Michael Segel (JIRA)" <ji...@apache.org> on 2015/04/10 14:34:12 UTC

[jira] [Commented] (HBASE-4047) [Coprocessors] Generic external process host

    [ https://issues.apache.org/jira/browse/HBASE-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14489529#comment-14489529 ] 

Michael Segel  commented on HBASE-4047:
---------------------------------------

Sorry, I don't always follow JIRAs.

To answer your question... in terms of patches, it would be a massive rewrite and would probably break the existing code base using coprocessors today. 
In terms of me providing a patch... will Apache indemnify me if I get sued for introducing IP that I may have used or learned at a former company / client? 
(Didn't think so.)  

I can tell you what you need and I can pencil out a design.  But that's as far as I can go. 

In terms of a strong requirement: by creating a flag that stops the loading of coprocessor code after the system coprocessors are loaded, the security issue is reduced to the point that the requirement goes away.  There is a large enough client who could make that request of one of the vendors; however, they are not using HBase at a level where they are implementing coprocessors.
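
As a rough illustration of the flag idea above (a sketch only: the class, method, and flag names are hypothetical, and this is not HBase code or an existing HBase setting), a loader could accept system coprocessors during startup and then refuse everything that arrives later:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical load gate; illustrates the flag only, not HBase's coprocessor host. */
final class CoprocessorLoadGate {
    private final boolean lockAfterSystemLoad; // the proposed flag (assumed name)
    private final List<String> loaded = new ArrayList<>();
    private boolean systemLoadFinished = false;

    CoprocessorLoadGate(boolean lockAfterSystemLoad) {
        this.lockAfterSystemLoad = lockAfterSystemLoad;
    }

    /** Registers a system coprocessor; only legal during startup. */
    synchronized void loadSystem(String className) {
        if (systemLoadFinished) {
            throw new IllegalStateException("system coprocessors are already loaded");
        }
        loaded.add(className);
    }

    /** Marks the end of startup; with the flag set, the gate is closed from here on. */
    synchronized void finishSystemLoad() {
        systemLoadFinished = true;
    }

    /** Tries to register a user (e.g. table-level) coprocessor; returns false if refused. */
    synchronized boolean tryLoadUser(String className) {
        if (lockAfterSystemLoad && systemLoadFinished) {
            return false; // no further coprocessor code is loaded
        }
        loaded.add(className);
        return true;
    }

    /** Returns a copy of the class names accepted so far. */
    synchronized List<String> loaded() {
        return new ArrayList<>(loaded);
    }
}
```

With the flag off, the gate behaves like an ordinary loader, so existing deployments that rely on user coprocessors would be unaffected.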

Outside of a requirement: the issue is that using coprocessors adds risk to the system, in terms of performance, stability, and security.  It also causes issues when it comes to maintenance.  If you want to remove (not just shut off) a coprocessor, you can't do so without restarting the RS and reloading the coprocessors that you want loaded (e.g. in the case of a class collision).

Coprocessors are necessary for extending HBase beyond a simple object store.  Security (XASecure / Ranger) requires them.  Adding OLTP and RDBMS-like features (transactions, isolation levels) is also important to many.  Fixing issues with compactions... 

But I digress.   

> [Coprocessors] Generic external process host
> --------------------------------------------
>
>                 Key: HBASE-4047
>                 URL: https://issues.apache.org/jira/browse/HBASE-4047
>             Project: HBase
>          Issue Type: New Feature
>          Components: Coprocessors
>            Reporter: Andrew Purtell
>
> Where HBase coprocessors deviate substantially from the design (as I understand it) of Google's BigTable coprocessors is we've reimagined it as a framework for internal extension. In contrast BigTable coprocessors run as separate processes colocated with tablet servers. The essential trade off is between performance, flexibility and possibility, and the ability to control and enforce resource usage.
> Since the initial design of HBase coprocessors some additional considerations are in play:
> - Developing computational frameworks sitting directly on top of HBase hosted in coprocessor(s);
> - Introduction of the map reduce next generation (mrng) resource management model, and the probability that limits will be enforced via cgroups at the OS level after this is generally available, e.g. when RHEL 6 deployments are common;
> - The possibility of deployment of HBase onto mrng-enabled Hadoop clusters via the mrng resource manager and a HBase-specific application controller.
> Therefore we should consider developing a coprocessor that is a generic host for another coprocessor, but one that forks a child process, loads the target coprocessor into the child, establishes a bidirectional pipe and uses an eventing model and umbilical protocol to provide for the coprocessor loaded into the child the same semantics as if it was loaded internally to the parent, and (eventually) use available resource management capabilities on the platform -- perhaps via the mrng resource controller or directly with cgroups -- to limit the child as desired by system administrators or the application designer.
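
The parent/child arrangement in the description above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the design proposed in the issue: the names Umbilical, ParentProxy, and serve are hypothetical, the "protocol" is just one UTF string per event and one per reply, and the fork itself is left out. A real host would presumably start the child with ProcessBuilder and take the two streams from Process.getOutputStream() and Process.getInputStream(); resource limits (cgroups or otherwise) are not shown at all.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.function.Function;

/** Hypothetical umbilical protocol: one UTF-encoded event in, one UTF-encoded reply out. */
final class Umbilical {
    private Umbilical() {}

    /** Parent side: forwards a hook event to the child and blocks until it replies. */
    static final class ParentProxy {
        private final DataOutputStream toChild;
        private final DataInputStream fromChild;

        ParentProxy(OutputStream toChild, InputStream fromChild) {
            this.toChild = new DataOutputStream(toChild);
            this.fromChild = new DataInputStream(fromChild);
        }

        /** Sends one event and waits for the matching reply (strictly one at a time). */
        synchronized String fire(String event) throws IOException {
            toChild.writeUTF(event);
            toChild.flush();
            return fromChild.readUTF();
        }
    }

    /** Child side: hands each event to the target coprocessor until the pipe closes. */
    static void serve(InputStream fromParent, OutputStream toParent,
                      Function<String, String> coprocessor) throws IOException {
        DataInputStream in = new DataInputStream(fromParent);
        DataOutputStream out = new DataOutputStream(toParent);
        try {
            while (true) {
                String event = in.readUTF();            // blocks until the parent sends
                out.writeUTF(coprocessor.apply(event)); // reply goes back up the pipe
                out.flush();
            }
        } catch (EOFException endOfPipe) {
            // The parent closed its end of the pipe; leave the loop normally.
        }
    }
}
```

Because the proxy blocks on every event, the parent sees the same call-and-return semantics as an in-process coprocessor, at the cost of a pipe round trip per hook. That is the performance side of the trade-off the description mentions.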



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)