Posted to issues@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2011/06/30 18:30:28 UTC

[jira] [Created] (HBASE-4047) [Coprocessors] Generic external process host

[Coprocessors] Generic external process host
--------------------------------------------

                 Key: HBASE-4047
                 URL: https://issues.apache.org/jira/browse/HBASE-4047
             Project: HBase
          Issue Type: New Feature
          Components: coprocessors
            Reporter: Andrew Purtell


Where HBase coprocessors deviate substantially from the design (as I understand it) of Google's BigTable coprocessors is that we've reimagined them as a framework for internal extension. In contrast, BigTable coprocessors run as separate processes colocated with tablet servers. The essential trade-off is between performance, flexibility and possibility on one side, and the ability to control and enforce resource usage on the other.

Since the initial design of HBase coprocessors some additional considerations are in play:

- Developing computational frameworks sitting directly on top of HBase hosted in coprocessor(s);

- Introduction of the map reduce next generation (mrng) resource management model, and the probability that limits will be enforced via cgroups at the OS level after this is generally available, e.g. when RHEL 6 deployments are common;

- The possibility of deployment of HBase onto mrng-enabled Hadoop clusters via the mrng resource manager and an HBase-specific application controller.

Therefore we should consider developing a coprocessor that is a generic host for another coprocessor. It would fork a child process, load the target coprocessor into the child, establish a bidirectional pipe, and use an eventing model and umbilical protocol to give the coprocessor loaded into the child the same semantics as if it were loaded internally to the parent. Eventually it would also use available resource management capabilities on the platform -- perhaps via the mrng resource controller or directly with cgroups -- to limit the child as desired by system administrators or the application designer.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4047) [Coprocessors] Generic external process host

Posted by "Asaf Mesika (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499401#comment-13499401 ] 

Asaf Mesika commented on HBASE-4047:
------------------------------------

This truly sounds like a great feature. I was wondering:
* Did you find any performance penalties for shifting data back and forth between the processes? Did you do this using the loopback interface?
* What method did you choose to communicate between those processes? TCP? Output stream piping?

                

[jira] [Commented] (HBASE-4047) [Coprocessors] Generic external process host

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499113#comment-13499113 ] 

Andrew Purtell commented on HBASE-4047:
---------------------------------------

[~asafm] Not at all, but I had to move to more immediate priorities. Hoping to circle back and do this in 2013.
                

[jira] [Commented] (HBASE-4047) [Coprocessors] Generic external process host

Posted by "eric baldeschwieler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068462#comment-13068462 ] 

eric baldeschwieler commented on HBASE-4047:
--------------------------------------------

Interesting!  I've been very concerned about the implications for multi-tenant use cases of implementing co-processors hosted inside HBase.  This seems like a very good idea.  Once 0.23 is real, I'll see what I can do to help with this.  I've also been thinking about HBase inside MR as you propose.  Is there a jira for that?


        

[jira] [Commented] (HBASE-4047) [Coprocessors] Generic external process host

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499577#comment-13499577 ] 

Andrew Purtell commented on HBASE-4047:
---------------------------------------

[~asafm] I didn't get beyond some early high-level thoughts, so there is no data. There will surely be some performance penalty, though, because we must introduce an RPC mechanism between the RegionServer and the child external coprocessor host.

It seems reasonable that the external coprocessor host should handle all IPC issues: it would use Process/ProcessBuilder to launch a child process for hosting the user coprocessor code and get access to its stdin and stdout.
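
To make the Process/ProcessBuilder idea concrete, here is a minimal sketch of a parent that launches a child and talks to it over the child's stdin and stdout. Everything in it is an assumption made for illustration: the class name ChildHostSketch, the one-line-request/one-line-reply exchange, and the choice to leave stderr attached to the parent are hypothetical and are not part of HBase or of any agreed design for this issue.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical sketch only: a parent-side handle on one child process. */
final class ChildHostSketch implements AutoCloseable {
    private final Process child;
    private final BufferedWriter toChild;   // wraps the child's stdin
    private final BufferedReader fromChild; // wraps the child's stdout

    ChildHostSketch(List<String> command) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Keep the child's stderr out of the protocol stream.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        this.child = pb.start();
        this.toChild = new BufferedWriter(
            new OutputStreamWriter(child.getOutputStream(), StandardCharsets.UTF_8));
        this.fromChild = new BufferedReader(
            new InputStreamReader(child.getInputStream(), StandardCharsets.UTF_8));
    }

    /** Sends one line to the child and blocks until the child answers with one line. */
    String call(String request) throws IOException {
        toChild.write(request);
        toChild.write('\n');
        toChild.flush();
        return fromChild.readLine(); // null if the child closed its stdout
    }

    /** True while the child is running; a host could relaunch the child when this turns false. */
    boolean isAlive() {
        return child.isAlive();
    }

    @Override
    public void close() throws IOException, InterruptedException {
        toChild.close();  // the child sees EOF on stdin
        child.waitFor();  // wait for the child to exit
    }
}
```

A real host would pass a command line that starts a child JVM with the user coprocessor on its classpath. For a quick experiment on a POSIX system, any program that echoes its stdin (for example cat) can stand in for the child, in which case call("ping") returns "ping".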

We will need to introduce a new type of Observer to the coprocessor framework that can be a singleton watching all regions in the RS. Currently we allocate a coprocessor environment for each region and an Observer can only see what goes on in that environment (for only that region). Otherwise you can imagine that an RS hosting 1000 regions might need 1000 threads just for IPC, with the external coprocessor host in the RS talking not to one child but to 1000. That's a nonstarter. So we want one coprocessor in the RS managing communication to one child, and both parent+child handle all Observer (and Endpoint) actions on all regions, using NIO to multiplex communication among the input and output streams set up by Process/ProcessBuilder. How efficiently this can be done and how low latency it can be kept will determine the performance penalty for external coprocessors.
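
One way to picture a single parent and a single child handling events for all regions is to tag every message with the region it belongs to, so that one stream pair carries traffic for every region. The sketch below shows only that framing idea. The frame layout, the class names RegionFrame and SharedChannel, and the use of an int region id are all hypothetical. It also uses blocking java.io streams with a synchronized writer for brevity rather than the NIO multiplexing described above.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical frame layout: [regionId:int][hook:UTF][payloadLength:int][payload bytes]. */
final class RegionFrame {
    final int regionId;   // which region the event belongs to (illustrative id type)
    final String hook;    // name of the Observer hook, e.g. "prePut" (illustrative)
    final byte[] payload; // opaque serialized arguments

    RegionFrame(int regionId, String hook, byte[] payload) {
        this.regionId = regionId;
        this.hook = hook;
        this.payload = payload;
    }

    void writeTo(DataOutputStream out) throws IOException {
        out.writeInt(regionId);
        out.writeUTF(hook);
        out.writeInt(payload.length);
        out.write(payload);
    }

    static RegionFrame readFrom(DataInputStream in) throws IOException {
        int regionId = in.readInt();
        String hook = in.readUTF();
        byte[] payload = new byte[in.readInt()];
        in.readFully(payload);
        return new RegionFrame(regionId, hook, payload);
    }
}

/** One writer shared by all regions; synchronized so frames are never interleaved. */
final class SharedChannel {
    private final DataOutputStream out;

    SharedChannel(OutputStream raw) {
        this.out = new DataOutputStream(raw);
    }

    synchronized void send(RegionFrame frame) throws IOException {
        frame.writeTo(out);
        out.flush();
    }
}
```

With this shape, the stream passed to SharedChannel would be the child's stdin, and a single reader thread on each side would call RegionFrame.readFrom in a loop and dispatch on regionId, so the number of IPC threads stays constant no matter how many regions the RS hosts.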
                

[jira] [Commented] (HBASE-4047) [Coprocessors] Generic external process host

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094958#comment-13094958 ] 

Arun C Murthy commented on HBASE-4047:
--------------------------------------

Andrew, there are several non-MR frameworks being built on NextGen MR right now - happy to help more if you are planning on using it:

# Spark - https://github.com/mesos/spark-yarn/
# MPI - MAPREDUCE-2911


        

[jira] [Commented] (HBASE-4047) [Coprocessors] Generic external process host

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067420#comment-13067420 ] 

Arun C Murthy commented on HBASE-4047:
--------------------------------------

Andrew, sounds exciting! Glad to help in any way possible.

Some questions: 
# Is 'generic host co-processor' a system process i.e. managed by HBase itself?
# Does the 'generic host' co-processor live forever on each region server? Or is it launched on demand?



        

[jira] [Commented] (HBASE-4047) [Coprocessors] Generic external process host

Posted by "Asaf Mesika (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499046#comment-13499046 ] 

Asaf Mesika commented on HBASE-4047:
------------------------------------

Was this idea abandoned?

                

[jira] [Commented] (HBASE-4047) [Coprocessors] Generic external process host

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068669#comment-13068669 ] 

Andrew Purtell commented on HBASE-4047:
---------------------------------------

@Arun:

1. A child JVM managed by HBase itself.

2. (Re)Launched on demand. 



        

[jira] [Commented] (HBASE-4047) [Coprocessors] Generic external process host

Posted by "Andrew Purtell (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125523#comment-13125523 ] 

Andrew Purtell commented on HBASE-4047:
---------------------------------------

Getting to this a bit late, thinking about design.

Here are some possible motivating cases:

  - A hot value cache implemented in C/C++

  - Indexing and search with Lucene indexes hosted on a colocated (impl bundled/linked with the external coprocessor and private to it) R+W distributed FS like Gluster

  - Support something we are building internally that requires efficient hand off of HFiles between processes for compaction strategy override.

Suggestions welcome, preferably useful to real activities you may be undertaking.

                

        

[jira] [Commented] (HBASE-4047) [Coprocessors] Generic external process host

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058870#comment-13058870 ] 

stack commented on HBASE-4047:
------------------------------

wow


        

[jira] [Commented] (HBASE-4047) [Coprocessors] Generic external process host

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068674#comment-13068674 ] 

Andrew Purtell commented on HBASE-4047:
---------------------------------------

@Eric Ideas for frameworks for computation on top of HBase can be found in HBASE-3131 and HBASE-3220. Also, the proof of concept patch for HBASE-2000 included a toy mapreduce framework.



        

[jira] [Assigned] (HBASE-4047) [Coprocessors] Generic external process host

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell reassigned HBASE-4047:
-------------------------------------

    Assignee: Andrew Purtell

Assigning to myself, as I will start implementing this in mid-September.

If someone wants to work on this sooner, no problem, just reassign.



        

[jira] [Commented] (HBASE-4047) [Coprocessors] Generic external process host

Posted by "Andrew Purtell (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170501#comment-13170501 ] 

Andrew Purtell commented on HBASE-4047:
---------------------------------------

Start with test cases. The first should confirm that a child process stuck via SIGSTOP doesn't take down the regionserver. I'm thinking there should be three selectable strategies for recovery:

1. Close and reopen the region, triggering forced termination of the stuck child on close and fork/initialization of a new child on open, along with reinitialization of all region-related resources, other coprocessors, etc.

2. Unload/reload the malfunctioning coprocessor. This will require some work in the coprocessor framework to support unloading in a reasonable way. The JVM may make this complicated for integrated CPs, so perhaps offer this only for those hosted in external processes.

3. Unload/terminate the malfunctioning coprocessor and continue operation. Consider changes in the CP framework for temporary blacklisting; this will be needed to avoid loading the suspect CP again after a split.
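
A rough sketch of how the first test and the three selectable strategies might be modeled, in plain Java. Everything here is an assumption for illustration: StuckChildSketch, RecoveryStrategy, Outcome and callChild are hypothetical names, not part of HBase or its coprocessor framework. The stuck child is simulated with a future that never completes, rather than a real forked process sent SIGSTOP, and the sketch only reports which strategy would apply without carrying it out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch; none of these names exist in HBase. */
class StuckChildSketch {

    /** The three selectable strategies listed in the comment above. */
    enum RecoveryStrategy {
        REOPEN_REGION,        // 1. close and reopen the region, forking a new child
        RELOAD_COPROCESSOR,   // 2. unload and reload the malfunctioning coprocessor
        UNLOAD_AND_CONTINUE   // 3. unload/terminate it and continue without it
    }

    /** Result of one guarded call from the parent into the child. */
    static final class Outcome {
        final boolean completed;          // true if the child replied in time
        final String value;               // the child's reply, or null
        final RecoveryStrategy applied;   // strategy to apply, or null if none needed

        Outcome(boolean completed, String value, RecoveryStrategy applied) {
            this.completed = completed;
            this.value = value;
            this.applied = applied;
        }
    }

    /**
     * Waits for the child's reply for at most timeoutMillis. If the child is
     * stuck or fails, the caller gets control back along with the configured
     * strategy instead of blocking indefinitely.
     */
    static Outcome callChild(Future<String> reply, long timeoutMillis,
                             RecoveryStrategy onStuck) {
        try {
            String value = reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return new Outcome(true, value, null);
        } catch (TimeoutException e) {
            reply.cancel(true);           // give up on the pending call
            return new Outcome(false, null, onStuck);
        } catch (ExecutionException e) {
            return new Outcome(false, null, onStuck);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new Outcome(false, null, onStuck);
        }
    }

    public static void main(String[] args) {
        // A healthy child replies immediately.
        Outcome healthy = callChild(CompletableFuture.completedFuture("ok"),
                100, RecoveryStrategy.REOPEN_REGION);
        // A stuck child is simulated by a future that never completes.
        Outcome stuck = callChild(new CompletableFuture<String>(),
                100, RecoveryStrategy.UNLOAD_AND_CONTINUE);
        System.out.println(healthy.completed + " " + healthy.value);
        System.out.println(stuck.completed + " " + stuck.applied);
    }
}
```

The property a real test would assert is the same: the calling thread gets control back within the timeout whether or not the child responds. A real implementation would also have to terminate the child process and, for the third strategy, record the blacklisting.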
                
