Posted to common-dev@hadoop.apache.org by Jean-Baptiste Onofré <jb...@nanthrax.net> on 2012/02/08 15:25:53 UTC

[PROPOSAL] Hadoop OSGi compliant and Apache Karaf features

Hi folks,

I'm currently working on turning Hadoop into an OSGi-compliant set of modules.

I've more or less achieved the first step:
- turn all Hadoop modules (common, annotations, hdfs, mapreduce, etc.) into 
OSGi bundles
- provide a Karaf features descriptor to easily deploy them into the Apache 
Karaf OSGi container

I will upload the patches to the different Jira issues related to that.
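
For instance, the features descriptor could look something like this 
(the artifact coordinates and versions below are only illustrative, not 
the final ones):

<features name="hadoop" xmlns="http://karaf.apache.org/xmlns/features/v1.0.0">
  <!-- one feature per Hadoop module, pulling the OSGi bundle from Maven -->
  <feature name="hadoop-common" version="0.24.0-SNAPSHOT">
    <bundle>mvn:org.apache.hadoop/hadoop-annotations/0.24.0-SNAPSHOT</bundle>
    <bundle>mvn:org.apache.hadoop/hadoop-common/0.24.0-SNAPSHOT</bundle>
  </feature>
  <feature name="hadoop-hdfs" version="0.24.0-SNAPSHOT">
    <feature>hadoop-common</feature>
    <bundle>mvn:org.apache.hadoop/hadoop-hdfs/0.24.0-SNAPSHOT</bundle>
  </feature>
</features>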

The second step that I propose is to introduce Blueprint descriptors in 
order to expose some Hadoop features as OSGi services.
It won't affect "non-OSGi" users, but it will give a lot of fun and 
interest to OSGi users ;)
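
As a rough sketch (exactly which services to expose is still open, and 
the wiring below is just an assumption on my side), a Blueprint 
descriptor could register a FileSystem as an OSGi service like this:

<blueprint xmlns="http://www.osgi.org/xmlns/blueprint/v1.0.0">
  <!-- build a Hadoop Configuration and obtain a FileSystem from it -->
  <bean id="hadoopConfiguration" class="org.apache.hadoop.conf.Configuration"/>
  <bean id="fileSystem" class="org.apache.hadoop.fs.FileSystem"
        factory-method="get">
    <argument ref="hadoopConfiguration"/>
  </bean>
  <!-- other bundles can then simply look up the FileSystem service -->
  <service ref="fileSystem" interface="org.apache.hadoop.fs.FileSystem"/>
</blueprint>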

WDYT ?

Regards
JB

PS: the Jira issues are HADOOP-6484, HADOOP-7977, and MAPREDUCE-243. It 
would be great if someone could assign them to me (easier to track them).
-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: [PROPOSAL] Hadoop OSGi compliant and Apache Karaf features

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Steve

My other comments inline:
> ZooKeeper would be nice too, as you could bring up a very small cluster.

+1, I will tackle that too ;)

> -There are a lot of calls to System.exit() in Hadoop when it isn't
> happy; you need a security manager to catch them and turn them into
> exceptions. And no, the code doesn't expect exceptions everywhere.

I will check if we can trap this. Maybe a modification in the core code 
could do that.
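
One option (just a sketch on my side, not yet tested against the actual 
Hadoop code paths) would be a security manager that turns exit calls 
into exceptions while a service runs embedded in the container:

// Converts System.exit() calls into exceptions so an embedded Hadoop
// service cannot kill the whole OSGi container.
public class NoExitSecurityManager extends SecurityManager {

    public static class ExitTrappedException extends SecurityException {
        private final int status;
        public ExitTrappedException(int status) {
            super("System.exit(" + status + ") trapped");
            this.status = status;
        }
        public int getStatus() {
            return status;
        }
    }

    @Override
    public void checkExit(int status) {
        // refuse the exit; the caller gets an exception instead
        throw new ExitTrappedException(status);
    }

    @Override
    public void checkPermission(java.security.Permission perm) {
        // allow everything else
    }
}

// installed once at startup:
// System.setSecurityManager(new NoExitSecurityManager());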

>
> -There are a lot of assumptions that every service (namenode, datanode,
> etc) is running in its own VM, with its own singletons. They will all
> need their own classloaders, which implies separate OSGi bundles for
> each public service.

We can imagine a kind of "fork" in the OSGi container. On the other 
hand, singletons are per classloader, so we can handle that.
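
To illustrate what I mean by "per classloader" (a toy example; the class 
name and classpath URL are just placeholders):

import java.net.URL;
import java.net.URLClassLoader;

public class PerClassLoaderSingletons {
    public static void main(String[] args) throws Exception {
        // placeholder classpath containing org.example.SomeSingleton
        URL[] cp = { new URL("file:/path/to/classes/") };
        // parent = null, so the class is not shared with the caller
        ClassLoader a = new URLClassLoader(cp, null);
        ClassLoader b = new URLClassLoader(cp, null);

        Class<?> c1 = a.loadClass("org.example.SomeSingleton");
        Class<?> c2 = b.loadClass("org.example.SomeSingleton");

        // same class name, but two distinct Class objects, hence two
        // independent sets of static fields ("singletons")
        System.out.println(c1 == c2); // prints false
    }
}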

>
> YARN is even more interesting, as it works by deploying the application
> master (such as the MR engine) on request, picking a suitable node and
> executing the entry point with a classpath (somehow) set up. If you are
> going to work with trunk you will need to address this, the simplest
> tactic being "don't try and run YARN-based services under OSGi, just the
> YARN Resource Manager and Node Managers themselves".
>
> A more advanced option, "support OSGi-based YARN services specially",
> would also be good if it could start both Application Masters and their
> container applications themselves (Task Trackers &c), and aid the
> execution of things like actual tasks within the OSGi container (for
> speed).
>
> If you are looking at production use of this stuff, you'll need to worry
> about loading the native libraries too. Otherwise this becomes
> restricted to experimental small-machine setups.
>

Thanks for these comments! I will take care of them in the following 
patches ;)

Thanks again,
Regards
JB

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: [PROPOSAL] Hadoop OSGi compliant and Apache Karaf features

Posted by Steve Loughran <st...@apache.org>.
On 08/02/12 14:25, Jean-Baptiste Onofré wrote:
> Hi folks,
>
> I'm currently working on turning Hadoop into an OSGi-compliant set of
> modules.
>
> I've more or less achieved the first step:
> - turn all Hadoop modules (common, annotations, hdfs, mapreduce, etc.) into
> OSGi bundles
> - provide a Karaf features descriptor to easily deploy them into the Apache
> Karaf OSGi container
>
> I will upload the patches to the different Jira issues related to that.
>
> The second step that I propose is to introduce Blueprint descriptors in
> order to expose some Hadoop features as OSGi services.
> It won't affect "non-OSGi" users, but it will give a lot of fun and
> interest to OSGi users ;)
>

ZooKeeper would be nice too, as you could bring up a very small cluster.

As I mentioned in one of the JIRA comments:

-There are a lot of calls to System.exit() in Hadoop when it isn't 
happy; you need a security manager to catch them and turn them into 
exceptions. And no, the code doesn't expect exceptions everywhere.

-There are a lot of assumptions that every service (namenode, datanode, 
etc) is running in its own VM, with its own singletons. They will all 
need their own classloaders, which implies separate OSGi bundles for 
each public service.

YARN is even more interesting, as it works by deploying the application 
master (such as the MR engine) on request, picking a suitable node and 
executing the entry point with a classpath (somehow) set up. If you are 
going to work with trunk you will need to address this, the simplest 
tactic being "don't try and run YARN-based services under OSGi, just the 
YARN Resource Manager and Node Managers themselves".

A more advanced option, "support OSGi-based YARN services specially", 
would also be good if it could start both Application Masters and their 
container applications themselves (Task Trackers &c), and aid the 
execution of things like actual tasks within the OSGi container (for 
speed).

If you are looking at production use of this stuff, you'll need to worry 
about loading the native libraries too. Otherwise this becomes 
restricted to experimental small-machine setups.
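
In OSGi terms that probably means declaring the native libraries in the 
bundle manifest, something like the following (the library paths and 
platform clauses are only an illustration, not a tested setup):

Bundle-NativeCode: lib/native/Linux-amd64-64/libhadoop.so;
 osname=Linux; processor=x86-64,
 lib/native/Linux-i386-32/libhadoop.so;
 osname=Linux; processor=x86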