Posted to mapreduce-user@hadoop.apache.org by "Hiller, Dean (Contractor)" <de...@broadridge.com> on 2011/01/02 18:51:04 UTC

any plans to deploy OSGi bundles on cluster?

I was looking at the distributed cache and how I need to copy local jars to
HDFS.  I was wondering if there were any plans to just deploy an OSGi
bundle (i.e., introspect and auto-deploy jars from the bundle to the
distributed cache, and then make the API calls to deploy them to the
slave nodes, so there is no work for the developer to do except deploy
OSGi bundles).
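
For reference, the manual steps such a feature would automate look roughly
like this (a sketch against the Hadoop 0.20 API; the paths and jar names
are made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CacheJarSetup {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // 1. Copy the local jar into HDFS so every slave node can fetch it.
            fs.copyFromLocalFile(new Path("/tmp/mylib.jar"),
                                 new Path("/libs/mylib.jar"));
            // 2. Put the jar on the job's task classpath via the distributed cache.
            DistributedCache.addFileToClassPath(new Path("/libs/mylib.jar"), conf);
            // ...then configure and submit the job with this conf as usual.
        }
    }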

 

Not to mention, the OSGi classloader mechanism is so sweet that I could
deploy jar A to be used by all my jobs, and also deploy jar B version 1
and jar B version 2, which could be used at the same time by different
jobs without classloading problems.

 

I.e., this is impossible with most classloading mechanisms (including
JBoss's old approach; they are moving to OSGi):

 

Job 1 using jar A version 1 and jar B version 1

Job 2 using jar A version 1 and jar B version 2

 

Unless of course they cheat and load jar A twice, but that is what OSGi
avoids.  It is a flattened classloading model in which all the
classloaders are peers, "except" for the one bootstrap classloader parent,
into which you should never put jars unless you are the platform.
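
To illustrate the side-by-side versioning (all bundle and package names
below are hypothetical), OSGi expresses it entirely through bundle
manifests.  The bundle wrapping jar B version 1 would declare:

    Bundle-SymbolicName: com.example.jarB
    Bundle-Version: 1.0.0
    Export-Package: com.example.b;version="1.0.0"

and the bundle wrapping jar B version 2, installed alongside it:

    Bundle-SymbolicName: com.example.jarB
    Bundle-Version: 2.0.0
    Export-Package: com.example.b;version="2.0.0"

A job bundle then selects the version it wants with an import range such as
Import-Package: com.example.b;version="[2.0.0,3.0.0)", and the framework
wires its classloader to the matching export, so both versions stay live
at once.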

 

Dean




RE: any plans to deploy OSGi bundles on cluster?

Posted by "Hiller, Dean (Contractor)" <de...@broadridge.com>.
Check your logs for connection problems (all of them).  If you see any
localhost connections, that could be the problem if you are running a
cluster.
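
One common culprit (an assumption on my part; your setup may differ) is
fs.default.name still pointing at localhost in core-site.xml, in which case
the daemons bind to 127.0.0.1 and never find each other across machines.
It should name the real NameNode host, e.g.:

    <!-- core-site.xml; "namenode-host" stands in for your actual hostname -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode-host:9000</value>
    </property>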

Dean

 

From: Jon Lederman [mailto:jon2718@gmail.com] 
Sent: Sunday, January 02, 2011 11:00 AM
To: mapreduce-user@hadoop.apache.org
Subject: Re: any plans to deploy OSGi bundles on cluster?

 

Hi,

 

I was able to run MapReduce jobs fine in standalone mode.

 

I am running on a novel multi-core microprocessor hosted via a PCI card.
I am running SMP Linux.

 

java -version reports:

 

java version "1.6.0_18"

OpenJDK Runtime Environment (IcedTea6 1.8) (linux-gnu build 1.6.0_18-b18)

OpenJDK Zero VM (build 14.0-b16, interpreted mode)

 

Any thoughts?  Somehow it seems that the DataNode does not try to
connect to the NameNode.  

 

After executing jps, I note that a NameNode, Secondary NameNode,
JobTracker, TaskTracker and DataNode are all running.

 

However, when I try any shell commands such as hadoop fs -ls or hadoop
fs -mkdir test, the system hangs.  

 

When I look at the log files, the NameNode on startup indicates:

Network topology has 0 racks and 0 datanodes

Also, my DataNode startup log is suspiciously short, indicating only:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = localhost/127.0.0.1
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build =
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/

There is no attempt from the DataNode to communicate or otherwise
establish communication with the NameNode.  It appears to me that the
NameNode and DataNode aren't communicating, which may be the source of
my problem.  However, I don't know why this would be, or how I can debug
it, since I am not sure of the internal operation of Hadoop.

 

Any help would be greatly appreciated.

Thanks.

 

-Jon

 

 

On Jan 2, 2011, at 9:51 AM, Hiller, Dean (Contractor) wrote:





I was looking at the distributed cache and how I need to copy local jars to
HDFS.  I was wondering if there were any plans to just deploy an OSGi
bundle (i.e., introspect and auto-deploy jars from the bundle to the
distributed cache, and then make the API calls to deploy them to the
slave nodes, so there is no work for the developer to do except deploy
OSGi bundles).

 

Not to mention, the OSGi classloader mechanism is so sweet that I could
deploy jar A to be used by all my jobs, and also deploy jar B version 1
and jar B version 2, which could be used at the same time by different
jobs without classloading problems.

 

I.e., this is impossible with most classloading mechanisms (including
JBoss's old approach; they are moving to OSGi):

 

Job 1 using jar A version 1 and jar B version 1

Job 2 using jar A version 1 and jar B version 2

 

Unless of course they cheat and load jar A twice, but that is what OSGi
avoids.  It is a flattened classloading model in which all the
classloaders are peers, "except" for the one bootstrap classloader parent,
into which you should never put jars unless you are the platform.

 

Dean

 

 

 




Re: any plans to deploy OSGi bundles on cluster?

Posted by Jon Lederman <jo...@gmail.com>.
Hi,

I was able to run MapReduce jobs fine in standalone mode.

I am running on a novel multi-core microprocessor hosted via a PCI card.  I am running SMP Linux.

java -version reports:

java version "1.6.0_18"
OpenJDK Runtime Environment (IcedTea6 1.8) (linux-gnu build 1.6.0_18-b18)
OpenJDK Zero VM (build 14.0-b16, interpreted mode)

Any thoughts?  Somehow it seems that the DataNode does not try to connect to the NameNode.  

After executing jps, I note that a NameNode, Secondary NameNode, JobTracker, TaskTracker and DataNode are all running.

However, when I try any shell commands such as hadoop fs -ls or hadoop fs -mkdir test, the system hangs.  

When I look at the log files, the NameNode on startup indicates:
Network topology has 0 racks and 0 datanodes
Also, my DataNode startup log is suspiciously short, indicating only:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = localhost/127.0.0.1
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
There is no attempt from the DataNode to communicate or otherwise establish communication with the NameNode.  It appears to me that the NameNode and DataNode aren't communicating, which may be the source of my problem.  However, I don't know why this would be, or how I can debug it, since I am not sure of the internal operation of Hadoop.

Any help would be greatly appreciated.
Thanks.

-Jon


On Jan 2, 2011, at 9:51 AM, Hiller, Dean (Contractor) wrote:

> I was looking at the distributed cache and how I need to copy local jars to HDFS.  I was wondering if there were any plans to just deploy an OSGi bundle (i.e., introspect and auto-deploy jars from the bundle to the distributed cache, and then make the API calls to deploy them to the slave nodes, so there is no work for the developer to do except deploy OSGi bundles).
>  
> Not to mention, the OSGi classloader mechanism is so sweet that I could deploy jar A to be used by all my jobs, and also deploy jar B version 1 and jar B version 2, which could be used at the same time by different jobs without classloading problems.
>  
> I.e., this is impossible with most classloading mechanisms (including JBoss's old approach; they are moving to OSGi):
>  
> Job 1 using jar A version 1 and jar B version 1
> Job 2 using jar A version 1 and jar B version 2
>  
> Unless of course they cheat and load jar A twice, but that is what OSGi avoids.  It is a flattened classloading model in which all the classloaders are peers, "except" for the one bootstrap classloader parent, into which you should never put jars unless you are the platform.
>  
> Dean


Re: any plans to deploy OSGi bundles on cluster?

Posted by Allen Wittenauer <aw...@linkedin.com>.
On Jan 4, 2011, at 10:30 AM, Hiller, Dean (Contractor) wrote:

> I guess I meant the setting for the number of tasks a child JVM runs before
> teardown.  In that case, it is nice to separate/unload my previous
> classes from the child JVM, which OSGi does.  I was thinking we might use a
> 10 tasks / JVM setting, which I thought meant having a "Child" process run
> 10 tasks before shutting down... 4 may be from one job and 4 from a new job
> with conflicting classes, maybe.  Will that work, or is it not advised?

I don't use the JVM re-use options, but I'm 99% certain that the task JVMs are not shared between jobs.
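
For reference, the knob being discussed is mapred.job.reuse.jvm.num.tasks
in Hadoop 0.20, and reuse applies only within a single job, so tasks from
two different jobs should never land in the same child JVM.  A sketch:

    <!-- mapred-site.xml, or set per job: run up to 10 tasks in each child JVM -->
    <property>
      <name>mapred.job.reuse.jvm.num.tasks</name>
      <value>10</value>
    </property>

Equivalently, JobConf.setNumTasksToExecutePerJvm(10) sets it in code; a
value of -1 means reuse without limit.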

RE: any plans to deploy OSGi bundles on cluster?

Posted by "Hiller, Dean (Contractor)" <de...@broadridge.com>.
I guess I meant the setting for the number of tasks a child JVM runs before
teardown.  In that case, it is nice to separate/unload my previous
classes from the child JVM, which OSGi does.  I was thinking we might use a
10 tasks / JVM setting, which I thought meant having a "Child" process run
10 tasks before shutting down... 4 may be from one job and 4 from a new job
with conflicting classes, maybe.  Will that work, or is it not advised?
Thanks,
Dean

-----Original Message-----
From: Allen Wittenauer [mailto:awittenauer@linkedin.com] 
Sent: Monday, January 03, 2011 9:28 PM
To: <ma...@hadoop.apache.org>
Subject: Re: any plans to deploy OSGi bundles on cluster?


On Jan 2, 2011, at 9:51 AM, Hiller, Dean (Contractor) wrote:

> I was looking at the distributed cache and how I need to copy local jars to
> HDFS.  I was wondering if there were any plans to just deploy an OSGi
> bundle (i.e., introspect and auto-deploy jars from the bundle to the
> distributed cache, and then make the API calls to deploy them to the
> slave nodes, so there is no work for the developer to do except deploy
> OSGi bundles).

	AFAIK, no.

> Not to mention, the OSGi classloader mechanism is so sweet that I could
> deploy jar A to be used by all my jobs, and also deploy jar B version 1
> and jar B version 2, which could be used at the same time by different
> jobs without classloading problems.

	Given that distributed caches are set per-job, this isn't a
problem with Hadoop either.  Each job's task gets its own JVM.  The only
time that I know of versioning being an issue is when one conflicts with
a bundled Hadoop jar.  [... and that problem is either fixed or will be
committed soon to trunk]


Re: any plans to deploy OSGi bundles on cluster?

Posted by Allen Wittenauer <aw...@linkedin.com>.
On Jan 2, 2011, at 9:51 AM, Hiller, Dean (Contractor) wrote:

> I was looking at the distributed cache and how I need to copy local jars to
> HDFS.  I was wondering if there were any plans to just deploy an OSGi
> bundle (i.e., introspect and auto-deploy jars from the bundle to the
> distributed cache, and then make the API calls to deploy them to the
> slave nodes, so there is no work for the developer to do except deploy
> OSGi bundles).

	AFAIK, no.

> Not to mention, the OSGi classloader mechanism is so sweet that I could
> deploy jar A to be used by all my jobs, and also deploy jar B version 1
> and jar B version 2, which could be used at the same time by different
> jobs without classloading problems.

	Given that distributed caches are set per-job, this isn't a problem with Hadoop either.  Each job's task gets its own JVM.  The only time that I know of versioning being an issue is when one conflicts with a bundled Hadoop jar.  [... and that problem is either fixed or will be committed soon to trunk]
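
A sketch of that per-job isolation in practice, using the standard -libjars
generic option (the jar and class names are hypothetical, and the driver
must go through ToolRunner/GenericOptionsParser for -libjars to take
effect):

    hadoop jar myjob.jar com.example.MyDriver -libjars jarB-1.0.jar in1 out1
    hadoop jar myjob.jar com.example.MyDriver -libjars jarB-2.0.jar in2 out2

Each submission ships its own copy of jar B through the distributed cache,
so the two jobs' tasks never share a classpath or a classloader.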