Posted to user@nutch.apache.org by blunderboy <sa...@gmail.com> on 2012/05/30 06:36:03 UTC

OSGI bundle of nutch

Can we run Apache Nutch 1.4 in an OSGi framework? I want to create an OSGi
bundle of Nutch. I am using Eclipse Indigo to compile the Nutch source code,
so I think there should be some kind of plugin which can create an OSGi bundle
instead of a plain jar after compilation. I just need the OSGi bundle of Nutch.

I don't even know if it is possible.

--
View this message in context: http://lucene.472066.n3.nabble.com/OSGI-bundle-of-nutch-tp3986767.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: OSGI bundle of nutch

Posted by Kirby Bohling <ki...@gmail.com>.
I'm not really interested in continuing this conversation, but I'll summarize
what I've said before and give pointers to a couple of places I think
would make for interesting follow-up.

The tooling for making an OSGi app is much better than it was during the last
discussion I was involved in [1].  Also, Tika is now OSGi compliant [2], which
would make life tons easier compared to the last time this was discussed.  The big
trick, IMHO, is still how to make the underlying Hadoop OSGi aware or
friendly, package everything into a job, and get Hadoop to unpack it in
such a way that the OSGi implementation works correctly.
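
As a rough illustration of that tooling, bnd or the Felix maven-bundle-plugin
can add the OSGi manifest headers to an existing jar; a minimal sketch of bnd
instructions for a Nutch jar might look like the following (the symbolic name,
version, and package patterns are only illustrative, and the stock 1.4 Ant/Ivy
build has no such step):

    Bundle-SymbolicName: org.apache.nutch
    Bundle-Version: 1.4.0
    Export-Package: org.apache.nutch.*
    Import-Package: *

Generating the headers is the easy part; the bundle only resolves if every
Import-Package can be satisfied at runtime, which brings it back to the Hadoop
packaging question above.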

The plugin model Nutch uses is very similar to how OSGi works, so it should
be fairly straightforward to port.  I had a version of Nutch that
semi-worked, based upon a 1.0 or 1.1 branch.  The big problem was how many
things would have needed to be re-worked while a pretty significant
re-write/re-org was in progress.  I wouldn't bother making Nutch OSGi
friendly until Hadoop can distribute jobs as OSGi bundles.  It should be
possible, but I haven't wanted to take the time to dig into it.  There
are guys on the Hadoop list who have made Hadoop run inside an OSGi
framework [3].  I'm not sure whether they distribute jobs as OSGi bundles or not;
I've never taken the time to investigate.  It just doesn't seem to be
high on the list of priorities for the Hadoop folks.  The
problem with OSGi is that so many third-party libraries assume global
classloaders to implement their plugin systems.  Nutch used to be plagued
by code which didn't have good release controls (at one point it was using
at least one version of an Apache project that was never released and had
been abandoned in place, commons-console I think?  The precise versions of
several other libraries also weren't obvious; I think the adoption of
Ivy has addressed all of those issues).
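
To give a feel for what "OSGi aware" means on the hosting side, here is a
minimal sketch of embedding a framework through the standard OSGi launch API
(the bundle location and the extra system package are only examples; a real
integration would have to do something like this inside Hadoop's task runtime):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.ServiceLoader;
    import org.osgi.framework.Bundle;
    import org.osgi.framework.BundleContext;
    import org.osgi.framework.launch.Framework;
    import org.osgi.framework.launch.FrameworkFactory;

    public class EmbeddedOsgiSketch {
        public static void main(String[] args) throws Exception {
            // Pick up whatever framework implementation (Felix, Equinox, ...) is on the classpath.
            FrameworkFactory factory =
                ServiceLoader.load(FrameworkFactory.class).iterator().next();

            Map<String, String> config = new HashMap<String, String>();
            // Packages the host JVM exposes to bundles; a Hadoop integration would
            // list the Hadoop APIs here (this value is only an example).
            config.put("org.osgi.framework.system.packages.extra", "org.apache.hadoop.conf");

            Framework framework = factory.newFramework(config);
            framework.start();

            // Install and start a hypothetical plugin bundle, then shut the framework down.
            BundleContext context = framework.getBundleContext();
            Bundle plugin = context.installBundle("file:plugins/parse-html-bundle.jar");
            plugin.start();

            framework.stop();
            framework.waitForStop(0);
        }
    }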

I think it'd be great to see this done, but I never had the time to
actually make it happen.  Hopefully the pointers get you to the proper
discussions and places where similar and related work is being done.

Kirby

[1] http://www.mail-archive.com/user@nutch.apache.org/msg02694.html
[2] http://tika.apache.org/0.6/index.html
[3]
http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201202.mbox/%3C4F3285F1.2000704@nanthrax.net%3E

On Wed, May 30, 2012 at 6:45 AM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Please see the countless conversations we've had on this in both the user@
> and dev@ list archives.
>
> Thank you
>
>
>
> On Wed, May 30, 2012 at 5:36 AM, blunderboy <sa...@gmail.com>
> wrote:
> > Can we run Apache Nutch 1.4 in an OSGi framework? I want to create an OSGi
> > bundle of Nutch. I am using Eclipse Indigo to compile the Nutch source code,
> > so I think there should be some kind of plugin which can create an OSGi
> > bundle instead of a plain jar after compilation. I just need the OSGi
> > bundle of Nutch.
> >
> > I don't even know if it is possible.
> >
> > --
> > View this message in context: http://lucene.472066.n3.nabble.com/OSGI-bundle-of-nutch-tp3986767.html
> > Sent from the Nutch - User mailing list archive at Nabble.com.
>
>
>
> --
> Lewis
>

RE: Cannot run program "chmod"

Posted by "Ing. Eyeris Rodriguez Rueda" <er...@uci.cu>.
Problem solved, thanks Sebastian. The problem was with swap memory; I had to change the fstab file and run the swapon command on CentOS.
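
For anyone hitting the same thing, the change amounts to creating and enabling
a swap file, roughly like this (the path and size below are only examples):

    dd if=/dev/zero of=/swapfile bs=1M count=2048   # create a 2 GB swap file
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile
    # to make it survive a reboot, add a line like this to /etc/fstab:
    # /swapfile  swap  swap  defaults  0  0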

_____________________________________________________________________
Ing. Eyeris Rodriguez Rueda
Telephone: 837-3370
Universidad de las Ciencias Informáticas
_____________________________________________________________________

-----Original Message-----
From: Sebastian Nagel [mailto:wastl.nagel@googlemail.com]
Sent: Wednesday, May 30, 2012, 2:35 PM
To: user@nutch.apache.org
CC: Ing. Eyeris Rodriguez Rueda
Subject: Re: Cannot run program "chmod"

This is not really a problem with Nutch. The fork to run the command "chmod" failed because your machine does not have enough memory (RAM + swap).

For more information, google for "error=12, Cannot allocate memory" and "hadoop error=12".

Possible solutions (assuming you are using Linux):
 - look into how to "overcommit memory"
 - increase the swap space

Sebastian

-------- Original Message --------
Subject: Cannot run program "chmod"
Date: Wed, 30 May 2012 14:46:20 -0400 (CDT)
From: Ing. Eyeris Rodriguez Rueda <er...@uci.cu>
Reply-To: user@nutch.apache.org
To: user@nutch.apache.org

Hi all.
When I'm crawling, Nutch throws this error:

Cannot run program "chmod": java.io.IOException: error=12, Cannot allocate memory

but if I try to crawl again, the error occurs with another site. Can anybody suggest how to solve this problem?

Here is part of my log file:
*****************************************************************************************
2012-05-30 14:15:47,468 INFO  crawl.Crawl - crawl started in: crawl
2012-05-30 14:15:47,480 INFO  crawl.Crawl - rootUrlDir = urls
2012-05-30 14:15:47,480 INFO  crawl.Crawl - threads = 10
2012-05-30 14:15:47,480 INFO  crawl.Crawl - depth = 5
2012-05-30 14:15:47,480 INFO  crawl.Crawl - solrUrl=http://localhost:8080/solr
2012-05-30 14:15:47,480 INFO  crawl.Crawl - topN = 300
*
*
*
*
*
2012-05-30 14:17:26,147 INFO  parse.ParseSegment - Parsing: http://www.uci.cu/
2012-05-30 14:17:26,262 WARN  mapred.LocalJobRunner - job_local_0011
java.io.IOException: Cannot run program "chmod": java.io.IOException: error=12, Cannot allocate memory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:475)
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
	at org.apache.hadoop.util.Shell.run(Shell.java:134)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
****************************************************************************************



Re: Cannot run program "chmod"

Posted by Sebastian Nagel <wa...@googlemail.com>.
This is not really a problem with Nutch. The fork to run the command "chmod" failed
because your machine does not have enough memory (RAM + swap).

For more information, google for
 "error=12, Cannot allocate memory"
 "hadoop error=12"

Possible solutions (assuming you are using Linux):
 - look into how to "overcommit memory"
 - increase the swap space
(a rough sketch of both is below)
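
For example (run as root; allowing overcommit is the usual workaround for fork
failures from a large JVM, but read up on its implications before relying on it):

    free -m                             # check how much RAM and swap are available
    sysctl -w vm.overcommit_memory=1    # allow fork() without reserving the full JVM size
    # to persist the setting, add "vm.overcommit_memory = 1" to /etc/sysctl.conf

Increasing swap works the same way as described in the follow-up above: create a
swap file, run mkswap and swapon on it, and add it to /etc/fstab.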

Sebastian

-------- Original Message --------
Subject: Cannot run program "chmod"
Date: Wed, 30 May 2012 14:46:20 -0400 (CDT)
From: Ing. Eyeris Rodriguez Rueda <er...@uci.cu>
Reply-To: user@nutch.apache.org
To: user@nutch.apache.org

Hi all.
When I'm crawling, Nutch throws this error:

Cannot run program "chmod": java.io.IOException: error=12, Cannot allocate memory

but if I try to crawl again, the error occurs with another site. Can anybody suggest how to solve this problem?

Here is part of my log file:
*****************************************************************************************
2012-05-30 14:15:47,468 INFO  crawl.Crawl - crawl started in: crawl
2012-05-30 14:15:47,480 INFO  crawl.Crawl - rootUrlDir = urls
2012-05-30 14:15:47,480 INFO  crawl.Crawl - threads = 10
2012-05-30 14:15:47,480 INFO  crawl.Crawl - depth = 5
2012-05-30 14:15:47,480 INFO  crawl.Crawl - solrUrl=http://localhost:8080/solr
2012-05-30 14:15:47,480 INFO  crawl.Crawl - topN = 300
*
*
*
*
*
2012-05-30 14:17:26,147 INFO  parse.ParseSegment - Parsing: http://www.uci.cu/
2012-05-30 14:17:26,262 WARN  mapred.LocalJobRunner - job_local_0011
java.io.IOException: Cannot run program "chmod": java.io.IOException: error=12, Cannot allocate memory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:475)
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
	at org.apache.hadoop.util.Shell.run(Shell.java:134)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
*****************************************************************************************



Cannot run program "chmod"

Posted by "Ing. Eyeris Rodriguez Rueda" <er...@uci.cu>.
Hi all.
When I'm crawling, Nutch throws this error:

Cannot run program "chmod": java.io.IOException: error=12, Cannot allocate memory

but if I try to crawl again, the error occurs with another site. Can anybody suggest how to solve this problem?

Here is part of my log file:
*****************************************************************************************
2012-05-30 14:15:47,468 INFO  crawl.Crawl - crawl started in: crawl
2012-05-30 14:15:47,480 INFO  crawl.Crawl - rootUrlDir = urls
2012-05-30 14:15:47,480 INFO  crawl.Crawl - threads = 10
2012-05-30 14:15:47,480 INFO  crawl.Crawl - depth = 5
2012-05-30 14:15:47,480 INFO  crawl.Crawl - solrUrl=http://localhost:8080/solr
2012-05-30 14:15:47,480 INFO  crawl.Crawl - topN = 300
*
*
*
*
*
2012-05-30 14:17:26,147 INFO  parse.ParseSegment - Parsing: http://www.uci.cu/
2012-05-30 14:17:26,262 WARN  mapred.LocalJobRunner - job_local_0011
java.io.IOException: Cannot run program "chmod": java.io.IOException: error=12, Cannot allocate memory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:475)
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
	at org.apache.hadoop.util.Shell.run(Shell.java:134)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
*****************************************************************************************
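
For reference, the settings shown in the log correspond to a one-step crawl
invocation along these lines (reconstructed from the log values; the exact
command actually used may have differed):

    bin/nutch crawl urls -dir crawl -solr http://localhost:8080/solr -threads 10 -depth 5 -topN 300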


Re: OSGI bundle of nutch

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Please see the countless conversations we've had on this in both the user@
and dev@ list archives.

Thank you



On Wed, May 30, 2012 at 5:36 AM, blunderboy <sa...@gmail.com> wrote:
> Can we run Apache Nutch 1.4 in an OSGi framework? I want to create an OSGi
> bundle of Nutch. I am using Eclipse Indigo to compile the Nutch source code,
> so I think there should be some kind of plugin which can create an OSGi bundle
> instead of a plain jar after compilation. I just need the OSGi bundle of Nutch.
>
> I don't even know if it is possible.
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/OSGI-bundle-of-nutch-tp3986767.html
> Sent from the Nutch - User mailing list archive at Nabble.com.



-- 
Lewis