Posted to dev@nutch.apache.org by "Armel T. Nene" <ar...@idna-solutions.com> on 2007/01/25 17:27:04 UTC

Thread-safe methods in Nutch

Hi guys,

I know it's me again. I have been testing Nutch extensively lately, and here
are some threading issues that I found.

I am running version 0.8.2-dev. When Nutch is first run (either from the
script or from Ant), the fetcher defaults to 10 threads. This is good for
performance, since a large number of URLs can be indexed quickly. The
problem is that some plugins are not thread-safe (or is it the fetcher
that is not thread-safe?).

I am running the parse-xml plugin (Nutch-185) and have run into some issues:

When running multiple threads, such as the default of 10, I see
inconsistencies in the stored fields and values. The first 6 documents are
indexed without problems, then 4 with errors, then 4 correctly, then some
number with errors, and so forth. At first I couldn't see where the problem
was, but after several debugging sessions I realized that it could be a
threading issue. I ran Nutch with the minimum of 1 thread and the fields
were stored without any issues.
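
To illustrate the kind of bug I suspect, here is a minimal Java sketch. It is
not Nutch or parse-xml code: every class and method name in it
(ParserThreadSafetyDemo, SharedStateParser, LocalStateParser,
countInconsistent) is hypothetical, and I am only assuming that the real
problem has this shape. One parser keeps per-document state in instance fields
shared by all threads, the other keeps it in local variables.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo, NOT Nutch code: shared instance state vs. local state.
class ParserThreadSafetyDemo {

    /** Two fields that must come from the same document. */
    static final class Parsed {
        final String id;
        final String title;
        Parsed(String id, String title) { this.id = id; this.title = title; }
        boolean consistent() { return title.equals("title-" + id); }
    }

    interface Parser { Parsed parse(String docId); }

    /** Not thread-safe: one instance, and its fields are overwritten by every thread. */
    static final class SharedStateParser implements Parser {
        private String id;
        private String title;
        public Parsed parse(String docId) {
            id = docId;
            Thread.yield();                // widen the race window
            title = "title-" + docId;
            return new Parsed(id, title);  // may mix values from two documents
        }
    }

    /** Thread-safe: all per-document state is local to the call. */
    static final class LocalStateParser implements Parser {
        public Parsed parse(String docId) {
            String id = docId;
            Thread.yield();
            String title = "title-" + docId;
            return new Parsed(id, title);
        }
    }

    /** Parses `docs` documents on `threads` threads and counts mixed-up results. */
    static int countInconsistent(final Parser parser, int threads, int docs) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        final AtomicInteger bad = new AtomicInteger();
        for (int i = 0; i < docs; i++) {
            final String docId = "doc" + i;
            pool.execute(() -> {
                if (!parser.parse(docId).consistent()) {
                    bad.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("timed out waiting for parser threads");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return bad.get();
    }

    public static void main(String[] args) {
        System.out.println("shared state, 10 threads: "
                + countInconsistent(new SharedStateParser(), 10, 10000));
        System.out.println("shared state, 1 thread:   "
                + countInconsistent(new SharedStateParser(), 1, 10000));
        System.out.println("local state, 10 threads:  "
                + countInconsistent(new LocalStateParser(), 10, 10000));
    }
}
```

The count for the shared-state parser with 10 threads varies from run to run
(and can be zero on a lucky run), which would match the irregular pattern of
good and bad documents. With 1 thread, or with local state, the count is
always 0.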

I don't know how to conclude this, but I think that some of the methods Nutch
runs in multiple threads are not thread-safe. I could be wrong, so I am
awaiting any replies.

Regards,

Armel

-------------------------------------------------

Armel T. Nene

iDNA Solutions

Tel: +44 (207) 257 6124

Mobile: +44 (788) 695 0483 

http://blog.idna-solutions.com/