Posted to user@nutch.apache.org by hareesh <ha...@hotmail.com> on 2010/06/11 14:22:05 UTC

LinkDb creation is Too slow

Hi,

I am facing a problem, which I will describe in detail. I have crawled
about 1.5 million URLs. Everything (crawling and updating the db) went
fine. Once that was done, I started the invert links process. It worked
fine and created the linkdb. After that I had a problem with my indexing
plug-in. I made some code modifications to the plug-in, and everything
went fine up to that point.
After that I decided to index another segment. For that, I first deleted
the existing linkdb and tried to create the linkdb from the new segment.
The problem is that the linkdb creation is taking far too long: it has
been almost 8 hours since I started the process. There are no exceptions
or errors. As I haven't made any modifications to the Nutch source, I
believe the problem is not caused by that.

Present Segment Details:

Number of URLs in the segment: 80,000


I'm using a cluster of 4 machines configured with 2 TB hard drives and
4 GB of RAM. All the systems except one have enough disk space; one
system in the cluster has only 48 GB free.

Can anyone help me with their comments? Thanks in advance.

