Posted to user@nutch.apache.org by Wmelo <wm...@olimpo.com.br> on 2005/12/02 01:49:35 UTC

More Problems with crawling

I am wondering if Nutch is really usable in the real world. As I mentioned
in an earlier e-mail today, I had problems with something like "hot spot in
virtual machine". Now I am having another kind of problem, the same one I
reported about a month ago without getting any reply. I am using Nutch 0.7.1
on FC-3 with 1 GB of RAM, a 4 Mbit/s connection and 50 threads, and I got the
following error message:

051201 201356 Processing pagesByURL: Sorted 44235.525534441804 instructions/second
Exception in thread "main" java.io.IOException: key out of order: http://neic.usgs.gov/neis/states/states.html after http:o/neic.usgs.gov/neis/states/state_largest.html
	at org.apache.nutch.io.MapFile$Writer.checkKey(MapFile.java:134)
	at org.apache.nutch.io.MapFile$Writer.append(MapFile.java:120)
	at org.apache.nutch.db.WebDBWriter$PagesByURLProcessor.mergeEdits(WebDBWriter.java:736)
	at org.apache.nutch.db.WebDBWriter$CloseProcessor.closeDown(WebDBWriter.java:557)
	at org.apache.nutch.db.WebDBWriter.close(WebDBWriter.java:1544)
	at org.apache.nutch.tools.UpdateDatabaseTool.close(UpdateDatabaseTool.java:321)
	at org.apache.nutch.tools.UpdateDatabaseTool.main(UpdateDatabaseTool.java:371)
	at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:141)
[root@localhost nutch-0.7.1]#
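The "key out of order" error in the trace is raised by MapFile$Writer.checkKey, which requires keys to be appended in sorted order. Below is a minimal, hypothetical model of such a check. It is not Nutch's actual MapFile code, and it assumes keys are compared byte by byte as unsigned values. Note that the earlier key in the log begins with "http:o/" rather than "http://". Since '/' (0x2F) sorts before 'o' (0x6F), the ordinary URL that follows compares as smaller and is rejected.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of a sorted-key writer (not Nutch's real MapFile).
class SortedKeyWriter {
    private byte[] lastKey = null;

    // Assumed ordering for this sketch: unsigned lexicographic byte comparison.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Rejects a key that sorts before the previously appended key,
    // producing a message in the same shape as the one in the log.
    void append(String key) throws IOException {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        if (lastKey != null && compareBytes(lastKey, k) > 0) {
            throw new IOException("key out of order: " + key + " after "
                    + new String(lastKey, StandardCharsets.UTF_8));
        }
        lastKey = k;
    }
}
```

Appending the two keys from the log in that order reproduces the failure, while the same two URLs both spelled with "http://" are accepted, because "state_largest.html" sorts before "states.html" ('_' is 0x5F, 's' is 0x73).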

I have been using Nutch for more than a year now, but recently I am happy if
the task finishes even once out of every 5 tries.
I really don't know what is going on, as I am just a user and not a
programmer. What I do know is that I am getting far more headaches than
satisfaction in trying to set up anything using Nutch.
Thanks,
Wmelo

