Posted to user@nutch.apache.org by Chih How Bong <ch...@gmail.com> on 2005/12/27 06:31:44 UTC

Crawl problem in 0.7 and 0.7.1

Hi all,
  I ran into a problem when running the crawler in Nutch 0.7 and 0.7.1.
  Although I have added a number of root URLs to a plain text file named
*urls*, the crawler does not fetch any of them. However, when I fall back
to Nutch 0.6, everything works fine.
  Has anyone else run into this problem? Currently, I am running Nutch 0.7.1
with JDK 1.5 Update 6 on Ubuntu 5.10. I also came across the same problem on
my Apple Mac.
  Below is the crawler's log; it shows that the crawler returns 0 entries.
  Thanks in advance.


051227 212142 parsing file:/opt/nutch-0.7.1/conf/nutch-default.xml
051227 212143 parsing file:/opt/nutch-0.7.1/conf/crawl-tool.xml
051227 212143 parsing file:/opt/nutch-0.7.1/conf/nutch-site.xml
051227 212143 No FS indicated, using default:local
051227 212143 crawl started in: crawl.test
051227 212143 rootUrlFile = urls
051227 212143 threads = 10
051227 212143 depth = 3
...

...

..
051227 212143 *Added 0 pages*
051227 212143 FetchListTool started
051227 212144 *Overall processing: Sorted 0 entries in 0.0 seconds.*
051227 212144 Overall processing: Sorted NaN entries/second
051227 212144 FetchListTool completed
051227 212144 logging at INFO
051227 212145 Updating /opt/nutch-0.7.1/crawl.test/db
051227 212145 Updating for /opt/nutch-0.7.1/crawl.test/segments/20051227212143
051227 212145 Finishing update
051227 212145 Update finished
051227 212145 FetchListTool started
*051227 212145 Overall processing: Sorted 0 entries in 0.0 seconds.*
051227 212145 Overall processing: Sorted NaN entries/second
051227 212145 FetchListTool completed
051227 212145 logging at INFO
051227 212146 Updating /opt/nutch-0.7.1/crawl.test/db
051227 212146 Updating for /opt/nutch-0.7.1/crawl.test/segments/20051227212145
051227 212146 Finishing update
051227 212146 Update finished
051227 212146 FetchListTool started
051227 212146 Overall processing: Sorted 0 entries in 0.0 seconds.
051227 212146 Overall processing: Sorted NaN entries/second
051227 212146 FetchListTool completed
051227 212146 logging at INFO
051227 212147 Updating /opt/nutch-0.7.1/crawl.test/db
051227 212147 Updating for /opt/nutch-0.7.1/crawl.test/segments/20051227212146
051227 212147 Finishing update
051227 212147 Update finished
051227 212147 Updating /opt/nutch-0.7.1/crawl.test/segments from /opt/nutch-0.7.1/crawl.test/db
051227 212147  reading /opt/nutch-0.7.1/crawl.test/segments/20051227212143
051227 212148  reading /opt/nutch-0.7.1/crawl.test/segments/20051227212145
051227 212148  reading /opt/nutch-0.7.1/crawl.test/segments/20051227212146
051227 212148 Sorting pages by url...
051227 212148 Getting updated scores and anchors from db...
051227 212148 Sorting updates by segment...
051227 212148 Updating segments...
051227 212148 Done updating /opt/nutch-0.7.1/crawl.test/segments from /opt/nutch-0.7.1/crawl.test/db
051227 212148 indexing segment: /opt/nutch-0.7.1/crawl.test/segments/20051227212143
051227 212148 * Opening segment 20051227212143
051227 212148 * Indexing segment 20051227212143
051227 212148 * Optimizing index...
051227 212148 * Moving index to NFS if needed...
051227 212148 DONE indexing segment 20051227212143: total 0 records in 0.026s (NaN rec/s).
051227 212148 done indexing
051227 212148 indexing segment: /opt/nutch-0.7.1/crawl.test/segments/20051227212145
051227 212148 * Opening segment 20051227212145
051227 212148 * Indexing segment 20051227212145
051227 212148 * Optimizing index...
051227 212148 * Moving index to NFS if needed...
051227 212148 DONE indexing segment 20051227212145: total 0 records in 0.075s (NaN rec/s).
051227 212148 done indexing
051227 212148 indexing segment: /opt/nutch-0.7.1/crawl.test/segments/20051227212146
051227 212148 * Opening segment 20051227212146
051227 212148 * Indexing segment 20051227212146
051227 212148 * Optimizing index...
051227 212148 * Moving index to NFS if needed...
*051227 212148 DONE indexing segment 20051227212146: total 0 records in 0.011 s (NaN rec/s).*
051227 212148 done indexing
051227 212148 Reading url hashes...
051227 212148 Sorting url hashes...
051227 212148 Deleting url duplicates...
051227 212148 Deleted 0 url duplicates.
051227 212148 Reading content hashes...
051227 212148 Sorting content hashes...
051227 212148 Deleting content duplicates...
051227 212148 Deleted 0 content duplicates.
051227 212148 Duplicate deletion complete locally.  Now returning to NFS...
051227 212148 DeleteDuplicates complete
051227 212148 Merging segment indexes...
051227 212148 crawl finished: crawl.test
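
For anyone comparing their own output against the log above, here is a minimal sketch that pulls the page, entry, and record counts out of a log in this format. It is not part of Nutch: the function name `summarize_counts` and the three pattern names are made up for illustration, and the patterns only assume the line wording shown in the log above.

```python
import re

# Patterns for the count-bearing lines seen in the log above.
# The key names are hypothetical labels, not Nutch terminology.
PATTERNS = {
    "added_pages": re.compile(r"Added (\d+) pages"),
    "sorted_entries": re.compile(r"Sorted (\d+) entries in"),
    "indexed_records": re.compile(r"total (\d+) records"),
}


def summarize_counts(log_text):
    """Return a dict mapping each pattern name to the list of counts found."""
    counts = {name: [] for name in PATTERNS}
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[name].append(int(match.group(1)))
    return counts
```

Calling `summarize_counts` on the text of a crawl log returns one list of integers per pattern; a run where every list contains only zeros matches the symptom described in this message. Lines such as "Sorted NaN entries/second" are skipped because they carry no integer count.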

Rgds
Chih-How Bong