Posted to user@nutch.apache.org by Henrich Martin <ma...@googlemail.com> on 2011/01/07 13:14:05 UTC
Empty linkdb
Hello,
Using 'cygwin' to run the 'crawl' command as in
nutch-1.2/bin/nutch crawl seed/urls -dir c1 -depth 3 -threads 1 >& c1.log
everything works as expected. In particular the 'linkdb' is created and
populated correctly.
The 'hadoop' logs read:
2011-01-07 11:51:55,129 INFO crawl.LinkDb - LinkDb: starting at 2011-01-07 11:51:55
2011-01-07 11:51:55,129 INFO crawl.LinkDb - LinkDb: linkdb: c4/linkdb
2011-01-07 11:51:55,129 INFO crawl.LinkDb - LinkDb: URL normalize: true
2011-01-07 11:51:55,129 INFO crawl.LinkDb - LinkDb: URL filter: true
2011-01-07 11:51:55,129 INFO crawl.LinkDb - LinkDb: adding segment: file:/D:/mynutch/c4/segments/20110107114838
2011-01-07 11:51:55,129 INFO crawl.LinkDb - LinkDb: adding segment: file:/D:/mynutch/c4/segments/20110107114949
2011-01-07 11:51:55,129 INFO crawl.LinkDb - LinkDb: adding segment: file:/D:/mynutch/c4/segments/20110107115101
2011-01-07 11:51:55,144 WARN mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2011-01-07 11:52:12,270 INFO crawl.LinkDb - LinkDb: finished at 2011-01-07 11:52:12, elapsed: 00:00:17
In contrast, using 'cygwin' to run 'invertlinks' as in
nutch-1.2/bin/nutch invertlinks c1/linkdb -dir c1/segments
over the same or any other input segments, the resulting 'linkdb' is created
but remains empty.
In that case the 'hadoop' logs read:
2011-01-07 11:45:37,126 INFO crawl.LinkDb - LinkDb: starting at 2011-01-07 11:45:37
2011-01-07 11:45:37,126 INFO crawl.LinkDb - LinkDb: linkdb: c1/linkdb6
2011-01-07 11:45:37,126 INFO crawl.LinkDb - LinkDb: URL normalize: true
2011-01-07 11:45:37,126 INFO crawl.LinkDb - LinkDb: URL filter: true
2011-01-07 11:45:37,142 INFO crawl.LinkDb - LinkDb: adding segment: file:/D:/mynutch/c1/segments/20110106153349
2011-01-07 11:45:37,142 INFO crawl.LinkDb - LinkDb: adding segment: file:/D:/mynutch/c1/segments/20110106153544
2011-01-07 11:45:37,142 INFO crawl.LinkDb - LinkDb: adding segment: file:/D:/mynutch/c1/segments/20110106154120
2011-01-07 11:45:53,314 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2011-01-07 11:45:54,236 INFO crawl.LinkDb - LinkDb: finished at 2011-01-07 11:45:54, elapsed: 00:00:17
Notice the difference in the 'WARN' message. I suspect a path issue. Any
ideas?
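To compare the two runs mechanically, a small script can pull the WARN lines out of each log excerpt and report which ones appear in only one run. This is a minimal Python sketch, assuming the relevant hadoop.log lines are pasted in as strings; the names CRAWL_LOG, INVERTLINKS_LOG, warn_lines and compare are hypothetical, and the regex assumes the "date time LEVEL logger - message" layout seen in the logs above.

```python
import re

# Excerpts of the two runs' hadoop.log output (wrapped lines rejoined).
# Hypothetical variable names; paste in your own log lines as needed.
CRAWL_LOG = """\
2011-01-07 11:51:55,129 INFO crawl.LinkDb - LinkDb: starting at 2011-01-07 11:51:55
2011-01-07 11:51:55,144 WARN mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2011-01-07 11:52:12,270 INFO crawl.LinkDb - LinkDb: finished at 2011-01-07 11:52:12, elapsed: 00:00:17
"""

INVERTLINKS_LOG = """\
2011-01-07 11:45:37,126 INFO crawl.LinkDb - LinkDb: starting at 2011-01-07 11:45:37
2011-01-07 11:45:53,314 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2011-01-07 11:45:54,236 INFO crawl.LinkDb - LinkDb: finished at 2011-01-07 11:45:54, elapsed: 00:00:17
"""

# Assumed layout: "<date> <time> <LEVEL> <logger> - <message>"
LOG_LINE = re.compile(r"^(\S+ \S+) (INFO|WARN|ERROR|DEBUG) (\S+) - (.*)$")


def warn_lines(log_text):
    """Return (logger, message) pairs for every WARN line in a log excerpt."""
    found = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match and match.group(2) == "WARN":
            found.append((match.group(3), match.group(4)))
    return found


def compare(log_a, log_b):
    """Return the WARN entries unique to log_a and those unique to log_b."""
    warns_a, warns_b = set(warn_lines(log_a)), set(warn_lines(log_b))
    return sorted(warns_a - warns_b), sorted(warns_b - warns_a)


only_crawl, only_invert = compare(CRAWL_LOG, INVERTLINKS_LOG)
print("only in 'crawl' run:", only_crawl)
print("only in 'invertlinks' run:", only_invert)
```

For these excerpts it reports the mapred.JobClient warning only for the 'crawl' run and the util.NativeCodeLoader warning only for the 'invertlinks' run.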
Thx