Posted to user@nutch.apache.org by Olive g <ol...@hotmail.com> on 2006/04/06 15:39:43 UTC
Re: please help!! inverlinks not work properly with more than 5 input parts (0.8)
Hi Andrzej,
Thank you for your reply. Could you please confirm whether invertlinks
supports more than 5 parts?
According to the tutorial (http://wiki.apache.org/nutch/NutchTutorial):
"Step-by-Step: Indexing
Before indexing we first invert all of the links, so that we may index
incoming anchor text with the pages.
bin/nutch invertlinks crawl/linkdb crawl/segments "
This step failed when we had more than 5 parts. Has anyone successfully
executed invertlinks with more than 5 parts on 0.8? I would appreciate any
confirmation. All I want to find out is whether there is a possible bug in
invertlinks or something else. It all comes down to the invertlinks command.
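To see how many parts each segment actually contains before running invertlinks, something like the following could be used. It is a minimal sketch that assumes the segments have been copied to the local filesystem and that invertlinks reads each segment's parse_data directory (as the 0.8 sources appear to do); the function names count_parts and report_segments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of Nutch).
# They work on a LOCAL copy of the segments; nothing here talks to DFS.

# Print how many part-NNNNN entries exist directly under a directory.
count_parts() {
  dir=$1
  n=0
  for p in "$dir"/part-[0-9][0-9][0-9][0-9][0-9]; do
    # An unmatched glob is left as a literal string; skip it.
    [ -e "$p" ] || continue
    n=$((n + 1))
  done
  echo "$n"
}

# For each segment under a segments directory, print "<segment> <parts>",
# counting the parts of its parse_data (assumed to be what invertlinks reads).
report_segments() {
  for seg in "$1"/*/; do
    [ -d "$seg" ] || continue
    seg=${seg%/}
    echo "$(basename "$seg") $(count_parts "$seg/parse_data")"
  done
}
```

For example, `report_segments crawl/segments` prints one line per segment, which makes it easy to see which segments go past 5 parts.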
Thank you again for your help.
Olive
>From: Andrzej Bialecki <ab...@getopt.org>
>Reply-To: nutch-user@lucene.apache.org
>To: nutch-user@lucene.apache.org
>Subject: Re: please help!! inverlinks not work properly with more than 5
>input parts (0.8)
>Date: Thu, 06 Apr 2006 14:43:29 +0200
>
>Olive g wrote:
>>Hi gurus,
>>
>>I posted questions on how to do incremental crawls on 0.8 a few days ago
>>and thank you all for your help. However, when I tried the workaround (see
>>http://www.mail-archive.com/nutch-user%40lucene.apache.org/msg04111.html),
>>invertlinks crashed when there were more than 5 input parts.
>>
>
>You should understand very clearly that what you are doing is NOT supported
>and very non-standard. It might (or might not) have worked as a one-time
>workaround to get you out of trouble.
>
>Nutch DOES support incremental crawling and indexing, and the way it does so
>is described in the tutorial (http://wiki.apache.org/nutch/NutchTutorial).
>Please follow the tutorial section on "Step-by-Step or Whole-web
>Crawling" - you will save yourself (and us) a lot of grief.
>
>--
>Best regards,
>Andrzej Bialecki <><
>
>
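The step-by-step procedure Andrzej points to can be scripted roughly as below. This is a sketch based on the command sequence of the 0.8-era tutorial, not the tutorial itself: the crawl/ and urls/ locations are illustrative, the function names are made up, and the exact arguments should be checked against the usage message printed by your own build. Setting NUTCH=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the tutorial's step-by-step cycle. NUTCH, CRAWL and URLS are
# illustrative defaults; override them in the environment (NUTCH=echo = dry run).
NUTCH=${NUTCH:-bin/nutch}
CRAWL=${CRAWL:-crawl}
URLS=${URLS:-urls}

# Run once: seed the crawl db from the URL list.
inject_seeds() {
  $NUTCH inject "$CRAWL/crawldb" "$URLS"
}

# One round: generate a segment, fetch it, and update the crawl db.
crawl_round() {
  $NUTCH generate "$CRAWL/crawldb" "$CRAWL/segments" || return 1
  # Assumes segment names are timestamps, so the last one listed is the newest.
  seg=$(ls -d "$CRAWL"/segments/2* 2>/dev/null | tail -1)
  [ -n "$seg" ] || { echo "no segment found" >&2; return 1; }
  $NUTCH fetch "$seg" || return 1
  $NUTCH updatedb "$CRAWL/crawldb" "$seg"
}

# After the last round: invert links over all segments, then index.
build_index() {
  $NUTCH invertlinks "$CRAWL/linkdb" "$CRAWL/segments" || return 1
  $NUTCH index "$CRAWL/indexes" "$CRAWL/crawldb" "$CRAWL/linkdb" "$CRAWL"/segments/*
}
```

The idea is to call inject_seeds once, repeat crawl_round for as many rounds as needed, and run build_index at the end so that invertlinks and index see all segments together.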
RE: please help!! It always return 0 hit.
Posted by lin yuan <li...@msn.com>.
Ok, I will update and rebuild the code and try it again.
>From: "Dennis Kubes" <nu...@dragonflymc.com>
>Reply-To: nutch-user@lucene.apache.org
>To: <nu...@lucene.apache.org>
>Subject: RE: please help!! It always return 0 hit.
>Date: Fri, 7 Apr 2006 12:47:25 -0500
>
>Copying from Hadooop to local and then performing a search on the index is
a
>question that needs to be posted to the list. My guess would be that you
>have an older version of the code and there were some bugs copying crc
>files. I think I remember something about that on the list a little while
>back. So you might want to update and rebuild you code base.
>
>If you want to do a crawl and search without using hadoop follow the nutch
>0.8 tutorial on the website (not wiki) for a regular crawl. You would
also
>want to set fs.default.name to local and comment out the rest of the
>hadoop-site.xml file options. Also make sure to set the nutch-site.xml
file
>in the WEB-INF/classes directory to the absolute path of the crawl
directory
>as below.
>
>Dennis
>
><?xml version="1.0"?>
><?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
><configuration>
> <property>
> <name>fs.default.name</name>
> <value>local</value>
> </property>
> <property>
> <name>searcher.dir</name>
> <value>C:\TESTBED\NUTCH\CRAWLED</value>
> </property>
></configuration>
>
>-----Original Message-----
>From: lin yuan [mailto:lin_yuan@msn.com]
>Sent: Friday, April 07, 2006 4:33 AM
>To: nutch-user@lucene.apache.org
>Subject: please help!! It always return 0 hit.
>
>Hi Denis ,
> According to your tutorial
>(http://wiki.apache.org/nutch/NutchHadoopTutorial):
>I have setup Nutch and Hadoop,so far so good.But when I performing a
search,
>It always return 0 hit.
> So I want to do a search without hadoop, and used the command followed:
> bin/hadoop dfs -copyToLocal crawled crawled
>
> It seems that there is somthing wrong.would you give me some tips to
debug
>it? I use the nutch 0.8 392087 revision.
> The output said:
>
>060407 172334 parsing
>jar:file:/nutch/search/lib/hadoop-0.1-dev.jar!/hadoop-defa
> ult.xml
>060407 172334 parsing file:/nutch/search/conf/hadoop-site.xml
>060407 172334 No FS indicated, using default:boxA:9000
>060407 172334 Client connection to 127.0.0.1:9000: starting
>060407 172335 Problem opening checksum file:
>/user/nutch/crawled/indexes/part-00
> 000/index.done. Ignoring with exception java.rmi.RemoteException:
>java.io.IOExc eption: Cannot
>open filename /user/nutch/crawled/indexes/part-00000/.index.done.
> crc
> at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
> at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
> at
>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
> sorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
>.
>060407 172335 Problem opening checksum file:
>/user/nutch/crawled/indexes/part-00
> 001/index.done. Ignoring with exception java.rmi.RemoteException:
>java.io.IOExc eption: Cannot
>open filename /user/nutch/crawled/indexes/part-00001/.index.done.
> crc
> at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
> at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
> at
>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
> sorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
>.
>060407 172335 Problem opening checksum file:
>/user/nutch/crawled/indexes/part-00
> 002/index.done. Ignoring with exception java.rmi.RemoteException:
>java.io.IOExc eption: Cannot
>open filename /user/nutch/crawled/indexes/part-00002/.index.done.
> crc
> at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
> at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
> at
>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
> sorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
>.
>060407 172335 Problem opening checksum file:
>/user/nutch/crawled/indexes/part-00
> 003/index.done. Ignoring with exception java.rmi.RemoteException:
>java.io.IOExc eption: Cannot
>open filename /user/nutch/crawled/indexes/part-00003/.index.done.
> crc
> at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
> at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
> at
>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
> sorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
>.
>060407 172335 Problem opening checksum file:
>/user/nutch/crawled/indexes/part-00
> 004/index.done. Ignoring with exception java.rmi.RemoteException:
>java.io.IOExc eption: Cannot
>open filename /user/nutch/crawled/indexes/part-00004/.index.done.
> crc
> at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
> at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
> at
>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
> sorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
>.
>060407 172335 Problem opening checksum file:
>/user/nutch/crawled/indexes/part-00
> 005/index.done. Ignoring with exception java.rmi.RemoteException:
>java.io.IOExc eption: Cannot
>open filename /user/nutch/crawled/indexes/part-00005/.index.done.
> crc
> at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
> at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
> at
>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
> sorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
>.
>060407 172335 Problem opening checksum file:
>/user/nutch/crawled/indexes/part-00
> 006/index.done. Ignoring with exception java.rmi.RemoteException:
>java.io.IOExc eption: Cannot
>open filename /user/nutch/crawled/indexes/part-00006/.index.done.
> crc
> at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
> at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
> at
>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
> sorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
>.
>060407 172335 Problem opening checksum file:
>/user/nutch/crawled/indexes/part-00
> 007/index.done. Ignoring with exception java.rmi.RemoteException:
>java.io.IOExc eption: Cannot
>open filename /user/nutch/crawled/indexes/part-00007/.index.done.
> crc
> at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
> at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
> at
>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
> sorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
>.
>060407 172335 Problem opening checksum file:
>/user/nutch/crawled/indexes/part-00
> 008/index.done. Ignoring with exception java.rmi.RemoteException:
>java.io.IOExc eption: Cannot
>open filename /user/nutch/crawled/indexes/part-00008/.index.done.
> crc
> at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
> at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
> at
>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
> sorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
>.
>060407 172335 Problem opening checksum file:
>/user/nutch/crawled/indexes/part-00
> 009/index.done. Ignoring with exception java.rmi.RemoteException:
>java.io.IOExc eption: Cannot
>open filename /user/nutch/crawled/indexes/part-00009/.index.done.
> crc
> at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
> at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
> at
>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
> sorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
>.
>060407 172335 Problem opening checksum file:
>/user/nutch/crawled/indexes/part-00
> 010/index.done. Ignoring with exception java.rmi.RemoteException:
>java.io.IOExc eption: Cannot
>open filename /user/nutch/crawled/indexes/part-00010/.index.done.
> crc
> at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
> at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
> at
>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
> sorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
>.
>060407 172335 Problem opening checksum file:
>/user/nutch/crawled/indexes/part-00
> 011/index.done. Ignoring with exception java.rmi.RemoteException:
>java.io.IOExc eption: Cannot
>open filename /user/nutch/crawled/indexes/part-00011/.index.done.
> crc
> at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
> at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
> at
>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
> sorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
>.
>
>
>
>
>Best regards,
> Lin Yuan
>
>_________________________________________________________________
>与联机的朋友进行交流,请使用 MSN Messenger: http://messenger.msn.com/cn
>
>
_________________________________________________________________
与联机的朋友进行交流,请使用 MSN Messenger: http://messenger.msn.com/cn
RE: please help!! It always return 0 hit.
Posted by Dennis Kubes <nu...@dragonflymc.com>.
Copying from Hadoop to local and then performing a search on the index is a
question that needs to be posted to the list. My guess would be that you
have an older version of the code and there were some bugs copying crc
files. I think I remember something about that on the list a little while
back. So you might want to update and rebuild your code base.
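If you want to check a local copy for the missing checksum files suspected here, a rough diagnostic follows. It assumes Hadoop's naming scheme, in which the checksum for a file <name> is stored as .<name>.crc in the same directory (the pattern visible in the log of the original message); the function name missing_crc is hypothetical, and a missing checksum is not shown in this thread to be the cause of the 0 hits.

```shell
#!/bin/sh
# Hypothetical diagnostic: list files under a directory that have no
# ".<name>.crc" checksum sibling. It only reports; it repairs nothing.
missing_crc() {
  # Skip the checksum files themselves; check every other regular file.
  find "$1" -type f ! -name '.*.crc' | while IFS= read -r f; do
    d=$(dirname "$f")
    b=$(basename "$f")
    [ -e "$d/.$b.crc" ] || echo "$f"
  done
}
```

For example, `missing_crc crawled` after the copy would list files such as `crawled/indexes/part-00000/index.done` if their checksum was not copied.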
If you want to do a crawl and search without using Hadoop, follow the Nutch
0.8 tutorial on the website (not the wiki) for a regular crawl. You would also
want to set fs.default.name to local and comment out the rest of the options
in the hadoop-site.xml file. Also make sure to set searcher.dir in the
nutch-site.xml file in the WEB-INF/classes directory to the absolute path of
the crawl directory, as below.
Dennis
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>local</value>
  </property>
  <property>
    <name>searcher.dir</name>
    <value>C:\TESTBED\NUTCH\CRAWLED</value>
  </property>
</configuration>
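A configuration like the one above can also be generated with the crawl directory resolved to an absolute path, which is the point Dennis stresses. This is a minimal sketch assuming a POSIX shell; the helper name write_search_conf and its argument order are hypothetical, and paths containing XML special characters such as & or < are not escaped.

```shell
#!/bin/sh
# Hypothetical helper: write nutch-site.xml into a webapp's WEB-INF/classes
# with searcher.dir set to the ABSOLUTE path of an existing crawl directory.
# Usage: write_search_conf <crawl_dir> <classes_dir>
write_search_conf() {
  crawl_dir=$1
  classes_dir=$2
  # Resolve to an absolute path; fails if the crawl directory does not exist.
  abs=$(cd "$crawl_dir" && pwd) || return 1
  mkdir -p "$classes_dir" || return 1
  # Note: XML special characters in $abs (& or <) are not escaped.
  cat > "$classes_dir/nutch-site.xml" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>local</value>
  </property>
  <property>
    <name>searcher.dir</name>
    <value>$abs</value>
  </property>
</configuration>
EOF
}
```

For example, `write_search_conf crawled webapp/WEB-INF/classes` run from the directory containing `crawled` writes the file with the full path filled in.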
-----Original Message-----
From: lin yuan [mailto:lin_yuan@msn.com]
Sent: Friday, April 07, 2006 4:33 AM
To: nutch-user@lucene.apache.org
Subject: please help!! It always return 0 hit.
please help!! It always return 0 hit.
Posted by lin yuan <li...@msn.com>.
Hi Dennis,
According to your tutorial
(http://wiki.apache.org/nutch/NutchHadoopTutorial),
I have set up Nutch and Hadoop, and so far so good. But when I perform a
search, it always returns 0 hits.
So I wanted to do a search without Hadoop, and used the following command:
bin/hadoop dfs -copyToLocal crawled crawled
It seems that something is wrong. Could you give me some tips to debug
it? I am using Nutch 0.8, revision 392087.
The output said:
060407 172334 parsing jar:file:/nutch/search/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060407 172334 parsing file:/nutch/search/conf/hadoop-site.xml
060407 172334 No FS indicated, using default:boxA:9000
060407 172334 Client connection to 127.0.0.1:9000: starting
060407 172335 Problem opening checksum file: /user/nutch/crawled/indexes/part-00000/index.done. Ignoring with exception java.rmi.RemoteException: java.io.IOException: Cannot open filename /user/nutch/crawled/indexes/part-00000/.index.done.crc
    at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
    at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.
060407 172335 Problem opening checksum file: /user/nutch/crawled/indexes/part-00001/index.done. Ignoring with exception java.rmi.RemoteException: java.io.IOException: Cannot open filename /user/nutch/crawled/indexes/part-00001/.index.done.crc
    at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
    at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.
060407 172335 Problem opening checksum file: /user/nutch/crawled/indexes/part-00002/index.done. Ignoring with exception java.rmi.RemoteException: java.io.IOException: Cannot open filename /user/nutch/crawled/indexes/part-00002/.index.done.crc
    at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
    at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.
060407 172335 Problem opening checksum file: /user/nutch/crawled/indexes/part-00003/index.done. Ignoring with exception java.rmi.RemoteException: java.io.IOException: Cannot open filename /user/nutch/crawled/indexes/part-00003/.index.done.crc
    at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
    at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.
060407 172335 Problem opening checksum file: /user/nutch/crawled/indexes/part-00004/index.done. Ignoring with exception java.rmi.RemoteException: java.io.IOException: Cannot open filename /user/nutch/crawled/indexes/part-00004/.index.done.crc
    at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
    at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.
060407 172335 Problem opening checksum file: /user/nutch/crawled/indexes/part-00005/index.done. Ignoring with exception java.rmi.RemoteException: java.io.IOException: Cannot open filename /user/nutch/crawled/indexes/part-00005/.index.done.crc
    at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
    at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.
060407 172335 Problem opening checksum file: /user/nutch/crawled/indexes/part-00006/index.done. Ignoring with exception java.rmi.RemoteException: java.io.IOException: Cannot open filename /user/nutch/crawled/indexes/part-00006/.index.done.crc
    at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
    at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.
060407 172335 Problem opening checksum file: /user/nutch/crawled/indexes/part-00007/index.done. Ignoring with exception java.rmi.RemoteException: java.io.IOException: Cannot open filename /user/nutch/crawled/indexes/part-00007/.index.done.crc
    at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
    at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.
060407 172335 Problem opening checksum file: /user/nutch/crawled/indexes/part-00008/index.done. Ignoring with exception java.rmi.RemoteException: java.io.IOException: Cannot open filename /user/nutch/crawled/indexes/part-00008/.index.done.crc
    at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
    at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.
060407 172335 Problem opening checksum file: /user/nutch/crawled/indexes/part-00009/index.done. Ignoring with exception java.rmi.RemoteException: java.io.IOException: Cannot open filename /user/nutch/crawled/indexes/part-00009/.index.done.crc
    at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
    at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.
060407 172335 Problem opening checksum file: /user/nutch/crawled/indexes/part-00010/index.done. Ignoring with exception java.rmi.RemoteException: java.io.IOException: Cannot open filename /user/nutch/crawled/indexes/part-00010/.index.done.crc
    at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
    at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.
060407 172335 Problem opening checksum file: /user/nutch/crawled/indexes/part-00011/index.done. Ignoring with exception java.rmi.RemoteException: java.io.IOException: Cannot open filename /user/nutch/crawled/indexes/part-00011/.index.done.crc
    at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
    at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.
Best regards,
Lin Yuan
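To pull the affected checksum paths out of a log like the one above, a short filter is enough. This sketch assumes each message sits on a single line, as the logger writes it; copies wrapped by a mail client would need to be re-joined first. The function name missing_checksums is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: read log text on stdin and print each checksum file
# that was reported as "Cannot open filename ...crc", once, in sorted order.
missing_checksums() {
  sed -n 's/.*Cannot open filename \([^ ]*\.crc\).*/\1/p' | sort -u
}
```

For example, `missing_checksums < search.log` (the log file name is illustrative) would list the twelve `.index.done.crc` paths, one per index part.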