Posted to user@nutch.apache.org by Olive g <ol...@hotmail.com> on 2006/04/06 15:39:43 UTC

Re: please help!! inverlinks not work properly with more than 5 input parts (0.8)

Hi Andrzej,

Thank you for your reply. Could you please confirm whether invertlinks 
supports more than 5 parts?

According to the tutorial (http://wiki.apache.org/nutch/NutchTutorial):

"Step-by-Step: Indexing

Before indexing we first invert all of the links, so that we may index 
incoming anchor text with the pages.

bin/nutch invertlinks crawl/linkdb crawl/segments  "


This step failed when we had more than 5 parts. Has anyone successfully 
executed invertlinks with more than 5 parts on 0.8? I would appreciate any 
confirmation. All I want to find out is whether there is a possible bug in 
invertlinks, or something else. It all comes down to the invertlinks command.
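
To confirm that the failure really tracks the number of parts, it may help to count the parts in each segment before running invertlinks. The following is only a rough sketch and not part of Nutch: it assumes a layout of crawl/segments/<segment>/parse_data/part-NNNNN, and the function name count_parts and the segment names in the demo are hypothetical.

```python
import os
import re
import tempfile

# Assumed naming of part directories: "part-" followed by five digits.
PART_RE = re.compile(r"^part-\d{5}$")


def count_parts(segments_dir, subdir="parse_data"):
    """Map each segment name to the sorted part-NNNNN names found under subdir."""
    report = {}
    for seg in sorted(os.listdir(segments_dir)):
        seg_path = os.path.join(segments_dir, seg)
        if not os.path.isdir(seg_path):
            continue  # skip stray files next to the segments
        data_dir = os.path.join(seg_path, subdir)
        if os.path.isdir(data_dir):
            parts = sorted(n for n in os.listdir(data_dir) if PART_RE.match(n))
        else:
            parts = []  # this segment has no such subdirectory at all
        report[seg] = parts
    return report


if __name__ == "__main__":
    # Demo on a throwaway tree; point count_parts at crawl/segments for real use.
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(6):
            os.makedirs(os.path.join(tmp, "segment-a", "parse_data", "part-%05d" % i))
        for seg, parts in count_parts(tmp).items():
            print(seg, len(parts), "parts")
```

A segment that reports zero parts, or a different count from the others, would be worth looking at before suspecting invertlinks itself.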

Thank you again for your help.

Olive


>From: Andrzej Bialecki <ab...@getopt.org>
>Reply-To: nutch-user@lucene.apache.org
>To: nutch-user@lucene.apache.org
>Subject: Re: please help!! inverlinks  not work properly with more than 5 
>input parts (0.8)
>Date: Thu, 06 Apr 2006 14:43:29 +0200
>
>Olive g wrote:
>>Hi gurus,
>>
>>I posted questions on how to do incremental crawls on 0.8 a few days ago 
>>and thank you all for your help. However, when I tried the workaround (see 
>>http://www.mail-archive.com/nutch-user%40lucene.apache.org/msg04111.html), 
>>invertlinks crashed when there were more than 5 input parts.
>>
>
>You should understand very clearly that what you are doing is NOT supported 
>and very non-standard. It might (or might not) have worked as a one-time 
>workaround to get you out of trouble.
>
>Nutch DOES support incremental crawling and indexing, and the way it does 
>so is described in the tutorial (http://wiki.apache.org/nutch/NutchTutorial). 
>Please follow the section of the tutorial on "Step-by-Step or Whole-web 
>Crawling" - you will save yourself (and us) a lot of grief.
>
>--
>Best regards,
>Andrzej Bialecki     <><
>___. ___ ___ ___ _ _   __________________________________
>[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>___|||__||  \|  ||  |  Embedded Unix, System Integration
>http://www.sigram.com  Contact: info at sigram dot com
>
>



RE: please help!! It always return 0 hit.

Posted by lin yuan <li...@msn.com>.
OK, I will update and rebuild the code and try it again.





RE: please help!! It always return 0 hit.

Posted by Dennis Kubes <nu...@dragonflymc.com>.
Copying from Hadoop to local and then performing a search on the index is a
question that needs to be posted to the list. My guess would be that you
have an older version of the code and there were some bugs copying crc
files.  I think I remember something about that on the list a little while
back.  So you might want to update and rebuild your code base.
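
As a rough illustration of the checksum issue (a sketch only, not code from Nutch or Hadoop): the log in this thread shows that for a file named index.done the file system looks for a sidecar named .index.done.crc in the same directory. Assuming that naming holds for every file, a local copy can be scanned for files whose sidecar is missing. The function names crc_name and missing_crc_files are hypothetical.

```python
import os


def crc_name(filename):
    """Return the assumed checksum sidecar name: index.done -> .index.done.crc"""
    return "." + filename + ".crc"


def missing_crc_files(root):
    """Return sorted paths of files under root that have no .crc sidecar."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        names = set(filenames)
        for name in filenames:
            if name.startswith(".") and name.endswith(".crc"):
                continue  # this file is itself a sidecar
            if crc_name(name) not in names:
                missing.append(os.path.join(dirpath, name))
    return sorted(missing)


if __name__ == "__main__":
    # "crawled" is the local directory name used with -copyToLocal in this thread.
    for path in missing_crc_files("crawled"):
        print("no checksum file for", path)
```

An empty result would suggest the checksum files were copied along with the data; a long list would point toward the copy problem described above.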

If you want to do a crawl and search without using Hadoop, follow the Nutch
0.8 tutorial on the website (not wiki) for a regular crawl.  You would also
want to set fs.default.name to local and comment out the rest of the
hadoop-site.xml file options.  Also make sure that searcher.dir in the
nutch-site.xml file in the WEB-INF/classes directory is set to the absolute
path of the crawl directory, as below.

Dennis

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>local</value>
  </property>
  <property>
    <name>searcher.dir</name>
    <value>C:\TESTBED\NUTCH\CRAWLED</value>
  </property>
</configuration>
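
The two settings above can be sanity-checked with a few lines of Python. This is a minimal sketch, not a Nutch tool: it only reads the name/value pairs from the XML text and checks that fs.default.name is local and that searcher.dir looks like an absolute path (Windows or Unix style). The function names read_properties and check_local_search_config are hypothetical.

```python
import ntpath
import posixpath
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return a dict of property name -> value from a configuration file's text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def check_local_search_config(xml_text):
    """Return a list of problems found; an empty list means both checks passed."""
    props = read_properties(xml_text)
    problems = []
    if props.get("fs.default.name") != "local":
        problems.append("fs.default.name is not 'local'")
    searcher_dir = props.get("searcher.dir", "")
    # Accept either a Windows-style or a Unix-style absolute path.
    if not (ntpath.isabs(searcher_dir) or posixpath.isabs(searcher_dir)):
        problems.append("searcher.dir is not an absolute path")
    return problems
```

For example, passing the contents of WEB-INF/classes/nutch-site.xml to check_local_search_config should return an empty list for the configuration shown above.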



please help!! It always return 0 hit.

Posted by lin yuan <li...@msn.com>.
Hi Dennis,
 According to your tutorial 
(http://wiki.apache.org/nutch/NutchHadoopTutorial), 
I have set up Nutch and Hadoop; so far so good. But when I perform a 
search, it always returns 0 hits.
  So I wanted to search without Hadoop, and used the following command: 
    bin/hadoop dfs -copyToLocal crawled crawled

 Something seems to be wrong. Could you give me some tips for debugging 
it? I am using Nutch 0.8, revision 392087.
 The output was:

060407 172334 parsing 
jar:file:/nutch/search/lib/hadoop-0.1-dev.jar!/hadoop-defa                  
                            ult.xml
060407 172334 parsing file:/nutch/search/conf/hadoop-site.xml
060407 172334 No FS indicated, using default:boxA:9000
060407 172334 Client connection to 127.0.0.1:9000: starting
060407 172335 Problem opening checksum file: 
/user/nutch/crawled/indexes/part-00                                         
     000/index.done.  Ignoring with exception java.rmi.RemoteException: 
java.io.IOExc                                              eption: Cannot 
open filename /user/nutch/crawled/indexes/part-00000/.index.done.           
                                   crc
        at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
        at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces       
                                       sorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.
060407 172335 Problem opening checksum file: 
/user/nutch/crawled/indexes/part-00                                         
     001/index.done.  Ignoring with exception java.rmi.RemoteException: 
java.io.IOExc                                              eption: Cannot 
open filename /user/nutch/crawled/indexes/part-00001/.index.done.           
                                   crc
        at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
        at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces       
                                       sorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.
060407 172335 Problem opening checksum file: 
/user/nutch/crawled/indexes/part-00                                         
     002/index.done.  Ignoring with exception java.rmi.RemoteException: 
java.io.IOExc                                              eption: Cannot 
open filename /user/nutch/crawled/indexes/part-00002/.index.done.           
                                   crc
        at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
        at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces       
                                       sorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.
060407 172335 Problem opening checksum file: 
/user/nutch/crawled/indexes/part-00                                         
     003/index.done.  Ignoring with exception java.rmi.RemoteException: 
java.io.IOExc                                              eption: Cannot 
open filename /user/nutch/crawled/indexes/part-00003/.index.done.           
                                   crc
        at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
        at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces       
                                       sorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.
060407 172335 Problem opening checksum file: /user/nutch/crawled/indexes/part-00004/index.done.  Ignoring with exception java.rmi.RemoteException: java.io.IOException: Cannot open filename /user/nutch/crawled/indexes/part-00004/.index.done.crc
        at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
        at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.
060407 172335 Problem opening checksum file: /user/nutch/crawled/indexes/part-00005/index.done.  Ignoring with exception java.rmi.RemoteException: java.io.IOException: Cannot open filename /user/nutch/crawled/indexes/part-00005/.index.done.crc
        at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
        at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.
060407 172335 Problem opening checksum file: /user/nutch/crawled/indexes/part-00006/index.done.  Ignoring with exception java.rmi.RemoteException: java.io.IOException: Cannot open filename /user/nutch/crawled/indexes/part-00006/.index.done.crc
        at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
        at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.
060407 172335 Problem opening checksum file: /user/nutch/crawled/indexes/part-00007/index.done.  Ignoring with exception java.rmi.RemoteException: java.io.IOException: Cannot open filename /user/nutch/crawled/indexes/part-00007/.index.done.crc
        at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
        at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.
060407 172335 Problem opening checksum file: /user/nutch/crawled/indexes/part-00008/index.done.  Ignoring with exception java.rmi.RemoteException: java.io.IOException: Cannot open filename /user/nutch/crawled/indexes/part-00008/.index.done.crc
        at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
        at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.
060407 172335 Problem opening checksum file: /user/nutch/crawled/indexes/part-00009/index.done.  Ignoring with exception java.rmi.RemoteException: java.io.IOException: Cannot open filename /user/nutch/crawled/indexes/part-00009/.index.done.crc
        at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
        at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.
060407 172335 Problem opening checksum file: /user/nutch/crawled/indexes/part-00010/index.done.  Ignoring with exception java.rmi.RemoteException: java.io.IOException: Cannot open filename /user/nutch/crawled/indexes/part-00010/.index.done.crc
        at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
        at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.
060407 172335 Problem opening checksum file: /user/nutch/crawled/indexes/part-00011/index.done.  Ignoring with exception java.rmi.RemoteException: java.io.IOException: Cannot open filename /user/nutch/crawled/indexes/part-00011/.index.done.crc
        at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
        at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
.




Best regards,
   Lin Yuan
