
Posted to user@nutch.apache.org by info <in...@radionav.it> on 2006/07/18 12:31:56 UTC

Tutorial for Hadoop and Nutch nightly build

Hello List,
I have a question. I am trying Nutch 0.8 with the Hadoop file system, and
I have a problem with the sequence of operations needed to add new URLs
to crawl to the database.
The sequence is:
inject the new URL list
generate the fetch list
updatedb
fetch the segment
invertlinks
generate a new index
dedup the index
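For comparison, the Nutch 0.8 tutorial runs fetch before updatedb, because updatedb reads the results of the fetched segment back into the crawldb. The following is a hypothetical Python helper, not part of Nutch. It encodes only those step dependencies and flags any step that runs before its prerequisites. The step names are shorthand labels, not exact bin/nutch arguments.

```python
# Hypothetical helper (not part of Nutch): checks an ordering of crawl steps
# against the dependencies described in the Nutch 0.8 tutorial.
# Step names are shorthand labels, not exact bin/nutch command arguments.

# Each step maps to the steps that must already have run before it.
PREREQUISITES = {
    "inject": set(),
    "generate": {"inject"},               # needs URLs in the crawldb
    "fetch": {"generate"},                # fetches the generated segment
    "updatedb": {"fetch"},                # reads fetch results into the crawldb
    "invertlinks": {"fetch"},             # builds the linkdb from fetched segments
    "index": {"updatedb", "invertlinks"}, # uses crawldb, linkdb and segments
    "dedup": {"index"},                   # removes duplicates from the indexes
}


def find_violations(sequence):
    """Return (step, missing_prerequisites) pairs for steps run too early."""
    done = set()
    violations = []
    for step in sequence:
        missing = PREREQUISITES[step] - done
        if missing:
            violations.append((step, sorted(missing)))
        done.add(step)
    return violations


reported = ["inject", "generate", "updatedb", "fetch",
            "invertlinks", "index", "dedup"]
tutorial = ["inject", "generate", "fetch", "updatedb",
            "invertlinks", "index", "dedup"]

print(find_violations(reported))  # [('updatedb', ['fetch'])]
print(find_violations(tutorial))  # []
```

Under these assumptions, the only problem the check finds in the sequence above is that updatedb runs before the segment has been fetched.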

The problem is that at the end of the process I have three indexes on my
file system: index, indexes and index1.
If I try to generate an index with the same name as an existing one (for
example, indexes), the command returns an error saying that the file
already exists.

In the end, when I search for a new URL that I have added, I don't find
any information in the database.

Can someone help me with a real example?

I have another problem. I am trying to add a new node by following the
Hadoop tutorial, but when I connect over ssh the server still asks me
for the password. I put the AutorizedKey file in the .ssh directory as
explained in the tutorial. Can someone help me?
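With OpenSSH defaults, sshd reads public keys only from ~/.ssh/authorized_keys (that exact file name). With StrictModes enabled, it also ignores the file if the home directory, ~/.ssh, or the file itself is writable by group or others, so ssh falls back to asking for a password. The following is a minimal Python sketch, assuming those defaults, that checks both conditions. It would need to run as the target user on the node being connected to, since that is where the key file is read. The function name check_ssh_dir is hypothetical.

```python
import os
import stat
from pathlib import Path


def check_ssh_dir(home):
    """Return problems that commonly make OpenSSH ignore key authentication.

    Assumes OpenSSH defaults: AuthorizedKeysFile is .ssh/authorized_keys and
    StrictModes is on, so group- or world-writable paths are rejected.
    """
    ssh_dir = Path(home) / ".ssh"
    keys = ssh_dir / "authorized_keys"
    if not ssh_dir.is_dir():
        return [f"{ssh_dir} does not exist"]

    problems = []
    if not keys.is_file():
        problems.append(f"{keys} not found (the name must match exactly)")

    # StrictModes refuses keys if any of these paths is group/other-writable.
    for path in (Path(home), ssh_dir, keys):
        if path.exists() and path.stat().st_mode & (stat.S_IWGRP | stat.S_IWOTH):
            problems.append(f"{path} is writable by group or others")
    return problems


if __name__ == "__main__":
    for problem in check_ssh_dir(os.path.expanduser("~")):
        print(problem)
```

An empty result means only that these two common causes are ruled out. The public key generated on the connecting machine must still be appended to authorized_keys on the target node.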


How can I tell whether the Hadoop nodes are working properly? I tried
the command bin/hadoop dfs -report and I see that the usage statistics
for the slave node are 0.

Best Regards 
Roberto Navoni