Posted to user@nutch.apache.org by Shadi Saleh <pr...@gmail.com> on 2015/01/08 15:48:56 UTC

New URLs

Dear all,
I am trying to fetch new URLs. I added a new URL to the seed file and then
executed the following:

./crawl urls/seed.txt index -depth 1

./nutch inject index/crawldb urls/seed.txt

./nutch generate index/crawldb index/segments

s1=`ls -d index/segments/2* | tail -1`

./nutch fetch $s1

./nutch updatedb index/crawldb $s1

./nutch generate index/crawldb index/segments -topN 1000

s2=`ls -d index/segments/2* | tail -1`

./nutch fetch $s2

./nutch updatedb index/crawldb $s2

./nutch generate index/crawldb index/segments -topN 1000

s3=`ls -d index/segments/2* | tail -1`

./nutch fetch $s3

./nutch updatedb index/crawldb $s3

./nutch invertlinks index/linkdb -dir index/segments


But the result is always the same: no more URLs to fetch.
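
One detail of the commands above may matter here: `ls -d index/segments/2* | tail -1` returns the previous segment when generate creates nothing new, so a round can silently fetch an old segment again. Below is a minimal sketch of one generate/fetch/updatedb round that checks for this. `NUTCH`, `latest_segment` and `run_round` are hypothetical names introduced for illustration and are not part of Nutch; the `-topN 1000` value is taken from the commands above.

```shell
#!/usr/bin/env bash
# Sketch only: one generate / fetch / updatedb round with a "new segment?" check.
# NUTCH, latest_segment and run_round are illustrative names, not Nutch APIs.
NUTCH="${NUTCH:-./nutch}"

# latest_segment DIR: print the newest segment directory under DIR, or nothing.
# Assumes segment names start with a timestamp beginning with "2".
latest_segment() {
  { ls -d "$1"/2* 2>/dev/null || true; } | tail -1
}

# run_round CRAWLDB SEGMENTS_DIR: run one cycle.
# Returns non-zero if generate did not produce a new segment.
run_round() {
  local crawldb="$1" segments="$2" before after
  before="$(latest_segment "$segments")"
  "$NUTCH" generate "$crawldb" "$segments" -topN 1000 || return 1
  after="$(latest_segment "$segments")"
  if [ -z "$after" ] || [ "$after" = "$before" ]; then
    echo "no new segment generated; nothing to fetch" >&2
    return 1
  fi
  "$NUTCH" fetch "$after" && "$NUTCH" updatedb "$crawldb" "$after"
}
```

With these definitions, `run_round index/crawldb index/segments` would stand in for one generate/fetch/updatedb group, and a non-zero return shows which round stopped producing segments.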


Any ideas, please?


Best





-- 




Shadi Saleh
Ph.D Student
Institute of Formal and Applied Linguistics
Faculty of Mathematics and Physics
Charles University in Prague
16017 Prague 6 - Czech Republic
Mob +420773515578