Posted to user@nutch.apache.org by Shadi Saleh <pr...@gmail.com> on 2015/01/08 15:48:56 UTC
New URLS
Dear all,
I am trying to fetch new URLs. I added a new URL to the seed file and then
ran the following:
./crawl urls/seed.txt index -depth 1
./nutch inject index/crawldb urls/seed.txt
./nutch generate index/crawldb index/segments
s1=`ls -d index/segments/2* | tail -1`
./nutch fetch $s1
./nutch updatedb index/crawldb $s1
./nutch generate index/crawldb index/segments -topN 1000
s2=`ls -d index/segments/2* | tail -1`
./nutch fetch $s2
./nutch updatedb index/crawldb $s2
./nutch generate index/crawldb index/segments -topN 1000
s3=`ls -d index/segments/2* | tail -1`
./nutch fetch $s3
./nutch updatedb index/crawldb $s3
./nutch invertlinks index/linkdb -dir index/segments
But every time I get the same result: no more URLs to fetch.
Any ideas, please?
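
For reference, the repeated generate/fetch/updatedb rounds above all follow the same pattern, so they could be wrapped in a small helper. This is only a minimal sketch under my own assumptions: the NUTCH and CRAWL_DIR variables and the function names latest_segment and crawl_round are hypothetical and not part of Nutch, and the helper only calls the same generate, fetch and updatedb commands listed above.

```shell
#!/bin/sh
# Sketch only: NUTCH, CRAWL_DIR and both function names are illustrative
# assumptions, not something Nutch itself provides.
NUTCH="${NUTCH:-./nutch}"
CRAWL_DIR="${CRAWL_DIR:-index}"

# Print the newest segment directory under $1/segments, or nothing if there
# is none. Segment names start with a timestamp, so the last entry in
# lexical order is the most recent one.
latest_segment() {
    ls -d "$1"/segments/2* 2>/dev/null | tail -1
}

# Run one generate/fetch/updatedb round.
# Returns non-zero if generate fails or leaves no new segment behind.
crawl_round() {
    before=$(latest_segment "$CRAWL_DIR")
    "$NUTCH" generate "$CRAWL_DIR/crawldb" "$CRAWL_DIR/segments" -topN 1000 || return 1
    after=$(latest_segment "$CRAWL_DIR")
    if [ -z "$after" ] || [ "$after" = "$before" ]; then
        echo "no new segment generated" >&2
        return 1
    fi
    "$NUTCH" fetch "$after" && "$NUTCH" updatedb "$CRAWL_DIR/crawldb" "$after"
}
```

Calling crawl_round three times would roughly reproduce the rounds above (my first round was run without -topN). A non-zero return marks the round in which generate produced nothing new to fetch.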
Best
--
Shadi Saleh
Ph.D Student
Institute of Formal and Applied Linguistics
Faculty of Mathematics and Physics
Charles University in Prague
16017 Prague 6 - Czech Republic
Mob +420773515578