Posted to user@nutch.apache.org by Nutch <nu...@proservice.ge> on 2006/02/22 14:10:43 UTC

Re[2]: AW: nutch-0.8 crawl problem

Hello, Gal.

You wrote on 22 February 2006, 14:11:37:

> :) a bit misleading....

> first: Hadoop is the evolution of the "Nutch Distributed File System".

> It is based on Google's file system. It enables you to keep all data in a
> distributed file system, which is very well suited to Nutch.

> When you see bin/nutch ndfs -ls, use bin/hadoop dfs -ls instead.

> now to create the seeds:

> create the urls.txt file in a folder called seeds, i.e. seeds/urls.txt
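
(For illustration only, this step on the local machine could look like the following; the seed URL is just an example, not from the original mail:)

mkdir seeds
echo "http://lucene.apache.org/nutch/" > seeds/urls.txt   # one URL per line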

> bin/hadoop dfs -put seeds seeds
> this will copy the seeds folder into the Hadoop file system
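
(As a quick sanity check, assuming the put above succeeded, you can list the folder and print the file back out of DFS:)

bin/hadoop dfs -ls seeds            # should show urls.txt
bin/hadoop dfs -cat seeds/urls.txt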

> and now

> bin/nutch crawl seeds -dir crawled -depth 3 >& crawl.log
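
(A rough sketch of how one might inspect the result afterwards; the readdb -stats invocation is my assumption about the 0.8 tooling, so adjust as needed:)

tail crawl.log                      # check the end of the log for errors
bin/hadoop dfs -ls crawled          # should contain crawldb, segments, etc.
bin/nutch readdb crawled/crawldb -stats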

> Happy crawling.

> Gal.


> On Wed, 2006-02-22 at 01:05 -0800, Foong Yie wrote:
>> matt
>> 
>> as the tutorial states:
>> 
>> bin/nutch crawl urls -dir crawled -depth 3 >& crawl.log
>> 
>> the urls file is a .txt, right? I created it and put it inside c:/nutch-0.7.1
>> 
>> Stephanie
>> 

Thanks a lot!!!
I'll try it.
One more thing: do I have to download and compile the Hadoop sources?


-- 
Best regards,
 Nutch                          mailto:nuther@proservice.ge