Posted to user@nutch.apache.org by Nutch <nu...@proservice.ge> on 2006/02/22 14:10:43 UTC
Re[2]: AW: nutch-0.8 crawl problem
Hello, Gal.
You wrote on 22 February 2006 at 14:11:37:
> :) a bit misleading....
> first: Hadoop is the evolution of the Nutch Distributed File System.
> It is based on Google's file system. It enables you to keep all data in a
> distributed file system, which suits Nutch very well.
> Where you see bin/nutch ndfs -ls, use bin/hadoop dfs -ls instead
> now to create the seeds:
> create the urls.txt file in a folder called seeds, i.e. seeds/urls.txt
> bin/hadoop dfs -put seeds seeds
> this will copy the seeds folder into the Hadoop file system
> and now
> bin/nutch crawl seeds -dir crawled -depth 3 >& crawl.log
> Happy crawling.
> Gal.
> On Wed, 2006-02-22 at 01:05 -0800, Foong Yie wrote:
>> matt
>>
>> as the tutorial stated ..
>>
>> bin/nutch crawl urls -dir crawled -depth 3 >& crawl.log
>>
>> the urls file is a .txt, right? I created it and put it inside c:/nutch-0.7.1
>>
>> Stephanie
>>
>>
>> ---------------------------------
>> Yahoo! Autos. Looking for a sweet ride? Get pricing, reviews, & more on new and used cars.
Thanks a lot!!!
I'll try it.
One more thing: do I have to download and compile the Hadoop sources?
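
The steps Gal describes can be sketched as a short shell session. The seed URL below is only an example, and the Hadoop commands (shown as comments) assume a working Nutch 0.8 installation with Hadoop on the path:

```shell
# Create a local seeds folder with a urls.txt file (seed URL is an example).
mkdir -p seeds
printf 'http://lucene.apache.org/nutch/\n' > seeds/urls.txt

# With a running Nutch 0.8 / Hadoop setup, the remaining steps would be:
#   bin/hadoop dfs -put seeds seeds    # copy the seeds folder into the Hadoop file system
#   bin/hadoop dfs -ls                 # verify it arrived (replaces bin/nutch ndfs -ls)
#   bin/nutch crawl seeds -dir crawled -depth 3 >& crawl.log
```
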
--
Best regards,
Nutch mailto:nuther@proservice.ge