Posted to user@nutch.apache.org by Puneet Dhanda <pp...@gmail.com> on 2018/08/15 13:03:23 UTC

bin/crawl not working

Hi,

I am using Nutch 1.15. The following command does not execute; it
only prints its usage message.

bin/crawl -i -D solr.server.url=http://localhost:8983/solr/nutch urls/ TestCrawl/  2

Usage: crawl [options] <crawl_dir> <num_rounds>


Please assist.

Re: bin/crawl not working

Posted by Sebastian Nagel <wa...@googlemail.com.INVALID>.
Hi,

Please also note that the way index writer plugins are configured changed in 1.15;
see the release notes and https://wiki.apache.org/nutch/bin/nutch%20index.

The Solr URL can no longer be passed via -Dsolr.server.url=...
I'll update the bin/crawl wiki page.

Thanks,
Sebastian



RE: bin/crawl not working

Posted by Sadiki Latty <sl...@uottawa.ca>.
Hi Puneet,

To my recollection, bin/crawl takes three arguments:

Usage: crawl [-i|--index] [-D "key=value"] <Seed Dir> <Crawl Dir> <Num Rounds>

In addition, as of Nutch 1.14 the crawl script expects the path to the seed directory to be preceded by -s, so your example would look like this:

bin/crawl -i -D solr.server.url=http://localhost:8983/solr/nutch -s urls/ TestCrawl/  2

where "urls" is the path to your seed URLs.

Reference: https://wiki.apache.org/nutch/bin/crawl

Hope this helps
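
To illustrate why the original command only prints the usage text, here is a rough sketch of the argument check implied by the usage line "crawl [options] <crawl_dir> <num_rounds>". The function check_crawl_args is hypothetical and is not the real bin/crawl script; it assumes only the -i, -D and -s options mentioned in this thread, and that exactly two positional arguments must remain once the options are consumed.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified stand-in for the argument handling in bin/crawl
# (Nutch 1.14+). This is NOT the real script; it only mirrors the usage line
# "crawl [options] <crawl_dir> <num_rounds>" quoted in this thread.
check_crawl_args() {
  local seed_dir=""
  while [ $# -gt 0 ]; do
    case "$1" in
      -i|--index) shift ;;                      # flag without a value
      -D) shift; shift ;;                       # -D key=value
      -s) seed_dir="${2:-}"; shift; shift ;;    # -s <seed_dir>
      *) break ;;                               # first positional argument
    esac
  done
  # After the options, exactly <crawl_dir> and <num_rounds> must remain.
  if [ $# -ne 2 ]; then
    echo "Usage: crawl [options] <crawl_dir> <num_rounds>"
    return 1
  fi
  echo "seed_dir=$seed_dir crawl_dir=$1 num_rounds=$2"
}

# Original command: three positional arguments (urls/ TestCrawl/ 2) are left
# over, so the usage text is printed and the check fails.
check_crawl_args -i -D solr.server.url=http://localhost:8983/solr/nutch urls/ TestCrawl/ 2 || true

# Seed directory passed via -s: only two positional arguments remain.
check_crawl_args -i -s urls/ TestCrawl/ 2
```

The second call leaves out -D solr.server.url because, as Sebastian notes in this thread, the Solr URL can no longer be passed that way in 1.15.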
