Posted to user@nutch.apache.org by Puneet Dhanda <pp...@gmail.com> on 2018/08/15 13:03:23 UTC
bin/crawl not working
Hi,
I am using Nutch-1.15. The following command does not execute; it
keeps complaining about its usage:
bin/crawl -i -D solr.server.url=http://localhost:8983/solr/nutch urls/ TestCrawl/ 2
Usage: crawl [options] <crawl_dir> <num_rounds>
Please assist.
Re: bin/crawl not working
Posted by Sebastian Nagel <wa...@googlemail.com.INVALID>.
Hi,
please also note that the way the index writer plugins are configured has changed with 1.15,
see release notes and https://wiki.apache.org/nutch/bin/nutch%20index.
The Solr URL can no longer be passed via -Dsolr.server.url=...
I'll update the bin/crawl wiki page.
Thanks,
Sebastian
On 08/15/2018 03:24 PM, Sadiki Latty wrote:
> Hi Puneet,
>
> To my recollection, bin/crawl takes 3 arguments:
>
> Usage: crawl [-i|--index] [-D "key=value"] <Seed Dir> <Crawl Dir> <Num Rounds>
>
> In addition, as of Nutch 1.14 the crawl script expects the path to the seed directory to be preceded by -s, so your example would look like this:
>
> bin/crawl -i -D solr.server.url=http://localhost:8983/solr/nutch -s urls/ TestCrawl/ 2
>
> where "urls" is the path to your seed URLs.
>
> Reference: https://wiki.apache.org/nutch/bin/crawl
>
> Hope this helps
RE: bin/crawl not working
Posted by Sadiki Latty <sl...@uottawa.ca>.
Hi Puneet,
To my recollection, bin/crawl takes 3 arguments:
Usage: crawl [-i|--index] [-D "key=value"] <Seed Dir> <Crawl Dir> <Num Rounds>
In addition, as of Nutch 1.14 the crawl script expects the path to the seed directory to be preceded by -s, so your example would look like this:
bin/crawl -i -D solr.server.url=http://localhost:8983/solr/nutch -s urls/ TestCrawl/ 2
where "urls" is the path to your seed URLs.
Reference: https://wiki.apache.org/nutch/bin/crawl
Hope this helps
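Putting the two replies together, a Nutch 1.15 invocation would presumably pass the seed directory with -s and drop the -D solr.server.url option. The following is a minimal sketch, not a tested recipe: it only assembles and prints the command, using the directory names from the original post. It assumes the Solr URL has already been configured for the index writer plugin as described in the release notes (in 1.15 this should be conf/index-writers.xml, but check your installation). The variable names are illustrative only.

```shell
#!/bin/sh
# Sketch only: assembles the Nutch 1.15-style crawl command discussed above.
# Directory names are taken from the original post; variable names are made up.
SEED_DIR="urls/"        # directory holding the seed URL list
CRAWL_DIR="TestCrawl/"  # directory where crawl data is stored
NUM_ROUNDS=2            # number of crawl rounds

# -s carries the seed directory. No -D solr.server.url here: the Solr URL
# is assumed to be set in the index writer configuration instead.
CRAWL_CMD="bin/crawl -i -s $SEED_DIR $CRAWL_DIR $NUM_ROUNDS"

# Print the command for review before running it by hand.
echo "$CRAWL_CMD"
```

To actually start the crawl, run the printed command from the Nutch runtime directory once the index writer configuration points at your Solr core.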