Posted to user@nutch.apache.org by Sachin Shaju <sa...@mstack.com> on 2016/09/29 13:16:46 UTC

Custom options in nutch crawl script

I was trying to pass custom options to the *bin/crawl* script and ran into
an issue. I passed a Nutch property on the crawl command line to ignore
external outlinks, like this:

*bin/crawl -i -D elastic.index=test -D db.ignore.external.links=true urls/ CrawlTest/ 3*

But this does not work. When I set the same property in nutch-site.xml
instead, it works.
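
For reference, an override of this kind in nutch-site.xml typically takes the
following form. This is only a minimal sketch: the property name and value are
taken from the command above, while the rest of the file (any other properties
it contains) is assumed.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Same setting as "-D db.ignore.external.links=true" on the command line.
       Any other properties already in the file are omitted here. -->
  <property>
    <name>db.ignore.external.links</name>
    <value>true</value>
  </property>
</configuration>
```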

Then I tried passing a custom property as a Java option to bin/crawl, to
index data into a specific Elasticsearch index other than the one given in
nutch-site.xml. To my surprise, this works.
The command I used:

*bin/crawl -i -D elastic.index=test urls/ CrawlTest/ 3*

So I would like to know why my first command didn't work. Am I missing
anything? Please help.

Regards,
Sachin Shaju

sachin.s@mstack.com
+919539887554
