Posted to dev@nutch.apache.org by "Thalatam, Venkata naveen" <ve...@bankofamerica.com> on 2014/12/10 12:54:41 UTC

Not able to crawl a website using Nutch

Hello,

Can someone assist me with the error below?

I am trying to crawl a website within my organization using Nutch, but without success.

[Attached image with the error; not included in this plain text version]



Best Regards,

Venkata Naveen Thalatam (Naveen) <si...@bankofamerica.com>


----------------------------------------------------------------------
This message w/attachments (message) is intended solely for the use of the intended recipient(s) and may contain information that is privileged, confidential or proprietary.  If you are not an intended recipient, please notify the sender, and then please delete and destroy all copies and attachments, and be advised that any review or dissemination of, or the taking of any action in reliance on, the information contained in or attached to this message is prohibited. 
Unless specifically indicated, this message is not an offer to sell or a solicitation of any investment products or other financial product or service, an official confirmation of any transaction, or an official statement of Sender.  Subject to applicable law, Sender may intercept, monitor, review and retain e-communications (EC) traveling through its networks/systems and may produce any such EC to regulators, law enforcement, in litigation and as required by law. 
The laws of the country of each sender/recipient may impact the handling of EC, and EC may be archived, supervised and produced in countries other than the country in which you are located. This message cannot be guaranteed to be secure or free of errors or viruses.  Attachments that are part of this EC may have additional important disclosures and disclaimers, which you should read.   By messaging with Sender you consent to the foregoing.

Re: Not able to crawl a website using Nutch

Posted by feng lu <am...@gmail.com>.
Hi Thalatam

You can check this tutorial to learn how to use the Nutch command line interface.

http://wiki.apache.org/nutch/NutchTutorial

The bin/nutch crawl command has been deprecated; use the bin/crawl script instead.
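
A minimal sketch of a bin/crawl invocation follows. The seed URL, directory names, Solr URL, and round count are all placeholders, and the argument order (seed dir, crawl dir, Solr URL, number of rounds) is an assumption based on the Nutch 1.x script of that period. It differs between versions, so running bin/crawl with no arguments should print the usage for your install.

```shell
# Placeholder values (assumptions, not taken from this thread).
SEED_DIR=urls
CRAWL_DIR=crawl
SOLR_URL=http://localhost:8983/solr/
ROUNDS=2

# The seed directory holds a text file with one start URL per line.
mkdir -p "$SEED_DIR"
echo "http://intranet.example.com/" > "$SEED_DIR/seed.txt"

# Run the crawl only if the script is present in the current directory
# (normally the Nutch runtime directory); otherwise just show the command.
if [ -x bin/crawl ]; then
  bin/crawl "$SEED_DIR" "$CRAWL_DIR" "$SOLR_URL" "$ROUNDS"
else
  echo "bin/crawl not found; from the Nutch runtime directory, run:"
  echo "bin/crawl $SEED_DIR $CRAWL_DIR $SOLR_URL $ROUNDS"
fi
```

The tutorial linked above covers the configuration that has to be in place before a crawl will fetch anything.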




-- 
Don't Grow Old, Grow Up... :-)

RE: Not able to crawl a website using Nutch

Posted by "Thalatam, Venkata naveen" <ve...@bankofamerica.com>.
[Attached image; not included in this plain text version]
