Posted to user@nutch.apache.org by Michael Ji <fj...@yahoo.com> on 2006/03/27 03:09:19 UTC
fetching https pages
hi there:
Will the following setting in nutch-site.xml let
Nutch fetch https pages?
"protocol-(http|https)"
I tried that, but it gives me this error message:
"
failed with:
org.apache.nutch.protocol.ProtocolNotFound: protocol
not found for url=https
"
Any idea how to fix it?
thanks,
Michael
Re: fetching https pages
Posted by Andrzej Bialecki <ab...@getopt.org>.
Michael Ji wrote:
> hi there:
>
> Will the following setting in nutch-site.xml let
> Nutch fetch https pages?
>
> "protocol-(http|https)"
>
No. There is no plugin named "protocol-https". In order to handle HTTPS
you need to use the "protocol-httpclient" plugin, which handles both
HTTP and HTTPS - and then you should remove "protocol-http" from your
config.
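
As a rough sketch of that change, the plugin.includes property in nutch-site.xml might look like the fragment below. Only the swap of "protocol-http" for "protocol-httpclient" comes from the advice above; the remaining plugin names are carried over from the default value quoted elsewhere in this thread and are an assumption about your setup, so adjust them to match your own config.

```xml
<!-- Sketch only: the plugin list other than protocol-httpclient is assumed
     from the default value quoted in this thread, not a verified config. -->
<property>
  <name>plugin.includes</name>
  <!-- protocol-httpclient replaces protocol-http and covers both HTTP and HTTPS -->
  <value>nutch-extensionpoints|protocol-httpclient|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
</property>
```

Note that the value should stay on a single line; a line break inside the regular expression would become part of the pattern.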
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
Re: fetching https pages
Posted by kauu <ba...@gmail.com>.
I think you need a protocol plugin to handle https,
so you need to change this in your nutch-site.xml, if you have the
protocol-https plugin:
<property>
<name>plugin.includes</name>
<value>nutch-extensionpoints|protocol-http|protocol-https|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
<description>Regular expression naming plugin directory names to
include. Any plugin not matching this expression is excluded.
In any case you need at least include the nutch-extensionpoints plugin. By
default Nutch includes crawling just HTML and plain text via HTTP,
and basic indexing and search plugins.
</description>
</property>
--
www.babatu.com