Posted to user@nutch.apache.org by Michael Ji <fj...@yahoo.com> on 2006/03/27 03:09:19 UTC

fetching https pages

hi there:

Will the following line in nutch-site.xml let
Nutch fetch https pages?

"protocol-(http|https)"

I tried that, but it gives me this error message:

"
failed with:
org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=https
"

Any idea how to fix it?

thanks,

Michael





Re: fetching https pages

Posted by Andrzej Bialecki <ab...@getopt.org>.
Michael Ji wrote:
> hi there:
>
> Will the following line in nutch-site.xml let
> Nutch fetch https pages?
>
> "protocol-(http|https)"
>   

No. There is no plugin named "protocol-https". In order to handle HTTPS 
you need to use the "protocol-httpclient" plugin, which handles both 
HTTP and HTTPS - and then you should remove "protocol-http" from your 
config.
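
As a rough sketch of that change, the plugin.includes property in nutch-site.xml might look like the following. The remaining plugin names are an assumption, copied from the default list quoted elsewhere in this thread; keep whatever your own configuration already lists and swap only the protocol entry.

```xml
<property>
  <name>plugin.includes</name>
  <!-- protocol-httpclient takes the place of protocol-http and handles both HTTP and HTTPS.
       The other entries are assumed from the default list quoted in this thread. -->
  <value>nutch-extensionpoints|protocol-httpclient|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
</property>
```

Note that the value is a single regular expression matched against plugin directory names, so it should stay on one line with no embedded whitespace.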

-- 
Best regards,
Andrzej Bialecki     <><



Re: fetching https pages

Posted by kauu <ba...@gmail.com>.
I think you need a protocol plugin to handle https,
so you need to change this in your nutch-site.xml, if you have the
protocol-https plugin:


<property>
  <name>plugin.includes</name>
  <value>nutch-extensionpoints|protocol-http|protocol-https|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
  <description>Regular expression naming plugin directory names to
  include.  Any plugin not matching this expression is excluded.
  In any case you need at least include the nutch-extensionpoints plugin. By
  default Nutch includes crawling just HTML and plain text via HTTP,
  and basic indexing and search plugins.
  </description>
</property>




--
www.babatu.com