Posted to user@nutch.apache.org by jyoti aditya <jy...@gmail.com> on 2016/12/09 11:31:51 UTC

proxy setting in nutch

Hi team,

I want to crawl a website, and I am using a paid proxy to reach that
specific website.
I would like to know how to configure Nutch so that it crawls through my
proxy.

I have around 1000 proxy IPs, each with its own user name and password. How
can I configure Nutch so that it uses these proxies in round-robin
fashion?
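
To make the round-robin requirement concrete, this is roughly the rotation I
have in mind. It is only a sketch in plain Java and not Nutch's API:
ProxyRotator and its methods are hypothetical names, the host names in the
usage note are placeholders, and per-proxy credentials are left out.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of Nutch: hands out proxies in round-robin order.
final class ProxyRotator {
    private final List<Proxy> proxies;
    private final AtomicInteger counter = new AtomicInteger();

    ProxyRotator(List<Proxy> proxies) {
        if (proxies.isEmpty()) {
            throw new IllegalArgumentException("at least one proxy is required");
        }
        this.proxies = List.copyOf(proxies);
    }

    // Returns the next proxy, wrapping back to the first after the last one.
    // floorMod keeps the index non-negative even if the counter overflows.
    Proxy next() {
        int i = Math.floorMod(counter.getAndIncrement(), proxies.size());
        return proxies.get(i);
    }

    // Builds an HTTP proxy entry without resolving the host name.
    static Proxy http(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }
}
```

With two entries, say ProxyRotator.http("proxy-a.example", 8080) and
ProxyRotator.http("proxy-b.example", 8081), successive calls to next() would
alternate between them, so each fetch goes out through a different proxy.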

Also, in nutch-default.xml I tried setting these properties:

<property>
  <name>http.proxy.host</name>
  <value>12.34..56.789</value>
  <description>The proxy hostname.  If empty, no proxy is
used.</description>
</property>

<property>
  <name>http.proxy.port</name>
  <value>1234</value>
  <description>The proxy port.</description>
</property>

<property>
  <name>http.proxy.username</name>
  <value>qwer</value>
  <description>Username for proxy. This will be used by
  'protocol-httpclient', if the proxy server requests basic, digest
  and/or NTLM authentication. To use this, 'protocol-httpclient' must
  be present in the value of 'plugin.includes' property.
  NOTE: For NTLM authentication, do not prefix the username with the
  domain, i.e. 'susam' is correct whereas 'DOMAIN\susam' is incorrect.
  </description>
</property>

But in the Hadoop log I found that it throws an SSL exception.

Please help me fix this issue.


With Regards
Jyoti Aditya