Posted to user@nutch.apache.org by Diego Bonesso <di...@gmail.com> on 2013/10/14 22:09:56 UTC
Only in domain / authentication
Hello, I have two questions. I'm using Nutch 2.2. I put two URLs in
seed.txt. In nutch-site.xml, in the conf directory, I created the property
db.ignore.external.links with the value true. First question: will my job
stay within the domains of those two URLs? Second question: the second URL
requires authentication; how can I configure this? The login URL is
something like http://www.domain.com/login. Thanks a lot.
Re: Only in domain / authentication
Posted by Diego Bonesso <di...@gmail.com>.
Hello, I configured seed.txt with the site http://example.com.br. This site
requires a login at http://example.com.br/login. I created the following
rule in httpclient-auth.xml:
<auth-configuration>
  <credentials username="user" password="1111">
    <authscope host="186.xxx.161.xxx" port="80" realm="login"/>
  </credentials>
</auth-configuration>
First, how can I verify that Nutch actually used the authentication?
Second, how can I fetch the whole site?
Thanks!
On Tue, Oct 15, 2013 at 1:23 AM, Talat UYARER <ta...@agmlab.com> wrote:
> Hi Diego,
> First Question:
> The db.ignore.external.links property is the correct way to stay in the domain.
>
> Second Question:
> If you need authentication, you should use protocol-httpclient instead of
> protocol-http. You should change plugin.includes and add the
>
> <property>
> <name>http.auth.file</name>
> <value>httpclient-auth.xml</value>
> <description></description>
> </property>
>
> property to your nutch-site.xml. httpclient-auth.xml is your auth
> configuration file, where you can add your auth configuration. You can see
> some examples in the comments in that file.
>
> Talat
Re: Only in domain / authentication
Posted by Talat UYARER <ta...@agmlab.com>.
Hi Diego,
First Question:
Yes, setting the db.ignore.external.links property to true is the correct way to stay in the domain.
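For illustration, a minimal sketch of how that setting might look in conf/nutch-site.xml. This is not Diego's actual file, only the property named above wrapped in the usual configuration element. As far as I know, the property works per host, so outlinks to other hosts (including other subdomains of the same site) would be dropped as well.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Ignore outlinks that lead to hosts other than the one the page came from,
       which keeps the crawl on the hosts injected from seed.txt. -->
  <property>
    <name>db.ignore.external.links</name>
    <value>true</value>
  </property>
</configuration>
```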
Second Question:
If you need authentication, you should use protocol-httpclient instead of
protocol-http. You should change plugin.includes and add the
<property>
<name>http.auth.file</name>
<value>httpclient-auth.xml</value>
<description></description>
</property>
property to your nutch-site.xml. httpclient-auth.xml is your auth
configuration file, where you can add your auth configuration. You can see
some examples in the comments in that file.
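Put together, the two changes might look roughly like this in conf/nutch-site.xml. This is a sketch under assumptions: the plugin list other than protocol-httpclient is a guess at a typical default, so copy the plugin.includes value from your own nutch-default.xml and only swap protocol-http for protocol-httpclient.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumed plugin list: take the real value from your nutch-default.xml
       and replace protocol-http with protocol-httpclient. -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-httpclient|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic</value>
  </property>

  <!-- Name of the credentials file, looked up in the conf directory. -->
  <property>
    <name>http.auth.file</name>
    <value>httpclient-auth.xml</value>
  </property>
</configuration>
```

The credentials themselves stay in httpclient-auth.xml, so nutch-site.xml only needs to name that file.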
Talat