Posted to user@nutch.apache.org by hadi <md...@gmail.com> on 2012/03/04 13:19:09 UTC
java.net.UnknownHostException during fetching
I have one link with many external links inside it. When the fetching process
starts, many of the external links fail with java.net.UnknownHostException. I use
Nutch 1.4 and have set the settings below in nutch-site.xml. Is there any
misconfiguration?
.
.
.
<property>
<name>parser.timeout</name>
<value>30</value>
</property>
<property>
<name>db.fetch.interval.default</name>
<value>36000</value>
</property>
<property>
<name>db.ignore.external.links</name>
<value>false</value>
</property>
<property>
<name>http.timeout</name>
<value>30000</value>
</property>
<property>
<name>db.max.outlinks.per.page</name>
<value>-1</value>
</property>
<property>
<name>db.fetch.interval.max</name>
<value>7776000</value>
</property>
<property>
<name>fetcher.threads.fetch</name>
<value>10</value>
</property>
.
.
.
--
View this message in context: http://lucene.472066.n3.nabble.com/java-net-UnknownHostException-during-fetching-tp3797938p3797938.html
Sent from the Nutch - User mailing list archive at Nabble.com.
Re: java.net.UnknownHostException during fetching
Posted by hadi <md...@gmail.com>.
Yes, I can download them with wget, but I cannot ping them. Does it
depend on my DNS? How can I solve that?
Thanks
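
One way to narrow down the DNS question is to ask the JVM directly, since the fetcher resolves host names through Java rather than through wget or ping. Note that a failed ping does not by itself point to DNS, because many hosts simply drop ICMP. The following is only a minimal sketch: the class name DnsCheck and the method resolves are made up for illustration and are not part of Nutch. It uses the standard java.net.InetAddress.getByName call, which throws java.net.UnknownHostException when a name cannot be resolved.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Nutch: reports whether the JVM
// can resolve the host part of each URL given on the command line.
class DnsCheck {

    /** Returns true if the host of the given URL resolves from this JVM. */
    static boolean resolves(String url) {
        // URI.create throws IllegalArgumentException on malformed input.
        String host = URI.create(url).getHost();
        if (host == null) {
            return false; // no host component, e.g. a relative URL
        }
        try {
            InetAddress.getByName(host); // same exception type the fetcher reports
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String url : args) {
            System.out.println((resolves(url) ? "OK       " : "UNKNOWN  ") + url);
        }
    }
}
```

Running it on the crawl machine with a few of the failing URLs as arguments should show whether the names resolve from Java there. If they resolve here but still fail during the fetch, the cause is more likely elsewhere (for example a proxy requirement) than in DNS itself.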
On 3/4/12, remi tassing [via Lucene]
<ml...@n3.nabble.com> wrote:
>
>
> So you can actually ping those servers or use wget or curl to download them?
>
> On Sun, Mar 4, 2012 at 7:49 PM, hadi <md...@gmail.com> wrote:
>
>> But my links are not dead or does not need proxy,they are all open in
>> browser
>>
Re: java.net.UnknownHostException during fetching
Posted by remi tassing <ta...@gmail.com>.
So can you actually ping those servers, or use wget or curl to download the pages?
On Sun, Mar 4, 2012 at 7:49 PM, hadi <md...@gmail.com> wrote:
> But my links are not dead or does not need proxy,they are all open in
> browser
Re: java.net.UnknownHostException during fetching
Posted by hadi <md...@gmail.com>.
But my links are not dead and do not need a proxy; they all open in a browser.
Re: java.net.UnknownHostException during fetching
Posted by remi tassing <ta...@gmail.com>.
I had the same error for "dead" URLs and for URLs that could only be
reached through a proxy.
Remi