Posted to user@nutch.apache.org by hadi <md...@gmail.com> on 2012/03/04 13:19:09 UTC

java.net.UnknownHostException during fetching

I have one page with many external links in it. When the fetching process
starts, many of the external links fail with java.net.UnknownHostException. I
use Nutch 1.4 and have set the settings below in nutch-site.xml. Is there any
misconfiguration?


.
.
.
    <property>
        <name>parser.timeout</name>
        <value>30</value>
    </property>
    <property>
        <name>db.fetch.interval.default</name>
        <value>36000</value>
    </property>
    <property>
        <name>db.ignore.external.links</name>
        <value>false</value>
    </property>
    <property>
        <name>http.timeout</name>
        <value>30000</value>
    </property>
    <property>
        <name>db.max.outlinks.per.page</name>
        <value>-1</value>
    </property>
    <property>
        <name>db.fetch.interval.max</name>
        <value>7776000</value>
    </property>
    <property>
        <name>fetcher.threads.fetch</name>
        <value>10</value>
    </property>
.
.
.




--
View this message in context: http://lucene.472066.n3.nabble.com/java-net-UnknownHostException-during-fetching-tp3797938p3797938.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: java.net.UnknownHostException during fetching

Posted by hadi <md...@gmail.com>.
Yes, I can download them with wget, but I cannot ping them. Does it depend on
my DNS? How can I solve that?
Thanks
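
A note on the ping result: ping uses ICMP echo, which many servers and firewalls block, so a ping that times out does not by itself point to DNS; if ping instead reports an unknown host, name resolution failed. java.net.UnknownHostException means the JVM could not resolve the hostname. A minimal sketch for checking this, assuming Java 11 or later on the same machine that runs the fetcher (in distributed mode, the task nodes), resolves the failing outlink hosts through the JVM resolver using 10 threads, mirroring fetcher.threads.fetch from the nutch-site.xml above. ResolveCheck, resolves and failures are hypothetical names, not part of Nutch.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical diagnostic helper (not part of Nutch): resolves hostnames
 * through the JVM resolver, which is where java.net.UnknownHostException
 * comes from when a Java program cannot look a name up.
 */
public class ResolveCheck {

    /** Returns true if the JVM resolves the host to at least one address. */
    public static boolean resolves(String host) {
        try {
            InetAddress.getAllByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    /**
     * Resolves every host on a fixed pool of worker threads and returns
     * the hosts that failed, in input order.
     */
    public static List<String> failures(List<String> hosts, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (String host : hosts) {
                results.add(pool.submit(() -> resolves(host)));
            }
            List<String> failed = new ArrayList<>();
            for (int i = 0; i < hosts.size(); i++) {
                if (!results.get(i).get()) {
                    failed.add(hosts.get(i));
                }
            }
            return failed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Pass the hostnames of the outlinks that failed during fetching.
        // 10 threads mirrors fetcher.threads.fetch = 10 in nutch-site.xml.
        List<String> failed = failures(List.of(args), 10);
        for (String host : failed) {
            System.out.println("cannot resolve: " + host);
        }
        System.out.println(failed.size() + " of " + args.length + " hosts failed to resolve");
    }
}
```

Usage, with your own hostnames in place of the examples: java ResolveCheck.java www.example.com www.example.org. If a host fails here while wget resolves it on the same machine, rerunning with a single thread (change 10 to 1) can show whether the failures only appear when many lookups run at once.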


Re: java.net.UnknownHostException during fetching

Posted by remi tassing <ta...@gmail.com>.
So you can actually ping those servers or use wget or curl to download them?

Re: java.net.UnknownHostException during fetching

Posted by hadi <md...@gmail.com>.
But my links are not dead and do not need a proxy; they all open in a browser.



Re: java.net.UnknownHostException during fetching

Posted by remi tassing <ta...@gmail.com>.
I had that same error for "dead" URLs, or for URLs that could only be reached
through a proxy.

Remi
