Posted to httpclient-users@hc.apache.org by Jean-Marc Spaggiari <je...@spaggiari.org> on 2012/07/23 20:16:23 UTC

Multithreads vs UnknownHostException

Hi,

I have an application where I'm trying to read about 30 URLs at a time
from a list of 5000 URLs.

I have implemented 30 threads to retrieve the content.

I'm initializing the HttpClient this way:

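import org.apache.http.conn.scheme.PlainSocketFactory;
import org.apache.http.conn.scheme.Scheme;
import org.apache.http.conn.scheme.SchemeRegistry;
import org.apache.http.conn.ssl.SSLSocketFactory;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.conn.PoolingClientConnectionManager;
import org.apache.http.params.BasicHttpParams;
import org.apache.http.params.CoreConnectionPNames;
import org.apache.http.params.HttpParams;

// Register the plain and TLS socket factories on their default ports.
SchemeRegistry schemeRegistry = new SchemeRegistry();
schemeRegistry.register(new Scheme("http", 80, PlainSocketFactory.getSocketFactory()));
schemeRegistry.register(new Scheme("https", 443, SSLSocketFactory.getSocketFactory()));

// Thread-safe pool: at most 200 connections in total, 20 per route.
PoolingClientConnectionManager cm = new PoolingClientConnectionManager(schemeRegistry);
cm.setMaxTotal(200);
cm.setDefaultMaxPerRoute(20);

// 'client' is the single DefaultHttpClient field shared by the whole application.
HttpParams params = new BasicHttpParams();
client = new DefaultHttpClient(cm, params);
// 45-second socket (read) and connect timeouts, in milliseconds; ints are autoboxed.
client.getParams().setParameter(CoreConnectionPNames.SO_TIMEOUT, 45000);
client.getParams().setParameter(CoreConnectionPNames.CONNECTION_TIMEOUT, 45000);
// false disables TCP_NODELAY, i.e. leaves Nagle's algorithm enabled.
client.getParams().setParameter(CoreConnectionPNames.TCP_NODELAY, false);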


However, when my threads are retrieving the content, I often get an
UnknownHostException for a host I know exists.

For example, if I have 500 URLs from this host, some of them are
retrieved correctly and some throw an UnknownHostException.

I'm wondering where it might be coming from.

Each thread creates an HttpGet and executes it using the client
created above (there is only one client instance for the entire
application).
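A JDK-only sketch of the setup described above: a fixed pool of 30 threads works through the URL list, and every task uses the same shared object. `SharedFetcherPool`, `Fetcher` and `Stats` are hypothetical names. In the real application `fetch` would execute an `HttpGet` through the single `DefaultHttpClient`, which the HttpClient 4.x documentation describes as safe to share between threads when it sits on a thread-safe connection manager such as `PoolingClientConnectionManager`.

```java
import java.io.IOException;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class SharedFetcherPool {

    /** Hypothetical stand-in for "execute an HttpGet with the one shared client". */
    interface Fetcher {
        String fetch(String url) throws IOException;
    }

    /** Per-outcome counters; AtomicInteger makes concurrent updates safe. */
    static final class Stats {
        final AtomicInteger ok = new AtomicInteger();
        final AtomicInteger unknownHost = new AtomicInteger();
        final AtomicInteger otherIo = new AtomicInteger();
    }

    /** Fetches every URL with nThreads worker threads that all share one Fetcher instance. */
    static Stats fetchAll(List<String> urls, int nThreads, Fetcher shared) {
        Stats stats = new Stats();
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String url : urls) {
                futures.add(pool.submit(() -> {
                    try {
                        shared.fetch(url);
                        stats.ok.incrementAndGet();
                    } catch (UnknownHostException e) {
                        // A DNS failure only affects this URL; the other tasks keep running.
                        stats.unknownHost.incrementAndGet();
                    } catch (IOException e) {
                        stats.otherIo.incrementAndGet();
                    }
                }));
            }
            for (Future<?> f : futures) {
                try {
                    f.get(); // wait until this task has finished
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        return stats;
    }
}
```

Counting UnknownHostException separately from other I/O errors makes it easier to see how many of the 5000 requests actually fail on name resolution rather than on the connection itself.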

Initially, I thought this was because of DNS. But since the same host
sometimes works, it should be in the cache by now.
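The JVM also caches name lookups itself, independently of any caching nameserver. Two security properties control this: `networkaddress.cache.ttl` for successful lookups and `networkaddress.cache.negative.ttl` for failed ones. In the stock `java.security` file the negative TTL is 10 seconds, so one failed lookup can be served from the cache to other requests for the same host for a short while; the positive TTL is typically left unset there, in which case the JDK applies its own default. The sketch below reads and sets these properties. `JvmDnsCacheSettings` is a hypothetical helper, the values are only examples, and, as far as I know, the settings only take effect if applied before the JVM's first lookup.

```java
import java.security.Security;

class JvmDnsCacheSettings {

    // Security property names consulted by java.net.InetAddress's lookup cache.
    static final String POSITIVE_TTL = "networkaddress.cache.ttl";
    static final String NEGATIVE_TTL = "networkaddress.cache.negative.ttl";

    /**
     * Sets how long (in seconds) successful and failed lookups are cached.
     * 0 means "do not cache", -1 means "cache forever".
     * Call this before the first host name lookup; the policy is assumed to be read only once.
     */
    static void configure(int positiveSeconds, int negativeSeconds) {
        Security.setProperty(POSITIVE_TTL, Integer.toString(positiveSeconds));
        Security.setProperty(NEGATIVE_TTL, Integer.toString(negativeSeconds));
    }

    public static void main(String[] args) {
        // Show what the JDK's java.security file provides (null if the property is unset).
        System.out.println(POSITIVE_TTL + " = " + Security.getProperty(POSITIVE_TTL));
        System.out.println(NEGATIVE_TTL + " = " + Security.getProperty(NEGATIVE_TTL));
        // Example values only: cache successes for 5 minutes, never cache failures.
        configure(300, 0);
    }
}
```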

Would it be better for me to create one client per thread?

Thanks for your comments.

JM

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
For additional commands, e-mail: httpclient-users-help@hc.apache.org


Re: Multithreads vs UnknownHostException

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Sam,

Thanks a lot for your reply.

I had also suspected DNS to be the root cause of this issue, so I
moved to the OpenDNS servers, but I'm still getting the errors. I had
never suspected that my router might be one of the reasons it's
failing.

So I will try all your suggestions. I will first buy a new router;
mine is 5 or 6 years old and I was already thinking of replacing it
anyway. I will also try Google's servers, and probably the
InMemoryDnsResolver, since it will allow me to retry the DNS requests
if the first one fails... As for the DNS cache, I thought I had one,
but based on the way the application is responding, I suspect it is
not being used.
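Retrying the lookup before giving up could look like the JDK-only sketch below. `RetryingLookup` and its `Resolver` interface are hypothetical; `SYSTEM` simply delegates to `InetAddress.getAllByName`. One caveat: the JVM caches failed lookups for a short time (the `networkaddress.cache.negative.ttl` security property), so with default settings a quick retry through `InetAddress` may just get the cached failure back unless the backoff outlasts that window or the negative TTL is lowered.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class RetryingLookup {

    /** Hypothetical pluggable resolver, so the retry logic can be exercised without a network. */
    interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    /** Delegates to the JVM's normal lookup (system resolver plus the JVM's own cache). */
    static final Resolver SYSTEM = InetAddress::getAllByName;

    /**
     * Calls the resolver up to maxAttempts times, sleeping baseDelayMillis, then twice that,
     * and so on between failures. Rethrows the last UnknownHostException if all attempts fail.
     */
    static InetAddress[] resolveWithRetry(Resolver resolver, String host,
                                          int maxAttempts, long baseDelayMillis)
            throws UnknownHostException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        UnknownHostException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return resolver.resolve(host);
            } catch (UnknownHostException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(baseDelayMillis << (attempt - 1)); // exponential backoff
                }
            }
        }
        throw last;
    }
}
```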

So I will try all of that and report back here in case someone faces
the same kind of issue in the future.

JM



Re: Multithreads vs UnknownHostException

Posted by Sam Crawford <sa...@gmail.com>.
UnknownHostException is certainly related to DNS, so I would focus
investigations there initially. It sounds like the DNS request is
timing out, and this is not uncommon if your DNS server is a cheap
home router (I have found that some of these do not handle concurrency
very well). For example, these are the kinds of things I'd explore:

1) Are you definitely running a caching nameserver? Is it on your
local machine? If not, you could explore this.
2) Have you tried a different DNS server? e.g. Google's at 8.8.8.8 or 8.8.4.4
3) Are the requests all for the same hostname(s)? If so, you could
consider performing the lookups at the start, storing them in an
InMemoryDnsResolver
(http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/impl/conn/InMemoryDnsResolver.html)
and then never rely upon external DNS lookups after that.
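Suggestion 3 boils down to collecting the distinct host names once and resolving each of them a single time, so 500 URLs on the same host cost one lookup instead of 500. A JDK-only sketch, assuming the URLs parse with `java.net.URI`; `HostPreResolver` and its `Resolver` interface are hypothetical names.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class HostPreResolver {

    /** Hypothetical pluggable resolver; InetAddress::getAllByName fits this shape. */
    interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    /** Distinct host names in the URL list, lower-cased (DNS names are case-insensitive). */
    static Set<String> distinctHosts(List<String> urls) {
        Set<String> hosts = new LinkedHashSet<>();
        for (String url : urls) {
            String host = URI.create(url).getHost(); // throws IllegalArgumentException on malformed URLs
            if (host != null) {
                hosts.add(host.toLowerCase(Locale.ROOT));
            }
        }
        return hosts;
    }

    /** Resolves each host once; hosts that fail are recorded in 'failed' instead of aborting. */
    static Map<String, InetAddress[]> preResolve(Set<String> hosts, Resolver resolver, Set<String> failed) {
        Map<String, InetAddress[]> table = new LinkedHashMap<>();
        for (String host : hosts) {
            try {
                table.put(host, resolver.resolve(host));
            } catch (UnknownHostException e) {
                failed.add(host);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        // IP-literal URLs, so this demo does not depend on a working DNS server.
        List<String> urls = Arrays.asList("http://127.0.0.1/a", "http://127.0.0.1/b");
        Set<String> failed = new LinkedHashSet<>();
        Map<String, InetAddress[]> table = preResolve(distinctHosts(urls), InetAddress::getAllByName, failed);
        System.out.println(table.keySet() + " resolved, " + failed + " failed");
    }
}
```

With HttpClient 4.2, each entry of the resulting map could then be copied into an InMemoryDnsResolver through its add(host, addresses...) method, and that resolver handed to the connection manager (if I recall the 4.2 API correctly, PoolingClientConnectionManager has a constructor taking a DnsResolver; the API docs linked above are the authority).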

Hope this helps,

Sam

