Posted to user@accumulo.apache.org by Josef Roehrl - PHEMI <jr...@phemi.com> on 2015/11/13 11:25:06 UTC

Quick question re UnknownHostException

Hi All,

Three times in the past few weeks (twice on one system, once on another), the
master has hit UnknownHostExceptions, one by one, for each of the tablet
servers.  It then tries to stop them, and eventually all the tablet servers
quit.

It goes like this for all the tablet servers:

12 08:14:01,0498    tserver:6    20    ERROR
error sending update to tserver3:9997: org.apache.thrift.transport.TTransportException: java.net.UnknownHostException

12 09:01:53,0352    master:1    2    ERROR
org.apache.thrift.transport.TTransportException: java.net.UnknownHostException

12 16:35:50,0672    master:1    10    ERROR
unable to get tablet server status tserver3:9997[250e6cd2c500012] org.apache.thrift.transport.TTransportException: java.net.UnknownHostException

I've redacted the real host names, of course.

This could be a DNS problem, though the system was running fine for days
before this happened, and the same scenario played out on two systems that
use quite different DNS servers.
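
One way to test the DNS hypothesis directly is to try resolving the tablet
server names from the master host, outside of Accumulo. The sketch below is
only an illustration: the class name ResolveCheck and the default host names
are hypothetical placeholders (the real names are redacted here), and it uses
nothing beyond the standard java.net.InetAddress lookup. Note that the JVM may
cache lookups, so a one-off run only shows what the resolver answers at that
moment.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ResolveCheck {

    /** Tries to resolve each host; maps host -> resolved address, or the failure text. */
    static Map<String, String> resolveAll(List<String> hosts) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String host : hosts) {
            try {
                results.put(host, InetAddress.getByName(host).getHostAddress());
            } catch (UnknownHostException e) {
                // Same exception type that appears in the master and tserver logs.
                results.put(host, "UNRESOLVED: " + e.getMessage());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Placeholder names (assumption); pass the real tablet server names as arguments.
        List<String> hosts = args.length > 0
                ? Arrays.asList(args)
                : Arrays.asList("tserver1", "tserver2", "tserver3");
        Date now = new Date();
        for (Map.Entry<String, String> entry : resolveAll(hosts).entrySet()) {
            System.out.println(now + " " + entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Running this periodically (from cron, say) and keeping the output would show
whether lookups fail around the same times the errors appear in the logs.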

If anyone has a hint or has seen something like this, I would appreciate any
pointers.

I have looked at the JIRA issues regarding DNS outages, but nothing seems
to fit this pattern.

Thanks

-- 


Josef Roehrl
Senior Software Developer
*PHEMI Systems*
180-887 Great Northern Way
Vancouver, BC V5T 4T5
604-336-1119
Website <http://www.phemi.com/> Twitter <https://twitter.com/PHEMISystems>
Linkedin
<http://www.linkedin.com/company/3561810?trk=tyah&trkInfo=tarId%3A1403279580554%2Ctas%3Aphemi%20hea%2Cidx%3A1-1-1>

Re: Quick question re UnknownHostException

Posted by Josef Roehrl - PHEMI <jr...@phemi.com>.
Never expect the unexpected :8)

On 15-11-13 08:44 AM, Josh Elser wrote:
> :) never think to expect a DNS issue until you have a DNS issue

Re: Quick question re UnknownHostException

Posted by Josh Elser <jo...@gmail.com>.
:) never think to expect a DNS issue until you have a DNS issue

Re: Quick question re UnknownHostException

Posted by Adam Fuchs <sc...@gmail.com>.
Josef,

If these are intermittent failures, you might consider turning on the
watcher [1] to automatically restart your processes. This should keep your
cluster from atrophying over time. You'll still have to take administrative
action to fix the DNS problem, but your availability should be better.

Cheers,
Adam

[1] http://accumulo.apache.org/1.7/accumulo_user_manual.html#watcher

Re: Quick question re UnknownHostException

Posted by Josef Roehrl - PHEMI <jr...@phemi.com>.
Hi Everyone,

Turns out that it was a DNS server issue exactly.  Had to get this
confirmed by the Data Centre, though.

Thanks!
