Posted to notifications@accumulo.apache.org by "Mike Drob (JIRA)" <ji...@apache.org> on 2014/04/22 19:20:15 UTC

[jira] [Updated] (ACCUMULO-2716) Duplicate connection loss logging in Writer

     [ https://issues.apache.org/jira/browse/ACCUMULO-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Drob updated ACCUMULO-2716:
--------------------------------

    Description: 
Running CI with agitation, I see lots of duplicated messages in the monitor whenever a tserver dies.

| WARN | Error connecting to tserver1.example.com:10011: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused |
| ERROR | error sending update to tserver1.example.com:10011: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused |

These always occur in pairs, in the same millisecond, and come from the same tserver. I _think_ that they are updates to the metadata table coming from these tservers, such as flushes or compactions that fail because the dead server was hosting the corresponding metadata tablet, but it doesn't really matter.

The culprit is in Writer.java where we log-and-rethrow in {{updateServer()}}:
{code}
    } catch (TTransportException e) {
      log.warn("Error connecting to " + server + ": " + e);
      throw e;
    }
{code}

and then later log again in {{update()}}:
{code}
      } catch (TException e) {
        log.error("error sending update to " + tabLoc.tablet_location + ": " + e);
        TabletLocator.getLocator(instance, table).invalidateCache(tabLoc.tablet_extent);
      }
{code}
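
To make the double logging concrete, here is a minimal, self-contained sketch of the pattern. It is not the actual Writer.java code and not necessarily the fix applied for this issue. The class {{LogOnceSketch}}, the exception classes {{SketchTException}} and {{SketchTTransportException}} (stand-ins for the Thrift exceptions), the {{Sender}} interface, and the in-memory {{LOG}} list are all hypothetical names introduced for illustration. One possible way to avoid the duplicate is to let the inner method propagate the exception and log only where it is handled:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; every name here is hypothetical, not Accumulo code.
class LogOnceSketch {
  // Stand-ins for org.apache.thrift.TException and TTransportException.
  static class SketchTException extends Exception {
    SketchTException(String msg) { super(msg); }
  }
  static class SketchTTransportException extends SketchTException {
    SketchTTransportException(String msg) { super(msg); }
  }

  // In-memory "log" so the number of emitted messages can be counted.
  static final List<String> LOG = new ArrayList<>();

  // Simulates a connection attempt to a dead tserver.
  static void connect(String server) throws SketchTTransportException {
    throw new SketchTTransportException("Connection refused");
  }

  // Variant A: log-and-rethrow, as in the updateServer() excerpt above.
  static void updateServerLogging(String server) throws SketchTException {
    try {
      connect(server);
    } catch (SketchTTransportException e) {
      LOG.add("WARN Error connecting to " + server + ": " + e.getMessage());
      throw e;
    }
  }

  // Variant B: propagate without logging; the caller reports the failure.
  static void updateServerQuiet(String server) throws SketchTException {
    connect(server);
  }

  interface Sender {
    void send(String server) throws SketchTException;
  }

  // Mirrors the update() excerpt: catch, log, and carry on.
  // Returns how many log messages this single failure produced.
  static int update(Sender sender, String server) {
    LOG.clear();
    try {
      sender.send(server);
    } catch (SketchTException e) {
      LOG.add("ERROR error sending update to " + server + ": " + e.getMessage());
    }
    return LOG.size();
  }

  public static void main(String[] args) {
    String server = "tserver1.example.com:10011";
    System.out.println(update(LogOnceSketch::updateServerLogging, server)); // prints 2
    System.out.println(update(LogOnceSketch::updateServerQuiet, server));   // prints 1
  }
}
```

With variant A, a single failed connection yields both a WARN and an ERROR line, which matches the pairs seen in the monitor. With variant B, only the ERROR line from the handling site remains.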



  was:
Running CI with agitation, I see lots of duplicated messages in the monitor whenever a tserver dies.

| WARN | Error connecting to tserver1.example.com:10011: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused |
| ERROR | error sending update to a2422.halxg.cloudera.com:10011: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused |

These always occur in pairs, in the same millisecond, and come from the same tserver. I _think_ that they are updates to the metadata table coming from these tservers, such as flushes or compactions that fail because the dead server was hosting the corresponding metadata tablet, but it doesn't really matter.

The culprit is in Writer.java where we log-and-rethrow in {{updateServer()}}:
{code}
    } catch (TTransportException e) {
      log.warn("Error connecting to " + server + ": " + e);
      throw e;
    }
{code}

and then later log again in {{update()}}:
{code}
      } catch (TException e) {
        log.error("error sending update to " + tabLoc.tablet_location + ": " + e);
        TabletLocator.getLocator(instance, table).invalidateCache(tabLoc.tablet_extent);
      }
{code}




> Duplicate connection loss logging in Writer
> -------------------------------------------
>
>                 Key: ACCUMULO-2716
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2716
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>            Reporter: Mike Drob
>            Assignee: Mike Drob
>              Labels: logging
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)