Posted to issues@sentry.apache.org by "Vadim Spector (JIRA)" <ji...@apache.org> on 2017/07/25 20:52:00 UTC

[jira] [Updated] (SENTRY-1866) Add ping Thrift APIs for Sentry services

     [ https://issues.apache.org/jira/browse/SENTRY-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vadim Spector updated SENTRY-1866:
----------------------------------
    Summary: Add ping Thrift APIs for Sentry services  (was: Add ping Thrift interface to the existing Sentry clients)

> Add ping Thrift APIs for Sentry services
> ----------------------------------------
>
>                 Key: SENTRY-1866
>                 URL: https://issues.apache.org/jira/browse/SENTRY-1866
>             Project: Sentry
>          Issue Type: Improvement
>            Reporter: Vadim Spector
>
> Sentry HA-specific: when the Sentry client fails over from one Sentry server to another, it does not print a message saying that it has done so. Have the client print a simple, clear INFO-level message when it fails over from one Sentry server to another.
> Design considerations:
> "Sentry client" stands for a specific class instance capable of connecting to a specific Sentry server instance from some app (usually another Hadoop service). In HA scenario, Sentry client relies on connection pooling (SentryTransportPool class) to select one of several available configured Sentry server instances. Whenever connection fails, Sentry client simply asks SentryTransportPool to a) invalidate this specific connection and b) get another connection instead. There is no monitoring of Sentry server liveliness per se. Each Sentry client finds out about a failure independently and only at the time of trying to use it. Thus there may be no particular correlation between the time of the discovery of connection failure and the time Sentry server actually becomes unavailable. E.g. a client can discover a failure of the old connection, long after Sentry server crushed and then was restarted (and maybe restarted more than once!).
> Intuitively, one would like to have a single log entry per Sentry server crash/shutdown; but for the reasons above, it seems difficult, if not impossible, to group connections by the Sentry server instance(s) that were running when those connections were initiated. It may therefore be hard to say whether multiple connection failures relate to "the same" Sentry server instance going down, which makes it difficult to report exactly one connection failure per Sentry server shutdown/crash event.
> Yet, the desire to have visibility into such events in the field is understandable. At the same time, if we simply log every connection failure, such logging can be massive - there may be many concurrent connections to Sentry server(s) from the same app. Such logging would be less than useful.
> The solution will have to rely on somewhat imperfect rules to keep the number of connection-failure logs contained. An alternative would be to introduce periodic pinging of the Sentry server and log only ping failures; it would be ideal if the Sentry server responded to pings with a server id initialized to the server start timestamp - that would solve the problem completely (see the restart-detection sketch below) - but this requires more radical changes.
> The simplest solution seems to be as follows: since recovery of a failed Sentry server is likely to take some time, we do not need to be too clever; it may be enough to report connection failures to a given Sentry instance no more often than once every N (configurable) seconds. Once one connection failure to Sentry server instance X has been reported, another one will not be reported until N seconds expire (a minimal throttling sketch is shown below). This keeps the number of connection-failure messages at bay. Such logs may still be confusing if a client attempts to use a stale connection to the old server instance after an idle period, long after the problem has been fixed, but that is arguably still better than nothing.
> Alternative suggestions are welcome.
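
For readers unfamiliar with the client side, the failover path described in the issue looks roughly like the sketch below. SentryTransportPool is the real pool class, but the TransportPool interface, its method names, and the surrounding retry logic here are simplified assumptions, not the actual Sentry client code.

  import org.apache.thrift.transport.TTransport;
  import org.apache.thrift.transport.TTransportException;
  import org.slf4j.Logger;
  import org.slf4j.LoggerFactory;

  /**
   * Simplified illustration of the client-side failover path described above.
   * The TransportPool interface is a stand-in for SentryTransportPool; its
   * method names are assumptions, not the real Sentry API.
   */
  public class FailoverSketch {
    private static final Logger LOGGER = LoggerFactory.getLogger(FailoverSketch.class);

    /** Stand-in for SentryTransportPool (illustrative only). */
    interface TransportPool {
      TTransport get() throws TTransportException;   // hands out a connection to one configured server
      void invalidate(TTransport broken);            // drops a failed connection from the pool
    }

    /** Stand-in for "perform one Thrift request over this transport". */
    interface ThriftCall<T> {
      T apply(TTransport transport) throws TTransportException;
    }

    <T> T callWithFailover(TransportPool pool, ThriftCall<T> call) throws TTransportException {
      TTransport transport = pool.get();
      try {
        return call.apply(transport);
      } catch (TTransportException e) {
        pool.invalidate(transport);        // a) invalidate this specific connection
        TTransport retry = pool.get();     // b) get another connection instead
        // The INFO-level failover message this issue asks for would go here:
        LOGGER.info("Sentry client failing over to another Sentry server: {}", e.getMessage());
        return call.apply(retry);
      }
    }
  }

Note that nothing in this path tells the client whether the new connection points at a different server process or at a restarted copy of the old one, which is exactly the grouping difficulty described above.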

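To illustrate why a ping API returning a server id set at startup would solve that grouping problem: a client could compare the id across pings and log exactly one message per observed restart. The interface and class below are purely illustrative assumptions; no such Thrift definition exists in Sentry today.

  /** Illustrative Java-side shape of the proposed ping call (not generated from any real .thrift file). */
  interface SentryPing {
    /** Opaque server id, e.g. the server's start timestamp in milliseconds. */
    long ping();
  }

  /** Remembers the last seen server id and reports when it changes. */
  class ServerRestartDetector {
    private long lastSeenServerId = -1L;

    /** Returns true exactly once per observed server restart. */
    synchronized boolean serverRestarted(long serverId) {
      boolean restarted = lastSeenServerId != -1L && serverId != lastSeenServerId;
      lastSeenServerId = serverId;
      return restarted;
    }
  }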

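A minimal sketch of the throttled logging proposed as the "simplest solution", assuming the client keys throttling by the server address it failed to reach. FailureLogThrottle is a made-up name for illustration, not an existing Sentry class.

  import java.util.concurrent.ConcurrentHashMap;
  import java.util.concurrent.ConcurrentMap;

  /** Allows at most one connection-failure log per server address every reportIntervalMs milliseconds. */
  public class FailureLogThrottle {
    private final long reportIntervalMs;
    private final ConcurrentMap<String, Long> lastReported = new ConcurrentHashMap<>();

    public FailureLogThrottle(long reportIntervalMs) {
      this.reportIntervalMs = reportIntervalMs;
    }

    /** Returns true if a failure against this server should be logged now. */
    public boolean shouldLog(String serverAddress) {
      long now = System.currentTimeMillis();
      Long previous = lastReported.get(serverAddress);
      if (previous != null && now - previous < reportIntervalMs) {
        return false;                                                  // reported within the last N seconds; stay quiet
      }
      if (previous == null) {
        return lastReported.putIfAbsent(serverAddress, now) == null;   // first failure seen for this server
      }
      return lastReported.replace(serverAddress, previous, now);       // only one concurrent caller wins this window
    }
  }

The caller would check shouldLog(server) just before emitting the failover INFO message, so repeated failures within the N-second window are silently swallowed.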

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)