Posted to dev@ranger.apache.org by "Alok Lal (JIRA)" <ji...@apache.org> on 2015/02/23 23:50:12 UTC

[jira] [Commented] (RANGER-265) If Hive repository's connection is setup incorrectly then it can make policy manager unresponsive.

    [ https://issues.apache.org/jira/browse/RANGER-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334032#comment-14334032 ] 

Alok Lal commented on RANGER-265:
---------------------------------

[~rmani] noted that this fix protects not only attempts to get new connections but also code that uses the connection to talk to the subject service.  For example, if a rogue service hangs while responding to a lookup query, that won't lead to connection stacking in the policy manager.
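
To make the idea concrete, here is a minimal sketch of a timed call that interrupts its worker thread on timeout.  It is not the actual TimedEventUtil code from the patch; the class name TimedCall, the method runWithTimeout and the thread name are hypothetical and used only for illustration.  It assumes the blocked call responds to thread interruption, which is not true of every blocking I/O call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration; not the Ranger TimedEventUtil class.
final class TimedCall {

    // Daemon threads, so a task that ignores interruption cannot block JVM shutdown.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timed-call");
        t.setDaemon(true);
        return t;
    });

    private TimedCall() {
    }

    /**
     * Runs the task on a worker thread and waits at most the given time.
     * On timeout the worker is interrupted and TimeoutException is thrown,
     * so the caller never stays blocked behind a hung service.
     */
    static <T> T runWithTimeout(Callable<T> task, long timeout, TimeUnit unit)
            throws Exception {
        Future<T> future = POOL.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // cancel(true) interrupts the worker instead of leaving it blocked.
            future.cancel(true);
            throw e;
        } catch (InterruptedException e) {
            // The caller itself was interrupted: stop the worker, keep the flag set.
            future.cancel(true);
            Thread.currentThread().interrupt();
            throw e;
        } catch (ExecutionException e) {
            // Surface the task's own failure rather than the wrapper.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        }
    }
}
```

Because the wrapper takes any Callable, the same guard would cover both the connect call and a later lookup on an existing connection, e.g. TimedCall.runWithTimeout(() -> client.listDatabases(), 500, TimeUnit.MILLISECONDS), where client.listDatabases() stands in for whatever lookup is being protected.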

On that note, though, one item to review is the current timeout values.  The current values of 5 or 10 seconds seem like an eternity.  A more reasonable timeout would be a few hundred milliseconds.  Note that in the overwhelming majority of deployments, Ranger-protected services probably reside in a single data center.  Even for cross-colo calls, a few hundred milliseconds should suffice.

Thoughts? 

> If Hive repository's connection is setup incorrectly then it can make policy manager unresponsive.
> --------------------------------------------------------------------------------------------------
>
>                 Key: RANGER-265
>                 URL: https://issues.apache.org/jira/browse/RANGER-265
>             Project: Ranger
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Alok Lal
>            Assignee: Alok Lal
>             Fix For: 0.4.0, 0.5.0
>
>         Attachments: 0001-RANGER-265-Changed-TimedEventUtil-so-that-it-would-i.patch
>
>
> [Reporting on behalf of Abhay Kulkarni]
> If the connection of a Hive repository is set up incorrectly in such a way that the connect call blocks, i.e. does not error out, then it quickly leads to a buildup of connect threads inside the policy manager.  Eventually the policy manager stops responding to both the service plugins and user requests.
> The cause of the problem seems to be the incomplete implementation of the deferred/timed execution class.  While the problem was reported on Hive it could affect any component.  This problem could be fixed at several levels:
> - Server could time out the connection call and then interrupt that thread.
> - Server code that opens client connections to various services could in turn change how the critical sections are structured so that this problem does not happen and/or connection attempts don't block if someone else is trying to get a connection, etc.
> - For now, a fix that could have the widest impact should be considered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)