Posted to dev@ranger.apache.org by "Abhay Kulkarni (Jira)" <ji...@apache.org> on 2023/03/06 23:58:00 UTC

[jira] [Commented] (RANGER-3987) Potential risk of OOM

    [ https://issues.apache.org/jira/browse/RANGER-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697180#comment-17697180 ] 

Abhay Kulkarni commented on RANGER-3987:
----------------------------------------

A potential solution could be to execute doCreateOrUpdateXXPluginInfo() in its own transaction (instead of queuing it to run after the original transaction completes) when the following condition is true:

httpCode == HttpServletResponse.SC_NOT_MODIFIED.
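
A minimal sketch of that branching, to make the idea concrete. This is not Ranger's actual code: the class PluginInfoUpdateDispatcher, its fields, and runInNewTransaction() are hypothetical stand-ins, and the "transaction" is only simulated by running the work inline.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for the admin-side dispatch logic; not Ranger's actual code.
class PluginInfoUpdateDispatcher {
    // Same value as HttpServletResponse.SC_NOT_MODIFIED, redefined here
    // so the sketch compiles without the servlet API on the classpath.
    static final int SC_NOT_MODIFIED = 304;

    // Stand-in for work deferred until the original transaction completes.
    final Queue<Runnable> afterTransactionQueue = new ArrayDeque<>();
    int ranInOwnTransaction = 0;

    void dispatch(int httpCode, Runnable commitWork) {
        if (httpCode == SC_NOT_MODIFIED) {
            // Policies unchanged: do the plugin-info update right away.
            runInNewTransaction(commitWork);
        } else {
            // Otherwise keep the existing behaviour and defer the work.
            afterTransactionQueue.add(commitWork);
        }
    }

    // Placeholder: a real implementation would begin and commit a DB
    // transaction around the call (assumption, not shown here).
    void runInNewTransaction(Runnable work) {
        work.run();
        ranInOwnTransaction++;
    }
}
```

With this shape, requests answered with SC_NOT_MODIFIED never add an entry to the deferred queue, so only requests that actually return policies contribute to its growth.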

 

> Potential risk of OOM
> ---------------------
>
>                 Key: RANGER-3987
>                 URL: https://issues.apache.org/jira/browse/RANGER-3987
>             Project: Ranger
>          Issue Type: Bug
>          Components: admin
>    Affects Versions: 2.2.0
>            Reporter: KyrieG
>            Assignee: KyrieG
>            Priority: Critical
>
> Every time a plugin in another component loads policies, the attribute "LastActivationTimeInMillis" is set to System.currentTimeMillis(). See loadPolicy():
> {code:java}
> // from PolicyRefresher.java loadPolicy()
> //load policy from PolicyAdmin
> ServicePolicies svcPolicies = loadPolicyfromPolicyAdmin();
> if (svcPolicies == null) {
>    //if Policy fetch from Policy Admin Fails, load from cache
>    if (!policiesSetInPlugin) {
>       svcPolicies = loadFromCache();
>    }
> }
> if (PERF_POLICYENGINE_INIT_LOG.isDebugEnabled()) {
>    long freeMemory = Runtime.getRuntime().freeMemory();
>    long totalMemory = Runtime.getRuntime().totalMemory();
>    PERF_POLICYENGINE_INIT_LOG.debug("In-Use memory: " + (totalMemory - freeMemory) + ", Free memory:" + freeMemory);
> }
> if (svcPolicies != null) {
>    plugIn.setPolicies(svcPolicies);
>    policiesSetInPlugin = true;
>    serviceDefSetInPlugin = false;
>    setLastActivationTimeInMillis(System.currentTimeMillis()); // always updated during each policy loading
>    lastKnownVersion = svcPolicies.getPolicyVersion() != null ? svcPolicies.getPolicyVersion() : -1L;
> } else {
>    if (!policiesSetInPlugin && !serviceDefSetInPlugin) {
>       plugIn.setPolicies(null);
>       serviceDefSetInPlugin = true;
>    }
> } {code}
> As a result, the "info" column of table "x_plugin_info" always needs to be updated, since it is a JSON string containing the activation time. See doCreateOrUpdateXXPluginInfo():
> {code:java}
> // from AssetMgr, doCreateOrUpdateXXPluginInfo().
> if (lastPolicyActivationTime != null && lastPolicyActivationTime > 0 && (dbObj.getPolicyActivationTime() == null || !dbObj.getPolicyActivationTime().equals(lastPolicyActivationTime))) {
>    dbObj.setPolicyActivationTime(lastPolicyActivationTime);
>    needsUpdating = true;
> } {code}
> doCreateOrUpdateXXPluginInfo() is wrapped in a Runnable that is submitted to RangerTransactionService (RangerTransactionSynchronizationAdapter in Ranger 2.3.0, though the risk might still be there). See also the code that schedules doCreateOrUpdateXXPluginInfo():
> {code:java}
> // wrap doCreateOrUpdateXXPluginInfo() in a Runnable and defer it until the transaction completes
> commitWork = new Runnable() {
>    @Override
>    public void run() {
>       doCreateOrUpdateXXPluginInfo(pluginInfo, entityType, isTagVersionResetNeeded, clusterName);
>    }
> }; 
> ...
> activityLogger.commitAfterTransactionComplete(commitWork);{code}
> RangerTransactionService uses a thread pool (a ScheduledExecutorService) with an unbounded work queue to hold the pending Runnables.
> In our case, there are 1000+ Hive and HBase instances, and Ranger Admin appears to be under tremendous pressure because every instance periodically calls the policy-download API and triggers an update of the table "x_plugin_info". Since the pool appears to have too few core threads and the DB is likely under pressure as well, the work queue keeps growing, fills the JVM heap, and eventually causes an OOM.
> I think adding more core threads would help, but as the system grows this code path would still add a lot of overhead. Is there a better solution?
>  
>  
>  
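
The queue growth described in the report can be reproduced in isolation. The sketch below uses only the JDK's ScheduledThreadPoolExecutor, whose work queue is unbounded; the class UnboundedQueueDemo and its method are hypothetical and are not part of Ranger. A single worker is deliberately blocked to stand in for a slow database, and every further submission simply accumulates in the queue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledThreadPoolExecutor;

// Hypothetical demo class; it only illustrates JDK executor behaviour.
class UnboundedQueueDemo {
    // Submits `tasks` no-op jobs while the only worker thread is blocked
    // and returns how many of them are waiting in the executor's queue.
    static int queuedWhileWorkerBlocked(int tasks) {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        CountDownLatch workerBusy = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Occupy the single core thread, simulating a slow DB update.
            executor.execute(() -> {
                workerBusy.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workerBusy.await();
            // None of these submissions is rejected; each one is queued.
            for (int i = 0; i < tasks; i++) {
                executor.execute(() -> { });
            }
            return executor.getQueue().size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        } finally {
            release.countDown();
            executor.shutdownNow();
        }
    }
}
```

Calling UnboundedQueueDemo.queuedWhileWorkerBlocked(10000) returns 10000: nothing pushes back on the producer, so when tasks arrive faster than they are executed the backlog is limited only by the available heap.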



--
This message was sent by Atlassian Jira
(v8.20.10#820010)