Posted to issues@nifi.apache.org by "Peter Turcsanyi (Jira)" <ji...@apache.org> on 2020/06/15 21:09:00 UTC

[jira] [Resolved] (NIFI-7527) AbstractKuduProcessor deadlocks after TGT refresh

     [ https://issues.apache.org/jira/browse/NIFI-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Turcsanyi resolved NIFI-7527.
-----------------------------------
    Fix Version/s: 1.12.0
       Resolution: Fixed

> AbstractKuduProcessor deadlocks after TGT refresh 
> --------------------------------------------------
>
>                 Key: NIFI-7527
>                 URL: https://issues.apache.org/jira/browse/NIFI-7527
>             Project: Apache NiFi
>          Issue Type: Bug
>            Reporter: Tamas Palfy
>            Assignee: Tamas Palfy
>            Priority: Major
>             Fix For: 1.12.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The fix for https://issues.apache.org/jira/browse/NIFI-7453 (PutKudu kerberos issue after TGT expires) introduced a new bug: after TGT refresh the processor ends up in a deadlock.
> The reason is that onTrigger acquires a read lock:
> {code:java}
>     @Override
>     public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
>         kuduClientReadLock.lock();
>         try {
>             onTrigger(context, session, kuduClientR);
>         } finally {
>             kuduClientReadLock.unlock();
>         }
>     }
> {code}
> and while that read lock is still held, a write lock is requested later in the same call stack if a TGT refresh occurs:
> {code:java}
> ...
>             public synchronized boolean checkTGTAndRelogin() throws LoginException {
>                 boolean didRelogin = super.checkTGTAndRelogin();
>                 if (didRelogin) {
>                     createKuduClient(context);
>                 }
>                 return didRelogin;
>             }
> ...
>     protected void createKuduClient(ProcessContext context) {
>         kuduClientWriteLock.lock();
>         try {
>             if (this.kuduClientR.get() != null) {
>                 try {
>                     this.kuduClientR.get().close();
>                 } catch (KuduException e) {
>                     getLogger().error("Couldn't close Kudu client.");
>                 }
>             }
>             if (kerberosUser != null) {
>                 final KerberosAction<KuduClient> kerberosAction = new KerberosAction<>(kerberosUser, () -> buildClient(context), getLogger());
>                 this.kuduClientR.set(kerberosAction.execute());
>             } else {
>                 this.kuduClientR.set(buildClient(context));
>             }
>         } finally {
>             kuduClientWriteLock.unlock();
>         }
>     }
> {code}
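> This attempt to acquire the write lock blocks forever: the write lock waits for every read lock to be released, including the one the same thread already holds (a read lock cannot be upgraded to a write lock, e.g. with Java's ReentrantReadWriteLock).
> (Other threads may hold the same read lock too, but they will eventually release it, unless they also try to acquire the write lock themselves.)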
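A minimal, self-contained sketch (not NiFi code) of why the nested write-lock request cannot succeed, assuming the read and write locks come from a single java.util.concurrent ReentrantReadWriteLock. The class name LockUpgradeDemo is hypothetical, and a timed tryLock replaces lock() so the demo returns instead of hanging.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; mirrors onTrigger -> checkTGTAndRelogin -> createKuduClient on one thread.
final class LockUpgradeDemo {

    /**
     * Takes the read lock, then (on the same thread) asks for the write lock.
     * Returns whether the write lock was obtained within the timeout.
     */
    static boolean canUpgradeReadToWrite() throws InterruptedException {
        ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
        rwLock.readLock().lock(); // like kuduClientReadLock.lock() in onTrigger
        try {
            // With writeLock().lock() this call would block forever, because the
            // write lock waits for all read holds, including this thread's own.
            boolean acquired = rwLock.writeLock().tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                rwLock.writeLock().unlock();
            }
            return acquired;
        } finally {
            rwLock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("read -> write upgrade succeeded: " + canUpgradeReadToWrite());
    }
}
```

Running main prints `read -> write upgrade succeeded: false`. With an untimed writeLock().lock() in place of the timed tryLock, the thread would wait indefinitely, which is the deadlock described above.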
> For the fix, it seemed best to re-evaluate the locking logic.
> Previously, essentially the whole onTrigger logic was wrapped in the read lock, including checking the TGT and recreating the Kudu client as needed (as explained above).
> It is best to keep only the actual privileged action inside the read lock, so that refreshing the TGT and re-creating the Kudu client can safely be done beforehand under the write lock.
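The reordered locking can be sketched as follows. This is a hypothetical, simplified holder, not the actual AbstractKuduProcessor code: ClientHolder, recreateClient and trigger are illustrative names, a BooleanSupplier stands in for checkTGTAndRelogin, and generic Supplier/Consumer callbacks stand in for building and closing the Kudu client. What matters is the ordering: the relogin check and client re-creation run under the write lock while the thread holds no read lock, and only the privileged action runs under the read lock.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical holder illustrating the fixed lock ordering; not NiFi's actual class.
final class ClientHolder<C> {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final AtomicReference<C> clientRef = new AtomicReference<>();
    private final Supplier<C> clientFactory; // stands in for buildClient(context)
    private final Consumer<C> clientCloser;  // stands in for closing the old client

    ClientHolder(Supplier<C> clientFactory, Consumer<C> clientCloser) {
        this.clientFactory = clientFactory;
        this.clientCloser = clientCloser;
        recreateClient();
    }

    /** Closes the current client (if any) and builds a new one under the write lock. */
    void recreateClient() {
        lock.writeLock().lock();
        try {
            C old = clientRef.get();
            if (old != null) {
                clientCloser.accept(old);
            }
            clientRef.set(clientFactory.get());
        } finally {
            lock.writeLock().unlock();
        }
    }

    /**
     * Step 1: check the TGT and recreate the client while holding no read lock.
     * Step 2: run only the privileged action under the read lock.
     */
    <R> R trigger(BooleanSupplier checkTgtAndRelogin, Function<C, R> privilegedAction) {
        if (checkTgtAndRelogin.getAsBoolean()) {
            recreateClient(); // safe: this thread holds no read lock at this point
        }
        lock.readLock().lock();
        try {
            return privilegedAction.apply(clientRef.get());
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        int[] built = {0};
        ClientHolder<String> holder = new ClientHolder<>(
                () -> "client-" + (++built[0]),
                c -> System.out.println("closing " + c));
        System.out.println(holder.trigger(() -> false, c -> "used " + c));
        System.out.println(holder.trigger(() -> true, c -> "used " + c));
    }
}
```

Running main prints `used client-1`, `closing client-1`, `used client-2`: the second call re-creates the client on the same thread without deadlocking, because the write lock is requested before the read lock is taken.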



--
This message was sent by Atlassian Jira
(v8.3.4#803005)