Posted to issues@beam.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/10/13 12:07:00 UTC

[jira] [Work logged] (BEAM-13042) Prevent unexpected blocking in RegisterAndProcessBundleOperation hasFailed

     [ https://issues.apache.org/jira/browse/BEAM-13042?focusedWorklogId=664598&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-664598 ]

ASF GitHub Bot logged work on BEAM-13042:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Oct/21 12:06
            Start Date: 13/Oct/21 12:06
    Worklog Time Spent: 10m 
      Work Description: scwhittle commented on pull request #15716:
URL: https://github.com/apache/beam/pull/15716#issuecomment-942233005


   R: @kennknowles 




Issue Time Tracking
-------------------

    Worklog Id:     (was: 664598)
    Time Spent: 20m  (was: 10m)

> Prevent unexpected blocking in RegisterAndProcessBundleOperation hasFailed
> --------------------------------------------------------------------------
>
>                 Key: BEAM-13042
>                 URL: https://issues.apache.org/jira/browse/BEAM-13042
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-dataflow
>            Reporter: Sam Whittle
>            Assignee: Sam Whittle
>            Priority: P2
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have observed stack traces in stuck jobs as follows:
> --- Threads (1): [Thread[pool-8-thread-1,5,main]] State: WAITING stack: ---
>   sun.misc.Unsafe.park(Native Method)
>   java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>   java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>   java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>   org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.hasFailed(RegisterAndProcessBundleOperation.java:407)
>   org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor$SingularProcessBundleProgressTracker.updateProgress(BeamFnMapTaskExecutor.java:332)
>   org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor$SingularProcessBundleProgressTracker.periodicProgressUpdate(BeamFnMapTaskExecutor.java:326)
>   org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor$SingularProcessBundleProgressTracker$$Lambda$121/1203893465.run(Unknown Source)
>   java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   java.lang.Thread.run(Thread.java:748)
> This code appears not to expect blocking, since it checks isDone() before calling get():
>   public boolean hasFailed() throws ExecutionException, InterruptedException {
>     if (processBundleResponse != null && processBundleResponse.toCompletableFuture().isDone()) {
>       return !processBundleResponse.toCompletableFuture().get().getError().isEmpty();
>     } else {
>       // At the very least, we don't know that this has failed yet.
>       return false;
>     }
>   }
> I'm unsure why this is occurring, but it could be that the two calls to toCompletableFuture() are returning different futures for some reason, or that the completion stage is somehow changing from done to undone. In either case, this could be avoided by using a single call to the CompletableFuture.getNow method, which is guaranteed not to block.
> This affects the V1 Runner harness, which isn't generally used but might as well be fixed.
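
The getNow approach suggested in the description could look roughly like the sketch below. This is an illustration only, not the change made in the pull request: the class name HasFailedSketch and the nested Response type are hypothetical stand-ins for RegisterAndProcessBundleOperation and the real response type, and only the getError() accessor shown in the quoted snippet is assumed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

public class HasFailedSketch {
  /** Hypothetical stand-in for the real response type; only getError() is assumed. */
  static final class Response {
    private final String error;

    Response(String error) {
      this.error = error;
    }

    String getError() {
      return error;
    }
  }

  // May be null, mirroring the null check in the quoted hasFailed().
  private final CompletionStage<Response> processBundleResponse;

  HasFailedSketch(CompletionStage<Response> processBundleResponse) {
    this.processBundleResponse = processBundleResponse;
  }

  public boolean hasFailed() {
    if (processBundleResponse == null) {
      return false;
    }
    // One toCompletableFuture() call, and getNow() instead of isDone() + get():
    // getNow returns the supplied fallback (null here) if the future is not
    // complete, so this thread never parks.
    Response response = processBundleResponse.toCompletableFuture().getNow(null);
    if (response == null) {
      // At the very least, we don't know that this has failed yet.
      return false;
    }
    return !response.getError().isEmpty();
  }

  public static void main(String[] args) {
    CompletableFuture<Response> pending = new CompletableFuture<>();
    HasFailedSketch op = new HasFailedSketch(pending);
    System.out.println(op.hasFailed()); // still pending: prints false
    pending.complete(new Response("boom"));
    System.out.println(op.hasFailed()); // completed with an error: prints true
  }
}
```

One behavioral difference to keep in mind: if the future completed exceptionally, getNow throws the unchecked CompletionException (or CancellationException if cancelled) rather than the checked ExecutionException that get() throws, so callers that catch ExecutionException would need to be adjusted.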


