You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by GitBox <gi...@apache.org> on 2020/03/12 15:26:04 UTC

[GitHub] [beam] mxm opened a new pull request #11108: [BEAM-9490] Guard referencing for environment expiration via a lock

mxm opened a new pull request #11108: [BEAM-9490] Guard referencing for environment expiration via a lock
URL: https://github.com/apache/beam/pull/11108
 
 
   When environment expiration is configured, environments are removed from a cache
   map after some amount of time. This cache map may be accessed by multiple
   ExecutableStage transforms in parallel. During environment expiration there is a
   small time window which allows for an environment to still being used while
   another transform removes, dereferences, and closes the environment. This is not
   expected behavior.
   
   Post-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/)
   XLang | --- | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/)
   
   Pre-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   --- |Java | Python | Go | Website
   --- | --- | --- | --- | ---
   Non-portable | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/) 
   Portable | --- | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | ---
   
   See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] mxm commented on a change in pull request #11108: [BEAM-9490] Guard referencing for environment expiration via a lock

Posted by GitBox <gi...@apache.org>.
mxm commented on a change in pull request #11108: [BEAM-9490] Guard referencing for environment expiration via a lock
URL: https://github.com/apache/beam/pull/11108#discussion_r391700905
 
 

 ##########
 File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java
 ##########
 @@ -192,6 +202,28 @@ public static DefaultJobBundleFactory create(
     ImmutableList.Builder<LoadingCache<Environment, WrappedSdkHarnessClient>> caches =
         ImmutableList.builder();
     for (int i = 0; i < count; i++) {
+      final Lock refLock = environmentCacheLocks.get(i);
+      builder = builder.removalListener(
+          notification -> {
+            WrappedSdkHarnessClient client = (WrappedSdkHarnessClient) notification.getValue();
 
 Review comment:
   Again, not nice but unfortunately in this order we can't use proper type information.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] tweise commented on issue #11108: [BEAM-9490] Guard referencing for environment expiration via a lock

Posted by GitBox <gi...@apache.org>.
tweise commented on issue #11108: [BEAM-9490] Guard referencing for environment expiration via a lock
URL: https://github.com/apache/beam/pull/11108#issuecomment-598843921
 
 
   @mxm I'm looking at this and will get back.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] mxm commented on a change in pull request #11108: [BEAM-9490] Guard referencing for environment expiration via a lock

Posted by GitBox <gi...@apache.org>.
mxm commented on a change in pull request #11108: [BEAM-9490] Guard referencing for environment expiration via a lock
URL: https://github.com/apache/beam/pull/11108#discussion_r391752678
 
 

 ##########
 File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java
 ##########
 @@ -192,6 +202,28 @@ public static DefaultJobBundleFactory create(
     ImmutableList.Builder<LoadingCache<Environment, WrappedSdkHarnessClient>> caches =
         ImmutableList.builder();
     for (int i = 0; i < count; i++) {
+      final Lock refLock = environmentCacheLocks.get(i);
+      builder = builder.removalListener(
+          notification -> {
+            WrappedSdkHarnessClient client = (WrappedSdkHarnessClient) notification.getValue();
 
 Review comment:
   This is resolved now.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] mxm commented on issue #11108: [BEAM-9490] Guard referencing for environment expiration via a lock

Posted by GitBox <gi...@apache.org>.
mxm commented on issue #11108: [BEAM-9490] Guard referencing for environment expiration via a lock
URL: https://github.com/apache/beam/pull/11108#issuecomment-598634728
 
 
   @tweise Any comments? I've been running this on a cluster for a day with a 15 minute environment expiration. Haven't seen any problems with regards to the environment expiration.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] mxm merged pull request #11108: [BEAM-9490] Guard referencing for environment expiration via a lock

Posted by GitBox <gi...@apache.org>.
mxm merged pull request #11108: [BEAM-9490] Guard referencing for environment expiration via a lock
URL: https://github.com/apache/beam/pull/11108
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] tweise commented on a change in pull request #11108: [BEAM-9490] Guard referencing for environment expiration via a lock

Posted by GitBox <gi...@apache.org>.
tweise commented on a change in pull request #11108: [BEAM-9490] Guard referencing for environment expiration via a lock
URL: https://github.com/apache/beam/pull/11108#discussion_r392417670
 
 

 ##########
 File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java
 ##########
 @@ -406,8 +438,16 @@ public RemoteBundle getBundle(
         availableCachesSemaphore.acquire();
         // The blocking queue of caches for serving multiple bundles concurrently.
         currentCache = availableCaches.take();
-        client = currentCache.getUnchecked(executableStage.getEnvironment());
-        client.ref();
+        // Lock because the environment expiration can remove the ref for the client
+        // which would close the underlying environment before we can ref it.
+        Lock refLock = environmentCacheLocks.get(environmentIndex);
 
 Review comment:
   This isn't picking the correct lock, addressed in https://github.com/apache/beam/pull/11108/commits/5ebae2cd735bc8b93af6d8ec6132c3f06e256fae

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] mxm commented on a change in pull request #11108: [BEAM-9490] Guard referencing for environment expiration via a lock

Posted by GitBox <gi...@apache.org>.
mxm commented on a change in pull request #11108: [BEAM-9490] Guard referencing for environment expiration via a lock
URL: https://github.com/apache/beam/pull/11108#discussion_r392583334
 
 

 ##########
 File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java
 ##########
 @@ -406,8 +438,16 @@ public RemoteBundle getBundle(
         availableCachesSemaphore.acquire();
         // The blocking queue of caches for serving multiple bundles concurrently.
         currentCache = availableCaches.take();
-        client = currentCache.getUnchecked(executableStage.getEnvironment());
-        client.ref();
+        // Lock because the environment expiration can remove the ref for the client
+        // which would close the underlying environment before we can ref it.
+        Lock refLock = environmentCacheLocks.get(environmentIndex);
 
 Review comment:
   Thanks for spotting this!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] tweise commented on issue #11108: [BEAM-9490] Guard referencing for environment expiration via a lock

Posted by GitBox <gi...@apache.org>.
tweise commented on issue #11108: [BEAM-9490] Guard referencing for environment expiration via a lock
URL: https://github.com/apache/beam/pull/11108#issuecomment-598921032
 
 
   Run Python2_PVR_Flink PreCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] mxm commented on a change in pull request #11108: [BEAM-9490] Guard referencing for environment expiration via a lock

Posted by GitBox <gi...@apache.org>.
mxm commented on a change in pull request #11108: [BEAM-9490] Guard referencing for environment expiration via a lock
URL: https://github.com/apache/beam/pull/11108#discussion_r391700263
 
 

 ##########
 File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java
 ##########
 @@ -161,29 +167,33 @@ public static DefaultJobBundleFactory create(
     this.stageIdGenerator = stageIdGenerator;
     this.environmentExpirationMillis = getEnvironmentExpirationMillis(jobInfo);
     this.loadBalanceBundles = shouldLoadBalanceBundles(jobInfo);
+    this.environmentCacheLocks = createEnvironmentCacheLocks(getMaxEnvironmentClients(jobInfo));
     this.environmentCaches =
         createEnvironmentCaches(serverFactory -> serverInfo, getMaxEnvironmentClients(jobInfo));
     this.availableCachesSemaphore = new Semaphore(environmentCaches.size(), true);
     this.availableCaches = new LinkedBlockingDeque<>(environmentCaches);
-    this.evictedActiveClients = Sets.newIdentityHashSet();
+    this.evictedActiveClients = Sets.newConcurrentHashSet();
+  }
+
+  private ImmutableList<Lock> createEnvironmentCacheLocks(int count) {
+    ImmutableList.Builder<Lock> locksForCaches = ImmutableList.builder();
+    for (int i = 0; i < count; i++) {
+      final Lock refLock;
+      if (environmentExpirationMillis > 0) {
+        // The lock ensures there is no race condition between expiring an environment and a client
+        // still attempting to use it, hence referencing it.
+        refLock = new ReentrantLock(true);
+      } else {
+        refLock = NoopLock.get();
+      }
+      locksForCaches.add(refLock);
+    }
+    return locksForCaches.build();
   }
 
   private ImmutableList<LoadingCache<Environment, WrappedSdkHarnessClient>> createEnvironmentCaches(
       ThrowingFunction<ServerFactory, ServerInfo> serverInfoCreator, int count) {
-    CacheBuilder<Environment, WrappedSdkHarnessClient> builder =
-        CacheBuilder.newBuilder()
-            .removalListener(
-                (RemovalNotification<Environment, WrappedSdkHarnessClient> notification) -> {
-                  WrappedSdkHarnessClient client = notification.getValue();
-                  int refCount = client.unref();
-                  if (refCount > 0) {
-                    LOG.warn(
-                        "Expiring environment {} with {} remaining bundle references. Taking note to clean it up during shutdown if the references are not removed by then.",
-                        notification.getKey(),
-                        refCount);
-                    evictedActiveClients.add(client);
-                  }
-                });
+    CacheBuilder builder = CacheBuilder.newBuilder();
 
 Review comment:
   Unfortunately, we lose the type infos here. There is no way without calling a builder method to get the correct type.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] mxm commented on a change in pull request #11108: [BEAM-9490] Guard referencing for environment expiration via a lock

Posted by GitBox <gi...@apache.org>.
mxm commented on a change in pull request #11108: [BEAM-9490] Guard referencing for environment expiration via a lock
URL: https://github.com/apache/beam/pull/11108#discussion_r391752567
 
 

 ##########
 File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java
 ##########
 @@ -161,29 +167,33 @@ public static DefaultJobBundleFactory create(
     this.stageIdGenerator = stageIdGenerator;
     this.environmentExpirationMillis = getEnvironmentExpirationMillis(jobInfo);
     this.loadBalanceBundles = shouldLoadBalanceBundles(jobInfo);
+    this.environmentCacheLocks = createEnvironmentCacheLocks(getMaxEnvironmentClients(jobInfo));
     this.environmentCaches =
         createEnvironmentCaches(serverFactory -> serverInfo, getMaxEnvironmentClients(jobInfo));
     this.availableCachesSemaphore = new Semaphore(environmentCaches.size(), true);
     this.availableCaches = new LinkedBlockingDeque<>(environmentCaches);
-    this.evictedActiveClients = Sets.newIdentityHashSet();
+    this.evictedActiveClients = Sets.newConcurrentHashSet();
+  }
+
+  private ImmutableList<Lock> createEnvironmentCacheLocks(int count) {
+    ImmutableList.Builder<Lock> locksForCaches = ImmutableList.builder();
+    for (int i = 0; i < count; i++) {
+      final Lock refLock;
+      if (environmentExpirationMillis > 0) {
+        // The lock ensures there is no race condition between expiring an environment and a client
+        // still attempting to use it, hence referencing it.
+        refLock = new ReentrantLock(true);
+      } else {
+        refLock = NoopLock.get();
+      }
+      locksForCaches.add(refLock);
+    }
+    return locksForCaches.build();
   }
 
   private ImmutableList<LoadingCache<Environment, WrappedSdkHarnessClient>> createEnvironmentCaches(
       ThrowingFunction<ServerFactory, ServerInfo> serverInfoCreator, int count) {
-    CacheBuilder<Environment, WrappedSdkHarnessClient> builder =
-        CacheBuilder.newBuilder()
-            .removalListener(
-                (RemovalNotification<Environment, WrappedSdkHarnessClient> notification) -> {
-                  WrappedSdkHarnessClient client = notification.getValue();
-                  int refCount = client.unref();
-                  if (refCount > 0) {
-                    LOG.warn(
-                        "Expiring environment {} with {} remaining bundle references. Taking note to clean it up during shutdown if the references are not removed by then.",
-                        notification.getKey(),
-                        refCount);
-                    evictedActiveClients.add(client);
-                  }
-                });
+    CacheBuilder builder = CacheBuilder.newBuilder();
 
 Review comment:
   This is resolved now.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services