Posted to dev@flink.apache.org by "Marcos Klein (Jira)" <ji...@apache.org> on 2020/06/17 23:31:00 UTC
[jira] [Created] (FLINK-18352) org.apache.flink.core.execution.DefaultExecutorServiceLoader not thread safe
Marcos Klein created FLINK-18352:
------------------------------------
Summary: org.apache.flink.core.execution.DefaultExecutorServiceLoader not thread safe
Key: FLINK-18352
URL: https://issues.apache.org/jira/browse/FLINK-18352
Project: Flink
Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Marcos Klein
The singleton *org.apache.flink.core.execution.DefaultExecutorServiceLoader* class is not thread-safe because the *java.util.ServiceLoader* class it relies on is not thread-safe.
[https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/ServiceLoader.html#Concurrency]
This can leave the *ServiceLoader* in an inconsistent state in processes that attempt to self-heal. Recovery then requires bouncing the process/container in the hope that the race condition does not recur.
[https://stackoverflow.com/questions/60391499/apache-flink-cannot-find-compatible-factory-for-specified-execution-target-lo]
Additionally, the following stack traces have been seen when using *org.apache.flink.streaming.api.environment.RemoteStreamEnvironment* instances.
{code:java}
java.lang.ArrayIndexOutOfBoundsException: 2
at sun.misc.CompoundEnumeration.nextElement(CompoundEnumeration.java:61)
at java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:357)
at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:393)
at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
at org.apache.flink.core.execution.DefaultExecutorServiceLoader.getExecutorFactory(DefaultExecutorServiceLoader.java:60)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1724)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1706)
{code}
{code:java}
java.util.NoSuchElementException: null
at sun.misc.CompoundEnumeration.nextElement(CompoundEnumeration.java:59)
at java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:357)
at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:393)
at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
at org.apache.flink.core.execution.DefaultExecutorServiceLoader.getExecutorFactory(DefaultExecutorServiceLoader.java:60)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1724)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1706)
{code}
The workaround when using the *StreamExecutionEnvironment* implementations is to write a custom, thread-safe implementation of *DefaultExecutorServiceLoader* and pass it to the *StreamExecutionEnvironment* constructors.
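As a rough illustration of what such a thread-safe replacement could look like, here is a minimal sketch that uses only the JDK. The class name *ThreadSafeServiceLookup*, its methods, and the predicate parameter are hypothetical and are not part of Flink; the predicate stands in for whatever compatibility check the real loader applies to each factory. The idea is to create a fresh *ServiceLoader* on every call, so that no lazy iterator state is shared between threads. Wiring this into a Flink loader implementation is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;
import java.util.function.Predicate;

/**
 * Hypothetical helper (not a Flink class): looks up service providers without
 * sharing a ServiceLoader instance between threads.
 */
final class ThreadSafeServiceLookup<T> {

    private final Class<T> serviceType;
    private final ClassLoader classLoader;

    public ThreadSafeServiceLookup(Class<T> serviceType, ClassLoader classLoader) {
        this.serviceType = serviceType;
        this.classLoader = classLoader;
    }

    /**
     * Returns every provider accepted by the given check. A new ServiceLoader
     * is created per call, so its lazy iterator is confined to the calling thread.
     */
    public List<T> findCompatible(Predicate<? super T> isCompatible) {
        ServiceLoader<T> loader = ServiceLoader.load(serviceType, classLoader);
        List<T> compatible = new ArrayList<>();
        for (T candidate : loader) {
            if (isCompatible.test(candidate)) {
                compatible.add(candidate);
            }
        }
        return compatible;
    }

    /** Returns the single accepted provider, or fails if there are zero or several. */
    public T findSingle(Predicate<? super T> isCompatible) {
        List<T> compatible = findCompatible(isCompatible);
        if (compatible.size() != 1) {
            throw new IllegalStateException(
                "Expected exactly one compatible provider for "
                    + serviceType.getName() + " but found " + compatible.size());
        }
        return compatible.get(0);
    }
}
```

The trade-off in this sketch is that provider discovery is repeated on each call instead of being cached. An alternative under the same assumptions would be to keep one shared *ServiceLoader* and guard every iteration with a lock.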
--
This message was sent by Atlassian Jira
(v8.3.4#803005)