Posted to dev@flink.apache.org by "Wei Zhong (Jira)" <ji...@apache.org> on 2023/02/22 11:33:00 UTC

[jira] [Created] (FLINK-31184) Failed to get python udf runner directory via running GET_RUNNER_DIR_SCRIPT

Wei Zhong created FLINK-31184:
---------------------------------

             Summary: Failed to get python udf runner directory via running GET_RUNNER_DIR_SCRIPT 
                 Key: FLINK-31184
                 URL: https://issues.apache.org/jira/browse/FLINK-31184
             Project: Flink
          Issue Type: Bug
          Components: API / Python
    Affects Versions: 1.16.1, 1.15.3, 1.17.0
            Reporter: Wei Zhong


The following exception is thrown when a Python UDF is used in a user job:

 
{code:java}
Caused by: java.io.IOException: Cannot run program "ERROR: ld.so: object '/usr/lib64/libjemalloc.so.1' from LD_PRELOAD cannot be preloaded: ignored.
/mnt/ssd/0/yarn/nm-local-dir/usercache/flink/appcache/application_1670838323719_705777/python-dist-fe870981-4de7-4229-ad0b-f51881e80d90/python-archives/pipeline_venv_v5.tar.gz/lib/python3.7/site-packages/pyflink/bin/pyflink-udf-runner.sh": error=2, No such file or directory
  at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
  at org.apache.beam.runners.fnexecution.environment.ProcessManager.startProcess(ProcessManager.java:147)
  at org.apache.beam.runners.fnexecution.environment.ProcessManager.startProcess(ProcessManager.java:122)
  at org.apache.beam.runners.fnexecution.environment.ProcessEnvironmentFactory.createEnvironment(ProcessEnvironmentFactory.java:106)
  at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:252)
  at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:231)
  at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528)
  at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277)
  at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
  at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
  at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get(LocalCache.java:3952)
  at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
  at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958)
  at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4964)
  ... 19 more
  Suppressed: java.lang.NullPointerException: Process for id does not exist: 1-1
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:895)
    at org.apache.beam.runners.fnexecution.environment.ProcessManager.stopProcess(ProcessManager.java:172)
    at org.apache.beam.runners.fnexecution.environment.ProcessEnvironmentFactory.createEnvironment(ProcessEnvironmentFactory.java:126)
    ... 29 more
Caused by: java.io.IOException: error=2, No such file or directory
  at java.lang.UNIXProcess.forkAndExec(Native Method)
  at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
  at java.lang.ProcessImpl.start(ProcessImpl.java:134)
  at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
  ... 32 more {code}
 

 

This happens because SRE introduced an environment variable:

 
{code:java}
LD_PRELOAD=/usr/lib64/libjemalloc.so.1 {code}
The Python process itself still runs normally, but the dynamic loader prints an extra error message, so the whole output looks like:
{code:java}
ERROR: ld.so: object '/usr/lib64/libjemalloc.so.1' from LD_PRELOAD cannot be preloaded: ignored.
/mnt/ssd/0/yarn/nm-local-dir/usercache/flink/appcache/application_1670838323719_705777/python-dist-fe870981-4de7-4229-ad0b-f51881e80d90/python-archives/pipeline_venv_v5.tar.gz/lib/python3.7/site-packages/pyflink/bin/{code}
The whole output, including the error line, is then treated as the command to run, which causes the exception.

The script output is therefore not a reliable channel for this value. Maybe we need to find another way to transfer the directory, or filter the output before using it.
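
As a rough illustration of the "filter the output" option, here is a minimal Python sketch. It is not the actual Flink code: the helper names extract_runner_dir and run_and_capture_stdout are hypothetical. It assumes that the directory is always the last non-empty line the script prints, and that the loader warning arrives on stderr, so it can be kept out of the captured result.

```python
import subprocess
import sys


def extract_runner_dir(raw_output):
    """Return the last non-empty line of the script output.

    Assumption: the script prints the runner directory last, so any
    warnings printed earlier (such as the ld.so message) are skipped.
    """
    lines = [line.strip() for line in raw_output.splitlines() if line.strip()]
    if not lines:
        raise ValueError("script produced no output")
    return lines[-1]


def run_and_capture_stdout(command):
    """Run a command and return only its stdout.

    stderr is captured separately, so messages written there are not
    mixed into the value that is parsed afterwards.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


# Hypothetical stand-in for the real script: warns on stderr, prints a path.
demo_command = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('warning\\n'); print('/tmp/pyflink/bin')",
]
runner_dir = extract_runner_dir(run_and_capture_stdout(demo_command))
```

Either measure alone would avoid the failure above under these assumptions. Taking the last line also covers the case where stderr has already been merged into stdout by the caller.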

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)