Posted to issues@beam.apache.org by "Kenneth Knowles (Jira)" <ji...@apache.org> on 2022/01/16 14:00:01 UTC
[jira] [Updated] (BEAM-9007) beam.DoFn setup() will call several times when using python subprocess
[ https://issues.apache.org/jira/browse/BEAM-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kenneth Knowles updated BEAM-9007:
----------------------------------
Status: Open (was: Triage Needed)
> beam.DoFn setup() will call several times when using python subprocess
> ----------------------------------------------------------------------
>
> Key: BEAM-9007
> URL: https://issues.apache.org/jira/browse/BEAM-9007
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Affects Versions: 2.15.0, 2.16.0
> Environment: python 3.5
> apache-beam[gcp] == 2.16.*
> google-cloud-storage == 1.23.*
> google-resumable-media == 0.5.*
> googleapis-common-protos == 1.6.*
> grpc-google-logging-v2 == 0.11.*
> Reporter: Hokuto Tateyama
> Priority: P3
>
> Hello.
> I'm trying to run a make command on Dataflow so that I can use OpenCV built from its C++ source.
> I expected the *setup()* function on *beam.DoFn* to run only once, before processing starts.
> So I ran the build commands from the setup() function, and they ran successfully.
> h1. Problem
> After the process runs, the setup() function runs again and tries to execute the build commands several more times. I confirmed this in my Stackdriver logs.
> h1. Code
> This is the code I am running on Dataflow. I defined command_list in a class that inherits from beam.DoFn, and I call run_cmd() from setup().
> ・Function that runs the command lines.
> {code:python}
> import logging
> import subprocess
> from typing import Any, Dict, List
>
> def run_cmd(command_list: List[List[str]], shell: bool = False) -> List[Dict[str, Any]]:
>     """Run each command in order and collect its combined stdout/stderr."""
>     outputs = []
>     try:
>         for cmd in command_list:
>             logging.info(cmd)
>             proc = subprocess.check_output(
>                 cmd, shell=shell, stderr=subprocess.STDOUT, universal_newlines=True)
>             outputs.append({"Input: ": cmd, "Output: ": proc})
>     except subprocess.CalledProcessError as e:
>         # The first failing command ends the loop; later commands are skipped.
>         logging.warning("Return code:{}, Output:{}".format(e.returncode, e.output))
>     return outputs
> {code}
> ・Command list passed to the run_cmd() function.
> {code:python}
> command_list = [
>     ["cat /etc/issue"],
>     ["apt-get --assume-yes update"],
>     [
>         "apt-get --assume-yes install --no-install-recommends ffmpeg git software-properties-common"
>     ],
>     ["apt-get install -y software-properties-common"],
>     [
>         'add-apt-repository -s "deb http://security.ubuntu.com/ubuntu bionic-security main"'
>     ],
>     [
>         "apt-get install -y build-essential checkinstall cmake unzip pkg-config yasm unzip"
>     ],
>     ["apt-get -y install git gfortran python3-dev"],
>     [
>         "apt-get -y install libjpeg62-turbo-dev libpng-dev libpng16-16 libavcodec-dev libavformat-dev libswscale-dev libdc1394-22-dev libxine2-dev libv4l-dev"
>     ],
>     ["apt-get -y install libjpeg-dev libpng-dev libtiff-dev libtbb-dev"],
>     [
>         "apt-get -y install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev libatlas-base-dev libxvidcore-dev libx264-dev libgtk-3-dev"
>     ],
>     ["apt-get clean"],
>     ["rm -rf /var/lib/apt/lists/*"],
>     ["git clone https://github.com/opencv/opencv.git"],
>     ["git clone https://github.com/opencv/opencv_contrib.git"],
>     ["cd opencv_contrib"],
>     ["git checkout -b 3.4.3 refs/tags/3.4.3"],
>     ["cd ../opencv/"],
>     ["git checkout -b 3.4.3 refs/tags/3.4.3"],
>     ["mkdir build"],
>     ["cd build"],
>     [
>         "cmake -D CMAKE_BUILD_TYPE=Release \
>         -D CMAKE_INSTALL_PREFIX=/usr/local \
>         -D WITH_TBB=ON \
>         -D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules .."
>     ],
>     ["make -j8"],
>     ["make install"],
>     ["echo /usr/local/lib > /etc/ld.so.conf.d/opencv.conf"],
>     ["ldconfig -v"]
> ]
> {code}
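
A side note on the command list above: each entry is run in its own subprocess, so a standalone "cd ..." entry only changes the directory of that short-lived child and has no effect on the commands that follow. One possible way around this is to pass the working directory through the `cwd` argument of `subprocess.check_output`. The following is only a minimal sketch; `run_in_dir`, `PRINT_CWD`, and `demo` are hypothetical names used for illustration and are not part of the reporter's code or of Beam.

```python
import os
import shlex
import subprocess
import sys
import tempfile
from typing import List, Optional, Tuple


def run_in_dir(cmd: List[str], cwd: Optional[str] = None) -> str:
    """Run one command, optionally inside the directory given by cwd."""
    return subprocess.check_output(
        cmd, cwd=cwd, stderr=subprocess.STDOUT, universal_newlines=True)


# Child command that prints its own working directory.
PRINT_CWD = [sys.executable, "-c", "import os; print(os.getcwd())"]


def demo() -> Tuple[str, str, str]:
    """Return (build_dir, cwd after a separate 'cd', cwd when using cwd=)."""
    with tempfile.TemporaryDirectory() as build_dir:
        # A "cd" run as its own subprocess changes only that child's directory.
        subprocess.check_call("cd " + shlex.quote(build_dir), shell=True)
        after_cd = run_in_dir(PRINT_CWD).strip()
        # Passing cwd= makes the child start in the requested directory.
        with_cwd = run_in_dir(PRINT_CWD, cwd=build_dir).strip()
        return (os.path.realpath(build_dir),
                os.path.realpath(after_cd),
                os.path.realpath(with_cwd))
```

With this approach, the "git checkout", "cmake", and "make" entries would each carry the directory they are meant to run in, rather than relying on an earlier "cd" entry.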
> h1. Question
> In summary, I'm wondering whether this is a bug in Apache Beam.
> # What is the reason for setup() being called several times?
> # Is there any way to run these commands only once over the whole job? These are the approaches I have tried:
> ## Using os.system() instead of subprocess. I suspect that subprocess creates another process in setup(), so it cannot tell that the process finished successfully.
> ## Writing the commands in setup.py and running them as a CustomCommand
> [https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/]
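
On question 1: as far as I understand Beam's model, setup() is called once per DoFn instance rather than once per job, and a runner may create several instances on the same worker, so repeated calls would be expected behavior. One possible way to keep an expensive build from repeating on a machine is to guard it with a lock file and a completion marker. The sketch below is an illustration under assumptions, not Beam API: `run_once_per_machine`, the marker path, and the `action` callback are hypothetical names, and it relies on `fcntl`, so it is POSIX-only.

```python
import fcntl
import os
from typing import Callable


def run_once_per_machine(marker_path: str, action: Callable[[], None]) -> bool:
    """Run action() unless marker_path already exists.

    Returns True if this call ran the action, False if it was skipped.
    An exclusive lock on a sibling ".lock" file makes concurrent callers
    wait until the first one has finished before they check the marker.
    """
    lock_path = marker_path + ".lock"
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            if os.path.exists(marker_path):
                return False
            action()
            # Write the marker only after the action finished without raising.
            with open(marker_path, "w") as marker:
                marker.write("done\n")
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

In a DoFn, setup() could then call something like `run_once_per_machine("/tmp/opencv_build.done", build)`, where `build` is a hypothetical function wrapping run_cmd(command_list, shell=True). Each worker machine would still run the build once, so this limits the repetition per machine rather than per job.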
>
> Regards, Collonville
--
This message was sent by Atlassian Jira
(v8.20.1#820001)