Posted to issues@beam.apache.org by "Hokuto Tateyama (Jira)" <ji...@apache.org> on 2019/12/20 03:30:00 UTC

[jira] [Updated] (BEAM-9007) beam.DoFn setup() will call several times when using python subprocess

     [ https://issues.apache.org/jira/browse/BEAM-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hokuto Tateyama updated BEAM-9007:
----------------------------------
    Summary: beam.DoFn setup() will call several times when using python subprocess  (was: beam.ParDo setup() will call several times when using python subprocess)

> beam.DoFn setup() will call several times when using python subprocess
> ----------------------------------------------------------------------
>
>                 Key: BEAM-9007
>                 URL: https://issues.apache.org/jira/browse/BEAM-9007
>             Project: Beam
>          Issue Type: Bug
>          Components: beam-community, examples-python
>    Affects Versions: 2.15.0, 2.16.0
>         Environment: python 3.5
> apache-beam[gcp] == 2.16.*
> google-cloud-storage == 1.23.*
> google-resumable-media == 0.5.*
> googleapis-common-protos == 1.6.*
> grpc-google-logging-v2 == 0.11.*
>            Reporter: Hokuto Tateyama
>            Assignee: Aizhamal Nurmamat kyzy
>            Priority: Minor
>
> Hello.
> I'm trying to run a make command on Dataflow to build the OpenCV source, which is written in C++.
> I expected the *setup()* method of *beam.DoFn* to run only once, before processing starts.
> So I ran the build commands in setup(), and they completed successfully.
> h1. Problem
> After that, setup() is called again and the build commands are run several more times. I confirmed this in my Stackdriver logs.
> h1. Code
> This is the code I run on Dataflow. I define command_list in a class that inherits from beam.DoFn and call run_cmd() from setup().
> ・Function that runs the command lines.
> {code:python}
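> import logging
> import subprocess
> from typing import Any, Dict, List
>
> def run_cmd(command_list: List[List[str]], shell: bool = False) -> List[Dict[str, Any]]:
>     # Run each command in its own subprocess and collect its combined stdout/stderr.
>     outputs = []
>     try:
>         for cmd in command_list:
>             logging.info(cmd)
>             proc = subprocess.check_output(
>                 cmd, shell=shell, stderr=subprocess.STDOUT, universal_newlines=True)
>             outputs.append({"Input: ": cmd, "Output: ": proc})
>     except subprocess.CalledProcessError as e:
>         # The first failing command ends the loop; later commands are skipped.
>         logging.warning("Return code:{}, Output:{}".format(e.returncode, e.output))
>     return outputs
> {code}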
> ・Command list passed to the run_cmd() function.
> {code:python}
> command_list = [
>     ["cat /etc/issue"],
>     ["apt-get --assume-yes update"],
>     [
>         "apt-get --assume-yes install --no-install-recommends ffmpeg git software-properties-common"
>     ],
>     ["apt-get install -y software-properties-common"],
>     [
>         'add-apt-repository -s "deb http://security.ubuntu.com/ubuntu bionic-security main"'
>     ],
>     [
>         "apt-get install -y build-essential checkinstall cmake unzip pkg-config yasm unzip"
>     ],
>     ["apt-get -y install git gfortran python3-dev"],
>     [
>         "apt-get -y install libjpeg62-turbo-dev libpng-dev libpng16-16 libavcodec-dev libavformat-dev libswscale-dev libdc1394-22-dev libxine2-dev libv4l-dev"
>     ],
>     ["apt-get -y install libjpeg-dev libpng-dev libtiff-dev libtbb-dev"],
>     [
>         "apt-get -y install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev libatlas-base-dev libxvidcore-dev libx264-dev libgtk-3-dev"
>     ],
>     ["apt-get clean"],
>     ["rm -rf /var/lib/apt/lists/*"],
>     ["git clone https://github.com/opencv/opencv.git"],
>     ["git clone https://github.com/opencv/opencv_contrib.git"],
>     ["cd opencv_contrib"],
>     ["git checkout -b 3.4.3 refs/tags/3.4.3"],
>     ["cd ../opencv/"],
>     ["git checkout -b 3.4.3 refs/tags/3.4.3"],
>     ["mkdir build"],
>     ["cd build"],
>     [
>         "cmake -D CMAKE_BUILD_TYPE=Release \
>                 -D CMAKE_INSTALL_PREFIX=/usr/local \
>                 -D WITH_TBB=ON \
>                 -D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules .."
>     ],
>     ["make -j8"],
>     ["make install"],
>     ["echo /usr/local/lib > /etc/ld.so.conf.d/opencv.conf"],
>     ["ldconfig -v"]
> ]
> {code}
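
A note on the command list above: each subprocess.check_output() call starts a new child process, so a "cd" entry run on its own does not change the working directory seen by the entries after it. The following is only a minimal sketch of one way around that, assuming each command is passed as an argument list together with an explicit cwd. The helper name run_in_dir is hypothetical and is not part of Beam or of the reporter's code.

```python
import os
import subprocess
import sys
import tempfile


def run_in_dir(args, cwd):
    """Run one command (argument list, no shell) with `cwd` as its working directory."""
    return subprocess.check_output(
        args, cwd=cwd, stderr=subprocess.STDOUT, universal_newlines=True)


# Hypothetical working directory standing in for e.g. the opencv/build folder.
workdir = tempfile.mkdtemp()

# The child process reports its own working directory, which is the one
# passed through `cwd`, without any separate "cd" command.
printed = run_in_dir(
    [sys.executable, "-c", "import os; print(os.getcwd())"], cwd=workdir)
```

With this shape, entries such as "git checkout" or "make" would each be given the directory they should run in, instead of depending on an earlier "cd".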
> h1. Question
> In summary, I'm wondering whether this is a bug in Apache Beam.
>  # Why is setup() called several times?
>  # Is there a way to run these commands only once for the whole job? These are the approaches I have tried:
>  ## Using os.system() instead of subprocess. I suspect that subprocess starts another process inside setup(), so it cannot detect that the process finished successfully.
>  ## Writing the commands in setup.py and running them as a CustomCommand
>  [https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/]
>  
> Regards, Collonville
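
Regarding question 2 above: as far as Beam's documented DoFn lifecycle goes, setup() is invoked once per DoFn instance rather than once per job, and a runner may create several instances on the same worker, which would be consistent with the repeated calls described. The following is a minimal sketch of guarding an expensive step so it runs at most once per machine, assuming a Linux worker (it relies on fcntl) and a local filesystem shared by all instances on that worker. The helper name run_once_per_machine and the marker path are hypothetical and not part of Beam.

```python
import fcntl
import os
import tempfile


def run_once_per_machine(marker_path, action):
    """Run action() only if marker_path does not exist yet.

    Returns True if this call ran the action, False if it was skipped.
    """
    lock_path = marker_path + ".lock"
    with open(lock_path, "w") as lock_file:
        # Block until no other process or thread holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            if os.path.exists(marker_path):
                return False
            action()
            # Write the marker only after the action finished without raising.
            with open(marker_path, "w") as marker:
                marker.write("done\n")
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


# Demo with a stand-in action; in a DoFn, setup() could pass the build step here.
calls = []
marker = os.path.join(tempfile.mkdtemp(), "opencv_built")
first = run_once_per_machine(marker, lambda: calls.append("build"))
second = run_once_per_machine(marker, lambda: calls.append("build"))
```

This only deduplicates work within one machine. Each new worker would still run the build once, so installing the dependencies ahead of time (for example through the setup.py custom commands mentioned above) may remain the better fit.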



--
This message was sent by Atlassian Jira
(v8.3.4#803005)