Posted to issues@beam.apache.org by "Hokuto Tateyama (Jira)" <ji...@apache.org> on 2019/12/20 03:29:00 UTC

[jira] [Updated] (BEAM-9007) beam.ParDo setup() will call several times when using python subprocess

     [ https://issues.apache.org/jira/browse/BEAM-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hokuto Tateyama updated BEAM-9007:
----------------------------------
    Description: 
Hello.
 I'm trying to run a make command on Dataflow to build OpenCV from its C++ source.

I assumed that the *setup()* function of *beam.DoFn* runs only once, before processing starts.
 So I ran the build commands in setup(), and they completed successfully.
h1. Problem

Afterwards, however, setup() runs again and repeats the build commands several times. I confirmed this in my Stackdriver logs.
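
One way to narrow this down is to log which host, process and thread each setup() call comes from. The following is only a minimal sketch: `describe_worker` and `BuildOpenCV` are hypothetical names used for illustration, and in the real pipeline the class would inherit from beam.DoFn.

```python
import logging
import os
import socket
import threading


def describe_worker() -> str:
    """Return a string identifying the current host, process and thread."""
    return "host={} pid={} thread={}".format(
        socket.gethostname(), os.getpid(), threading.get_ident())


class BuildOpenCV:  # assumption: subclass beam.DoFn in the actual pipeline
    def setup(self):
        # Log where this call happens so repeated calls can be told apart.
        logging.info("setup() called on %s", describe_worker())
```

If the logged values differ between calls, the repeated calls come from different workers or processes rather than from one instance being set up twice.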
h1. Code

This is the code I run on Dataflow. I defined command_list in a class that inherits from beam.DoFn and call run_cmd() from setup().

・Function that runs the commands.
{code:python}
import logging
import subprocess
from typing import Any, Dict, List


def run_cmd(command_list: List[List[str]], shell: bool = False) -> List[Dict[str, Any]]:
	outputs = []
	try:
		for cmd in command_list:
			logging.info(cmd)
			# check_output raises CalledProcessError on a non-zero exit code.
			proc = subprocess.check_output(
				cmd, shell=shell, stderr=subprocess.STDOUT, universal_newlines=True)
			outputs.append({"Input: ": cmd, "Output: ": proc})
	except subprocess.CalledProcessError as e:
		# The first failing command ends the loop; later commands are skipped.
		logging.warning("Return code:{}, Output:{}".format(e.returncode, e.output))

	return outputs
{code}
・Command list passed to run_cmd().
{code:python}
command_list = [
    ["cat /etc/issue"],
    ["apt-get --assume-yes update"],
    [
        "apt-get --assume-yes install --no-install-recommends ffmpeg git software-properties-common"
    ],
    ["apt-get install -y software-properties-common"],
    [
        'add-apt-repository -s "deb http://security.ubuntu.com/ubuntu bionic-security main"'
    ],
    [
        "apt-get install -y build-essential checkinstall cmake unzip pkg-config yasm unzip"
    ],
    ["apt-get -y install git gfortran python3-dev"],
    [
        "apt-get -y install libjpeg62-turbo-dev libpng-dev libpng16-16 libavcodec-dev libavformat-dev libswscale-dev libdc1394-22-dev libxine2-dev libv4l-dev"
    ],
    ["apt-get -y install libjpeg-dev libpng-dev libtiff-dev libtbb-dev"],
    [
        "apt-get -y install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev libatlas-base-dev libxvidcore-dev libx264-dev libgtk-3-dev"
    ],
    ["apt-get clean"],
    ["rm -rf /var/lib/apt/lists/*"],
    ["git clone https://github.com/opencv/opencv.git"],
    ["git clone https://github.com/opencv/opencv_contrib.git"],
    ["cd opencv_contrib"],
    ["git checkout -b 3.4.3 refs/tags/3.4.3"],
    ["cd ../opencv/"],
    ["git checkout -b 3.4.3 refs/tags/3.4.3"],
    ["mkdir build"],
    ["cd build"],
    [
        "cmake -D CMAKE_BUILD_TYPE=Release \
                -D CMAKE_INSTALL_PREFIX=/usr/local \
                -D WITH_TBB=ON \
                -D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules .."
    ],
    ["make -j8"],
    ["make install"],
    ["echo /usr/local/lib > /etc/ld.so.conf.d/opencv.conf"],
    ["ldconfig -v"]
]
{code}
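
Note that each entry in the list runs in its own subprocess, so a "cd" entry does not change the working directory for the entries after it. A possible alternative, sketched here under assumptions, is to pass the directory explicitly through the cwd argument of subprocess.check_output. `run_in_dir` is a hypothetical helper, not part of the code above.

```python
import shlex
import subprocess
from typing import List, Tuple


def run_in_dir(steps: List[Tuple[str, str]]) -> List[str]:
    """Run each (command, working_directory) pair and collect its output."""
    outputs = []
    for command, cwd in steps:
        # shlex.split turns the command string into an argv list, so no shell is needed.
        argv = shlex.split(command)
        outputs.append(subprocess.check_output(
            argv, cwd=cwd, stderr=subprocess.STDOUT, universal_newlines=True))
    return outputs
```

For example, the checkout step could be written as ("git checkout -b 3.4.3 refs/tags/3.4.3", "opencv_contrib"), assuming the repository was cloned into that directory. Commands that rely on shell features such as ">" redirection would still need shell=True or a different approach.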
h1. Question

In summary, I'm wondering whether this is a bug in Apache Beam.
 # Why is setup() called several times?
 # Is there a way to run these commands only once for the whole job? These are the approaches I have tried:
 ## Using os.system() instead of subprocess. I think subprocess creates another process inside setup(), so it cannot detect that the process finished successfully.
 ## Writing the commands in setup.py and running them as a CustomCommand
 [https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/]
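
Another approach that might help, sketched here under assumptions, is to guard the build with a marker file and a file lock so that repeated setup() calls on the same machine skip the work. `run_once` and the marker path are hypothetical names and not part of Beam. The guard only covers processes that share a filesystem, so each worker machine would still run the build once.

```python
import fcntl
import os
from typing import Callable


def run_once(marker_path: str, action: Callable[[], None]) -> bool:
    """Run `action` only if `marker_path` does not exist yet.

    Returns True if the action ran in this call, False if it was skipped.
    """
    lock_path = marker_path + ".lock"
    with open(lock_path, "w") as lock_file:
        # Block until no other process on this machine holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            if os.path.exists(marker_path):
                return False
            action()
            # Create the marker only after the action finished without raising.
            with open(marker_path, "w") as marker:
                marker.write("done\n")
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Inside setup() this could be called as run_once("/tmp/opencv_build.done", lambda: run_cmd(command_list, shell=True)), where the path is only an example. Because run_cmd() logs failures instead of raising, the marker would also be written after a failed build unless run_cmd() is changed to re-raise.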

 

Regards, Collonville


> beam.ParDo setup() will call several times when using python subprocess
> -----------------------------------------------------------------------
>
>                 Key: BEAM-9007
>                 URL: https://issues.apache.org/jira/browse/BEAM-9007
>             Project: Beam
>          Issue Type: Bug
>          Components: beam-community, examples-python
>    Affects Versions: 2.15.0, 2.16.0
>         Environment: python 3.5
> apache-beam[gcp] == 2.16.*
> google-cloud-storage == 1.23.*
> google-resumable-media == 0.5.*
> googleapis-common-protos == 1.6.*
> grpc-google-logging-v2 == 0.11.*
>            Reporter: Hokuto Tateyama
>            Assignee: Aizhamal Nurmamat kyzy
>            Priority: Minor
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)