You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/07/08 19:18:42 UTC

[GitHub] [beam] kennknowles commented on a change in pull request #11792: WIP: Add ValidatesRunner task for local_job_service and Java SDK harness

kennknowles commented on a change in pull request #11792:
URL: https://github.com/apache/beam/pull/11792#discussion_r451761061



##########
File path: runners/portability/java/build.gradle
##########
@@ -31,9 +45,123 @@ dependencies {
   compile project(path: ":sdks:java:harness", configuration: "shadow")
   compile library.java.vendored_grpc_1_26_0
   compile library.java.slf4j_api
+
   testCompile project(path: ":runners:core-construction-java", configuration: "testRuntime")
   testCompile library.java.hamcrest_core
   testCompile library.java.junit
   testCompile library.java.mockito_core
   testCompile library.java.slf4j_jdk14
+
+  validatesRunner project(path: ":sdks:java:core", configuration: "shadowTest")
+  validatesRunner project(path: ":runners:core-java", configuration: "testRuntime")
+  validatesRunner project(path: project.path, configuration: "testRuntime")
+}
+
+
+project.evaluationDependsOn(":sdks:java:core")
+project.evaluationDependsOn(":runners:core-java")
+
+ext.virtualenvDir = "${project.buildDir}/virtualenv"
+ext.localJobServicePidFile = "${project.buildDir}/local_job_service_pid"
+ext.localJobServicePortFile = project.hasProperty("localJobServicePortFile") ? project.property("localJobServicePortFile") : "${project.buildDir}/local_job_service_port"
+ext.localJobServiceStdoutFile = "${project.buildDir}/local_job_service_stdout"
+
+ext.pythonSdkDir = "${project.rootDir}/sdks/python"
+
+void execInVirtualenv(String... args) {
+  String shellCommand = ". ${virtualenvDir}/bin/activate && " + args.collect { arg -> "'" + arg.replaceAll("'", "\\'") + "'" }.join(" ")
+  exec {
+    workingDir pythonSdkDir
+    commandLine "sh", "-c", shellCommand
+  }
 }
+
+void execBackgroundInVirtualenv(String... args) {
+  String shellCommand = ". ${virtualenvDir}/bin/activate && " + args.collect { arg -> "'" + arg.replaceAll("'", "\\'") + "'" }.join(" ")
+  println "execBackgroundInVirtualEnv: ${shellCommand}"
+  ProcessBuilder pb = new ProcessBuilder().redirectErrorStream(true).directory(new File(pythonSdkDir)).command(["sh", "-c", shellCommand])
+  Process proc = pb.start();
+
+  // redirectIO does not work for connecting to groovy/gradle stdout
+  BufferedReader reader = new BufferedReader(new InputStreamReader(proc.getInputStream()));
+  String line
+  while ((line = reader.readLine()) != null) {
+    println line
+  }
+  proc.waitFor();
+}
+
+task virtualenv {
+  doLast {
+    exec {
+      commandLine "virtualenv", virtualenvDir, "--python=python3"
+    }
+    execInVirtualenv "pip", "install", "--retries", "10", "--upgrade", "tox==3.11.1", "--requirement", "${project.rootDir}/sdks/python/build-requirements.txt"
+    execInVirtualenv "python", "setup.py", "build", "--build-base=${buildDir}"
+    execInVirtualenv "pip", "install", "-e", "."
+  }
+}
+
+task startLocalJobService {
+  dependsOn virtualenv
+
+  doLast {
+    execBackgroundInVirtualenv "python",
+        "-m", "apache_beam.runners.portability.local_job_service_main",
+        "--background",
+        "--stdout_file=${localJobServiceStdoutFile}",
+        "--pid_file=${localJobServicePidFile}",
+        "--port_file=${localJobServicePortFile}"
+
+    File pidFile = new File(localJobServicePidFile)
+    int totalSleep = 0
+    while (!pidFile.exists()) {
+      sleep(500)

Review comment:
       Removed. This code was left over from when I was struggling to get gradle to allow the thing to daemonize itself.

##########
File path: sdks/python/apache_beam/runners/portability/local_job_service_main.py
##########
@@ -99,11 +105,23 @@ def run(argv):
       options.port_file = os.path.splitext(options.pid_file)[0] + '.port'
       argv.append('--port_file')
       argv.append(options.port_file)
+
+    if not options.stdout_file:
+      raise RuntimeError('--stdout_file must be specified with --background')
+    stdout_dest = open(options.stdout_file, mode='w')
+
+    if options.stderr_file:
+      stderr_dest=open(options.stderr_file, mode='w')
+    else:
+      stderr_dest=subprocess.STDOUT
+
     subprocess.Popen([
         sys.executable,
         '-m',
         'apache_beam.runners.portability.local_job_service_main'
-    ] + argv)
+    ] + argv,
+        stderr=stderr_dest,

Review comment:
       I didn't read the `subprocess` code, and the docs are vague. The special `subprocess.STDOUT` token indicates that the output should be "captured" into the same file handle. Comments at https://stackoverflow.com/questions/31980411/closing-files-from-subprocess-stdout imply that closing the file is the responsibility of this process. I did not run the experiments suggested there. I also did not try to refactor this code to allow a `with` statement.

##########
File path: runners/portability/java/build.gradle
##########
@@ -31,9 +45,123 @@ dependencies {
   compile project(path: ":sdks:java:harness", configuration: "shadow")
   compile library.java.vendored_grpc_1_26_0
   compile library.java.slf4j_api
+
   testCompile project(path: ":runners:core-construction-java", configuration: "testRuntime")
   testCompile library.java.hamcrest_core
   testCompile library.java.junit
   testCompile library.java.mockito_core
   testCompile library.java.slf4j_jdk14
+
+  validatesRunner project(path: ":sdks:java:core", configuration: "shadowTest")
+  validatesRunner project(path: ":runners:core-java", configuration: "testRuntime")
+  validatesRunner project(path: project.path, configuration: "testRuntime")
+}
+
+
+project.evaluationDependsOn(":sdks:java:core")
+project.evaluationDependsOn(":runners:core-java")
+
+ext.virtualenvDir = "${project.buildDir}/virtualenv"
+ext.localJobServicePidFile = "${project.buildDir}/local_job_service_pid"
+ext.localJobServicePortFile = project.hasProperty("localJobServicePortFile") ? project.property("localJobServicePortFile") : "${project.buildDir}/local_job_service_port"
+ext.localJobServiceStdoutFile = "${project.buildDir}/local_job_service_stdout"
+
+ext.pythonSdkDir = "${project.rootDir}/sdks/python"
+
+void execInVirtualenv(String... args) {
+  String shellCommand = ". ${virtualenvDir}/bin/activate && " + args.collect { arg -> "'" + arg.replaceAll("'", "\\'") + "'" }.join(" ")
+  exec {
+    workingDir pythonSdkDir
+    commandLine "sh", "-c", shellCommand
+  }
 }
+
+void execBackgroundInVirtualenv(String... args) {
+  String shellCommand = ". ${virtualenvDir}/bin/activate && " + args.collect { arg -> "'" + arg.replaceAll("'", "\\'") + "'" }.join(" ")
+  println "execBackgroundInVirtualEnv: ${shellCommand}"
+  ProcessBuilder pb = new ProcessBuilder().redirectErrorStream(true).directory(new File(pythonSdkDir)).command(["sh", "-c", shellCommand])
+  Process proc = pb.start();
+
+  // redirectIO does not work for connecting to groovy/gradle stdout
+  BufferedReader reader = new BufferedReader(new InputStreamReader(proc.getInputStream()));
+  String line
+  while ((line = reader.readLine()) != null) {
+    println line
+  }
+  proc.waitFor();
+}
+
+task virtualenv {

Review comment:
       I'm not suggesting moving each `build.gradle` to do their own thing. Beam Java modules and vendored libraries can share as much code as they like. They shouldn't share an entrypoint because they are different things.
   
   In this case, the code suggested to be shared is coupled with creating a Beam Python module, which this is not. "Do Python things" is not an adequate or meaningful abstraction. Applying the vague blanket logic is a liability, even if it worked here, which it does not. It is likely that I can do some tweaks to `applyPythonNature` to "make it work", but that would be bad engineering.
   
    - Adding near-duplicate code when maybe there is an abstraction: tech debt
    - Adding a dependency on something that isn't ready/meant for it: tech debt
   
   I interpret Robert's comment as an invitation to improve our build code into some kind of meaningful abstraction that can be shared without incurring yet more tech debt.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org