Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/05/19 19:23:31 UTC

[GitHub] [beam] tvalentyn commented on a change in pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …

tvalentyn commented on a change in pull request #11661:
URL: https://github.com/apache/beam/pull/11661#discussion_r427430657



##########
File path: sdks/python/test-suites/dataflow/py2/build.gradle
##########
@@ -205,3 +205,20 @@ task chicagoTaxiExample {
     }
   }
 }
+
+task runPerformanceTest {

Review comment:
       Can we import this from common.gradle?
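       For reference, the usual way to share a task like this across the per-version suites is to define it once and pull it in with `apply from:`; a minimal sketch, assuming the relative path from `py2/build.gradle` to the shared file (adjust to the actual repo layout):

```groovy
// Sketch: instead of redefining runPerformanceTest in each
// per-version build.gradle, import the shared definition.
// The relative path is an assumption.
apply from: "../common.gradle"
```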

##########
File path: .test-infra/jenkins/job_PerformanceTests_Python.groovy
##########
@@ -58,117 +26,59 @@ def dataflowPipelineArgs = [
     temp_location   : 'gs://temp-storage-for-end-to-end-tests/temp-it',
 ]
 
-
-// Configurations of each Jenkins job.
-def testConfigurations = [
-    new PerformanceTestConfigurations(
-        jobName           : 'beam_PerformanceTests_WordCountIT_Py27',
-        jobDescription    : 'Python SDK Performance Test - Run WordCountIT in Py27 with 1Gb files',
-        jobTriggerPhrase  : 'Run Python27 WordCountIT Performance Test',
-        resultTable       : 'beam_performance.wordcount_py27_pkb_results',
-        test              : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-        itModule          : ':sdks:python:test-suites:dataflow:py2',
-        extraPipelineArgs : dataflowPipelineArgs + [
-            input: 'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.0000*', // 1Gb
-            output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output',
-            expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710',
-            num_workers: '10',
-            autoscaling_algorithm: 'NONE',  // Disable autoscale the worker pool.
-        ],
-    ),
-    new PerformanceTestConfigurations(
-        jobName           : 'beam_PerformanceTests_WordCountIT_Py35',
-        jobDescription    : 'Python SDK Performance Test - Run WordCountIT in Py35 with 1Gb files',
-        jobTriggerPhrase  : 'Run Python35 WordCountIT Performance Test',
-        resultTable       : 'beam_performance.wordcount_py35_pkb_results',
-        test              : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-        itModule          : ':sdks:python:test-suites:dataflow:py35',
-        extraPipelineArgs : dataflowPipelineArgs + [
-            input: 'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.0000*', // 1Gb
-            output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output',
-            expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710',
-            num_workers: '10',
-            autoscaling_algorithm: 'NONE',  // Disable autoscale the worker pool.
-        ],
-    ),
-    new PerformanceTestConfigurations(
-        jobName           : 'beam_PerformanceTests_WordCountIT_Py36',
-        jobDescription    : 'Python SDK Performance Test - Run WordCountIT in Py36 with 1Gb files',
-        jobTriggerPhrase  : 'Run Python36 WordCountIT Performance Test',
-        resultTable       : 'beam_performance.wordcount_py36_pkb_results',
-        test              : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-        itModule          : ':sdks:python:test-suites:dataflow:py36',
-        extraPipelineArgs : dataflowPipelineArgs + [
-            input: 'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.0000*', // 1Gb
-            output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output',
-            expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710',
-            num_workers: '10',
-            autoscaling_algorithm: 'NONE',  // Disable autoscale the worker pool.
-        ],
-    ),
-    new PerformanceTestConfigurations(
-        jobName           : 'beam_PerformanceTests_WordCountIT_Py37',
-        jobDescription    : 'Python SDK Performance Test - Run WordCountIT in Py37 with 1Gb files',
-        jobTriggerPhrase  : 'Run Python37 WordCountIT Performance Test',
-        resultTable       : 'beam_performance.wordcount_py37_pkb_results',
-        test              : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-        itModule          : ':sdks:python:test-suites:dataflow:py37',
-        extraPipelineArgs : dataflowPipelineArgs + [
-            input: 'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.0000*', // 1Gb
-            output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output',
-            expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710',
-            num_workers: '10',
-            autoscaling_algorithm: 'NONE',  // Disable autoscale the worker pool.
-        ],
-    ),
-]
-
+testConfigurations = []
+pythonVersions = ['27', '35', '36', '37']

Review comment:
       I don't think there is significant value in running across all Python versions. We can keep Py27 and Py37 for now, and then switch to one of the "high-priority"[1] versions once we introduce that concept. cc: @lazylynx 
   [1] https://lists.apache.org/thread.html/re621331e10896ac65f487c1a83cc4a91152e2fd6d7e363c115b1857f%40%3Cdev.beam.apache.org%3E
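   Since the version list drives the generated jobs, trimming the suite as suggested would be a one-line change; a sketch:

```groovy
// Sketch: restrict the generated performance jobs to the lowest and
// highest supported interpreters, per the suggestion above.
pythonVersions = ['27', '37']
```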

##########
File path: .test-infra/jenkins/job_PerformanceTests_Python.groovy
##########
@@ -58,117 +26,59 @@ def dataflowPipelineArgs = [
     temp_location   : 'gs://temp-storage-for-end-to-end-tests/temp-it',
 ]
 
-
-// Configurations of each Jenkins job.
-def testConfigurations = [
-    new PerformanceTestConfigurations(
-        jobName           : 'beam_PerformanceTests_WordCountIT_Py27',
-        jobDescription    : 'Python SDK Performance Test - Run WordCountIT in Py27 with 1Gb files',
-        jobTriggerPhrase  : 'Run Python27 WordCountIT Performance Test',
-        resultTable       : 'beam_performance.wordcount_py27_pkb_results',
-        test              : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-        itModule          : ':sdks:python:test-suites:dataflow:py2',
-        extraPipelineArgs : dataflowPipelineArgs + [
-            input: 'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.0000*', // 1Gb
-            output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output',
-            expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710',
-            num_workers: '10',
-            autoscaling_algorithm: 'NONE',  // Disable autoscale the worker pool.
-        ],
-    ),
-    new PerformanceTestConfigurations(
-        jobName           : 'beam_PerformanceTests_WordCountIT_Py35',
-        jobDescription    : 'Python SDK Performance Test - Run WordCountIT in Py35 with 1Gb files',
-        jobTriggerPhrase  : 'Run Python35 WordCountIT Performance Test',
-        resultTable       : 'beam_performance.wordcount_py35_pkb_results',
-        test              : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-        itModule          : ':sdks:python:test-suites:dataflow:py35',
-        extraPipelineArgs : dataflowPipelineArgs + [
-            input: 'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.0000*', // 1Gb
-            output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output',
-            expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710',
-            num_workers: '10',
-            autoscaling_algorithm: 'NONE',  // Disable autoscale the worker pool.
-        ],
-    ),
-    new PerformanceTestConfigurations(
-        jobName           : 'beam_PerformanceTests_WordCountIT_Py36',
-        jobDescription    : 'Python SDK Performance Test - Run WordCountIT in Py36 with 1Gb files',
-        jobTriggerPhrase  : 'Run Python36 WordCountIT Performance Test',
-        resultTable       : 'beam_performance.wordcount_py36_pkb_results',
-        test              : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-        itModule          : ':sdks:python:test-suites:dataflow:py36',
-        extraPipelineArgs : dataflowPipelineArgs + [
-            input: 'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.0000*', // 1Gb
-            output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output',
-            expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710',
-            num_workers: '10',
-            autoscaling_algorithm: 'NONE',  // Disable autoscale the worker pool.
-        ],
-    ),
-    new PerformanceTestConfigurations(
-        jobName           : 'beam_PerformanceTests_WordCountIT_Py37',
-        jobDescription    : 'Python SDK Performance Test - Run WordCountIT in Py37 with 1Gb files',
-        jobTriggerPhrase  : 'Run Python37 WordCountIT Performance Test',
-        resultTable       : 'beam_performance.wordcount_py37_pkb_results',
-        test              : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-        itModule          : ':sdks:python:test-suites:dataflow:py37',
-        extraPipelineArgs : dataflowPipelineArgs + [
-            input: 'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.0000*', // 1Gb
-            output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output',
-            expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710',
-            num_workers: '10',
-            autoscaling_algorithm: 'NONE',  // Disable autoscale the worker pool.
-        ],
-    ),
-]
-
+testConfigurations = []
+pythonVersions = ['27', '35', '36', '37']
+
+for (pythonVersion in pythonVersions) {
+    def taskVersion = pythonVersion == '27' ? '2' : pythonVersion
+    testConfigurations.add([
+            jobName           : "beam_PerformanceTests_WordCountIT_Py${pythonVersion}",
+            jobDescription    : "Python SDK Performance Test - Run WordCountIT in Py${pythonVersion} with 1Gb files",
+            jobTriggerPhrase  : "Run Python${pythonVersion} WordCountIT Performance Test",
+            test              : "apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it",
+            gradleTaskName    : ":sdks:python:test-suites:dataflow:py${taskVersion}:runPerformanceTest",
+            pipelineOptions   : dataflowPipelineArgs + [
+                    runner               : 'TestDataflowRunner',
+                    publish_to_big_query : true,
+                    metrics_dataset      : 'beam_performance',
+                    metrics_table        : "wordcount_py${pythonVersion}_pkb_results",
+                    input                : "gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.0000*", // 1Gb
+                    output               : "gs://temp-storage-for-end-to-end-tests/py-it-cloud/output",
+                    expect_checksum      : "ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710",
+                    num_workers          : '10',
+                    autoscaling_algorithm: "NONE",  // Disable autoscale the worker pool.
+            ]
+    ])
+}
 
 for (testConfig in testConfigurations) {
   createPythonPerformanceTestJob(testConfig)
 }
 
-
-private void createPythonPerformanceTestJob(PerformanceTestConfigurations testConfig) {
-  // This job runs the Beam Python performance tests on PerfKit Benchmarker.
+private void createPythonPerformanceTestJob(Map testConfig) {
+  // This job runs the Beam Python performance tests
   job(testConfig.jobName) {
     // Set default Beam job properties.
     commonJobProperties.setTopLevelMainJobProperties(delegate)
 
     // Run job in postcommit, don't trigger every push.
-    commonJobProperties.setAutoJob(
-        delegate,
-        testConfig.buildSchedule)
+    commonJobProperties.setAutoJob(delegate, 'H */6 * * *')

Review comment:
       fyi, @lazylynx - I think this stanza would be useful for configuring low-priority jobs.
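       In Jenkins cron syntax, `'H */6 * * *'` runs every six hours with the minute hashed per job to spread load. To support low-priority jobs with a slower cadence, the schedule could be threaded through the config map instead of hard-coded; a sketch, reusing the `buildSchedule` field the deleted `PerformanceTestConfigurations` class exposed:

```groovy
// Sketch: take the schedule from the job's config map when present,
// falling back to the current every-six-hours default.
commonJobProperties.setAutoJob(delegate, testConfig.buildSchedule ?: 'H */6 * * *')
```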

##########
File path: .test-infra/jenkins/job_PerformanceTests_Python.groovy
##########
@@ -58,117 +26,59 @@ def dataflowPipelineArgs = [
     temp_location   : 'gs://temp-storage-for-end-to-end-tests/temp-it',
 ]
 
-
-// Configurations of each Jenkins job.
-def testConfigurations = [
-    new PerformanceTestConfigurations(
-        jobName           : 'beam_PerformanceTests_WordCountIT_Py27',
-        jobDescription    : 'Python SDK Performance Test - Run WordCountIT in Py27 with 1Gb files',
-        jobTriggerPhrase  : 'Run Python27 WordCountIT Performance Test',
-        resultTable       : 'beam_performance.wordcount_py27_pkb_results',
-        test              : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-        itModule          : ':sdks:python:test-suites:dataflow:py2',
-        extraPipelineArgs : dataflowPipelineArgs + [
-            input: 'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.0000*', // 1Gb
-            output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output',
-            expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710',
-            num_workers: '10',
-            autoscaling_algorithm: 'NONE',  // Disable autoscale the worker pool.
-        ],
-    ),
-    new PerformanceTestConfigurations(
-        jobName           : 'beam_PerformanceTests_WordCountIT_Py35',
-        jobDescription    : 'Python SDK Performance Test - Run WordCountIT in Py35 with 1Gb files',
-        jobTriggerPhrase  : 'Run Python35 WordCountIT Performance Test',
-        resultTable       : 'beam_performance.wordcount_py35_pkb_results',
-        test              : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-        itModule          : ':sdks:python:test-suites:dataflow:py35',
-        extraPipelineArgs : dataflowPipelineArgs + [
-            input: 'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.0000*', // 1Gb
-            output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output',
-            expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710',
-            num_workers: '10',
-            autoscaling_algorithm: 'NONE',  // Disable autoscale the worker pool.
-        ],
-    ),
-    new PerformanceTestConfigurations(
-        jobName           : 'beam_PerformanceTests_WordCountIT_Py36',
-        jobDescription    : 'Python SDK Performance Test - Run WordCountIT in Py36 with 1Gb files',
-        jobTriggerPhrase  : 'Run Python36 WordCountIT Performance Test',
-        resultTable       : 'beam_performance.wordcount_py36_pkb_results',
-        test              : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-        itModule          : ':sdks:python:test-suites:dataflow:py36',
-        extraPipelineArgs : dataflowPipelineArgs + [
-            input: 'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.0000*', // 1Gb
-            output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output',
-            expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710',
-            num_workers: '10',
-            autoscaling_algorithm: 'NONE',  // Disable autoscale the worker pool.
-        ],
-    ),
-    new PerformanceTestConfigurations(
-        jobName           : 'beam_PerformanceTests_WordCountIT_Py37',
-        jobDescription    : 'Python SDK Performance Test - Run WordCountIT in Py37 with 1Gb files',
-        jobTriggerPhrase  : 'Run Python37 WordCountIT Performance Test',
-        resultTable       : 'beam_performance.wordcount_py37_pkb_results',
-        test              : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-        itModule          : ':sdks:python:test-suites:dataflow:py37',
-        extraPipelineArgs : dataflowPipelineArgs + [
-            input: 'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.0000*', // 1Gb
-            output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output',
-            expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710',
-            num_workers: '10',
-            autoscaling_algorithm: 'NONE',  // Disable autoscale the worker pool.
-        ],
-    ),
-]
-
+testConfigurations = []
+pythonVersions = ['27', '35', '36', '37']
+
+for (pythonVersion in pythonVersions) {

Review comment:
       Where do we configure the dashboards for these tests? Do we need to configure the Python version bit in the dashboard configuration as well?

##########
File path: sdks/python/test-suites/dataflow/common.gradle
##########
@@ -109,4 +109,21 @@ task validatesRunnerStreamingTests {
       args '-c', ". ${envdir}/bin/activate && ${runScriptsDir}/run_integration_test.sh $cmdArgs"
     }
   }
-}
\ No newline at end of file
+}
+
+task runPerformanceTest {
+    dependsOn 'installGcpTest'
+    dependsOn ':sdks:python:sdist'
+
+    def test = project.findProperty('test')
+    def testOpts = project.findProperty('test-pipeline-options')
+    testOpts += " --sdk_location=${files(configurations.distTarBall.files).singleFile}"
+
+  doLast {
+    exec {
+      workingDir "${project.rootDir}/sdks/python"
+      executable 'sh'
+      args '-c', ". ${envdir}/bin/activate && ${envdir}/bin/python setup.py nosetests --tests=${test}  --test-pipeline-options=\"${testOpts}\" --ignore-files \'.*py3\\d?\\.py\$\'"

Review comment:
       Please make sure these tests export the XML logs that can be inspected in Jenkins in case of test failure:
   
   Relevant bits are: 
   1. https://github.com/apache/beam/blob/03d99dfa359f44a29a772fcc8ec8b0a237cab113/.test-infra/jenkins/job_PostCommit_Python37.groovy#L32
   2. https://github.com/apache/beam/blob/03d99dfa359f44a29a772fcc8ec8b0a237cab113/sdks/python/scripts/run_integration_test.sh#L276
   
   cc: @udim who may have additional feedback on this. Udi, would it make sense to use [pytest](https://cwiki.apache.org/confluence/display/BEAM/Python+Tips#PythonTips-RunningTestsusingpytest) here instead of `nose`?
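   For reference, nose can emit JUnit-style XML via its xunit plugin, which is the format the Jenkins `archiveJunit` step linked in [1] consumes; a hedged sketch of the same exec line with XML output enabled (the output file name is arbitrary):

```groovy
// Sketch: same nosetests invocation, with JUnit XML output enabled so
// Jenkins can archive and display per-test results on failure.
args '-c', ". ${envdir}/bin/activate && ${envdir}/bin/python setup.py nosetests " +
    "--tests=${test} --test-pipeline-options=\"${testOpts}\" " +
    "--with-xunit --xunit-file=nosetests.xml"
```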

##########
File path: sdks/python/test-suites/dataflow/common.gradle
##########
@@ -109,4 +109,21 @@ task validatesRunnerStreamingTests {
       args '-c', ". ${envdir}/bin/activate && ${runScriptsDir}/run_integration_test.sh $cmdArgs"
     }
   }
-}
\ No newline at end of file
+}
+
+task runPerformanceTest {
+    dependsOn 'installGcpTest'
+    dependsOn ':sdks:python:sdist'
+
+    def test = project.findProperty('test')
+    def testOpts = project.findProperty('test-pipeline-options')
+    testOpts += " --sdk_location=${files(configurations.distTarBall.files).singleFile}"
+
+  doLast {
+    exec {
+      workingDir "${project.rootDir}/sdks/python"
+      executable 'sh'
+      args '-c', ". ${envdir}/bin/activate && ${envdir}/bin/python setup.py nosetests --tests=${test}  --test-pipeline-options=\"${testOpts}\" --ignore-files \'.*py3\\d?\\.py\$\'"

Review comment:
       Do we need to pass `--ignore-files` given that we control which tests to run?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org