Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/08/06 20:03:27 UTC

[GitHub] [beam] tysonjh commented on a change in pull request #12435: [BEAM-10616] Added Python Pardo load tests for streaming on Dataflow

tysonjh commented on a change in pull request #12435:
URL: https://github.com/apache/beam/pull/12435#discussion_r466648416



##########
File path: .test-infra/jenkins/job_LoadTests_ParDo_Python.groovy
##########
@@ -151,3 +151,35 @@ CronJobBuilder.cronJob('beam_LoadTests_Python_ParDo_Dataflow_Batch', 'H 13 * * *
   ]
   batchLoadTestJob(delegate, CommonTestProperties.TriggeringContext.POST_COMMIT)
 }
+
+def streamingLoadTestJob = { scope, triggeringContext ->
+  scope.description('Runs Python ParDo load tests on Dataflow runner in streaming mode')
+  commonJobProperties.setTopLevelMainJobProperties(scope, 'master', 120)
+
+  def datasetName = loadTestsBuilder.getBigQueryDataset('load_test', triggeringContext)
+  for (testConfiguration in loadTestConfigurations("streaming", datasetName)) {
+    // Skipping case 2 in streaming because it timeouts. To be checked TODO: kkucharc
+    if(testConfiguration.title != "ParDo Python Load test: 2GB 100 byte records 200 times") {

Review comment:
       Should this be a property in the testConfiguration instead of relying on the title implicitly? That would also eliminate the need to add a 'streaming' property below.
   
   In fact, doing this may also allow refactoring 'batchLoadTestJob' into a more generic 'loadTestJob' for reuse in both the streaming and batch cases. It may unnecessarily complicate things, though; I'll leave it up to your judgement.
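   A rough sketch of what that could look like (the `skipInStreaming` property and the merged `loadTestJob` closure are illustrative assumptions, not the PR's actual code):

```groovy
// Illustrative sketch only: mark the skipped case with an explicit property
// instead of matching on the title string.
def loadTestConfigurations = { mode, datasetName ->
  [
    [
      title          : 'ParDo Python Load test: 2GB 100 byte records 200 times',
      // Hypothetical property: skip this case in streaming until the
      // timeout is investigated.
      skipInStreaming: true,
      // ... remaining pipeline options elided ...
    ],
    // ... other test configurations elided ...
  ]
}

// A single generic closure could then serve both batch and streaming jobs.
def loadTestJob = { scope, triggeringContext, mode ->
  def datasetName = loadTestsBuilder.getBigQueryDataset('load_test', triggeringContext)
  for (testConfiguration in loadTestConfigurations(mode, datasetName)) {
    if (mode == 'streaming' && testConfiguration.skipInStreaming) {
      continue
    }
    // ... launch the load test for this configuration ...
  }
}
```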

##########
File path: .test-infra/jenkins/job_LoadTests_ParDo_Python.groovy
##########
@@ -151,3 +151,35 @@ CronJobBuilder.cronJob('beam_LoadTests_Python_ParDo_Dataflow_Batch', 'H 13 * * *
   ]
   batchLoadTestJob(delegate, CommonTestProperties.TriggeringContext.POST_COMMIT)
 }
+
+def streamingLoadTestJob = { scope, triggeringContext ->
+  scope.description('Runs Python ParDo load tests on Dataflow runner in streaming mode')
+  commonJobProperties.setTopLevelMainJobProperties(scope, 'master', 120)
+
+  def datasetName = loadTestsBuilder.getBigQueryDataset('load_test', triggeringContext)
+  for (testConfiguration in loadTestConfigurations("streaming", datasetName)) {
+    // Skipping case 2 in streaming because it timeouts. To be checked TODO: kkucharc

Review comment:
       Are you planning to investigate this now, soon, or in the future? If it isn't something you'll be looking into immediately, and you found additional context about the timeouts while testing, it would be more helpful to create a Jira issue and link it here than to leave a TODO for yourself.

##########
File path: sdks/python/apache_beam/testing/load_tests/pardo_test.py
##########
@@ -125,7 +125,9 @@ def process(self, element, state=state_param):
             state.add(1)
         yield element
 
-    if self.get_option_or_default('streaming', False):
+    if self.get_option_or_default(
+        'streaming',
+        False) and self.pipeline.get_option('runner') == "PortableRunner":

Review comment:
       Why only for the PortableRunner now?
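   To make the question concrete, here is a minimal sketch of the two guards side by side; `PipelineOptionsStub` is a stand-in for the real test's option lookup, not Beam's actual API:

```python
# Hedged sketch: contrasting the PR's PortableRunner-only guard with the
# original runner-agnostic one. PipelineOptionsStub is a hypothetical
# stand-in for the load test's option access.

class PipelineOptionsStub:
    def __init__(self, options):
        self._options = options

    def get_option_or_default(self, name, default):
        # Mirrors the lookup pattern used in the test: return the option
        # value if present, otherwise the supplied default.
        return self._options.get(name, default)

def streaming_guard_runner_agnostic(opts):
    # The pre-PR condition: depends only on the 'streaming' flag.
    return opts.get_option_or_default('streaming', False)

def streaming_guard_portable_only(opts):
    # The PR's condition: additionally requires runner == PortableRunner,
    # which is what the review question is about.
    return (opts.get_option_or_default('streaming', False)
            and opts.get_option_or_default('runner', None) == 'PortableRunner')

dataflow_opts = PipelineOptionsStub({'streaming': True, 'runner': 'DataflowRunner'})
# The guards now disagree for a streaming Dataflow run:
print(streaming_guard_runner_agnostic(dataflow_opts))   # True
print(streaming_guard_portable_only(dataflow_opts))     # False
```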




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org