You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/07/13 20:29:08 UTC

[GitHub] [beam] saavannanavati opened a new pull request #12242: [BEAM-10427] Benchmark runtime typechecking for the Python SDK

saavannanavati opened a new pull request #12242:
URL: https://github.com/apache/beam/pull/12242


   This PR adds a load test for evaluating the performance of the runtime typechecking system for the Python SDK. It works by comparing the performance of a pipeline with the runtime typechecking feature on versus the same pipeline with it off. This load test also allows the user to specify whether the pipeline should be run with simple typehints (e.g. str) or complex, nested typehints (e.g. Tuple[str, int, Iterable[bool]].
   
   This PR is in development and is not ready for merge.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | ---
   Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/) | ---
   XLang | --- | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/) | ---
   
   Pre-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   --- |Java | Python | Go | Website
   --- | --- | --- | --- | ---
   Non-portable | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/)
   Portable | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | ---
   
   See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] saavannanavati commented on pull request #12242: [BEAM-10427] Benchmark runtime typechecking for the Python SDK

Posted by GitBox <gi...@apache.org>.
saavannanavati commented on pull request #12242:
URL: https://github.com/apache/beam/pull/12242#issuecomment-667259686


   R: @udim 
   R: @robertwb 
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] saavannanavati commented on a change in pull request #12242: [BEAM-10427] Benchmark runtime typechecking for the Python SDK

Posted by GitBox <gi...@apache.org>.
saavannanavati commented on a change in pull request #12242:
URL: https://github.com/apache/beam/pull/12242#discussion_r468091611



##########
File path: .test-infra/jenkins/job_LoadTests_RuntimeTypeChecking_Python.groovy
##########
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import LoadTestsBuilder as loadTestsBuilder
+import PhraseTriggeringPostCommitBuilder
+import InfluxDBCredentialsHelper
+
+def now = new Date().format("MMddHHmmss", TimeZone.getTimeZone('UTC'))
+
+def loadTestConfigurations = { datasetName ->
+  [
+    [
+      title          : 'Runtime Type Checking Python Load Test: On | Simple Type Hints',
+      test           : 'python -m apache_beam.testing.load_tests.runtime_type_check_on_test_py3',
+      runner         : CommonTestProperties.Runner.DATAFLOW,
+      pipelineOptions: [
+        job_name             : 'load-tests-python-dataflow-batch-runtime-type-check-1-' + now,
+        project              : 'apache-beam-testing',
+        region               : 'us-central1',
+        temp_location        : 'gs://temp-storage-for-perf-tests/loadtests',
+        publish_to_big_query : true,
+        metrics_dataset      : datasetName,
+        metrics_table        : 'python_dataflow_batch_runtime_type_check_1',
+        influx_measurement   : 'python_batch_runtime_type_check_1',
+        input_options        : '\'{"num_records": 0,"key_size": 0,"value_size": 0}\''
+        num_records          : 1000,
+        fanout               : 300,
+        num_workers          : 5,
+        autoscaling_algorithm: "NONE",
+        nested_typehint: 0  // False
+      ]
+    ],
+    [
+      title          : 'Runtime Type Checking Python Load Test: Off | Simple Type Hints',
+      test           : 'python -m apache_beam.testing.load_tests.runtime_type_check_off_test_py3',
+      runner         : CommonTestProperties.Runner.DATAFLOW,
+      pipelineOptions: [
+        job_name             : 'load-tests-python-dataflow-batch-runtime-type-check-2-' + now,
+        project              : 'apache-beam-testing',
+        region               : 'us-central1',
+        temp_location        : 'gs://temp-storage-for-perf-tests/loadtests',
+        publish_to_big_query : true,
+        metrics_dataset      : datasetName,
+        metrics_table        : 'python_dataflow_batch_runtime_type_check_2',
+        influx_measurement   : 'python_batch_runtime_type_check_2',
+        input_options        : '\'{"num_records": 0,"key_size": 0,"value_size": 0}\''
+        num_records          : 1000,
+        fanout               : 300,
+        num_workers          : 5,
+        autoscaling_algorithm: "NONE",
+        nested_typehint: 0  // False
+      ]
+    ],
+    [
+      title          : 'Runtime Type Checking Python Load Test: On | Nested Type Hints',

Review comment:
       Yeah sure
   
   I think the original goal of having both nested type hints and simple type hints as separate tests was to see if there was any performance difference between the two when runtime type checking was on, which would help us narrow down whether the performance drop came from the overhead of the decorator versus the actual type check itself, and also to test for the regressions separately.
   
   I can merge them if it's okay with @udim




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] saavannanavati closed pull request #12242: [BEAM-10427] Benchmark runtime typechecking for the Python SDK

Posted by GitBox <gi...@apache.org>.
saavannanavati closed pull request #12242:
URL: https://github.com/apache/beam/pull/12242


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robertwb commented on a change in pull request #12242: [BEAM-10427] Benchmark runtime typechecking for the Python SDK

Posted by GitBox <gi...@apache.org>.
robertwb commented on a change in pull request #12242:
URL: https://github.com/apache/beam/pull/12242#discussion_r468083210



##########
File path: .test-infra/jenkins/job_LoadTests_RuntimeTypeChecking_Python.groovy
##########
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import LoadTestsBuilder as loadTestsBuilder
+import PhraseTriggeringPostCommitBuilder
+import InfluxDBCredentialsHelper
+
+def now = new Date().format("MMddHHmmss", TimeZone.getTimeZone('UTC'))
+
+def loadTestConfigurations = { datasetName ->
+  [
+    [
+      title          : 'Runtime Type Checking Python Load Test: On | Simple Type Hints',
+      test           : 'python -m apache_beam.testing.load_tests.runtime_type_check_on_test_py3',
+      runner         : CommonTestProperties.Runner.DATAFLOW,
+      pipelineOptions: [
+        job_name             : 'load-tests-python-dataflow-batch-runtime-type-check-1-' + now,
+        project              : 'apache-beam-testing',
+        region               : 'us-central1',
+        temp_location        : 'gs://temp-storage-for-perf-tests/loadtests',
+        publish_to_big_query : true,
+        metrics_dataset      : datasetName,
+        metrics_table        : 'python_dataflow_batch_runtime_type_check_1',
+        influx_measurement   : 'python_batch_runtime_type_check_1',
+        input_options        : '\'{"num_records": 0,"key_size": 0,"value_size": 0}\''
+        num_records          : 1000,
+        fanout               : 300,
+        num_workers          : 5,
+        autoscaling_algorithm: "NONE",
+        nested_typehint: 0  // False
+      ]
+    ],
+    [
+      title          : 'Runtime Type Checking Python Load Test: Off | Simple Type Hints',
+      test           : 'python -m apache_beam.testing.load_tests.runtime_type_check_off_test_py3',
+      runner         : CommonTestProperties.Runner.DATAFLOW,
+      pipelineOptions: [
+        job_name             : 'load-tests-python-dataflow-batch-runtime-type-check-2-' + now,
+        project              : 'apache-beam-testing',
+        region               : 'us-central1',
+        temp_location        : 'gs://temp-storage-for-perf-tests/loadtests',
+        publish_to_big_query : true,
+        metrics_dataset      : datasetName,
+        metrics_table        : 'python_dataflow_batch_runtime_type_check_2',
+        influx_measurement   : 'python_batch_runtime_type_check_2',
+        input_options        : '\'{"num_records": 0,"key_size": 0,"value_size": 0}\''
+        num_records          : 1000,
+        fanout               : 300,
+        num_workers          : 5,
+        autoscaling_algorithm: "NONE",
+        nested_typehint: 0  // False
+      ]
+    ],
+    [
+      title          : 'Runtime Type Checking Python Load Test: On | Nested Type Hints',

Review comment:
       Running load tests is a limited resource; could you test them both in the same pipeline? This should be sufficient for the purposes of noticing regressions. 

##########
File path: sdks/python/apache_beam/testing/load_tests/runtime_type_check_off_test_py3.py
##########
@@ -0,0 +1,58 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+For more information, see documentation in the parent class.
+
+Example test run:
+
+python -m apache_beam.testing.load_tests.runtime_type_check_off_test_py3 \
+    --test-pipeline-options="
+    --project=apache-beam-testing
+    --region=us-centrall
+    --publish_to_big_query=true
+    --metrics_dataset=python_load_tests
+    --metrics_table=gbk
+    --nested_typehint=0
+    --fanout=200
+    --num_records=1000
+    --input_options='{
+    \"num_records\": 0,
+    \"key_size\": 0,
+    \"value_size\": 0
+    }'"
+"""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import logging
+
+from apache_beam.testing.load_tests.base_runtime_type_check_test_py3 import \
+    BaseRunTimeTypeCheckTest
+
+
+class RunTimeTypeCheckOffTest(BaseRunTimeTypeCheckTest):
+  def __init__(self):
+    self.runtime_type_check = False

Review comment:
       Rather than pass this as an attribute of self, can't we pass it as an additional flag in the test? This would allow us to get rid of much if not all of this duplication. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] saavannanavati commented on pull request #12242: [BEAM-10427] Benchmark runtime typechecking for the Python SDK

Posted by GitBox <gi...@apache.org>.
saavannanavati commented on pull request #12242:
URL: https://github.com/apache/beam/pull/12242#issuecomment-671743091


   Closing this PR because we've collectively come to a consensus that micro-benchmarking will provide better precision, and consume less resources than load testing. This PR can be a historical marker in case we decide to add a load test in the future to check for regressions. Therefore, the micro-benchmark will be added in a different PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] saavannanavati commented on a change in pull request #12242: [BEAM-10427] Benchmark runtime typechecking for the Python SDK

Posted by GitBox <gi...@apache.org>.
saavannanavati commented on a change in pull request #12242:
URL: https://github.com/apache/beam/pull/12242#discussion_r468089367



##########
File path: sdks/python/apache_beam/testing/load_tests/runtime_type_check_off_test_py3.py
##########
@@ -0,0 +1,58 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+For more information, see documentation in the parent class.
+
+Example test run:
+
+python -m apache_beam.testing.load_tests.runtime_type_check_off_test_py3 \
+    --test-pipeline-options="
+    --project=apache-beam-testing
+    --region=us-centrall
+    --publish_to_big_query=true
+    --metrics_dataset=python_load_tests
+    --metrics_table=gbk
+    --nested_typehint=0
+    --fanout=200
+    --num_records=1000
+    --input_options='{
+    \"num_records\": 0,
+    \"key_size\": 0,
+    \"value_size\": 0
+    }'"
+"""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import logging
+
+from apache_beam.testing.load_tests.base_runtime_type_check_test_py3 import \
+    BaseRunTimeTypeCheckTest
+
+
+class RunTimeTypeCheckOffTest(BaseRunTimeTypeCheckTest):
+  def __init__(self):
+    self.runtime_type_check = False

Review comment:
       We looked into this but the CLI flags are only parsed after the LoadTest has been initialized, and by that point it's too late to modify the Pipeline options because that happens in the LoadTest constructor.. or can I still modify the options?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] udim commented on pull request #12242: [BEAM-10427] Benchmark runtime typechecking for the Python SDK

Posted by GitBox <gi...@apache.org>.
udim commented on pull request #12242:
URL: https://github.com/apache/beam/pull/12242#issuecomment-667442049


   Please update PR desc. on whether it's ready for review


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] udim commented on pull request #12242: [BEAM-10427] Benchmark runtime typechecking for the Python SDK

Posted by GitBox <gi...@apache.org>.
udim commented on pull request #12242:
URL: https://github.com/apache/beam/pull/12242#issuecomment-670239967


   > Sorry I didn't have a chance to look at this sooner.
   > 
   > Since this is a purely local change, I would suggest rather than adding all this infrastructure simply creating a variant of (or even adding an option to)
   > 
   > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/tools/map_fn_microbenchmark.py
   > 
   > which enables the two runtime type checking. This'll be much cleaner numbers, and a lot faster turn-around time. (It'd probably be valuable to run this before and after your changes even with typechecking turned off to ensure you're not adding overhead in that case too.)
   
   My goal in suggesting this framework was to gather historical numbers, and to test for performance regressions in future PRs.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] saavannanavati commented on a change in pull request #12242: [BEAM-10427] Benchmark runtime typechecking for the Python SDK

Posted by GitBox <gi...@apache.org>.
saavannanavati commented on a change in pull request #12242:
URL: https://github.com/apache/beam/pull/12242#discussion_r468089367



##########
File path: sdks/python/apache_beam/testing/load_tests/runtime_type_check_off_test_py3.py
##########
@@ -0,0 +1,58 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+For more information, see documentation in the parent class.
+
+Example test run:
+
+python -m apache_beam.testing.load_tests.runtime_type_check_off_test_py3 \
+    --test-pipeline-options="
+    --project=apache-beam-testing
+    --region=us-centrall
+    --publish_to_big_query=true
+    --metrics_dataset=python_load_tests
+    --metrics_table=gbk
+    --nested_typehint=0
+    --fanout=200
+    --num_records=1000
+    --input_options='{
+    \"num_records\": 0,
+    \"key_size\": 0,
+    \"value_size\": 0
+    }'"
+"""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import logging
+
+from apache_beam.testing.load_tests.base_runtime_type_check_test_py3 import \
+    BaseRunTimeTypeCheckTest
+
+
+class RunTimeTypeCheckOffTest(BaseRunTimeTypeCheckTest):
+  def __init__(self):
+    self.runtime_type_check = False

Review comment:
       We looked into this but all flags that are passed to the test are parsed after the LoadTest has been initialized, and it's in this initialization process that the pipeline options are defined and used. We tried parsing `runtime_type_check` before the Pipeline was initialized in the constructor so we could pass it through as an option, but the method of parsing used by LoadTest is a non-static class function that we can't call until the constructor has finished.
   
   Is there a workaround?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] saavannanavati commented on pull request #12242: [BEAM-10427] Benchmark runtime typechecking for the Python SDK

Posted by GitBox <gi...@apache.org>.
saavannanavati commented on pull request #12242:
URL: https://github.com/apache/beam/pull/12242#issuecomment-668157896


   R: @udim 
   R: @robertwb 
   
   PTAL - this is ready for review


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] saavannanavati commented on a change in pull request #12242: [BEAM-10427] Benchmark runtime typechecking for the Python SDK

Posted by GitBox <gi...@apache.org>.
saavannanavati commented on a change in pull request #12242:
URL: https://github.com/apache/beam/pull/12242#discussion_r468089367



##########
File path: sdks/python/apache_beam/testing/load_tests/runtime_type_check_off_test_py3.py
##########
@@ -0,0 +1,58 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+For more information, see documentation in the parent class.
+
+Example test run:
+
+python -m apache_beam.testing.load_tests.runtime_type_check_off_test_py3 \
+    --test-pipeline-options="
+    --project=apache-beam-testing
+    --region=us-centrall
+    --publish_to_big_query=true
+    --metrics_dataset=python_load_tests
+    --metrics_table=gbk
+    --nested_typehint=0
+    --fanout=200
+    --num_records=1000
+    --input_options='{
+    \"num_records\": 0,
+    \"key_size\": 0,
+    \"value_size\": 0
+    }'"
+"""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import logging
+
+from apache_beam.testing.load_tests.base_runtime_type_check_test_py3 import \
+    BaseRunTimeTypeCheckTest
+
+
+class RunTimeTypeCheckOffTest(BaseRunTimeTypeCheckTest):
+  def __init__(self):
+    self.runtime_type_check = False

Review comment:
       We looked into this but the CLI flags are only parsed after the LoadTest has been initialized, and by that point it's too late to modify the Pipeline options because that happens in the LoadTest constructor.. or can I still modify the options?
   
   It happens [here](https://github.com/apache/beam/pull/12242/files#diff-4ecc0b51b10c78064d087c1db98ec780R105)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org