You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/10/06 14:16:36 UTC

[GitHub] [airflow] MrGeorgeOwl opened a new pull request, #26915: Rewrite system tests for ML Engine service

MrGeorgeOwl opened a new pull request, #26915:
URL: https://github.com/apache/airflow/pull/26915

   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of an existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] boring-cyborg[bot] commented on pull request #26915: Rewrite system tests for ML Engine service

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on PR #26915:
URL: https://github.com/apache/airflow/pull/26915#issuecomment-1296546504

   Awesome work, congrats on your first merged pull request!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk merged pull request #26915: Rewrite system tests for ML Engine service

Posted by GitBox <gi...@apache.org>.
potiuk merged PR #26915:
URL: https://github.com/apache/airflow/pull/26915


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] bhirsz commented on a diff in pull request #26915: Rewrite system tests for ML Engine service

Posted by GitBox <gi...@apache.org>.
bhirsz commented on code in PR #26915:
URL: https://github.com/apache/airflow/pull/26915#discussion_r989702419


##########
tests/system/providers/google/cloud/ml_engine/example_mlengine.py:
##########
@@ -216,28 +222,38 @@
     # [END howto_operator_gcp_mlengine_get_prediction]
 
     # [START howto_operator_gcp_mlengine_delete_version]
-    delete_version = MLEngineDeleteVersionOperator(
-        task_id="delete-version", project_id=PROJECT_ID, model_name=MODEL_NAME, version_name="v1"
+    delete_version_v1 = MLEngineDeleteVersionOperator(
+        task_id="delete-version-v1",
+        project_id=PROJECT_ID,
+        model_name=MODEL_NAME,
+        version_name="v1",
+        trigger_rule=TriggerRule.ALL_DONE,

Review Comment:
   keep trigger_rule out of doc example - as unnecessary addition:
   # [START..
   operator code..
   # [END..
   operator.trigger_rule = TriggerRule.ALL_DONE



##########
tests/system/providers/google/cloud/ml_engine/example_mlengine.py:
##########
@@ -216,28 +222,38 @@
     # [END howto_operator_gcp_mlengine_get_prediction]
 
     # [START howto_operator_gcp_mlengine_delete_version]
-    delete_version = MLEngineDeleteVersionOperator(
-        task_id="delete-version", project_id=PROJECT_ID, model_name=MODEL_NAME, version_name="v1"
+    delete_version_v1 = MLEngineDeleteVersionOperator(
+        task_id="delete-version-v1",
+        project_id=PROJECT_ID,
+        model_name=MODEL_NAME,
+        version_name="v1",
+        trigger_rule=TriggerRule.ALL_DONE,

Review Comment:
   keep trigger_rule out of doc example - as unnecessary addition:
   ```
   # [START..
   operator code..
   # [END..
   operator.trigger_rule = TriggerRule.ALL_DONE
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] bhirsz commented on a diff in pull request #26915: Rewrite system tests for ML Engine service

Posted by GitBox <gi...@apache.org>.
bhirsz commented on code in PR #26915:
URL: https://github.com/apache/airflow/pull/26915#discussion_r989700523


##########
tests/system/providers/google/cloud/ml_engine/example_mlengine.py:
##########
@@ -37,70 +40,74 @@
     MLEngineStartBatchPredictionJobOperator,
     MLEngineStartTrainingJobOperator,
 )
+from airflow.providers.google.cloud.transfers.local_to_gcs import LocalFilesystemToGCSOperator
 from airflow.providers.google.cloud.utils import mlengine_operator_utils
+from airflow.utils.trigger_rule import TriggerRule
 
-PROJECT_ID = os.environ.get("GCP_PROJECT_ID", "example-project")
+DAG_ID = "example_gcp_mlengine"
+BASE_DIR = pathlib.Path(__file__).parent.resolve()
+PREDICT_FILE_NAME = 'predict.json'
+PATH_TO_PREDICT_FILE = BASE_DIR / PREDICT_FILE_NAME
 
-MODEL_NAME = os.environ.get("GCP_MLENGINE_MODEL_NAME", "model_name")
-
-SAVED_MODEL_PATH = os.environ.get("GCP_MLENGINE_SAVED_MODEL_PATH", "gs://INVALID BUCKET NAME/saved-model/")
-JOB_DIR = os.environ.get("GCP_MLENGINE_JOB_DIR", "gs://INVALID BUCKET NAME/keras-job-dir")
-PREDICTION_INPUT = os.environ.get(
-    "GCP_MLENGINE_PREDICTION_INPUT", "gs://INVALID BUCKET NAME/prediction_input.json"
-)
+PROJECT_ID = os.environ.get("GCP_PROJECT_ID")
+ENV_ID = os.environ.get("SYSTEM_TESTS_ENV_ID")
+MODEL_NAME = os.environ.get("GCP_MLENGINE_MODEL_NAME", f"example_mlengine_model_{ENV_ID}")

Review Comment:
   Just define constant: MODEL_NAME = f"example_mlengine_model_{ENV_ID}"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] bhirsz commented on a diff in pull request #26915: Rewrite system tests for ML Engine service

Posted by GitBox <gi...@apache.org>.
bhirsz commented on code in PR #26915:
URL: https://github.com/apache/airflow/pull/26915#discussion_r989700784


##########
tests/system/providers/google/cloud/ml_engine/example_mlengine.py:
##########
@@ -37,70 +40,74 @@
     MLEngineStartBatchPredictionJobOperator,
     MLEngineStartTrainingJobOperator,
 )
+from airflow.providers.google.cloud.transfers.local_to_gcs import LocalFilesystemToGCSOperator
 from airflow.providers.google.cloud.utils import mlengine_operator_utils
+from airflow.utils.trigger_rule import TriggerRule
 
-PROJECT_ID = os.environ.get("GCP_PROJECT_ID", "example-project")
+DAG_ID = "example_gcp_mlengine"
+BASE_DIR = pathlib.Path(__file__).parent.resolve()
+PREDICT_FILE_NAME = 'predict.json'
+PATH_TO_PREDICT_FILE = BASE_DIR / PREDICT_FILE_NAME
 
-MODEL_NAME = os.environ.get("GCP_MLENGINE_MODEL_NAME", "model_name")
-
-SAVED_MODEL_PATH = os.environ.get("GCP_MLENGINE_SAVED_MODEL_PATH", "gs://INVALID BUCKET NAME/saved-model/")
-JOB_DIR = os.environ.get("GCP_MLENGINE_JOB_DIR", "gs://INVALID BUCKET NAME/keras-job-dir")
-PREDICTION_INPUT = os.environ.get(
-    "GCP_MLENGINE_PREDICTION_INPUT", "gs://INVALID BUCKET NAME/prediction_input.json"
-)
+PROJECT_ID = os.environ.get("GCP_PROJECT_ID")
+ENV_ID = os.environ.get("SYSTEM_TESTS_ENV_ID")
+MODEL_NAME = os.environ.get("GCP_MLENGINE_MODEL_NAME", f"example_mlengine_model_{ENV_ID}")
+BUCKET_NAME = os.environ.get("BUCKET_NAME", f"example_mlengine_bucket_{ENV_ID}")

Review Comment:
   BUCKET_NAME = f"example_mlengine_bucket_{ENV_ID}"
   
   same for other variables - don't use env variables if it's not necessary.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] bhirsz commented on a diff in pull request #26915: Rewrite system tests for ML Engine service

Posted by GitBox <gi...@apache.org>.
bhirsz commented on code in PR #26915:
URL: https://github.com/apache/airflow/pull/26915#discussion_r989701818


##########
tests/system/providers/google/cloud/ml_engine/example_mlengine.py:
##########
@@ -37,70 +40,74 @@
     MLEngineStartBatchPredictionJobOperator,
     MLEngineStartTrainingJobOperator,
 )
+from airflow.providers.google.cloud.transfers.local_to_gcs import LocalFilesystemToGCSOperator
 from airflow.providers.google.cloud.utils import mlengine_operator_utils
+from airflow.utils.trigger_rule import TriggerRule
 
-PROJECT_ID = os.environ.get("GCP_PROJECT_ID", "example-project")
+DAG_ID = "example_gcp_mlengine"
+BASE_DIR = pathlib.Path(__file__).parent.resolve()
+PREDICT_FILE_NAME = 'predict.json'
+PATH_TO_PREDICT_FILE = BASE_DIR / PREDICT_FILE_NAME
 
-MODEL_NAME = os.environ.get("GCP_MLENGINE_MODEL_NAME", "model_name")
-
-SAVED_MODEL_PATH = os.environ.get("GCP_MLENGINE_SAVED_MODEL_PATH", "gs://INVALID BUCKET NAME/saved-model/")
-JOB_DIR = os.environ.get("GCP_MLENGINE_JOB_DIR", "gs://INVALID BUCKET NAME/keras-job-dir")
-PREDICTION_INPUT = os.environ.get(
-    "GCP_MLENGINE_PREDICTION_INPUT", "gs://INVALID BUCKET NAME/prediction_input.json"
-)
+PROJECT_ID = os.environ.get("GCP_PROJECT_ID")
+ENV_ID = os.environ.get("SYSTEM_TESTS_ENV_ID")
+MODEL_NAME = os.environ.get("GCP_MLENGINE_MODEL_NAME", f"example_mlengine_model_{ENV_ID}")
+BUCKET_NAME = os.environ.get("BUCKET_NAME", f"example_mlengine_bucket_{ENV_ID}")
+BUCKET_PATH = os.environ.get("BUCKET_PATH", f"gs://{BUCKET_NAME}")
+JOB_DIR = os.environ.get("GCP_MLENGINE_JOB_DIR", f"{BUCKET_PATH}/job-dir")
+SAVED_MODEL_PATH = os.environ.get("GCP_MLENGINE_SAVED_MODEL_PATH", f"{JOB_DIR}/")
+PREDICTION_INPUT = os.environ.get("GCP_MLENGINE_PREDICTION_INPUT", f"{BUCKET_PATH}/{PREDICT_FILE_NAME}")
 PREDICTION_OUTPUT = os.environ.get(
-    "GCP_MLENGINE_PREDICTION_OUTPUT", "gs://INVALID BUCKET NAME/prediction_output"
+    "GCP_MLENGINE_PREDICTION_OUTPUT", "gs://INVALID BUCKET NAME/prediction_output/"
+)
+TRAINER_URI = os.environ.get(
+    "GCP_MLENGINE_TRAINER_URI",
+    "gs://system-tests-resources/example_gcp_mlengine/trainer-0.1.tar.gz",
+)
+TRAINER_PY_MODULE = os.environ.get(
+    "GCP_MLENGINE_TRAINER_TRAINER_PY_MODULE",
+    "trainer.task",
 )
-TRAINER_URI = os.environ.get("GCP_MLENGINE_TRAINER_URI", "gs://INVALID BUCKET NAME/trainer.tar.gz")
-TRAINER_PY_MODULE = os.environ.get("GCP_MLENGINE_TRAINER_TRAINER_PY_MODULE", "trainer.task")
+SUMMARY_TMP = os.environ.get("GCP_MLENGINE_DATAFLOW_TMP", f"{BUCKET_PATH}/tmp/")
+SUMMARY_STAGING = os.environ.get("GCP_MLENGINE_DATAFLOW_STAGING", f"{BUCKET_PATH}/staging/")
 
-SUMMARY_TMP = os.environ.get("GCP_MLENGINE_DATAFLOW_TMP", "gs://INVALID BUCKET NAME/tmp/")
-SUMMARY_STAGING = os.environ.get("GCP_MLENGINE_DATAFLOW_STAGING", "gs://INVALID BUCKET NAME/staging/")
+
+def generate_model_predict_input_data() -> list[int]:
+    return [i for i in range(0, 201, 10)]
 
 
 with models.DAG(
-    "example_gcp_mlengine",
+    dag_id=DAG_ID,
+    schedule="@once",
     start_date=datetime(2021, 1, 1),
     catchup=False,
-    tags=['example'],
+    tags=['example', 'ml_engine'],
     params={"model_name": MODEL_NAME},
 ) as dag:
-    hyperparams: dict[str, Any] = {
-        'goal': 'MAXIMIZE',
-        'hyperparameterMetricTag': 'metric1',
-        'maxTrials': 30,
-        'maxParallelTrials': 1,
-        'enableTrialEarlyStopping': True,
-        'params': [],
-    }
-
-    hyperparams['params'].append(
-        {
-            'parameterName': 'hidden1',
-            'type': 'INTEGER',
-            'minValue': 40,
-            'maxValue': 400,
-            'scaleType': 'UNIT_LINEAR_SCALE',
-        }
+    create_bucket = GCSCreateBucketOperator(
+        task_id="create-bucket",
+        bucket_name=BUCKET_NAME,
     )
 
-    hyperparams['params'].append(
-        {'parameterName': 'numRnnCells', 'type': 'DISCRETE', 'discreteValues': [1, 2, 3, 4]}
+    def write_predict_file(path_to_file: str):
+        predict_data = generate_model_predict_input_data()
+        with open(path_to_file, 'w') as file:
+            for p in predict_data:
+                file.write(f'{{"input_layer": [{p}]}}\n')
+
+    write_data = PythonOperator(
+        task_id="write-predict-data-file",
+        python_callable=write_predict_file,
+        op_args=(PATH_TO_PREDICT_FILE,),
     )

Review Comment:
   The code is fine but with the introduction of ``@task`` it's recommended more: https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/python.html



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org