You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/21 12:59:49 UTC

[GitHub] [airflow] suhanovv opened a new pull request #17769: Fix dag_processing.last_duration metric random holes

suhanovv opened a new pull request #17769:
URL: https://github.com/apache/airflow/pull/17769


   Fixes for [17513](https://github.com/apache/airflow/issues/17513)
   
   Problem with metrics arises at the moment when between the intervals AIRFLOV__SHEDULER__PRINT_STATS_INTERVAL file processing ends and dag processor is removed from DagFileProcessorManager._processors and as a result the DagFileProcessorManager.get_start_time method returns Nona and no statistics are sent on the file, moreover, this metric does not display the total processing time of the file, it shows time from the beginning of processing file until the call to DagFileProcessorManager._log_file_processing_stats.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] xinbinhuang merged pull request #17769: Fix dag_processing.last_duration metric random holes

Posted by GitBox <gi...@apache.org>.
xinbinhuang merged pull request #17769:
URL: https://github.com/apache/airflow/pull/17769


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #17769: Fix dag_processing.last_duration metric random holes

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #17769:
URL: https://github.com/apache/airflow/pull/17769#issuecomment-903181242


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] xinbinhuang commented on pull request #17769: Fix dag_processing.last_duration metric random holes

Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on pull request #17769:
URL: https://github.com/apache/airflow/pull/17769#issuecomment-903921667


   I think the test is just flaky. Merging now 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] suhanovv edited a comment on pull request #17769: Fix dag_processing.last_duration metric random holes

Posted by GitBox <gi...@apache.org>.
suhanovv edited a comment on pull request #17769:
URL: https://github.com/apache/airflow/pull/17769#issuecomment-903859306


   @ashb, i did, but tests for mssql fail in random places. =(


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] xinbinhuang commented on a change in pull request #17769: Fix dag_processing.last_duration metric random holes

Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on a change in pull request #17769:
URL: https://github.com/apache/airflow/pull/17769#discussion_r693405701



##########
File path: tests/dag_processing/test_manager.py
##########
@@ -694,6 +695,48 @@ def fake_processor_(*args, **kwargs):
             child_pipe.close()
             thread.join(timeout=1.0)
 
+    @conf_vars({('core', 'load_examples'): 'False'})
+    @mock.patch('airflow.dag_processing.manager.Stats.timing')
+    def test_send_file_processing_statsd_timing(self, statsd_timing_mock, tmpdir):
+        # arrange
+        filename_to_parse = tmpdir / 'temp_dag.py'
+        # Generate dag
+        dag_code = dedent(
+            """
+        from airflow import DAG
+        dag = DAG(dag_id='temp_dag', schedule_interval='0 0 * * *')
+        """
+        )
+        with open(filename_to_parse, 'w') as file_to_parse:
+            file_to_parse.writelines(dag_code)
+
+        child_pipe, parent_pipe = multiprocessing.Pipe()
+
+        async_mode = 'sqlite' not in conf.get('core', 'sql_alchemy_conn')
+        manager = DagFileProcessorManager(
+            dag_directory=tmpdir,
+            max_runs=1,
+            processor_timeout=timedelta.max,
+            signal_conn=child_pipe,
+            dag_ids=[],
+            pickle_dags=False,
+            async_mode=async_mode,
+        )
+
+        # act

Review comment:
       ```suggestion
   ```

##########
File path: tests/dag_processing/test_manager.py
##########
@@ -694,6 +695,48 @@ def fake_processor_(*args, **kwargs):
             child_pipe.close()
             thread.join(timeout=1.0)
 
+    @conf_vars({('core', 'load_examples'): 'False'})
+    @mock.patch('airflow.dag_processing.manager.Stats.timing')
+    def test_send_file_processing_statsd_timing(self, statsd_timing_mock, tmpdir):
+        # arrange

Review comment:
       ```suggestion
   ```

##########
File path: tests/dag_processing/test_manager.py
##########
@@ -694,6 +695,48 @@ def fake_processor_(*args, **kwargs):
             child_pipe.close()
             thread.join(timeout=1.0)
 
+    @conf_vars({('core', 'load_examples'): 'False'})
+    @mock.patch('airflow.dag_processing.manager.Stats.timing')
+    def test_send_file_processing_statsd_timing(self, statsd_timing_mock, tmpdir):
+        # arrange
+        filename_to_parse = tmpdir / 'temp_dag.py'
+        # Generate dag

Review comment:
       ```suggestion
   ```

##########
File path: tests/dag_processing/test_manager.py
##########
@@ -694,6 +695,48 @@ def fake_processor_(*args, **kwargs):
             child_pipe.close()
             thread.join(timeout=1.0)
 
+    @conf_vars({('core', 'load_examples'): 'False'})
+    @mock.patch('airflow.dag_processing.manager.Stats.timing')
+    def test_send_file_processing_statsd_timing(self, statsd_timing_mock, tmpdir):
+        # arrange
+        filename_to_parse = tmpdir / 'temp_dag.py'
+        # Generate dag
+        dag_code = dedent(
+            """
+        from airflow import DAG
+        dag = DAG(dag_id='temp_dag', schedule_interval='0 0 * * *')
+        """
+        )
+        with open(filename_to_parse, 'w') as file_to_parse:
+            file_to_parse.writelines(dag_code)
+
+        child_pipe, parent_pipe = multiprocessing.Pipe()
+
+        async_mode = 'sqlite' not in conf.get('core', 'sql_alchemy_conn')
+        manager = DagFileProcessorManager(
+            dag_directory=tmpdir,
+            max_runs=1,
+            processor_timeout=timedelta.max,
+            signal_conn=child_pipe,
+            dag_ids=[],
+            pickle_dags=False,
+            async_mode=async_mode,
+        )
+
+        # act
+        with create_session():
+            self.run_processor_manager_one_loop(manager, parent_pipe)
+
+        child_pipe.close()
+        parent_pipe.close()
+
+        # assert
+        # we check that after processing the file and removing it from DagFileProcessorManager._processors,
+        # the statistics on the last processing was sent to the statsd

Review comment:
       ```suggestion
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] suhanovv commented on pull request #17769: Fix dag_processing.last_duration metric random holes

Posted by GitBox <gi...@apache.org>.
suhanovv commented on pull request #17769:
URL: https://github.com/apache/airflow/pull/17769#issuecomment-903859306


   @ashb, i did, but tests for mssql fail in random places.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] suhanovv commented on pull request #17769: Fix dag_processing.last_duration metric random holes

Posted by GitBox <gi...@apache.org>.
suhanovv commented on pull request #17769:
URL: https://github.com/apache/airflow/pull/17769#issuecomment-903592983


   > Is there a reason we don't emit the metric when each file is finished parsing instead? Tying metrics emission to how often we _print_ them in the logs seems slightly unexpected.
   
   I think that sending the metric `dag_processing.last_duration.{file_name}` right after the file has finished processing is much better and more obvious. And I would still leave the metric `dag_processing.last_run.seconds_ago.{file_name}` in the binding to _print_stat.
   
   if there is no reason why should not transfer the metric, then I will transfer the emission of the `dag_processing.last_duration.{file_name}` to the DagFileProcessorManager._collect_results_from_processor method, okay?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] xinbinhuang commented on pull request #17769: Fix dag_processing.last_duration metric random holes

Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on pull request #17769:
URL: https://github.com/apache/airflow/pull/17769#issuecomment-903183708


   @suhanovv tests are failing. Can you take a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] suhanovv commented on pull request #17769: Fix dag_processing.last_duration metric random holes

Posted by GitBox <gi...@apache.org>.
suhanovv commented on pull request #17769:
URL: https://github.com/apache/airflow/pull/17769#issuecomment-903283152


   @xinbinhuang
   
   I fixed tests, except mssql, it looks like there is a problem with the environment, I started the container locally
   
   `./breeze --github-image-id 80b6a627a2968c0a2e4d69ea0f970f3178b254dd --backend mssql --python 3.9 --db-reset --skip-mounting-local-sources --test-type Other` and tests worked in it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] suhanovv commented on pull request #17769: Fix dag_processing.last_duration metric random holes

Posted by GitBox <gi...@apache.org>.
suhanovv commented on pull request #17769:
URL: https://github.com/apache/airflow/pull/17769#issuecomment-903922435


   @xinbinhuang good, thnx


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] xinbinhuang edited a comment on pull request #17769: Fix dag_processing.last_duration metric random holes

Posted by GitBox <gi...@apache.org>.
xinbinhuang edited a comment on pull request #17769:
URL: https://github.com/apache/airflow/pull/17769#issuecomment-903183708


   @suhanovv tests are failing. Can you take a look? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] suhanovv commented on pull request #17769: Fix dag_processing.last_duration metric random holes

Posted by GitBox <gi...@apache.org>.
suhanovv commented on pull request #17769:
URL: https://github.com/apache/airflow/pull/17769#issuecomment-903227292


   > @suhanovv tests are failing. Can you take a look? Thanks!
   
   I'll see


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #17769: Fix dag_processing.last_duration metric random holes

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #17769:
URL: https://github.com/apache/airflow/pull/17769#issuecomment-903716491


   Yup, sounds good @suhanovv 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #17769: Fix dag_processing.last_duration metric random holes

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #17769:
URL: https://github.com/apache/airflow/pull/17769#issuecomment-903551658


   Is there a reason we don't emit the metric when each file is finished parsing instead? Tying metrics emission to how often we _print_ them in the logs seems slightly unexpected.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org