Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/07/26 13:29:38 UTC

[GitHub] [airflow] bharanidharan14 opened a new pull request, #25309: Add `op_classpath` in log

bharanidharan14 opened a new pull request, #25309:
URL: https://github.com/apache/airflow/pull/25309

   Adds the operator class path (`op_classpath`) to the task instance log line, in order to track where the operator/sensor originated from.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ashb closed pull request #25309: Add `op_classpath` in log

Posted by GitBox <gi...@apache.org>.
ashb closed pull request #25309: Add `op_classpath` in log
URL: https://github.com/apache/airflow/pull/25309




[GitHub] [airflow] ashb commented on pull request #25309: Add `op_classpath` in log

Posted by GitBox <gi...@apache.org>.
ashb commented on PR #25309:
URL: https://github.com/apache/airflow/pull/25309#issuecomment-1199268430

   Going to close this one since, as it is currently implemented, it is "expensive" for minimal general benefit.




[GitHub] [airflow] bharanidharan14 commented on a diff in pull request #25309: Add `op_classpath` in log

Posted by GitBox <gi...@apache.org>.
bharanidharan14 commented on code in PR #25309:
URL: https://github.com/apache/airflow/pull/25309#discussion_r930231470


##########
airflow/jobs/scheduler_job.py:
##########
@@ -636,8 +636,24 @@ def _process_executor_events(self, session: Session) -> int:
                 "run_start_date=%s, run_end_date=%s, "
                 "run_duration=%s, state=%s, executor_state=%s, try_number=%s, max_tries=%s, job_id=%s, "
                 "pool=%s, queue=%s, priority_weight=%d, operator=%s, queued_dttm=%s, "
-                "queued_by_job_id=%s, pid=%s"
+                "queued_by_job_id=%s, pid=%s, op_classpath=%s"
             )
+
+            # Adding op_class path for the task instance log in order to track where the
+            # operator/sensor originated from
+            # Get task from the Serialized DAG
+            try:
+                dag = self.dagbag.get_dag(ti.dag_id)

Review Comment:
   This part of the code was already there; I just moved it a few lines up and made use of it for the log.





[GitHub] [airflow] bharanidharan14 commented on a diff in pull request #25309: Add `op_classpath` in log

Posted by GitBox <gi...@apache.org>.
bharanidharan14 commented on code in PR #25309:
URL: https://github.com/apache/airflow/pull/25309#discussion_r930231470


##########
airflow/jobs/scheduler_job.py:
##########
@@ -636,8 +636,24 @@ def _process_executor_events(self, session: Session) -> int:
                 "run_start_date=%s, run_end_date=%s, "
                 "run_duration=%s, state=%s, executor_state=%s, try_number=%s, max_tries=%s, job_id=%s, "
                 "pool=%s, queue=%s, priority_weight=%d, operator=%s, queued_dttm=%s, "
-                "queued_by_job_id=%s, pid=%s"
+                "queued_by_job_id=%s, pid=%s, op_classpath=%s"
             )
+
+            # Adding op_class path for the task instance log in order to track where the
+            # operator/sensor originated from
+            # Get task from the Serialized DAG
+            try:
+                dag = self.dagbag.get_dag(ti.dag_id)

Review Comment:
   Actually, this part of the code was already there; I made use of it for the log.





[GitHub] [airflow] ashb commented on a diff in pull request #25309: Add `op_classpath` in log

Posted by GitBox <gi...@apache.org>.
ashb commented on code in PR #25309:
URL: https://github.com/apache/airflow/pull/25309#discussion_r930046477


##########
airflow/jobs/scheduler_job.py:
##########
@@ -636,8 +636,24 @@ def _process_executor_events(self, session: Session) -> int:
                 "run_start_date=%s, run_end_date=%s, "
                 "run_duration=%s, state=%s, executor_state=%s, try_number=%s, max_tries=%s, job_id=%s, "
                 "pool=%s, queue=%s, priority_weight=%d, operator=%s, queued_dttm=%s, "
-                "queued_by_job_id=%s, pid=%s"
+                "queued_by_job_id=%s, pid=%s, op_classpath=%s"
             )
+
+            # Adding op_class path for the task instance log in order to track where the
+            # operator/sensor originated from
+            # Get task from the Serialized DAG
+            try:
+                dag = self.dagbag.get_dag(ti.dag_id)

Review Comment:
   This is, comparatively, a _very_ expensive operation, so I am wary of adding the feature this way without knowing how much it would impact performance in deployments with many large DAGs and high scheduling throughput.





[GitHub] [airflow] steveyz-astro commented on a diff in pull request #25309: Add `op_classpath` in log

Posted by GitBox <gi...@apache.org>.
steveyz-astro commented on code in PR #25309:
URL: https://github.com/apache/airflow/pull/25309#discussion_r930336226


##########
airflow/jobs/scheduler_job.py:
##########
@@ -636,8 +636,24 @@ def _process_executor_events(self, session: Session) -> int:
                 "run_start_date=%s, run_end_date=%s, "
                 "run_duration=%s, state=%s, executor_state=%s, try_number=%s, max_tries=%s, job_id=%s, "
                 "pool=%s, queue=%s, priority_weight=%d, operator=%s, queued_dttm=%s, "
-                "queued_by_job_id=%s, pid=%s"
+                "queued_by_job_id=%s, pid=%s, op_classpath=%s"
             )
+
+            # Adding op_class path for the task instance log in order to track where the
+            # operator/sensor originated from
+            # Get task from the Serialized DAG
+            try:
+                dag = self.dagbag.get_dag(ti.dag_id)

Review Comment:
   Yes, but in its original placement the code would only be executed in a corner case, rather than every time.
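
   To illustrate the concern raised in this thread, here is a minimal sketch of the difference between looking up the DAG for every executor event and looking it up only in the rare corner case. It is not Airflow's implementation: `FakeDagBag`, `Event`, `needs_task`, and both `process_*` functions are hypothetical names made up for this example, and the lookup counter merely stands in for the cost of a real `get_dag` call.

```python
from dataclasses import dataclass


@dataclass
class FakeDagBag:
    """Hypothetical stand-in for a DAG store; counts how often it is queried."""

    lookups: int = 0

    def get_dag(self, dag_id: str) -> dict:
        # Pretend this is the expensive step (e.g. loading a serialized DAG).
        self.lookups += 1
        return {"dag_id": dag_id, "op_classpath": f"example.operators.{dag_id}"}


@dataclass
class Event:
    """Hypothetical executor event."""

    dag_id: str
    needs_task: bool  # True only in the rare corner case


def process_eagerly(events, dagbag):
    """Fetch the DAG for every event so the class path can always be logged."""
    for ev in events:
        dag = dagbag.get_dag(ev.dag_id)
        _ = dag["op_classpath"]  # value that would go into the log line


def process_lazily(events, dagbag):
    """Fetch the DAG only when the corner case actually needs the task."""
    for ev in events:
        if ev.needs_task:
            dagbag.get_dag(ev.dag_id)


# 1000 events, only the first of which hits the corner case.
events = [Event("dag_a", needs_task=(i == 0)) for i in range(1000)]
eager_bag, lazy_bag = FakeDagBag(), FakeDagBag()
process_eagerly(events, eager_bag)
process_lazily(events, lazy_bag)
print(eager_bag.lookups, lazy_bag.lookups)  # prints "1000 1"
```

   Under these assumptions, the eager variant pays the lookup cost once per event, while the lazy variant pays it only for the single corner-case event. How large the real cost is in a production scheduler would still need to be measured.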


