You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/15 14:43:02 UTC

[GitHub] [airflow] yesemsanthoshkumar opened a new pull request #10343: [WIP] Add extra links for google dataproc

yesemsanthoshkumar opened a new pull request #10343:
URL: https://github.com/apache/airflow/pull/10343


   Add extra links for google cloud operators - Dataproc
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-681830809


   @yesemsanthoshkumar can you please rebase? We started using black on providers package πŸ‘ 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-745517622


   [The Workflow run](https://github.com/apache/airflow/actions/runs/423979013) is cancelling this PR. Building image for the PR has been cancelled


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471320145



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -590,7 +681,7 @@ def test_execute(self, mock_hook, mock_uuid):
             query=self.query,
             variables=self.variables,
         )
-        op.execute(context={})
+        op.execute(context=MagicMock())

Review comment:
       I suppose you did this because of adding `xcom_push` in `execute`, right?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r470999708



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -453,6 +477,50 @@ def test_execute(self, mock_hook):
             job_id=job_id, project_id=GCP_PROJECT, location=GCP_LOCATION
         )
 
+    @provide_session
+    def test_operator_extra_links(self, session):
+        job = {}
+        job_id = 'test_job_id_12345'
+        execution_date = datetime(2020, 7, 20)
+
+        op = DataprocSubmitJobOperator(
+            task_id=TASK_ID,
+            location=GCP_LOCATION,
+            project_id=GCP_PROJECT,
+            job=job,
+            gcp_conn_id=GCP_CONN_ID,
+            cluster_name=CLUSTER_NAME
+        )
+        self.dag.clear()
+        session.query(XCom).deete()

Review comment:
       Deete=>delete




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yesemsanthoshkumar commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
yesemsanthoshkumar commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471618635



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -1704,6 +1743,8 @@ def execute(self, context: Dict):
         )
         self.log.info("Job completed successfully.")
 
+        context['task_instance'].xcom_push(key='job_id', value=job_id)

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] michalslowikowski00 commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
michalslowikowski00 commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471310137



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -453,6 +477,50 @@ def test_execute(self, mock_hook):
             job_id=job_id, project_id=GCP_PROJECT, location=GCP_LOCATION
         )
 
+    @provide_session
+    def test_operator_extra_links(self, session):
+        job = {}
+        job_id = 'test_job_id_12345'
+        execution_date = datetime(2020, 7, 20)
+
+        op = DataprocSubmitJobOperator(
+            task_id=TASK_ID,
+            location=GCP_LOCATION,
+            project_id=GCP_PROJECT,
+            job=job,
+            gcp_conn_id=GCP_CONN_ID,
+            cluster_name=CLUSTER_NAME
+        )
+        self.dag.clear()
+        session.query(XCom).deete()
+
+        ti = TaskInstance(
+            task=op,
+            execution_date=execution_date
+        )
+
+        self.assertEqual(
+            # pylint: disable=line-too-long
+            'https://console.cloud.google.com/dataproc/clusters/{cluster_name}/job?region={region}&project={project_id}'.format(    # noqa: E501
+                cluster_name=CLUSTER_NAME,
+                region=GCP_LOCATION,
+                project_id=GCP_PROJECT
+            ),
+            op.get_extra_links(execution_date, DataprocJobLink.name)
+        )
+
+        ti.xcom_push(key='job_id', value=job_id)
+
+        self.assertEqual(
+            # pylint: disable=line-too-long

Review comment:
       Same here with fstring. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yesemsanthoshkumar commented on a change in pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
yesemsanthoshkumar commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r479757412



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -1051,17 +1051,17 @@ def execute(self, context):
                 project_id=self.project_id, job=self.job["job"], location=self.region,
             )
             job_id = job_object.reference.job_id
+            # XCom push is referenced by extra links and has to be before polling for job completion

Review comment:
       Yeah. Clear than my comment. Will make changes.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yesemsanthoshkumar commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
yesemsanthoshkumar commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471627286



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -590,7 +681,7 @@ def test_execute(self, mock_hook, mock_uuid):
             query=self.query,
             variables=self.variables,
         )
-        op.execute(context={})
+        op.execute(context=MagicMock())

Review comment:
       Yes.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471316200



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -419,6 +428,24 @@ def test_execute(self, mock_hook):
 
 
 class TestDataprocSubmitJobOperator(unittest.TestCase):
+
+    def setUp(self):
+        self.dagbag = DagBag(
+            dag_folder='/dev/null', include_examples=False
+        )
+        self.dag = DAG(TEST_DAG_ID, default_args={
+            'owner': 'airflow',
+            'start_date': DEFAULT_DATE
+        })
+        self.mock_context = MagicMock()
+
+    def tearDown(self):
+        session = Session()
+        session.query(TaskInstance).filter_by(
+            dag_id=TEST_DAG_ID).delete()
+        session.commit()
+        session.close()

Review comment:
       Let's use `setUpClass` so we do this logic once




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r625310534



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -173,11 +218,11 @@ def __init__(
         properties: Optional[Dict] = None,
         optional_components: Optional[List[str]] = None,
         num_masters: int = 1,
-        master_machine_type: str = 'n1-standard-4',
-        master_disk_type: str = 'pd-standard',
+        master_machine_type: str = "n1-standard-4",
+        master_disk_type: str = "pd-standard",
         master_disk_size: int = 1024,
-        worker_machine_type: str = 'n1-standard-4',
-        worker_disk_type: str = 'pd-standard',
+        worker_machine_type: str = "n1-standard-4",
+        worker_disk_type: str = "pd-standard",

Review comment:
       Unrelated changes, please revert all quote changes as asked previously by @ashb 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #10343:
URL: https://github.com/apache/airflow/pull/10343


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yesemsanthoshkumar commented on pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
yesemsanthoshkumar commented on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-675511442


   > We should also add the console link to
   > https://github.com/apache/airflow/blob/558be73ae808b551a8ccf59a5c1896b5718f2f44/airflow/serialization/serialized_objects.py#L44-L49
   > 
   > And as mentioned in a comment - we should add the link in `DataprocSubmitJobOperator` as other ops are deprecated. So, we may even consider to add this **only** to this one operator πŸ˜‰
   
   @turbaszek I've added to those other operator as well already.
   :D


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yesemsanthoshkumar commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
yesemsanthoshkumar commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471624064



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -419,6 +428,24 @@ def test_execute(self, mock_hook):
 
 
 class TestDataprocSubmitJobOperator(unittest.TestCase):
+
+    def setUp(self):
+        self.dagbag = DagBag(
+            dag_folder='/dev/null', include_examples=False
+        )
+        self.dag = DAG(TEST_DAG_ID, default_args={
+            'owner': 'airflow',
+            'start_date': DEFAULT_DATE
+        })
+        self.mock_context = MagicMock()
+
+    def tearDown(self):
+        session = Session()
+        session.query(TaskInstance).filter_by(
+            dag_id=TEST_DAG_ID).delete()
+        session.commit()
+        session.close()

Review comment:
       Have made those changes. One question though, I've added the tests in DataProcSubmiSparkOperator(DataprocJobBaseOperator). Should I add the tests for other subclass operators as well?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471318268



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -419,6 +428,24 @@ def test_execute(self, mock_hook):
 
 
 class TestDataprocSubmitJobOperator(unittest.TestCase):
+
+    def setUp(self):
+        self.dagbag = DagBag(
+            dag_folder='/dev/null', include_examples=False
+        )
+        self.dag = DAG(TEST_DAG_ID, default_args={
+            'owner': 'airflow',
+            'start_date': DEFAULT_DATE
+        })
+        self.mock_context = MagicMock()
+
+    def tearDown(self):
+        session = Session()
+        session.query(TaskInstance).filter_by(
+            dag_id=TEST_DAG_ID).delete()
+        session.commit()
+        session.close()

Review comment:
       And we can use `tests.tests_utils.db.clear_db_runs` and `clear_db_xcom` as we push something to XCom




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471318268



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -419,6 +428,24 @@ def test_execute(self, mock_hook):
 
 
 class TestDataprocSubmitJobOperator(unittest.TestCase):
+
+    def setUp(self):
+        self.dagbag = DagBag(
+            dag_folder='/dev/null', include_examples=False
+        )
+        self.dag = DAG(TEST_DAG_ID, default_args={
+            'owner': 'airflow',
+            'start_date': DEFAULT_DATE
+        })
+        self.mock_context = MagicMock()
+
+    def tearDown(self):
+        session = Session()
+        session.query(TaskInstance).filter_by(
+            dag_id=TEST_DAG_ID).delete()
+        session.commit()
+        session.close()

Review comment:
       And we can use `tests.tests_utils.db.clear_db_runs` (and `clear_db_xcom` as we push something to XCom) instead duplicating the logic




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] michalslowikowski00 commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
michalslowikowski00 commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471309592



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -453,6 +477,50 @@ def test_execute(self, mock_hook):
             job_id=job_id, project_id=GCP_PROJECT, location=GCP_LOCATION
         )
 
+    @provide_session
+    def test_operator_extra_links(self, session):
+        job = {}
+        job_id = 'test_job_id_12345'
+        execution_date = datetime(2020, 7, 20)
+
+        op = DataprocSubmitJobOperator(
+            task_id=TASK_ID,
+            location=GCP_LOCATION,
+            project_id=GCP_PROJECT,
+            job=job,
+            gcp_conn_id=GCP_CONN_ID,
+            cluster_name=CLUSTER_NAME
+        )
+        self.dag.clear()
+        session.query(XCom).deete()
+
+        ti = TaskInstance(
+            task=op,
+            execution_date=execution_date
+        )
+
+        self.assertEqual(

Review comment:
       Little bit over engineering but maybe this way:
   Them `# pylint: disable=line-too-long` won't be needed anymore.
   ```
   
   URL = "https://console.cloud.google.com/dataproc/"
   cluster_name = CLUSTER_NAME
   region = GCP_LOCATION
   project_id = GCP_PROJECT
   
   
   self.assertEqual(
     # pylint: disable=line-too-long
     f'{URL}clusters/{cluster_name}/job?region={region}&project={project_id}'
     op.get_extra_links(execution_date, DataprocJobLink.name)
   )
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek edited a comment on pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
turbaszek edited a comment on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-697997022


   @ashb why do rebase now? We should wait with this till 2.0 otherwise we will break serialization one more time or we will need to remove the links when preparing backport packages. Reference:
   https://github.com/apache/airflow/issues/10171
   https://github.com/apache/airflow/pull/10318
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r479943237



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -41,13 +41,63 @@
 from google.protobuf.json_format import MessageToDict
 
 from airflow.exceptions import AirflowException
-from airflow.models import BaseOperator
-from airflow.providers.google.cloud.hooks.dataproc import DataprocHook, DataProcJobBuilder
+from airflow.models import BaseOperator, BaseOperatorLink
+from airflow.models.taskinstance import TaskInstance
+from airflow.providers.google.cloud.hooks.dataproc import (
+    DataprocHook,
+    DataProcJobBuilder,
+)
 from airflow.providers.google.cloud.hooks.gcs import GCSHook
 from airflow.utils import timezone
 from airflow.utils.decorators import apply_defaults
 from airflow.version import version as airflow_version
 
+DATAPROC_BASE_LINK = "https://console.cloud.google.com/dataproc"
+DATAPROC_JOB_LOG_LINK = DATAPROC_BASE_LINK + "/jobs/{job_id}?region={region}&project={project_id}"
+DATAPROC_CLUSTER_LINK = (
+    DATAPROC_BASE_LINK + "/clusters/{cluster_name}/monitoring?" "region={region}&project={project_id}"

Review comment:
       ```suggestion
       DATAPROC_BASE_LINK + "/clusters/{cluster_name}/monitoring?region={region}&project={project_id}"
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] michalslowikowski00 commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
michalslowikowski00 commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471302080



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -39,13 +39,41 @@
 from google.protobuf.json_format import MessageToDict
 
 from airflow.exceptions import AirflowException
-from airflow.models import BaseOperator
+from airflow.models import BaseOperator, BaseOperatorLink
+from airflow.models.taskinstance import TaskInstance
 from airflow.providers.google.cloud.hooks.dataproc import DataprocHook, DataProcJobBuilder
 from airflow.providers.google.cloud.hooks.gcs import GCSHook
 from airflow.utils import timezone
 from airflow.utils.decorators import apply_defaults
 from airflow.version import version as airflow_version
 
+# pylint: disable=line-too-long
+DATAPROC_JOB_LOG_LINK = "https://console.cloud.google.com/dataproc/jobs/{job_id}?region={region}&project={project_id}"      # noqa: E501

Review comment:
       Maybe this way?
   ```suggestion
   DATAPROC_JOB_LOG_LINK = \
     "https://console.cloud.google.com/dataproc/jobs/{job_id}?region={region}&project={project_id}" 
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-698002943


   Oh sorry, I misread the conversation in this issue!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r479758381



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -627,6 +681,11 @@ def execute(self, context):
             cluster = self._create_cluster(hook)
             self._handle_error_state(hook, cluster)
 
+        self.xcom_push(
+            context,
+            key="cluster_conf",
+            value={"cluster_name": self.cluster_name, "region": self.region, "project_id": self.project_id,},
+        )

Review comment:
       Awesome, let me know once I can re-run DAG to check if the links work as expected




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yesemsanthoshkumar commented on a change in pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
yesemsanthoshkumar commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r479820695



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -627,6 +681,11 @@ def execute(self, context):
             cluster = self._create_cluster(hook)
             self._handle_error_state(hook, cluster)
 
+        self.xcom_push(
+            context,
+            key="cluster_conf",
+            value={"cluster_name": self.cluster_name, "region": self.region, "project_id": self.project_id,},
+        )

Review comment:
       @turbaszek Updated. Please take a look.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471766806



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -419,6 +428,24 @@ def test_execute(self, mock_hook):
 
 
 class TestDataprocSubmitJobOperator(unittest.TestCase):
+
+    def setUp(self):
+        self.dagbag = DagBag(
+            dag_folder='/dev/null', include_examples=False
+        )
+        self.dag = DAG(TEST_DAG_ID, default_args={
+            'owner': 'airflow',
+            'start_date': DEFAULT_DATE
+        })
+        self.mock_context = MagicMock()
+
+    def tearDown(self):
+        session = Session()
+        session.query(TaskInstance).filter_by(
+            dag_id=TEST_DAG_ID).delete()
+        session.commit()
+        session.close()

Review comment:
       Those operators are depecated so it's ok if we test with only one. But please add tests for `DataprocSubmitJobOperator`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-739777974


   @yesemsanthoshkumar  -> can you please rebase so that we could merge it before 2.0.0 rc1 ?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yesemsanthoshkumar commented on pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
yesemsanthoshkumar commented on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-683651990


   @turbaszek Should I rebase now before merging?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yesemsanthoshkumar commented on a change in pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
yesemsanthoshkumar commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r472246752



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -590,7 +681,7 @@ def test_execute(self, mock_hook, mock_uuid):
             query=self.query,
             variables=self.variables,
         )
-        op.execute(context={})
+        op.execute(context=MagicMock())

Review comment:
       Have added a mock task instance for testing the xcom push. This change is regarding the operator extra links. Please take a look at the updated tests.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yesemsanthoshkumar commented on a change in pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
yesemsanthoshkumar commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r479562688



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -434,7 +591,10 @@ def test_execute(self, mock_hook):
             request_id=REQUEST_ID,
             impersonation_chain=IMPERSONATION_CHAIN,
         )
-        op.execute(context={})
+        op.execute(context=self.mock_context)
+
+        # Test whether xcom push occurs before polling for job
+        self.extra_links_manager_mock.assert_has_calls(self.extra_links_expected_calls, any_order=False)

Review comment:
       @turbaszek Have added checks for xcom_push before wait_for_job too. Please take a look.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r479622301



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -627,6 +681,11 @@ def execute(self, context):
             cluster = self._create_cluster(hook)
             self._handle_error_state(hook, cluster)
 
+        self.xcom_push(
+            context,
+            key="cluster_conf",
+            value={"cluster_name": self.cluster_name, "region": self.region, "project_id": self.project_id,},
+        )

Review comment:
       Let's move it to right after hook invocation. In this way, there will be cluster link no matter what is the state of the cluster. Only problem I see is when deleting cluster on error state but this is an edge case




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-683681753


   @yesemsanthoshkumar no need to rebase. However, we will need to wait with the merge until #10014 is done. That's because 1.10.2 and earlier versions will not support those new extra links. So this PR should be targeted to Airflow 2.0


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-674406582


   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, pylint and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/master/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/master/docs/howto/custom-operator.rst) Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/master/BREEZE.rst) for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better πŸš€.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://apache-airflow-slack.herokuapp.com/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yesemsanthoshkumar commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
yesemsanthoshkumar commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471616705



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -453,6 +477,50 @@ def test_execute(self, mock_hook):
             job_id=job_id, project_id=GCP_PROJECT, location=GCP_LOCATION
         )
 
+    @provide_session
+    def test_operator_extra_links(self, session):
+        job = {}
+        job_id = 'test_job_id_12345'
+        execution_date = datetime(2020, 7, 20)
+
+        op = DataprocSubmitJobOperator(
+            task_id=TASK_ID,
+            location=GCP_LOCATION,
+            project_id=GCP_PROJECT,
+            job=job,
+            gcp_conn_id=GCP_CONN_ID,
+            cluster_name=CLUSTER_NAME
+        )
+        self.dag.clear()
+        session.query(XCom).deete()
+
+        ti = TaskInstance(
+            task=op,
+            execution_date=execution_date
+        )
+
+        self.assertEqual(

Review comment:
       Have moved the expected link to a global variable. Let me know if it needs further changes.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yesemsanthoshkumar commented on pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
yesemsanthoshkumar commented on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-747598371


   @turbaszek Updated the PR. Please take a look and let me know of any further changes required.
   
   @ashb I guess the quote changes(' -> ") you mentioned are the result of applying black to those files. Or are you referring to some other changes?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] michalslowikowski00 commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
michalslowikowski00 commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471309592



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -453,6 +477,50 @@ def test_execute(self, mock_hook):
             job_id=job_id, project_id=GCP_PROJECT, location=GCP_LOCATION
         )
 
+    @provide_session
+    def test_operator_extra_links(self, session):
+        job = {}
+        job_id = 'test_job_id_12345'
+        execution_date = datetime(2020, 7, 20)
+
+        op = DataprocSubmitJobOperator(
+            task_id=TASK_ID,
+            location=GCP_LOCATION,
+            project_id=GCP_PROJECT,
+            job=job,
+            gcp_conn_id=GCP_CONN_ID,
+            cluster_name=CLUSTER_NAME
+        )
+        self.dag.clear()
+        session.query(XCom).deete()
+
+        ti = TaskInstance(
+            task=op,
+            execution_date=execution_date
+        )
+
+        self.assertEqual(

Review comment:
       Little bit over engineering but maybe this way:
   Them `# pylint: disable=line-too-long` won't be needed anymore.
   
   `URL = "https://console.cloud.google.com/dataproc/"
   cluster_name = CLUSTER_NAME
   region = GCP_LOCATION
   project_id = GCP_PROJECT
   
   
   self.assertEqual(
     # pylint: disable=line-too-long
     f'{URL}clusters/{cluster_name}/job?region={region}&project={project_id}'
     op.get_extra_links(execution_date, DataprocJobLink.name)
   )`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471320990



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -763,6 +854,23 @@ class TestDataProcSparkOperator(unittest.TestCase):
         "spark_job": {"jar_file_uris": jars, "main_class": main_class},
     }
 
+    def setUp(self):
+        self.dagbag = DagBag(
+            dag_folder='/dev/null', include_examples=False
+        )
+        self.dag = DAG(TEST_DAG_ID, default_args={
+            'owner': 'airflow',
+            'start_date': DEFAULT_DATE
+        })
+        self.mock_context = MagicMock()
+
+    def tearDown(self):
+        session = Session()
+        session.query(TaskInstance).filter_by(
+            dag_id=TEST_DAG_ID).delete()
+        session.commit()
+        session.close()

Review comment:
       Same here, we can use existing methods from `tests.test_utils.db`. And here we also should clear XCom to avoid side effect that is created on L953




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-681832293


   Nice to see this as part of #10014 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-741540445


   REmove it from 2.0rc1 for now


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yesemsanthoshkumar commented on a change in pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
yesemsanthoshkumar commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r479757447



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -627,6 +681,11 @@ def execute(self, context):
             cluster = self._create_cluster(hook)
             self._handle_error_state(hook, cluster)
 
+        self.xcom_push(
+            context,
+            key="cluster_conf",
+            value={"cluster_name": self.cluster_name, "region": self.region, "project_id": self.project_id,},
+        )

Review comment:
       Similar flow with the jobs. Sure. Will update.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-745965878


   @yesemsanthoshkumar could you take a look the CI issues? Some static checks seem to be failing.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471317265



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -453,6 +480,70 @@ def test_execute(self, mock_hook):
             job_id=job_id, project_id=GCP_PROJECT, location=GCP_LOCATION
         )
 
+    @provide_session
+    def test_operator_extra_links(self, session):
+        job = {}
+        job_id = 'test_job_id_12345'
+        execution_date = datetime(2020, 7, 20)
+        # pylint: disable=line-too-long
+        expected_extra_link = 'https://console.cloud.google.com/dataproc/jobs/{job_id}?region={region}&project={project_id}'.format(     # noqa: E501
+            job_id=job_id,
+            region=GCP_LOCATION,
+            project_id=GCP_PROJECT
+        )

Review comment:
       ```suggestion
           expected_extra_link = \
               f'https://console.cloud.google.com/dataproc/jobs/{job_id}?'
               f'region={GCP_LOCATION}&project={GCP_PROJECT}'
   ```
   How about using f-string?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek edited a comment on pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
turbaszek edited a comment on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-683681753


   @yesemsanthoshkumar no need to rebase. However, we will need to wait with the merge until #10014 is done. That's because 1.10.12 and earlier versions will not support those new extra links. So this PR should be targeted to Airflow 2.0


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-834203888


   Awesome work, congrats on your first merged pull request!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-697997022


   @ashb why do rebase now? We should wait with this till 2.0 otherwise we will break serialization one more time. Reference:
   https://github.com/apache/airflow/issues/10171
   https://github.com/apache/airflow/pull/10318
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] michalslowikowski00 commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
michalslowikowski00 commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471309592



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -453,6 +477,50 @@ def test_execute(self, mock_hook):
             job_id=job_id, project_id=GCP_PROJECT, location=GCP_LOCATION
         )
 
+    @provide_session
+    def test_operator_extra_links(self, session):
+        job = {}
+        job_id = 'test_job_id_12345'
+        execution_date = datetime(2020, 7, 20)
+
+        op = DataprocSubmitJobOperator(
+            task_id=TASK_ID,
+            location=GCP_LOCATION,
+            project_id=GCP_PROJECT,
+            job=job,
+            gcp_conn_id=GCP_CONN_ID,
+            cluster_name=CLUSTER_NAME
+        )
+        self.dag.clear()
+        session.query(XCom).deete()
+
+        ti = TaskInstance(
+            task=op,
+            execution_date=execution_date
+        )
+
+        self.assertEqual(

Review comment:
       Little bit over engineering but maybe this way:
   Them `# pylint: disable=line-too-long` won't be needed anymore.
   ```
   
   URL = "https://console.cloud.google.com/dataproc/"
   cluster_name = CLUSTER_NAME
   region = GCP_LOCATION
   project_id = GCP_PROJECT
   
   
   self.assertEqual(
     f'{URL}clusters/{cluster_name}/job?region={region}&project={project_id}'
     op.get_extra_links(execution_date, DataprocJobLink.name)
   )
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yesemsanthoshkumar commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
yesemsanthoshkumar commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471780005



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -419,6 +428,24 @@ def test_execute(self, mock_hook):
 
 
 class TestDataprocSubmitJobOperator(unittest.TestCase):
+
+    def setUp(self):
+        self.dagbag = DagBag(
+            dag_folder='/dev/null', include_examples=False
+        )
+        self.dag = DAG(TEST_DAG_ID, default_args={
+            'owner': 'airflow',
+            'start_date': DEFAULT_DATE
+        })
+        self.mock_context = MagicMock()
+
+    def tearDown(self):
+        session = Session()
+        session.query(TaskInstance).filter_by(
+            dag_id=TEST_DAG_ID).delete()
+        session.commit()
+        session.close()

Review comment:
       Added already.
   https://github.com/apache/airflow/pull/10343/files#diff-68c5fc800ea76a624d56445376a5744fR485




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yesemsanthoshkumar commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
yesemsanthoshkumar commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471617008



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -453,6 +477,50 @@ def test_execute(self, mock_hook):
             job_id=job_id, project_id=GCP_PROJECT, location=GCP_LOCATION
         )
 
+    @provide_session
+    def test_operator_extra_links(self, session):
+        job = {}
+        job_id = 'test_job_id_12345'
+        execution_date = datetime(2020, 7, 20)
+
+        op = DataprocSubmitJobOperator(
+            task_id=TASK_ID,
+            location=GCP_LOCATION,
+            project_id=GCP_PROJECT,
+            job=job,
+            gcp_conn_id=GCP_CONN_ID,
+            cluster_name=CLUSTER_NAME
+        )
+        self.dag.clear()
+        session.query(XCom).deete()
+
+        ti = TaskInstance(
+            task=op,
+            execution_date=execution_date
+        )
+
+        self.assertEqual(
+            # pylint: disable=line-too-long
+            'https://console.cloud.google.com/dataproc/clusters/{cluster_name}/job?region={region}&project={project_id}'.format(    # noqa: E501
+                cluster_name=CLUSTER_NAME,
+                region=GCP_LOCATION,
+                project_id=GCP_PROJECT
+            ),
+            op.get_extra_links(execution_date, DataprocJobLink.name)
+        )
+
+        ti.xcom_push(key='job_id', value=job_id)
+
+        self.assertEqual(
+            # pylint: disable=line-too-long

Review comment:
       Same as above




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yesemsanthoshkumar commented on pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
yesemsanthoshkumar commented on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-675516198


   @mik-laj
   > 2. I'm directing the link to the cluster if the job hasn't run and not generated any JobId. Is this approach valid? Do you want me to change this behaviour?
   
   The link is not being redirected to the cluster URL if there is no detail of the job in the system.
   
   > 3. I coudn't find the cluster_name argument in [DataprocSubmitJobOperator](https://airflow.readthedocs.io/en/latest/_api/airflow/providers/google/cloud/operators/dataproc/index.html#airflow.providers.google.cloud.operators.dataproc.DataprocSubmitJobOperator) for which I've added the links. Shoud I add the links to [DataprocJobBaseOperator](https://airflow.readthedocs.io/en/latest/_api/airflow/providers/google/cloud/operators/dataproc/index.html#airflow.providers.google.cloud.operators.dataproc.DataprocJobBaseOperator)? Or Should I add on both?
   
   Implemented on both the operators. For subclasses of DataprocJobBaseOperators, I've added tests only for DataprocSubmitSparkJob operator as the other operators are anyway deprecated. Please have a look at the above comments by @turbaszek regarding this.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-834203191


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] michalslowikowski00 commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
michalslowikowski00 commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471980799



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -453,6 +477,50 @@ def test_execute(self, mock_hook):
             job_id=job_id, project_id=GCP_PROJECT, location=GCP_LOCATION
         )
 
+    @provide_session
+    def test_operator_extra_links(self, session):
+        job = {}
+        job_id = 'test_job_id_12345'
+        execution_date = datetime(2020, 7, 20)
+
+        op = DataprocSubmitJobOperator(
+            task_id=TASK_ID,
+            location=GCP_LOCATION,
+            project_id=GCP_PROJECT,
+            job=job,
+            gcp_conn_id=GCP_CONN_ID,
+            cluster_name=CLUSTER_NAME
+        )
+        self.dag.clear()
+        session.query(XCom).deete()
+
+        ti = TaskInstance(
+            task=op,
+            execution_date=execution_date
+        )
+
+        self.assertEqual(

Review comment:
       Great. :)

##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -453,6 +477,50 @@ def test_execute(self, mock_hook):
             job_id=job_id, project_id=GCP_PROJECT, location=GCP_LOCATION
         )
 
+    @provide_session
+    def test_operator_extra_links(self, session):
+        job = {}
+        job_id = 'test_job_id_12345'
+        execution_date = datetime(2020, 7, 20)
+
+        op = DataprocSubmitJobOperator(
+            task_id=TASK_ID,
+            location=GCP_LOCATION,
+            project_id=GCP_PROJECT,
+            job=job,
+            gcp_conn_id=GCP_CONN_ID,
+            cluster_name=CLUSTER_NAME
+        )
+        self.dag.clear()
+        session.query(XCom).deete()
+
+        ti = TaskInstance(
+            task=op,
+            execution_date=execution_date
+        )
+
+        self.assertEqual(
+            # pylint: disable=line-too-long
+            'https://console.cloud.google.com/dataproc/clusters/{cluster_name}/job?region={region}&project={project_id}'.format(    # noqa: E501
+                cluster_name=CLUSTER_NAME,
+                region=GCP_LOCATION,
+                project_id=GCP_PROJECT
+            ),
+            op.get_extra_links(execution_date, DataprocJobLink.name)
+        )
+
+        ti.xcom_push(key='job_id', value=job_id)
+
+        self.assertEqual(
+            # pylint: disable=line-too-long

Review comment:
       Cool




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r479621704



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -1051,17 +1051,17 @@ def execute(self, context):
                 project_id=self.project_id, job=self.job["job"], location=self.region,
             )
             job_id = job_object.reference.job_id
+            # XCom push is referenced by extra links and has to be before polling for job completion

Review comment:
       ```suggestion
               # Save data required to display extra link no matter what job status will be
   ```
   WDYT?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yesemsanthoshkumar commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
yesemsanthoshkumar commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471626971



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -453,6 +480,70 @@ def test_execute(self, mock_hook):
             job_id=job_id, project_id=GCP_PROJECT, location=GCP_LOCATION
         )
 
+    @provide_session
+    def test_operator_extra_links(self, session):
+        job = {}
+        job_id = 'test_job_id_12345'
+        execution_date = datetime(2020, 7, 20)
+        # pylint: disable=line-too-long
+        expected_extra_link = 'https://console.cloud.google.com/dataproc/jobs/{job_id}?region={region}&project={project_id}'.format(     # noqa: E501
+            job_id=job_id,
+            region=GCP_LOCATION,
+            project_id=GCP_PROJECT
+        )

Review comment:
       Moved the expected link to a variable. Also converted it into f-string. Let me know if there is a better approach.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yesemsanthoshkumar commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
yesemsanthoshkumar commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471629234



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -763,6 +854,23 @@ class TestDataProcSparkOperator(unittest.TestCase):
         "spark_job": {"jar_file_uris": jars, "main_class": main_class},
     }
 
+    def setUp(self):
+        self.dagbag = DagBag(
+            dag_folder='/dev/null', include_examples=False
+        )
+        self.dag = DAG(TEST_DAG_ID, default_args={
+            'owner': 'airflow',
+            'start_date': DEFAULT_DATE
+        })
+        self.mock_context = MagicMock()
+
+    def tearDown(self):
+        session = Session()
+        session.query(TaskInstance).filter_by(
+            dag_id=TEST_DAG_ID).delete()
+        session.commit()
+        session.close()

Review comment:
       Done using test_utils




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-744016937


   [The Workflow run](https://github.com/apache/airflow/actions/runs/418966296) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-834204142


   Wooho! :tada: -> took some time :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek edited a comment on pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
turbaszek edited a comment on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-683681753






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yesemsanthoshkumar commented on pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
yesemsanthoshkumar commented on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-674410120


   @mik-laj 
   I've added extra link for dataproc jobs.
   
   However, I've the following questions.
   
   1. While running the pre-commit hooks, I see both pylint and fake8 running checks. I've marked both of them to disable line too long error for some lines inside the files. Why do we have both of them?
   2. I'm directing the link to the cluster if the job hasn't run and not generated any JobId. Is this approach valid? Do you want me to change this behaviour?
   3. I coudn't find the cluster_name argument in [DataprocSubmitJobOperator](https://airflow.readthedocs.io/en/latest/_api/airflow/providers/google/cloud/operators/dataproc/index.html#airflow.providers.google.cloud.operators.dataproc.DataprocSubmitJobOperator) for which I've added the links. Shoud I add the links to [DataprocJobBaseOperator](https://airflow.readthedocs.io/en/latest/_api/airflow/providers/google/cloud/operators/dataproc/index.html#airflow.providers.google.cloud.operators.dataproc.DataprocJobBaseOperator)? Or Should I add on both?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471315343



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -1704,6 +1743,8 @@ def execute(self, context: Dict):
         )
         self.log.info("Job completed successfully.")
 
+        context['task_instance'].xcom_push(key='job_id', value=job_id)

Review comment:
       ```suggestion
           self.xcom_push(key='job_id', value=job_id)
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-683610869


   Tested both links. This rocks! Thanks @yesemsanthoshkumar πŸš€ 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471315343



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -1704,6 +1743,8 @@ def execute(self, context: Dict):
         )
         self.log.info("Job completed successfully.")
 
+        context['task_instance'].xcom_push(key='job_id', value=job_id)

Review comment:
       ```suggestion
           self.xcom_push(context, key='job_id', value=job_id)
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #10343: Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#issuecomment-745517275


   [The Workflow run](https://github.com/apache/airflow/actions/runs/424011369) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yesemsanthoshkumar commented on a change in pull request #10343: [WIP] Add extra links for google dataproc

Posted by GitBox <gi...@apache.org>.
yesemsanthoshkumar commented on a change in pull request #10343:
URL: https://github.com/apache/airflow/pull/10343#discussion_r471615299



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -39,13 +39,41 @@
 from google.protobuf.json_format import MessageToDict
 
 from airflow.exceptions import AirflowException
-from airflow.models import BaseOperator
+from airflow.models import BaseOperator, BaseOperatorLink
+from airflow.models.taskinstance import TaskInstance
 from airflow.providers.google.cloud.hooks.dataproc import DataprocHook, DataProcJobBuilder
 from airflow.providers.google.cloud.hooks.gcs import GCSHook
 from airflow.utils import timezone
 from airflow.utils.decorators import apply_defaults
 from airflow.version import version as airflow_version
 
+# pylint: disable=line-too-long
+DATAPROC_JOB_LOG_LINK = "https://console.cloud.google.com/dataproc/jobs/{job_id}?region={region}&project={project_id}"      # noqa: E501

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org