You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/03/14 23:41:26 UTC

[GitHub] [airflow] ferruzzi commented on a change in pull request #21863: Add Quicksight create ingestion Hook and Operator

ferruzzi commented on a change in pull request #21863:
URL: https://github.com/apache/airflow/pull/21863#discussion_r826462628



##########
File path: airflow/providers/amazon/aws/sensors/quicksight.py
##########
@@ -0,0 +1,80 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from typing import TYPE_CHECKING, Optional
+
+from airflow.exceptions import AirflowException
+from airflow.providers.amazon.aws.hooks.quicksight import QuickSightHook
+from airflow.sensors.base import BaseSensorOperator
+
+if TYPE_CHECKING:
+    from airflow.utils.context import Context
+
+
+class QuickSightSensor(BaseSensorOperator):
+    """
+    Watches for the status of a QuickSight Ingestion.

Review comment:
       ```suggestion
       Watches for the status of an Amazon QuickSight Ingestion.
   ```

##########
File path: airflow/providers/amazon/aws/example_dags/example_quicksight.py
##########
@@ -0,0 +1,67 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from datetime import datetime
+
+from airflow import DAG
+from airflow.providers.amazon.aws.operators.quicksight import QuickSightCreateIngestionOperator
+from airflow.providers.amazon.aws.sensors.quicksight import QuickSightSensor
+
+DATA_SET_ID = "DemoDataSet_Test"
+INGESTION_WAITING_ID = "DemoDataSet_Ingestion_Waiting_Test"
+INGESTION_NO_WAITING_ID = "DemoDataSet_Ingestion_No_Waiting_Test"
+AWS_ACCOUNT_ID = "123456789012"
+
+with DAG(
+    "sample_quicksight_dag",
+    schedule_interval=None,
+    start_date=datetime(2022, 2, 21),
+    catchup=False,

Review comment:
       ```suggestion
       dag_id='example_quicksight',
       schedule_interval=None,
       start_date=datetime(2021, 1, 1),
       tags=['example'],
       catchup=False,
   ```
   
   Nitpick:  We're working on organizing and standardizing the AWS examples and docs.  If you don't mind applying these changes to match what we're doing, I'd appreciate it.

##########
File path: airflow/providers/amazon/aws/operators/quicksight.py
##########
@@ -0,0 +1,100 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from typing import TYPE_CHECKING, Optional, Sequence
+
+from airflow.models import BaseOperator
+from airflow.providers.amazon.aws.hooks.quicksight import QuickSightHook
+
+if TYPE_CHECKING:
+    from airflow.utils.context import Context
+
+DEFAULT_CONN_ID = "aws_default"
+
+
+class QuickSightCreateIngestionOperator(BaseOperator):
+    """
+    Creates and starts a new SPICE ingestion for a dataset.
+    Also, helps to Refresh existing SPICE datasets
+
+    :param data_set_id:  ID of the dataset used in the ingestion.
+    :param ingestion_id: ID for the ingestion.
+    :param aws_account_id: Amazon Web Services account ID.
+    :param ingestion_type: Type of ingestion. Values Can be  INCREMENTAL_REFRESH or FULL_REFRESH.
+        Default FULL_REFRESH.
+    :param wait_for_completion: If wait is set to True, the time interval, in seconds,
+        that the operation waits to check the status of the QuickSight Ingestion.

Review comment:
       ```suggestion
           that the operation waits to check the status of the Amazon QuickSight Ingestion.
   ```
   
   Nitpicking, if you don't mind.

##########
File path: airflow/providers/amazon/aws/sensors/quicksight.py
##########
@@ -0,0 +1,80 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from typing import TYPE_CHECKING, Optional
+
+from airflow.exceptions import AirflowException
+from airflow.providers.amazon.aws.hooks.quicksight import QuickSightHook
+from airflow.sensors.base import BaseSensorOperator
+
+if TYPE_CHECKING:
+    from airflow.utils.context import Context
+
+
+class QuickSightSensor(BaseSensorOperator):
+    """
+    Watches for the status of a QuickSight Ingestion.
+
+    :param aws_account_id: An AWS Account ID
+    :param data_set_id: QuickSight Data Set ID
+    :param ingestion_id: QuickSight Ingestion ID
+    :param aws_conn_id: aws connection to use, defaults to "aws_default"
+    """
+
+    def __init__(
+        self,
+        *,
+        aws_account_id: str,
+        data_set_id: str,
+        ingestion_id: str,
+        aws_conn_id: str = "aws_default",
+        **kwargs,
+    ) -> None:
+        super().__init__(**kwargs)
+        self.aws_account_id = aws_account_id
+        self.data_set_id = data_set_id
+        self.aws_account_id = aws_account_id
+        self.ingestion_id = ingestion_id
+        self.aws_conn_id = aws_conn_id
+        self.success_status = "COMPLETED"
+        self.errored_statuses = ("FAILED", "CANCELLED")
+        self.hook: Optional[QuickSightHook] = None
+
+    def poke(self, context: "Context"):
+        """
+        Pokes until the QuickSight Ingestion has successfully finished.
+
+        :param context: The task context during execution.
+        :return: True if it COMPLETED and False if not.
+        :rtype: bool
+        """
+        hook = self.get_hook()
+        self.log.info("Poking for Amazon QuickSight Ingestion ID: %s", self.ingestion_id)
+        quicksight_ingestion_state = hook.get_status(self.aws_account_id, self.data_set_id, self.ingestion_id)
+        self.log.info("QuickSight Status: %s", quicksight_ingestion_state)
+        if quicksight_ingestion_state in self.errored_statuses:
+            raise AirflowException("The QuickSight Ingestion failed!")

Review comment:
       ```suggestion
               raise AirflowException("The Amazon QuickSight Ingestion failed!")
   ```
   
   Nitpick for consistency.

##########
File path: airflow/providers/amazon/aws/operators/quicksight.py
##########
@@ -0,0 +1,100 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from typing import TYPE_CHECKING, Optional, Sequence
+
+from airflow.models import BaseOperator
+from airflow.providers.amazon.aws.hooks.quicksight import QuickSightHook
+
+if TYPE_CHECKING:
+    from airflow.utils.context import Context
+
+DEFAULT_CONN_ID = "aws_default"
+
+
+class QuickSightCreateIngestionOperator(BaseOperator):
+    """
+    Creates and starts a new SPICE ingestion for a dataset.
+    Also, helps to Refresh existing SPICE datasets
+
+    :param data_set_id:  ID of the dataset used in the ingestion.
+    :param ingestion_id: ID for the ingestion.
+    :param aws_account_id: Amazon Web Services account ID.
+    :param ingestion_type: Type of ingestion. Values Can be  INCREMENTAL_REFRESH or FULL_REFRESH.
+        Default FULL_REFRESH.
+    :param wait_for_completion: If wait is set to True, the time interval, in seconds,
+        that the operation waits to check the status of the QuickSight Ingestion.
+    :param check_interval: if wait is set to be true, this is the time interval
+        in seconds which the operator will check the status of the QuickSight Ingestion
+    :param aws_conn_id: The Airflow connection used for AWS credentials. (templated)
+         If this is None or empty then the default boto3 behaviour is used. If
+         running Airflow in a distributed manner and aws_conn_id is None or
+         empty, then the default boto3 configuration would be used (and must be
+         maintained on each worker node).
+    :param region: Which AWS region the connection should use. (templated)
+         If this is None or empty then the default boto3 behaviour is used.
+    """
+
+    template_fields: Sequence[str] = (
+        "data_set_id",
+        "ingestion_id",
+        "aws_account_id",
+        "ingestion_type",
+        "wait_for_completion",
+        "check_interval",
+        "aws_conn_id",
+        "region",
+    )
+    ui_color = "#ffd700"
+
+    def __init__(
+        self,
+        data_set_id: str,
+        ingestion_id: str,
+        aws_account_id: str,
+        ingestion_type: str = "FULL_REFRESH",
+        wait_for_completion: bool = True,
+        check_interval: int = 30,
+        aws_conn_id: str = DEFAULT_CONN_ID,
+        region: Optional[str] = None,
+        **kwargs,
+    ):
+        self.data_set_id = data_set_id
+        self.ingestion_id = ingestion_id
+        self.aws_account_id = aws_account_id
+        self.ingestion_type = ingestion_type
+        self.wait_for_completion = wait_for_completion
+        self.check_interval = check_interval
+        self.aws_conn_id = aws_conn_id
+        self.region = region
+        super().__init__(**kwargs)
+
+    def execute(self, context: "Context"):
+        hook = QuickSightHook(
+            aws_conn_id=self.aws_conn_id,
+            region_name=self.region,
+        )
+        self.log.info("Running the QuickSight Spice Ingestion on Dataset ID: %s)", self.data_set_id)

Review comment:
       ```suggestion
           self.log.info("Running the Amazon QuickSight SPICE Ingestion on Dataset ID: %s)", self.data_set_id)
   ```
   
   Based on other uses, it looks like this is an acronym.

##########
File path: airflow/providers/amazon/aws/sensors/quicksight.py
##########
@@ -0,0 +1,80 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from typing import TYPE_CHECKING, Optional
+
+from airflow.exceptions import AirflowException
+from airflow.providers.amazon.aws.hooks.quicksight import QuickSightHook
+from airflow.sensors.base import BaseSensorOperator
+
+if TYPE_CHECKING:
+    from airflow.utils.context import Context
+
+
+class QuickSightSensor(BaseSensorOperator):
+    """
+    Watches for the status of a QuickSight Ingestion.
+
+    :param aws_account_id: An AWS Account ID
+    :param data_set_id: QuickSight Data Set ID
+    :param ingestion_id: QuickSight Ingestion ID

Review comment:
       Nitpick: Same parameter names in the Operator above has a different descriptions, consider standardizing?

##########
File path: airflow/providers/amazon/aws/operators/quicksight.py
##########
@@ -0,0 +1,100 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from typing import TYPE_CHECKING, Optional, Sequence
+
+from airflow.models import BaseOperator
+from airflow.providers.amazon.aws.hooks.quicksight import QuickSightHook
+
+if TYPE_CHECKING:
+    from airflow.utils.context import Context
+
+DEFAULT_CONN_ID = "aws_default"
+
+
+class QuickSightCreateIngestionOperator(BaseOperator):
+    """
+    Creates and starts a new SPICE ingestion for a dataset.
+    Also, helps to Refresh existing SPICE datasets
+
+    :param data_set_id:  ID of the dataset used in the ingestion.
+    :param ingestion_id: ID for the ingestion.
+    :param aws_account_id: Amazon Web Services account ID.
+    :param ingestion_type: Type of ingestion. Values Can be  INCREMENTAL_REFRESH or FULL_REFRESH.
+        Default FULL_REFRESH.
+    :param wait_for_completion: If wait is set to True, the time interval, in seconds,
+        that the operation waits to check the status of the QuickSight Ingestion.
+    :param check_interval: if wait is set to be true, this is the time interval
+        in seconds which the operator will check the status of the QuickSight Ingestion
+    :param aws_conn_id: The Airflow connection used for AWS credentials. (templated)
+         If this is None or empty then the default boto3 behaviour is used. If
+         running Airflow in a distributed manner and aws_conn_id is None or
+         empty, then the default boto3 configuration would be used (and must be
+         maintained on each worker node).
+    :param region: Which AWS region the connection should use. (templated)
+         If this is None or empty then the default boto3 behaviour is used.
+    """
+
+    template_fields: Sequence[str] = (
+        "data_set_id",
+        "ingestion_id",
+        "aws_account_id",
+        "ingestion_type",
+        "wait_for_completion",
+        "check_interval",
+        "aws_conn_id",
+        "region",
+    )
+    ui_color = "#ffd700"
+
+    def __init__(
+        self,
+        data_set_id: str,
+        ingestion_id: str,
+        aws_account_id: str,
+        ingestion_type: str = "FULL_REFRESH",
+        wait_for_completion: bool = True,
+        check_interval: int = 30,
+        aws_conn_id: str = DEFAULT_CONN_ID,
+        region: Optional[str] = None,
+        **kwargs,
+    ):
+        self.data_set_id = data_set_id
+        self.ingestion_id = ingestion_id
+        self.aws_account_id = aws_account_id
+        self.ingestion_type = ingestion_type
+        self.wait_for_completion = wait_for_completion
+        self.check_interval = check_interval
+        self.aws_conn_id = aws_conn_id
+        self.region = region
+        super().__init__(**kwargs)
+
+    def execute(self, context: "Context"):
+        hook = QuickSightHook(
+            aws_conn_id=self.aws_conn_id,
+            region_name=self.region,
+        )
+        self.log.info("Running the QuickSight Spice Ingestion on Dataset ID: %s)", self.data_set_id)
+        create_ingestion_response = hook.create_ingestion(
+            data_set_id=self.data_set_id,
+            ingestion_id=self.ingestion_id,
+            aws_account_id=self.aws_account_id,
+            ingestion_type=self.ingestion_type,
+            wait_for_completion=self.wait_for_completion,
+            check_interval=self.check_interval,
+        )
+        return create_ingestion_response

Review comment:
       Is there a particular reason to store it and then return it instead of just returning it on L92?

##########
File path: airflow/providers/amazon/aws/operators/quicksight.py
##########
@@ -0,0 +1,100 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from typing import TYPE_CHECKING, Optional, Sequence
+
+from airflow.models import BaseOperator
+from airflow.providers.amazon.aws.hooks.quicksight import QuickSightHook
+
+if TYPE_CHECKING:
+    from airflow.utils.context import Context
+
+DEFAULT_CONN_ID = "aws_default"
+
+
+class QuickSightCreateIngestionOperator(BaseOperator):
+    """
+    Creates and starts a new SPICE ingestion for a dataset.
+    Also, helps to Refresh existing SPICE datasets
+
+    :param data_set_id:  ID of the dataset used in the ingestion.
+    :param ingestion_id: ID for the ingestion.
+    :param aws_account_id: Amazon Web Services account ID.
+    :param ingestion_type: Type of ingestion. Values Can be  INCREMENTAL_REFRESH or FULL_REFRESH.
+        Default FULL_REFRESH.
+    :param wait_for_completion: If wait is set to True, the time interval, in seconds,
+        that the operation waits to check the status of the QuickSight Ingestion.
+    :param check_interval: if wait is set to be true, this is the time interval
+        in seconds which the operator will check the status of the QuickSight Ingestion

Review comment:
       ```suggestion
           in seconds which the operator will check the status of the Amazon QuickSight Ingestion
   ```

##########
File path: airflow/providers/amazon/aws/hooks/quicksight.py
##########
@@ -0,0 +1,141 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import time
+
+from botocore.exceptions import ClientError
+
+from airflow import AirflowException
+from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook
+
+
+class QuickSightHook(AwsBaseHook):
+    """
+    Interact with Amazon QuickSight.
+
+    Additional arguments (such as ``aws_conn_id``) may be specified and
+    are passed down to the underlying AwsBaseHook.
+    .. seealso::
+    :class:`~airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`
+    """
+
+    NON_TERMINAL_STATES = {"INITIALIZED", "QUEUED", "RUNNING"}
+    FAILED_STATES = {"FAILED"}
+
+    def __init__(self, *args, **kwargs):
+        super().__init__(client_type="quicksight", *args, **kwargs)
+
+    def create_ingestion(
+        self,
+        data_set_id: str,
+        ingestion_id: str,
+        aws_account_id: str,

Review comment:
       Yeah, it seems odd that all of their client calls for Quicksight take the account ID as a parameter.  If you don't want to take it as an input, you get get it from [STS.get_caller_identity(): ]( https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sts.html#STS.Client.get_caller_identity) and I'd suggest storing it as a field in the QuicksightHook perhaps rather than passing it all over the place.
   
   This should do the trick inline:
   
   ```
   import boto3
   boto3.client('sts').get_caller_identity()['Account']
   ``` 
   
   But if you want to follow Airflow structure and add an STS hook it would look like this:
   
   ```
   class StsHook(AwsBaseHook):
       def __init__(self) -> None:
           super().__init__(client_type='sts')
   
       def get_account_number(self) -> Dict:
           return self.conn.get_caller_identity()['Account']
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org