Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/04/14 15:53:19 UTC

[GitHub] [airflow] josh-fell commented on a diff in pull request #22315: Airflow file sensor by prefix for azure data lake storage

josh-fell commented on code in PR #22315:
URL: https://github.com/apache/airflow/pull/22315#discussion_r850571175


##########
airflow/contrib/sensors/DataLake_PrefixFile_Sensor.py:
##########
@@ -0,0 +1,54 @@
+from airflow.contrib.hooks.azure_data_lake_hook import AzureDataLakeHook
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.utils.decorators import apply_defaults

Review Comment:
   ```suggestion
   from airflow.providers.microsoft.azure.hooks.data_lake import AzureDataLakeHook
   from airflow.sensors.base import BaseSensorOperator
   ```
   The import paths for `AzureDataLakeHook` and `BaseSensorOperator` are deprecated. Also, using `apply_defaults` is no longer necessary as of Airflow 2.



##########
airflow/contrib/sensors/DataLake_PrefixFile_Sensor.py:
##########
@@ -0,0 +1,54 @@
+from airflow.contrib.hooks.azure_data_lake_hook import AzureDataLakeHook
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.utils.decorators import apply_defaults
+
+
+# File detector by prefix sensor for azure data lake storage
+
+class DataLakePrefixSensor(BaseSensorOperator):
+
+    """
+    Interacts with Azure Data Lake:
+
+    Client ID and client secret should be in user and password parameters.
+    Tenant and account name should be extra field as
+    {"tenant": "<TENANT>", "account_name": "ACCOUNT_NAME"}.
+
+    :param azure_data_lake_conn_id: Reference to the Azure Data Lake connection
+
+    DataLake_path : directory of the file
+
+    prefix : file name
+
+    """
+
+    @apply_defaults

Review Comment:
   ```suggestion
   ```
   No longer needed as of Airflow 2.



##########
airflow/contrib/sensors/DataLake_PrefixFile_Sensor.py:
##########
@@ -0,0 +1,54 @@
+from airflow.contrib.hooks.azure_data_lake_hook import AzureDataLakeHook
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.utils.decorators import apply_defaults
+
+
+# File detector by prefix sensor for azure data lake storage
+
+class DataLakePrefixSensor(BaseSensorOperator):
+
+    """
+    Interacts with Azure Data Lake:
+
+    Client ID and client secret should be in user and password parameters.
+    Tenant and account name should be extra field as
+    {"tenant": "<TENANT>", "account_name": "ACCOUNT_NAME"}.
+
+    :param azure_data_lake_conn_id: Reference to the Azure Data Lake connection
+
+    DataLake_path : directory of the file
+
+    prefix : file name

Review Comment:
   Can you update the docstring for the parameters to match the Sphinx style used elsewhere? The `:param azure_data_lake_conn_id:` entry above is a good example, as are other operators and sensors in the codebase.
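For illustration, a sketch of what Sphinx-style parameter entries could look like (the wording is invented here; `DataLake_path` and `prefix` follow the PR's current spelling, and the real class would subclass `BaseSensorOperator`):

```python
class DataLakePrefixSensor:  # sketch only; the real sensor subclasses BaseSensorOperator
    """
    Waits for at least one file matching a prefix in an Azure Data Lake directory.

    :param azure_data_lake_conn_id: Reference to the Azure Data Lake connection.
    :param DataLake_path: Path of the Data Lake directory to search in.
    :param prefix: File name prefix to match against.
    """
```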



##########
airflow/contrib/sensors/DataLake_PrefixFile_Sensor.py:
##########
@@ -0,0 +1,54 @@
+from airflow.contrib.hooks.azure_data_lake_hook import AzureDataLakeHook
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.utils.decorators import apply_defaults
+
+
+# File detector by prefix sensor for azure data lake storage
+
+class DataLakePrefixSensor(BaseSensorOperator):
+
+    """
+    Interacts with Azure Data Lake:
+
+    Client ID and client secret should be in user and password parameters.
+    Tenant and account name should be extra field as
+    {"tenant": "<TENANT>", "account_name": "ACCOUNT_NAME"}.
+
+    :param azure_data_lake_conn_id: Reference to the Azure Data Lake connection
+
+    DataLake_path : directory of the file
+
+    prefix : file name
+
+    """
+
+    @apply_defaults
+    def __init__(
+        self,
+        DataLake_path,
+        prefix,
+        azure_data_lake_conn_id="azure_data_lake_default",
+        check_options=None,
+        *args,
+        **kwargs
+    ):
+        super(DataLakePrefixSensor, self).__init__(*args, **kwargs)
+        if check_options is None:
+            check_options = {}
+        self.azure_data_lake_conn_id = azure_data_lake_conn_id
+        self.DataLake_path = DataLake_path
+        self.prefix = prefix
+        self.check_options = check_options
+
+    def poke(self, context):
+        self.log.info(
+            "Poking for prefix: %s in Data Lake path: %s",
+            self.prefix,
+            self.DataLake_path,
+        )
+        hook = AzureDataLakeHook(azure_data_lake_conn_id=self.azure_data_lake_conn_id)
+        return len(hook.list(self.DataLake_path + "/" + self.prefix)) > 0
+
+
+        # return TRUE => 1 or more files detected
+        # return FALSE => no file detected

Review Comment:
   ```suggestion
           num_files = len(hook.list(self.DataLake_path + "/" + self.prefix))
           return num_files > 0
   ```
   Small refactor for readability. The comments are better suited for the sensor's description in the docstring since this is the fundamental operation of the sensor.



##########
airflow/contrib/sensors/DataLake_PrefixFile_Sensor.py:
##########
@@ -0,0 +1,54 @@
+from airflow.contrib.hooks.azure_data_lake_hook import AzureDataLakeHook
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.utils.decorators import apply_defaults
+
+
+# File detector by prefix sensor for azure data lake storage
+
+class DataLakePrefixSensor(BaseSensorOperator):
+
+    """
+    Interacts with Azure Data Lake:
+
+    Client ID and client secret should be in user and password parameters.
+    Tenant and account name should be extra field as
+    {"tenant": "<TENANT>", "account_name": "ACCOUNT_NAME"}.
+
+    :param azure_data_lake_conn_id: Reference to the Azure Data Lake connection
+
+    DataLake_path : directory of the file
+
+    prefix : file name
+
+    """
+
+    @apply_defaults
+    def __init__(
+        self,
+        DataLake_path,
+        prefix,
+        azure_data_lake_conn_id="azure_data_lake_default",
+        check_options=None,

Review Comment:
   This doesn't appear to be used anywhere in the sensor. Should it be removed?



##########
airflow/contrib/sensors/DataLake_PrefixFile_Sensor.py:
##########
@@ -0,0 +1,54 @@
+from airflow.contrib.hooks.azure_data_lake_hook import AzureDataLakeHook
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.utils.decorators import apply_defaults
+
+
+# File detector by prefix sensor for azure data lake storage
+
+class DataLakePrefixSensor(BaseSensorOperator):
+
+    """
+    Interacts with Azure Data Lake:
+
+    Client ID and client secret should be in user and password parameters.
+    Tenant and account name should be extra field as
+    {"tenant": "<TENANT>", "account_name": "ACCOUNT_NAME"}.

Review Comment:
   This description doesn't quite describe the purpose of the sensor. Would you mind updating and elaborating here?
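As a sketch only, the opening description might be elaborated along these lines (stored in a plain string here so it runs without Airflow; the wording is a proposal, not the PR's text):

```python
# Hypothetical replacement wording for the sensor's docstring description.
SENSOR_DESCRIPTION = """\
Checks an Azure Data Lake Storage directory for the existence of at least one
file whose name starts with a given prefix.

Authentication uses an Airflow connection: put the client ID and client secret
in the login and password fields, and the tenant and account name in the extra
field as {"tenant": "<TENANT>", "account_name": "<ACCOUNT_NAME>"}.
"""
```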



##########
airflow/contrib/sensors/DataLake_PrefixFile_Sensor.py:
##########
@@ -0,0 +1,54 @@
+from airflow.contrib.hooks.azure_data_lake_hook import AzureDataLakeHook
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.utils.decorators import apply_defaults
+
+
+# File detector by prefix sensor for azure data lake storage
+
+class DataLakePrefixSensor(BaseSensorOperator):
+
+    """
+    Interacts with Azure Data Lake:
+
+    Client ID and client secret should be in user and password parameters.
+    Tenant and account name should be extra field as
+    {"tenant": "<TENANT>", "account_name": "ACCOUNT_NAME"}.
+
+    :param azure_data_lake_conn_id: Reference to the Azure Data Lake connection
+
+    DataLake_path : directory of the file
+
+    prefix : file name
+
+    """
+
+    @apply_defaults
+    def __init__(
+        self,
+        DataLake_path,

Review Comment:
   We typically use snake case for variables, args, etc.
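A sketch of the constructor with snake_case names (`data_lake_path` is an illustrative rename, not the PR's final choice; the real sensor would subclass `BaseSensorOperator` and forward `**kwargs` to it, and per the earlier comments `apply_defaults` and `check_options` are dropped):

```python
class DataLakePrefixSensor:  # would subclass BaseSensorOperator in the real sensor
    def __init__(
        self,
        *,
        data_lake_path: str,
        prefix: str,
        azure_data_lake_conn_id: str = "azure_data_lake_default",
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.data_lake_path = data_lake_path
        self.prefix = prefix
        self.azure_data_lake_conn_id = azure_data_lake_conn_id
```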



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org