Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/12/08 15:18:03 UTC

[GitHub] [airflow] luanmorenomaciel opened a new issue, #28223: Add Support to ABFS Azure Data Lake Storage Gen2 (ADLS2) Protocol on Microsoft.Azure.Hooks

luanmorenomaciel opened a new issue, #28223:
URL: https://github.com/apache/airflow/issues/28223

   ### Description
   
   Microsoft Azure has created a [new Hadoop Filesystem compatible driver (ABFS)](https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction) that uses a URI format to address files and directories, making storage access more efficient and performant for Big Data & Analytics use cases.
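
For illustration, an ABFS URI generally takes the form `abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>`. The following is a minimal Python sketch of splitting such a URI into its parts. The `AbfsLocation` type and the `parse_abfs_uri` helper are hypothetical names used only for this example; they are not part of the Airflow provider or of any Azure SDK, and the sample account and path are made up.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class AbfsLocation:
    """Parts of an ABFS URI (hypothetical helper type for illustration)."""

    file_system: str  # file system (container) name, the part before "@"
    account: str      # storage account name, the first label of the host
    path: str         # path inside the file system, without a leading slash
    secure: bool      # True for abfss:// (TLS), False for abfs://


def parse_abfs_uri(uri: str) -> AbfsLocation:
    """Split abfs[s]://<file_system>@<account>.<endpoint>/<path> into parts."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an ABFS URI: {uri!r}")

    # The authority has the shape "<file_system>@<host>".
    file_system, sep, host = parsed.netloc.partition("@")
    if not sep or not file_system or not host:
        raise ValueError(f"expected <file_system>@<host> in: {uri!r}")

    # Assumption: the account name is the first DNS label of the host.
    # The endpoint suffix is not validated here because it can vary by cloud.
    account = host.split(".", 1)[0]

    return AbfsLocation(
        file_system=file_system,
        account=account,
        path=parsed.path.lstrip("/"),
        secure=parsed.scheme == "abfss",
    )


if __name__ == "__main__":
    # Made-up example URI.
    print(parse_abfs_uri("abfss://raw@myaccount.dfs.core.windows.net/landing/data.parquet"))
```

A hook built for ADLS Gen2 would typically need these same pieces (account, file system, path) to locate an object, which is why the URI format matters for the request in this issue.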
   
   As of now, Microsoft Azure offers three different ways to connect to its object storage system [Azure Blob Storage](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction), which are:
   
   * [WASB](https://learn.microsoft.com/en-us/azure/databricks/external-data/wasb-blob): allows you to use either a storage account access key or a shared access signature (SAS) (**marked as legacy**)
   
   * [Data Lake Storage Gen1 (ADLS1)](https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-overview): an enterprise-wide hyper-scale repository for big data analytics workloads (**to be retired on Feb 29, 2024**)
   
   * [Data Lake Storage Gen2 (ADLS2)](https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction): a set of capabilities dedicated to big data analytics (**the new standard**)
   
   Microsoft has deprecated the Windows Azure Storage Blob driver (WASB) for [Azure Blob Storage](https://azure.microsoft.com/services/storage/blobs/) in favor of the Azure Blob Filesystem driver (ABFS); see [Access Azure Data Lake Storage Gen2 and Blob Storage](https://learn.microsoft.com/en-us/azure/databricks/external-data/azure-storage). 
   
   ABFS has numerous benefits over WASB; see [Azure documentation on ABFS](https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-abfs-driver).
   
   The current [Microsoft Azure Provider on Apache Airflow](https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/5.0.0/index.html) supports the following hooks:
   
   - [WASB Hook](https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/stable/_api/airflow/providers/microsoft/azure/hooks/wasb/index.html)
   
   - [Azure Data Lake Gen1 Hook](https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/1.0.0/_api/airflow/providers/microsoft/azure/hooks/azure_data_lake/index.html#module-airflow.providers.microsoft.azure.hooks.azure_data_lake)
   
   Because WASB is now **legacy** and ADLS1 is being **retired**, the provider needs to implement the new protocol (ABFS) for Azure Blob Storage, both for feature parity and to comply with the new standard.
   
   ### Use case/motivation
   
   Due to the deprecation of **WASB** and the retirement of **ADLS1**, it's key to implement a new hook that allows access to the **Azure Blob Storage system** over the ABFS protocol.
   
   There are also many **use cases** that would benefit from this implementation, such as:
   
   - Support for faster storage access in the [Microsoft Azure Provider](https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/5.0.0/index.html)
   - Seamless integration with the [Astro Python SDK Standard for Native Transfer](https://astro-sdk-python.readthedocs.io/en/stable/guides/concepts.html#improving-bottlenecks-by-using-native-transfer)
   - Enhanced [Metadata Support](https://pypi.org/project/azure-storage-file-datalake/)
   - Compliance with the [New Standard Protocol (ABFS)](https://learn.microsoft.com/en-us/azure/databricks/external-data/wasb-blob)
   
   
   
   ### Related issues
   
   Related Issues & Stop Blockers:
   
   - https://github.com/astronomer/astro-sdk/issues/905
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] boring-cyborg[bot] commented on issue #28223: Add Support to ABFS Azure Data Lake Storage Gen2 (ADLS2) Protocol on Microsoft.Azure.Hooks

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #28223:
URL: https://github.com/apache/airflow/issues/28223#issuecomment-1342889629

   Thanks for opening your first issue here! Be sure to follow the issue template!
   




[GitHub] [airflow] potiuk commented on issue #28223: Add Support to ABFS Azure Data Lake Storage Gen2 (ADLS2) Protocol on Microsoft.Azure.Hooks

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #28223:
URL: https://github.com/apache/airflow/issues/28223#issuecomment-1377138958

   > @kaxil should we consider this ticket closed, since #28262 was merged, or are there pending changes?
   
   @luanmorenomaciel  -> possibly you are the best to comment here, since you created the original request. Is your original intention completed now with all the merged PRs? Can you double-check it?




[GitHub] [airflow] tatiana commented on issue #28223: Add Support to ABFS Azure Data Lake Storage Gen2 (ADLS2) Protocol on Microsoft.Azure.Hooks

Posted by GitBox <gi...@apache.org>.
tatiana commented on issue #28223:
URL: https://github.com/apache/airflow/issues/28223#issuecomment-1377045963

   @kaxil should we consider this ticket closed, since #28262  was merged, or are there pending changes?




[GitHub] [airflow] potiuk commented on issue #28223: Add Support to ABFS Azure Data Lake Storage Gen2 (ADLS2) Protocol on Microsoft.Azure.Hooks

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #28223:
URL: https://github.com/apache/airflow/issues/28223#issuecomment-1342908948

   Assigned you 




[GitHub] [airflow] eladkal commented on issue #28223: Add Support to ABFS Azure Data Lake Storage Gen2 (ADLS2) Protocol on Microsoft.Azure.Hooks

Posted by "eladkal (via GitHub)" <gi...@apache.org>.
eladkal commented on issue #28223:
URL: https://github.com/apache/airflow/issues/28223#issuecomment-1408831108

   I'm going to close this issue as completed. If something else is needed, please open a new issue about the specific task.




[GitHub] [airflow] eladkal closed issue #28223: Add Support to ABFS Azure Data Lake Storage Gen2 (ADLS2) Protocol on Microsoft.Azure.Hooks

Posted by "eladkal (via GitHub)" <gi...@apache.org>.
eladkal closed issue #28223: Add Support to ABFS Azure Data Lake Storage Gen2 (ADLS2) Protocol on Microsoft.Azure.Hooks
URL: https://github.com/apache/airflow/issues/28223

