Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/12/19 15:58:52 UTC

[GitHub] [airflow] manugarri opened a new issue, #28468: Make pandas an optional dependency for amazon provider

manugarri opened a new issue, #28468:
URL: https://github.com/apache/airflow/issues/28468

   ### Apache Airflow Provider(s)
   
   amazon
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Apache Airflow version
   
   latest
   
   ### Operating System
   
   any
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   First of all, apologies if this is not the right section to post a GH issue. I looked for provider-specific feature requests but couldn't find such a section.
   
   We use the AWS provider at my company to interact with AWS services from Airflow. We use Poetry to build the testing environment for our DAGs.
   
   However, the build times are quite long, and the reason is building pandas, which is a [dependency](https://github.com/apache/airflow/blob/main/airflow/providers/amazon/provider.yaml#L62) of the amazon provider.
   
   By checking the provider's code, it seems pandas is used in a small minority of functions inside the provider:
   ```
   ./aws/transfers/hive_to_dynamodb.py:93:        data = hive.get_pandas_df(self.sql, schema=self.schema)
   ```
   and
   ```
   ./aws/transfers/sql_to_s3.py:159:        data_df = sql_hook.get_pandas_df(sql=self.query, parameters=self.parameters)
   ```
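
The search above could be reproduced with a short script. This is only a sketch: the function name `find_pandas_usages` is hypothetical, and a plain substring match on `pandas` is assumed to be good enough to locate call sites such as `get_pandas_df`.

```python
from pathlib import Path


def find_pandas_usages(root):
    """Return (path, line number, stripped line) for every line mentioning pandas.

    A plain substring match is used, so comments and docstrings that
    mention pandas are reported as well.
    """
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if "pandas" in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling `find_pandas_usages("airflow/providers/amazon")` from a checkout of the repository would list each match with its file and line number, much like the grep-style output quoted above.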
   
   Forcing every AWS Airflow user who does not use Hive or does not want to export SQL results to an S3 file to install pandas is a bit cumbersome.
   
   ### What you think should happen instead
   
   Given how heavy the package is and how little it is used in the amazon provider, pandas should be an optional dependency.
   
   ### How to reproduce
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on issue #28468: Make pandas an optional dependency for amazon provider

Posted by GitBox <gi...@apache.org>.
Taragolis commented on issue #28468:
URL: https://github.com/apache/airflow/issues/28468#issuecomment-1359554284

   @manugarri Do you use Python 3.7 or Alpine Linux? 
   
   According to https://pypi.org/project/pandas/#files:
   - No pre-built wheels for Python 3.7 (EOL is 27 Jun 2023)
   - No pre-built wheels for [musllinux](https://peps.python.org/pep-0656/)
   




[GitHub] [airflow] Taragolis commented on issue #28468: Make pandas an optional dependency for amazon provider

Posted by GitBox <gi...@apache.org>.
Taragolis commented on issue #28468:
URL: https://github.com/apache/airflow/issues/28468#issuecomment-1360016607

   Or wherever it is applicable




[GitHub] [airflow] Taragolis commented on issue #28468: Make pandas an optional dependency for amazon provider

Posted by GitBox <gi...@apache.org>.
Taragolis commented on issue #28468:
URL: https://github.com/apache/airflow/issues/28468#issuecomment-1360012551

   Might it be better to make it optional for all providers where it is listed as a core dependency?
   
   - amazon
   - apache.hive
   - exasol
   - google
   - presto
   - salesforce
   - trino




[GitHub] [airflow] uranusjr closed issue #28468: Make pandas an optional dependency for amazon provider

Posted by GitBox <gi...@apache.org>.
uranusjr closed issue #28468: Make pandas an optional dependency for amazon provider
URL: https://github.com/apache/airflow/issues/28468




[GitHub] [airflow] potiuk commented on issue #28468: Make pandas an optional dependency for amazon provider

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #28468:
URL: https://github.com/apache/airflow/issues/28468#issuecomment-1360055481

   BTW, this is one of the 2.3+ features, so we are again benefiting from bumping the minimum Airflow version.




[GitHub] [airflow] manugarri commented on issue #28468: Make pandas an optional dependency for amazon provider

Posted by GitBox <gi...@apache.org>.
manugarri commented on issue #28468:
URL: https://github.com/apache/airflow/issues/28468#issuecomment-1361357263

   ![image](https://user-images.githubusercontent.com/3670355/208924631-2bb78a86-25d6-40b3-930c-60ed6404e573.png)




[GitHub] [airflow] Taragolis commented on issue #28468: Make pandas an optional dependency for amazon provider

Posted by GitBox <gi...@apache.org>.
Taragolis commented on issue #28468:
URL: https://github.com/apache/airflow/issues/28468#issuecomment-1359898233

   MWAA currently supports Airflow 2.2.2. However, even if someone adds these changes, which to be honest are not straightforward and would require introducing breaking changes, they wouldn't be applicable to Airflow<2.3: https://github.com/apache/airflow/blob/main/README.md#release-process-for-providers
   
   However, I saw on Slack that AWS might add support for a newer Airflow version to MWAA soon.
   




[GitHub] [airflow] boring-cyborg[bot] commented on issue #28468: Make pandas an optional dependency for amazon provider

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #28468:
URL: https://github.com/apache/airflow/issues/28468#issuecomment-1357882615

   Thanks for opening your first issue here! Be sure to follow the issue template!
   




[GitHub] [airflow] potiuk commented on issue #28468: Make pandas an optional dependency for amazon provider

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #28468:
URL: https://github.com/apache/airflow/issues/28468#issuecomment-1359974090

   Yeah. Pandas is quite a "drag" as a dependency, regardless of whether it is a binary wheel or not, so making it optional for the Amazon provider would be a good idea.




[GitHub] [airflow] manugarri commented on issue #28468: Make pandas an optional dependency for amazon provider

Posted by GitBox <gi...@apache.org>.
manugarri commented on issue #28468:
URL: https://github.com/apache/airflow/issues/28468#issuecomment-1359877907

   @Taragolis we use AWS MWAA, so we are stuck with Python 3.7 unfortunately :/




[GitHub] [airflow] vincbeck commented on issue #28468: Make pandas an optional dependency for amazon provider

Posted by GitBox <gi...@apache.org>.
vincbeck commented on issue #28468:
URL: https://github.com/apache/airflow/issues/28468#issuecomment-1360007187

   I don't have much context on this one, so if creating a PR for it is easy for you @potiuk, be my guest :) Happy to review it if needed!




[GitHub] [airflow] potiuk commented on issue #28468: Make pandas an optional dependency for amazon provider

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #28468:
URL: https://github.com/apache/airflow/issues/28468#issuecomment-1359981860

   I can even make a draft PR for that. We have a few prior-art cases (plyvel in the google provider was pretty similar, and we made it optional), and we even have the `AirflowOptionalProviderFeatureException` foreseen for the case where people use existing code in the provider and the needed dependency is not installed. It's actually pretty easy to do.
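
As an illustration of the optional-dependency pattern described in this comment, here is a minimal sketch. It is not the code from the eventual PR. To stay runnable without Airflow installed, it defines a local stand-in for `AirflowOptionalProviderFeatureException` (the real class lives in `airflow.exceptions`). The helpers `require_optional` and `rows_to_records` are hypothetical names made up for this example.

```python
import importlib


class AirflowOptionalProviderFeatureException(Exception):
    """Local stand-in for the exception of the same name in airflow.exceptions."""


def require_optional(module_name, feature):
    """Import an optional dependency lazily, or explain what is missing.

    Hypothetical helper: the import happens only when the feature is used,
    so installing the provider does not require the heavy dependency.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise AirflowOptionalProviderFeatureException(
            f"{feature} requires the optional dependency "
            f"'{module_name}', which is not installed."
        ) from e


def rows_to_records(rows, columns):
    """Hypothetical transfer step that needs pandas only when it is called."""
    pd = require_optional("pandas", "rows_to_records")
    return pd.DataFrame(rows, columns=columns).to_dict(orient="records")
```

With this shape, importing the module never touches pandas. Only a call to `rows_to_records` in an environment without pandas raises the exception, which tells the user which extra dependency to install.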




[GitHub] [airflow] eladkal commented on issue #28468: Make pandas an optional dependency for amazon provider

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #28468:
URL: https://github.com/apache/airflow/issues/28468#issuecomment-1359972491

   This is probably a leftover from when pandas was a core dependency.
   
   @o-nikolas @ferruzzi @vincbeck probably something worth looking into.




[GitHub] [airflow] potiuk commented on issue #28468: Make pandas an optional dependency for amazon provider

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #28468:
URL: https://github.com/apache/airflow/issues/28468#issuecomment-1360051845

   > Might it be better to make it optional for all providers where it is listed as a core dependency?
   
   I can do a PR for amazon as an example, and we can split the work for the others :)




[GitHub] [airflow] potiuk commented on issue #28468: Make pandas an optional dependency for amazon provider

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #28468:
URL: https://github.com/apache/airflow/issues/28468#issuecomment-1360076239

   I told you it should be simple :) 




[GitHub] [airflow] potiuk commented on issue #28468: Make pandas an optional dependency for amazon provider

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #28468:
URL: https://github.com/apache/airflow/issues/28468#issuecomment-1360075806

   PR in #28505

