Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/15 22:38:11 UTC

[GitHub] [airflow] iblaine opened a new pull request #17628: Create cols in df object so we avoid exception when no rows returned

iblaine opened a new pull request #17628:
URL: https://github.com/apache/airflow/pull/17628


   Improves get_pandas_df() in HiveServer2Hook by properly adding columns when an empty dataframe is encountered.
   
   Currently, in the Hive hooks, get_pandas_df() first creates the dataframe and then, as a separate step, assigns column names to the existing dataframe object. pandas raises an exception when column names are assigned to an empty dataframe that has no columns. By setting the columns in the step where the dataframe is created, we avoid the exception on empty results.
   
   Current behavior when using get_pandas_df() to read an empty table:
   ```
   hh = HiveServer2Hook()
   sql = "SELECT * FROM <table> WHERE 1=0"
   df = hh.get_pandas_df(sql)
   
   [2021-08-15 21:10:15,282] {{hive.py:449}} INFO - SELECT * FROM <table> WHERE 1=0
   Traceback (most recent call last):
     File "<stdin>", line 2, in <module>
     File "/venv/lib/python3.7/site-packages/airflow/providers/apache/hive/hooks/hive.py", line 1073, in get_pandas_df
       df.columns = [c[0] for c in res['header']]
     File "/venv/lib/python3.7/site-packages/pandas/core/generic.py", line 5154, in __setattr__
       return object.__setattr__(self, name, value)
     File "pandas/_libs/properties.pyx", line 66, in pandas._libs.properties.AxisProperty.__set__
     File "/venv/lib/python3.7/site-packages/pandas/core/generic.py", line 564, in _set_axis
       self._mgr.set_axis(axis, labels)
     File "/venv/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 227, in set_axis
       f"Length mismatch: Expected axis has {old_len} elements, new "
   ValueError: Length mismatch: Expected axis has 0 elements, new values have 1 elements
   ```
   
   New behavior with this PR:
   ```
   hh = HiveServer2Hook()
   sql = "SELECT * FROM <table> WHERE 1=0"
   df = hh.get_pandas_df(sql)
   len(df.index)
   0
   ```
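
The difference between the two behaviors can be reproduced with pandas alone. The following is a minimal sketch, not the hook's actual code: the `res` dictionary is a hypothetical stand-in for the hook's result, the `data` key and the type strings are assumptions, and only `res['header']` with `c[0]` comes from the traceback above.

```python
import pandas as pd

# Hypothetical stand-in for a query result with a header but no rows.
# The 'data' key and the type strings are assumptions for illustration.
res = {"header": [("id", "INT_TYPE"), ("name", "STRING_TYPE")], "data": []}
column_names = [c[0] for c in res["header"]]

# Old approach: build the frame, then assign the column names.
# With no rows the frame has zero columns, so the assignment fails.
try:
    df_old = pd.DataFrame(res["data"])
    df_old.columns = column_names
    old_error = None
except ValueError as exc:
    old_error = str(exc)

# New approach: pass the column names when the frame is created.
# This works for both empty and non-empty results.
df_new = pd.DataFrame(res["data"], columns=column_names)

print(old_error)
print(list(df_new.columns), len(df_new.index))
```

Passing `columns=` to the constructor keeps the schema even when no rows come back, so downstream code can rely on the column names being present.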
   
   I need help testing this against the `airflow.static_babynames` Hive table, if that test is needed. I am also curious how to set that up. I have tested this locally against my own Hive server and it works as expected.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] iblaine closed pull request #17628: Create cols in df object so we avoid exception when no rows returned

Posted by GitBox <gi...@apache.org>.
iblaine closed pull request #17628:
URL: https://github.com/apache/airflow/pull/17628


   





[GitHub] [airflow] boring-cyborg[bot] commented on pull request #17628: Create cols in df object so we avoid exception when no rows returned

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on pull request #17628:
URL: https://github.com/apache/airflow/pull/17628#issuecomment-899119914


   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst).
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, mypy and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature, add useful documentation (in docstrings or in the `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst). Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better 🚀.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   





[GitHub] [airflow] iblaine commented on pull request #17628: Create cols in df object so we avoid exception when no rows returned

Posted by GitBox <gi...@apache.org>.
iblaine commented on pull request #17628:
URL: https://github.com/apache/airflow/pull/17628#issuecomment-899242604


   Closing PR, will open up a new PR on the main branch.

