Posted to commits@airflow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/10/24 00:15:00 UTC
[jira] [Commented] (AIRFLOW-5730) Enable get_pandas_df on Druid and Pinot DbApiHooks
[ https://issues.apache.org/jira/browse/AIRFLOW-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958424#comment-16958424 ]
ASF GitHub Bot commented on AIRFLOW-5730:
-----------------------------------------
sekikn commented on pull request #6399: [AIRFLOW-5730] Enable get_pandas_df on Druid and Pinot DbApiHooks
URL: https://github.com/apache/airflow/pull/6399
Make sure you have checked _all_ steps below.
### Jira
- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
- https://issues.apache.org/jira/browse/AIRFLOW-5730
- In case you are fixing a typo in the documentation, you can prepend your commit with \[AIRFLOW-XXX\]; code changes always need a Jira issue.
- In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
- In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:
Currently, DruidDbApiHook and PinotDbApiHook disable their get_pandas_df
methods by overriding them to raise NotImplementedError. However, the
implementation inherited from DbApiHook works for both hooks. This PR
removes the overrides so that the inherited method is used.
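The effect of removing such an override can be illustrated with a minimal, self-contained sketch. It does not use Airflow, Druid, or Pinot: `BaseDbHook`, `SqliteHook`, `DisabledHook`, `EnabledHook`, and `get_rows` are hypothetical stand-ins invented for illustration, with an in-memory SQLite database playing the role of the broker and `get_rows` playing the role of `get_pandas_df`.

```python
import sqlite3


class BaseDbHook:
    """Hypothetical stand-in for DbApiHook (not Airflow code)."""

    def get_conn(self):
        raise NotImplementedError()

    def get_rows(self, sql, parameters=None):
        # Stand-in for get_pandas_df: generic logic that only needs
        # a DB-API connection from get_conn().
        conn = self.get_conn()
        try:
            cur = conn.cursor()
            cur.execute(sql, parameters or {})
            return cur.fetchall()
        finally:
            conn.close()


class SqliteHook(BaseDbHook):
    """Hypothetical concrete hook backed by an in-memory SQLite database."""

    def get_conn(self):
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE t (n INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?)", [(1,), (5,), (9,)])
        return conn


class DisabledHook(SqliteHook):
    # Before: the subclass hides the working inherited method.
    def get_rows(self, sql, parameters=None):
        raise NotImplementedError()


class EnabledHook(SqliteHook):
    # After: no override, so the inherited method is used as-is.
    pass


rows = EnabledHook().get_rows(
    "SELECT n FROM t WHERE n > :num ORDER BY n", {"num": 2}
)
# rows is [(5,), (9,)]; DisabledHook().get_rows(...) raises NotImplementedError.
```

The only difference between the two subclasses is the presence of the override, which mirrors the shape of the change in this PR.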
### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
TestDruidHook.test_get_pandas_df in tests/hooks/test_druid_hook.py
TestPinotDbApiHook.test_get_pandas_df in tests/contrib/hooks/test_pinot_hook.py
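As a rough idea of how a test of this kind can run without a live broker, here is a sketch that mocks the DB-API connection. It is not the code of the tests named above: `FakeDbApiHook`, `get_table`, and `TestFakeHook` are hypothetical names, and `get_table` returns column names and rows instead of a pandas DataFrame so that the sketch depends only on the standard library.

```python
import unittest
from unittest import mock


class FakeDbApiHook:
    """Hypothetical stand-in for a DbApiHook subclass (not Airflow code)."""

    def get_conn(self):
        raise RuntimeError("no real broker in this sketch")

    def get_table(self, sql, parameters=None):
        # Stand-in for the inherited get_pandas_df: returns the column
        # names taken from cursor.description together with the rows.
        cur = self.get_conn().cursor()
        cur.execute(sql, parameters)
        columns = [d[0] for d in cur.description]
        return columns, cur.fetchall()


class TestFakeHook(unittest.TestCase):
    def test_get_table(self):
        # Build a fake cursor with one column and two rows.
        cursor = mock.MagicMock()
        cursor.description = [("col",)]
        cursor.fetchall.return_value = [("row1",), ("row2",)]
        conn = mock.MagicMock()
        conn.cursor.return_value = cursor

        hook = FakeDbApiHook()
        # Replace get_conn so that no network access is attempted.
        with mock.patch.object(hook, "get_conn", return_value=conn):
            columns, rows = hook.get_table("SELECT 1")

        self.assertEqual(columns, ["col"])
        self.assertEqual(rows, [("row1",), ("row2",)])
        cursor.execute.assert_called_once_with("SELECT 1", None)
```

Saved as a module, the test case can be run with `python -m unittest`.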
### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
1. Subject is separated from body by a blank line
1. Subject is limited to 50 characters (not including Jira issue reference)
1. Subject does not end with a period
1. Subject uses the imperative mood ("add", not "adding")
1. Body wraps at 72 characters
1. Body explains "what" and "why", not "how"
### Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it.
- All the public functions and the classes in the PR contain docstrings that explain what they do
- If you implement backwards incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to an appropriate release
> Enable get_pandas_df on Druid and Pinot DbApiHooks
> --------------------------------------------------
>
> Key: AIRFLOW-5730
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5730
> Project: Apache Airflow
> Issue Type: Improvement
> Components: hooks
> Affects Versions: 1.10.5
> Reporter: Kengo Seki
> Assignee: Kengo Seki
> Priority: Major
>
> Currently, DruidDbApiHook and PinotDbApiHook disable their {{get_pandas_df}} methods by raising {{NotImplementedError}}.
> However, the implementation inherited from DbApiHook works once the overrides are removed, as follows:
> {code}
> $ git diff
> diff --git a/airflow/contrib/hooks/pinot_hook.py b/airflow/contrib/hooks/pinot_hook.py
> index e617f8e9b..0864b3584 100644
> --- a/airflow/contrib/hooks/pinot_hook.py
> +++ b/airflow/contrib/hooks/pinot_hook.py
> @@ -90,8 +90,5 @@ class PinotDbApiHook(DbApiHook):
>      def set_autocommit(self, conn, autocommit):
>          raise NotImplementedError()
>
> -    def get_pandas_df(self, sql, parameters=None):
> -        raise NotImplementedError()
> -
>      def insert_rows(self, table, rows, target_fields=None, commit_every=1000):
>          raise NotImplementedError()
> diff --git a/airflow/hooks/druid_hook.py b/airflow/hooks/druid_hook.py
> index c3cd3cd71..e2e20f1ec 100644
> --- a/airflow/hooks/druid_hook.py
> +++ b/airflow/hooks/druid_hook.py
> @@ -158,8 +158,5 @@ class DruidDbApiHook(DbApiHook):
>      def set_autocommit(self, conn, autocommit):
>          raise NotImplementedError()
>
> -    def get_pandas_df(self, sql, parameters=None):
> -        raise NotImplementedError()
> -
>      def insert_rows(self, table, rows, target_fields=None, commit_every=1000):
>          raise NotImplementedError()
> {code}
> {code:title=Druid example}
> $ airflow connections list
> (snip)
> ├────────────────────────────────┼─────────────────────────────┼───────────────────────────┼────────┼────────────────┼──────────────────────┼────────────────────────────────┤
> │ 'druid_broker_default' │ 'druid-broker' │ 'localhost' │ 8082 │ False │ True │ 'gAAAAABdrxvt...M1ideRO8233QG' │
> ╘════════════════════════════════╧═════════════════════════════╧═══════════════════════════╧════════╧════════════════╧══════════════════════╧════════════════════════════════╛
> $ ipython
> (snip)
> In [2]: from airflow.hooks.druid_hook import DruidDbApiHook
> In [3]: DruidDbApiHook().get_pandas_df("SELECT * FROM wikipedia WHERE sum_delta > %(num)d", {"num": 2000})
> [2019-10-23 23:28:18,606] {base_hook.py:89} INFO - Using connection to: id: druid_broker_default. Host: localhost, Port: 8082, Schema: None, Login: None, Password: None, extra: {'schema': 'http', 'endpoint': '/druid/v2/sql'}
> [2019-10-23 23:28:18,607] {druid_hook.py:140} INFO - Get the connection to druid broker on localhost using user None
> Out[3]:
> __time channel cityName comment ... sum_deleted sum_delta sum_metroCode user
> 0 2015-09-12T00:00:00.000Z #en.wikipedia Archiving case from [[Wikipedia:Sockpuppet inv... ... 0 3360 0 Bbb23
> 1 2015-09-12T00:00:00.000Z #ja.wikipedia [[Special:Contributions/119.224.209.170|119.22... ... 0 6853 0 Kkairri
> 2 2015-09-12T01:00:00.000Z #en.wikipedia /* Hong Kong */ ... 0 4500 0 Bertaut
> 3 2015-09-12T01:00:00.000Z #en.wikipedia Archiving 1 discussion(s) from [[User talk:New... ... 0 3599 0 Lowercase sigmabot III
> 4 2015-09-12T01:00:00.000Z #en.wikipedia [[WP:AES|←]]Created page with '{{Infobox wildf... ... 0 13335 0 Orygun
> .. ... ... ... ... ... ... ... ... ...
> 851 2015-09-12T23:00:00.000Z #pt.wikipedia Bem-vindo (usando [[WP:H|Huggle]]) (3.1.16) ... 0 2588 0 Mobyduck
> 852 2015-09-12T23:00:00.000Z #pt.wikipedia adição de informação, renovação de conteúdos e... ... 0 3666 0 Templarius 01
> 853 2015-09-12T23:00:00.000Z #ru.wikipedia [[ВП:←|←]] Новая страница: «{{редактирую|~~~~|... ... 0 6766 0 Dulamas
> 854 2015-09-12T23:00:00.000Z #ru.wikipedia Tver [[ВП:×|отмена]] правки 73302711 участника [[Sp... ... 0 9302 0 94.241.56.71
> 855 2015-09-12T23:00:00.000Z #sr.wikipedia Нова страница: [[Датотека:US Open.svg|десно|20... ... 0 38443 0 Самарџија
> [856 rows x 21 columns]
> {code}
> {code:title=Pinot example}
> $ airflow connections list
> (snip)
> ├────────────────────────────────┼─────────────────────────────┼───────────────────────────┼────────┼────────────────┼──────────────────────┼────────────────────────────────┤
> │ 'pinot_broker_default' │ 'pinot_broker_conn_id' │ 'localhost' │ 8000 │ False │ True │ 'gAAAAABdrxRj...Afd51PZY94nfa' │
> ├────────────────────────────────┼─────────────────────────────┼───────────────────────────┼────────┼────────────────┼──────────────────────┼────────────────────────────────┤
> $ ipython
> (snip)
> In [2]: from airflow.contrib.hooks.pinot_hook import PinotDbApiHook
> In [3]: PinotDbApiHook().get_pandas_df("select sum('runs') from baseballStats where yearID>=%(num)d group by playerName", {"num": 2000})
> [2019-10-23 23:31:06,058] {base_hook.py:89} INFO - Using connection to: id: pinot_broker_default. Host: localhost, Port: 8000, Schema: None, Login: None, Password: None, extra: {'endpoint': '/query', 'schema': 'http'}
> [2019-10-23 23:31:06,059] {pinot_hook.py:48} INFO - Get the connection to pinot broker on localhost
> select sum('runs') from baseballStats where yearID>=2000 group by playerName
> Out[3]:
> playerName sum_runs
> 0 Adrian 1820.00000
> 1 Jose Antonio 1692.00000
> 2 Rafael 1565.00000
> 3 Brian Michael 1500.00000
> 4 Jose Alberto 1426.00000
> 5 Alexander Emmanuel 1426.00000
> 6 Derek Sanderson 1390.00000
> 7 Carlos 1314.00000
> 8 Johnny David 1300.00000
> 9 Ichiro 1261.00000
> {code}