You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/10/24 00:15:00 UTC

[jira] [Commented] (AIRFLOW-5730) Enable get_pandas_df on Druid and Pinot DbApiHooks

    [ https://issues.apache.org/jira/browse/AIRFLOW-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958424#comment-16958424 ] 

ASF GitHub Bot commented on AIRFLOW-5730:
-----------------------------------------

sekikn commented on pull request #6399: [AIRFLOW-5730] Enable get_pandas_df on Druid and Pinot DbApiHooks
URL: https://github.com/apache/airflow/pull/6399
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
     - https://issues.apache.org/jira/browse/AIRFLOW-5730
     - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
     - In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
     - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI changes:
   
   Currently, DruidDbApiHook and PinotDbApiHook disable their get_pandas_df
   methods by raising NotImplementedError. But they actually work as
   inherited from DbApiHook. This PR enables them.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
   
   TestDruidHook.test_get_pandas_df in tests/hooks/test_druid_hook.py
   TestPinotDbApiHook.test_get_pandas_df in tests/contrib/hooks/test_pinot_hook.py
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes how to use it.
     - All the public functions and the classes in the PR contain docstrings that explain what it does
     - If you implement backwards incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to a appropriate release
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Enable get_pandas_df on Druid and Pinot DbApiHooks
> --------------------------------------------------
>
>                 Key: AIRFLOW-5730
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5730
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: hooks
>    Affects Versions: 1.10.5
>            Reporter: Kengo Seki
>            Assignee: Kengo Seki
>            Priority: Major
>
> Currently, DruidDbApiHook and PinotDbApiHook disable their {{get_pandas_df}} methods by raising {{NotImplementedError}}.
> But they actually work as inherited from DbApiHook, as follows:
> {code}
> $ git diff
> diff --git a/airflow/contrib/hooks/pinot_hook.py b/airflow/contrib/hooks/pinot_hook.py
> index e617f8e9b..0864b3584 100644
> --- a/airflow/contrib/hooks/pinot_hook.py
> +++ b/airflow/contrib/hooks/pinot_hook.py
> @@ -90,8 +90,5 @@ class PinotDbApiHook(DbApiHook):
>      def set_autocommit(self, conn, autocommit):
>          raise NotImplementedError()
>  
> -    def get_pandas_df(self, sql, parameters=None):
> -        raise NotImplementedError()
> -
>      def insert_rows(self, table, rows, target_fields=None, commit_every=1000):
>          raise NotImplementedError()
> diff --git a/airflow/hooks/druid_hook.py b/airflow/hooks/druid_hook.py
> index c3cd3cd71..e2e20f1ec 100644
> --- a/airflow/hooks/druid_hook.py
> +++ b/airflow/hooks/druid_hook.py
> @@ -158,8 +158,5 @@ class DruidDbApiHook(DbApiHook):
>      def set_autocommit(self, conn, autocommit):
>          raise NotImplementedError()
>  
> -    def get_pandas_df(self, sql, parameters=None):
> -        raise NotImplementedError()
> -
>      def insert_rows(self, table, rows, target_fields=None, commit_every=1000):
>          raise NotImplementedError()
> {code}
> {code:title=Druid example}
> $ airflow connections list
> (snip)
> ├────────────────────────────────┼─────────────────────────────┼───────────────────────────┼────────┼────────────────┼──────────────────────┼────────────────────────────────┤
> │ 'druid_broker_default'         │ 'druid-broker'              │ 'localhost'               │ 8082   │ False          │ True                 │ 'gAAAAABdrxvt...M1ideRO8233QG' │
> ╘════════════════════════════════╧═════════════════════════════╧═══════════════════════════╧════════╧════════════════╧══════════════════════╧════════════════════════════════╛
> $ ipython
> (snip)
> In [2]: from airflow.hooks.druid_hook import DruidDbApiHook                                                                                                                  
> In [3]: DruidDbApiHook().get_pandas_df("SELECT * FROM wikipedia WHERE sum_delta > %(num)d", {"num": 2000})                                                                   
> [2019-10-23 23:28:18,606] {base_hook.py:89} INFO - Using connection to: id: druid_broker_default. Host: localhost, Port: 8082, Schema: None, Login: None, Password: None, extra: {'schema': 'http', 'endpoint': '/druid/v2/sql'}
> [2019-10-23 23:28:18,607] {druid_hook.py:140} INFO - Get the connection to druid broker on localhost using user None
> Out[3]: 
>                        __time        channel cityName                                            comment  ...  sum_deleted sum_delta sum_metroCode                    user
> 0    2015-09-12T00:00:00.000Z  #en.wikipedia           Archiving case from [[Wikipedia:Sockpuppet inv...  ...            0      3360             0                   Bbb23
> 1    2015-09-12T00:00:00.000Z  #ja.wikipedia           [[Special:Contributions/119.224.209.170|119.22...  ...            0      6853             0                 Kkairri
> 2    2015-09-12T01:00:00.000Z  #en.wikipedia                                             /* Hong Kong */  ...            0      4500             0                 Bertaut
> 3    2015-09-12T01:00:00.000Z  #en.wikipedia           Archiving 1 discussion(s) from [[User talk:New...  ...            0      3599             0  Lowercase sigmabot III
> 4    2015-09-12T01:00:00.000Z  #en.wikipedia           [[WP:AES|←]]Created page with '{{Infobox wildf...  ...            0     13335             0                  Orygun
> ..                        ...            ...      ...                                                ...  ...          ...       ...           ...                     ...
> 851  2015-09-12T23:00:00.000Z  #pt.wikipedia                Bem-vindo (usando [[WP:H|Huggle]])  (3.1.16)  ...            0      2588             0                Mobyduck
> 852  2015-09-12T23:00:00.000Z  #pt.wikipedia           adição de informação, renovação de conteúdos e...  ...            0      3666             0           Templarius 01
> 853  2015-09-12T23:00:00.000Z  #ru.wikipedia           [[ВП:←|←]] Новая страница: «{{редактирую|~~~~|...  ...            0      6766             0                 Dulamas
> 854  2015-09-12T23:00:00.000Z  #ru.wikipedia     Tver  [[ВП:×|отмена]] правки 73302711 участника [[Sp...  ...            0      9302             0            94.241.56.71
> 855  2015-09-12T23:00:00.000Z  #sr.wikipedia           Нова страница: [[Датотека:US Open.svg|десно|20...  ...            0     38443             0               Самарџија
> [856 rows x 21 columns]
> {code}
> {code:title=Pinot example}
> $ airflow connections list
> (snip)
> ├────────────────────────────────┼─────────────────────────────┼───────────────────────────┼────────┼────────────────┼──────────────────────┼────────────────────────────────┤
> │ 'pinot_broker_default'         │ 'pinot_broker_conn_id'      │ 'localhost'               │ 8000   │ False          │ True                 │ 'gAAAAABdrxRj...Afd51PZY94nfa' │
> ├────────────────────────────────┼─────────────────────────────┼───────────────────────────┼────────┼────────────────┼──────────────────────┼────────────────────────────────┤
> $ ipython
> (snip)
> In [2]: from airflow.contrib.hooks.pinot_hook import PinotDbApiHook
> In [3]: PinotDbApiHook().get_pandas_df("select sum('runs') from baseballStats where yearID>=%(num)d group by playerName", {"num": 2000})
> [2019-10-23 23:31:06,058] {base_hook.py:89} INFO - Using connection to: id: pinot_broker_default. Host: localhost, Port: 8000, Schema: None, Login: None, Password: None, extra: {'endpoint': '/query', 'schema': 'http'}
> [2019-10-23 23:31:06,059] {pinot_hook.py:48} INFO - Get the connection to pinot broker on localhost
> select sum('runs') from baseballStats where yearID>=2000 group by playerName
> Out[3]: 
>            playerName    sum_runs
> 0              Adrian  1820.00000
> 1        Jose Antonio  1692.00000
> 2              Rafael  1565.00000
> 3       Brian Michael  1500.00000
> 4        Jose Alberto  1426.00000
> 5  Alexander Emmanuel  1426.00000
> 6     Derek Sanderson  1390.00000
> 7              Carlos  1314.00000
> 8        Johnny David  1300.00000
> 9              Ichiro  1261.00000
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)