Posted to commits@airflow.apache.org by "Chris Riccomini (JIRA)" <ji...@apache.org> on 2016/06/30 20:01:10 UTC

[jira] [Updated] (AIRFLOW-243) Use a more efficient Thrift call for HivePartitionSensor

     [ https://issues.apache.org/jira/browse/AIRFLOW-243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Riccomini updated AIRFLOW-243:
------------------------------------
    Affects Version/s:     (was: Airflow 2.0)
                       Airflow 1.7.1.3

> Use a more efficient Thrift call for HivePartitionSensor
> --------------------------------------------------------
>
>                 Key: AIRFLOW-243
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-243
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: operators
>    Affects Versions: Airflow 1.7.1.3
>            Reporter: Paul Yang
>            Assignee: Li Xuanji
>            Priority: Minor
>             Fix For: Airflow 1.8
>
>
> The {{HivePartitionSensor}} uses the `get_partitions_by_filter` Thrift call, which can generate expensive SQL queries against the metastore DB for tables that have many partitions and are partitioned by multiple keys. We've seen our metastore DB get hammered by these sensors, resulting in service degradation for other metastore users.
> The {{MetastorePartitionSensor}} is efficient, but it can result in too many connections to the metastore DB.
> An alternative is to use the `get_partition_by_name` Thrift call, which translates into more efficient SQL queries. Because connections are pooled on the Thrift server, the DB won't get overloaded the way it can with the {{MetastorePartitionSensor}}. The semantics of the arguments will change (an exact partition name instead of a filter expression), so either a new argument needs to be introduced or a new operator needs to be created.
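
For illustration, here is a minimal Python sketch of that argument change. It assumes Hive's usual `key=value/key=value` partition-name format. `get_partitions_by_filter` takes a filter expression, while `get_partition_by_name` takes an exact partition name whose keys must appear in the table's partition-column order. The helpers `to_partition_name`, `to_filter` and `poke_by_name` are hypothetical names, as are the `FakeMetastore` and local `NoSuchObjectException` stand-ins. None of them are Airflow or Hive APIs. Only the `get_partition_by_name(db, table, name)` call shape mirrors the metastore Thrift interface.

```python
from typing import Mapping, Sequence


class NoSuchObjectException(Exception):
    """Hypothetical local stand-in for the metastore's Thrift NoSuchObjectException."""


def to_partition_name(spec: Mapping[str, str], keys: Sequence[str]) -> str:
    """Build a Hive-style partition name, e.g. 'ds=2016-06-30/hr=00'.

    `keys` must follow the table's partition-column order, because the
    metastore matches names as exact strings. Values are assumed to need
    no escaping (Hive escapes characters such as '/' in real names).
    """
    missing = [k for k in keys if k not in spec]
    if missing:
        raise ValueError(f"missing partition keys: {missing}")
    return "/".join(f"{k}={spec[k]}" for k in keys)


def to_filter(spec: Mapping[str, str]) -> str:
    """Build the equivalent filter expression for get_partitions_by_filter."""
    return " AND ".join(f"{k}='{v}'" for k, v in spec.items())


def poke_by_name(client, db: str, table: str, name: str) -> bool:
    """Return True if the named partition exists.

    `client` is assumed to expose get_partition_by_name(db, table, name),
    raising NoSuchObjectException when the partition is absent.
    """
    try:
        client.get_partition_by_name(db, table, name)
    except NoSuchObjectException:
        return False
    return True


class FakeMetastore:
    """Hypothetical in-memory stand-in for a Thrift client, for illustration only."""

    def __init__(self, partitions):
        # Each entry is a (db, table, partition_name) tuple.
        self._partitions = set(partitions)

    def get_partition_by_name(self, db, table, name):
        if (db, table, name) not in self._partitions:
            raise NoSuchObjectException(name)
        return {"db": db, "table": table, "name": name}


if __name__ == "__main__":
    keys = ["ds", "hr"]
    spec = {"ds": "2016-06-30", "hr": "00"}
    name = to_partition_name(spec, keys)
    print(name)             # ds=2016-06-30/hr=00
    print(to_filter(spec))  # ds='2016-06-30' AND hr='00'
    ms = FakeMetastore([("default", "events", name)])
    print(poke_by_name(ms, "default", "events", name))                   # True
    print(poke_by_name(ms, "default", "events", "ds=2016-07-01/hr=00"))  # False
```

A real sensor would pass an actual metastore Thrift client instead of `FakeMetastore` and catch the exception raised by the generated Thrift bindings. Partition values containing characters that Hive escapes in partition names are not handled in this sketch.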



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)