Posted to commits@airflow.apache.org by "Hongbo Zeng (JIRA)" <ji...@apache.org> on 2016/06/25 16:28:37 UTC

[jira] [Closed] (AIRFLOW-118) use targetPartitionSize as the default partition spec for HiveToDruidTransfer operator

     [ https://issues.apache.org/jira/browse/AIRFLOW-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hongbo Zeng closed AIRFLOW-118.
-------------------------------
    Assignee: Hongbo Zeng

> use targetPartitionSize as the default partition spec for HiveToDruidTransfer operator 
> ---------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-118
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-118
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: operators
>            Reporter: Hongbo Zeng
>            Assignee: Hongbo Zeng
>
> The definitions of the two partition specs can be found at http://druid.io/docs/latest/ingestion/batch-ingestion.html.
> Originally, HiveToDruidTransfer used numShards. The disadvantage is that users have to tune the shard count by hand, and retune it whenever the data size changes. This does not scale as the number of data sources grows. The targetPartitionSize approach calculates the number of segments automatically and is hassle-free.
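
As a rough illustration of the difference between the two specs, the sketch below builds a Druid hashed partitionsSpec in either style. The helper names (build_partitions_spec, estimate_shards) and the 5,000,000-row default are assumptions made for this example, not the operator's actual code; the shard estimate is simplified arithmetic, whereas Druid derives the real count from its own row-count estimation during ingestion.

```python
import json
import math


def build_partitions_spec(target_partition_size=5000000, num_shards=None):
    """Return a hashed partitionsSpec dict (hypothetical helper).

    If num_shards is given, the shard count is fixed by the caller.
    Otherwise targetPartitionSize is used and Druid picks the count.
    The 5,000,000 default is an illustrative assumption.
    """
    if num_shards is not None:
        return {"type": "hashed", "numShards": num_shards}
    return {"type": "hashed", "targetPartitionSize": target_partition_size}


def estimate_shards(row_count, target_partition_size):
    """Simplified estimate of how many shards a target size implies."""
    return max(1, math.ceil(row_count / target_partition_size))


if __name__ == "__main__":
    # Fixed shard count: must be retuned by hand as data grows.
    print(json.dumps(build_partitions_spec(num_shards=4)))
    # Target size: the shard count follows the data volume.
    print(json.dumps(build_partitions_spec()))
    for rows in (3000000, 12000000, 48000000):
        print(rows, estimate_shards(rows, 5000000))
```

With a target size, a data source that grows from 3 million to 48 million rows simply ends up with more shards, while a fixed numShards value would have to be revisited for each data source.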



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)