Posted to commits@airflow.apache.org by "Chris Riccomini (JIRA)" <ji...@apache.org> on 2016/05/16 20:18:12 UTC

[jira] [Commented] (AIRFLOW-118) use targetPartitionSize as the default partition spec for HiveToDruidTransfer operator

    [ https://issues.apache.org/jira/browse/AIRFLOW-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285213#comment-15285213 ] 

Chris Riccomini commented on AIRFLOW-118:
-----------------------------------------

Was there a PR for this?

> use targetPartitionSize as the default partition spec for HiveToDruidTransfer operator 
> ---------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-118
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-118
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: operators
>            Reporter: Hongbo Zeng
>
> The definitions of the two partition specs can be found at http://druid.io/docs/latest/ingestion/batch-ingestion.html.
> HiveToDruidTransfer was originally written to use numShards. The disadvantage is that users need to tune the number repeatedly, and tune it again whenever the data size changes. This does not scale as the number of data sources grows. The targetPartitionSize approach calculates the number of segments automatically and is hassle-free.
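
For readers comparing the two options in the description above, here is a minimal sketch in Python of what each hashed partition spec might look like, plus a rough illustration of why targetPartitionSize needs no retuning. The helper names build_partitions_spec and estimated_segments are hypothetical and are not part of Airflow or Druid; the row counts are made-up inputs, and the ceiling division only approximates how Druid sizes segments.

```python
import math


def build_partitions_spec(target_partition_size=None, num_shards=None):
    """Build a hashed partitionsSpec dict (hypothetical helper, not Airflow's API)."""
    # Exactly one of the two options should be chosen.
    if (target_partition_size is None) == (num_shards is None):
        raise ValueError("set exactly one of target_partition_size or num_shards")
    spec = {"type": "hashed"}
    if target_partition_size is not None:
        # Target number of rows per partition; the shard count is derived from it.
        spec["targetPartitionSize"] = target_partition_size
    else:
        # Fixed shard count; has to be retuned by hand as the data grows.
        spec["numShards"] = num_shards
    return spec


def estimated_segments(row_count, target_partition_size):
    """Approximate shard count so each shard holds about target_partition_size rows."""
    return max(1, math.ceil(row_count / target_partition_size))


if __name__ == "__main__":
    print(build_partitions_spec(target_partition_size=5_000_000))
    print(build_partitions_spec(num_shards=4))
    # The same target keeps working as the (assumed) row count grows.
    for rows in (1_000_000, 12_000_000, 60_000_000):
        print(rows, estimated_segments(rows, 5_000_000))
```

With a fixed numShards, the value chosen for the smallest data source would have to be revisited for the largest one; with a row target, the shard count follows the data size.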



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)