Posted to commits@airflow.apache.org by "Daniel Imberman (Jira)" <ji...@apache.org> on 2020/03/29 15:32:00 UTC
[jira] [Commented] (AIRFLOW-601) Airflow's Hive integration doesn't
scale up to tables with more than 32,767 partitions (and this is really
easy to fix)
[ https://issues.apache.org/jira/browse/AIRFLOW-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070394#comment-17070394 ]
Daniel Imberman commented on AIRFLOW-601:
-----------------------------------------
This issue has been moved to https://github.com/apache/airflow/issues/7963
> Airflow's Hive integration doesn't scale up to tables with more than 32,767 partitions (and this is really easy to fix)
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: AIRFLOW-601
> URL: https://issues.apache.org/jira/browse/AIRFLOW-601
> Project: Apache Airflow
> Issue Type: Bug
> Components: hooks
> Reporter: Michael MacFadden
> Priority: Major
> Labels: hive, hive-hooks
>
> The Hive metastore API has a rather confusing method signature for {{listPartitions}}. The last method parameter specifies the maximum number of partitions to return, and its type is a Java short. So Airflow passes the maximum Java short value (32,767) and notes the limitation in its API docs:
> https://github.com/apache/incubator-airflow/blob/92064398c4c982a310925da376745a1713bf96e2/airflow/hooks/hive_hooks.py#L497-L499
> *However*, if you pass the magic number -1 as the "limit", then the metastore API will return *all* partitions. I found this documented here:
> https://issues.cloudera.org/browse/IMPALA-749
> I've also confirmed this myself on a Hive table with 80,000+ partitions.
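
The difference between the two limit values described above can be sketched in Python. This is a minimal illustration only: FakeMetastoreClient, list_all_partitions, and the "default"/"events" names are hypothetical stand-ins, not the real Hive metastore client or Airflow hook, and the stub merely imitates the behavior the report describes (a limit that must fit in a Java short, with -1 meaning "return everything").

```python
JAVA_SHORT_MAX = 32767  # largest value a signed 16-bit Java short can hold
NO_LIMIT = -1           # per the report, the metastore treats -1 as "return all"


class FakeMetastoreClient:
    """Hypothetical stand-in for a metastore client (not a real Hive/Airflow class)."""

    def __init__(self, partition_count):
        # Generate dummy partition names such as "ds=000042".
        self._partitions = ["ds=%06d" % i for i in range(partition_count)]

    def get_partitions(self, db_name, tbl_name, max_parts):
        # Imitate the Java short parameter: reject values outside the 16-bit range.
        if not -32768 <= max_parts <= JAVA_SHORT_MAX:
            raise ValueError("max_parts must fit in a Java short")
        # The report only documents -1; treating every negative value as
        # "no limit" is a simplification made by this sketch.
        if max_parts < 0:
            return list(self._partitions)
        return self._partitions[:max_parts]


def list_all_partitions(client, db_name, tbl_name):
    """Ask for every partition by passing -1 instead of the short maximum."""
    return client.get_partitions(db_name, tbl_name, max_parts=NO_LIMIT)


client = FakeMetastoreClient(partition_count=80000)
capped = client.get_partitions("default", "events", max_parts=JAVA_SHORT_MAX)
everything = list_all_partitions(client, "default", "events")
print(len(capped), len(everything))
```

With the stub above, the capped call silently drops every partition past the 32,767th, while the -1 call returns all 80,000. Whether a real metastore behaves the same way should be checked against the deployed Hive version.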
--
This message was sent by Atlassian Jira
(v8.3.4#803005)