You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "roland (Jira)" <ji...@apache.org> on 2024/04/16 02:44:00 UTC

[jira] [Created] (FLINK-35118) StreamingHiveSource cannot track tables that have more than 32,767 partitions

roland created FLINK-35118:
------------------------------

             Summary: StreamingHiveSource cannot track tables that have more than 32,767 partitions
                 Key: FLINK-35118
                 URL: https://issues.apache.org/jira/browse/FLINK-35118
             Project: Flink
          Issue Type: Bug
          Components: Connectors / Hive
    Affects Versions: 1.19.0
            Reporter: roland


*Description:*

The Streaming Hive Source cannot track tables that have more than 32,767 partitions.

 *Root Cause:*

The Streaming Hive Source uses the following lines to get all partitions of a table:

([git hub link|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/read/HivePartitionFetcherContextBase.java#L130])

HivePartitionFetcherContextBase.java:
{code:java}
    @Override
    public List<ComparablePartitionValue> getComparablePartitionValueList() throws Exception {
        List<ComparablePartitionValue> partitionValueList = new ArrayList<>();
        switch (partitionOrder) {
            case PARTITION_NAME:
                List<String> partitionNames =
                        metaStoreClient.listPartitionNames(
                                tablePath.getDatabaseName(),
                                tablePath.getObjectName(),
                                Short.MAX_VALUE);
                for (String partitionName : partitionNames) {
                    partitionValueList.add(getComparablePartitionByName(partitionName));
                }
                break;
            case CREATE_TIME:
                Map<List<String>, Long> partValuesToCreateTime = new HashMap<>();
                partitionNames =
                        metaStoreClient.listPartitionNames(
                                tablePath.getDatabaseName(),
                                tablePath.getObjectName(),
                                Short.MAX_VALUE); {code}
Where the `metaStoreClient` is a wrapper of the `IMetaStoreClient`, and the function `listPartitionNames` can only list no more than `Short.MAX_VALUE` partitions, which is 32,767.

 

For tables that have more partitions, the source fails to track new partitions and read from it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)