Posted to issues-all@impala.apache.org by "Gabor Kaszab (Jira)" <ji...@apache.org> on 2023/06/28 13:07:00 UTC

[jira] [Created] (IMPALA-12251) Table migration to run on multiple partitions in parallel

Gabor Kaszab created IMPALA-12251:
-------------------------------------

             Summary: Table migration to run on multiple partitions in parallel
                 Key: IMPALA-12251
                 URL: https://issues.apache.org/jira/browse/IMPALA-12251
             Project: IMPALA
          Issue Type: New Feature
          Components: Frontend
            Reporter: Gabor Kaszab


https://issues.apache.org/jira/browse/IMPALA-11013 introduces table migration from legacy Hive tables to Iceberg tables. The parallelization in that patch is based on files within a partition. But if there are many partitions with only a few files in each, this approach does not perform well.

Instead, as an improvement, we can also implement parallelization based on partitions and then decide which of the two to use based on the ratio (# partitions) / (avg # of files in a partition).
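
To illustrate the proposed decision, here is a minimal sketch in Python. It is not Impala code: the function name choose_parallelism, the ratio_threshold parameter and its default of 1.0, and the handling of empty inputs are all assumptions made for illustration, since this issue does not specify a threshold.

```python
from statistics import mean


def choose_parallelism(files_per_partition, ratio_threshold=1.0):
    """Pick a parallelization strategy for table migration.

    files_per_partition: list with the file count of each partition.
    ratio_threshold: hypothetical cut-off, not taken from the issue.
    Returns "partitions" or "files".
    """
    num_partitions = len(files_per_partition)
    if num_partitions == 0:
        # Nothing to spread across partitions; keep the existing behaviour.
        return "files"

    avg_files = mean(files_per_partition)
    if avg_files == 0:
        # Only empty partitions: per-file parallelism has nothing to do.
        return "partitions"

    # Ratio from the issue: # partitions / avg # of files in a partition.
    ratio = num_partitions / avg_files
    return "partitions" if ratio > ratio_threshold else "files"
```

With this sketch, a table with 1000 partitions of 2 files each gives a ratio of 500 and selects "partitions", while a table with 3 partitions of several hundred files each gives a ratio well below 1 and selects "files".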


