Posted to issues-all@impala.apache.org by "Gabor Kaszab (Jira)" <ji...@apache.org> on 2023/06/28 13:07:00 UTC
[jira] [Created] (IMPALA-12251) Table migration to run on multiple partitions in parallel
Gabor Kaszab created IMPALA-12251:
-------------------------------------
Summary: Table migration to run on multiple partitions in parallel
Key: IMPALA-12251
URL: https://issues.apache.org/jira/browse/IMPALA-12251
Project: IMPALA
Issue Type: New Feature
Components: Frontend
Reporter: Gabor Kaszab
https://issues.apache.org/jira/browse/IMPALA-11013 introduces table migration from legacy Hive tables to Iceberg tables. The parallelization in that patch is based on files within a partition. If a table has many partitions with only a few files in each, this approach does not perform well.
Instead, as an improvement, we can implement parallelization based on partitions and then decide which of the two approaches to use based on the ratio of the number of partitions to the average number of files per partition.
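
To illustrate the proposed decision, here is a minimal sketch in Python. It is not Impala's implementation (the frontend is written in Java), and every name in it (choose_strategy, migrate, RATIO_THRESHOLD, process_file) is hypothetical. The threshold value of 1.0 is an assumption as well, since the issue does not specify a cutoff.

```python
from concurrent.futures import ThreadPoolExecutor
from enum import Enum


class Strategy(Enum):
    BY_FILE = "by_file"            # parallelize over files within one partition at a time
    BY_PARTITION = "by_partition"  # parallelize over partitions


# Hypothetical cutoff; the issue does not say where the switch should happen.
RATIO_THRESHOLD = 1.0


def choose_strategy(files_per_partition, threshold=RATIO_THRESHOLD):
    """Pick a strategy from (# partitions) / (avg # files per partition).

    files_per_partition maps a partition name to a list of file paths.
    """
    num_partitions = len(files_per_partition)
    if num_partitions == 0:
        return Strategy.BY_FILE
    total_files = sum(len(files) for files in files_per_partition.values())
    avg_files = total_files / num_partitions
    if avg_files == 0:
        # Only empty partitions: nothing to gain from file-level parallelism.
        return Strategy.BY_PARTITION
    ratio = num_partitions / avg_files
    return Strategy.BY_PARTITION if ratio > threshold else Strategy.BY_FILE


def migrate(files_per_partition, process_file, max_workers=4):
    """Apply process_file to every file using the chosen strategy."""
    strategy = choose_strategy(files_per_partition)
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        if strategy is Strategy.BY_PARTITION:
            # One task per partition; files inside a partition run sequentially.
            def process_partition(files):
                return [process_file(f) for f in files]

            for part in pool.map(process_partition, files_per_partition.values()):
                results.extend(part)
        else:
            # One partition at a time; files inside it run in parallel.
            for files in files_per_partition.values():
                results.extend(pool.map(process_file, files))
    return strategy, results
```

With this sketch, a table of 100 partitions holding one file each yields a ratio of 100 and selects the per-partition strategy, while a single partition holding 100 files yields a ratio of 0.01 and keeps the per-file strategy.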
--
This message was sent by Atlassian Jira
(v8.20.10#820010)