Posted to issues-all@impala.apache.org by "Csaba Ringhofer (Jira)" <ji...@apache.org> on 2023/09/29 10:57:00 UTC

[jira] [Updated] (IMPALA-12476) Single thread permission check can bottleneck table loading

     [ https://issues.apache.org/jira/browse/IMPALA-12476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Csaba Ringhofer updated IMPALA-12476:
-------------------------------------
    Summary: Single thread permission check can bottleneck table loading  (was: Single thread permission check can bottleneck table loadin)

> Single thread permission check can bottleneck table loading
> -----------------------------------------------------------
>
>                 Key: IMPALA-12476
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12476
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Csaba Ringhofer
>            Priority: Major
>
> Partitioned tables use multiple threads to list files in different partitions, but access permission checks are done before this on a single thread. IMPALA-7320 optimized this for full table loads (more precisely, when a high percentage of partitions have changed), but in some cases we still do namenode RPCs on a single thread to get the access level:
> 1. as mentioned above, if only a small subset of partitions has changed
> 2. if the path has an ACL (access control list), then after getting the file status an extra getAclStatus RPC is done, leading to partition_count RPCs on a single thread if ACLs are enabled for all partitions
> 3. if an error occurs in the optimized path
> Case 1 is especially problematic for metastore event processing, as partition events will often change only a subset of partitions. Even if all partitions are changed, catalogd may not process them in one batch (see IMPALA-12463), so the unoptimized path is chosen for several smaller batches.
> Apart from the optimization in IMPALA-7320, there is no good reason to do the access level check on a single thread, so both cases 1 and 2 could be made faster by moving the check to the multithreaded stage of table loading.
> Note that it is also questionable whether all these access permission checks are really needed, see IMPALA-12472.
> An anomaly caused by doing these checks on a single thread is that the effect of the max_hdfs_partitions_parallel_load flag can be ambiguous: while it can significantly speed up loading tables with multiple partitions, if the namenode (or the thread that communicates with the namenode) is contended, parallel loads will get an unfair share of the limited resources, so tables where a large amount of work is done on a single thread can actually get slower.
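
The proposal in the description, moving the per-partition access level check off the single thread and into a multithreaded stage, can be illustrated with a minimal Java sketch. This is not Impala's code: the class ParallelAccessCheck, the AccessLevel enum, the checkAll method and the 'lookup' function are all hypothetical names chosen for illustration. 'lookup' stands in for whatever namenode round trips the real check needs (for example a file status call followed by getAclStatus when the path has an ACL), and 'parallelism' only plays a role loosely comparable to max_hdfs_partitions_parallel_load.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Illustrative sketch only; none of these names come from the Impala code base.
class ParallelAccessCheck {
  // Hypothetical stand-in for a per-partition access level.
  enum AccessLevel { NONE, READ_ONLY, READ_WRITE }

  // Resolves the access level of every partition path on a bounded thread pool
  // instead of looping over the partitions on the calling thread.
  static Map<String, AccessLevel> checkAll(List<String> partitionPaths,
      int parallelism, Function<String, AccessLevel> lookup) {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, parallelism));
    try {
      // Submit one task per partition; each task may issue several RPCs.
      List<Future<AccessLevel>> pending = new ArrayList<>();
      for (String path : partitionPaths) {
        Callable<AccessLevel> task = () -> lookup.apply(path);
        pending.add(pool.submit(task));
      }
      // Collect results in the original partition order.
      Map<String, AccessLevel> levels = new LinkedHashMap<>();
      for (int i = 0; i < pending.size(); i++) {
        levels.put(partitionPaths.get(i), pending.get(i).get());
      }
      return levels;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while checking access levels", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Access level lookup failed", e.getCause());
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    List<String> paths = List.of("/tbl/p=1", "/tbl/p=2", "/tbl/p=3");
    // The lambda fakes the namenode lookup; a real one would perform RPCs.
    Map<String, AccessLevel> levels = checkAll(paths, 2,
        p -> p.endsWith("=2") ? AccessLevel.READ_ONLY : AccessLevel.READ_WRITE);
    System.out.println(levels);
  }
}
```

With this shape, a small batch of changed partitions (case 1) and partitions that need an extra ACL lookup (case 2) both share the same pool, so the number of sequential RPCs on any one thread is roughly partition_count divided by the pool size rather than partition_count.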


