Posted to issues-all@impala.apache.org by "Aman Sinha (Jira)" <ji...@apache.org> on 2020/11/09 17:48:00 UTC
[jira] [Created] (IMPALA-10314) Planning time for simple SELECT with LIMIT could be improved
Aman Sinha created IMPALA-10314:
-----------------------------------
Summary: Planning time for simple SELECT with LIMIT could be improved
Key: IMPALA-10314
URL: https://issues.apache.org/jira/browse/IMPALA-10314
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Affects Versions: Impala 3.4.0
Reporter: Aman Sinha
Assignee: Aman Sinha
Consider a table t1 with the following characteristics:
{noformat}
HDFS, Parquet format, external table
number of partitions in t1 : 39000 (2 level partitioning)
number of columns : 72
number of files : 350000
{noformat}
The planning time for the following query, which has a LIMIT but no ORDER BY, is fairly long:
{noformat}
select * from t1 limit 10;
Query Compilation: 4s411ms
- Single node plan created: 3s812ms (3s259ms)
{noformat}
The bulk of the time is spent in HdfsScanNode.computeScanRangeLocations(), which iterates over all the partitions and the file descriptors within them to assign scan ranges based on data affinity. For trivial LIMIT queries, especially those with small LIMIT values, we should look at ways to improve the planning time.
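One possible direction is to stop walking partitions once enough files have been collected to plausibly satisfy the LIMIT. The sketch below is only an illustration written in Python (Impala's frontend is Java) and is not Impala's actual code: Partition, FileDescriptor, make_table, collect_scan_files and the max_files parameter are all hypothetical names. It also assumes that every selected file holds at least one row, which a real planner could not take for granted without additional checks.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileDescriptor:
    """Hypothetical stand-in for the metadata of one data file."""
    path: str


@dataclass
class Partition:
    """Hypothetical stand-in for one table partition."""
    name: str
    files: List[FileDescriptor] = field(default_factory=list)


def make_table(num_partitions: int, files_per_partition: int) -> List[Partition]:
    """Build a synthetic table layout for experimentation."""
    return [
        Partition(
            name=f"p{i}",
            files=[FileDescriptor(f"p{i}/f{j}") for j in range(files_per_partition)],
        )
        for i in range(num_partitions)
    ]


def collect_scan_files(
    partitions: List[Partition], max_files: Optional[int] = None
) -> List[FileDescriptor]:
    """Collect the files that scan ranges would be generated for.

    With max_files=None every partition and file is visited, which mirrors
    the exhaustive iteration described above. With a cap, the walk returns
    as soon as max_files files have been selected.
    """
    if max_files is not None and max_files < 1:
        raise ValueError("max_files must be at least 1")
    selected: List[FileDescriptor] = []
    for partition in partitions:
        for fd in partition.files:
            selected.append(fd)
            if max_files is not None and len(selected) >= max_files:
                return selected  # early exit: remaining partitions are never touched
    return selected


if __name__ == "__main__":
    table = make_table(num_partitions=100, files_per_partition=9)
    print(len(collect_scan_files(table)))                # exhaustive walk
    print(len(collect_scan_files(table, max_files=10)))  # capped walk
```

In this sketch the cap would be derived from the LIMIT value (for example, one file per requested row as a conservative upper bound), and the LIMIT itself would still have to be enforced at execution time. Such a shortcut only makes sense for queries without filter predicates or ORDER BY, since otherwise the first few files may not contain the rows that belong in the result.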