Posted to issues-all@impala.apache.org by "Aman Sinha (Jira)" <ji...@apache.org> on 2020/11/09 17:48:00 UTC

[jira] [Created] (IMPALA-10314) Planning time for simple SELECT with LIMIT could be improved

Aman Sinha created IMPALA-10314:
-----------------------------------

             Summary: Planning time for simple SELECT with LIMIT could be improved
                 Key: IMPALA-10314
                 URL: https://issues.apache.org/jira/browse/IMPALA-10314
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
    Affects Versions: Impala 3.4.0
            Reporter: Aman Sinha
            Assignee: Aman Sinha


Consider a table t1 with the following characteristics:
{noformat}
HDFS, Parquet format, external table
number of partitions in t1 : 39000 (2-level partitioning)
number of columns : 72
number of files : 350000
{noformat}

The planning time for the following query, which has a LIMIT but no ORDER BY, is fairly long:
{noformat}
select * from t1 limit 10;

Query Compilation: 4s411ms
   - Single node plan created: 3s812ms (3s259ms)
{noformat}

The bulk of the time is spent in HdfsScanNode.computeScanRangeLocations(), which iterates over all the partitions and the file descriptors within each partition to assign scan ranges based on data affinity. For trivial LIMIT queries, especially those with small LIMIT values, we should look at ways to reduce the planning time.
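
To illustrate why the full iteration is costly and what one possible shortcut might look like, here is a minimal Java sketch. It is not Impala's code: ScanRangeSketch, FileDesc, Partition, collectAll, collectForLimit and buildTable are hypothetical names, and the sketch leaves out the data-affinity assignment entirely. The early exit relies on the assumption that every non-empty file yields at least one row, which may not hold in practice, and it would only be valid for a query with no predicates.

```java
import java.util.ArrayList;
import java.util.List;

public class ScanRangeSketch {
    // Hypothetical stand-ins for planner metadata; not Impala's actual classes.
    static final class FileDesc {
        final String path;
        final long length;
        FileDesc(String path, long length) { this.path = path; this.length = length; }
    }

    static final class Partition {
        final List<FileDesc> files = new ArrayList<>();
    }

    // Number of file descriptors visited by the most recent collect call.
    static long visited;

    // Baseline: visit every file descriptor in every partition.
    static List<FileDesc> collectAll(List<Partition> partitions) {
        visited = 0;
        List<FileDesc> ranges = new ArrayList<>();
        for (Partition p : partitions) {
            for (FileDesc fd : p.files) {
                visited++;
                ranges.add(fd);
            }
        }
        return ranges;
    }

    // Early exit: stop once `limit` non-empty files have been collected.
    // Assumption (not from the source): each non-empty file has at least one row.
    static List<FileDesc> collectForLimit(List<Partition> partitions, long limit) {
        visited = 0;
        List<FileDesc> ranges = new ArrayList<>();
        for (Partition p : partitions) {
            for (FileDesc fd : p.files) {
                if (ranges.size() >= limit) return ranges;
                visited++;
                if (fd.length > 0) ranges.add(fd);
            }
        }
        return ranges;
    }

    // Builds a synthetic table with uniformly sized, non-empty files.
    static List<Partition> buildTable(int numPartitions, int filesPerPartition) {
        List<Partition> partitions = new ArrayList<>(numPartitions);
        for (int i = 0; i < numPartitions; i++) {
            Partition p = new Partition();
            for (int j = 0; j < filesPerPartition; j++) {
                p.files.add(new FileDesc("p" + i + "/f" + j + ".parq", 1024L));
            }
            partitions.add(p);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // 39000 partitions x 9 files = 351000 files, close to the table described above.
        List<Partition> t1 = buildTable(39000, 9);

        collectAll(t1);
        System.out.println("full iteration visited: " + visited);

        collectForLimit(t1, 10);
        System.out.println("limit 10 visited:       " + visited);
    }
}
```

In this synthetic setup the baseline touches all 351000 file descriptors, while the early-exit variant stops after 10. Whether such a shortcut is safe in the real planner depends on details not covered here, such as empty or filtered files and how scan ranges are distributed across hosts.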
