Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/03/14 18:47:01 UTC
[jira] [Commented] (IMPALA-6625) Skip dictionary and collection conjunct assignment for non-Parquet scans.

    [ https://issues.apache.org/jira/browse/IMPALA-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792958#comment-16792958 ] 

ASF subversion and git services commented on IMPALA-6625:
---------------------------------------------------------

Commit aaca453859762f956c077c93454e7bf34e9ed028 in impala's branch refs/heads/2.x from poojanilangekar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=aaca453 ]

IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans

This change ensures that the planner computes parquet conjuncts
only for scans containing parquet files. It also handles the
PARQUET_DICTIONARY_FILTERING and PARQUET_READ_STATISTICS query
options in the planner.

Testing was carried out independently on parquet and non-parquet
scans:
  1. Parquet scans were tested via the existing parquet-filtering
     planner test. Additionally, a new test
     [parquet-filtering-disabled] was added to ensure that the
     explain plan generated skips parquet predicates based on the
     query options.
  2. Non-parquet scans were tested manually to ensure that the
     functions to compute parquet conjuncts were not invoked.
     Additional test cases were added to the parquet-filtering
     planner test to scan non-parquet tables and ensure that the
     plans do not contain conjuncts based on parquet statistics.
  3. A parquet partition was added to the alltypesmixedformat
     table in the functional database. Planner tests were added
     to ensure that Parquet conjuncts are constructed only when
     the Parquet partition is included in the query.

Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Reviewed-on: http://gerrit.cloudera.org:8080/10704
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-on: http://gerrit.cloudera.org:8080/12741
Reviewed-by: Tim Armstrong <ta...@cloudera.com>


> Skip dictionary and collection conjunct assignment for non-Parquet scans.
> -------------------------------------------------------------------------
>
>                 Key: IMPALA-6625
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6625
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
>            Reporter: Alexander Behm
>            Assignee: Pooja Nilangekar
>            Priority: Critical
>              Labels: perf, planner
>
> In HdfsScanNode.init() we try to assign dictionary and collection conjuncts even for non-Parquet scans. Such predicates only make sense for Parquet scans, so there is no point in collecting them for other scans.
> The current behavior is undesirable because:
> * init() can be substantially slower because assigning dictionary filters may involve evaluating exprs in the BE (backend), which can be expensive
> * the explain plan of non-Parquet scans may have a section "parquet dictionary predicates" which is confusing/misleading
> Relevant code snippet from HdfsScanNode:
> {code}
> @Override
>   public void init(Analyzer analyzer) throws ImpalaException {
>     conjuncts_ = orderConjunctsByCost(conjuncts_);
>     checkForSupportedFileFormats();
>     assignCollectionConjuncts(analyzer);
>     computeDictionaryFilterConjuncts(analyzer);
>     // compute scan range locations with optional sampling
>     Set<HdfsFileFormat> fileFormats = computeScanRangeLocations(analyzer);
> ...
>     if (fileFormats.contains(HdfsFileFormat.PARQUET)) { // <--- assignment should go in here
>       computeMinMaxTupleAndConjuncts(analyzer);
>     }
> ...
> }
> {code}
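
The reordering proposed in the description above can be sketched as follows. This is a minimal, self-contained illustration, not the actual Impala patch: ScanInitSketch, FileFormat, Options, and planParquetSteps are hypothetical stand-ins, and the pairing of each query option with a particular step is an assumption based on the option names in the commit message. The method only records which Parquet-specific steps would run; it does not call any Impala code.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class ScanInitSketch {
  // Hypothetical stand-in for HdfsFileFormat; only a few values are modeled.
  enum FileFormat { PARQUET, TEXT, AVRO }

  // Hypothetical holder for the two query options named in the commit message.
  static final class Options {
    final boolean parquetDictionaryFiltering;
    final boolean parquetReadStatistics;

    Options(boolean parquetDictionaryFiltering, boolean parquetReadStatistics) {
      this.parquetDictionaryFiltering = parquetDictionaryFiltering;
      this.parquetReadStatistics = parquetReadStatistics;
    }
  }

  // Returns the names of the Parquet-specific steps that would run for a scan
  // over the given file formats. Nothing is returned for non-Parquet scans.
  static List<String> planParquetSteps(Set<FileFormat> fileFormats, Options opts) {
    List<String> steps = new ArrayList<>();
    if (!fileFormats.contains(FileFormat.PARQUET)) {
      return steps; // skip all Parquet-only work
    }
    steps.add("assignCollectionConjuncts");
    // Assumption: each option gates the step its name suggests.
    if (opts.parquetDictionaryFiltering) {
      steps.add("computeDictionaryFilterConjuncts");
    }
    if (opts.parquetReadStatistics) {
      steps.add("computeMinMaxTupleAndConjuncts");
    }
    return steps;
  }

  public static void main(String[] args) {
    Options allOn = new Options(true, true);
    // Text-only scan: no Parquet steps are planned.
    System.out.println(planParquetSteps(EnumSet.of(FileFormat.TEXT), allOn));
    // Mixed-format scan that includes a Parquet partition: all steps are planned.
    System.out.println(
        planParquetSteps(EnumSet.of(FileFormat.TEXT, FileFormat.PARQUET), allOn));
    // Parquet scan with dictionary filtering disabled: that step is omitted.
    System.out.println(
        planParquetSteps(EnumSet.of(FileFormat.PARQUET), new Options(false, true)));
  }
}
```

The mixed-format case mirrors the alltypesmixedformat test described in the commit message: Parquet conjuncts are planned only when a Parquet partition is part of the scan.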


