Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2018/09/13 16:16:00 UTC

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

    [ https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613710#comment-16613710 ] 

Aman Sinha commented on DRILL-6381:
-----------------------------------

The index planning and execution feature has been in development in a private repository at MapR. We would like to contribute it to Apache Drill and hope that it spurs further development and adoption by the community. The main ideas are described below.

The feature is divided into two broad categories:
 # A comprehensive index planning and execution framework which supports distributed covering and non-covering indexes. Index planning is done for WHERE clause filters and for sort-based operations such as ORDER BY, GROUP BY (using StreamingAggregate) and joins (using MergeJoin). The framework is intended to be agnostic to the storage plugins. It provides a clean abstraction layer that allows the Drill planner and executor to work with only core Drill artifacts while storage plugins provide concrete implementations of the interfaces.
 # A reference implementation with MapR-DB JSON plugin whose backend supports secondary indexing. Other similar DB plugins whose backend supports secondary indices could potentially use the reference implementation as a guide.

Note that Drill is a query engine; it does not provide CRUD operations on either the primary table or the indexes. These are assumed to be maintained by the respective backend servers with which Drill communicates via storage/format plugins.

+*Key design concepts:*+

*Covering index:* 

An index whose index fields plus non-index fields (a.k.a. included fields) 'cover' all the columns referenced in the query. The Drill planner will generate a covering index (a.k.a. index-only) plan where all the columns are retrieved from the index after pushing down relevant filter conditions to the index scan.

*Non-covering index:* 

An index whose index fields plus included fields only partially cover the columns referenced in the query. For instance, suppose the index is created as follows: index keys: \{a, b}, included fields: \{c}, and the query is SELECT d, e FROM T WHERE a > 10 AND b < 20. In this case, since columns d, e are not present in the index at all, this is a non-covering index. For such indexes, the Drill planner will generate a non-covering plan where only the row ids are fetched from the index by pushing down the WHERE clause filters, and the rest of the columns are fetched after a join-back to the primary table. The join-back is performed using the row ids. A related notion is that of a global index: the Drill planner assumes indexes are global in nature, i.e. the index blocks are not necessarily co-located with the primary table's data blocks. This is the most general case since an index may be quite large, and in order to fully utilize the cluster resources it is best to have it fully distributed.

*Functional index:* 

An index which is created not on the base table columns but on functions/expressions. Currently, only CAST functions have been tested since these are most commonly used in Drill views. If the filter condition is 'WHERE CAST(zip_code as BIGINT) = 95120' and a functional index exists on CAST(zip_code as BIGINT), then the Drill planner will leverage such indexes as long as they are exposed through appropriate metadata interfaces.
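
To make the matching step concrete, below is a minimal, hypothetical sketch of the idea of recognizing that a filter expression matches the expression a functional index was created on. This is not the actual planner code (which operates on Calcite expression trees), and the string-based comparison is an assumption made for illustration only.

{code:java}
// Hypothetical, simplified sketch: match a filter's left-hand expression against a
// functional index expression by comparing normalized expression strings.
// The real planning logic works on Calcite RexNode trees, not strings.
import java.util.Locale;

public class FunctionalIndexMatch {

    // Normalize whitespace and case so that equivalent CAST expressions compare equal.
    static String normalize(String expr) {
        return expr.replaceAll("\\s+", " ").trim().toLowerCase(Locale.ROOT);
    }

    // Returns true if the indexed expression is the one referenced by the filter.
    static boolean indexMatches(String filterLhs, String indexExpr) {
        return normalize(filterLhs).equals(normalize(indexExpr));
    }

    public static void main(String[] args) {
        String filterLhs = "CAST(zip_code AS BIGINT)";   // from WHERE CAST(zip_code as BIGINT) = 95120
        String indexExpr = "cast(zip_code as bigint)";   // expression the functional index was created on
        System.out.println("index usable: " + indexMatches(filterLhs, indexExpr)); // index usable: true
    }
}
{code}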

*Range partitioning:* 

This is applicable for non-covering indexes. Since the index is typically distributed across multiple nodes in the cluster, once we retrieve the row ids from a particular node, we have to send each row id to the appropriate destination node which contains the primary table data for that row. This is done through range partitioning. The Drill executor contains a special operator (RangePartitionRecordBatch) and a corresponding exchange (RangePartitionExchange) for this purpose. For example, suppose the filter condition is WHERE state IN ('CA', 'TX', 'NY'). Since the primary table data for these states may be spread out over multiple nodes, once the row ids are fetched from the index, they have to be grouped into separate 'ranges' such that co-located row ids can be sent to the same node.
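
As an illustration of the routing step, here is a small self-contained sketch. The region boundaries, node names and row id format are made up for this example; the real work is done by the RangePartitionRecordBatch operator and RangePartitionExchange, not by the map lookup shown here.

{code:java}
// Hypothetical sketch of the range-partitioning idea: given the start row keys of
// the primary table's tablets/regions and the node hosting each one, route a row id
// fetched from the index to the node that holds that row.
import java.util.TreeMap;

public class RangePartitionSketch {
    public static void main(String[] args) {
        // region start key -> destination node (assumed layout, for illustration)
        TreeMap<String, String> regionStartToNode = new TreeMap<>();
        regionStartToNode.put("A", "node1");   // rows with keys in [A, M) live on node1
        regionStartToNode.put("M", "node2");   // rows with keys in [M, T) live on node2
        regionStartToNode.put("T", "node3");   // rows with keys in [T, ...) live on node3

        String[] rowIdsFromIndex = {"CA-0017", "NY-0042", "TX-0008"};
        for (String rowId : rowIdsFromIndex) {
            // floorEntry finds the region whose start key is <= rowId, i.e. its home node.
            String node = regionStartToNode.floorEntry(rowId).getValue();
            System.out.println(rowId + " -> " + node);   // CA->node1, NY->node2, TX->node3
        }
    }
}
{code}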

*Rowkey Join and Restricted Scan (skip scan):* 

A Rowkey (a.k.a. rowid) join is used whenever we need to fetch rows from the primary table for a set of row ids retrieved from the index. Note that this is not a real join but more of a lookup based on the rowid. This is a random I/O operation on the primary table since each row id may access a separate data block from the table. The Drill executor contains a special operator (RowKeyJoinBatch) for doing the join-back. The right side of the RowKeyJoin is the index sub-plan. The left side contains a special type of scan called a Restricted Scan (or skip scan). The storage plugin must implement this scan such that a list of row ids can be submitted at a time if the backend supports bulk fetch. Otherwise, single row ids may be submitted, incurring a performance overhead. Each minor fragment will have an instance of the RowKeyJoin operator coupled with a Restricted Scan operator. The DbSubScan interface contains methods to specify this coupling.
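
The sketch below illustrates the bulk-fetch contract the restricted scan relies on. The interface and method names are hypothetical simplifications; the actual coupling is specified through DbSubScan and the plugin's restricted scan implementation.

{code:java}
// Hypothetical sketch of what a plugin-side restricted ("skip") scan has to provide:
// a lookup of primary-table rows for a batch of row ids supplied by the RowKeyJoin.
// Names and signatures below are illustrative, not Drill's real interfaces.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

interface RestrictedScanSketch {
    // Preferred path: the backend supports bulk fetch of many row ids at once.
    List<Map<String, Object>> fetchRows(List<byte[]> rowIds);

    // Fallback path: one row id at a time, which costs one random read per row.
    default List<Map<String, Object>> fetchRowsOneByOne(List<byte[]> rowIds) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (byte[] id : rowIds) {
            out.addAll(fetchRows(List.of(id)));
        }
        return out;
    }
}
{code}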

*Index Intersection:* 

Consider a filter condition 'WHERE a > 10 AND b < 20'. Suppose a single composite key index does not exist on both columns but 2 separate indexes exist on 'a' and 'b'. The Drill planner will create an index intersect plan where the row ids from each index are retrieved and intersected, and only the common row ids are used for the join-back to the primary table. The index intersection plan is costed along with the single index plans and whichever is cheaper is chosen. Note that the intersection is done through a HashJoin operator, which adds cost (CPU, memory, and network I/O if the join inputs have to be distributed), so in some cases the single index plan may turn out to be cheaper; this depends on how much the selectivity is reduced after intersection.
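
Conceptually, the intersection keeps only the row ids common to both index scans, as in the minimal sketch below. In the actual plan this is performed by a HashJoin on the row id column, with the costs noted above; the row id values here are made up.

{code:java}
// Hypothetical sketch of the intersection step: keep only the row ids returned by
// both single-column indexes, then join back to the primary table with that set.
import java.util.HashSet;
import java.util.Set;

public class IndexIntersectSketch {
    public static void main(String[] args) {
        Set<String> rowIdsFromIndexOnA = new HashSet<>(Set.of("r1", "r2", "r3", "r7")); // a > 10
        Set<String> rowIdsFromIndexOnB = new HashSet<>(Set.of("r2", "r5", "r7"));       // b < 20
        rowIdsFromIndexOnA.retainAll(rowIdsFromIndexOnB);  // hash-based intersection
        // r2 and r7 remain (iteration order not guaranteed); only these are joined back.
        System.out.println("row ids for join-back: " + rowIdsFromIndexOnA);
    }
}
{code}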

*Index Metadata:* 

The index metadata is exposed to the Drill planner through the interface IndexDescriptor (which is a derived class of IndexDefinition). The interface contains methods such as getIndexColumns(), getNonIndexColumns(), getCollation() and others which are meant to assist the Drill planner. Regarding collation (sortedness), note that range indexes provide collation based on the leading prefix of the index columns. However, hash indexes don't provide collation. The storage plugin should provide implementations of these based on the index metadata exposed by its backend server.
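
A heavily simplified sketch of the kind of metadata a plugin would expose is shown below. The real IndexDescriptor/IndexDefinition interfaces use Drill's expression and Calcite collation types rather than the plain strings and boolean used here, so treat the signatures as illustrative only.

{code:java}
// Hypothetical, heavily simplified shape of the index metadata a plugin exposes.
import java.util.List;

interface IndexDescriptorSketch {
    List<String> getIndexColumns();     // indexed key columns, in key order
    List<String> getNonIndexColumns();  // included (covered but not indexed) columns
    boolean hasCollation();             // stand-in for getCollation(): range indexes true, hash indexes false
}

// Example plugin-side implementation for an index on keys {a, b} with included field {c}.
class ExampleIndex implements IndexDescriptorSketch {
    public List<String> getIndexColumns()    { return List.of("a", "b"); }
    public List<String> getNonIndexColumns() { return List.of("c"); }
    public boolean hasCollation()            { return true; }
}
{code}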

*Statistics and Leading Rowcount:* 

Statistics such as the estimated number of rows matching a filter condition and the average row size are important parameters for cost-based index selection. The Drill planner provides an interface (o.a.d.exec.planner.index.Statistics) that contains APIs which should be implemented by the underlying storage plugin in order to expose the appropriate statistics (or provide default statistics) from the backend. For example, consider the index filter 'WHERE a > 10 AND b < 20'. The interface method getLeadingRowCount() takes a filter condition as an argument and should return the 'leading rowcount' for the condition. This is the estimated row count of the filter condition based on the leading prefix of the index columns. If the index columns are the composite key \{a, b}, then the leading prefix is \{a, b} for the above filter condition and the row count of the full conjunct should be used. However, if the index columns are \{a, c}, then only 'a' qualifies as the leading prefix, so the leading row count should be estimated for a > 10 only. Similarly, if the index columns are \{b, c}, then only b < 20 should be used for estimating the leading row count. The Drill planner will determine the leading prefix based on the filter condition and the index metadata.
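
The leading-prefix rule from the example above can be sketched as follows. This is a simplified illustration using column-name sets; getLeadingRowCount() in the real Statistics interface receives the filter condition itself rather than a set of names.

{code:java}
// Hypothetical sketch of how the "leading prefix" of an index is derived from the
// columns referenced in a conjunctive filter, mirroring the example in the text.
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class LeadingPrefixSketch {
    // Walk the index key columns in order and stop at the first one that the
    // filter does not constrain; the row count is then estimated on that prefix only.
    static List<String> leadingPrefix(List<String> indexKeys, Set<String> filterColumns) {
        List<String> prefix = new ArrayList<>();
        for (String key : indexKeys) {
            if (!filterColumns.contains(key)) break;
            prefix.add(key);
        }
        return prefix;
    }

    public static void main(String[] args) {
        Set<String> filterCols = Set.of("a", "b");                           // WHERE a > 10 AND b < 20
        System.out.println(leadingPrefix(List.of("a", "b"), filterCols));    // [a, b] -> use both conjuncts
        System.out.println(leadingPrefix(List.of("a", "c"), filterCols));    // [a]    -> estimate on a > 10 only
        System.out.println(leadingPrefix(List.of("b", "c"), filterCols));    // [b]    -> estimate on b < 20 only
    }
}
{code}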

*Index Selection:* 

The Drill planner, in conjunction with Calcite's Volcano planner, provides cost-based index selection. In addition, the Drill planner employs a heuristic to reduce the planning search space, as described in 'Selectivity and thresholds'. For each candidate index it estimates the total cost of the index access plus, for a non-covering index, the cost of the join-back to the primary table. Based on these estimates, the candidate indexes are ranked by leading selectivity, collation (sortedness) property, and whether the index is covering or non-covering. The top 5 indexes per table (this number is configurable) are chosen to be considered for plan generation. These may also include index intersections. Thus, it is possible that indexes i1 and i2 may individually not qualify based on selectivity, but their combined selectivity after intersection could be low enough to qualify. Once these indexes are chosen, there are 3 types of plans that may be generated: covering index plan, non-covering index plan and index intersection plan. The Volcano planner compares the cumulative cost of all of these plans along with the original full table scan plan and picks the cheapest plan for execution.
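
A rough sketch of the ranking-and-pruning step is given below. The fields and tie-breaking order are illustrative assumptions, not the planner's actual cost model; only the top-5 default and the ranking criteria names come from the description above.

{code:java}
// Hypothetical sketch: rank candidate indexes by leading selectivity, covering-ness
// and sort order, then keep the top N (default 5) for plan generation.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class IndexRankingSketch {
    record Candidate(String name, double leadingSelectivity, boolean covering, boolean collated) {}

    public static void main(String[] args) {
        List<Candidate> candidates = new ArrayList<>(List.of(
                new Candidate("i1", 0.20, false, true),
                new Candidate("i2", 0.01, true,  false),
                new Candidate("i3", 0.05, false, false)));

        Comparator<Candidate> rank = Comparator
                .comparingDouble(Candidate::leadingSelectivity)   // lower selectivity ranks higher
                .thenComparing(c -> !c.covering())                // covering before non-covering
                .thenComparing(c -> !c.collated());               // sorted (collated) before unsorted
        candidates.sort(rank);

        int topN = Math.min(5, candidates.size());                // default limit of 5 per table
        candidates.subList(0, topN).forEach(c -> System.out.println(c.name())); // i2, i3, i1
    }
}
{code}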

*Selectivity and thresholds:* 

For covering indexes, the Drill planner will generate a covering index plan even up to 100% estimated selectivity. The expectation is that an index-only plan is going to be cheaper than a full table scan due to the smaller row widths of the index. For non-covering indexes, due to the random I/O nature of the rowkey join-back to the primary table, the default selectivity threshold is small: 2.5%. This is configurable through the setting planner.index.noncovering_selectivity_threshold. If the estimated selectivity of the filter condition is above this threshold, the non-covering index plan is not generated for that index; the rationale is that each new plan adds to the search space and increases planning time, so if the estimated row count is already high the plan is unlikely to be chosen and it is better to prune it out early. In addition, a global configuration setting, planner.enable_index_planning (default TRUE), enables or disables index planning altogether.
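
In pseudocode form, the pruning rule and the two settings mentioned above amount to something like the following sketch. The method and variable names are made up; only the option names and the 2.5% default come from the description above.

{code:java}
// Hypothetical sketch of the pruning rule: skip generating a non-covering index plan
// when the estimated filter selectivity exceeds planner.index.noncovering_selectivity_threshold
// (default 0.025), and do nothing at all if planner.enable_index_planning is off.
public class SelectivityThresholdSketch {
    static boolean generateNonCoveringPlan(double estimatedSelectivity,
                                           boolean indexPlanningEnabled,
                                           double nonCoveringThreshold) {
        if (!indexPlanningEnabled) {      // planner.enable_index_planning turns the feature on/off
            return false;
        }
        return estimatedSelectivity <= nonCoveringThreshold;
    }

    public static void main(String[] args) {
        System.out.println(generateNonCoveringPlan(0.01, true, 0.025)); // true: 1% selectivity qualifies
        System.out.println(generateNonCoveringPlan(0.10, true, 0.025)); // false: 10% is pruned early
    }
}
{code}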

> Add capability to do index based planning and execution
> -------------------------------------------------------
>
>                 Key: DRILL-6381
>                 URL: https://issues.apache.org/jira/browse/DRILL-6381
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Execution - Relational Operators, Query Planning & Optimization
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Major
>             Fix For: 1.15.0
>
>
> If the underlying data source supports indexes (primary and secondary indexes), Drill should leverage those during planning and execution in order to improve query performance.  
> On the planning side, the Drill planner should be enhanced to provide an abstraction layer which expresses the index metadata and statistics.  Further, a cost-based index selection is needed to decide which index(es) are suitable.  
> On the execution side, appropriate operator enhancements would be needed to handle different categories of indexes, such as covering and non-covering indexes, taking into consideration that the index data may not be co-located with the primary table, i.e. a global index.


