You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@drill.apache.org by br...@apache.org on 2018/12/17 23:33:38 UTC

[drill] branch gh-pages updated: add index content to docs

This is an automated email from the ASF dual-hosted git repository.

bridgetb pushed a commit to branch gh-pages
in repository https://gitbox.apache.org/repos/asf/drill.git


The following commit(s) were added to refs/heads/gh-pages by this push:
     new 2ac8a08  add index content to docs
2ac8a08 is described below

commit 2ac8a08da63e1d671b7a720e9760697c231ad2e9
Author: Bridget Bevens <bb...@maprtech.com>
AuthorDate: Mon Dec 17 15:32:47 2018 -0800

    add index content to docs
---
 _docs/img/query-plan-verify.png                    | Bin 0 -> 113092 bytes
 .../010-querying-indexes-introduction.md           |  15 +++
 ...ies-that-qualify-for-index-based-query-plans.md |  59 ++++++++++
 .../querying-indexes/021-types-of-indexes.md       | 124 +++++++++++++++++++++
 .../querying-indexes/030-index-selection.md        |  23 ++++
 .../040-designing-indexes-for-your-queries.md      |  97 ++++++++++++++++
 .../050-configuring-index-planning.md              |  37 ++++++
 .../querying-indexes/060-verifying-index-use.md    | 103 +++++++++++++++++
 8 files changed, 458 insertions(+)

diff --git a/_docs/img/query-plan-verify.png b/_docs/img/query-plan-verify.png
new file mode 100644
index 0000000..4d79109
Binary files /dev/null and b/_docs/img/query-plan-verify.png differ
diff --git a/_docs/query-data/querying-indexes/010-querying-indexes-introduction.md b/_docs/query-data/querying-indexes/010-querying-indexes-introduction.md
new file mode 100644
index 0000000..13f6d71
--- /dev/null
+++ b/_docs/query-data/querying-indexes/010-querying-indexes-introduction.md
@@ -0,0 +1,15 @@
+---
+title: "Querying Indexes Introduction"
+date: 2018-09-28 21:35:21 UTC
+parent: "Querying Indexes"
+---  
+
+{% include startnote.html %}Currently, Drill only supports indexes for the MapR-DB storage plugin. Examples in this document were created using Drill on MapR Database. {% include endnote.html %}
+
+An index is a structure, defined on a table, that contains a subset of columns from the table sorted by key values. Well-designed indexes eliminate full table scans and optimize access to data to significantly improve performance. Starting in Drill 1.15, Drill can leverage indexes (primary or secondary) in data sources to create index-based query plans. 
+
+{% include startnote.html %}You cannot create, update, or delete indexes or tables through Drill. You must create and manage the indexes and tables in the data sources themselves. Also, the data source must be configured as a storage plugin in Drill.{% include endnote.html %}
+
+An index-based query plan is a query plan that uses indexes versus full tables scans to access data. When you submit a query through Drill, the query planner in Drill evaluates the query and compares the cost of multiple query plans to find an optimal plan with the lowest cost. Index planning is enabled by default; however, you can disable the feature or configure the behavior of index planning in Drill through several options. 
+
+Certain types of queries qualify for index-based query plans. You can create indexes and write queries such that the query planner in Drill leverages the indexes to create an index-based query plan for optimized performance. Drill can leverage indexes to create covering, non-covering, and functional index plans. You can verify that the query planner generated an index-based query plan in the query profile (Drill Web UI) or from the command line using the [EXPLAIN PLAN FOR]({{site.baseurl [...]
diff --git a/_docs/query-data/querying-indexes/020-queries-that-qualify-for-index-based-query-plans.md b/_docs/query-data/querying-indexes/020-queries-that-qualify-for-index-based-query-plans.md
new file mode 100644
index 0000000..09f5420
--- /dev/null
+++ b/_docs/query-data/querying-indexes/020-queries-that-qualify-for-index-based-query-plans.md
@@ -0,0 +1,59 @@
+---
+title: "Queries that Qualify for Index-Based Query Plans"
+date: 2018-09-28 21:35:21 UTC
+parent: "Querying Indexes"
+---  
+
+In most cases, the planner chooses an index-based query plan for queries with [WHERE clause]({{site.baseurl}}/docs/where-clause/) filters and sort-based operations, such as [ORDER BY]({{site.baseurl}}/docs/order-by-clause/), [GROUP BY]({{site.baseurl}}/docs/group-by-clause/), and [joins]({{site.baseurl}}/docs/from-clause/#join-types).
+
+The following types of queries and conditions typically qualify for index-based query plans:  
+
+
+- Equality conditions (includes IN)
+- Range conditions (includes LIKE pattern matching)
+- ORDER BY queries
+- GROUP BY queries
+- JOIN queries
+- Projection queries (including DISTINCT)
+
+Drill can use indexes for queries that GROUP BY or ORDER BY the leading columns in an index. When a query contains GROUP BY and ORDER BY operations on the leading indexed column, Drill uses the sort order of the index to create index-based query plans with streaming aggregates and merge joins to improve query performance. Drill does not use indexes for queries that GROUP BY or ORDER BY the trailing or included columns in an index. 
+If a query contains conditions on multiple columns (multi-index query), Drill can scan multiple indexes and use the index intersection to optimize the query.  
+
+{% include startnote.html %}Each column added to an index increases the storage requirement for the index. As the storage size increases, the cost of reading the index also increases. Likewise, for the cost of adding and updating data. Consider the impact on storage and updates when including columns in an index.{% include endnote.html %}  
+
+##Queries Without Filters  
+
+Drill can create index-based query plans for the following types of queries without filters (queries that do not have a WHERE clause):  
+   
+
+- ORDER BY queries, as shown in the following example where L_LINENUMBER is an indexed column in the index selected for the query plan:    
+  
+		SELECT L_LINENUMBER FROM lineitem ORDER BY L_LINENUMBER;  
+  
+- GROUP BY queries, as shown in the following example where L_COMMITDate is an indexed column in the index selected for the query plan:  
+  
+		SELECT L_COMMITDate FROM lineitem GROUP BY L_COMMITDate;  
+
+- JOIN queries, as shown in the following example where L_ORDERKEY and O_ORDERKEY are indexed columns and L_LINESTATUS is an included column in the index selected for the query plan:  
+
+		SELECT L.L_LINESTATUS FROM lineitem L, orders O WHERE  L.L_ORDERKEY=O.O_ORDERKEY;  
+
+
+	**Note:** If the planner picks two indexes, one for lineitem and one for orders, a sort merge join is used instead of a hash join.  
+
+- Queries with DISTINCT projections, as shown in the following examples where L_LINENUMBER is an indexed column in the index selected for the query plan:  
+ 
+		SELECT DISTINCT L_LINENUMBER FROM lineitem;
+		SELECT COUNT(DISTINCT L_LINENUMBER) FROM lineitem;  
+
+
+
+
+
+
+
+
+  
+
+
+  
\ No newline at end of file
diff --git a/_docs/query-data/querying-indexes/021-types-of-indexes.md b/_docs/query-data/querying-indexes/021-types-of-indexes.md
new file mode 100644
index 0000000..0208f66
--- /dev/null
+++ b/_docs/query-data/querying-indexes/021-types-of-indexes.md
@@ -0,0 +1,124 @@
+---
+title: "Types of Indexes"
+date: 2018-09-28 21:35:21 UTC
+parent: "Querying Indexes"
+---  
+  
+The query planner in Drill can leverage indexes to create index-based query plans for improved query performance. An index is either covering or non-covering based on the columns referenced in the query and the columns in the index. Indexes for queries with ORDER BY may be hashed or non-hashed, affecting the sort order of the indexes. Drill also supports indexes on the CAST function.  
+
+The following sections describe these indexes in more detail and show examples of query plans generated for each.  
+
+{% include startnote.html %}Currently, Drill works with the types of indexes supported by MapR Database. The following sections demonstrate how Drill works with indexes in the MapR Database.{% include endnote.html %}  
+
+##Covering Index  
+A covering index is an index that “covers” all the columns referenced in a query. Only the index is needed to process the query. Covering indexes avoid the overhead of fetching data from the primary table. A covering index can include indexed and non-indexed (included) columns. 
+
+For example, suppose an index is created and queried, as follows:  
+
+	index keys:{a}, included columns: {b} and the query is SELECT a, b FROM T WHERE a > 10 AND b < 20.
+
+Since columns a, b are present in the index, this is a covering index. The Drill planner generates a covering index plan (index-only plan) where all the columns are retrieved from the index after pushing down relevant filter conditions to the index scan.  
+
+###Covering Index Example  
+
+This example uses an index, l_comp_1, created on a table named lineitem with indexed columns L_LINENUMBER and L_ORDERKEY and also includes columns L_LINESTATUS and L_QUANTITY.  
+
+The following query references the L_LINESTATUS, L_QUANTITY, L_LINENUMBER, and L_ORDERKEY columns in the lineitem table:  
+
+	SELECT L_LINESTATUS, L_QUANTITY FROM lineitem WHERE L_LINENUMBER = 1 AND L_ORDERKEY BETWEEN 40 AND 75;  
+
+Because the l_comp_1 index includes all columns referenced in the query, the query planner in Drill creates a covering index plan.
+
+Running the EXPLAIN PLAN FOR command with the query shows that Drill created a query plan that only uses the index to process the query:  
+
+	EXPLAIN PLAN FOR SELECT L_LINESTATUS, L_QUANTITY FROM lineitem WHERE L_LINENUMBER = 1 AND L_ORDERKEY BETWEEN 40 AND 75;
+	
+	00-00    Screen
+	00-01      Project(L_LINESTATUS=[$0], L_QUANTITY=[$1])
+	00-02        Scan(table=[[si, tpch_sf1_maprdb_range, lineitem]], groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=maprfs:///drill/testdata/tpch/sf1/maprdb/json/range/lineitem, condition=(((L_LINENUMBER = {"$numberLong":1}) and (L_ORDERKEY >= {"$numberLong":40})) and (L_ORDERKEY <= {"$numberLong":75})), indexName=l_comp_1], columns=[`L_LINESTATUS`, `L_QUANTITY`]]])  
+
+Reading the query plan, you can see that the plan includes an index scan, as indicated by `groupscan=[JsonTableGroupScan` and `indexName=l_comp_1`. Drill and the data source can process this query using only the index.  
+
+##Non-Covering Index
+A non-covering index is an index that does not “cover” all the columns referenced in a query.  A non-covering index has indexed and/or non-indexed (included) columns that only partially cover the columns referenced in a query. A non-covering query plan includes an index scan and a join back to the primary table. In some scenarios, a full table scan is more cost efficient than an index scan and Drill will not create an index-based query plan.
+
+For example, suppose an index is created and queried, as follows:  
+
+	index keys:{a, b}, included columns: {c} and the query is SELECT d, e FROM T WHERE a > 10 AND b < 20.
+
+Since columns d, e are not present in the index, this is a non-covering index. For such indexes, the Drill planner generates a non-covering index plan where only the row ids are fetched from the index by pushing down the WHERE clause filters and the rest of the columns are fetched after a join-back to the primary table. The join-back is performed using the row ids.  
+
+###Non-Covering Index Example  
+This example uses an index, l_comp_1, created on a table named lineitem with indexed columns L_LINENUMBER and L_ORDERKEY and also included columns L_LINESTATUS and L_QUANTITY.  
+
+The following query references the L_RETURNFLAG, L_LINESTATUS, L_QUANTITY L_LINENUMBER, and L_ORDERKEY columns in the lineitem table:  
+	
+	SELECT L_RETURNFLAG, L_LINESTATUS, L_QUANTITY FROM lineitem WHERE L_LINENUMBER = 1 AND L_ORDERKEY BETWEEN 40 AND 75;  
+
+Because the l_comp_1 index does not include the L_RETURNFLAG column, the query planner in Drill creates a non-covering index plan that uses the index, but also includes a join on the primary table.  
+
+Running the EXPLAIN PLAN FOR command with the query shows that Drill includes an index scan and a table scan with a rowkey join:  
+
+	EXPLAIN PLAN FOR SELECT L_RETURNFLAG, L_LINESTATUS, L_QUANTITY FROM lineitem WHERE L_LINENUMBER = 1 AND L_ORDERKEY BETWEEN 40 AND 75;
+	
+	00-00    Screen
+	00-01      Project(L_RETURNFLAG=[$0], L_LINESTATUS=[$1], L_QUANTITY=[$2])
+	00-02        Project(L_RETURNFLAG=[$2], L_LINESTATUS=[$3], L_QUANTITY=[$4])
+	00-03          Project(L_LINENUMBER=[$0], L_ORDERKEY=[$1], L_RETURNFLAG=[$2], L_LINESTATUS=[$3], L_QUANTITY=[$4])
+	00-04            RowKeyJoin(condition=[=($5, $6)], joinType=[inner])
+	00-06              Scan(table=[[si, tpch_sf1_maprdb_range, lineitem]], groupscan=[RestrictedJsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=maprfs:///drill/testdata/tpch/sf1/maprdb/json/range/lineitem, condition=(((L_LINENUMBER = {"$numberLong":1}) and (L_ORDERKEY >= {"$numberLong":40})) and (L_ORDERKEY <= {"$numberLong":75}))], columns=[`L_LINENUMBER`, `L_ORDERKEY`, `L_RETURNFLAG`, `L_LINESTATUS`, `L_QUANTITY`, `_id`], rowcount=60012.15000000001]])
+	00-05              Scan(table=[[si, tpch_sf1_maprdb_range, lineitem]], groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=maprfs:///drill/testdata/tpch/sf1/maprdb/json/range/lineitem, condition=(((L_LINENUMBER = {"$numberLong":1}) and (L_ORDERKEY >= {"$numberLong":40})) and (L_ORDERKEY <= {"$numberLong":75})), indexName=l_comp_1], columns=[`_id`]]])
+
+
+Reading the non-covering index plan, you can see that the plan includes an index scan, as indicated by the `groupscan=[JsonTableGroupScan` and `indexName=l_comp_1`, and also a scan on the primary table, as indicated by the `groupscan=[RestrictedJsonTableGroupScan` and the `RowKeyJoin`. To process this query, Drill can use the index with the data source, but the data source must also use the rowkey join on the primary table to fetch data in the L_RETURNFLAG column. Note that this is not a [...]
+
+If this query ran on a regular basis, you could remove the l_comp_1 index and create a new index that includes all columns referenced in the query, including the L_RETURNFLAG column, to improve query performance. However, running a query only once or a few times may not justify the overhead of removing the old index and creating a new index.  
+
+##Non-Hashed Indexes 
+Non-hashed indexes support conditional queries with an ORDER BY clause. When processing ORDER BY queries, Drill does not have to perform sort operations on the data.  
+
+###Non-Hashed Index Plan Example  
+
+A non-hashed index, l_comp_1, was created on a table, lineitem with indexed columns L_LINENUMBER and L_ORDERKEY and included columns L_LINESTATUSand L_QUANTITY.  
+
+Running the example query with the EXPLAIN PLAN FOR command shows that Drill produces an index plan without the additional sort and merge operations when using the non-hashed index to process the query, as follows:  
+
+	EXPLAIN PLAN FOR SELECT L_LINESTATUS, L_QUANTITY FROM lineitem WHERE L_LINENUMBER = 1 AND L_ORDERKEY BETWEEN 40 AND 75 ORDER BY L_LINENUMBER;
+	
+	00-00    Screen
+	00-01      Project(L_LINESTATUS=[$0], L_QUANTITY=[$1])
+	00-02        Project(L_LINESTATUS=[$2], L_QUANTITY=[$3], L_LINENUMBER=[$0])
+	00-03          Scan(table=[[si, tpch_sf1_maprdb_range, lineitem]], groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=maprfs:///drill/testdata/tpch/sf1/maprdb/json/range/lineitem, condition=(((L_LINENUMBER = {"$numberLong":1}) and (L_ORDERKEY >= {"$numberLong":40})) and (L_ORDERKEY <= {"$numberLong":75})), indexName=l_comp_1], columns=[`L_LINENUMBER`, `L_ORDERKEY`, `L_LINESTATUS`, `L_QUANTITY`]]])  
+
+Reading the query plan, you can see that Drill uses the non-hashed index plan, as indicated by `indexName=l_comp_1`. To process the query, the data source uses the index and Drill does not have to perform sort and merge operations on the data, as indicated by the absence of the Sort and SingleMergeExchange operations in the query plan. The data source sorted the data in the index when the index was created.  
+
+##Hashed Indexes
+Hashed indexes support the same conditional queries as non-hashed indexes, but they do not have a guaranteed sort order. Hashed indexes enable the data source to evenly distribute new writes on an index across logical partitions to avoid hot spotting. Drill must perform a sort for ORDER BY queries that use hashed indexes. Sorting the data can increase the CPU costs and negatively impact performance.  
+
+If you notice performance issues with ORDER BY queries that use hashed indexes, review the query plans to see if the plans include sort and merge operations. If this is the case, create non-hashed indexes to support the queries and achieve the best performance.  
+
+###Hashed Index Plan Example
+A hashed index, l_hash_comp_1, was created with indexed columns L_LINENUMBER and L_ORDERKEY and included columns L_LINESTATUS and L_QUANTITY.  
+
+Running the example query with the EXPLAIN PLAN FOR command shows that Drill produces an index plan with sort and merge operations to process the query when using the hashed index, as follows:  
+
+	EXPLAIN PLAN FOR SELECT L_LINESTATUS, L_QUANTITY FROM lineitem WHERE L_LINENUMBER = 1 AND L_ORDERKEY BETWEEN 40 AND 75 ORDER BY L_LINENUMBER;
+	
+	00-00    Screen
+	00-01      Project(L_LINESTATUS=[$0], L_QUANTITY=[$1])
+	00-02        SingleMergeExchange(sort0=[2])
+	01-01          SelectionVectorRemover
+	01-02            Sort(sort0=[$2], dir0=[ASC])
+	01-03              Project(L_LINESTATUS=[$2], L_QUANTITY=[$3], L_LINENUMBER=[$0])
+	01-04                Scan(table=[[si, tpch_sf1_maprdb_hash, lineitem]], groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=maprfs:///drill/testdata/tpch/sf1/maprdb/json/hash/lineitem, condition=(((L_LINENUMBER = {"$numberLong":1}) and (L_ORDERKEY >= {"$numberLong":40})) and (L_ORDERKEY <= {"$numberLong":75})), indexName=l_hash_comp_1], columns=[`L_LINENUMBER`, `L_ORDERKEY`, `L_LINESTATUS`, `L_QUANTITY`]]])  
+
+Reading the query plan, you can see that Drill uses the hashed index in the plan, as indicated by `indexName=l_hash_comp_1`. To process the query, the data source can use the index, but Drill must sort and merge the data, as indicated by the `Sort` and `SingleMergeExchange` operations in the query plan.
+Using the hashed index plan for this ORDER BY query requires additional processing and negatively impacts performance.  
+
+##Functional Index  
+A functional index is an index created on functions or expressions instead of columns in a table. CAST functions are most commonly used in Drill views. For example, if the filter condition is 'WHERE CAST(zip_code as BIGINT) = 95120' and a functional index exists on CAST(zip_code as BIGINT), the query planner will leverage the index.  
+
+When issuing Drill queries through BI tools, you can include CAST functions in your queries to create [Drill views]({{site.baseurl}}/docs/create-view/). Including CAST functions provides the metadata needed to optimally process the queries. For more information about using the CAST function with Drill, see [Data Type Conversion]({{site.baseurl}}/docs/data-type-conversion/).  
+
+{% include startnote.html %}MapR Database supports functional indexes. The query planner in Drill can use the functional index to optimize queries with CAST functions.{% include endnote.html %}  
+ 
\ No newline at end of file
diff --git a/_docs/query-data/querying-indexes/030-index-selection.md b/_docs/query-data/querying-indexes/030-index-selection.md
new file mode 100644
index 0000000..586a230
--- /dev/null
+++ b/_docs/query-data/querying-indexes/030-index-selection.md
@@ -0,0 +1,23 @@
+---
+title: "Index Selection"
+date: 2018-09-28 21:35:21 UTC
+parent: "Querying Indexes"
+---  
+
+Drill leverages indexes during the physical planning phase of the query. The query planner in Drill evaluates the cost of multiple query plans and then selects the plan with the lowest cost for query execution. The planner uses CPU, memory, and network I/O estimates to calculate cost. Based on these costs, the query planner evaluates selectivity, collation (sortedness of data), and the type of index. For each candidate index, the query planner estimates the total cost of accessing the index.  
+
+The query planner can create three types of index plans: covering, non-covering, and index intersection. By default, the planner evaluates the top five indexes per table, though this number is configurable. The indexes chosen may also include an index intersection. For example, two indexes may not qualify for an index plan based on their individual selectivity, but their combined selectivity after intersection could be low enough to qualify.  
+ 
+##Selectivity    
+Selectivity is the estimated number of rows based on the selectivity of each conditional expression in the WHERE clause, calculated as (output row count)/(total table row count). For example, if a table has 100 rows and 25 of the rows qualify the filter condition, the selectivity is .25. Selectivity ranges between 0 and 1. The closer to 0, the more selective the filter. The more selective a filter, the lower the cost. High filter selectivity results in better query performance. If filter [...]
+
+The data source provides Drill with the estimated number of rows that match a filter filter condition and the average row size. The data source uses a filter condition, such as WHERE a > 10 AND b < 20, to return the estimated row count of the filter condition based on the leading prefix of the index columns. For example, if the index columns are the composite key {a, b}, the leading prefix of the filter condition is {a, b}, and the row count of the conjunct (number of rows that meet both [...]
+
+##Selectivity of Covering and Non-Covering Indexes
+For a covering index, the query planner always generates a covering index plan, even if the estimated selectivity is 100%. The planner always expects an index-only plan to be cheaper than a plan with a full table scan due to the smaller row widths in an index.  
+  
+For a non-covering index, the query planner estimates the cost of a join-back to the primary table. Due to the random I/O nature of the rowkey join-back to primary table, the default selectivity threshold is small: 2.5% (.025). You can configure the default selectivity threshold for non-covering indexes through the `planner.index.noncovering_selectivity_threshold` option.   
+
+If the estimated selectivity of the filter condition is above this threshold, the query planner does not generate a non-covering index plan for that index; the rationale is that each new plan adds to the search space and increases planning time. If the estimated row count is already high, the plan is unlikely to be chosen anyway, and therefore better to prune the plan out early. 
+
+ 
\ No newline at end of file
diff --git a/_docs/query-data/querying-indexes/040-designing-indexes-for-your-queries.md b/_docs/query-data/querying-indexes/040-designing-indexes-for-your-queries.md
new file mode 100644
index 0000000..0f5ac40
--- /dev/null
+++ b/_docs/query-data/querying-indexes/040-designing-indexes-for-your-queries.md
@@ -0,0 +1,97 @@
+---
+title: "Designing Indexes for Your Queries"
+date: 2018-09-28 21:35:21 UTC
+parent: "Querying Indexes"
+---   
+
+Design indexes that support your queries for maximum performance benefits. Use common query patterns that involve filters and ordering to define indexes. Weigh the benefits of indexes against their update and storage costs and take into consideration any index limitations.  
+
+##Identify Query Patterns  
+Query patterns, such as queries with filter conditions and ORDER BY clauses, indicate where indexes can improve performance. If a query does not contain selective filters, the overhead of using an index may cost more than a full table scan. You should also define your indexes such that a single index benefits either multiple queries or individual queries that you run most often.  
+
+###Determine Potential Indexes Based on Query Patterns  
+The following table describes the types and characteristics of indexes you might want to create based on some example query patterns:  
+
+| **Identified Query Pattern**                                                                    | **Potential Indexes to Create**                                                                                                                                                                                                                     |
+|---------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Compares individual columns with selective filter conditions                                | Define single column indexes on the columns that   you compare against. Verify that the columns contain supported data types.                                                                                                                   |
+| Filters against specific combinations of columns                                            | Define   composite column indexes instead of single column indexes. Specify the   sequence of the index keys so columns that appear in equality conditions are   the prefixes in the keys.                                                      |
+| Accesses a subset of columns   in a document, but does not filter or sort on these columns  | Add those columns as included columns in   indexes.                                                                                                                                                                                             |
+| Filters on a sub-column in a   nested document                                              | Define the index key on the sub-column.                                                                                                                                                                                                         |
+| Filters on sub-columns in   nested documents that are array elements                        | Define the index key using a container column   path: for example, arraycolumn[].subcolumn.                                                                                                                                                     |
+| Filters and projects using a   container column path                                        | Define the container column path as both an   indexed column and included column.                                                                                                                                                               |
+| Filters on individual   elements of an array, which can appear in any position in the array | Define an index using a container column path:   for example, arraycolumn[].                                                                                                                                                                    |
+| Issues Drill SQL queries with   filter conditions that contain CAST expressions             | Specify the CAST function when defining the   index key.                                                                                                                                                                                        |
+| Sorts on columns                                                                            | Define the sequence and order direction of the   index keys to match the sequence and order direction of the columns your   query sorts. If the sort order of the index keys matches the insertion order   of documents, define hashed indexes. |
+| Sorts on one set of columns   and filters on another set using equality conditions          | Define a composite index so that columns using   equality conditions are the prefixes in the index keys, followed by the sort   columns.                                                                                                        |   
+
+##Evaluate Trade-Offs and Limitations  
+
+When designing indexes for optimization, consider the following trade-offs and limitations:
+
+###Synchronizing Indexes
+When you design your indexes, remember that the data source must synchronize each index when you insert and update columns in the table. This impacts the throughput performance of inserts and updates because the data source must perform additional writes. The impact increases with each additional index.  
+
+###Index Storage Requirements
+Consider the storage costs when creating indexes and deciding on the columns to add to the index. Indexes increase your storage requirements. The storage size depends on the number of indexed and included columns in the index and the size of values stored in those columns. As the size of the index increases, the cost of reading the index also increases.  
+
+###Index Restrictions
+When designing your indexes, make sure the indexes support the functionality you need. 
+
+**Examples**  
+
+The following examples illustrate the concepts behind index design, though they do not account for sizing, storage, and updates. Always weigh the benefits of indexes against these other requirements.  
+
+Suppose you have the following customer data in JSON format:  
+
+	{
+	   "_id": "10000",
+	   "FullName": {
+	      "LastName": "Smith",
+	      "FirstName": "John"
+	   },
+	   "Address": {
+	      "Street": "123 SE 22nd St.",
+	      "City": "Oakland",
+	      "State": "CA",
+	      "Zipcode": "94601-1001"
+	   },
+	   "Gender": "M",
+	   "AccountBalance": 999.99,
+	   "Email": "john.smith@company.com",
+	   "Phones": [
+	      {"Type": "Home", "Number": "555-555-1234"},
+	      {"Type": "Mobile", "Number": "555-555-5678"},
+	      {"Type": "Work", "Number": "555-555-9012"}
+	   ],
+	   "Hobbies": ["Baseball", "Cooking", "Reading"],
+	   "DateOfBirth": "10/1/1985"
+	}   
+
+The following table contains columns in the document that are candidates for indexing based on the sample queries:  
+
+|    Query # | Query                                                                                                                     | Candidate   columns for Indexing                           |
+|------------|---------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------|
+| 1          | Find all customers who were born in the 1970s.                                                                            | DateOfBirth                                                |
+| 2          | Find all customers who have an account balance   greater than $10K. Order the information in descending order of balance. | AccountBalance                                             |
+| 3          | List customers who live in   California, ordering the list by LastName, FirstName.                                        | Address.State     FullName.LastName     FullName.FirstName |
+| 4          | Find the ids and emails of customers who live in   a specific zip code.                                                   | Address.Zip                                                |
+| 5          | Find customers who live in a   specific set of states and have an account balance less than a specific   value.           | Address.State     AccountBalance                           |
+| 6          | Find male customers with the   last name starting with the letter "S".                                                    | Gender     FullName.LastName                               |
+| 7          | Find all customers who have a   mobile phone number with a prefix of "650".                                               | Phones[].Type     Phones[].Number                          |     
+
+The following table contains indexes you might create to optimize the queries listed in the previous table and the reasons for doing so:  
+
+|    Index                                                                                           | Rationale                                                                                                                                                                                                                                                                |
+|----------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Simple   index on DateOfBirth                                                                      | Optimizes the range condition on   DateOfBirth in query 1.     You need not create a hashed index, because it is unlikely that the order   of DateOfBirth correlates with the insert order of new data.                                                                  |
+| Simple   index on AccountBalance, specified as a descending key                                    | Optimizes the range condition on   AccountBalance in query 2.     Descending order of key meets the ordering criteria in query 2.     Also optimizes the range condition on AccountBalance in query 5 in   combination with the index on Address.State.                  |
+| Composite   index on:     •Address.State     •FullName.LastName     •FullName.FirstName            | Optimizes both the equality   condition on Address.State and ordering in query 3.     Inclusion of the name columns in the index meets query 3 ordering.     Also optimizes the IN condition in query 5 when used in combination with   the index on AccountBalance      |
+| Simple   index with:     •Indexed column on Address.Zip     •Included columns on Id and Email      | Optimizes the equality condition   on Address.Zip in query 4.     Adding the included columns avoids reading the JSON table in query 4.                                                                                                                                  |
+| Composite   index on:     •Gender     •FullName.LastName                                           | Optimizes equality condition on   Gender and pattern matching condition on FullName.LastName for query 6.     Specifying Gender as the leading key in combination with FullName.LastName   results in more selective index lookups for query 6.                          |
+| Simple   index on Hobbies[]                                                                        | Optimizes the equality condition   on array elements of Hobbies in query 7:     {"$eq":{"Hobbies[]":"Reading"}}                                                                                                                                                          |
+| Composite   index on:     •Phones[].Type     •Phones[].Number                                      | Optimizes the following two   conditions in query 8:     •Equality condition on the Type subcolumn in nested documents in the Phones   array.     •Pattern matching condition on the Number subcolumn in nested documents in   the Phones array.                         |
+
+
+
+
diff --git a/_docs/query-data/querying-indexes/050-configuring-index-planning.md b/_docs/query-data/querying-indexes/050-configuring-index-planning.md
new file mode 100644
index 0000000..a5d176f
--- /dev/null
+++ b/_docs/query-data/querying-indexes/050-configuring-index-planning.md
@@ -0,0 +1,37 @@
+---
+title: "Configuring Index Planning"
+date: 2018-09-28 21:35:21 UTC
+parent: "Querying Indexes"
+---   
+
+Certain options control how Drill uses indexes when planning and executing queries. You can set the query planning and execution options, at the system or session level, using the ALTER SYSTEM|SET commands, as shown:  
+
+	ALTER SYSTEM SET `planner.enable_index_planning` = 'true' 
+	SET `planner.enable_index_planning` = 'false'  
+
+Options set at the session level only apply to queries that you run during the current Drill connection. Options set at the system level affect the entire system and persist between restarts. Session level settings override system level settings. Typically, you set the options at the session level unless you want the setting to persist across all sessions.  
+
+The following table lists the index planning and execution options that you can enable, disable, or modify:  
+
+**Note:** The planning option names are prepended with planner, for example `planner.enable_index_planning`. The execution options are prepended with exec, for example `exec.query.rowkeyjoin_batchsize`.  
+
+|    **Option**                                           | **Description**                                                                                                                                                                                                                                                                                                                                                                                                                                  [...]
+|-----------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [...]
+| planner.enable_index_planning                       | Enables   or disables index planning                                                                                                                                                                                                                                                                                                                                                                                                                 [...]
+| planner.index.force_sort_noncovering                | Forces   Drill to sort for non-covering indexes. If the query has an ORDER-BY on index   columns and a non-covering index is chosen, by default Drill leverages the   sortedness of the index columns and does not sort. Fast changing primary   table data may produce a partial sort. This option forces a sort within   Drill.                                                                                                                    [...]
+| planner.enable_rowkeyjoin_conversion                | Drill can push down the   rowkey filter to the data source during runtime. For a query to qualify for   runtime filter pushdown, the join condition must filter on a rowkey. For   example,     SELECT t.mscIdentities FROM dfs.root.`/user/ted/MixTable` t WHERE t.row_key   IN (SELECT max(convert_fromutf8(i.KeyA.ENTRY_KEY)) FROM   dfs.root.`/user/ted/TableIMSI` i WHERE i.row_key='460021050005636')     Drill evaluates the results of the s [...]
+| planner.rowkeyjoin_conversion_selectivity_threshold | Sets the selectivity (as a   percentage) under which Drill uses a rowkey join for eligible queries.                                                                                                                                                                                                                                                                                                                                                  [...]
+| planner.rowkeyjoin_conversion_using_hashjoin        | When   enabled, Drill uses the hash join operator instead of a rowkey join.                                                                                                                                                                                                                                                                                                                                                                          [...]
+| planner.index.covering_selectivity_threshold        | For   covering indexes, this option specifies the filter selectivity that   corresponds to the leading prefix of the index below which the index is   considered for planning. For example, for the filter ‘a > 10 AND b <   20’ if an index has indexed columns (a, b, c) and the combined selectivity of   the above condition is less than the threshold, the index is considered for   the query plan.                                           [...]
+| planner.index.noncovering_selectivity_threshold     | For   non-covering indexes, this option specifies the filter selectivity that   corresponds to the leading prefix of the index below which the index is   considered for planning.                                                                                                                                                                                                                                                                   [...]
+| planner.index.max_chosen_indexes_per_table          | The   maximum number of “chosen” indexes for a table after index costing and   ranking.                                                                                                                                                                                                                                                                                                                                                              [...]
+| planner.index.rowkeyjoin_cost_factor                | The   cost factor that provides some control over the I/O cost for non-covering   indexes when the rowkey join back to the primary table causes random I/O from   the primary table.                                                                                                                                                                                                                                                                 [...]
+| planner.enable_statistics                           | Enable   or disable statistics for the filter conditions on indexed columns.                                                                                                                                                                                                                                                                                                                                                                         [...]
+| exec.query.rowkeyjoin_batchsize                     | For   batch GET operations, this option specifies the batch size in terms of the   number of rowkeys. Used for non-covering index plans when doing joins back to   primary table.                                                                                                                                                                                                                                                                    [...]
+
+
+
+
+
+
+
diff --git a/_docs/query-data/querying-indexes/060-verifying-index-use.md b/_docs/query-data/querying-indexes/060-verifying-index-use.md
new file mode 100644
index 0000000..2662c07
--- /dev/null
+++ b/_docs/query-data/querying-indexes/060-verifying-index-use.md
@@ -0,0 +1,103 @@
+---
+title: "Verifying Index Use"
+date: 2018-09-28 21:35:21 UTC
+parent: "Querying Indexes"
+---  
+
+Evaluate query plans to analyze query performance and determine if Drill used qualifying indexes. You can view query plans in the Drill Web UI or through the command line using the EXPLAIN PLAN FOR command. Alternatively, you can disable the `planner.enable_index_planning` option in Drill to see the query plan with a full table scan and then compare the plan with the index-based plan.  
+
+The following examples demonstrate how to view the query profile in the Drill Web UI and the output of the EXPLAIN PLAN FOR command to determine if the query planner in Drill selected an index-based query plan. There is also an example that shows you how to disable index planning to compare a full table scan plan with an index-based query plan.  
+
+##Examples  
+The subsequent sections assume that an index exists on a table named "lineitem." The index, l_single_c_5, is a single column index created on the L_QUANTITY column. The index also covers the L_SUPPKEY, L_DISCOUNT, L_SHIPDate, and L_SHIPMODE columns. If a query contains columns covered by the index, the query is a covering query. If a query contains columns not covered by the index, the query is non-covering and requires a lookup back into the primary table to retrieve data.  
+
+The following list summarizes the assumptions:  
+
+- **Table name**: `lineitem`
+- **Index name**: `l_single_c_5`
+- **Indexed column**: `L_QUANTITY`
+- **Included columns**: `L_SUPPKEY`, `L_DISCOUNT`, `L_SHIPDate`, `L_SHIPMODE`  
+
+###Query Profile
+You can view the query plan on the Profiles page of the Drill Web UI, by selecting the query you want to evaluate and then selecting the Physical Plan page. The page displays the physical plan that Drill used to execute the query.  
+ 
+The following image shows the physical plan that Drill used to execute this simple equality query:  
+
+	SELECT L_SHIPDate FROM lineitem WHERE L_QUANTITY = 5; 
+
+![](https://i.imgur.com/DkZPAYJ.png)   
+
+In the plan, you can see that Drill scanned the index, `l_single_c_5`, instead of the primary table. The query was completely covered by the index because the index contains all the columns referenced in the query and the query filtered on the indexed column.  
+
+###EXPLAIN Command  
+Alternatively, you can issue the [EXPLAIN command]({{site.baseurl}}/docs/explain/) to see how Drill executes a query. You can see the chosen physical execution plan for a query without running the query, by issuing the [EXPLAIN PLAN FOR command]({{site.baseurl}}/docs/explain/#explain-for-physical-plans). The output of the command shows you if Drill plans to use the index when executing the query, as shown:    
+
+	EXPLAIN PLAN FOR SELECT L_SHIPDate FROM lineitem WHERE L_QUANTITY = 5 LIMIT 10;
+	+------+------+
+	| text | json |
+	+------+------+
+	| 00-00    Screen
+	00-01      Project(L_SHIPDate=[$0])
+	00-02        SelectionVectorRemover
+	00-03          Limit(fetch=[10])
+	00-04            Limit(fetch=[10])
+	00-05              Project(L_SHIPDate=[$1])
+	00-06                Scan(groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=hdfs:///drill/testdata/tpch/sf1/data/json/lineitem, condition=(L_QUANTITY = {"$numberLong":5}), indexName=l_single_c_5], columns=[`L_QUANTITY`, `L_SHIPDate`]]])  
+
+In the plan, you can see that Drill plans to use the index, `l_single_c_5`, instead of performing a full table scan. The query is completely covered by the index because the index contains all columns referenced in the query and the query filters on the indexed column.  
+
+###Comparing an Index-Based Plan to a Full Table Scan Plan 
+If you want to compare an index-based plan against a plan with a full table scan, disable the `planner.enable_index_planning` option in Drill, and run the EXPLAIN PLAN FOR command with the query. Running this command with the `planner.enable_index_planning` option disabled forces Drill to generate a plan that includes a full table scan. 
+
+You can compare the full table scan plan against the index-based plan to compare the costs and resource consumption of each plan.
+ 
+In the following example, the indexing feature is enabled, and Drill generated a plan using the index:  
+
+	EXPLAIN PLAN FOR SELECT L_SHIPDate FROM lineitem WHERE L_QUANTITY = 5 LIMIT 10;
+	+------+------+
+	| text | json |
+	+------+------+
+	| 00-00    Screen
+	00-01      Project(L_SHIPDate=[$0])
+	00-02        SelectionVectorRemover
+	00-03          Limit(fetch=[10])
+	00-04            Limit(fetch=[10])
+	00-05              Project(L_SHIPDate=[$1])
+	00-06                Scan(groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=hdfs:///drill/testdata/tpch/sf1/data/json/lineitem, condition=(L_QUANTITY = {"$numberLong":5}), indexName=l_single_c_5], columns=[`L_QUANTITY`, `L_SHIPDate`]]])  
+
+Turning the option off, as shown:  
+
+	ALTER SESSION SET planner.enable_index_planning = false   
+ 
+And running the EXPLAIN PLAN FOR command again shows the plan with a full table scan:  
+
+	EXPLAIN PLAN FOR SELECT L_SHIPDate FROM lineitem WHERE L_QUANTITY = 5 LIMIT 10;
+	+------+------+
+	| text | json |
+	+------+------+
+	| 00-00    Screen
+	00-01      Project(L_SHIPDate=[$0])
+	00-02        SelectionVectorRemover
+	00-03          Limit(fetch=[10])
+	00-04            UnionExchange
+	01-01              SelectionVectorRemover
+	01-02                Limit(fetch=[10])
+	01-03                  Project(L_SHIPDate=[$1])
+	01-04                    Scan(groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=hdfs:///drill/testdata/tpch/sf1/data/json/lineitem, condition=(L_QUANTITY = {"$numberLong":5})], columns=[`L_QUANTITY`, `L_SHIPDate`]]])
+	 |  
+
+**Note:** To see the cost of each plan, go to the Drill Web UI and view the query profile for each EXPLAIN PLAN FOR command that you issue through the command line.
+
+
+
+
+
+ 
+
+
+
+
+
+
+
+   
\ No newline at end of file