Posted to commits@drill.apache.org by br...@apache.org on 2015/02/26 01:31:06 UTC
[02/13] drill git commit: DRILL-2315: Confluence conversion plus fixes
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/sql-ref/cmd-summary/003-select.md
----------------------------------------------------------------------
diff --git a/_docs/sql-ref/cmd-summary/003-select.md b/_docs/sql-ref/cmd-summary/003-select.md
new file mode 100644
index 0000000..00004a4
--- /dev/null
+++ b/_docs/sql-ref/cmd-summary/003-select.md
@@ -0,0 +1,85 @@
+---
+title: "SELECT Statements"
+parent: "SQL Commands Summary"
+---
+Drill supports the following ANSI standard clauses in the SELECT statement:
+
+ * WITH clause
+ * SELECT list
+ * FROM clause
+ * WHERE clause
+ * GROUP BY clause
+ * HAVING clause
+ * ORDER BY clause (with an optional LIMIT clause)
+
+You can use the same SELECT syntax in the following commands:
+
+ * CREATE TABLE AS (CTAS)
+ * CREATE VIEW
+
+INSERT INTO SELECT is not yet supported.
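
For example, the same SELECT syntax can feed a CTAS statement. This is a sketch only; the `dfs.tmp` workspace and file path below are hypothetical:

    CREATE TABLE dfs.tmp.`regional_sales` AS
      SELECT region, SUM(amount) AS total_amount
      FROM dfs.`/tmp/sales.json`
      GROUP BY region;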
+
+## Column Aliases
+
+You can use named column aliases in the SELECT list to provide meaningful
+names for regular columns and computed columns, such as the results of
+aggregate functions. See the section on running queries for examples.
+
+You cannot reference column aliases in the following clauses:
+
+ * WHERE
+ * GROUP BY
+ * HAVING
+
+Because Drill works with schema-less data sources, you cannot use positional
+aliases (1, 2, etc.) to refer to SELECT list columns, except in the ORDER BY
+clause.
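
For example, the following hypothetical query defines a named alias in the SELECT list, repeats the expression (not the alias) in the GROUP BY clause, and uses a positional reference only in the ORDER BY clause:

    SELECT region, SUM(amount) AS total_amount
    FROM dfs.`/tmp/sales.json`
    GROUP BY region
    ORDER BY 2 DESC;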
+
+## UNION ALL Set Operator
+
+Drill supports the UNION ALL set operator to combine two result sets. The
+distinct UNION operator is not yet supported.
+
+The EXCEPT, EXCEPT ALL, INTERSECT, and INTERSECT ALL operators are not yet
+supported.
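
For example, assuming two hypothetical JSON files with the same column layout, UNION ALL concatenates the two result sets and retains any duplicate rows:

    SELECT cust_id FROM dfs.`/tmp/orders_2014.json`
    UNION ALL
    SELECT cust_id FROM dfs.`/tmp/orders_2015.json`;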
+
+## Joins
+
+Drill supports ANSI standard joins in the FROM and WHERE clauses:
+
+ * Inner joins
+ * Left, full, and right outer joins
+
+The following types of join syntax are supported:
+
+Join type| Syntax
+---|---
+Join condition in WHERE clause|FROM table1, table2 WHERE table1.col1=table2.col1
+USING join in FROM clause|FROM table1 JOIN table2 USING(col1, ...)
+ON join in FROM clause|FROM table1 JOIN table2 ON table1.col1=table2.col1
+NATURAL JOIN in FROM clause|FROM table1 NATURAL JOIN table2
+
+Cross-joins are not yet supported. You must specify a join condition when more
+than one table is listed in the FROM clause.
+
+Non-equijoins are supported if the join also contains an equality condition on
+the same two tables as part of a conjunction:
+
+ table1.col1 = table2.col1 AND table1.c2 < table2.c2
+
+This restriction applies to both inner and outer joins.
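
Using the table and column names from the condition above, a sketch of a valid non-equijoin looks like this:

    SELECT *
    FROM table1 JOIN table2
      ON table1.col1 = table2.col1 AND table1.c2 < table2.c2;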
+
+## Subqueries
+
+You can use the following subquery operators in Drill queries. These operators
+all return Boolean results.
+
+ * ALL
+ * ANY
+ * EXISTS
+ * IN
+ * SOME
+
+In general, correlated subqueries are supported. EXISTS and NOT EXISTS
+subqueries that do not contain a correlation join are not yet supported.
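
For example, the following sketch (with hypothetical file names) uses a correlated EXISTS subquery, which falls within the supported cases described above:

    SELECT c.name
    FROM dfs.`/tmp/customers.json` c
    WHERE EXISTS (
      SELECT 1 FROM dfs.`/tmp/orders.json` o
      WHERE o.cust_id = c.cust_id);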
+
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/sql-ref/cmd-summary/004-show-files.md
----------------------------------------------------------------------
diff --git a/_docs/sql-ref/cmd-summary/004-show-files.md b/_docs/sql-ref/cmd-summary/004-show-files.md
new file mode 100644
index 0000000..1fcf395
--- /dev/null
+++ b/_docs/sql-ref/cmd-summary/004-show-files.md
@@ -0,0 +1,65 @@
+---
+title: "SHOW FILES Command"
+parent: "SQL Commands Summary"
+---
+The SHOW FILES command provides a quick report of the file systems that are
+visible to Drill for query purposes. This command is unique to Apache Drill.
+
+## Syntax
+
+The SHOW FILES command supports the following syntax.
+
+ SHOW FILES [ FROM filesystem.directory_name | IN filesystem.directory_name ];
+
+The FROM or IN clause is required if you do not specify a default file system
+first. You can do this with the USE command. FROM and IN are synonyms.
+
+The directory name is optional. (If the directory name is a Drill reserved
+word, you must enclose the name in backticks.)
+
+The command returns standard Linux `stat` information for each file or
+directory, such as permissions, owner, and group values. This information is
+not specific to Drill.
+
+## Examples
+
+The following example returns information about directories and files in the
+local (`dfs`) file system.
+
+ 0: jdbc:drill:> use dfs;
+
+ +------------+------------+
+ | ok | summary |
+ +------------+------------+
+ | true | Default schema changed to 'dfs' |
+ +------------+------------+
+ 1 row selected (0.318 seconds)
+
+ 0: jdbc:drill:> show files;
+ +------------+-------------+------------+------------+------------+------------+-------------+------------+------------------+
+ | name | isDirectory | isFile | length | owner | group | permissions | accessTime | modificationTime |
+ +------------+-------------+------------+------------+------------+------------+-------------+------------+------------------+
+ | user | true | false | 1 | mapr | mapr | rwxr-xr-x | 2014-07-30 21:37:06.0 | 2014-07-31 22:15:53.193 |
+ | backup.tgz | false | true | 36272 | root | root | rw-r--r-- | 2014-07-31 22:09:13.0 | 2014-07-31 22:09:13.211 |
+ | JSON | true | false | 1 | root | root | rwxr-xr-x | 2014-07-31 15:22:42.0 | 2014-08-04 15:43:07.083 |
+ | scripts | true | false | 3 | root | root | rwxr-xr-x | 2014-07-31 22:10:51.0 | 2014-08-04 18:23:09.236 |
+ | temp | true | false | 2 | root | root | rwxr-xr-x | 2014-08-01 20:07:37.0 | 2014-08-01 20:09:42.595 |
+ | hbase | true | false | 10 | mapr | mapr | rwxr-xr-x | 2014-07-30 21:36:08.0 | 2014-08-04 18:31:13.778 |
+ | tables | true | false | 0 | root | root | rwxrwxrwx | 2014-07-31 22:14:35.0 | 2014-08-04 15:42:43.415 |
+ | CSV | true | false | 4 | root | root | rwxrwxrwx | 2014-07-31 17:34:53.0 | 2014-08-04
+ ...
+
+The following example shows the files in a specific directory in the `dfs`
+file system:
+
+ 0: jdbc:drill:> show files in dfs.CSV;
+
+ +------------+-------------+------------+------------+------------+------------+-------------+------------+------------------+
+ | name | isDirectory | isFile | length | owner | group | permissions | accessTime | modificationTime |
+ +------------+-------------+------------+------------+------------+------------+-------------+------------+------------------+
+ | customers.csv | false | true | 62011 | root | root | rw-r--r-- | 2014-08-04 18:30:39.0 | 2014-08-04 18:30:39.314 |
+ | products.csv.small | false | true | 34972 | root | root | rw-r--r-- | 2014-07-31 23:58:42.0 | 2014-07-31 23:59:16.849 |
+ | products.csv | false | true | 34972 | root | root | rw-r--r-- | 2014-08-01 06:39:34.0 | 2014-08-04 15:58:09.325 |
+ | products.csv.bad | false | true | 62307 | root | root | rw-r--r-- | 2014-08-04 15:58:02.0 | 2014-08-04 15:58:02.612 |
+ +------------+-------------+------------+------------+------------+------------+-------------+------------+------------------+
+ 4 rows selected (0.165 seconds)
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/sql-ref/data-types/001-date.md
----------------------------------------------------------------------
diff --git a/_docs/sql-ref/data-types/001-date.md b/_docs/sql-ref/data-types/001-date.md
new file mode 100644
index 0000000..6340e35
--- /dev/null
+++ b/_docs/sql-ref/data-types/001-date.md
@@ -0,0 +1,148 @@
+---
+title: "Supported Date/Time Data Type Formats"
+parent: "Data Types"
+---
+You must use supported `date` and `time` formats when you `SELECT` date and
+time literals or when you `CAST()` from `VARCHAR` to `date` and `time` data
+types. Apache Drill currently supports specific formats for the following
+`date` and `time` data types:
+
+ * Date
+ * Timestamp
+ * Time
+ * Interval
+ * Interval Year
+ * Interval Day
+ * Literal
+
+The following query provides an example of how to `SELECT` a few of the
+supported date and time formats as literals:
+
+ select date '2008-2-23', timestamp '2008-1-23 14:24:23', time '10:20:30' from dfs.`/tmp/input.json`;
+
+The following query provides an example where `VARCHAR` data in a file is
+`CAST()` to supported `date` and `time` formats:
+
+ select cast(col_A as date), cast(col_B as timestamp), cast(col_C as time) from dfs.`/tmp/dates.json`;
+
+`Date`, `timestamp`, and `time` data types store values in UTC. Currently,
+Apache Drill does not support `timestamp` with time zone.
+
+## Date
+
+Drill supports the `date` data type in the following format:
+
+ YYYY-MM-DD (year-month-date)
+
+The following table provides some examples for the `date` data type:
+
+| Use | Example |
+| --- | ------- |
+| Literal | ``select date '2008-2-23' from dfs.`/tmp/input.json`;`` |
+| `JSON` input | `{"date_col" : "2008-2-23"}` |
+| `CAST` from `VARCHAR` | ``select CAST(date_col as date) as CAST_DATE from dfs.`/tmp/input.json`;`` |
+
+## Timestamp
+
+Drill supports the `timestamp` data type in the following format:
+
+ yyyy-MM-dd HH:mm:ss.SSS (year-month-date hour:minute:sec.milliseconds)
+
+The following table provides some examples for the `timestamp` data type:
+
+<table>
+ <tbody>
+ <tr>
+ <th>Use</th>
+ <th>CAST Example</th>
+ </tr>
+ <tr>
+ <td valign="top">Literal</td>
+ <td valign="top"><code><span style="color: rgb(0,0,0);">select timestamp '2008-2-23 10:20:30.345', timestamp '2008-2-23 10:20:30' from dfs.`/tmp/input.json`;</span></code>
+ </td></tr>
+ <tr>
+ <td colspan="1" valign="top"><code>JSON</code> Input</td>
+ <td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">{"timestamp_col": "2008-2-23 15:20:30.345"}<br /></span><span style="color: rgb(0,0,0);">{"timestamp_col": "2008-2-23 10:20:30"}</span></code> <span style="color: rgb(0,0,0);">The fractional millisecond component is optional.</span></td>
+ </tr>
+ <tr>
+ <td colspan="1" valign="top"><code>CAST</code> from <code>VARCHAR</code></td>
+ <td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select cast(timestamp_col as timestamp) from dfs.`/tmp/input.json`; </span></code></td>
+ </tr>
+ </tbody>
+ </table>
+
+## Time
+
+Drill supports the `time` data type in the following format:
+
+ HH:mm:ss.SSS (hour:minute:sec.milliseconds)
+
+The following table provides some examples for the `time` data type:
+
+<table><tbody><tr>
+ <th>Use</th>
+ <th>Example</th>
+ </tr>
+ <tr>
+ <td valign="top">Literal</td>
+ <td valign="top"><code><span style="color: rgb(0,0,0);">select time '15:20:30', time '10:20:30.123' from dfs.`/tmp/input.json`;</span></code></td>
+ </tr>
+ <tr>
+ <td colspan="1" valign="top"><code>JSON</code> Input</td>
+ <td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">{"time_col" : "10:20:30.999"}<br /></span><span style="color: rgb(0,0,0);">{"time_col": "10:20:30"}</span></code></td>
+ </tr>
+ <tr>
+ <td colspan="1" valign="top"><code>CAST</code> from <code>VARCHAR</code></td>
+ <td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select cast(time_col as time) from dfs.`/tmp/input.json`;</span></code></td>
+</tr></tbody>
+</table>
+
+## Interval
+
+Drill supports the `interval year` and `interval day` data types.
+
+### Interval Year
+
+The `interval year` data type stores time duration in years and months. Drill
+supports the `interval year` data type in the following format:
+
+ P [qty] Y [qty] M
+
+The following table provides examples for the `interval year` data type:
+
+<table ><tbody><tr>
+<th>Use</th>
+<th>Example</th></tr>
+ <tr>
+ <td valign="top">Literals</td>
+ <td valign="top"><code><span style="color: rgb(0,0,0);">select interval '1-2' year to month from dfs.`/tmp/input.json`;<br /></span><span style="color: rgb(0,0,0);">select interval '1' year from dfs.`/tmp/input.json`;<br /></span><span style="color: rgb(0,0,0);">select interval '13' month from dfs.`/tmp/input.json`;</span></code></td></tr><tr>
+ <td colspan="1" valign="top"><code>JSON</code> Input</td>
+ <td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">{"col" : "P1Y2M"}<br /></span><span style="color: rgb(0,0,0);">{"col" : "P-1Y2M"}<br /></span><span style="color: rgb(0,0,0);">{"col" : "P-1Y-2M"}<br /></span><span style="color: rgb(0,0,0);">{"col": "P10M"}<br /></span><span style="color: rgb(0,0,0);">{"col": "P5Y"}</span></code></td>
+ </tr>
+ <tr>
+ <td colspan="1" valign="top"><code>CAST</code> from <code>VARCHAR</code></td>
+ <td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select cast(col as interval year) from dfs.`/tmp/input.json`;</span></code></td>
+ </tr>
+ </tbody></table>
+
+### Interval Day
+
+The `interval day` data type stores time duration in days, hours, minutes, and
+seconds. You do not need to specify all fields in a given interval. Drill
+supports the `interval day` data type in the following format:
+
+ P [qty] D T [qty] H [qty] M [qty] S
+
+The following table provides examples for the `interval day` data type:
+
+<table><tbody><tr><th>Use</th><th>Example</th></tr><tr><td valign="top">Literal</td><td valign="top"><code><span style="color: rgb(0,0,0);">select interval '1 10:20:30.123' day to second from dfs.`/tmp/input.json`;<br /></span><span style="color: rgb(0,0,0);">select interval '1 10' day to hour from dfs.`/tmp/input.json`;<br /></span><span style="color: rgb(0,0,0);">select interval '10' day from dfs.`/tmp/input.json`;<br /></span><span style="color: rgb(0,0,0);">select interval '10' hour from dfs.`/tmp/input.json`;<br /></span><span style="color: rgb(0,0,0);">select interval '10.999' second from dfs.`/tmp/input.json`;</span></code></td></tr><tr><td colspan="1" valign="top"><code>JSON</code> Input</td><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">{"col" : "P1DT10H20M30S"}<br /></span><span style="color: rgb(0,0,0);">{"col" : "P1DT10H20M30.123S"}<br /></span><span style="color: rgb(0,0,0);">{"col" : "P1D"}<br /></span><span style="color: rgb(0,0,0);">{"col" : "PT10H"}<br /></span><span style="color: rgb(0,0,0);">{"col" : "PT10.10S"}<br /></span><span style="color: rgb(0,0,0);">{"col" : "PT20S"}<br /></span><span style="color: rgb(0,0,0);">{"col" : "PT10H10S"}</span></code></td></tr><tr><td colspan="1" valign="top"><code>CAST</code> from <code>VARCHAR</code></td><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select cast(col as interval day) from dfs.`/tmp/input.json`;</span></code></td></tr></tbody></table>
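
As a rough illustration (ordinary Python, not Drill code), the `interval day` format shown above corresponds to an ISO-8601-style duration string; the helper below is a hypothetical sketch of how such a string breaks down into its fields:

```python
import re

def parse_interval_day(s):
    # P [days] D T [hours] H [minutes] M [seconds] S -- every field optional
    m = re.fullmatch(
        r"P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?", s)
    if not m:
        raise ValueError("not a valid interval day string: %s" % s)
    days, hours, minutes, seconds = m.groups()
    return {"days": int(days or 0), "hours": int(hours or 0),
            "minutes": int(minutes or 0), "seconds": float(seconds or 0)}

result = parse_interval_day("P1DT10H20M30.123S")
# 1 day, 10 hours, 20 minutes, 30.123 seconds
```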
+
+## Literal
+
+The following table provides a list of `date/time` literals that Drill
+supports with examples of each:
+
+<table><tbody><tr><th>Format</th><th colspan="1">Interpretation</th><th>Example</th></tr><tr><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">interval '1 10:20:30.123' day to second</span></code></td><td colspan="1" valign="top"><code>1 day, 10 hours, 20 minutes, 30 seconds, and 123 thousandths of a second</code></td><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select interval '1 10:20:30.123' day to second from dfs.`/tmp/input.json`;</span></code></td></tr><tr><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">interval '1 10' day to hour</span></code></td><td colspan="1" valign="top"><code>1 day 10 hours</code></td><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select interval '1 10' day to hour from dfs.`/tmp/input.json`;</span></code></td></tr><tr><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">interval '10' day</span></code></td><td colspan="1" valign="top"><code>10 days</code></td><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select interval '10' day from dfs.`/tmp/input.json`;</span></code></td></tr><tr><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">interval '10' hour</span></code></td><td colspan="1" valign="top"><code>10 hours</code></td><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select interval '10' hour from dfs.`/tmp/input.json`;</span></code></td></tr><tr><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">interval '10.999' second</span></code></td><td colspan="1" valign="top"><code>10.999 seconds</code></td><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select interval '10.999' second from dfs.`/tmp/input.json`;</span></code></td></tr></tbody></table>
+
+
+
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/sql-ref/nested/001-flatten.md
----------------------------------------------------------------------
diff --git a/_docs/sql-ref/nested/001-flatten.md b/_docs/sql-ref/nested/001-flatten.md
new file mode 100644
index 0000000..2769000
--- /dev/null
+++ b/_docs/sql-ref/nested/001-flatten.md
@@ -0,0 +1,89 @@
+---
+title: "FLATTEN Function"
+parent: "Nested Data Functions"
+---
+The FLATTEN function is useful for flexible exploration of repeated data.
+FLATTEN separates the elements in a repeated field into individual records. To
+maintain the association between each flattened value and the other fields in
+the record, all of the other columns are copied into each new record. A very
+simple example would turn this data (one record):
+
+ {
+ "x" : 5,
+ "y" : "a string",
+ "z" : [ 1,2,3]
+ }
+
+into three distinct records:
+
+     select x, y, flatten(z) as z from table;
+     +-------------+----------------+-----------+
+     | x           | y              | z         |
+     +-------------+----------------+-----------+
+     | 5           | "a string"     | 1         |
+     | 5           | "a string"     | 2         |
+     | 5           | "a string"     | 3         |
+     +-------------+----------------+-----------+
+
+The function takes a single argument, which must be an array (the `z` column
+in this example).
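
The copy-the-other-columns behavior can be sketched in a few lines of ordinary Python (a conceptual model only, not Drill code):

```python
def flatten(records, col):
    """Expand the array in `col` into one record per element,
    copying every other column into each new record."""
    rows = []
    for rec in records:
        for item in rec[col]:
            row = dict(rec)   # copy the non-array columns
            row[col] = item   # one array element per output record
            rows.append(row)
    return rows

result = flatten([{"x": 5, "y": "a string", "z": [1, 2, 3]}], "z")
# three records, each with x and y copied and a single z value
```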
+
+
+
+For a more interesting example, consider the JSON data in the publicly
+available [Yelp](https://www.yelp.com/dataset_challenge/dataset) data set. The
+first query below returns three columns from the
+`yelp_academic_dataset_business.json` file: `name`, `hours`, and `categories`.
+The query is restricted to distinct rows where the name is `zpizza`. The
+query returns only one row that meets those criteria; however, note that this
+row contains an array of four categories:
+
+ 0: jdbc:drill:zk=local> select distinct name, hours, categories
+ from dfs.yelp.`yelp_academic_dataset_business.json`
+ where name ='zpizza';
+ +------------+------------+------------+
+ | name | hours | categories |
+ +------------+------------+------------+
+ | zpizza | {"Tuesday":{"close":"22:00","open":"10:00"},"Friday":{"close":"23:00","open":"10:00"},"Monday":{"close":"22:00","open":"10:00"},"Wednesday":{"close":"22:00","open":"10:00"},"Thursday":{"close":"22:00","open":"10:00"},"Sunday":{"close":"22:00","open":"10:00"},"Saturday":{"close":"23:00","open":"10:00"}} | ["Gluten-Free","Pizza","Vegan","Restaurants"] |
+
+The FLATTEN function can operate on this single row and return multiple rows,
+one for each category:
+
+ 0: jdbc:drill:zk=local> select distinct name, flatten(categories) as categories
+ from dfs.yelp.`yelp_academic_dataset_business.json`
+ where name ='zpizza' order by 2;
+ +------------+-------------+
+ | name | categories |
+ +------------+-------------+
+ | zpizza | Gluten-Free |
+ | zpizza | Pizza |
+ | zpizza | Restaurants |
+ | zpizza | Vegan |
+ +------------+-------------+
+ 4 rows selected (2.797 seconds)
+
+Having used the FLATTEN function to break down arrays into distinct rows, you
+can run queries that do deeper analysis on the flattened result set. For
+example, you can use FLATTEN in a subquery, then apply WHERE clause
+constraints or aggregate functions to the results in the outer query.
+
+The following query uses the same data file as the previous query to flatten
+the categories array, then runs a COUNT function on the flattened result:
+
+ select celltbl.catl, count(celltbl.catl) catcount
+ from (select flatten(categories) catl
+ from dfs.yelp.`yelp_academic_dataset_business.json`) celltbl
+ group by celltbl.catl
+ order by count(celltbl.catl) desc limit 5;
+
+ +---------------+------------+
+ | catl | catcount |
+ +---------------+------------+
+ | Restaurants | 14303 |
+ | Shopping | 6428 |
+ | Food | 5209 |
+ | Beauty & Spas | 3421 |
+ | Nightlife | 2870 |
+     +---------------+------------+
+
+A common use case for FLATTEN is its use in conjunction with the
+[KVGEN](/drill/docs/kvgen-function) function.
+
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/sql-ref/nested/002-kvgen.md
----------------------------------------------------------------------
diff --git a/_docs/sql-ref/nested/002-kvgen.md b/_docs/sql-ref/nested/002-kvgen.md
new file mode 100644
index 0000000..f619864
--- /dev/null
+++ b/_docs/sql-ref/nested/002-kvgen.md
@@ -0,0 +1,150 @@
+---
+title: "KVGEN Function"
+parent: "Nested Data Functions"
+---
+KVGEN stands for _key-value generation_. This function is useful when complex
+data files contain arbitrary maps that consist of relatively "unknown" column
+names. Instead of having to specify columns in the map to access the data, you
+can use KVGEN to return a list of the keys that exist in the map. KVGEN turns
+a map with a wide set of columns into an array of key-value pairs.
+
+In turn, you can write analytic queries that return a subset of the generated
+keys or constrain the keys in some way. For example, you can use the
+[FLATTEN](/drill/docs/flatten-function) function to break the
+array down into multiple distinct rows and further query those rows.
+
+
+
+For example, assume that a JSON file contains this data:
+
+ {"a": "valA", "b": "valB"}
+ {"c": "valC", "d": "valD"}
+
+
+KVGEN would operate on this data to generate:
+
+ [{"key": "a", "value": "valA"}, {"key": "b", "value": "valB"}]
+ [{"key": "c", "value": "valC"}, {"key": "d", "value": "valD"}]
+
+Applying the [FLATTEN](/drill/docs/flatten-function) function to
+this data would return:
+
+ {"key": "a", "value": "valA"}
+ {"key": "b", "value": "valB"}
+ {"key": "c", "value": "valC"}
+ {"key": "d", "value": "valD"}
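
The two transformations above can be modeled in a few lines of ordinary Python (a conceptual sketch only, not Drill code):

```python
def kvgen(mapping):
    """Turn a map into an array of key-value pairs."""
    return [{"key": k, "value": v} for k, v in mapping.items()]

records = [{"a": "valA", "b": "valB"}, {"c": "valC", "d": "valD"}]
kv_rows = [kvgen(rec) for rec in records]              # KVGEN: one array per record
flattened = [pair for row in kv_rows for pair in row]  # FLATTEN: one record per pair
```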
+
+Assume that a JSON file called `kvgendata.json` includes multiple records that
+look like this one:
+
+ {
+ "rownum": 1,
+ "bigintegercol": {
+ "int_1": 1,
+ "int_2": 2,
+ "int_3": 3
+ },
+ "varcharcol": {
+ "varchar_1": "abc",
+ "varchar_2": "def",
+ "varchar_3": "xyz"
+ },
+ "boolcol": {
+ "boolean_1": true,
+ "boolean_2": false,
+ "boolean_3": true
+ },
+ "float8col": {
+ "f8_1": 1.1,
+ "f8_2": 2.2
+ },
+ "complex": [
+ {
+ "col1": 3
+ },
+ {
+ "col2": 2,
+ "col3": 1
+ },
+ {
+ "col1": 7
+ }
+ ]
+ }
+
+ {
+ "rownum": 3,
+ "bigintegercol": {
+ "int_1": 1,
+ "int_3": 3
+ },
+ "varcharcol": {
+ "varchar_1": "abcde",
+ "varchar_2": null,
+ "varchar_3": "xyz",
+ "varchar_4": "xyz2"
+ },
+ "boolcol": {
+ "boolean_1": true,
+ "boolean_2": false
+ },
+ "float8col": {
+ "f8_1": 1.1,
+ "f8_3": 6.6
+ },
+ "complex": [
+ {
+ "col1": 2,
+ "col3": 1
+ }
+ ]
+ }
+ ...
+
+
+A SELECT * query against this specific record returns the following row:
+
+ 0: jdbc:drill:zk=local> select * from dfs.yelp.`kvgendata.json` where rownum=1;
+
+ +------------+---------------+------------+------------+------------+------------+
+ | rownum | bigintegercol | varcharcol | boolcol | float8col | complex |
+ +------------+---------------+------------+------------+------------+------------+
+ | 1 | {"int_1":1,"int_2":2,"int_3":3} | {"varchar_1":"abc","varchar_2":"def","varchar_3":"xyz"} | {"boolean_1":true,"boolean_2":false,"boolean_3":true} | {"f8_1":1.1,"f8_2":2.2} | [{"col1":3},{"col2":2,"col3":1},{"col1":7}] |
+ +------------+---------------+------------+------------+------------+------------+
+ 1 row selected (0.122 seconds)
+
+You can use the KVGEN function to turn the maps in this data into key-value
+pairs. For example:
+
+ 0: jdbc:drill:zk=local> select kvgen(varcharcol) from dfs.yelp.`kvgendata.json`;
+ +------------+
+ | EXPR$0 |
+ +------------+
+ | [{"key":"varchar_1","value":"abc"},{"key":"varchar_2","value":"def"},{"key":"varchar_3","value":"xyz"}] |
+ | [{"key":"varchar_1","value":"abcd"}] |
+ | [{"key":"varchar_1","value":"abcde"},{"key":"varchar_3","value":"xyz"},{"key":"varchar_4","value":"xyz2"}] |
+ | [{"key":"varchar_1","value":"abc"},{"key":"varchar_2","value":"def"}] |
+ +------------+
+ 4 rows selected (0.091 seconds)
+
+Now you can apply the FLATTEN function to break out the key-value pairs into
+distinct rows:
+
+ 0: jdbc:drill:zk=local> select flatten(kvgen(varcharcol)) from dfs.yelp.`kvgendata.json`;
+ +------------+
+ | EXPR$0 |
+ +------------+
+ | {"key":"varchar_1","value":"abc"} |
+ | {"key":"varchar_2","value":"def"} |
+ | {"key":"varchar_3","value":"xyz"} |
+ | {"key":"varchar_1","value":"abcd"} |
+ | {"key":"varchar_1","value":"abcde"} |
+ | {"key":"varchar_3","value":"xyz"} |
+ | {"key":"varchar_4","value":"xyz2"} |
+ | {"key":"varchar_1","value":"abc"} |
+ | {"key":"varchar_2","value":"def"} |
+ +------------+
+ 9 rows selected (0.151 seconds)
+
+See the description of [FLATTEN](/drill/docs/flatten-function)
+for an example of a query against the flattened data.
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/sql-ref/nested/003-repeated-cnt.md
----------------------------------------------------------------------
diff --git a/_docs/sql-ref/nested/003-repeated-cnt.md b/_docs/sql-ref/nested/003-repeated-cnt.md
new file mode 100644
index 0000000..2b332b3
--- /dev/null
+++ b/_docs/sql-ref/nested/003-repeated-cnt.md
@@ -0,0 +1,33 @@
+---
+title: "REPEATED_COUNT Function"
+parent: "Nested Data Functions"
+---
+This function counts the values in an array. The following example returns the
+counts for the `categories` array in the `yelp_academic_dataset_business.json`
+file. The counts are restricted to rows that contain the string `pizza`.
+
+ SELECT name, REPEATED_COUNT(categories)
+ FROM dfs.yelp.`yelp_academic_dataset_business.json`
+ WHERE name LIKE '%pizza%';
+
+ +---------------+------------+
+ | name | EXPR$1 |
+ +---------------+------------+
+ | Villapizza | 2 |
+ | zpizza | 4 |
+ | zpizza | 4 |
+ | Luckys pizza | 2 |
+ | Zpizza | 2 |
+ | S2pizzabar | 4 |
+ | Dominos pizza | 5 |
+ +---------------+------------+
+
+ 7 rows selected (2.03 seconds)
+
+The function requires a single argument, which must be an array. Note that
+this function is not a standard SQL aggregate function and does not require
+the count to be grouped by other columns in the select list (such as `name` in
+this example).
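
Because REPEATED_COUNT returns a regular scalar value rather than an aggregate, it can also be used to filter rows. The following sketch (not part of the original example) keeps only businesses whose `categories` array has more than three elements:

    SELECT name, REPEATED_COUNT(categories)
    FROM dfs.yelp.`yelp_academic_dataset_business.json`
    WHERE REPEATED_COUNT(categories) > 3;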
+
+For another example of this function, see the following lesson in the Apache
+Drill Tutorial for Hadoop: [Lesson 3: Run Queries on Complex Data Types](/drill/docs/lession-3-run-queries-on-complex-data-types/).
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/tutorial/001-install-sandbox.md
----------------------------------------------------------------------
diff --git a/_docs/tutorial/001-install-sandbox.md b/_docs/tutorial/001-install-sandbox.md
new file mode 100644
index 0000000..26360ff
--- /dev/null
+++ b/_docs/tutorial/001-install-sandbox.md
@@ -0,0 +1,33 @@
+---
+title: "Installing the Apache Drill Sandbox"
+parent: "Apache Drill Tutorial"
+---
+## Prerequisites
+
+The MapR Sandbox with Apache Drill runs on VMware Player and VirtualBox, free
+desktop applications that you can use to run a virtual machine on a Windows,
+Mac, or Linux PC. Before you install the MapR Sandbox with Apache Drill,
+verify that the host system meets the following prerequisites:
+
+ * VMware Player or VirtualBox is installed.
+ * At least 20 GB of free hard disk space, at least 4 physical cores, and 8 GB of RAM are available. Performance increases with more RAM and free hard disk space.
+ * The host system uses one of the following 64-bit x86 architectures:
+ * A 1.3 GHz or faster AMD CPU with segment-limit support in long mode
+ * A 1.3 GHz or faster Intel CPU with VT-x support
+ * If you have an Intel CPU with VT-x support, verify that VT-x support is enabled in the host system BIOS. The BIOS settings that must be enabled for VT-x support vary depending on the system vendor. See the VMware knowledge base article at <http://kb.vmware.com/kb/1003944> for information about how to determine if VT-x support is enabled.
+
+### VM Player Downloads
+
+For Linux, Mac, or Windows, download the free
+[VMware Player](https://my.vmware.com/web/vmware/free#desktop_end_user_computing/vmware_player/6_0) or
+[VirtualBox](https://www.virtualbox.org/wiki/Downloads). Optionally, you can
+purchase [VMware Fusion](http://www.vmware.com/products/fusion/) for Mac.
+
+### VM Player Installation
+
+The following list provides links to the virtual machine installation
+instructions:
+
+ * To install the VMware Player, see the [VMware documentation](http://www.vmware.com/support/pubs/player_pubs.html). Use of VMware Player is subject to the VMware Player end user license terms. VMware does not provide support for VMware Player. For self-help resources, see the [VMware Player FAQ](http://www.vmware.com/products/player/faqs.html).
+ * To install VirtualBox, see the [Oracle VM VirtualBox User Manual](http://dlc.sun.com.edgesuite.net/virtualbox/4.3.4/UserManual.pdf). By downloading VirtualBox, you agree to the terms and conditions of the respective license.
+
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/tutorial/002-get2kno-sb.md
----------------------------------------------------------------------
diff --git a/_docs/tutorial/002-get2kno-sb.md b/_docs/tutorial/002-get2kno-sb.md
new file mode 100644
index 0000000..9b11b9d
--- /dev/null
+++ b/_docs/tutorial/002-get2kno-sb.md
@@ -0,0 +1,232 @@
+---
+title: "Getting to Know the Drill Sandbox"
+parent: "Apache Drill Tutorial"
+---
+This section describes the configuration of the Apache Drill system that you
+have installed and introduces the overall use case for the tutorial.
+
+# Storage Plugins Overview
+
+The Hadoop cluster within the sandbox is set up with MapR-FS, MapR-DB, and
+Hive, which all serve as data sources for Drill in this tutorial. Before you
+can run queries against these data sources, Drill requires each one to be
+configured as a storage plugin. A storage plugin defines the abstraction that
+Drill uses to talk to a data source and provides interfaces for reading,
+writing, and getting metadata from that source. Each storage plugin also
+exposes optimization rules that Drill can leverage for efficient query
+execution.
+
+Take a look at the pre-configured storage plugins by opening the Drill Web UI.
+
+Feel free to skip this section and jump directly to the queries: [Lesson 1:
+Learn About the Data
+Set](/drill/docs/lession-1-learn-about-the-data-set)
+
+ * Launch a web browser and go to: `http://<IP address of the sandbox>:8047`
+ * Go to the Storage tab
+ * Open the configured storage plugins one at a time by clicking Update
+ * You will see the following plugins configured.
+
+## maprdb
+
+A storage plugin configuration for MapR-DB in the sandbox. Drill uses a single
+storage plugin for connecting to HBase as well as MapR-DB, which is an
+enterprise-grade in-Hadoop NoSQL database. In addition to the following brief example, see [Registering HBase](/drill/docs/registering-hbase) for more
+information on how to configure Drill to query HBase.
+
+ {
+ "type" : "hbase",
+ "enabled" : true,
+ "config" : {
+ "hbase.table.namespace.mappings" : "*:/tables"
+ }
+ }
+
+## dfs
+
+This is a storage plugin configuration for the MapR file system (MapR-FS) in
+the sandbox. The connection attribute indicates the type of distributed file
+system: in this case, MapR-FS. Drill can work with any distributed file
+system, including HDFS, S3, and so on.
+
+The configuration also includes a set of workspaces; each one represents a
+location in MapR-FS:
+
+ * root: access to the root file system location
+ * clicks: access to nested JSON log data
+ * logs: access to flat (non-nested) JSON log data in the logs directory and its subdirectories
+ * views: a workspace for creating views
+
+A workspace in Drill is a location where users can easily access a specific
+set of data and collaborate with each other by sharing artifacts. Users can
+create as many workspaces as they need within Drill.
+
+Each workspace can also be configured as "writable" or not, which indicates
+whether users can write data to this location and defines the storage format
+in which the data will be written (parquet, csv, json). These attributes
+become relevant when you explore SQL commands, especially CREATE TABLE
+AS (CTAS) and CREATE VIEW.
+
+Drill can query files and directories directly and can detect file formats
+based on the file extension or the first few bits of data within the file.
+However, Drill requires additional information for some formats, such as
+delimiters for text files, which are specified in the “formats” section below.
+
+ {
+ "type": "file",
+ "enabled": true,
+ "connection": "maprfs:///",
+ "workspaces": {
+ "root": {
+ "location": "/mapr/demo.mapr.com/data",
+ "writable": false,
+ "storageformat": null
+ },
+ "clicks": {
+ "location": "/mapr/demo.mapr.com/data/nested",
+ "writable": true,
+ "storageformat": "parquet"
+ },
+ "logs": {
+ "location": "/mapr/demo.mapr.com/data/flat",
+ "writable": true,
+ "storageformat": "parquet"
+ },
+ "views": {
+ "location": "/mapr/demo.mapr.com/data/views",
+ "writable": true,
+ "storageformat": "parquet"
+ },
+ "formats": {
+ "psv": {
+ "type": "text",
+ "extensions": [
+ "tbl"
+ ],
+ "delimiter": "|"
+ },
+ "csv": {
+ "type": "text",
+ "extensions": [
+ "csv"
+ ],
+ "delimiter": ","
+ },
+ "tsv": {
+ "type": "text",
+ "extensions": [
+ "tsv"
+ ],
+ "delimiter": "\t"
+ },
+ "parquet": {
+ "type": "parquet"
+ },
+ "json": {
+ "type": "json"
+ }
+ }}
+
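The `psv` entry above, for example, tells Drill to split files with the `.tbl` extension on the `|` delimiter. As a sanity check on what that format entry means, here is the same parse rule applied in plain Python (a sketch; the sample row is invented, not data from the sandbox):

```python
import csv
import io

# Hypothetical pipe-separated row, split the way Drill's "psv" format
# entry splits *.tbl files (delimiter "|").
sample = "965|laptop|959\n"
row = next(csv.reader(io.StringIO(sample), delimiter="|"))
print(row)   # ['965', 'laptop', '959']
```

The `csv` and `tsv` entries work the same way, with `,` and tab as the delimiter.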
+## hive
+
+A storage plugin configuration for a Hive data warehouse within the sandbox.
+Drill connects to the Hive metastore by using the configured metastore thrift
+URI. Metadata for Hive tables is automatically available for users to query.
+
+ {
+ "type": "hive",
+ "enabled": true,
+ "configProps": {
+ "hive.metastore.uris": "thrift://localhost:9083",
+ "hive.metastore.sasl.enabled": "false"
+ }
+ }
+
+# Client Application Interfaces
+
+Drill also provides application interfaces that client tools can use to
+connect to Drill and access data. The interfaces include the following.
+
+### ODBC/JDBC drivers
+
+Drill provides ODBC/JDBC drivers to connect from BI tools such as Tableau,
+MicroStrategy, SQUirrel, and Jaspersoft; refer to [Using ODBC to Access Apache
+Drill from BI Tools](/drill/docs/odbc-jdbc-interfaces/using-odbc-to-access-apache-drill-from-bi-tools) and [Using JDBC to Access Apache Drill](/drill/docs/odbc-jdbc-interfaces#using-jdbc-to-access-apache-drill-from-squirrel) to learn
+more.
+
+### SQLLine
+
+SQLLine is a JDBC application that comes packaged with Drill. To start working
+with it, use the command line on the demo cluster to log in as root, then
+enter `sqlline`. Use `mapr` as the login password. For example:
+
+ $ ssh root@localhost -p 2222
+ Password:
+ Last login: Mon Sep 15 13:46:08 2014 from 10.250.0.28
+ Welcome to your Mapr Demo virtual machine.
+ [root@maprdemo ~]# sqlline
+ sqlline version 1.1.6
+ 0: jdbc:drill:>
+
+### Drill Web UI
+
+The Drill Web UI is a simple user interface for configuring and managing Apache
+Drill. This UI can be launched from any of the nodes in the Drill cluster. The
+configuration for Drill includes setting up storage plugins that represent the
+data sources on which Drill performs queries. The sandbox comes with storage
+plugins configured for Hive, HBase, the MapR file system, and the local file
+system.
+
+Users and developers can get the necessary information for tuning and
+performing diagnostics on queries, such as the list of queries executed in a
+session and detailed query plan profiles for each.
+
+Detailed configuration and management of Drill is out of scope for this
+tutorial.
+
+The Web interface for Apache Drill also provides a query UI where users can
+submit queries to Drill and observe results. Here is a screenshot of the Web
+UI for Apache Drill:
+
+![drill query flow]({{ site.baseurl }}/docs/img/DrillWebUI.png)
+
+### REST API
+
+Drill provides a simple REST API for users to query data as well as manage
+the system. The Web UI leverages the REST API to talk to Drill.
+
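For example, a query can be submitted over HTTP by POSTing JSON to the `/query.json` endpoint on the web UI port (8047). The sketch below only builds the request; the host name and the sample query are placeholders, and actually sending it requires a running Drillbit:

```python
import json
from urllib import request

def build_query_request(drill_host, sql):
    # POST /query.json on the web UI port (8047) with a JSON body
    # naming the query type and the query text.
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return request.Request(
        url="http://%s:8047/query.json" % drill_host,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request("localhost", "select * from sys.version")
# urllib.request.urlopen(req) would return the result rows as JSON
print(req.full_url)
```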
+This tutorial introduces sample queries that you can run by using SQLLine.
+Note that you can run the queries just as easily by launching the Drill Web
+UI. No additional installation or configuration is required.
+
+# Use Case Overview
+
+As you run through the queries in this tutorial, put yourself in the shoes of
+an analyst with basic SQL skills. Let us imagine that the analyst works for an
+emerging online retail business that accepts purchases from its customers
+through both an established web-based interface and a new mobile application.
+
+The analyst is data-driven and operates mostly on the business side with
+little or no interaction with the IT department. Recently the central IT team
+has implemented a Hadoop-based infrastructure to reduce the cost of the legacy
+database system, and most of the DWH/ETL workload is now handled by
+Hadoop/Hive. The master customer profile information and product catalog are
+managed in MapR-DB, which is a NoSQL database. The IT team has also started
+acquiring clickstream data that comes from web and mobile applications. This
+data is stored in Hadoop as JSON files.
+
+The analyst has a number of data sources to explore, but exploring
+them in isolation is not the way to go. There are some potentially very
+interesting analytical connections between these data sources. For example, it
+would be useful to analyze customer records in the clickstream data
+and tie them to the master customer data in MapR-DB.
+
+The analyst decides to explore the various data sources by using Apache
+Drill. Think about the flexibility and analytic capability of
+Apache Drill as you work through the tutorial.
+
+# What's Next
+
+Start running queries by going to [Lesson 1: Learn About the Data
+Set](/drill/docs/lession-1-learn-about-the-data-set).
+
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/tutorial/003-lesson1.md
----------------------------------------------------------------------
diff --git a/_docs/tutorial/003-lesson1.md b/_docs/tutorial/003-lesson1.md
new file mode 100644
index 0000000..119d67f
--- /dev/null
+++ b/_docs/tutorial/003-lesson1.md
@@ -0,0 +1,396 @@
+---
+title: "Lession 1: Learn about the Data Set"
+parent: "Apache Drill Tutorial"
+---
+## Goal
+
+This lesson is simply about discovering what data is available, in what
+format, using simple SQL SELECT statements. Drill is capable of analyzing data
+without prior knowledge or definition of its schema. This means that you can
+start querying data immediately (and even as it changes), regardless of its
+format.
+
+The data set for the tutorial consists of:
+
+ * Transactional data: stored as a Hive table
+ * Product catalog and master customer data: stored as MapR-DB tables
+ * Clickstream and logs data: stored in the MapR file system as JSON files
+
+## Queries in This Lesson
+
+This lesson consists of select * queries on each data source.
+
+## Before You Begin
+
+### Start sqlline
+
+If sqlline is not already started, use a Terminal or Command window to log
+into the demo VM as root, then enter `sqlline`:
+
+ $ ssh root@10.250.0.6
+ Password:
+ Last login: Mon Sep 15 13:46:08 2014 from 10.250.0.28
+ Welcome to your Mapr Demo virtual machine.
+ [root@maprdemo ~]# sqlline
+ sqlline version 1.1.6
+ 0: jdbc:drill:>
+
+You can run queries from this prompt to complete the tutorial. To exit from
+`sqlline`, type:
+
+ 0: jdbc:drill:> !quit
+
+Note that though this tutorial demonstrates the queries using SQLLine, you can
+also execute queries using the Drill Web UI.
+
+### List the available workspaces and databases:
+
+ 0: jdbc:drill:> show databases;
+ +-------------+
+ | SCHEMA_NAME |
+ +-------------+
+ | hive.default |
+ | dfs.default |
+ | dfs.logs |
+ | dfs.root |
+ | dfs.views |
+ | dfs.clicks |
+ | dfs.data |
+ | dfs.tmp |
+ | sys |
+ | maprdb |
+ | cp.default |
+ | INFORMATION_SCHEMA |
+ +-------------+
+ 12 rows selected
+
+Note that this command exposes all the metadata available from the storage
+plugins configured with Drill as a set of schemas. This includes the Hive and
+MapR-DB databases as well as the workspaces configured in the file system. As
+you run queries in the tutorial, you will switch among these schemas by
+submitting the USE command. This behavior resembles the ability to use
+different database schemas (namespaces) in a relational database system.
+
+## Query Hive Tables
+
+The orders table is a six-column Hive table defined in the Hive metastore.
+This is a Hive external table pointing to the data stored in flat files on the
+MapR file system. The orders table contains 122,000 rows.
+
+### Set the schema to hive:
+
+ 0: jdbc:drill:> use hive;
+ +------------+------------+
+ | ok | summary |
+ +------------+------------+
+ | true | Default schema changed to 'hive' |
+ +------------+------------+
+
+You will run the USE command throughout this tutorial. The USE command sets
+the schema for the current session.
+
+### Describe the table:
+
+You can use the DESCRIBE command to show the columns and data types for a Hive
+table:
+
+ 0: jdbc:drill:> describe orders;
+ +-------------+------------+-------------+
+ | COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
+ +-------------+------------+-------------+
+ | order_id | BIGINT | YES |
+ | month | VARCHAR | YES |
+ | cust_id | BIGINT | YES |
+ | state | VARCHAR | YES |
+ | prod_id | BIGINT | YES |
+ | order_total | INTEGER | YES |
+ +-------------+------------+-------------+
+
+The DESCRIBE command returns complete schema information for Hive tables based
+on the metadata available in the Hive metastore.
+
+### Select 5 rows from the orders table:
+
+ 0: jdbc:drill:> select * from orders limit 5;
+ +------------+------------+------------+------------+------------+-------------+
+ | order_id | month | cust_id | state | prod_id | order_total |
+ +------------+------------+------------+------------+------------+-------------+
+ | 67212 | June | 10001 | ca | 909 | 13 |
+ | 70302 | June | 10004 | ga | 420 | 11 |
+ | 69090 | June | 10011 | fl | 44 | 76 |
+ | 68834 | June | 10012 | ar | 0 | 81 |
+ | 71220 | June | 10018 | az | 411 | 24 |
+ +------------+------------+------------+------------+------------+-------------+
+
+Because orders is a Hive table, you can query the data in the same way that
+you would query the columns in a relational database table. Note the use of
+the standard LIMIT clause, which limits the result set to the specified number
+of rows. You can use LIMIT with or without an ORDER BY clause.
+
+Drill provides seamless integration with Hive by allowing queries on Hive
+tables defined in the metastore with no extra configuration. Note that Hive is
+not a prerequisite for Drill, but simply serves as a storage plugin or data
+source for Drill. Drill also lets users query all Hive file formats (including
+custom serdes). Additionally, any UDFs defined in Hive can be leveraged as
+part of Drill queries.
+
+Because Drill has its own low-latency SQL query execution engine, you can
+query Hive tables with high performance and support for interactive and ad-hoc
+data exploration.
+
+## Query MapR-DB and HBase Tables
+
+The customers and products tables are MapR-DB tables. MapR-DB is an enterprise
+in-Hadoop NoSQL database. It exposes the HBase API to support application
+development. Every MapR-DB table has a row_key, in addition to one or more
+column families. Each column family contains one or more specific columns. The
+row_key value is a primary key that uniquely identifies each row.
+
+Drill allows direct queries on MapR-DB and HBase tables. Unlike other SQL on
+Hadoop options, Drill requires no overlay schema definitions in Hive to work
+with this data. Think about a MapR-DB or HBase table with thousands of
+columns, such as a time-series database, and the pain of having to manage
+duplicate schemas for it in Hive!
+
+### Products Table
+
+The products table has two column families.
+
+| Column Family | Columns |
+|---------------|----------------|
+| details | name, category |
+| pricing | price |
+The products table contains 965 rows.
+
+### Customers Table
+
+The Customers table has three column families.
+
+| Column Family | Columns |
+|---------------|---------------------|
+| address | state |
+| loyalty | agg_rev, membership |
+| personal | age, gender |
+
+The customers table contains 993 rows.
+
+### Set the workspace to maprdb:
+
+ 0: jdbc:drill:> use maprdb;
+ +------------+------------+
+ | ok | summary |
+ +------------+------------+
+ | true | Default schema changed to 'maprdb' |
+ +------------+------------+
+
+### Describe the tables:
+
+ 0: jdbc:drill:> describe customers;
+ +-------------+------------+-------------+
+ | COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
+ +-------------+------------+-------------+
+ | row_key | ANY | NO |
+ | address | (VARCHAR(1), ANY) MAP | NO |
+ | loyalty | (VARCHAR(1), ANY) MAP | NO |
+ | personal | (VARCHAR(1), ANY) MAP | NO |
+ +-------------+------------+-------------+
+
+ 0: jdbc:drill:> describe products;
+ +-------------+------------+-------------+
+ | COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
+ +-------------+------------+-------------+
+ | row_key | ANY | NO |
+ | details | (VARCHAR(1), ANY) MAP | NO |
+ | pricing | (VARCHAR(1), ANY) MAP | NO |
+ +-------------+------------+-------------+
+
+Unlike the Hive example, the DESCRIBE command does not return the full schema
+up to the column level. Wide-column NoSQL databases such as MapR-DB and HBase
+can be schema-less by design; every row has its own set of column name-value
+pairs in a given column family, and the column value can be of any data type,
+as determined by the application inserting the data.
+
+A “MAP” complex type in Drill represents this variable column name-value
+structure, and “ANY” represents the fact that the column value can be of any
+data type. Observe the row_key, which is also simply bytes and has the type
+ANY.
+
+### Select 5 rows from the products table:
+
+ 0: jdbc:drill:> select * from products limit 5;
+ +------------+------------+------------+
+ | row_key | details | pricing |
+ +------------+------------+------------+
+ | [B@a1a3e25 | {"category":"bGFwdG9w","name":"IlNvbnkgbm90ZWJvb2si"} | {"price":"OTU5"} |
+ | [B@103a43af | {"category":"RW52ZWxvcGVz","name":"IzEwLTQgMS84IHggOSAxLzIgUHJlbWl1bSBEaWFnb25hbCBTZWFtIEVudmVsb3Blcw=="} | {"price":"MT |
+ | [B@61319e7b | {"category":"U3RvcmFnZSAmIE9yZ2FuaXphdGlvbg==","name":"MjQgQ2FwYWNpdHkgTWF4aSBEYXRhIEJpbmRlciBSYWNrc1BlYXJs"} | {"price" |
+ | [B@9bcf17 | {"category":"TGFiZWxz","name":"QXZlcnkgNDk4"} | {"price":"Mw=="} |
+ | [B@7538ef50 | {"category":"TGFiZWxz","name":"QXZlcnkgNDk="} | {"price":"Mw=="} |
+
+Given that Drill requires no up-front schema definitions indicating data
+types, the query returns the raw byte arrays for column values, just as they
+are stored in MapR-DB (or HBase). Observe that the column families (details
+and pricing) have the map data type and appear as JSON strings.
+
+In Lesson 2, you will use CAST functions to return typed data for each column.
+
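The byte arrays in these results are base64-encoded in the JSON rendering. If you are curious what a value holds before reaching the CAST syntax, you can decode it outside Drill; here is a quick sketch using the first products row shown above:

```python
import base64

# Column values copied from the first row of the products query above;
# MapR-DB/HBase store plain bytes, shown base64-encoded in the output.
raw = {"category": "bGFwdG9w", "name": "IlNvbnkgbm90ZWJvb2si", "price": "OTU5"}

decoded = {k: base64.b64decode(v).decode("utf-8") for k, v in raw.items()}
print(decoded)   # the category is "laptop" and the price is 959
```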
+### Select 5 rows from the customers table:
+
+
+    0: jdbc:drill:> select * from customers limit 5;
+ +------------+------------+------------+------------+
+ | row_key | address | loyalty | personal |
+ +------------+------------+------------+------------+
+ | [B@284bae62 | {"state":"Imt5Ig=="} | {"agg_rev":"IjEwMDEtMzAwMCI=","membership":"ImJhc2ljIg=="} | {"age":"IjI2LTM1Ig==","gender":"Ik1B |
+ | [B@7ffa4523 | {"state":"ImNhIg=="} | {"agg_rev":"IjAtMTAwIg==","membership":"ImdvbGQi"} | {"age":"IjI2LTM1Ig==","gender":"IkZFTUFMRSI= |
+ | [B@7d13e79 | {"state":"Im9rIg=="} | {"agg_rev":"IjUwMS0xMDAwIg==","membership":"InNpbHZlciI="} | {"age":"IjI2LTM1Ig==","gender":"IkZFT |
+ | [B@3a5c7df1 | {"state":"ImtzIg=="} | {"agg_rev":"IjMwMDEtMTAwMDAwIg==","membership":"ImdvbGQi"} | {"age":"IjUxLTEwMCI=","gender":"IkZF |
+ | [B@e507726 | {"state":"Im5qIg=="} | {"agg_rev":"IjAtMTAwIg==","membership":"ImJhc2ljIg=="} | {"age":"IjIxLTI1Ig==","gender":"Ik1BTEUi" |
+ +------------+------------+------------+------------+
+
+Again, the query returns raw byte data that must be cast to readable data
+types.
+
+## Query the File System
+
+Along with querying a data source with full schemas (such as Hive) and partial
+schemas (such as MapR-DB and HBase), Drill offers the unique capability to
+perform SQL queries directly on the file system. The file system could be a
+local file system, or a distributed file system such as MapR-FS, HDFS, or S3.
+
+In the context of Drill, a file or a directory is synonymous with a
+relational database “table.” Therefore, you can perform SQL operations
+directly on files and directories without the need for up-front schema
+definitions or schema management for any model changes. The schema is
+discovered on the fly based on the query. Drill supports queries on a variety
+of file formats including text, CSV, Parquet, and JSON in the 0.5 release.
+
+In this example, the clickstream data coming from the mobile/web applications
+is in JSON format. The JSON files have the following structure:
+
+ {"trans_id":31920,"date":"2014-04-26","time":"12:17:12","user_info":{"cust_id":22526,"device":"IOS5","state":"il"},"trans_info":{"prod_id":[174,2],"purch_flag":"false"}}
+ {"trans_id":31026,"date":"2014-04-20","time":"13:50:29","user_info":{"cust_id":16368,"device":"AOS4.2","state":"nc"},"trans_info":{"prod_id":[],"purch_flag":"false"}}
+ {"trans_id":33848,"date":"2014-04-10","time":"04:44:42","user_info":{"cust_id":21449,"device":"IOS6","state":"oh"},"trans_info":{"prod_id":[582],"purch_flag":"false"}}
+
+
+The clicks.json and clicks.campaign.json files contain metadata as part of the
+data itself (referred to as “self-describing” data). Also note that the data
+elements are complex, or nested. The initial queries below do not show how to
+unpack the nested data, but they show that easy access to the data requires no
+setup beyond the definition of a workspace.
+
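Each line of these files is a complete JSON object, which is why no schema registration is needed; the structure travels with the data. As a sketch, the same record parsed in Python, where nested access mirrors the dotted `t.user_info.cust_id` notation used in later Drill queries:

```python
import json

# First clickstream record from the sample above; each line of
# clicks.json is a self-contained JSON object with nested maps.
line = ('{"trans_id":31920,"date":"2014-04-26","time":"12:17:12",'
        '"user_info":{"cust_id":22526,"device":"IOS5","state":"il"},'
        '"trans_info":{"prod_id":[174,2],"purch_flag":"false"}}')

record = json.loads(line)
print(record["user_info"]["cust_id"])    # 22526
print(record["trans_info"]["prod_id"])   # [174, 2]
```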
+### Query nested clickstream data
+
+#### Set the workspace to dfs.clicks:
+
+ 0: jdbc:drill:> use dfs.clicks;
+ +------------+------------+
+ | ok | summary |
+ +------------+------------+
+ | true | Default schema changed to 'dfs.clicks' |
+ +------------+------------+
+
+In this case, setting the workspace is a mechanism for making queries easier
+to write. When you specify a file system workspace, you can shorten references
+to files in the FROM clause of your queries. Instead of having to provide the
+complete path to a file, you can provide the path relative to a directory
+location specified in the workspace. For example:
+
+ "location": "/mapr/demo.mapr.com/data/nested"
+
+Any file or directory that you want to query in this path can be referenced
+relative to this path. The clicks directory referred to in the following query
+is directly below the nested directory.
+
+#### Select 2 rows from the clicks.json file:
+
+ 0: jdbc:drill:> select * from `clicks/clicks.json` limit 2;
+ +------------+------------+------------+------------+------------+
+ | trans_id | date | time | user_info | trans_info |
+ +------------+------------+------------+------------+------------+
+ | 31920 | 2014-04-26 | 12:17:12 | {"cust_id":22526,"device":"IOS5","state":"il"} | {"prod_id":[174,2],"purch_flag":"false"} |
+ | 31026 | 2014-04-20 | 13:50:29 | {"cust_id":16368,"device":"AOS4.2","state":"nc"} | {"prod_id":[],"purch_flag":"false"} |
+ +------------+------------+------------+------------+------------+
+ 2 rows selected
+
+Note that the FROM clause reference points to a specific file. Drill expands
+the traditional concept of a “table reference” in a standard SQL FROM clause
+to refer to a file in a local or distributed file system.
+
+The only special requirement is the use of back ticks to enclose the file
+path. This is necessary whenever the file path contains Drill reserved words
+or special characters.
+
+#### Select 2 rows from the campaign.json file:
+
+ 0: jdbc:drill:> select * from `clicks/clicks.campaign.json` limit 2;
+ +------------+------------+------------+------------+------------+------------+
+ | trans_id | date | time | user_info | ad_info | trans_info |
+ +------------+------------+------------+------------+------------+------------+
+ | 35232 | 2014-05-10 | 00:13:03 | {"cust_id":18520,"device":"AOS4.3","state":"tx"} | {"camp_id":"null"} | {"prod_id":[7,7],"purch_flag":"true"} |
+ | 31995 | 2014-05-22 | 16:06:38 | {"cust_id":17182,"device":"IOS6","state":"fl"} | {"camp_id":"null"} | {"prod_id":[],"purch_flag":"false"} |
+ +------------+------------+------------+------------+------------+------------+
+ 2 rows selected
+
+Notice that with a select * query, any complex data types such as maps and
+arrays return as JSON strings. You will see how to unpack this data using
+various SQL functions and operators in the next lesson.
+
+## Query Logs Data
+
+Unlike the previous example where we performed queries against clicks data in
+one file, logs data is stored as partitioned directories on the file system.
+The logs directory has three subdirectories:
+
+ * 2012
+ * 2013
+ * 2014
+
+Each of these year directories fans out to a set of numbered month
+directories, and each month directory contains a JSON file with log records
+for that month. The total number of records in all log files is 48000.
+
+The files in the logs directory and its subdirectories are JSON files. There
+are many of these files, but you can use Drill to query them all as a single
+data source, or to query a subset of the files.
+
+#### Set the workspace to dfs.logs:
+
+ 0: jdbc:drill:> use dfs.logs;
+ +------------+------------+
+ | ok | summary |
+ +------------+------------+
+ | true | Default schema changed to 'dfs.logs' |
+ +------------+------------+
+
+#### Select 2 rows from the logs directory:
+
+ 0: jdbc:drill:> select * from logs limit 2;
+ +------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+----------+
+ | dir0 | dir1 | trans_id | date | time | cust_id | device | state | camp_id | keywords | prod_id | purch_fl |
+ +------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+----------+
+ | 2014 | 8 | 24181 | 08/02/2014 | 09:23:52 | 0 | IOS5 | il | 2 | wait | 128 | false |
+ | 2014 | 8 | 24195 | 08/02/2014 | 07:58:19 | 243 | IOS5 | mo | 6 | hmm | 107 | false |
+ +------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+----------+
+
+Note that this is flat JSON data. The dfs.logs workspace location property
+points to a directory that contains the logs directory, making the FROM clause
+reference for this query very simple. You do not have to refer to the complete
+directory path on the file system.
+
+The column names dir0 and dir1 are special Drill variables that identify
+subdirectories below the logs directory. In Lesson 3, you will do more complex
+queries that leverage these dynamic variables.
+
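Conceptually, `dir0`, `dir1`, and so on are just the subdirectory names between the queried directory and each file a row came from. A sketch of that mapping (the file name `8.json` is invented for illustration):

```python
import posixpath

def partition_columns(table_root, file_path):
    # dirN columns are the subdirectory names between the queried
    # directory and the file that a row came from.
    rel = posixpath.relpath(file_path, table_root)
    parts = rel.split("/")[:-1]   # drop the file name itself
    return {"dir%d" % i: name for i, name in enumerate(parts)}

cols = partition_columns("/mapr/demo.mapr.com/data/flat/logs",
                         "/mapr/demo.mapr.com/data/flat/logs/2014/8/8.json")
print(cols)   # {'dir0': '2014', 'dir1': '8'}
```

A file directly under the queried directory contributes no dirN values at all.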
+#### Find the total number of rows in the logs directory (all files):
+
+ 0: jdbc:drill:> select count(*) from logs;
+ +------------+
+ | EXPR$0 |
+ +------------+
+ | 48000 |
+ +------------+
+
+This query traverses all of the files in the logs directory and its
+subdirectories to return the total number of rows in those files.
+
+# What's Next
+
+Go to [Lesson 2: Run Queries with ANSI
+SQL](/drill/docs/lession-2-run-queries-with-ansi-sql).
+
+
+
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/tutorial/004-lesson2.md
----------------------------------------------------------------------
diff --git a/_docs/tutorial/004-lesson2.md b/_docs/tutorial/004-lesson2.md
new file mode 100644
index 0000000..73c4329
--- /dev/null
+++ b/_docs/tutorial/004-lesson2.md
@@ -0,0 +1,388 @@
+---
+title: "Lession 2: Run Queries with ANSI SQL"
+parent: "Apache Drill Tutorial"
+---
+## Goal
+
+This lesson shows how to do some standard SQL analysis in Apache Drill: for
+example, summarizing data by using simple aggregate functions and connecting
+data sources by using joins. Note that Apache Drill provides ANSI SQL support,
+not a “SQL-like” interface.
+
+## Queries in This Lesson
+
+Now that you know what the data sources look like in their raw form, using
+select * queries, try running some simple but more useful queries on each data
+source. These queries demonstrate how Drill supports ANSI SQL constructs and
+also how you can combine data from different data sources in a single SELECT
+statement.
+
+ * Show an aggregate query on a single file or table. Use GROUP BY, WHERE, HAVING, and ORDER BY clauses.
+ * Perform joins between Hive, MapR-DB, and file system data sources.
+ * Use table and column aliases.
+ * Create a Drill view.
+
+## Aggregation
+
+
+### Set the schema to hive:
+
+ 0: jdbc:drill:> use hive;
+ +------------+------------+
+ | ok | summary |
+ +------------+------------+
+ | true | Default schema changed to 'hive' |
+ +------------+------------+
+ 1 row selected
+
+### Return sales totals by month:
+
+ 0: jdbc:drill:> select `month`, sum(order_total)
+ from orders group by `month` order by 2 desc;
+ +------------+------------+
+ | month | EXPR$1 |
+ +------------+------------+
+ | June | 950481 |
+ | May | 947796 |
+ | March | 836809 |
+ | April | 807291 |
+ | July | 757395 |
+ | October | 676236 |
+ | August | 572269 |
+ | February | 532901 |
+ | September | 373100 |
+ | January | 346536 |
+ +------------+------------+
+
+Drill supports SQL aggregate functions such as SUM, MAX, AVG, and MIN.
+Standard SQL clauses work in the same way in Drill queries as in relational
+database queries.
+
+Note that back ticks are required for the “month” column only because “month”
+is a reserved word in SQL.
+
+### Return the top 20 sales totals by month and state:
+
+ 0: jdbc:drill:> select `month`, state, sum(order_total) as sales from orders group by `month`, state
+ order by 3 desc limit 20;
+ +------------+------------+------------+
+ | month | state | sales |
+ +------------+------------+------------+
+ | May | ca | 119586 |
+ | June | ca | 116322 |
+ | April | ca | 101363 |
+ | March | ca | 99540 |
+ | July | ca | 90285 |
+ | October | ca | 80090 |
+ | June | tx | 78363 |
+ | May | tx | 77247 |
+ | March | tx | 73815 |
+ | August | ca | 71255 |
+ | April | tx | 68385 |
+ | July | tx | 63858 |
+ | February | ca | 63527 |
+ | June | fl | 62199 |
+ | June | ny | 62052 |
+ | May | fl | 61651 |
+ | May | ny | 59369 |
+ | October | tx | 55076 |
+ | March | fl | 54867 |
+ | March | ny | 52101 |
+ +------------+------------+------------+
+ 20 rows selected
+
+Note the alias for the result of the SUM function. Drill supports column
+aliases and table aliases.
+
+## HAVING Clause
+
+This query uses the HAVING clause to constrain an aggregate result.
+
+### Set the workspace to dfs.clicks
+
+ 0: jdbc:drill:> use dfs.clicks;
+ +------------+------------+
+ | ok | summary |
+ +------------+------------+
+ | true | Default schema changed to 'dfs.clicks' |
+ +------------+------------+
+ 1 row selected
+
+### Return total number of clicks for devices that indicate high click-throughs:
+
+ 0: jdbc:drill:> select t.user_info.device, count(*) from `clicks/clicks.json` t
+ group by t.user_info.device
+ having count(*) > 1000;
+ +------------+------------+
+ | EXPR$0 | EXPR$1 |
+ +------------+------------+
+ | IOS5 | 11814 |
+ | AOS4.2 | 5986 |
+ | IOS6 | 4464 |
+ | IOS7 | 3135 |
+ | AOS4.4 | 1562 |
+ | AOS4.3 | 3039 |
+ +------------+------------+
+
+The aggregate is a count of the records for each different mobile device in
+the clickstream data. Only devices that registered more than 1000
+transactions qualify for the result set.
+
+## UNION Operator
+
+Use the same workspace as before (dfs.clicks).
+
+### Combine clicks activity from before and after the marketing campaign
+
+ 0: jdbc:drill:> select t.trans_id transaction, t.user_info.cust_id customer from `clicks/clicks.campaign.json` t
+ union all
+ select u.trans_id, u.user_info.cust_id from `clicks/clicks.json` u limit 5;
+ +-------------+------------+
+ | transaction | customer |
+ +-------------+------------+
+ | 35232 | 18520 |
+ | 31995 | 17182 |
+ | 35760 | 18228 |
+ | 37090 | 17015 |
+ | 37838 | 18737 |
+ +-------------+------------+
+
+This UNION ALL query returns rows that exist in two files (and includes any
+duplicate rows from those files): `clicks.campaign.json` and `clicks.json`.
+
+## Subqueries
+
+### Set the workspace to hive:
+
+ 0: jdbc:drill:> use hive;
+ +------------+------------+
+ | ok | summary |
+ +------------+------------+
+ | true | Default schema changed to 'hive' |
+ +------------+------------+
+
+### Compare order totals across states:
+
+ 0: jdbc:drill:> select o1.cust_id, sum(o1.order_total) as ny_sales,
+ (select sum(o2.order_total) from hive.orders o2
+ where o1.cust_id=o2.cust_id and state='ca') as ca_sales
+ from hive.orders o1 where o1.state='ny' group by o1.cust_id
+ order by cust_id limit 20;
+ +------------+------------+------------+
+ | cust_id | ny_sales | ca_sales |
+ +------------+------------+------------+
+ | 1001 | 72 | 47 |
+ | 1002 | 108 | 198 |
+ | 1003 | 83 | null |
+ | 1004 | 86 | 210 |
+ | 1005 | 168 | 153 |
+ | 1006 | 29 | 326 |
+ | 1008 | 105 | 168 |
+ | 1009 | 443 | 127 |
+ | 1010 | 75 | 18 |
+ | 1012 | 110 | null |
+ | 1013 | 19 | null |
+ | 1014 | 106 | 162 |
+ | 1015 | 220 | 153 |
+ | 1016 | 85 | 159 |
+ | 1017 | 82 | 56 |
+ | 1019 | 37 | 196 |
+ | 1020 | 193 | 165 |
+ | 1022 | 124 | null |
+ | 1023 | 166 | 149 |
+ | 1024 | 233 | null |
+ +------------+------------+------------+
+
+This example demonstrates Drill support for correlated subqueries. This query
+uses a subquery in the select list and correlates the result of the subquery
+with the outer query, using the cust_id column reference. The subquery returns
+the sum of order totals for California, and the outer query returns the
+equivalent sum, for the same cust_id, for New York.
+
+The result set is sorted by the cust_id and presents the sales totals side by
+side for easy comparison. Null values indicate customer IDs that did not
+register any sales in that state.
+
+## CAST Function
+
+### Use the maprdb workspace:
+
+ 0: jdbc:drill:> use maprdb;
+ +------------+------------+
+ | ok | summary |
+ +------------+------------+
+ | true | Default schema changed to 'maprdb' |
+ +------------+------------+
+ 1 row selected
+
+### Return customer data with appropriate data types
+
+ 0: jdbc:drill:> select cast(row_key as int) as cust_id, cast(t.personal.name as varchar(20)) as name,
+ cast(t.personal.gender as varchar(10)) as gender, cast(t.personal.age as varchar(10)) as age,
+ cast(t.address.state as varchar(4)) as state, cast(t.loyalty.agg_rev as dec(7,2)) as agg_rev,
+ cast(t.loyalty.membership as varchar(20)) as membership
+ from customers t limit 5;
+    +------------+------------+------------+------------+------------+------------+------------+
+ | cust_id | name | gender | age | state | agg_rev | membership |
+ +------------+------------+------------+------------+------------+------------+------------+
+ | 10001 | "Corrine Mecham" | "FEMALE" | "15-20" | "va" | 197.00 | "silver" |
+ | 10005 | "Brittany Park" | "MALE" | "26-35" | "in" | 230.00 | "silver" |
+ | 10006 | "Rose Lokey" | "MALE" | "26-35" | "ca" | 250.00 | "silver" |
+ | 10007 | "James Fowler" | "FEMALE" | "51-100" | "me" | 263.00 | "silver" |
+ | 10010 | "Guillermo Koehler" | "OTHER" | "51-100" | "mn" | 202.00 | "silver" |
+ +------------+------------+------------+------------+------------+------------+------------+
+ 5 rows selected
+
+Note the following features of this query:
+
+ * The CAST function is required for every column in the table. This function returns the MapR-DB/HBase binary data as readable integers and strings. Alternatively, you can use CONVERT_TO/CONVERT_FROM functions to decode the columns. CONVERT_TO and CONVERT_FROM are more efficient than CAST in most cases.
+ * The row_key column functions as the primary key of the table (a customer ID in this case).
+ * The table alias t is required; otherwise the column family names would be parsed as table names and the query would return an error.
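+
+For comparison, the row_key column in the query above could be decoded with
+CONVERT_FROM instead of CAST. This is only a sketch: the 'UTF8' encoding
+argument is an assumption about how the row key was stored, and a different
+encoding (such as 'INT_BE') may apply to your data:
+
+ 0: jdbc:drill:> select convert_from(row_key, 'UTF8') as cust_id
+ from customers limit 5;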
+
+### Remove the quotes from the strings:
+
+You can use the regexp_replace function to remove the quotes around the
+strings in the query results. For example, to return a state name va instead
+of “va”:
+
+ 0: jdbc:drill:> select cast(row_key as int), regexp_replace(cast(t.address.state as varchar(10)),'"','')
+ from customers t limit 1;
+ +------------+------------+
+ | EXPR$0 | EXPR$1 |
+ +------------+------------+
+ | 10001 | va |
+ +------------+------------+
+ 1 row selected
+
+## CREATE VIEW Command
+
+### Use a mutable workspace:
+
+A mutable (or writable) workspace is a workspace that is enabled for “write”
+operations. This attribute is part of the storage plugin configuration. You
+can create Drill views and tables in mutable workspaces.
+
+ 0: jdbc:drill:> use dfs.views;
+ +------------+------------+
+ | ok | summary |
+ +------------+------------+
+ | true | Default schema changed to 'dfs.views' |
+ +------------+------------+
+
+### Create a view on a MapR-DB table
+
+ 0: jdbc:drill:> create or replace view custview as select cast(row_key as int) as cust_id,
+ cast(t.personal.name as varchar(20)) as name,
+ cast(t.personal.gender as varchar(10)) as gender,
+ cast(t.personal.age as varchar(10)) as age,
+ cast(t.address.state as varchar(4)) as state,
+ cast(t.loyalty.agg_rev as dec(7,2)) as agg_rev,
+ cast(t.loyalty.membership as varchar(20)) as membership
+ from maprdb.customers t;
+ +------------+------------+
+ | ok | summary |
+ +------------+------------+
+ | true | View 'custview' replaced successfully in 'dfs.views' schema |
+ +------------+------------+
+ 1 row selected
+
+Drill provides CREATE OR REPLACE VIEW syntax similar to relational databases
+to create views. Use the OR REPLACE option to make it easier to update the
+view later without having to remove it first. Note that the FROM clause in
+this example must refer to maprdb.customers. The MapR-DB tables are not
+directly visible to the dfs.views workspace.
+
+Unlike a traditional database, where views are typically DBA/developer-driven
+operations, file system-based views in Drill are very lightweight. A view is
+simply a special file with a specific extension (.drill). You can store views
+in your local file system or in a specific workspace, and you can specify any
+query against any Drill data source in the body of the CREATE VIEW
+statement.
+
+Drill provides a decentralized metadata model. Drill is able to query metadata
+defined in data sources such as Hive, HBase, and the file system. Drill also
+supports the creation of metadata in the file system.
+
+### Query data from the view:
+
+ 0: jdbc:drill:> select * from custview limit 1;
+ +------------+------------+------------+------------+------------+------------+------------+
+ | cust_id | name | gender | age | state | agg_rev | membership |
+ +------------+------------+------------+------------+------------+------------+------------+
+ | 10001 | "Corrine Mecham" | "FEMALE" | "15-20" | "va" | 197.00 | "silver" |
+ +------------+------------+------------+------------+------------+------------+------------+
+
+After users get an idea of what data is available by exploring it directly
+from the file system, they can use views to expose that data to downstream
+tools such as Tableau and MicroStrategy for analysis and visualization. To
+these tools, a view appears simply as a “table” with selectable “columns”
+in it.
+
+## Query Across Data Sources
+
+Continue using `dfs.views` for this query.
+
+### Join the customers view and the orders table:
+
+ 0: jdbc:drill:> select membership, sum(order_total) as sales from hive.orders, custview
+ where orders.cust_id=custview.cust_id
+ group by membership order by 2;
+ +------------+------------+
+ | membership | sales |
+ +------------+------------+
+ | "basic" | 380665 |
+ | "silver" | 708438 |
+ | "gold" | 2787682 |
+ +------------+------------+
+ 3 rows selected
+
+In this query, we read data from a MapR-DB table (represented by custview)
+and combine it with the order information in Hive. When running queries
+across data sources like this, you must use fully qualified table and view
+names. For example, the orders table is prefixed by “hive,” which is the
+storage plugin name registered with Drill. No prefix is needed for
+“custview” because we explicitly switched to the dfs.views workspace where
+custview is stored.
+
+Note: If the results of any of your queries appear to be truncated because
+the rows are wide, set the maximum width of the display to 10000. Do not use
+a semicolon for this SET command.
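+
+In the sqlline shell, display settings are controlled with the `!set`
+command; a typical invocation (note the leading `!` and the absence of a
+trailing semicolon) looks like this:
+
+ 0: jdbc:drill:> !set maxwidth 10000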
+
+### Join the customers, orders, and clickstream data:
+
+ 0: jdbc:drill:> select custview.membership, sum(orders.order_total) as sales from hive.orders, custview,
+ dfs.`/mapr/demo.mapr.com/data/nested/clicks/clicks.json` c
+ where orders.cust_id=custview.cust_id and orders.cust_id=c.user_info.cust_id
+ group by custview.membership order by 2;
+ +------------+------------+
+ | membership | sales |
+ +------------+------------+
+ | "basic" | 372866 |
+ | "silver" | 728424 |
+ | "gold" | 7050198 |
+ +------------+------------+
+ 3 rows selected
+
+This three-way join selects from three different data sources in one query:
+
+ * hive.orders table
+ * custview (a view of the HBase customers table)
+ * clicks.json file
+
+The join column for both sets of join conditions is the cust_id column. The
+views workspace is used for this query so that custview can be accessed. The
+hive.orders table is also visible to the query.
+
+However, note that the JSON file is not directly visible from the views
+workspace, so the query specifies the full path to the file:
+
+ dfs.`/mapr/demo.mapr.com/data/nested/clicks/clicks.json`
+
+
+## What's Next
+
+Go to [Lesson 3: Run Queries on Complex Data Types](/drill/docs/lession-3-run-queries-on-complex-data-types).
+
+
+