Posted to commits@drill.apache.org by br...@apache.org on 2015/02/26 01:31:10 UTC
[06/13] drill git commit: DRILL-2315: Confluence conversion plus fixes
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/sql-ref/nested/001-flatten.md
----------------------------------------------------------------------
diff --git a/_docs/drill-docs/sql-ref/nested/001-flatten.md b/_docs/drill-docs/sql-ref/nested/001-flatten.md
deleted file mode 100644
index 124db91..0000000
--- a/_docs/drill-docs/sql-ref/nested/001-flatten.md
+++ /dev/null
@@ -1,89 +0,0 @@
----
-title: "FLATTEN Function"
-parent: "Nested Data Functions"
----
-The FLATTEN function is useful for flexible exploration of repeated data.
-FLATTEN separates the elements in a repeated field into individual records. To
-maintain the association between each flattened value and the other fields in
-the record, all of the other columns are copied into each new record. A very
-simple example would turn this data (one record):
-
- {
- "x" : 5,
- "y" : "a string",
- "z" : [ 1,2,3]
- }
-
-into three distinct records:
-
- select x, y, flatten(z) as z from table;
- +-------------+----------------+-----------+
- | x           | y              | z         |
- +-------------+----------------+-----------+
- | 5           | "a string"     | 1         |
- | 5           | "a string"     | 2         |
- | 5           | "a string"     | 3         |
- +-------------+----------------+-----------+
-
-The function takes a single argument, which must be an array (the `z` column
-in this example).
-
-
-
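The row-multiplying behavior described above can be sketched outside of Drill with a small Python analogue (`flatten_records` is a hypothetical helper written for illustration, not part of Drill):

```python
def flatten_records(records, array_field):
    """Emit one output record per element of array_field,
    copying every other column into each new record."""
    for rec in records:
        for value in rec[array_field]:
            out = {k: v for k, v in rec.items() if k != array_field}
            out[array_field] = value
            yield out

# The single record from the example above becomes three records.
rows = list(flatten_records(
    [{"x": 5, "y": "a string", "z": [1, 2, 3]}], "z"))
```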
-For a more interesting example, consider the JSON data in the publicly
-available [Yelp](https://www.yelp.com/dataset_challenge/dataset) data set. The
-first query below returns three columns from the
-`yelp_academic_dataset_business.json` file: `name`, `hours`, and `categories`.
-The query is restricted to distinct rows where the name is `zpizza`. The
-query returns only one row that meets those criteria; however, note that this
-row contains an array of four categories:
-
- 0: jdbc:drill:zk=local> select distinct name, hours, categories
- from dfs.yelp.`yelp_academic_dataset_business.json`
- where name ='zpizza';
- +------------+------------+------------+
- | name | hours | categories |
- +------------+------------+------------+
- | zpizza | {"Tuesday":{"close":"22:00","open":"10:00"},"Friday":{"close":"23:00","open":"10:00"},"Monday":{"close":"22:00","open":"10:00"},"Wednesday":{"close":"22:00","open":"10:00"},"Thursday":{"close":"22:00","open":"10:00"},"Sunday":{"close":"22:00","open":"10:00"},"Saturday":{"close":"23:00","open":"10:00"}} | ["Gluten-Free","Pizza","Vegan","Restaurants"] |
-
-The FLATTEN function can operate on this single row and return multiple rows,
-one for each category:
-
- 0: jdbc:drill:zk=local> select distinct name, flatten(categories) as categories
- from dfs.yelp.`yelp_academic_dataset_business.json`
- where name ='zpizza' order by 2;
- +------------+-------------+
- | name | categories |
- +------------+-------------+
- | zpizza | Gluten-Free |
- | zpizza | Pizza |
- | zpizza | Restaurants |
- | zpizza | Vegan |
- +------------+-------------+
- 4 rows selected (2.797 seconds)
-
-Having used the FLATTEN function to break down arrays into distinct rows, you
-can run queries that do deeper analysis on the flattened result set. For
-example, you can use FLATTEN in a subquery, then apply WHERE clause
-constraints or aggregate functions to the results in the outer query.
-
-The following query uses the same data file as the previous query to flatten
-the categories array, then run a COUNT function on the flattened result:
-
- select celltbl.catl, count(celltbl.catl) catcount
- from (select flatten(categories) catl
- from dfs.yelp.`yelp_academic_dataset_business.json`) celltbl
- group by celltbl.catl
- order by count(celltbl.catl) desc limit 5;
-
- +---------------+------------+
- | catl | catcount |
- +---------------+------------+
- | Restaurants | 14303 |
- | Shopping | 6428 |
- | Food | 5209 |
- | Beauty & Spas | 3421 |
- | Nightlife | 2870 |
- +---------------+------------+
-
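The flatten-then-aggregate pattern in that query can be mimicked in Python. The sample rows below are hypothetical stand-ins for the Yelp business file, used only to show the shape of the computation:

```python
from collections import Counter

# Hypothetical sample rows standing in for the Yelp business file.
businesses = [
    {"name": "zpizza",
     "categories": ["Gluten-Free", "Pizza", "Vegan", "Restaurants"]},
    {"name": "Villapizza",
     "categories": ["Pizza", "Restaurants"]},
]

# Flatten the categories arrays, then count occurrences per category,
# as the subquery plus GROUP BY / COUNT does above.
counts = Counter(cat for b in businesses
                 for cat in b["categories"])
top = counts.most_common(2)  # the two most frequent categories
```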
-A common use case for FLATTEN is its use in conjunction with the
-[KVGEN](/confluence/display/DRILL/KVGEN+Function) function.
-
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/sql-ref/nested/002-kvgen.md
----------------------------------------------------------------------
diff --git a/_docs/drill-docs/sql-ref/nested/002-kvgen.md b/_docs/drill-docs/sql-ref/nested/002-kvgen.md
deleted file mode 100644
index a27a781..0000000
--- a/_docs/drill-docs/sql-ref/nested/002-kvgen.md
+++ /dev/null
@@ -1,150 +0,0 @@
----
-title: "KVGEN Function"
-parent: "Nested Data Functions"
----
-KVGEN stands for _key-value generation_. This function is useful when complex
-data files contain arbitrary maps that consist of relatively "unknown" column
-names. Instead of having to specify columns in the map to access the data, you
-can use KVGEN to return a list of the keys that exist in the map. KVGEN turns
-a map with a wide set of columns into an array of key-value pairs.
-
-In turn, you can write analytic queries that return a subset of the generated
-keys or constrain the keys in some way. For example, you can use the
-[FLATTEN](/confluence/display/DRILL/FLATTEN+Function) function to break the
-array down into multiple distinct rows and further query those rows.
-
-
-
-For example, assume that a JSON file contains this data:
-
- {"a": "valA", "b": "valB"}
- {"c": "valC", "d": "valD"}
-
-
-KVGEN would operate on this data to generate:
-
- [{"key": "a", "value": "valA"}, {"key": "b", "value": "valB"}]
- [{"key": "c", "value": "valC"}, {"key": "d", "value": "valD"}]
-
-Applying the [FLATTEN](/confluence/display/DRILL/FLATTEN+Function) function to
-this data would return:
-
- {"key": "a", "value": "valA"}
- {"key": "b", "value": "valB"}
- {"key": "c", "value": "valC"}
- {"key": "d", "value": "valD"}
-
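The map-to-pairs transformation shown above can be sketched as a one-line Python analogue (`kvgen` here is an illustrative function, not Drill's implementation):

```python
def kvgen(mapping):
    """Turn a map into an array of {"key", "value"} pairs,
    mirroring the KVGEN output shown above."""
    return [{"key": k, "value": v} for k, v in mapping.items()]

pairs = kvgen({"a": "valA", "b": "valB"})
# [{"key": "a", "value": "valA"}, {"key": "b", "value": "valB"}]
```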
-Assume that a JSON file called `kvgendata.json` includes multiple records that
-look like this one:
-
- {
- "rownum": 1,
- "bigintegercol": {
- "int_1": 1,
- "int_2": 2,
- "int_3": 3
- },
- "varcharcol": {
- "varchar_1": "abc",
- "varchar_2": "def",
- "varchar_3": "xyz"
- },
- "boolcol": {
- "boolean_1": true,
- "boolean_2": false,
- "boolean_3": true
- },
- "float8col": {
- "f8_1": 1.1,
- "f8_2": 2.2
- },
- "complex": [
- {
- "col1": 3
- },
- {
- "col2": 2,
- "col3": 1
- },
- {
- "col1": 7
- }
- ]
- }
-
- {
- "rownum": 3,
- "bigintegercol": {
- "int_1": 1,
- "int_3": 3
- },
- "varcharcol": {
- "varchar_1": "abcde",
- "varchar_2": null,
- "varchar_3": "xyz",
- "varchar_4": "xyz2"
- },
- "boolcol": {
- "boolean_1": true,
- "boolean_2": false
- },
- "float8col": {
- "f8_1": 1.1,
- "f8_3": 6.6
- },
- "complex": [
- {
- "col1": 2,
- "col3": 1
- }
- ]
- }
- ...
-
-
-A SELECT * query against this specific record returns the following row:
-
- 0: jdbc:drill:zk=local> select * from dfs.yelp.`kvgendata.json` where rownum=1;
-
- +------------+---------------+------------+------------+------------+------------+
- | rownum | bigintegercol | varcharcol | boolcol | float8col | complex |
- +------------+---------------+------------+------------+------------+------------+
- | 1 | {"int_1":1,"int_2":2,"int_3":3} | {"varchar_1":"abc","varchar_2":"def","varchar_3":"xyz"} | {"boolean_1":true,"boolean_2":false,"boolean_3":true} | {"f8_1":1.1,"f8_2":2.2} | [{"col1":3},{"col2":2,"col3":1},{"col1":7}] |
- +------------+---------------+------------+------------+------------+------------+
- 1 row selected (0.122 seconds)
-
-You can use the KVGEN function to turn the maps in this data into key-value
-pairs. For example:
-
- 0: jdbc:drill:zk=local> select kvgen(varcharcol) from dfs.yelp.`kvgendata.json`;
- +------------+
- | EXPR$0 |
- +------------+
- | [{"key":"varchar_1","value":"abc"},{"key":"varchar_2","value":"def"},{"key":"varchar_3","value":"xyz"}] |
- | [{"key":"varchar_1","value":"abcd"}] |
- | [{"key":"varchar_1","value":"abcde"},{"key":"varchar_3","value":"xyz"},{"key":"varchar_4","value":"xyz2"}] |
- | [{"key":"varchar_1","value":"abc"},{"key":"varchar_2","value":"def"}] |
- +------------+
- 4 rows selected (0.091 seconds)
-
-Now you can apply the FLATTEN function to break out the key-value pairs into
-distinct rows:
-
- 0: jdbc:drill:zk=local> select flatten(kvgen(varcharcol)) from dfs.yelp.`kvgendata.json`;
- +------------+
- | EXPR$0 |
- +------------+
- | {"key":"varchar_1","value":"abc"} |
- | {"key":"varchar_2","value":"def"} |
- | {"key":"varchar_3","value":"xyz"} |
- | {"key":"varchar_1","value":"abcd"} |
- | {"key":"varchar_1","value":"abcde"} |
- | {"key":"varchar_3","value":"xyz"} |
- | {"key":"varchar_4","value":"xyz2"} |
- | {"key":"varchar_1","value":"abc"} |
- | {"key":"varchar_2","value":"def"} |
- +------------+
- 9 rows selected (0.151 seconds)
-
-See the description of [FLATTEN](/confluence/display/DRILL/FLATTEN+Function)
-for an example of a query against the flattened data.
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/sql-ref/nested/003-repeated-cnt.md
----------------------------------------------------------------------
diff --git a/_docs/drill-docs/sql-ref/nested/003-repeated-cnt.md b/_docs/drill-docs/sql-ref/nested/003-repeated-cnt.md
deleted file mode 100644
index a66075c..0000000
--- a/_docs/drill-docs/sql-ref/nested/003-repeated-cnt.md
+++ /dev/null
@@ -1,34 +0,0 @@
----
-title: "REPEATED_COUNT Function"
-parent: "Nested Data Functions"
----
-This function counts the values in an array. The following example returns the
-counts for the `categories` array in the `yelp_academic_dataset_business.json`
-file. The counts are restricted to rows that contain the string `pizza`.
-
- SELECT name, REPEATED_COUNT(categories)
- FROM dfs.yelp.`yelp_academic_dataset_business.json`
- WHERE name LIKE '%pizza%';
-
- +---------------+------------+
- | name | EXPR$1 |
- +---------------+------------+
- | Villapizza | 2 |
- | zpizza | 4 |
- | zpizza | 4 |
- | Luckys pizza | 2 |
- | Zpizza | 2 |
- | S2pizzabar | 4 |
- | Dominos pizza | 5 |
- +---------------+------------+
-
- 7 rows selected (2.03 seconds)
-
-The function requires a single argument, which must be an array. Note that
-this function is not a standard SQL aggregate function and does not require
-the count to be grouped by other columns in the select list (such as `name` in
-this example).
-
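Conceptually, REPEATED_COUNT is a per-row length function over an array column. A minimal Python sketch, using the zpizza row from the earlier FLATTEN example:

```python
def repeated_count(array):
    """Count the elements of an array column, as REPEATED_COUNT does."""
    return len(array)

row = {"name": "zpizza",
       "categories": ["Gluten-Free", "Pizza", "Vegan", "Restaurants"]}
n = repeated_count(row["categories"])  # 4, matching the zpizza rows above
```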
-For another example of this function, see the following lesson in the Apache
-Drill Tutorial for Hadoop: [Lesson 3: Run Queries on Complex Data Types](/confluence/display/DRILL/Lesson+3%3A+Run+Queries+on+Complex+Data+Types).
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/tutorial/001-install-sandbox.md
----------------------------------------------------------------------
diff --git a/_docs/drill-docs/tutorial/001-install-sandbox.md b/_docs/drill-docs/tutorial/001-install-sandbox.md
deleted file mode 100644
index e63ddd4..0000000
--- a/_docs/drill-docs/tutorial/001-install-sandbox.md
+++ /dev/null
@@ -1,56 +0,0 @@
----
-title: "Installing the Apache Drill Sandbox"
-parent: "Apache Drill Tutorial"
----
-This tutorial uses the MapR Sandbox, which is a Hadoop environment pre-configured with Apache Drill.
-
-To complete the tutorial on the MapR Sandbox with Apache Drill, work through
-the following pages in order:
-
- * [Installing the Apache Drill Sandbox](/confluence/display/DRILL/Installing+the+Apache+Drill+Sandbox)
- * [Getting to Know the Drill Setup](/confluence/display/DRILL/Getting+to+Know+the+Drill+Setup)
- * [Lesson 1: Learn About the Data Set](/confluence/display/DRILL/Lesson+1%3A+Learn+About+the+Data+Set)
- * [Lesson 2: Run Queries with ANSI SQL](/confluence/display/DRILL/Lesson+2%3A+Run+Queries+with+ANSI+SQL)
- * [Lesson 3: Run Queries on Complex Data Types](/confluence/display/DRILL/Lesson+3%3A+Run+Queries+on+Complex+Data+Types)
- * [Summary](/confluence/display/DRILL/Summary)
-
-# About Apache Drill
-
-Drill is an Apache open-source SQL query engine for Big Data exploration.
-Drill is designed from the ground up to support high-performance analysis on
-the semi-structured and rapidly evolving data coming from modern Big Data
-applications, while still providing the familiarity and ecosystem of ANSI SQL,
-the industry-standard query language. Drill provides plug-and-play integration
-with existing Apache Hive and Apache HBase deployments. Apache Drill 0.5 offers
-the following key features:
-
- * Low-latency SQL queries
-
- * Dynamic queries on self-describing data in files (such as JSON, Parquet, text) and MapR-DB/HBase tables, without requiring metadata definitions in the Hive metastore.
-
- * ANSI SQL
-
- * Nested data support
-
- * Integration with Apache Hive (queries on Hive tables and views, support for all Hive file formats and Hive UDFs)
-
- * BI/SQL tool integration using standard JDBC/ODBC drivers
-
-# MapR Sandbox with Apache Drill
-
-MapR includes Apache Drill as part of the Hadoop distribution. The MapR
-Sandbox with Apache Drill is a fully functional single-node cluster that can
-be used to get an overview on Apache Drill in a Hadoop environment. Business
-and technical analysts, product managers, and developers can use the sandbox
-environment to get a feel for the power and capabilities of Apache Drill by
-performing various types of queries. Once you get a flavor for the technology,
-refer to the [Apache Drill web site](http://incubator.apache.org/drill/) and
-[Apache Drill documentation
-](https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+Wiki) for more
-details.
-
-Note that Hadoop is not a prerequisite for Drill and users can start ramping
-up with Drill by running SQL queries directly on the local file system. Refer
-to [Apache Drill in 10 minutes](https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes)
-for an introduction to using Drill in local (embedded) mode.
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/tutorial/002-get2kno-sb.md
----------------------------------------------------------------------
diff --git a/_docs/drill-docs/tutorial/002-get2kno-sb.md b/_docs/drill-docs/tutorial/002-get2kno-sb.md
deleted file mode 100644
index e7b24a8..0000000
--- a/_docs/drill-docs/tutorial/002-get2kno-sb.md
+++ /dev/null
@@ -1,235 +0,0 @@
----
-title: "Getting to Know the Drill Sandbox"
-parent: "Apache Drill Tutorial"
----
-This section describes the configuration of the Apache Drill system that you
-have installed and introduces the overall use case for the tutorial.
-
-# Storage Plugins Overview
-
-The Hadoop cluster within the sandbox is set up with MapR-FS, MapR-DB, and
-Hive, which all serve as data sources for Drill in this tutorial. Before you
-can run queries against these data sources, Drill requires each one to be
-configured as a storage plugin. A storage plugin defines the abstraction on
-the data sources for Drill to talk to and provides interfaces to read/write
-and get metadata from the data source. Each storage plugin also exposes
-optimization rules for Drill to leverage for efficient query execution.
-
-Take a look at the pre-configured storage plugins by opening the Drill Web UI.
-
-Feel free to skip this section and jump directly to the queries: [Lesson 1:
-Learn About the Data
-Set](/confluence/display/DRILL/Lesson+1%3A+Learn+About+the+Data+Set)
-
- * Launch a web browser and go to: `http://<IP address of the sandbox>:8047`
- * Go to the Storage tab
- * Open the configured storage plugins one at a time by clicking Update
- * You will see the following plugins configured.
-
-## maprdb
-
-A storage plugin configuration for MapR-DB in the sandbox. Drill uses a single
-storage plugin for connecting to HBase as well as MapR-DB, which is an
-enterprise grade in-Hadoop NoSQL database. See the [Apache Drill
-Wiki](https://cwiki.apache.org/confluence/display/DRILL/Registering+HBase) for
-information on how to configure Drill to query HBase.
-
- {
- "type" : "hbase",
- "enabled" : true,
- "config" : {
- "hbase.table.namespace.mappings" : "*:/tables"
- }
- }
-
-## dfs
-
-This is a storage plugin configuration for the MapR file system (MapR-FS) in
-the sandbox. The connection attribute indicates the type of distributed file
-system: in this case, MapR-FS. Drill can work with any distributed file
-system, including HDFS, S3, and so on.
-
-The configuration also includes a set of workspaces; each one represents a
-location in MapR-FS:
-
- * root: access to the root file system location
- * clicks: access to nested JSON log data
- * logs: access to flat (non-nested) JSON log data in the logs directory and its subdirectories
- * views: a workspace for creating views
-
-A workspace in Drill is a location where users can easily access a specific
-set of data and collaborate with each other by sharing artifacts. Users can
-create as many workspaces as they need within Drill.
-
-Each workspace can also be configured as “writable” or not, which indicates
-whether users can write data to this location and defines the storage format
-in which the data will be written (parquet, csv, json). These attributes
-become relevant when you explore Drill SQL commands, especially CREATE TABLE
-AS (CTAS) and CREATE VIEW.
-
-Drill can query files and directories directly and can detect the file formats
-based on the file extension or the first few bits of data within the file.
-However, additional information around formats is required for Drill, such as
-delimiters for text files, which are specified in the “formats” section below.
-
- {
- "type": "file",
- "enabled": true,
- "connection": "maprfs:///",
- "workspaces": {
- "root": {
- "location": "/mapr/demo.mapr.com/data",
- "writable": false,
- "storageformat": null
- },
- "clicks": {
- "location": "/mapr/demo.mapr.com/data/nested",
- "writable": true,
- "storageformat": "parquet"
- },
- "logs": {
- "location": "/mapr/demo.mapr.com/data/flat",
- "writable": true,
- "storageformat": "parquet"
- },
- "views": {
- "location": "/mapr/demo.mapr.com/data/views",
- "writable": true,
- "storageformat": "parquet"
- },
- "formats": {
- "psv": {
- "type": "text",
- "extensions": [
- "tbl"
- ],
- "delimiter": "|"
- },
- "csv": {
- "type": "text",
- "extensions": [
- "csv"
- ],
- "delimiter": ","
- },
- "tsv": {
- "type": "text",
- "extensions": [
- "tsv"
- ],
- "delimiter": "\t"
- },
- "parquet": {
- "type": "parquet"
- },
- "json": {
- "type": "json"
- }
- }}
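The "formats" section above maps file extensions to parsers, and for text formats the delimiter is what distinguishes them. A rough analogue of what the "psv" entry (pipe-delimited `.tbl` files) implies, sketched with Python's csv module over made-up sample data:

```python
import csv
import io

# A hypothetical pipe-separated ".tbl" file, as the "psv" format
# entry above (delimiter "|") would describe.
data = io.StringIO("1|laptop|959\n2|labels|3\n")
rows = list(csv.reader(data, delimiter="|"))
```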
-
-## hive
-
-A storage plugin configuration for a Hive data warehouse within the sandbox.
-Drill connects to the Hive metastore by using the configured metastore thrift
-URI. Metadata for Hive tables is automatically available for users to query.
-
- {
- "type": "hive",
- "enabled": true,
- "configProps": {
- "hive.metastore.uris": "thrift://localhost:9083",
- "hive.metastore.sasl.enabled": "false"
- }
- }
-
-# Client Application Interfaces
-
-Drill also provides application interfaces that client tools can use to
-connect to and access Drill. The interfaces include the following.
-
-### ODBC/JDBC drivers
-
-Drill provides ODBC/JDBC drivers to connect from BI tools such as Tableau,
-MicroStrategy, SQuirreL, and Jaspersoft; refer to
-[Using ODBC to Access Apache Drill from BI Tools](http://doc.mapr.com/display/MapR/Using+ODBC+to+Access+Apache+Drill+from+BI+Tools)
-and [Using JDBC to Access Apache Drill](http://doc.mapr.com/display/MapR/Using+JDBC+to+Access+Apache+Drill+from+SQuirreL)
-to learn more.
-
-### SQLLine
-
-SQLLine is a JDBC application that comes packaged with Drill. In order to
-start working with it, you can use the command line on the demo cluster to log
-in as root, then enter `sqlline`. Use `mapr` as the login password. For
-example:
-
- $ ssh root@localhost -p 2222
- Password:
- Last login: Mon Sep 15 13:46:08 2014 from 10.250.0.28
- Welcome to your Mapr Demo virtual machine.
- [root@maprdemo ~]# sqlline
- sqlline version 1.1.6
- 0: jdbc:drill:>
-
-### Drill Web UI
-
-The Drill Web UI is a simple user interface for configuring and managing Apache
-Drill. This UI can be launched from any of the nodes in the Drill cluster. The
-configuration for Drill includes setting up storage plugins that represent the
-data sources on which Drill performs queries. The sandbox comes with storage
-plugins configured for the Hive, HBase, MapR file system, and local file
-system.
-
-Users and developers can get the necessary information for tuning and
-performing diagnostics on queries, such as the list of queries executed in a
-session and detailed query plan profiles for each.
-
-Detailed configuration and management of Drill is out of scope for this
-tutorial.
-
-The Web interface for Apache Drill also provides a query UI where users can
-submit queries to Drill and observe results. Here is a screen shot of the Web
-UI for Apache Drill:
-
-![](../../img/DrillWebUI.png)
-
-### REST API
-
-Drill provides a simple REST API for querying data and managing the system.
-The Web UI leverages the REST API to talk to Drill.
-
-This tutorial introduces sample queries that you can run by using SQLLine.
-Note that you can run the queries just as easily by launching the Drill Web
-UI. No additional installation or configuration is required.
-
-# Use Case Overview
-
-As you run through the queries in this tutorial, put yourself in the shoes of
-an analyst with basic SQL skills. Let us imagine that the analyst works for an
-emerging online retail business that accepts purchases from its customers
-through both an established web-based interface and a new mobile application.
-
-The analyst is data-driven and operates mostly on the business side with
-little or no interaction with the IT department. Recently the central IT team
-has implemented a Hadoop-based infrastructure to reduce the cost of the legacy
-database system, and most of the DWH/ETL workload is now handled by
-Hadoop/Hive. The master customer profile information and product catalog are
-managed in MapR-DB, which is a NoSQL database. The IT team has also started
-acquiring clickstream data that comes from web and mobile applications. This
-data is stored in Hadoop as JSON files.
-
-The analyst has a number of data sources that he could explore, but exploring
-them in isolation is not the way to go. There are some potentially very
-interesting analytical connections between these data sources. For example, it
-would be good to be able to analyze customer records in the clickstream data
-and tie them to the master customer data in MapR DB.
-
-The analyst decides to explore various data sources and he chooses to do that
-by using Apache Drill. Think about the flexibility and analytic capability of
-Apache Drill as you work through the tutorial.
-
-# What's Next
-
-Start running queries by going to [Lesson 1: Learn About the Data
-Set](/confluence/display/DRILL/Lesson+1%3A+Learn+About+the+Data+Set).
-
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/tutorial/003-lesson1.md
----------------------------------------------------------------------
diff --git a/_docs/drill-docs/tutorial/003-lesson1.md b/_docs/drill-docs/tutorial/003-lesson1.md
deleted file mode 100644
index 8f3465f..0000000
--- a/_docs/drill-docs/tutorial/003-lesson1.md
+++ /dev/null
@@ -1,423 +0,0 @@
----
-title: "Lesson 1: Learn About the Data Set"
-parent: "Apache Drill Tutorial"
----
-## Goal
-
-This lesson is simply about discovering what data is available, in what
-format, using simple SQL SELECT statements. Drill is capable of analyzing data
-without prior knowledge or definition of its schema. This means that you can
-start querying data immediately (and even as it changes), regardless of its
-format.
-
-The data set for the tutorial consists of:
-
- * Transactional data: stored as a Hive table
-
- * Product catalog and master customer data: stored as MapR-DB tables
-
- * Clickstream and logs data: stored in the MapR file system as JSON files
-
-## Queries in This Lesson
-
-This lesson consists of SELECT * queries on each data source.
-
-## Before You Begin
-
-### Start sqlline
-
-If sqlline is not already started, use a Terminal or Command window to log
-into the demo VM as root, then enter `sqlline`:
-
- $ ssh root@10.250.0.6
- Password:
- Last login: Mon Sep 15 13:46:08 2014 from 10.250.0.28
- Welcome to your Mapr Demo virtual machine.
- [root@maprdemo ~]# sqlline
- sqlline version 1.1.6
- 0: jdbc:drill:>
-
-You can run queries from this prompt to complete the tutorial. To exit from
-`sqlline`, type:
-
- 0: jdbc:drill:> !quit
-
-Note that though this tutorial demonstrates the queries using SQLLine, you can
-also execute queries using the Drill Web UI.
-
-### List the available workspaces and databases:
-
- 0: jdbc:drill:> show databases;
- +-------------+
- | SCHEMA_NAME |
- +-------------+
- | hive.default |
- | dfs.default |
- | dfs.logs |
- | dfs.root |
- | dfs.views |
- | dfs.clicks |
- | dfs.data |
- | dfs.tmp |
- | sys |
- | maprdb |
- | cp.default |
- | INFORMATION_SCHEMA |
- +-------------+
- 12 rows selected
-
-Note that this command exposes all the metadata available from the storage
-plugins configured with Drill as a set of schemas. This includes the Hive and
-MapR-DB databases as well as the workspaces configured in the file system. As
-you run queries in the tutorial, you will switch among these schemas by
-submitting the USE command. This behavior resembles the ability to use
-different database schemas (namespaces) in a relational database system.
-
-## Query Hive Tables
-
-The orders table is a six-column Hive table defined in the Hive metastore.
-This is a Hive external table pointing to the data stored in flat files on the
-MapR file system. The orders table contains 122,000 rows.
-
-### Set the schema to hive:
-
- 0: jdbc:drill:> use hive;
- +------------+------------+
- | ok | summary |
- +------------+------------+
- | true | Default schema changed to 'hive' |
- +------------+------------+
-
-You will run the USE command throughout this tutorial. The USE command sets
-the schema for the current session.
-
-### Describe the table:
-
-You can use the DESCRIBE command to show the columns and data types for a Hive
-table:
-
- 0: jdbc:drill:> describe orders;
- +-------------+------------+-------------+
- | COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
- +-------------+------------+-------------+
- | order_id | BIGINT | YES |
- | month | VARCHAR | YES |
- | cust_id | BIGINT | YES |
- | state | VARCHAR | YES |
- | prod_id | BIGINT | YES |
- | order_total | INTEGER | YES |
- +-------------+------------+-------------+
-
-The DESCRIBE command returns complete schema information for Hive tables based
-on the metadata available in the Hive metastore.
-
-### Select 5 rows from the orders table:
-
- 0: jdbc:drill:> select * from orders limit 5;
- +------------+------------+------------+------------+------------+-------------+
- | order_id | month | cust_id | state | prod_id | order_total |
- +------------+------------+------------+------------+------------+-------------+
- | 67212 | June | 10001 | ca | 909 | 13 |
- | 70302 | June | 10004 | ga | 420 | 11 |
- | 69090 | June | 10011 | fl | 44 | 76 |
- | 68834 | June | 10012 | ar | 0 | 81 |
- | 71220 | June | 10018 | az | 411 | 24 |
- +------------+------------+------------+------------+------------+-------------+
-
-Because orders is a Hive table, you can query the data in the same way that
-you would query the columns in a relational database table. Note the use of
-the standard LIMIT clause, which limits the result set to the specified number
-of rows. You can use LIMIT with or without an ORDER BY clause.
-
-Drill provides seamless integration with Hive by allowing queries on Hive
-tables defined in the metastore with no extra configuration. Note that Hive is
-not a prerequisite for Drill, but simply serves as a storage plugin or data
-source for Drill. Drill also lets users query all Hive file formats (including
-custom serdes). Additionally, any UDFs defined in Hive can be leveraged as
-part of Drill queries.
-
-Because Drill has its own low-latency SQL query execution engine, you can
-query Hive tables with high performance and support for interactive and ad-hoc
-data exploration.
-
-## Query MapR-DB and HBase Tables
-
-The customers and products tables are MapR-DB tables. MapR-DB is an enterprise
-in-Hadoop NoSQL database. It exposes the HBase API to support application
-development. Every MapR-DB table has a row_key, in addition to one or more
-column families. Each column family contains one or more specific columns. The
-row_key value is a primary key that uniquely identifies each row.
-
-Drill allows direct queries on MapR-DB and HBase tables. Unlike other SQL on
-Hadoop options, Drill requires no overlay schema definitions in Hive to work
-with this data. Think about a MapR-DB or HBase table with thousands of
-columns, such as a time-series database, and the pain of having to manage
-duplicate schemas for it in Hive!
-
-### Products Table
-
-The products table has two column families.
-
-Column Family|Columns
--------------|-------
- details | name
- | category
- pricing | price
-
-The products table contains 965 rows.
-
-### Customers Table
-
-The Customers table has three column families.
-
-Column Family|Columns
--------------|-------
- address | state
- loyalty | agg_rev
- | membership
- personal | age
- | gender
-
-The customers table contains 993 rows.
-
-### Set the workspace to maprdb:
-
- 0: jdbc:drill:> use maprdb;
- +------------+------------+
- | ok | summary |
- +------------+------------+
- | true | Default schema changed to 'maprdb' |
- +------------+------------+
-
-### Describe the tables:
-
- 0: jdbc:drill:> describe customers;
- +-------------+------------+-------------+
- | COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
- +-------------+------------+-------------+
- | row_key | ANY | NO |
- | address | (VARCHAR(1), ANY) MAP | NO |
- | loyalty | (VARCHAR(1), ANY) MAP | NO |
- | personal | (VARCHAR(1), ANY) MAP | NO |
- +-------------+------------+-------------+
-
- 0: jdbc:drill:> describe products;
- +-------------+------------+-------------+
- | COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
- +-------------+------------+-------------+
- | row_key | ANY | NO |
- | details | (VARCHAR(1), ANY) MAP | NO |
- | pricing | (VARCHAR(1), ANY) MAP | NO |
- +-------------+------------+-------------+
-
-Unlike the Hive example, the DESCRIBE command does not return the full schema
-up to the column level. Wide-column NoSQL databases such as MapR-DB and HBase
-can be schema-less by design; every row has its own set of column name-value
-pairs in a given column family, and the column value can be of any data type,
-as determined by the application inserting the data.
-
-A “MAP” complex type in Drill represents this variable column name-value
-structure, and “ANY” represents the fact that the column value can be of any
-data type. Observe the row_key, which is also simply bytes and has the type
-ANY.
-
-### Select 5 rows from the products table:
-
- 0: jdbc:drill:> select * from products limit 5;
- +------------+------------+------------+
- | row_key | details | pricing |
- +------------+------------+------------+
- | [B@a1a3e25 | {"category":"bGFwdG9w","name":"IlNvbnkgbm90ZWJvb2si"} | {"price":"OTU5"} |
- | [B@103a43af | {"category":"RW52ZWxvcGVz","name":"IzEwLTQgMS84IHggOSAxLzIgUHJlbWl1bSBEaWFnb25hbCBTZWFtIEVudmVsb3Blcw=="} | {"price":"MT |
- | [B@61319e7b | {"category":"U3RvcmFnZSAmIE9yZ2FuaXphdGlvbg==","name":"MjQgQ2FwYWNpdHkgTWF4aSBEYXRhIEJpbmRlciBSYWNrc1BlYXJs"} | {"price" |
- | [B@9bcf17 | {"category":"TGFiZWxz","name":"QXZlcnkgNDk4"} | {"price":"Mw=="} |
- | [B@7538ef50 | {"category":"TGFiZWxz","name":"QXZlcnkgNDk="} | {"price":"Mw=="} |
-
-Given that Drill requires no up-front schema definitions indicating data
-types, the query returns the raw byte arrays for column values, just as they
-are stored in MapR-DB (or HBase). Observe that the column families (details
-and pricing) have the map data type and appear as JSON strings.
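The opaque strings Drill prints for the map values above are base64 renderings of the stored bytes. A quick sanity check in plain Python (outside Drill), using the category and price values from the first products row shown above:

```python
import base64

# values copied from the first row of the products output above
category = base64.b64decode("bGFwdG9w").decode("utf-8")   # 'laptop'
price = base64.b64decode("OTU5").decode("utf-8")          # '959'
```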
-
-In Lesson 2, you will use CAST functions to return typed data for each column.
-
-### Select 5 rows from the customers table:
-
-
- 0: jdbc:drill:> select * from customers limit 5;
- +------------+------------+------------+------------+
- | row_key | address | loyalty | personal |
- +------------+------------+------------+------------+
- | [B@284bae62 | {"state":"Imt5Ig=="} | {"agg_rev":"IjEwMDEtMzAwMCI=","membership":"ImJhc2ljIg=="} | {"age":"IjI2LTM1Ig==","gender":"Ik1B |
- | [B@7ffa4523 | {"state":"ImNhIg=="} | {"agg_rev":"IjAtMTAwIg==","membership":"ImdvbGQi"} | {"age":"IjI2LTM1Ig==","gender":"IkZFTUFMRSI= |
- | [B@7d13e79 | {"state":"Im9rIg=="} | {"agg_rev":"IjUwMS0xMDAwIg==","membership":"InNpbHZlciI="} | {"age":"IjI2LTM1Ig==","gender":"IkZFT |
- | [B@3a5c7df1 | {"state":"ImtzIg=="} | {"agg_rev":"IjMwMDEtMTAwMDAwIg==","membership":"ImdvbGQi"} | {"age":"IjUxLTEwMCI=","gender":"IkZF |
- | [B@e507726 | {"state":"Im5qIg=="} | {"agg_rev":"IjAtMTAwIg==","membership":"ImJhc2ljIg=="} | {"age":"IjIxLTI1Ig==","gender":"Ik1BTEUi" |
- +------------+------------+------------+------------+
-
-Again, the query returns byte data that needs to be cast to readable data
-types.
-
-## Query the File System
-
-Along with querying a data source with full schemas (such as Hive) and partial
-schemas (such as MapR-DB and HBase), Drill offers the unique capability to
-perform SQL queries directly on the file system. The file system could be a
-local file system, or a distributed file system such as MapR-FS, HDFS, or S3.
-
-In the context of Drill, a file or a directory is considered synonymous with
-a relational database “table.” Therefore, you can perform SQL operations
-directly on files and directories without the need for up-front schema
-definitions or schema management for any model changes. The schema is
-discovered on the fly based on the query. Drill supports queries on a variety
-of file formats including text, CSV, Parquet, and JSON in the 0.5 release.
-
-In this example, the clickstream data coming from the mobile/web applications
-is in JSON format. The JSON files have the following structure:
-
- {"trans_id":31920,"date":"2014-04-26","time":"12:17:12","user_info":{"cust_id":22526,"device":"IOS5","state":"il"},"trans_info":{"prod_id":[174,2],"purch_flag":"false"}}
- {"trans_id":31026,"date":"2014-04-20","time":"13:50:29","user_info":{"cust_id":16368,"device":"AOS4.2","state":"nc"},"trans_info":{"prod_id":[],"purch_flag":"false"}}
- {"trans_id":33848,"date":"2014-04-10","time":"04:44:42","user_info":{"cust_id":21449,"device":"IOS6","state":"oh"},"trans_info":{"prod_id":[582],"purch_flag":"false"}}
-
-
-The clicks.json and clicks.campaign.json files contain metadata as part of the
-data itself (referred to as “self-describing” data). Also note that the data
-elements are complex, or nested. The initial queries below do not show how to
-unpack the nested data, but they show that easy access to the data requires no
-setup beyond the definition of a workspace.
-
-### Query nested clickstream data
-
-#### Set the workspace to dfs.clicks:
-
- 0: jdbc:drill:> use dfs.clicks;
- +------------+------------+
- | ok | summary |
- +------------+------------+
- | true | Default schema changed to 'dfs.clicks' |
- +------------+------------+
-
-In this case, setting the workspace is a mechanism for making queries easier
-to write. When you specify a file system workspace, you can shorten references
-to files in the FROM clause of your queries. Instead of having to provide the
-complete path to a file, you can provide the path relative to a directory
-location specified in the workspace. For example:
-
- "location": "/mapr/demo.mapr.com/data/nested"
-
-Any file or directory that you want to query in this path can be referenced
-relative to this path. The clicks directory referred to in the following query
-is directly below the nested directory.
-
-#### Select 2 rows from the clicks.json file:
-
- 0: jdbc:drill:> select * from `clicks/clicks.json` limit 2;
- +------------+------------+------------+------------+------------+
- | trans_id | date | time | user_info | trans_info |
- +------------+------------+------------+------------+------------+
- | 31920 | 2014-04-26 | 12:17:12 | {"cust_id":22526,"device":"IOS5","state":"il"} | {"prod_id":[174,2],"purch_flag":"false"} |
- | 31026 | 2014-04-20 | 13:50:29 | {"cust_id":16368,"device":"AOS4.2","state":"nc"} | {"prod_id":[],"purch_flag":"false"} |
- +------------+------------+------------+------------+------------+
- 2 rows selected
-
-Note that the FROM clause reference points to a specific file. Drill expands
-the traditional concept of a “table reference” in a standard SQL FROM clause
-to refer to a file in a local or distributed file system.
-
-The only special requirement is the use of backticks to enclose the file
-path. This is necessary whenever the file path contains Drill reserved words
-or characters.
-
-#### Select 2 rows from the campaign.json file:
-
- 0: jdbc:drill:> select * from `clicks/clicks.campaign.json` limit 2;
- +------------+------------+------------+------------+------------+------------+
- | trans_id | date | time | user_info | ad_info | trans_info |
- +------------+------------+------------+------------+------------+------------+
- | 35232 | 2014-05-10 | 00:13:03 | {"cust_id":18520,"device":"AOS4.3","state":"tx"} | {"camp_id":"null"} | {"prod_id":[7,7],"purch_flag":"true"} |
- | 31995 | 2014-05-22 | 16:06:38 | {"cust_id":17182,"device":"IOS6","state":"fl"} | {"camp_id":"null"} | {"prod_id":[],"purch_flag":"false"} |
- +------------+------------+------------+------------+------------+------------+
- 2 rows selected
-
-Notice that with a select * query, any complex data types such as maps and
-arrays return as JSON strings. You will see how to unpack this data using
-various SQL functions and operators in the next lesson.
-
-## Query Logs Data
-
-Unlike the previous example where we performed queries against clicks data in
-one file, logs data is stored as partitioned directories on the file system.
-The logs directory has three subdirectories:
-
- * 2012
-
- * 2013
-
- * 2014
-
-Each of these year directories fans out to a set of numbered month
-directories, and each month directory contains a JSON file with log records
-for that month. The total number of records in all log files is 48000.
-
-The files in the logs directory and its subdirectories are JSON files. There
-are many of these files, but you can use Drill to query them all as a single
-data source, or to query a subset of the files.
-
-#### Set the workspace to dfs.logs:
-
- 0: jdbc:drill:> use dfs.logs;
- +------------+------------+
- | ok | summary |
- +------------+------------+
- | true | Default schema changed to 'dfs.logs' |
- +------------+------------+
-
-#### Select 2 rows from the logs directory:
-
- 0: jdbc:drill:> select * from logs limit 2;
- +------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+----------+
- | dir0 | dir1 | trans_id | date | time | cust_id | device | state | camp_id | keywords | prod_id | purch_fl |
- +------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+----------+
- | 2014 | 8 | 24181 | 08/02/2014 | 09:23:52 | 0 | IOS5 | il | 2 | wait | 128 | false |
- | 2014 | 8 | 24195 | 08/02/2014 | 07:58:19 | 243 | IOS5 | mo | 6 | hmm | 107 | false |
- +------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+----------+
-
-Note that this is flat JSON data. The dfs.logs workspace location property
-points to a directory that contains the logs directory, making the FROM clause
-reference for this query very simple. You do not have to refer to the complete
-directory path on the file system.
-
-The column names dir0 and dir1 are special Drill variables that identify
-subdirectories below the logs directory. In Lesson 3, you will do more complex
-queries that leverage these dynamic variables.
-
-#### Find the total number of rows in the logs directory (all files):
-
- 0: jdbc:drill:> select count(*) from logs;
- +------------+
- | EXPR$0 |
- +------------+
- | 48000 |
- +------------+
-
-This query traverses all of the files in the logs directory and its
-subdirectories to return the total number of rows in those files.
-
-# What's Next
-
-Go to [Lesson 2: Run Queries with ANSI
-SQL](/confluence/display/DRILL/Lesson+2%3A+Run+Queries+with+ANSI+SQL).
-
-
-
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/tutorial/004-lesson2.md
----------------------------------------------------------------------
diff --git a/_docs/drill-docs/tutorial/004-lesson2.md b/_docs/drill-docs/tutorial/004-lesson2.md
deleted file mode 100644
index d9c68d5..0000000
--- a/_docs/drill-docs/tutorial/004-lesson2.md
+++ /dev/null
@@ -1,392 +0,0 @@
----
-title: "Lesson 2: Run Queries with ANSI SQL"
-parent: "Apache Drill Tutorial"
----
-## Goal
-
-This lesson shows how to do some standard SQL analysis in Apache Drill: for
-example, summarizing data by using simple aggregate functions and connecting
-data sources by using joins. Note that Apache Drill provides ANSI SQL support,
-not a “SQL-like” interface.
-
-## Queries in This Lesson
-
-Now that you know what the data sources look like in their raw form, using
-select * queries, try running some simple but more useful queries on each data
-source. These queries demonstrate how Drill supports ANSI SQL constructs and
-also how you can combine data from different data sources in a single SELECT
-statement.
-
- * Show an aggregate query on a single file or table. Use GROUP BY, WHERE, HAVING, and ORDER BY clauses.
-
- * Perform joins between Hive, MapR-DB, and file system data sources.
-
- * Use table and column aliases.
-
- * Create a Drill view.
-
-## Aggregation
-
-
-### Set the schema to hive:
-
- 0: jdbc:drill:> use hive;
- +------------+------------+
- | ok | summary |
- +------------+------------+
- | true | Default schema changed to 'hive' |
- +------------+------------+
- 1 row selected
-
-### Return sales totals by month:
-
- 0: jdbc:drill:> select `month`, sum(order_total)
- from orders group by `month` order by 2 desc;
- +------------+------------+
- | month | EXPR$1 |
- +------------+------------+
- | June | 950481 |
- | May | 947796 |
- | March | 836809 |
- | April | 807291 |
- | July | 757395 |
- | October | 676236 |
- | August | 572269 |
- | February | 532901 |
- | September | 373100 |
- | January | 346536 |
- +------------+------------+
-
-Drill supports SQL aggregate functions such as SUM, MAX, AVG, and MIN.
-Standard SQL clauses work in the same way in Drill queries as in relational
-database queries.
-
-Note that backticks are required for the “month” column only because “month”
-is a reserved word in SQL.
-
-### Return the top 20 sales totals by month and state:
-
- 0: jdbc:drill:> select `month`, state, sum(order_total) as sales from orders group by `month`, state
- order by 3 desc limit 20;
- +------------+------------+------------+
- | month | state | sales |
- +------------+------------+------------+
- | May | ca | 119586 |
- | June | ca | 116322 |
- | April | ca | 101363 |
- | March | ca | 99540 |
- | July | ca | 90285 |
- | October | ca | 80090 |
- | June | tx | 78363 |
- | May | tx | 77247 |
- | March | tx | 73815 |
- | August | ca | 71255 |
- | April | tx | 68385 |
- | July | tx | 63858 |
- | February | ca | 63527 |
- | June | fl | 62199 |
- | June | ny | 62052 |
- | May | fl | 61651 |
- | May | ny | 59369 |
- | October | tx | 55076 |
- | March | fl | 54867 |
- | March | ny | 52101 |
- +------------+------------+------------+
- 20 rows selected
-
-Note the alias for the result of the SUM function. Drill supports column
-aliases and table aliases.
-
-## HAVING Clause
-
-This query uses the HAVING clause to constrain an aggregate result.
-
-### Set the workspace to dfs.clicks
-
- 0: jdbc:drill:> use dfs.clicks;
- +------------+------------+
- | ok | summary |
- +------------+------------+
- | true | Default schema changed to 'dfs.clicks' |
- +------------+------------+
- 1 row selected
-
-### Return total number of clicks for devices that indicate high click-throughs:
-
- 0: jdbc:drill:> select t.user_info.device, count(*) from `clicks/clicks.json` t
- group by t.user_info.device
- having count(*) > 1000;
- +------------+------------+
- | EXPR$0 | EXPR$1 |
- +------------+------------+
- | IOS5 | 11814 |
- | AOS4.2 | 5986 |
- | IOS6 | 4464 |
- | IOS7 | 3135 |
- | AOS4.4 | 1562 |
- | AOS4.3 | 3039 |
- +------------+------------+
-
-The aggregate is a count of the records for each different mobile device in
-the clickstream data. Only the activity for devices that registered more
-than 1000 transactions qualifies for the result set.
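The HAVING pattern here is standard SQL, so it behaves the same way outside Drill. A minimal illustration with Python's built-in sqlite3 module (the table and device counts are invented for the sketch, not taken from the tutorial data set):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table clicks (device text)")
# three devices; only two of them appear more than twice
conn.executemany("insert into clicks values (?)",
                 [("IOS5",)] * 4 + [("AOS4.2",)] * 3 + [("IOS6",)] * 1)

rows = conn.execute("""
    select device, count(*) from clicks
    group by device
    having count(*) > 2
    order by 2 desc
""").fetchall()
# IOS6 is filtered out by the HAVING clause
```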
-
-## UNION Operator
-
-Use the same workspace as before (dfs.clicks).
-
-### Combine clicks activity from before and after the marketing campaign
-
- 0: jdbc:drill:> select t.trans_id transaction, t.user_info.cust_id customer from `clicks/clicks.campaign.json` t
- union all
- select u.trans_id, u.user_info.cust_id from `clicks/clicks.json` u limit 5;
- +-------------+------------+
- | transaction | customer |
- +-------------+------------+
- | 35232 | 18520 |
- | 31995 | 17182 |
- | 35760 | 18228 |
- | 37090 | 17015 |
- | 37838 | 18737 |
- +-------------+------------+
-
-This UNION ALL query returns rows that exist in two files (and includes any
-duplicate rows from those files): `clicks.campaign.json` and `clicks.json`.
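UNION ALL's duplicate-preserving behavior (in contrast to UNION, which de-duplicates) can be checked with any SQL engine; a small sketch using Python's sqlite3, with invented tables standing in for the two click files:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table before_campaign (trans_id int);
    insert into before_campaign values (1), (2);
    create table after_campaign (trans_id int);
    insert into after_campaign values (2), (3);
""")
union_all = conn.execute("""
    select trans_id from before_campaign
    union all
    select trans_id from after_campaign
    order by trans_id
""").fetchall()
# the duplicate trans_id 2 is kept in the result
```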
-
-## Subqueries
-
-### Set the workspace to hive:
-
- 0: jdbc:drill:> use hive;
- +------------+------------+
- | ok | summary |
- +------------+------------+
- | true | Default schema changed to 'hive' |
- +------------+------------+
-
-### Compare order totals across states:
-
- 0: jdbc:drill:> select o1.cust_id, sum(o1.order_total) as ny_sales,
- (select sum(o2.order_total) from hive.orders o2
- where o1.cust_id=o2.cust_id and state='ca') as ca_sales
- from hive.orders o1 where o1.state='ny' group by o1.cust_id
- order by cust_id limit 20;
- +------------+------------+------------+
- | cust_id | ny_sales | ca_sales |
- +------------+------------+------------+
- | 1001 | 72 | 47 |
- | 1002 | 108 | 198 |
- | 1003 | 83 | null |
- | 1004 | 86 | 210 |
- | 1005 | 168 | 153 |
- | 1006 | 29 | 326 |
- | 1008 | 105 | 168 |
- | 1009 | 443 | 127 |
- | 1010 | 75 | 18 |
- | 1012 | 110 | null |
- | 1013 | 19 | null |
- | 1014 | 106 | 162 |
- | 1015 | 220 | 153 |
- | 1016 | 85 | 159 |
- | 1017 | 82 | 56 |
- | 1019 | 37 | 196 |
- | 1020 | 193 | 165 |
- | 1022 | 124 | null |
- | 1023 | 166 | 149 |
- | 1024 | 233 | null |
- +------------+------------+------------+
-
-This example demonstrates Drill support for correlated subqueries. This query
-uses a subquery in the select list and correlates the result of the subquery
-with the outer query, using the cust_id column reference. The subquery returns
-the sum of order totals for California, and the outer query returns the
-equivalent sum, for the same cust_id, for New York.
-
-The result set is sorted by the cust_id and presents the sales totals side by
-side for easy comparison. Null values indicate customer IDs that did not
-register any sales in that state.
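The shape of this correlated subquery is portable ANSI SQL. A reduced sketch with Python's sqlite3 (the order rows below are invented, not the tutorial's Hive data) shows the same side-by-side sums with NULL where the customer has no sales in the correlated state:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (cust_id int, state text, order_total int)")
conn.executemany("insert into orders values (?,?,?)", [
    (1001, "ny", 40), (1001, "ny", 32), (1001, "ca", 47),
    (1002, "ny", 108), (1003, "ny", 83),
])

rows = conn.execute("""
    select o1.cust_id, sum(o1.order_total) as ny_sales,
           (select sum(o2.order_total) from orders o2
            where o1.cust_id = o2.cust_id and o2.state = 'ca') as ca_sales
    from orders o1 where o1.state = 'ny'
    group by o1.cust_id order by o1.cust_id
""").fetchall()
# customers 1002 and 1003 have no CA sales, so ca_sales is None (NULL)
```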
-
-## CAST Function
-
-### Use the maprdb workspace:
-
- 0: jdbc:drill:> use maprdb;
- +------------+------------+
- | ok | summary |
- +------------+------------+
- | true | Default schema changed to 'maprdb' |
- +------------+------------+
- 1 row selected
-
-### Return customer data with appropriate data types
-
- 0: jdbc:drill:> select cast(row_key as int) as cust_id, cast(t.personal.name as varchar(20)) as name,
- cast(t.personal.gender as varchar(10)) as gender, cast(t.personal.age as varchar(10)) as age,
- cast(t.address.state as varchar(4)) as state, cast(t.loyalty.agg_rev as dec(7,2)) as agg_rev,
- cast(t.loyalty.membership as varchar(20)) as membership
- from customers t limit 5;
- +------------+------------+------------+------------+------------+------------+------------+
- | cust_id | name | gender | age | state | agg_rev | membership |
- +------------+------------+------------+------------+------------+------------+------------+
- | 10001 | "Corrine Mecham" | "FEMALE" | "15-20" | "va" | 197.00 | "silver" |
- | 10005 | "Brittany Park" | "MALE" | "26-35" | "in" | 230.00 | "silver" |
- | 10006 | "Rose Lokey" | "MALE" | "26-35" | "ca" | 250.00 | "silver" |
- | 10007 | "James Fowler" | "FEMALE" | "51-100" | "me" | 263.00 | "silver" |
- | 10010 | "Guillermo Koehler" | "OTHER" | "51-100" | "mn" | 202.00 | "silver" |
- +------------+------------+------------+------------+------------+------------+------------+
- 5 rows selected
-
-Note the following features of this query:
-
- * The CAST function is required for every column in the table. This function returns the MapR-DB/HBase binary data as readable integers and strings. Alternatively, you can use CONVERT_TO/CONVERT_FROM functions to decode the columns. CONVERT_TO and CONVERT_FROM are more efficient than CAST in most cases.
- * The row_key column functions as the primary key of the table (a customer ID in this case).
- * The table alias t is required; otherwise the column family names would be parsed as table names and the query would return an error.
-
-### Remove the quotes from the strings:
-
-You can use the regexp_replace function to remove the quotes around the
-strings in the query results. For example, to return a state name va instead
-of “va”:
-
- 0: jdbc:drill:> select cast(row_key as int), regexp_replace(cast(t.address.state as varchar(10)),'"','')
- from customers t limit 1;
- +------------+------------+
- | EXPR$0 | EXPR$1 |
- +------------+------------+
- | 10001 | va |
- +------------+------------+
- 1 row selected
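The same quote-stripping is an ordinary regular-expression replace; reproduced in plain Python for comparison:

```python
import re

# strip the surrounding double quotes, as regexp_replace does above
state = re.sub('"', '', '"va"')   # 'va'
```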
-
-## CREATE VIEW Command
-
- 0: jdbc:drill:> use dfs.views;
- +------------+------------+
- | ok | summary |
- +------------+------------+
- | true | Default schema changed to 'dfs.views' |
- +------------+------------+
-
-### Use a mutable workspace:
-
-A mutable (or writable) workspace is a workspace that is enabled for “write”
-operations. This attribute is part of the storage plugin configuration. You
-can create Drill views and tables in mutable workspaces.
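For reference, the writable flag lives in the dfs storage plugin configuration. A sketch of what such a workspace entry typically looks like (the location path follows this tutorial's environment; surrounding fields may vary by Drill version):

```json
{
  "type": "file",
  "workspaces": {
    "views": {
      "location": "/mapr/demo.mapr.com/data/views",
      "writable": true,
      "defaultInputFormat": null
    }
  }
}
```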
-
-### Create a view on a MapR-DB table
-
- 0: jdbc:drill:> create or replace view custview as select cast(row_key as int) as cust_id,
- cast(t.personal.name as varchar(20)) as name,
- cast(t.personal.gender as varchar(10)) as gender,
- cast(t.personal.age as varchar(10)) as age,
- cast(t.address.state as varchar(4)) as state,
- cast(t.loyalty.agg_rev as dec(7,2)) as agg_rev,
- cast(t.loyalty.membership as varchar(20)) as membership
- from maprdb.customers t;
- +------------+------------+
- | ok | summary |
- +------------+------------+
- | true | View 'custview' replaced successfully in 'dfs.views' schema |
- +------------+------------+
- 1 row selected
-
-Drill provides CREATE OR REPLACE VIEW syntax similar to relational databases
-to create views. Use the OR REPLACE option to make it easier to update the
-view later without having to remove it first. Note that the FROM clause in
-this example must refer to maprdb.customers. The MapR-DB tables are not
-directly visible to the dfs.views workspace.
-
-Unlike a traditional database where views typically are DBA/developer-driven
-operations, file system-based views in Drill are very lightweight. A view is
-simply a special file with a specific extension (.drill). You can store views
-even in your local file system or point to a specific workspace. You can
-specify any query against any Drill data source in the body of the CREATE VIEW
-statement.
-
-Drill provides a decentralized metadata model. Drill is able to query metadata
-defined in data sources such as Hive, HBase, and the file system. Drill also
-supports the creation of metadata in the file system.
-
-### Query data from the view:
-
- 0: jdbc:drill:> select * from custview limit 1;
- +------------+------------+------------+------------+------------+------------+------------+
- | cust_id | name | gender | age | state | agg_rev | membership |
- +------------+------------+------------+------------+------------+------------+------------+
- | 10001 | "Corrine Mecham" | "FEMALE" | "15-20" | "va" | 197.00 | "silver" |
- +------------+------------+------------+------------+------------+------------+------------+
-
-Once users have explored the data directly from the file system and have an
-idea of what is available, views provide a way to surface that data to
-downstream tools such as Tableau and MicroStrategy for analysis and
-visualization. To these tools, a view appears simply as a “table” with
-selectable “columns” in it.
-
-## Query Across Data Sources
-
-Continue using dfs.views for this query.
-
-### Join the customers view and the orders table:
-
- 0: jdbc:drill:> select membership, sum(order_total) as sales from hive.orders, custview
- where orders.cust_id=custview.cust_id
- group by membership order by 2;
- +------------+------------+
- | membership | sales |
- +------------+------------+
- | "basic" | 380665 |
- | "silver" | 708438 |
- | "gold" | 2787682 |
- +------------+------------+
- 3 rows selected
-
-In this query, we are reading data from a MapR-DB table (represented by
-custview) and combining it with the order information in Hive. When doing
-cross data source queries such as this, you need to use fully qualified
-table/view names. For example, the orders table is prefixed by “hive,” which
-is the storage plugin name registered with Drill. We are not using any prefix
-for “custview” because we explicitly switched to the dfs.views workspace where
-custview is stored.
-
-Note: If the results of any of your queries appear to be truncated because the
-rows are wide, set the maximum width of the display to 10000:
-
- !set maxwidth 10000
-
-Do not use a semicolon for this SET command.
-
-### Join the customers, orders, and clickstream data:
-
- 0: jdbc:drill:> select custview.membership, sum(orders.order_total) as sales from hive.orders, custview,
- dfs.`/mapr/demo.mapr.com/data/nested/clicks/clicks.json` c
- where orders.cust_id=custview.cust_id and orders.cust_id=c.user_info.cust_id
- group by custview.membership order by 2;
- +------------+------------+
- | membership | sales |
- +------------+------------+
- | "basic" | 372866 |
- | "silver" | 728424 |
- | "gold" | 7050198 |
- +------------+------------+
- 3 rows selected
-
-This three-way join selects from three different data sources in one query:
-
- * hive.orders table
- * custview (a view of the HBase customers table)
- * clicks.json file
-
-The join column for both sets of join conditions is the cust_id column. The
-views workspace is used for this query so that custview can be accessed. The
-hive.orders table is also visible to the query.
-
-However, note that the JSON file is not directly visible from the views
-workspace, so the query specifies the full path to the file:
-
- dfs.`/mapr/demo.mapr.com/data/nested/clicks/clicks.json`
-
-
-# What's Next
-
-Go to [Lesson 3: Run Queries on Complex Data Types](/confluence/display/DRILL/
-Lesson+3%3A+Run+Queries+on+Complex+Data+Types).
-
-
-
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/tutorial/005-lesson3.md
----------------------------------------------------------------------
diff --git a/_docs/drill-docs/tutorial/005-lesson3.md b/_docs/drill-docs/tutorial/005-lesson3.md
deleted file mode 100644
index d9b362a..0000000
--- a/_docs/drill-docs/tutorial/005-lesson3.md
+++ /dev/null
@@ -1,379 +0,0 @@
----
-title: "Lesson 3: Run Queries on Complex Data Types"
-parent: "Apache Drill Tutorial"
----
-## Goal
-
-This lesson focuses on queries that exercise functions and operators on self-
-describing data and complex data types. Drill offers intuitive SQL extensions
-to work with such data and offers high query performance with an architecture
-built from the ground up for complex data.
-
-## Queries in This Lesson
-
-Now that you have run ANSI SQL queries against different tables and files with
-relational data, you can try some examples including complex types.
-
- * Access directories and subdirectories of files in a single SELECT statement.
- * Demonstrate simple ways to access complex data in JSON files.
- * Demonstrate the repeated_count function to aggregate values in an array.
-
-## Query Partitioned Directories
-
-You can use special variables in Drill to refer to subdirectories in your
-workspace path:
-
- * dir0
- * dir1
- * …
-
-Note that these variables are dynamically determined based on the partitioning
-of the file system. No up-front definitions are required on what partitions
-exist. Here is a visual example of how this works:
-
-![example_query.png](../../img/example_query.png)
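Conceptually, Drill derives dir0, dir1, and so on from the path components between the workspace root and each file. A plain-Python sketch of that mapping (the file name is hypothetical):

```python
from pathlib import PurePosixPath

# a log file two levels below the workspace root "logs"
path = PurePosixPath("logs/2013/11/log_nov.json")

# components between the root and the file become dir0, dir1, ...
dir_cols = {f"dir{i}": part for i, part in enumerate(path.parts[1:-1])}
```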
-
-### Set workspace to dfs.logs:
-
- 0: jdbc:drill:> use dfs.logs;
- +------------+------------+
- | ok | summary |
- +------------+------------+
- | true | Default schema changed to 'dfs.logs' |
- +------------+------------+
-
-### Query logs data for a specific year:
-
- 0: jdbc:drill:> select * from logs where dir0='2013' limit 10;
- +------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+-----------+------------+
- | dir0 | dir1 | trans_id | date | time | cust_id | device | state | camp_id | keywords | prod_id | purch_flag |
- +------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+-----------+------------+
- | 2013 | 11 | 12119 | 11/09/2013 | 02:24:51 | 262 | IOS5 | ny | 0 | chamber | 198 | false |
- | 2013 | 11 | 12120 | 11/19/2013 | 09:37:43 | 0 | AOS4.4 | il | 2 | outside | 511 | false |
- | 2013 | 11 | 12134 | 11/10/2013 | 23:42:47 | 60343 | IOS5 | ma | 4 | and | 421 | false |
- | 2013 | 11 | 12135 | 11/16/2013 | 01:42:13 | 46762 | AOS4.3 | ca | 4 | here's | 349 | false |
- | 2013 | 11 | 12165 | 11/26/2013 | 21:58:09 | 41987 | AOS4.2 | mn | 4 | he | 271 | false |
- | 2013 | 11 | 12168 | 11/09/2013 | 23:41:48 | 8600 | IOS5 | in | 6 | i | 459 | false |
- | 2013 | 11 | 12196 | 11/20/2013 | 02:23:06 | 15603 | IOS5 | tn | 1 | like | 324 | false |
- | 2013 | 11 | 12203 | 11/25/2013 | 23:50:29 | 221 | IOS6 | tx | 10 | if | 323 | false |
- | 2013 | 11 | 12206 | 11/09/2013 | 23:53:01 | 2488 | AOS4.2 | tx | 14 | unlike | 296 | false |
- | 2013 | 11 | 12217 | 11/06/2013 | 23:51:56 | 0 | AOS4.2 | tx | 9 | can't | 54 | false |
- +------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
-
-
-This query constrains files inside the subdirectory named 2013. The variable
-dir0 refers to the first level down from logs, dir1 to the next level, and so
-on. So this query returned 10 rows for November 2013.
-
-### Further constrain the results using multiple predicates in the query:
-
-This query returns a list of customer IDs for people who made a purchase via
-an IOS5 device in August 2013.
-
- 0: jdbc:drill:> select dir0 as yr, dir1 as mth, cust_id from logs
- where dir0='2013' and dir1='8' and device='IOS5' and purch_flag='true'
- order by `date`;
- +------------+------------+------------+
- | yr | mth | cust_id |
- +------------+------------+------------+
- | 2013 | 8 | 4 |
- | 2013 | 8 | 521 |
- | 2013 | 8 | 1 |
- | 2013 | 8 | 2 |
- | 2013 | 8 | 4 |
- | 2013 | 8 | 549 |
- | 2013 | 8 | 72827 |
- | 2013 | 8 | 38127 |
- ...
-
-### Return monthly counts per customer for a given year:
-
- 0: jdbc:drill:> select cust_id, dir1 month_no, count(*) month_count from logs
- where dir0=2014 group by cust_id, dir1 order by cust_id, month_no limit 10;
- +------------+------------+-------------+
- | cust_id | month_no | month_count |
- +------------+------------+-------------+
- | 0 | 1 | 143 |
- | 0 | 2 | 118 |
- | 0 | 3 | 117 |
- | 0 | 4 | 115 |
- | 0 | 5 | 137 |
- | 0 | 6 | 117 |
- | 0 | 7 | 142 |
- | 0 | 8 | 19 |
- | 1 | 1 | 66 |
- | 1 | 2 | 59 |
- +------------+------------+-------------+
- 10 rows selected
-
-This query groups the aggregate function by customer ID and month for one
-year: 2014.
-
-## Query Complex Data
-
-Drill provides some specialized operators and functions that you can use to
-analyze nested data natively without transformation. If you are familiar with
-JavaScript notation, you will already know how some of these extensions work.
-
-### Set the workspace to dfs.clicks:
-
- 0: jdbc:drill:> use dfs.clicks;
- +------------+------------+
- | ok | summary |
- +------------+------------+
- | true | Default schema changed to 'dfs.clicks' |
- +------------+------------+
-
-### Explore clickstream data:
-
-Note that the user_info and trans_info columns contain nested data: maps, and
-arrays nested within those maps. The following queries show how to access this
-complex data.
-
- 0: jdbc:drill:> select * from `clicks/clicks.json` limit 5;
- +------------+------------+------------+------------+------------+
- | trans_id | date | time | user_info | trans_info |
- +------------+------------+------------+------------+------------+
- | 31920 | 2014-04-26 | 12:17:12 | {"cust_id":22526,"device":"IOS5","state":"il"} | {"prod_id":[174,2],"purch_flag":"false"} |
- | 31026 | 2014-04-20 | 13:50:29 | {"cust_id":16368,"device":"AOS4.2","state":"nc"} | {"prod_id":[],"purch_flag":"false"} |
- | 33848 | 2014-04-10 | 04:44:42 | {"cust_id":21449,"device":"IOS6","state":"oh"} | {"prod_id":[582],"purch_flag":"false"} |
- | 32383 | 2014-04-18 | 06:27:47 | {"cust_id":20323,"device":"IOS5","state":"oh"} | {"prod_id":[710,47],"purch_flag":"false"} |
- | 32359 | 2014-04-19 | 23:13:25 | {"cust_id":15360,"device":"IOS5","state":"ca"} | {"prod_id": [0,8,170,173,1,124,46,764,30,711,0,3,25],"purch_flag":"true"} |
- +------------+------------+------------+------------+------------+
-
-
-### Unpack the user_info column:
-
- 0: jdbc:drill:> select t.user_info.cust_id as custid, t.user_info.device as device,
- t.user_info.state as state
- from `clicks/clicks.json` t limit 5;
- +------------+------------+------------+
- | custid | device | state |
- +------------+------------+------------+
- | 22526 | IOS5 | il |
- | 16368 | AOS4.2 | nc |
- | 21449 | IOS6 | oh |
- | 20323 | IOS5 | oh |
- | 15360 | IOS5 | ca |
- +------------+------------+------------+
-
-This query uses a simple table.column.column notation to extract nested column
-data. For example:
-
- t.user_info.cust_id
-
-where `t` is the table alias provided in the query, `user_info` is a top-level
-column name, and `cust_id` is a nested column name.
-
-The table alias is required; otherwise column names such as `user_info` are
-parsed as table names by the SQL parser.
-
-### Unpack the trans_info column:
-
- 0: jdbc:drill:> select t.trans_info.prod_id as prodid, t.trans_info.purch_flag as
- purchased
- from `clicks/clicks.json` t limit 5;
- +------------+------------+
- | prodid | purchased |
- +------------+------------+
- | [174,2] | false |
- | [] | false |
- | [582] | false |
- | [710,47] | false |
- | [0,8,170,173,1,124,46,764,30,711,0,3,25] | true |
- +------------+------------+
- 5 rows selected
-
-Note that this result reveals that the prod_id column contains an array of IDs
-(one or more product ID values per row, separated by commas). The next step
-shows how to access this kind of data.
-
-## Query Arrays
-
-Now use the [ n ] notation, where n is the position of the value in an array,
-starting from position 0 (not 1) for the first value. You can use this
-notation to write interesting queries against nested array data.
-
-For example:
-
- trans_info.prod_id[0]
-
-refers to the first value in the nested prod_id column and
-
- trans_info.prod_id[20]
-
-refers to the 21st value, assuming one exists.
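Drill returns null when an array index is out of range. When handling the same JSON outside Drill, that behavior can be mimicked in plain Python with a small helper (the function name is made up for the sketch):

```python
import json

row = json.loads('{"trans_id":31026,"trans_info":{"prod_id":[],"purch_flag":"false"}}')

def element(arr, n):
    # mimic Drill's arr[n]: None (null) when the index is out of range
    return arr[n] if 0 <= n < len(arr) else None

first = element(row["trans_info"]["prod_id"], 0)   # None, like the null in the query output
```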
-
-### Find the first product that is searched for in each transaction:
-
- 0: jdbc:drill:> select t.trans_id, t.trans_info.prod_id[0] from `clicks/clicks.json` t limit 5;
- +------------+------------+
- | trans_id | EXPR$1 |
- +------------+------------+
- | 31920 | 174 |
- | 31026 | null |
- | 33848 | 582 |
- | 32383 | 710 |
- | 32359 | 0 |
- +------------+------------+
- 5 rows selected
-
-### For which transactions did customers search on at least 21 products?
-
- 0: jdbc:drill:> select t.trans_id, t.trans_info.prod_id[20]
- from `clicks/clicks.json` t
- where t.trans_info.prod_id[20] is not null
- order by trans_id limit 5;
- +------------+------------+
- | trans_id | EXPR$1 |
- +------------+------------+
- | 10328 | 0 |
- | 10380 | 23 |
- | 10701 | 1 |
- | 11100 | 0 |
- | 11219 | 46 |
- +------------+------------+
- 5 rows selected
-
-This query returns transaction IDs and product IDs for records that contain a
-non-null product ID at the 21st position in the array.
-
-### Return clicks for a specific product range:
-
- 0: jdbc:drill:> select * from (select t.trans_id, t.trans_info.prod_id[0] as prodid,
- t.trans_info.purch_flag as purchased
- from `clicks/clicks.json` t) sq
- where sq.prodid between 700 and 750 and sq.purchased='true'
- order by sq.prodid;
- +------------+------------+------------+
- | trans_id | prodid | purchased |
- +------------+------------+------------+
- | 21886 | 704 | true |
- | 20674 | 708 | true |
- | 22158 | 709 | true |
- | 34089 | 714 | true |
- | 22545 | 714 | true |
- | 37500 | 717 | true |
- | 36595 | 718 | true |
- ...
-
-This query assumes that there is some meaning to the array (that it is an
-ordered list of products purchased rather than a random list).
-
-## Perform Operations on Arrays
-
-### Rank successful click conversions and count product searches for each session:
-
- 0: jdbc:drill:> select t.trans_id, t.`date` as session_date, t.user_info.cust_id as
- cust_id, t.user_info.device as device, repeated_count(t.trans_info.prod_id) as
- prod_count, t.trans_info.purch_flag as purch_flag
- from `clicks/clicks.json` t
- where t.trans_info.purch_flag = 'true' order by prod_count desc;
- +------------+--------------+------------+------------+------------+------------+
- | trans_id | session_date | cust_id | device | prod_count | purch_flag |
- +------------+--------------+------------+------------+------------+------------+
- | 37426 | 2014-04-06 | 18709 | IOS5 | 34 | true |
- | 31589 | 2014-04-16 | 18576 | IOS6 | 31 | true |
- | 11600 | 2014-04-07 | 4260 | AOS4.2 | 28 | true |
- | 35074 | 2014-04-03 | 16697 | AOS4.3 | 27 | true |
- | 17192 | 2014-04-22 | 2501 | AOS4.2 | 26 | true |
- ...
-
-This query uses a Drill SQL extension, the `repeated_count` function, to get an
-aggregated count of the array values. The query returns the number of products
-searched for in each session that converted into a purchase and ranks the counts
-in descending order. Only clicks that resulted in a purchase are counted.
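Conceptually, `repeated_count` reduces to taking the length of each row's array. The Python below mimics the count-filter-rank pipeline on hypothetical rows (the values are illustrative, not taken from the data set):

```python
# Hypothetical flattened rows; prod_count mimics repeated_count(prod_id).
rows = [
    {"trans_id": 37426, "prod_id": list(range(34)), "purch_flag": "true"},
    {"trans_id": 31589, "prod_id": list(range(31)), "purch_flag": "true"},
    {"trans_id": 99999, "prod_id": [1, 2, 3], "purch_flag": "false"},
]

# Keep only purchases, count the array values, rank descending.
ranked = sorted(
    ({"trans_id": r["trans_id"], "prod_count": len(r["prod_id"])}
     for r in rows if r["purch_flag"] == "true"),
    key=lambda r: r["prod_count"],
    reverse=True,
)
print(ranked)  # top row: trans_id 37426 with prod_count 34
```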
-
-## Store a Result Set in a Table for Reuse and Analysis
-
-Finally, run another correlated subquery that returns a fairly large result
-set. To facilitate additional analysis on this result set, you can easily and
-quickly create a Drill table from the results of the query.
-
-### Continue to use the dfs.clicks workspace
-
- 0: jdbc:drill:> use dfs.clicks;
- +------------+------------+
- | ok | summary |
- +------------+------------+
- | true | Default schema changed to 'dfs.clicks' |
- +------------+------------+
-
-### Return product searches for high-value customers:
-
- 0: jdbc:drill:> select o.cust_id, o.order_total, t.trans_info.prod_id[0] as prod_id
- from hive.orders as o, `clicks/clicks.json` t
- where o.cust_id=t.user_info.cust_id
- and o.order_total > (select avg(inord.order_total)
- from hive.orders inord where inord.state = o.state);
- +------------+-------------+------------+
- | cust_id | order_total | prod_id |
- +------------+-------------+------------+
- ...
- | 9650 | 69 | 16 |
- | 9650 | 69 | 560 |
- | 9650 | 69 | 959 |
- | 9654 | 76 | 768 |
- | 9656 | 76 | 32 |
- | 9656 | 76 | 16 |
- ...
- +------------+-------------+------------+
- 106,281 rows selected
-
-This query returns the products searched for by customers whose order totals
-are above the average for their states.
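The correlated subquery's logic breaks into two steps: compute each state's average order total, then keep only orders that beat the average for their own state. A minimal Python sketch with hypothetical orders standing in for `hive.orders`:

```python
from collections import defaultdict

# Hypothetical (cust_id, state, order_total) rows standing in for hive.orders.
orders = [
    (9650, "tx", 69), (9654, "tx", 76), (9001, "tx", 20),
    (9700, "ca", 120), (9701, "ca", 40),
]

# Step 1: the correlated subquery -- average order total per state.
sums, counts = defaultdict(float), defaultdict(int)
for _, state, total in orders:
    sums[state] += total
    counts[state] += 1
avg = {state: sums[state] / counts[state] for state in sums}

# Step 2: keep customers whose order total beats their state's average.
high_value = [cust for cust, state, total in orders if total > avg[state]]
print(high_value)  # [9650, 9654, 9700]
```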
-
-### Materialize the result of the previous query:
-
- 0: jdbc:drill:> create table product_search as select o.cust_id, o.order_total, t.trans_info.prod_id[0] as prod_id
- from hive.orders as o, `clicks/clicks.json` t
- where o.cust_id=t.user_info.cust_id and o.order_total > (select avg(inord.order_total)
- from hive.orders inord where inord.state = o.state);
- +------------+---------------------------+
- | Fragment | Number of records written |
- +------------+---------------------------+
- | 0_0 | 106281 |
- +------------+---------------------------+
- 1 row selected
-
-This example uses a CTAS statement to create a table based on a correlated
-subquery that you ran previously. This table contains all of the rows that the
-query returns (106,281) and stores them in the format specified by the storage
-plugin (Parquet format in this example). You can create tables that store data
-in CSV, Parquet, and JSON formats.
-
-### Query the new table to verify the row count:
-
-This example simply checks that the CTAS statement worked by verifying the
-number of rows in the table.
-
- 0: jdbc:drill:> select count(*) from product_search;
- +------------+
- | EXPR$0 |
- +------------+
- | 106281 |
- +------------+
- 1 row selected
-
-### Find the storage file for the table:
-
- [root@maprdemo product_search]# cd /mapr/demo.mapr.com/data/nested/product_search
- [root@maprdemo product_search]# ls -la
- total 451
- drwxr-xr-x. 2 mapr mapr 1 Sep 15 13:41 .
- drwxr-xr-x. 4 root root 2 Sep 15 13:41 ..
- -rwxr-xr-x. 1 mapr mapr 460715 Sep 15 13:41 0_0_0.parquet
-
-Note that the table is stored in a file called `0_0_0.parquet`. This file is
-stored in the location defined by the dfs.clicks workspace:
-
- "location": "/mapr/demo.mapr.com/data/nested"
-
-with a subdirectory that has the same name as the table you created.
-
-## What's Next
-
-Complete the tutorial with the [Summary](/confluence/display/DRILL/Summary).
-
-
-
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/tutorial/006-summary.md
----------------------------------------------------------------------
diff --git a/_docs/drill-docs/tutorial/006-summary.md b/_docs/drill-docs/tutorial/006-summary.md
deleted file mode 100644
index f210766..0000000
--- a/_docs/drill-docs/tutorial/006-summary.md
+++ /dev/null
@@ -1,14 +0,0 @@
----
-title: "Summary"
-parent: "Apache Drill Tutorial"
----
-This tutorial introduced Apache Drill and its ability to run ANSI SQL queries
-against various data sources, including Hive tables, MapR-DB/HBase tables, and
-file system directories. The tutorial also showed how to work with and
-manipulate complex and multi-structured data commonly found in Hadoop/NoSQL
-systems.
-
-Now that you are familiar with different ways to access the sample data with
-Drill, you can try writing your own queries against your own data sources.
-Refer to the [Apache Drill documentation](https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+Wiki)
-for more information.
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/tutorial/install-sandbox/001-install-mapr-vm.md
----------------------------------------------------------------------
diff --git a/_docs/drill-docs/tutorial/install-sandbox/001-install-mapr-vm.md b/_docs/drill-docs/tutorial/install-sandbox/001-install-mapr-vm.md
deleted file mode 100644
index f3d8953..0000000
--- a/_docs/drill-docs/tutorial/install-sandbox/001-install-mapr-vm.md
+++ /dev/null
@@ -1,55 +0,0 @@
----
-title: "Installing the MapR Sandbox with Apache Drill on VMware Player/VMware Fusion"
-parent: "Installing the Apache Drill Sandbox"
----
-Complete the following steps to install the MapR Sandbox with Apache Drill on
-VMware Player or VMware Fusion:
-
- 1. Download the MapR Sandbox with Drill file to a directory on your machine:
-<https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill>
-
- 2. Open the virtual machine player, and select the **Open a Virtual Machine** option.
-
- Tip for VMware Fusion
-
-If you are running VMware Fusion, select **Import**.
-
-![](../../../img/vmWelcome.png)
-
- 3. Navigate to the directory where you downloaded the MapR Sandbox with Apache Drill file, and select `MapR-Sandbox-For-Apache-Drill-4.0.1_VM.ova`.
-
-![](../../../img/vmShare.png)
-
-The Import Virtual Machine dialog appears.
-
- 4. Click **Import**. The virtual machine player imports the sandbox.
-
-![](../../../img/vmLibrary.png)
-
- 5. Select `MapR-Sandbox-For-Apache-Drill-4.0.1_VM`, and click **Play virtual machine**. It takes a few minutes for the MapR services to start.
-After the MapR services start and installation completes, the following screen
-appears:
-
-![](../../../img/loginSandbox.png)
-
-Note the URL provided on the screen, which corresponds to the Apache Drill
-Web UI.
-
- 6. Verify that a DNS entry was created on the host machine for the virtual machine. If not, create the entry.
-
- * For Linux and Mac, create the entry in `/etc/hosts`.
-
- * For Windows, create the entry in the `%WINDIR%\system32\drivers\etc\hosts` file.
-
- Example: `127.0.1.1 <vm_hostname>`
-
- 7. You can navigate to the provided URL to experience the Drill Web UI, or you can log in to the sandbox through the command line.
-
- a. To navigate to the MapR Sandbox with Apache Drill, enter the provided URL in your browser's address bar.
-
- b. To log in to the virtual machine and access the command line, press Alt+F2 on Windows or Option+F5 on Mac. When prompted, enter `mapr` as the login and password.
-
-# What's Next
-
-After downloading and installing the sandbox, continue with the tutorial by
-[Getting to Know the Drill
-Setup](/confluence/display/DRILL/Getting+to+Know+the+Drill+Setup).
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/tutorial/install-sandbox/002-install-mapr-vb.md
----------------------------------------------------------------------
diff --git a/_docs/drill-docs/tutorial/install-sandbox/002-install-mapr-vb.md b/_docs/drill-docs/tutorial/install-sandbox/002-install-mapr-vb.md
deleted file mode 100644
index 9ff26d5..0000000
--- a/_docs/drill-docs/tutorial/install-sandbox/002-install-mapr-vb.md
+++ /dev/null
@@ -1,72 +0,0 @@
----
-title: "Installing the MapR Sandbox with Apache Drill on VirtualBox"
-parent: "Installing the Apache Drill Sandbox"
----
-The MapR Sandbox for Apache Drill on VirtualBox comes with NAT port forwarding
-enabled, which allows you to access the sandbox using `localhost` as the hostname.
-
-Complete the following steps to install the MapR Sandbox with Apache Drill on
-VirtualBox:
-
- 1. Download the MapR Sandbox with Apache Drill file to a directory on your machine:
-<https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill>
-
- 2. Open the virtual machine player.
-
- 3. Select **File > Import Appliance**. The Import Virtual Appliance dialog appears.
-
- ![](../../../img/vbImport.png)
-
- 4. Navigate to the directory where you downloaded the MapR Sandbox with Apache Drill and click **Next**. The Appliance Settings window appears.
-
- ![](../../../img/vbapplSettings.png)
-
- 5. Select the check box at the bottom of the screen: **Reinitialize the MAC address of all network cards**, then click **Import**. The Import Appliance imports the sandbox.
-
- 6. When the import completes, select **File > Preferences**. The VirtualBox - Settings dialog appears.
-
- ![](../../../img/vbNetwork.png)
-
- 7. Select **Network**.
-
- The correct setting depends on your network connectivity when you run the
-sandbox. In general, if you are going to use a wired Ethernet connection,
-select **NAT Networks** and **vboxnet0**. If you are going to use a wireless
-network, select **Host-only Networks** and the **VirtualBox Host-Only Ethernet
-Adapter**. If no adapters appear, click the green **+** button to add the
-VirtualBox adapter.
-
- ![](../../../img/vbMaprSetting.png)
-
- 8. Click **OK** to continue.
-
- 9. Click the **Settings** icon. The MapR-Sandbox-For-Apache-Drill-0.6.0-r2-4.0.1 - Settings dialog appears.
-
- ![](../../../img/vbGenSettings.png)
-
- 10. Click **OK** to continue.
-
- 11. Click **Start**. It takes a few minutes for the MapR services to start. After the MapR services start and installation completes, the following screen appears:
-
- ![](../../../img/vbloginSandbox.png)
-
- 12. The client must be able to resolve the actual hostname of the Drill node(s) to the corresponding IP address(es). Verify that a DNS entry was created on the client machine for the Drill node(s). If a DNS entry does not exist, create the entry for the Drill node(s):
-
- * For Windows, create the entry in the `%WINDIR%\system32\drivers\etc\hosts` file.
-
- * For Linux and Mac, create the entry in `/etc/hosts`.
-
- Example: `<drill-machine-IP> <drill-machine-hostname>`, such as `127.0.1.1 maprdemo`
-
- 13. You can navigate to the URL provided or to [localhost:8047](http://localhost:8047) to experience the Drill Web UI, or you can log into the sandbox through the command line.
-
- a. To navigate to the MapR Sandbox with Apache Drill, enter the provided URL in your browser's address bar.
-
- b. To log in to the virtual machine and access the command line, press Alt+F2 on Windows or Option+F5 on Mac. When prompted, enter `mapr` as the login and password.
-
-# What's Next
-
-After downloading and installing the sandbox, continue with the tutorial by
-[Getting to Know the Drill
-Setup](/confluence/display/DRILL/Getting+to+Know+the+Drill+Setup).
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/img/58.png
----------------------------------------------------------------------
diff --git a/_docs/img/58.png b/_docs/img/58.png
new file mode 100644
index 0000000..b957927
Binary files /dev/null and b/_docs/img/58.png differ
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/img/BI_to_Drill_2.png
----------------------------------------------------------------------
diff --git a/_docs/img/BI_to_Drill_2.png b/_docs/img/BI_to_Drill_2.png
new file mode 100644
index 0000000..a7f32cd
Binary files /dev/null and b/_docs/img/BI_to_Drill_2.png differ
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/img/HbaseViewCreation0.png
----------------------------------------------------------------------
diff --git a/_docs/img/HbaseViewCreation0.png b/_docs/img/HbaseViewCreation0.png
new file mode 100644
index 0000000..0ae4465
Binary files /dev/null and b/_docs/img/HbaseViewCreation0.png differ
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/img/HbaseViewDSN.png
----------------------------------------------------------------------
diff --git a/_docs/img/HbaseViewDSN.png b/_docs/img/HbaseViewDSN.png
new file mode 100644
index 0000000..988e48b
Binary files /dev/null and b/_docs/img/HbaseViewDSN.png differ
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/img/Hbase_Browse.png
----------------------------------------------------------------------
diff --git a/_docs/img/Hbase_Browse.png b/_docs/img/Hbase_Browse.png
new file mode 100644
index 0000000..729e0f8
Binary files /dev/null and b/_docs/img/Hbase_Browse.png differ
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/img/Hive_DSN.png
----------------------------------------------------------------------
diff --git a/_docs/img/Hive_DSN.png b/_docs/img/Hive_DSN.png
new file mode 100644
index 0000000..be49d00
Binary files /dev/null and b/_docs/img/Hive_DSN.png differ
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/img/ODBC_CustomSQL.png
----------------------------------------------------------------------
diff --git a/_docs/img/ODBC_CustomSQL.png b/_docs/img/ODBC_CustomSQL.png
new file mode 100644
index 0000000..d2c7fb2
Binary files /dev/null and b/_docs/img/ODBC_CustomSQL.png differ
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/img/ODBC_HbasePreview2.png
----------------------------------------------------------------------
diff --git a/_docs/img/ODBC_HbasePreview2.png b/_docs/img/ODBC_HbasePreview2.png
new file mode 100644
index 0000000..948f268
Binary files /dev/null and b/_docs/img/ODBC_HbasePreview2.png differ
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/img/ODBC_HbaseView.png
----------------------------------------------------------------------
diff --git a/_docs/img/ODBC_HbaseView.png b/_docs/img/ODBC_HbaseView.png
new file mode 100644
index 0000000..be3bf4f
Binary files /dev/null and b/_docs/img/ODBC_HbaseView.png differ
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/img/ODBC_HiveConnection.png
----------------------------------------------------------------------
diff --git a/_docs/img/ODBC_HiveConnection.png b/_docs/img/ODBC_HiveConnection.png
new file mode 100644
index 0000000..a86d960
Binary files /dev/null and b/_docs/img/ODBC_HiveConnection.png differ
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/img/ODBC_to_Drillbit.png
----------------------------------------------------------------------
diff --git a/_docs/img/ODBC_to_Drillbit.png b/_docs/img/ODBC_to_Drillbit.png
new file mode 100644
index 0000000..7197d09
Binary files /dev/null and b/_docs/img/ODBC_to_Drillbit.png differ
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/img/ODBC_to_Quorum.png
----------------------------------------------------------------------
diff --git a/_docs/img/ODBC_to_Quorum.png b/_docs/img/ODBC_to_Quorum.png
new file mode 100644
index 0000000..bd77a28
Binary files /dev/null and b/_docs/img/ODBC_to_Quorum.png differ
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/img/Parquet_DSN.png
----------------------------------------------------------------------
diff --git a/_docs/img/Parquet_DSN.png b/_docs/img/Parquet_DSN.png
new file mode 100644
index 0000000..a76eb4e
Binary files /dev/null and b/_docs/img/Parquet_DSN.png differ
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/img/Parquet_Preview.png
----------------------------------------------------------------------
diff --git a/_docs/img/Parquet_Preview.png b/_docs/img/Parquet_Preview.png
new file mode 100644
index 0000000..121dff5
Binary files /dev/null and b/_docs/img/Parquet_Preview.png differ
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/img/RegionParquet_table.png
----------------------------------------------------------------------
diff --git a/_docs/img/RegionParquet_table.png b/_docs/img/RegionParquet_table.png
new file mode 100644
index 0000000..db914bb
Binary files /dev/null and b/_docs/img/RegionParquet_table.png differ
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/img/SelectHbaseView.png
----------------------------------------------------------------------
diff --git a/_docs/img/SelectHbaseView.png b/_docs/img/SelectHbaseView.png
new file mode 100644
index 0000000..a37b30e
Binary files /dev/null and b/_docs/img/SelectHbaseView.png differ
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/img/Untitled.png
----------------------------------------------------------------------
diff --git a/_docs/img/Untitled.png b/_docs/img/Untitled.png
new file mode 100644
index 0000000..7fea1e8
Binary files /dev/null and b/_docs/img/Untitled.png differ
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/img/VoterContributions_hbaseview.png
----------------------------------------------------------------------
diff --git a/_docs/img/VoterContributions_hbaseview.png b/_docs/img/VoterContributions_hbaseview.png
new file mode 100644
index 0000000..2c37df9
Binary files /dev/null and b/_docs/img/VoterContributions_hbaseview.png differ
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/img/ngram_plugin.png
----------------------------------------------------------------------
diff --git a/_docs/img/ngram_plugin.png b/_docs/img/ngram_plugin.png
new file mode 100644
index 0000000..c47148c
Binary files /dev/null and b/_docs/img/ngram_plugin.png differ