Posted to commits@drill.apache.org by br...@apache.org on 2015/03/17 22:02:44 UTC

[03/12] drill git commit: DRILL-2316: Add hive, parquet, json ref docs, basics tutorial, and minor edits

DRILL-2316: Add hive, parquet, json ref docs, basics tutorial, and minor edits


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/2a34ac89
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/2a34ac89
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/2a34ac89

Branch: refs/heads/gh-pages
Commit: 2a34ac8931326f30b34986868f4c4e5ad61fec59
Parents: d959a21
Author: Kristine Hahn <kh...@maprtech.com>
Authored: Wed Feb 25 18:31:56 2015 -0800
Committer: Bridget Bevens <bb...@maprtech.com>
Committed: Mon Mar 2 14:18:23 2015 -0800

----------------------------------------------------------------------
 _docs/009-datasources.md                        |  27 ++
 _docs/010-dev-custom-func.md                    |  37 ++
 _docs/011-manage.md                             |  14 +
 _docs/012-develop.md                            |   9 +
 _docs/013-rn.md                                 | 191 ++++++++
 _docs/014-contribute.md                         |   9 +
 _docs/015-sample-ds.md                          |  10 +
 _docs/016-design.md                             |  13 +
 _docs/018-progress.md                           |   8 +
 _docs/019-bylaws.md                             | 170 ++++++++
 _docs/connect/005-reg-hive.md                   |   7 +-
 _docs/connect/007-mongo-plugin.md               |   6 +-
 _docs/data-sources/001-hive-types.md            | 188 ++++++++
 _docs/data-sources/002-hive-udf.md              |  39 ++
 _docs/data-sources/003-parquet-ref.md           | 287 ++++++++++++
 _docs/data-sources/004-json-ref.md              | 432 +++++++++++++++++++
 _docs/dev-custom-fcn/002-dev-aggregate.md       |   2 +-
 _docs/img/Untitled.png                          | Bin 39796 -> 0 bytes
 _docs/img/json-workaround.png                   | Bin 0 -> 20786 bytes
 _docs/install/001-drill-in-10.md                |   2 +-
 _docs/interfaces/001-odbc-win.md                |   3 +-
 .../interfaces/odbc-win/003-connect-odbc-win.md |   2 +-
 .../interfaces/odbc-win/004-tableau-examples.md |   6 +-
 _docs/manage/002-start-stop.md                  |   2 +-
 _docs/manage/003-ports.md                       |   2 +-
 _docs/manage/conf/002-startup-opt.md            |   2 +-
 _docs/manage/conf/003-plan-exec.md              |   3 +-
 _docs/manage/conf/004-persist-conf.md           |   2 +-
 _docs/query/001-get-started.md                  |  75 ++++
 _docs/query/001-query-fs.md                     |  35 --
 _docs/query/002-query-fs.md                     |  35 ++
 _docs/query/002-query-hbase.md                  | 151 -------
 _docs/query/003-query-complex.md                |  56 ---
 _docs/query/003-query-hbase.md                  | 151 +++++++
 _docs/query/004-query-complex.md                |  56 +++
 _docs/query/004-query-hive.md                   |  45 --
 _docs/query/005-query-hive.md                   |  45 ++
 _docs/query/005-query-info-skema.md             | 109 -----
 _docs/query/006-query-info-skema.md             | 109 +++++
 _docs/query/006-query-sys-tbl.md                | 159 -------
 _docs/query/007-query-sys-tbl.md                | 159 +++++++
 _docs/query/get-started/001-lesson1-connect.md  |  88 ++++
 _docs/query/get-started/002-lesson2-download.md | 103 +++++
 _docs/query/get-started/003-lesson3-plugin.md   | 142 ++++++
 _docs/sql-ref/003-functions.md                  |  19 +-
 _docs/sql-ref/005-cmd-summary.md                |   2 +-
 _docs/sql-ref/006-reserved-wds.md               |   2 +-
 _docs/sql-ref/data-types/001-date.md            |   4 +-
 _docs/tutorial/005-lesson3.md                   |   2 +-
 49 files changed, 2433 insertions(+), 587 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/009-datasources.md
----------------------------------------------------------------------
diff --git a/_docs/009-datasources.md b/_docs/009-datasources.md
new file mode 100644
index 0000000..3f3d431
--- /dev/null
+++ b/_docs/009-datasources.md
@@ -0,0 +1,27 @@
+---
+title: "Data Sources and File Formats"
+---
+Drill supports many data sources, including these key ones:
+
+* HBase
+* Hive
+* MapR-DB
+* File system
+
+. . .
+
+Drill supports the following input formats for data:
+
+* CSV (Comma-Separated Values)
+* TSV (Tab-Separated Values)
+* PSV (Pipe-Separated Values)
+* Parquet
+* JSON
+
+You set the input format for data coming into Drill in the workspace portion of the [storage plugin](/drill/docs/storage-plugin-registration) definition. The default input format in Drill is Parquet. 
+
+You change the `store.format` option in the [sys.options table](/drill/docs/planning-and-execution-options) to set the output format of Drill data. The default storage format for Drill CREATE TABLE AS (CTAS) statements is Parquet.
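+
+For example, you can inspect the current setting by querying the sys.options table and change it for your session with ALTER (a sketch; 'json' is one supported output format):
+
+    SELECT * FROM sys.options WHERE name = 'store.format';
+    ALTER SESSION SET `store.format` = 'json';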
+
+
+ 
+

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/010-dev-custom-func.md
----------------------------------------------------------------------
diff --git a/_docs/010-dev-custom-func.md b/_docs/010-dev-custom-func.md
new file mode 100644
index 0000000..f8a6445
--- /dev/null
+++ b/_docs/010-dev-custom-func.md
@@ -0,0 +1,37 @@
+---
+title: "Develop Custom Functions"
+---
+
+Drill provides a high performance Java API with interfaces that you can
+implement to develop simple and aggregate custom functions. Custom functions
+are reusable SQL functions that you develop in Java to encapsulate code that
+processes column values during a query. Custom functions can perform
+calculations and transformations that built-in SQL operators and functions do
+not provide. Custom functions are called from within a SQL statement, like a
+regular function, and return a single value.
+
+## Simple Function
+
+A simple function operates on a single row and produces a single row as the
+output. When you include a simple function in a query, the function is called
+once for each row in the result set. Mathematical and string functions are
+examples of simple functions.
+
+## Aggregate Function
+
+Aggregate functions differ from simple functions in the number of rows that
+they accept as input. An aggregate function operates on multiple input rows
+and produces a single row as output. The COUNT(), MAX(), SUM(), and AVG()
+functions are examples of aggregate functions. You can use an aggregate
+function in a query with a GROUP BY clause to produce a result set with a
+separate aggregate value for each combination of values from the GROUP BY
+clause.
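+
+For example, a query that uses the built-in AVG() function with GROUP BY returns one aggregate row per group (the table and column names here are only illustrative):
+
+    SELECT state, AVG(amount) FROM sales GROUP BY state;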
+
+## Process
+
+To develop custom functions that you can use in your Drill queries, you must
+complete the following tasks:
+
+  1. Create a Java program that implements Drill’s simple or aggregate interface, and compile it into a sources JAR file and a classes JAR file.
+  2. Add the sources and classes JAR files to Drill’s classpath.
+  3. Add the name of the package that contains the classes to Drill’s main configuration file, drill-override.conf. 
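+
+Once registered, you call a custom function in a query like a built-in function. For example, a hypothetical simple function named myupper might be called as follows:
+
+    SELECT myupper(name) FROM employees;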

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/011-manage.md
----------------------------------------------------------------------
diff --git a/_docs/011-manage.md b/_docs/011-manage.md
new file mode 100644
index 0000000..ec6663b
--- /dev/null
+++ b/_docs/011-manage.md
@@ -0,0 +1,14 @@
+---
+title: "Manage Drill"
+---
+When using Drill, you may need to stop and restart a Drillbit on a node, or
+modify various options. For example, the default storage format for CTAS
+statements is Parquet. You can modify the default setting so that output data
+is stored in CSV or JSON format.
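+
+For example, one command changes the CTAS output format for the current session (a sketch using the `store.format` option described in the storage format documentation):
+
+    ALTER SESSION SET `store.format` = 'csv';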
+
+You can use certain SQL commands to manage Drill from within the Drill shell
+(SQLLine). You can also modify Drill configuration options, such as memory
+allocation, in Drill's configuration files.
+
+  
+

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/012-develop.md
----------------------------------------------------------------------
diff --git a/_docs/012-develop.md b/_docs/012-develop.md
new file mode 100644
index 0000000..2b9ce67
--- /dev/null
+++ b/_docs/012-develop.md
@@ -0,0 +1,9 @@
+---
+title: "Develop Drill"
+---
+To develop Drill, you compile Drill from source code and then set up a project
+in Eclipse for use as your development environment. To review or contribute to
+Drill code, you must complete the steps required to install and use the Drill
+patch review tool.
+
+

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/013-rn.md
----------------------------------------------------------------------
diff --git a/_docs/013-rn.md b/_docs/013-rn.md
new file mode 100644
index 0000000..f369335
--- /dev/null
+++ b/_docs/013-rn.md
@@ -0,0 +1,191 @@
+---
+title: "Release Notes"
+---
+## Apache Drill 0.7.0 Release Notes
+
+Apache Drill 0.7.0, the third beta release for Drill, is designed to help
+enthusiasts start working and experimenting with Drill. It also continues the
+Drill monthly release cycle as we drive towards general availability.
+
+This release is available as
+[binary](http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-drill-0.7.0.tar.gz) and
+[source](http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-drill-0.7.0-src.tar.gz)
+tarballs that are compiled against Apache Hadoop.
+Drill has been tested against MapR, Cloudera, and Hortonworks Hadoop
+distributions. There are associated build profiles and JIRAs that can help you
+run Drill against your preferred distribution.
+
+### Apache Drill 0.7.0 Key Features
+
+  * No more dependency on UDP/Multicast - Making it possible for Drill to work well in the following scenarios:
+
+    * UDP multicast not enabled (as in EC2)
+
+    * Cluster spans multiple subnets
+
+    * Cluster has a multihomed configuration
+
+  * New functions to natively work with nested data - KVGen and Flatten 
+
+  * Support for Hive 0.13 (Hive 0.12 with Drill is no longer supported) 
+
+  * Improved performance when querying Hive tables and File system through partition pruning
+
+  * Improved performance for HBase with LIKE operator pushdown
+
+  * Improved memory management
+
+  * Drill web UI monitoring and query profile improvements
+
+  * Ability to parse files without explicit extensions using default storage format specification
+
+  * Fixes for dealing with complex/nested data objects in Parquet/JSON
+
+  * Fast schema return - Improved experience working with BI/query tools by returning metadata quickly
+
+  * Several hang related fixes
+
+  * Parquet writer fixes for handling large datasets
+
+  * Stability improvements in ODBC and JDBC drivers
+
+### Apache Drill 0.7.0 Key Notes and Limitations
+
+  * The current release supports in-memory and beyond-memory execution. However, you must disable memory-intensive hash aggregate and hash join operations to leverage this functionality.
+  * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior, such as Sort. Other operations, such as streaming aggregate, may have partial support that leads to unexpected results.
+
+## Apache Drill 0.6.0 Release Notes
+
+Apache Drill 0.6.0, the second beta release for Drill, is designed to help
+enthusiasts start working and experimenting with Drill. It also continues the
+Drill monthly release cycle as we drive towards general availability.
+
+This release is available as
+[binary](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.6.0-incubating/apache-drill-0.6.0-incubating.tar.gz) and
+[source](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.6.0-incubating/apache-drill-0.6.0-incubating-src.tar.gz)
+tarballs that are compiled against Apache Hadoop. Drill has been tested
+against MapR, Cloudera, and Hortonworks Hadoop distributions. There are
+associated build profiles and JIRAs that can help you run Drill against your
+preferred distribution.
+
+### Apache Drill 0.6.0 Key Features
+
+This release is primarily a bug fix release, with [more than 30 JIRAs closed](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12327472), but there are some notable features:
+
+  * Direct ANSI SQL access to MongoDB, using the latest [MongoDB Plugin for Apache Drill](/drill/docs/mongodb-plugin-for-apache-drill)
+  * Filesystem query performance improvements with partition pruning
+  * Ability to use the file system as a persistent store for query profiles and diagnostic information
+  * Window function support (alpha)
+
+### Apache Drill 0.6.0 Key Notes and Limitations
+
+  * The current release supports in-memory and beyond-memory execution. However, you must disable memory-intensive hash aggregate and hash join operations to leverage this functionality.
+  * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior, such as Sort. Other operations, such as streaming aggregate, may have partial support that leads to unexpected results.
+
+## Apache Drill 0.5.0 Release Notes
+
+Apache Drill 0.5.0, the first beta release for Drill, is designed to help
+enthusiasts start working and experimenting with Drill. It also continues the
+Drill monthly release cycle as we drive towards general availability.
+
+The 0.5.0 release is primarily a bug fix release, with [more than 100 JIRAs](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12324880) closed, but there are some notable features. For information
+about the features, see the [Apache Drill Blog for the 0.5.0
+release](https://blogs.apache.org/drill/entry/apache_drill_beta_release_see).
+
+This release is available as
+[binary](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.5.0-incubating/apache-drill-0.5.0-incubating.tar.gz) and
+[source](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.5.0-incubating/apache-drill-0.5.0-incubating-src.tar.gz)
+tarballs that are compiled against Apache Hadoop. Drill has been tested
+against MapR, Cloudera, and Hortonworks Hadoop distributions. There are
+associated build profiles and JIRAs that can help you run Drill against your
+preferred distribution.
+
+### Apache Drill 0.5.0 Key Notes and Limitations
+
+  * The current release supports in-memory and beyond-memory execution. However, you must disable memory-intensive hash aggregate and hash join operations to leverage this functionality.
+  * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior, such as Sort. Other operations, such as streaming aggregate, may have partial support that leads to unexpected results.
+  * There are known issues with joining text files without using an intervening view. See [DRILL-1401](https://issues.apache.org/jira/browse/DRILL-1401) for more information.
+
+## Apache Drill 0.4.0 Release Notes
+
+The 0.4.0 release is a developer preview release, designed to help enthusiasts
+start to work with and experiment with Drill. It is the first Drill release
+that provides distributed query execution.
+
+This release is built upon [more than 800 JIRAs](https://issues.apache.org/jira/browse/DRILL/fixforversion/12324963/).
+It is a pre-beta release on the way towards a Drill beta. As a developer snapshot,
+the release contains a large number of outstanding bugs that will make some
+use cases challenging. Feel free to consult outstanding issues [targeted for
+the 0.5.0 release](https://issues.apache.org/jira/browse/DRILL/fixforversion/12324880/)
+to see whether your use case is affected.
+
+To read more about this release and new features introduced, please view the
+[0.4.0 announcement blog
+entry](https://blogs.apache.org/drill/entry/announcing_apache_drill_0_4).
+
+The release is available as both
+[binary](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.4.0-incubating/apache-drill-0.4.0-incubating.tar.gz)
+and [source](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.4.0-incubating/apache-drill-0.4.0-incubating-src.tar.gz)
+tarballs. In both cases, these are compiled against Apache Hadoop. Drill has
+also been tested against MapR, Cloudera, and Hortonworks Hadoop distributions,
+and there are associated build profiles or JIRAs that can help you run against
+your preferred distribution.
+
+### Some Key Notes & Limitations
+
+  * The current release supports in-memory and beyond-memory execution. However, users must disable memory-intensive hash aggregate and hash join operations to leverage this functionality.
+  * In many cases, merge join operations return incorrect results.
+  * Use of a local filter in a join “on” clause when using left, right, or full outer joins may result in incorrect results.
+  * Because of known memory leaks and memory overrun issues, you may need more memory and you may need to restart the system in some cases.
+  * Some types of complex expressions, especially those involving empty arrays, may fail or return incorrect results.
+  * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior (such as Sort). Other operations (such as streaming aggregate) may have partial support that leads to unexpected results.
+  * Protobuf, UDF, query plan interfaces, and all other interfaces are subject to change in incompatible ways.
+  * Multiplication of some types of DECIMAL(28+,*) will return incorrect results.
+
+## Apache Drill M1 -- Release Notes (Apache Drill Alpha)
+
+### Milestone 1 Goals
+
+The first release of Apache Drill is designed as a technology preview for
+people to better understand the architecture and vision. It is a functional
+release trying to piece together the key components of a next generation MPP
+query engine. It is designed to allow milestone 2 (M2) to focus on
+architectural analysis and performance optimization.
+
+  * Provide a new optimistic DAG execution engine for data analysis
+  * Build a new columnar shredded in-memory format and execution model that minimizes data serialization/deserialization costs and operator complexity
+  * Provide a model for runtime generated functions and relational operators that minimizes complexity and maximizes performance
+  * Support queries against columnar on disk format (Parquet) and JSON
+  * Support the most common set of standard SQL read-only phrases using ANSI standards. Includes: SELECT, FROM, WHERE, HAVING, ORDER, GROUP BY, IN, DISTINCT, LEFT JOIN, RIGHT JOIN, INNER JOIN
+  * Support schema-on-read querying and execution
+  * Build a set of columnar operation primitives including Merge Join, Sort, Streaming Aggregate, Filter, and Selection Vector removal
+  * Support unlimited level of subqueries and correlated subqueries
+  * Provide an extensible, query-language-agnostic, JSON-based logical data flow syntax
+  * Support complex data type manipulation via logical plan operations
+
+### Known Issues
+
+SQL Parsing  
+Because Apache Drill is built to support late-bound changing schemas while SQL
+is statically typed, there are a couple of special requirements for writing
+SQL queries. These are limited to the current release and will be corrected in
+a future milestone release.
+
+  * All tables are exposed as a single map field that contains
+  * Drill Alpha doesn't support implicit or explicit casts outside those required above.
+  * Drill Alpha does not include, there are currently a couple of differences for how to write a query in In order to query against
+
+### UDFs
+
+  * Drill currently supports simple and aggregate functions using scalar, repeated and
+  * Nested data support is incomplete. Drill Alpha supports nested data structures as well as repeated fields. However,
+

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/014-contribute.md
----------------------------------------------------------------------
diff --git a/_docs/014-contribute.md b/_docs/014-contribute.md
new file mode 100644
index 0000000..33db231
--- /dev/null
+++ b/_docs/014-contribute.md
@@ -0,0 +1,9 @@
+---
+title: "Contribute to Drill"
+---
+The Apache Drill community welcomes your support. Please read [Apache Drill
+Contribution Guidelines](/drill/docs/apache-drill-contribution-guidelines) for information about how to contribute to
+the project. If you would like to contribute to the project and need some
+ideas for what to do, please read [Apache Drill Contribution
+Ideas](/drill/docs/apache-drill-contribution-ideas).
+

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/015-sample-ds.md
----------------------------------------------------------------------
diff --git a/_docs/015-sample-ds.md b/_docs/015-sample-ds.md
new file mode 100644
index 0000000..7212ea0
--- /dev/null
+++ b/_docs/015-sample-ds.md
@@ -0,0 +1,10 @@
+---
+title: "Sample Datasets"
+---
+Use any of the following sample datasets to test Drill:
+
+  * [AOL Search](/drill/docs/aol-search)
+  * [Enron Emails](/drill/docs/enron-emails)
+  * [Wikipedia Edit History](/drill/docs/wikipedia-edit-history)
+
+

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/016-design.md
----------------------------------------------------------------------
diff --git a/_docs/016-design.md b/_docs/016-design.md
new file mode 100644
index 0000000..00b17e5
--- /dev/null
+++ b/_docs/016-design.md
@@ -0,0 +1,13 @@
+---
+title: "Design Docs"
+---
+Review the Apache Drill design docs for early descriptions of Apache Drill
+functionality, terms, and goals, and reference the research articles to learn
+about Apache Drill's history:
+
+  * [Drill Plan Syntax](/drill/docs/drill-plan-syntax)
+  * [RPC Overview](/drill/docs/rpc-overview)
+  * [Query Stages](/drill/docs/query-stages)
+  * [Useful Research](/drill/docs/useful-research)
+  * [Value Vectors](/drill/docs/value-vectors)
+

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/018-progress.md
----------------------------------------------------------------------
diff --git a/_docs/018-progress.md b/_docs/018-progress.md
new file mode 100644
index 0000000..bf19a29
--- /dev/null
+++ b/_docs/018-progress.md
@@ -0,0 +1,8 @@
+---
+title: "Progress Reports"
+---
+Review the following Apache Drill progress reports for a summary of issues,
+progression of the project, summary of mailing list discussions, and events:
+
+  * [2014 Q1 Drill Report](/drill/docs/2014-q1-drill-report)
+

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/019-bylaws.md
----------------------------------------------------------------------
diff --git a/_docs/019-bylaws.md b/_docs/019-bylaws.md
new file mode 100644
index 0000000..2c35042
--- /dev/null
+++ b/_docs/019-bylaws.md
@@ -0,0 +1,170 @@
+---
+title: "Project Bylaws"
+---
+## Introduction
+
+This document defines the bylaws under which the Apache Drill project
+operates. It defines the roles and responsibilities of the project, who may
+vote, how voting works, how conflicts are resolved, etc.
+
+Drill is a project of the [Apache Software
+Foundation](http://www.apache.org/foundation/). The foundation holds the
+copyright on Apache code including the code in the Drill codebase. The
+[foundation FAQ](http://www.apache.org/foundation/faq.html) explains the
+operation and background of the foundation.
+
+Drill is typical of Apache projects in that it operates under a set of
+principles, known collectively as the _Apache Way_. If you are new to Apache
+development, please refer to the [Incubator
+project](http://incubator.apache.org/) for more information on how Apache
+projects operate.
+
+## Roles and Responsibilities
+
+Apache projects define a set of roles with associated rights and
+responsibilities. These roles govern what tasks an individual may perform
+within the project. The roles are defined in the following sections.
+
+### Users
+
+The most important participants in the project are people who use our
+software. The majority of our contributors start out as users and guide their
+development efforts from the user's perspective.
+
+Users contribute to the Apache projects by providing feedback to contributors
+in the form of bug reports and feature suggestions. Users also participate
+in the Apache community by helping other users on mailing lists and user
+support forums.
+
+### Contributors
+
+Contributors are all of the volunteers who contribute time, code, documentation, or
+resources to the Drill Project. A contributor who makes sustained, welcome
+contributions to the project may be invited to become a committer, though the
+exact timing of such invitations depends on many factors.
+
+### Committers
+
+The project's committers are responsible for the project's technical
+management. Committers have access to a specified set of subprojects' code
+repositories. Committers on subprojects may cast binding votes on any
+technical discussion regarding that subproject.
+
+Committer access is by invitation only and must be approved by lazy consensus
+of the active PMC members. A Committer is considered _emeritus_ by his or her
+own declaration or by not contributing in any form to the project for over six
+months. An emeritus committer may request reinstatement of commit access from
+the PMC which will be sufficient to restore him or her to active committer
+status.
+
+Commit access can be revoked by a unanimous vote of all the active PMC members
+(except the committer in question if he or she is also a PMC member).
+
+All Apache committers are required to have a signed [Contributor License
+Agreement (CLA)](http://www.apache.org/licenses/icla.txt) on file with the
+Apache Software Foundation. There is a [Committer
+FAQ](http://www.apache.org/dev/committers.html) which provides more details on
+the requirements for committers.
+
+A committer who makes a sustained contribution to the project may be invited
+to become a member of the PMC. The form of contribution is not limited to
+code. It can also include code review, helping out users on the mailing lists,
+documentation, etc.
+
+### Project Management Committee
+
+The PMC is responsible to the board and the ASF for the management and
+oversight of the Apache Drill codebase. The responsibilities of the PMC
+include
+
+  * Deciding what is distributed as products of the Apache Drill project. In particular all releases must be approved by the PMC.
+  * Maintaining the project's shared resources, including the codebase repository, mailing lists, and websites.
+  * Speaking on behalf of the project.
+  * Resolving license disputes regarding products of the project.
+  * Nominating new PMC members and committers.
+  * Maintaining these bylaws and other guidelines of the project.
+
+Membership of the PMC is by invitation only and must be approved by a lazy
+consensus of active PMC members. A PMC member is considered _emeritus_ by his
+or her own declaration or by not contributing in any form to the project for
+over six months. An emeritus member may request reinstatement to the PMC,
+which will be sufficient to restore him or her to active PMC membership.
+
+Membership of the PMC can be revoked by a unanimous vote of all the active
+PMC members other than the member in question.
+
+The chair of the PMC is appointed by the ASF board. The chair is an office
+holder of the Apache Software Foundation (Vice President, Apache Drill) and
+has primary responsibility to the board for the management of the projects
+within the scope of the Drill PMC. The chair reports to the board quarterly on
+developments within the Drill project.
+
+The term of the chair is one year. When the current chair's term is up or if
+the chair resigns before the end of his or her term, the PMC votes to
+recommend a new chair using lazy consensus, but the decision must be ratified
+by the Apache board.
+
+## Decision Making
+
+Within the Drill project, different types of decisions require different forms
+of approval. For example, the previous section describes several decisions
+which require 'lazy consensus' approval. This section defines how voting is
+performed, the types of approvals, and which types of decision require which
+type of approval.
+
+### Voting
+
+Decisions regarding the project are made by votes on the primary project
+development mailing list
+_[dev@drill.apache.org](mailto:dev@drill.apache.org)_. Where necessary, PMC
+voting may take place on the private Drill PMC mailing list
+[private@drill.apache.org](mailto:private@drill.apache.org). Votes are clearly
+indicated by subject line starting with [VOTE]. Votes may contain multiple
+items for approval and these should be clearly separated. Voting is carried
+out by replying to the vote mail. Votes may take one of four flavors.
+
+ <table ><tbody><tr><td valign="top" >Vote</td><td valign="top" >Meaning</td></tr><tr><td valign="top" >+1</td><td valign="top" >'Yes,' 'Agree,' or 'the action should be performed.' In general, this vote also indicates a willingness on behalf of the voter to help 'make it happen'.</td></tr><tr><td valign="top" >+0</td><td valign="top" >This vote indicates a willingness for the action under consideration to go ahead. The voter, however, will not be able to help.</td></tr><tr><td valign="top" >-0</td><td valign="top" >This vote indicates that the voter does not, in general, agree with the proposed action but is not concerned enough to prevent the action going ahead.</td></tr><tr><td valign="top" >-1</td><td valign="top" >This is a negative vote. On issues where consensus is required, this vote counts as a <strong>veto</strong>. All vetoes must contain an explanation of why the veto is appropriate. Vetoes with no explanation are void. It may also be appropriate for a -1 vote to include an alternative course of action.</td></tr></tbody></table>
+  
+All participants in the Drill project are encouraged to show their agreement
+with or against a particular action by voting. For technical decisions, only
+the votes of active committers are binding. Non binding votes are still useful
+for those with binding votes to understand the perception of an action in the
+wider Drill community. For PMC decisions, only the votes of PMC members are
+binding.
+
+Voting can also be applied to changes already made to the Drill codebase.
+These typically take the form of a veto (-1) in reply to the commit message
+sent when the commit is made. Note that this should be a rare occurrence. All
+efforts should be made to discuss issues when they are still patches before
+the code is committed.
+
+### Approvals
+
+These are the types of approvals that can be sought. Different actions require
+different types of approvals.
+
+<table ><tbody><tr><td valign="top" >Approval Type</td><td valign="top" > </td></tr><tr><td valign="top" >Consensus</td><td valign="top" >For this to pass, all voters with binding votes must vote and there can be no binding vetoes (-1). Consensus votes are rarely required due to the impracticality of getting all eligible voters to cast a vote.</td></tr><tr><td valign="top" >Lazy Consensus</td><td valign="top" >Lazy consensus requires 3 binding +1 votes and no binding vetoes.</td></tr><tr><td valign="top" >Lazy Majority</td><td valign="top" >A lazy majority vote requires 3 binding +1 votes and more binding +1 votes that -1 votes.</td></tr><tr><td valign="top" >Lazy Approval</td><td valign="top" >An action with lazy approval is implicitly allowed unless a -1 vote is received, at which time, depending on the type of action, either lazy majority or lazy consensus approval must be obtained.</td></tr></tbody></table>  
+  
+### Vetoes
+
+A valid, binding veto cannot be overruled. If a veto is cast, it must be
+accompanied by a valid reason explaining the reasons for the veto. The
+validity of a veto, if challenged, can be confirmed by anyone who has a
+binding vote. This does not necessarily signify agreement with the veto -
+merely that the veto is valid.
+
+If you disagree with a valid veto, you must lobby the person casting the veto
+to withdraw his or her veto. If a veto is not withdrawn, the action that has
+been vetoed must be reversed in a timely manner.
+
+### Actions
+
+This section describes the various actions which are undertaken within the
+project, the corresponding approval required for that action and those who
+have binding votes over the action. It also specifies the minimum length of
+time that a vote must remain open, measured in business days. In general votes
+should not be called at times when it is known that interested members of the
+project will be unavailable.
+
+<table ><tbody><tr><td valign="top" >Action</td><td valign="top" >Description</td><td valign="top" >Approval</td><td valign="top" >Binding Votes</td><td valign="top" >Minimum Length</td></tr><tr><td valign="top" >Code Change</td><td valign="top" >A change made to a codebase of the project and committed by a committer. This includes source code, documentation, website content, etc.</td><td valign="top" >Consensus approval of active committers, with a minimum of one +1. The code can be committed after the first +1</td><td valign="top" >Active committers</td><td valign="top" >1</td></tr><tr><td valign="top" >Release Plan</td><td valign="top" >Defines the timetable and actions for a release. The plan also nominates a Release Manager.</td><td valign="top" >Lazy majority</td><td valign="top" >Active committers</td><td valign="top" >3</td></tr><tr><td valign="top" >Product Release</td><td valign="top" >When a release of one of the project's products is ready, a vote is required to accept t
 he release as an official release of the project.</td><td valign="top" >Lazy Majority</td><td valign="top" >Active PMC members</td><td valign="top" >3</td></tr><tr><td valign="top" >Adoption of New Codebase</td><td valign="top" >When the codebase for an existing, released product is to be replaced with an alternative codebase. If such a vote fails to gain approval, the existing code base will continue. This also covers the creation of new sub-projects within the project.</td><td valign="top" >2/3 majority</td><td valign="top" >Active PMC members</td><td valign="top" >6</td></tr><tr><td valign="top" >New Committer</td><td valign="top" >When a new committer is proposed for the project.</td><td valign="top" >Lazy consensus</td><td valign="top" >Active PMC members</td><td valign="top" >3</td></tr><tr><td valign="top" >New PMC Member</td><td valign="top" >When a committer is proposed for the PMC.</td><td valign="top" >Lazy consensus</td><td valign="top" >Active PMC members</td><td valign
 ="top" >3</td></tr><tr><td valign="top" >Committer Removal</td><td valign="top" >When removal of commit privileges is sought. <em>Note: Such actions will also be referred to the ASF board by the PMC chair.</em></td><td valign="top" >Consensus</td><td valign="top" >Active PMC members (excluding the committer in question if a member of the PMC).</td><td valign="top" >6</td></tr><tr><td valign="top" >PMC Member Removal</td><td valign="top" >When removal of a PMC member is sought. <em>Note: Such actions will also be referred to the ASF board by the PMC chair.</em></td><td valign="top" >Consensus</td><td valign="top" >Active PMC members (excluding the member in question).</td><td valign="top" >6</td></tr><tr><td valign="top" >Modifying Bylaws</td><td valign="top" >Modifying this document.</td><td valign="top" >2/3 majority</td><td valign="top" >Active PMC members</td><td valign="top" >6</td></tr></tbody></table>
+

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/connect/005-reg-hive.md
----------------------------------------------------------------------
diff --git a/_docs/connect/005-reg-hive.md b/_docs/connect/005-reg-hive.md
index 564bebc..9b44034 100644
--- a/_docs/connect/005-reg-hive.md
+++ b/_docs/connect/005-reg-hive.md
@@ -8,7 +8,7 @@ storage plugin instance for a Hive data source, provide a unique name for the
 instance, and identify the type as “`hive`”. You must also provide the
 metastore connection information.
 
-Currently, Drill only works with Hive version 0.12. To access Hive tables
+Currently, Drill supports Hive version 0.13. To access Hive tables
 using custom SerDes or InputFormat/OutputFormat, all nodes running Drillbits
 must have the SerDes or InputFormat/OutputFormat `JAR` files in the
 `<drill_installation_directory>/jars/3rdparty` folder.
@@ -44,6 +44,8 @@ To register a remote Hive metastore with Drill, complete the following steps:
         }       
   5. Click **Enable**.
   6. Verify that `HADOOP_CLASSPATH` is set in `drill-env.sh`. If you need to set the classpath, add the following line to `drill-env.sh`.
+  
+        export HADOOP_CLASSPATH=/<directory path>/hadoop/hadoop-0.20.2
 
 Once you have configured a storage plugin instance for a Hive data source, you
 can [query Hive tables](/drill/docs/querying-hive/).
@@ -80,4 +82,5 @@ steps:
   4. Click** Enable.**
   5. Verify that `HADOOP_CLASSPATH` is set in `drill-env.sh`. If you need to set the classpath, add the following line to `drill-env.sh`.
   
-        export HADOOP_CLASSPATH=/<directory path>/hadoop/hadoop-0.20.2
\ No newline at end of file
+        export HADOOP_CLASSPATH=/<directory path>/hadoop/hadoop-0.20.2
+ 
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/connect/007-mongo-plugin.md
----------------------------------------------------------------------
diff --git a/_docs/connect/007-mongo-plugin.md b/_docs/connect/007-mongo-plugin.md
index fd5dba8..3ec4fdf 100644
--- a/_docs/connect/007-mongo-plugin.md
+++ b/_docs/connect/007-mongo-plugin.md
@@ -15,7 +15,7 @@ on the data using ANSI SQL.
 
 This tutorial assumes that you have Drill installed locally (embedded mode),
 as well as MongoDB. Examples in this tutorial use zip code aggregation data
-provided by MongoDB. Before You Begin provides links to download tools and data
+provided by MongoDB. Before You Begin provides links to download tools and data
 used throughout the tutorial.
 
 **Note:** A local instance of Drill is used in this tutorial for simplicity. You can also run Drill and MongoDB together in distributed mode.
@@ -86,8 +86,8 @@ the `USE` command to change schema.
 ### Example Queries
 
 The following example queries are included for reference. However, you can use
-the SQL power of Apache Drill directly on MongoDB. For more information about,
-refer to the [SQL
+the SQL power of Apache Drill directly on MongoDB. For more information,
+refer to the [Apache Drill SQL
 Reference](/drill/docs/sql-reference).
 
 **Example 1: View mongo.zipdb Dataset**

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/data-sources/001-hive-types.md
----------------------------------------------------------------------
diff --git a/_docs/data-sources/001-hive-types.md b/_docs/data-sources/001-hive-types.md
new file mode 100644
index 0000000..34d5bb6
--- /dev/null
+++ b/_docs/data-sources/001-hive-types.md
@@ -0,0 +1,188 @@
+title: "Hive-to-Drill Data Type Mapping"
+parent: "Data Sources"
+---
+Using Drill, you can read tables created in Hive that use data types compatible with Drill. Drill currently does not support writing Hive tables. The following table shows Drill support for Hive primitive types:
+<table>
+  <tr>
+    <th>SQL Type</th>
+    <th>Hive Type</th>
+    <th>Drill Description</th>
+  </tr>
+  <tr>
+    <td>BIGINT</td>
+    <td>BIGINT</td>
+    <td>8-byte signed integer</td>
+  </tr>
+  <tr>
+    <td>BOOLEAN</td>
+    <td>BOOLEAN</td>
+    <td>TRUE (1) or FALSE (0)</td>
+  </tr>
+  <tr>
+    <td>N/A</td>
+    <td>CHAR</td>
+    <td>Same as VARCHAR but with a fixed length of at most 255</td>
+  </tr>
+  <tr>
+    <td>DATE</td>
+    <td>DATE</td>
+    <td>Years, months, and days in the form YYYY-MM-DD</td>
+  </tr>
+  <tr>
+    <td>DECIMAL</td>
+    <td>DECIMAL</td>
+    <td>38-digit precision</td>
+  </tr>
+  <tr>
+    <td>FLOAT</td>
+    <td>FLOAT</td>
+    <td>4-byte single precision floating point number</td>
+  </tr>
+  <tr>
+    <td>DOUBLE</td>
+    <td>DOUBLE</td>
+    <td>8-byte double precision floating point number</td>
+  </tr>
+  <tr>
+    <td>INTEGER</td>
+    <td>INT</td>
+    <td>4-byte signed integer</td>
+  </tr>
+  <tr>
+    <td>INTERVAL</td>
+    <td>N/A</td>
+    <td>Integer fields representing a period of time depending on the type of interval</td>
+  </tr>
+  <tr>
+    <td>INTERVALDAY</td>
+    <td>N/A</td>
+    <td>Integer fields representing a day</td>
+  </tr>
+  <tr>
+    <td>INTERVALYEAR</td>
+    <td>N/A</td>
+    <td>Integer fields representing a year</td>
+  </tr>
+  <tr>
+    <td>SMALLINT</td>
+    <td>SMALLINT</td>
+    <td>2-byte signed integer</td>
+  </tr>
+  <tr>
+    <td>TIME</td>
+    <td>N/A</td>
+    <td>Hours, minutes, and seconds on a 24-hour basis</td>
+  </tr>
+  <tr>
+    <td>TIMESTAMP</td>
+    <td>N/A</td>
+    <td>Conventional UNIX Epoch timestamp.</td>
+  </tr>
+  <tr>
+    <td>N/A</td>
+    <td>TIMESTAMP</td>
+    <td>JDBC timestamp in yyyy-mm-dd hh:mm:ss format</td>
+  </tr>
+  <tr>
+    <td>TIMESTAMPTZ</td>
+    <td>N/A</td>
+    <td>Hours ahead of or behind Coordinated Universal Time (UTC) or regional hours and minutes</td>
+  </tr>
+  <tr>
+    <td>N/A</td>
+    <td>STRING</td>
+    <td>Binary string (16)</td>
+  </tr>
+  <tr>
+    <td>BINARY</td>
+    <td>BINARY</td>
+    <td>Binary string</td>
+  </tr>
+  <tr>
+    <td>VARCHAR</td>
+    <td>VARCHAR</td>
+    <td>Variable-length character string</td>
+  </tr>
+</table>
+
+## Unsupported Types
+The following Hive types are not supported:
+
+* LIST
+* MAP
+* STRUCT
+* TIMESTAMP (Unix Epoch format)
+* UNION
+
+The Hive version used in MapR supports the Hive timestamp in Unix Epoch format. Currently, the Apache Hive version used by Drill does not support this timestamp format. The workaround is to use the JDBC format for the timestamp, which Hive accepts and Drill uses, as shown in the type mapping example. The timestamp value appears in the CSV file in JDBC format: 2015-03-25 01:23:15. The Hive table defines column i as a timestamp column. The Drill extract function verifies that Drill interprets the timestamp correctly.
+
+## Type Mapping Example
+This example demonstrates the mapping of Hive data types to Drill data types. Using a CSV that has the following contents, you create a Hive table having values of different supported types:
+
+     100005,true,3.5,-1231.4,3.14,42,"SomeText",2015-03-25,2015-03-25 01:23:15 
+
+The example assumes that the CSV resides on the MapR file system (MapRFS) in the Drill sandbox: `/mapr/demo.mapr.com/data/`
+ 
+In Hive, you define an external table using the following query:
+
+    hive> CREATE EXTERNAL TABLE types_demo ( 
+          a bigint, 
+          b boolean, 
+          c DECIMAL(3, 2), 
+          d double, 
+          e float, 
+          f INT, 
+          g VARCHAR(64), 
+          h date,
+          i timestamp
+          ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
+          LINES TERMINATED BY '\n' 
+          STORED AS TEXTFILE LOCATION '/mapr/demo.mapr.com/data/mytypes.csv';
+
+You check that Hive mapped the data from the CSV to the typed values as expected:
+
+    hive> SELECT * FROM types_demo;
+    OK
+    100005	true	3.5	-1231.4	3.14	42	"SomeText"	2015-03-25   2015-03-25 01:23:15
+    Time taken: 0.524 seconds, Fetched: 1 row(s)
+
+In Drill, you use the Hive storage plugin that has the following definition.
+
+	{
+	  "type": "hive",
+	  "enabled": true,
+	  "configProps": {
+	    "hive.metastore.uris": "thrift://localhost:9083",
+	    "hive.metastore.sasl.enabled": "false"
+	  }
+	}
+
+Using the Hive storage plugin connects Drill to the Hive metastore containing the data.
+	
+	0: jdbc:drill:> USE hive;
+	+------------+------------+
+	|     ok     |  summary   |
+	+------------+------------+
+	| true       | Default schema changed to 'hive' |
+	+------------+------------+
+	1 row selected (0.067 seconds)
+	
+The data in the Hive table shows the expected values.
+	
+	0: jdbc:drill:> SELECT * FROM hive.`types_demo`;
+	+--------+------+------+---------+------+----+------------+------------+-----------+
+	|   a    |   b  |  c   |     d   |  e   | f  |     g      |     h      |     i     |
+	+--------+------+------+---------+------+----+------------+------------+-----------+
+	| 100005 | true | 3.50 | -1231.4 | 3.14 | 42 | "SomeText" | 2015-03-25 | 2015-03-25 01:23:15.0 |
+	+--------+------+------+---------+------+----+------------+------------+-----------+
+	1 row selected (1.262 seconds)
+	
+To validate that Drill interprets the timestamp in column i correctly, use the extract function to extract part of the date:
+
+    0: jdbc:drill:> select extract(year from i) from hive.`types_demo`;
+    +------------+
+    |   EXPR$0   |
+    +------------+
+    | 2015       |
+    +------------+
+    1 row selected (0.387 seconds)

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/data-sources/002-hive-udf.md
----------------------------------------------------------------------
diff --git a/_docs/data-sources/002-hive-udf.md b/_docs/data-sources/002-hive-udf.md
new file mode 100644
index 0000000..7c7a48c
--- /dev/null
+++ b/_docs/data-sources/002-hive-udf.md
@@ -0,0 +1,39 @@
+title: "Deploying and Using a Hive UDF"
+parent: "Data Sources"
+---
+If the extensive Hive functions that Drill supports, such as the mathematical and date functions, do not meet your needs, you can use a Hive UDF in Drill queries. Drill supports your existing Hive scalar UDFs. You can run queries on Hive tables and access existing Hive input/output formats, including custom SerDes. Drill serves as a complement to Hive deployments by offering low-latency queries.
+
+## Creating the UDF
+You create the JAR for a UDF to use in Drill in the conventional manner, with a few caveats covered in this section: use a unique function name and create a Drill resource file.
+
+1. Use a unique name for the Hive UDF to avoid conflicts with Drill custom functions of the same name.
+2. Create a custom Hive UDF using either of these APIs:
+
+   * Simple API: org.apache.hadoop.hive.ql.exec.UDF
+   * Complex API: org.apache.hadoop.hive.ql.udf.generic.GenericUDF
+3. Create an empty `drill-module.conf` in the resources directory in the Java project. 
+4. Export the logic to a JAR, including the `drill-module.conf` file in resources.
+
+The `drill-module.conf` file defines [startup options](/drill/docs/start-up-options/) and makes the JAR functions available to use in queries throughout the Hadoop cluster. After exporting the UDF logic to a JAR file, set up the UDF in Drill. Drill users can access the custom UDF for use in Hive queries.
+
+## Setting Up a UDF
+After you export the custom UDF as a JAR, perform the UDF setup tasks so Drill can access the UDF. The JAR needs to be available at query execution time as a session resource, so Drill queries can refer to the UDF by its name.
+ 
+To set up the UDF:
+
+1. Register Hive. [Register a Hive storage plugin](/drill/docs/registering-hive/) that connects Drill to a Hive data source.
+2. In Drill 0.7 and later, add the JAR for the UDF to the Drill CLASSPATH. In earlier versions of Drill, place the JAR file in the `/jars/3rdparty` directory of the Drill installation on all nodes running a Drillbit.
+3. On each Drill node in the cluster, restart the Drillbit.
+   `<drill installation directory>/bin/drillbit.sh restart`
+ 
+## Using a UDF
+Use a Hive UDF just as you would use a Drill custom function. For example, to query using a Hive UDF named upper_to_lower that takes a column.value argument, the SELECT statement looks something like this:  
+     
+     SELECT upper_to_lower(my_column.myvalue) FROM mytable;
+     
+
+
+
+
+
+

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/data-sources/003-parquet-ref.md
----------------------------------------------------------------------
diff --git a/_docs/data-sources/003-parquet-ref.md b/_docs/data-sources/003-parquet-ref.md
new file mode 100644
index 0000000..f9b5924
--- /dev/null
+++ b/_docs/data-sources/003-parquet-ref.md
@@ -0,0 +1,287 @@
+---
+title: "Parquet Format"
+parent: "Data Sources"
+---
+## Parquet Format
+[Apache Parquet](http://parquet.incubator.apache.org/documentation/latest) has the following characteristics:
+
+* Self-describing
+* Columnar format
+* Language-independent 
+
+Self-describing data embeds the schema or structure with the data itself. Hadoop use cases drive the growth of self-describing data formats, such as Parquet and JSON, and of NoSQL databases, such as HBase. These formats and databases are well suited for the agile and iterative development cycle of Hadoop applications and BI/analytics.
+
+Optimized for working with large files, Parquet arranges data in columns, putting related values in close proximity to each other to optimize query performance, minimize I/O, and facilitate compression. Parquet detects and encodes the same or similar data using a technique that conserves resources.
+
+Apache Drill includes the following support for Parquet:
+
+* Querying self-describing data in files or NoSQL databases without having to define and manage schema overlay definitions in centralized metastores
+* Creating Parquet files from other file formats, such as JSON, without any set up
+* Generating Parquet files that have evolving or changing schemas and querying the data on the fly
+* Handling Parquet scalar and complex data types, such as maps and arrays
+
+### Reading and Writing Parquet Files
+When Drill reads Parquet data, it loads only the necessary columns, which reduces I/O. Because it reads only a small piece of the Parquet data from a data file or table, Drill can examine and analyze all values for a column across multiple files.
+
+Parquet is the default storage format for a [Create Table As Select (CTAS)](/drill/docs/create-table-as-ctas-command) command. You can create a Drill table from one format and store the data in another format, including Parquet.
+
+CTAS can use any data source provided by the storage plugin. 
+
+Parquet data generally resides in multiple files that resemble MapReduce output, having numbered file names such as 0_0_0.parquet in a directory.
+
+To read Parquet data, point Drill to a single file or directory. Drill merges all files in a directory, including subdirectories, to create a single table.
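+
+For example, assuming the default `dfs` file system storage plugin and a hypothetical directory of Parquet files, a single query reads the whole directory as one table:
+
+    SELECT * FROM dfs.`/home/mapr/sampleparquet`;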
+
+To write Parquet data using the CTAS command, set the session store.format option as shown in the next section. Alternatively, configure the storage plugin to point to the directory containing the Parquet files.
+
+### Configuring the Parquet Storage Format
+The default file type for writing data to a workspace is Parquet. You can change the default by setting a different format in the storage plugin definition. Use the `store.format` option to set the CTAS output format at the session or system level.
+
+Use the ALTER command to set the `store.format` option.
+         
+        ALTER SESSION SET `store.format` = 'parquet';
+        ALTER SYSTEM SET `store.format` = 'parquet';
+        
+Parquet is also the default Drill format for reading. For example, if you query a file that does not have a recognized extension, Drill attempts to read it in Parquet format.
+
+### Configuring the Size of Parquet Files
+Configuring the size of Parquet files by setting the `store.parquet.block-size` option can improve write performance. The block size here refers to the block size of MFS, HDFS, or the file system. 
+
+The larger the block size, the more memory Drill needs for buffering data. Parquet files that contain a single block maximize the amount of data Drill stores contiguously on disk. Given a single row group per file, Drill stores the entire Parquet file onto the block, avoiding network I/O.
+
+To maximize performance, set the target size of a Parquet row group to the number of bytes less than or equal to the block size of MFS, HDFS, or the file system by using the `store.parquet.block-size` option:
+        
+        ALTER SESSION SET `store.parquet.block-size` = 536870912;
+        ALTER SYSTEM SET `store.parquet.block-size` = 536870912;
+
+The default block size is 536870912 bytes.
+
+### Type Mapping
+The high correlation between Parquet and SQL data types makes reading Parquet files effortless in Drill. Writing to Parquet files takes more work than reading. Because SQL does not support all Parquet data types, to prevent Drill from inferring a type other than one you want, use the [cast function](/drill/docs/sql-functions). Drill offers more liberal casting capabilities than SQL for Parquet conversions if the Parquet data is of a logical type. 
+
+The following general process converts a file from JSON to Parquet:
+
+* Create or use an existing storage plugin that specifies the storage location of the Parquet file, mutability of the data, and supported file formats.
+* Take a look at the JSON data. 
+* Create a table that selects the JSON file.
+* In the CTAS command, cast JSON string data to corresponding SQL types.
+
+### Example: Read JSON, Write Parquet
+This example demonstrates a storage plugin definition, a sample row of data from a JSON file, and a Drill query that writes the JSON input to Parquet output. 
+
+#### Storage Plugin Definition
+The following example storage plugin defines these options:
+
+* A connection to the home directory of the file system, "file:///", instead of another location, such as MapR-FS. 
+* A workspace named "home" that represents a location in the file system connection.
+* The path to home: "/home/mapr" 
+* The writable option set to true, so Drill can write the Parquet output.
+* A default input format of null. An error occurs if you read a file having an ambiguous extension. 
+* The storage formats in which Drill writes the data. 
+
+        {
+	      "type": "file",
+	      "enabled": true,
+	      "connection": "file:///",
+		  "workspaces": {
+		    "home": {
+		      "location": "/home/mapr",
+		      "writable": true,
+		      "defaultInputFormat": null
+		    }
+		  },
+		  "formats": {
+		    "parquet": {
+		      "type": "parquet"
+		    },
+		    "json": {
+		      "type": "json"
+		    }
+		  }
+		}
+
+First, the example storage plugin definition allows Drill to write data to Parquet or JSON. Next, the CTAS query reads the JSON file from the home directory and writes the resulting directory of Parquet files to the same location.
+
+#### Sample Row of JSON Data
+A JSON file contains data consisting of strings, typical of JSON data. The following example shows one row of the JSON file:
+
+        {"trans_id":0,"date":"2013-07-26","time":"04:56:59","amount":80.5,"user_info":
+          {"cust_id":28,"device":"IOS5","state":"mt"
+          },"marketing_info":
+            {"camp_id":4,"keywords":            ["go","to","thing","watch","made","laughing","might","pay","in","your","hold"]
+            },
+            "trans_info":
+              {"prod_id":[16],
+               "purch_flag":"false"
+              }
+        }
+              
+
+#### CTAS Query      
+The following example shows a CTAS query that creates a table from JSON data. The command casts the date, time, and amount strings to SQL types DATE, TIME, and DOUBLE. String-to-VARCHAR casting of the other strings occurs automatically.
+
+    CREATE TABLE home.sampleparquet AS 
+      (SELECT trans_id, 
+        cast(`date` AS date) transdate, 
+        cast(`time` AS time) transtime, 
+        cast(amount AS double) amount,
+        `user_info`, `marketing_info`, `trans_info` 
+        FROM home.`sample.json`);
+        
+The output is a Parquet file:
+
+    +------------+---------------------------+
+	|  Fragment  | Number of records written |
+	+------------+---------------------------+
+	| 0_0        | 5                         |
+	+------------+---------------------------+
+	1 row selected (1.369 seconds)
+
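+To verify the conversion, you can query the new table and confirm that the cast columns return typed values (a sketch reusing the names from this example):
+
+    SELECT transdate, transtime, amount FROM home.`sampleparquet` LIMIT 1;
+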
+For more examples of and information about using Parquet data, see ["Evolving Parquet as self-describing data format – New paradigms for consumerization of Hadoop data"](https://www.mapr.com/blog/evolving-parquet-self-describing-data-format-new-paradigms-consumerization-hadoop-data#.VNeqQbDF_8f).
+
+#### SQL Data Types to Parquet
+The first table in this section maps SQL data types to Parquet data types. The Parquet creators intentionally limited the number of Parquet primitive types to minimize the impact on disk storage:
+
+<table>
+  <tr>
+    <th>SQL Type</th>
+    <th>Parquet Type</th>
+    <th>Description</th>
+  </tr>
+  <tr>
+    <td>BIGINT</td>
+    <td>INT64</td>
+    <td>8-byte signed integer</td>
+  </tr>
+  <tr>
+    <td>BOOLEAN</td>
+    <td>BOOLEAN</td>
+    <td>TRUE (1) or FALSE (0)</td>
+  </tr>
+  <tr>
+    <td>N/A</td>
+    <td>BYTE_ARRAY</td>
+    <td>Arbitrarily long byte array</td>
+  </tr>
+  <tr>
+    <td>FLOAT</td>
+    <td>FLOAT</td>
+    <td>4-byte single precision floating point number</td>
+  </tr>
+  <tr>
+    <td>DOUBLE</td>
+    <td>DOUBLE</td>
+    <td>8-byte double precision floating point number</td>
+  </tr>
+  <tr>
+    <td>INTEGER</td>
+    <td>INT32</td>
+    <td>4-byte signed integer</td>
+  </tr>
+  <tr>
+    <td>None</td>
+    <td>INT96</td>
+    <td>12-byte signed int</td>
+  </tr>
+</table>
+
+#### SQL Types to Parquet Logical Types
+Parquet also supports logical types, fully described on the [Apache Parquet site](https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md). Embedded types, JSON and BSON, annotate a binary primitive type representing a JSON or BSON document. The logical types and their mapping to SQL types are:
+ 
+<table>
+  <tr>
+    <th>SQL Type</th>
+    <th>Drill Description</th>
+    <th>Parquet Logical Type</th>
+    <th>Parquet Description</th>
+  </tr>
+  <tr>
+    <td>DATE</td>
+    <td>Years, months, and days in the form YYYY-MM-DD</td>
+    <td>DATE</td>
+    <td>Date, not including time of day. Uses the int32 annotation. Stores the number of days from the Unix epoch, 1 January 1970.</td>
+  </tr>
+  <tr>
+    <td>VARCHAR</td>
+    <td>Character string variable length</td>
+    <td>UTF8 (Strings)</td>
+    <td>Annotates the binary primitive type. The byte array is interpreted as a UTF-8 encoded character string.</td>
+  </tr>
+  <tr>
+    <td>None</td>
+    <td></td>
+    <td>INT_8</td>
+    <td>8 bits, signed</td>
+  </tr>
+  <tr>
+    <td>None</td>
+    <td></td>
+    <td>INT_16</td>
+    <td>16 bits, signed</td>
+  </tr>
+  <tr>
+    <td>INT</td>
+    <td>4-byte signed integer</td>
+    <td>INT_32</td>
+    <td>32 bits, signed</td>
+  </tr>
+  <tr>
+    <td>BIGINT</td>
+    <td>8-byte signed integer</td>
+    <td>INT_64</td>
+    <td>64 bits, signed</td>
+  </tr>
+  <tr>
+    <td>None</td>
+    <td></td>
+    <td>UINT_8</td>
+    <td>8 bits, unsigned</td>
+  </tr>
+  <tr>
+    <td>None</td>
+    <td></td>
+    <td>UINT_16</td>
+    <td>16 bits, unsigned</td>
+  </tr>
+  <tr>
+    <td>None</td>
+    <td></td>
+    <td>UINT_32</td>
+    <td>32 bits, unsigned</td>
+  </tr>
+  <tr>
+    <td>None</td>
+    <td></td>
+    <td>UINT_64</td>
+    <td>64 bits, unsigned</td>
+  </tr>
+  <tr>
+    <td>DECIMAL</td>
+    <td>38-digit precision</td>
+    <td>DECIMAL</td>
+    <td>Arbitrary-precision signed decimal numbers of the form unscaledValue * 10^(-scale)</td>
+  </tr>
+  <tr>
+    <td>TIME</td>
+    <td>Hours, minutes, seconds, milliseconds; 24-hour basis</td>
+    <td>TIME_MILLIS</td>
+    <td>Logical time, not including the date. Annotates int32. Number of milliseconds after midnight.</td>
+  </tr>
+  <tr>
+    <td>TIMESTAMP</td>
+    <td>Year, month, day, and time of day (hours, minutes, seconds, milliseconds)</td>
+    <td>TIMESTAMP_MILLIS</td>
+    <td>Logical date and time. Annotates an int64 that stores the number of milliseconds from the Unix epoch, 00:00:00.000 on 1 January 1970, UTC.</td>
+  </tr>
+  <tr>
+    <td>INTERVAL</td>
+    <td>Integer fields representing a period of time depending on the type of interval</td>
+    <td>INTERVAL</td>
+    <td>An interval of time. Annotates a fixed_len_byte_array of length 12. Months, days, and ms in unsigned little-endian format.</td>
+  </tr>
+</table>
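+
+For example, casting to the SQL TIMESTAMP type in a CTAS statement stores values as the Parquet logical type TIMESTAMP_MILLIS. The following minimal sketch assumes a hypothetical file events.json in the home workspace containing a timestamp string column named ts:
+
+    CREATE TABLE home.sampletimestamp AS 
+      (SELECT cast(ts AS timestamp) ts 
+        FROM home.`events.json`);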
+
+### Data Description Language Support
+Parquet supports the following data description languages:
+
+* Apache Avro
+* Apache Thrift
+* Google Protocol Buffers 
+
+To create Parquet readers/writers for these formats, implement a custom storage plugin, such as an Avro plugin. 
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/data-sources/004-json-ref.md
----------------------------------------------------------------------
diff --git a/_docs/data-sources/004-json-ref.md b/_docs/data-sources/004-json-ref.md
new file mode 100644
index 0000000..db9e671
--- /dev/null
+++ b/_docs/data-sources/004-json-ref.md
@@ -0,0 +1,432 @@
+---
+title: "JSON Data Model"
+parent: "Data Sources"
+---
+Drill supports [JSON (JavaScript Object Notation)](http://www.json.org/), a self-describing data format. The data itself implies its schema and has the following characteristics:
+
+* Language-independent
+* Textual format
+* Loosely defined, weak data typing
+
+Semi-structured JSON data often consists of complex, nested elements having schema-less fields that differ type-wise from row to row. The data can constantly evolve. Applications typically add and remove fields frequently to meet business requirements.
+
+Using Drill you can natively query dynamic JSON data sets using SQL. Drill treats a JSON object as a SQL record. One object equals one row in a Drill table. 
+
+Drill 0.8 and higher can query compressed .gz files containing JSON as well as uncompressed .json files. See Querying Compressed JSON later in this document.
+
+In addition to the examples presented later in this section, see ["How to Analyze Highly Dynamic Datasets with Apache Drill"](https://www.mapr.com/blog/how-analyze-highly-dynamic-datasets-apache-drill) for information about how to analyze a JSON data set.
+
+## Data Type Mapping
+JSON data consists of the following types:
+
+* Array: ordered values, separated by commas, enclosed in square brackets
+* Boolean: true or false
+* Number: double-precision floating point number, including exponential numbers. No octal, hexadecimal, NaN, or Infinity 
+* null: empty value
+* Object: unordered key/value collection enclosed in curly braces
+* String: Unicode enclosed in double quotation marks
+* Value: a string, number, true, false, null
+* Whitespace: used between tokens
+
+Drill maps JSON data types to SQL data types as follows: 
+
+<table>
+  <tr>
+    <th>SQL Type</th>
+    <th>JSON Type</th>
+    <th>Description</th>
+  </tr>
+  <tr>
+    <td>BOOLEAN</td>
+    <td>Boolean</td>
+    <td>True or false</td>
+  </tr>
+  <tr>
+    <td>BIGINT</td>
+    <td>Numeric</td>
+    <td>Number having no decimal point in JSON, 8-byte signed integer in Drill</td>
+  </tr>
+   <tr>
+    <td>DOUBLE</td>
+    <td>Numeric</td>
+    <td>Number having a decimal point in JSON, 8-byte double precision floating point number in Drill</td>
+  </tr>
+  <tr>
+    <td>VARCHAR</td>
+    <td>String</td>
+    <td>Character string of variable length</td>
+  </tr>
+</table>
+
+JSON does not enforce types or distinguish between integers and floating point values. When reading numerical values from a JSON file, Drill distinguishes integers from floating point numbers by the presence or lack of a decimal point. If some numbers in a JSON map or array appear with and without a decimal point, such as 0 and 0.0, Drill throws a schema change error.
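+
+For example, a file containing rows such as the following, a minimal illustration, triggers the schema change error because num appears both with and without a decimal point:
+
+    {"num": 0}
+    {"num": 0.0}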
+
+### Handling Type Differences
+Use all text mode to prevent the schema change error described in the previous section. Set the `store.json.all_text_mode` property to true.
+
+    ALTER SYSTEM SET `store.json.all_text_mode` = true;
+
+When you set this option, Drill reads all data from the JSON files as VARCHAR. After reading the data, use a SELECT statement in Drill to cast data as follows:
+
+* Cast [JSON numeric values](/drill/docs/lession-2-run-queries-with-ansi-sql#return-customer-data-with-appropriate-data-types) to SQL types, such as BIGINT, DECIMAL, FLOAT, INTEGER, and SMALLINT.
+* Cast JSON strings to [Drill Date/Time Data Type Formats](/drill/docs/supported-date-time-data-type-formats).
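+
+For example, assuming all text mode caused Drill to read a numeric field named qty as VARCHAR from a hypothetical file inventory.json, a query such as the following casts the field back to a numeric SQL type:
+
+    SELECT cast(qty AS integer) qty 
+    FROM dfs.`/tmp/inventory.json`;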
+
+Alternatively, apply a Drill view to the data. 
+
+Drill uses [map and array data types](/drill/docs/data-types) internally for reading and writing complex and nested data structures from JSON.
+
+## Reading JSON
+To read JSON data using Drill, use a file system storage plugin that defines the JSON format. You can use the `dfs` storage plugin, which includes the definition. 
+
+JSON data is often complex: deeply nested and semi-structured. You can use the workarounds covered later in this document to query such data.
+
+Drill reads tuples defined in single objects, having no comma between objects. A JSON object is an unordered set of name/value pairs. Curly braces delimit objects in the JSON file:
+
+    { "name": "Apples", "desc": "Delicious" }
+    { "name": "Oranges", "desc": "Florida Navel" }
+    
+To read and analyze complex JSON files, use the FLATTEN and KVGEN functions, described later in Analyzing JSON. Observe the following guidelines when reading JSON files:
+
+* Avoid queries that return objects larger than ??MB (16?).
+  These queries might be far less performant than those that return smaller objects.
+* Avoid queries that return portions of objects beyond the ??MB threshold. (16?)
+  These queries might be far less performant than queries that return portions of objects within the threshold.
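+
+For example, a minimal query of the two-object data shown above, assuming it resides in a hypothetical file /tmp/fruit.json, looks like this:
+
+    SELECT name, `desc` FROM dfs.`/tmp/fruit.json`;
+
+Back ticks enclose the desc column name because DESC is a reserved word.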
+
+
+## Writing JSON
+You can write data from Drill to a JSON file. The following setup is required:
+
+* In the storage plugin definition, include a writable (mutable) workspace. For example:
+
+         {
+         . . .
+            "workspaces": {
+            . . .
+               "myjsonstore": {
+               "location": "/tmp",
+               "writable": true,
+            }
+       
+* Set the output format to JSON. For example:
+
+        ALTER SESSION SET `store.format`='json';
+    
+* Use the path to the workspace location in a CTAS command. For example:
+
+        USE myplugin.myworkspace;
+        CREATE TABLE my_json AS
+        SELECT my_column FROM dfs.`<path_file_name>`;
+
+Drill performs the following actions, as shown in the complete [CTAS command example](/drill/docs/create-table-as-ctas-command):
+   
+* Creates a directory using the table name.
+* Writes the JSON data to the directory in the workspace location.
+   
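+The following sketch puts these steps together, assuming a writable workspace named myjsonstore in the dfs plugin, as shown above, and the hypothetical fruit.json file from the Reading JSON section:
+
+        ALTER SESSION SET `store.format`='json';
+        CREATE TABLE dfs.myjsonstore.fruit_copy AS
+        SELECT name, `desc` FROM dfs.`/tmp/fruit.json`;
+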
+Observe the following size limitations pertaining to JSON objects:
+
+* Objects must be smaller than the chunk size.
+* Objects must be smaller than ?GB (2?) on 32- and some 64-bit systems.
+* Objects must be smaller than the amount of memory available to Drill.
+
+## Analyzing JSON
+
+Generally, you query JSON files using the following syntax:
+
+* Use dot notation to drill down into a JSON map.
+
+        SELECT level1.level2. . . . leveln FROM <storage plugin location>`myfile.json`;
+        
+* Use square brackets, array-style notation to drill down into a JSON array.
+
+        SELECT level1.level2[n][2] FROM <storage plugin location>`myfile.json`;
+    
+  The first index position of an array is 0.
+
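+For example, assuming the ticket sales data shown later in this section resides in a hypothetical file /tmp/ticket_sales.json, dot notation with a table alias drills into the sales map:
+
+    SELECT t.sales.`12-10` FROM dfs.`/tmp/ticket_sales.json` t;
+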
+Using the following techniques, you can query complex, nested JSON:
+
+* Generate key/value pairs for loosely structured data
+* Flatten nested data 
+
+### Generate Key/Value Pairs
+Use the `KVGEN` (Key Value Generator) function with complex data that contains arbitrary maps consisting of dynamic and unknown element names, such as the sales map in the following example:
+
+    {
+      "type": "ticket",
+      "venue": 123455,
+      "sales": {
+        "12-10": 532806,
+        "12-11": 112889,
+        "12-19": 898999,
+        "12-21": 10875
+      }
+    }
+    {
+      "type": "ticket",
+      "venue": 123456,
+      "sales": {
+        "12-10": 87350,
+        "12-15": 972880,
+        "12-19": 49999,
+        "12-21": 857475
+      }
+    }
+    
+
+This query reads the data, and the output shows how Drill restructures it:
+
+    SELECT * FROM dfs.`/Users/drilluser/drill/apache-drill-0.8.0-SNAPSHOT/ticket_sales.json`;
+    
+    +------------+------------+------------+
+	|    type    |   venue    |   sales    |
+	+------------+------------+------------+
+	| ticket     | 123455     | {"12-10":532806,"12-11":112889,"12-19":898999,"12-21":10875} |
+	| ticket     | 123456     | {"12-10":87350,"12-19":49999,"12-21":857475,"12-15":972880} |
+	+------------+------------+------------+
+	2 rows selected (0.895 seconds)
+
+`KVGEN` turns the dynamic map into an array of key-value pairs in which the keys represent the dynamic element names.
+
+    SELECT kvgen(sales) Revenue FROM dfs.`/Users/drilluser/drill/apache-drill-0.8.0-SNAPSHOT/ticket_sales.json`;
+    
+	+--------------+
+	|   Revenue    |
+	+--------------+
+	| [{"key":"12-10","value":532806},{"key":"12-11","value":112889},{"key":"12-19","value":898999},{"key":"12-21","value":10875}] |
+	| [{"key":"12-10","value":87350},{"key":"12-19","value":49999},{"key":"12-21","value":857475},{"key":"12-15","value":972880}] |
+	+--------------+
+	2 rows selected (0.341 seconds)
+
+### Flatten JSON Data
+
+`FLATTEN` breaks the list of key-value pairs into separate rows on which you can apply analytic functions. The FLATTEN function takes a JSON array, such as the output from kvgen(sales), as an argument. Using the all (*) wildcard as the argument is not supported and returns an error.
+
+	SELECT flatten(kvgen(sales)) Revenue 
+	FROM dfs.`/Users/drilluser/drill/apache-drill-0.8.0-SNAPSHOT/ticket_sales.json`;
+	+--------------+
+	|   Revenue    |
+	+--------------+
+	| {"key":"12-10","value":532806} |
+	| {"key":"12-11","value":112889} |
+	| {"key":"12-19","value":898999} |
+	| {"key":"12-21","value":10875} |
+	| {"key":"12-10","value":87350} |
+	| {"key":"12-19","value":49999} |
+	| {"key":"12-21","value":857475} |
+	| {"key":"12-15","value":972880} |
+	+--------------+
+	8 rows selected (0.171 seconds)
+
+### Example: Aggregate Loosely Structured Data
+Continuing with the previous example, make sure all text mode is set to false to sum numerical values. 
+
+    ALTER SYSTEM SET `store.json.all_text_mode` = false;
+    
+Sum the ticket sales by combining the `sum`, `flatten`, and `kvgen` functions in a single query.
+
+    SELECT sum(tickettbl.tickets.`value`) AS Revenue 
+    FROM (SELECT flatten(kvgen(sales)) tickets 
+    FROM  dfs.`/Users/drilluser/drill/apache-drill-0.8.0-SNAPSHOT/ticket_sales.json` ) tickettbl;
+    
+	+------------+
+	|  Revenue   |
+	+------------+
+	| 3523273    |
+	+------------+
+	1 row selected (0.194 seconds)
+
+
+### Example: Aggregate and Sort Data
+Sum the ticket sales for each date in December, and sort by total sales in ascending order.
+
+    SELECT `right`(tickettbl.tickets.key,2) December_Date, 
+    sum(tickettbl.tickets.`value`) Revenue 
+    FROM (select flatten(kvgen(sales)) tickets 
+    FROM dfs.`/Users/drilluser/drill/apache-drill-0.8.0-SNAPSHOT/ticket_sales.json`) tickettbl
+    GROUP BY `right`(tickettbl.tickets.key,2) 
+    ORDER BY Revenue;
+
+	+---------------+--------------+
+	| December_Date | Revenue      |
+	+---------------+--------------+
+	| 11            | 112889       |
+	| 10            | 620156       |
+	| 21            | 868350       |
+	| 19            | 948998       |
+	| 15            | 972880       |
+	+---------------+--------------+
+	5 rows selected (0.203 seconds)
+
+### Example: Analyze a Map Field in an Array
+To access a map field in an array, use dot notation to drill down through the hierarchy of the JSON data to the field. The following example shows how to drill down to get the MAPBLKLOT property value in the [City Lots San Francisco in .json](https://github.com/zemirco/sf-city-lots-json) data set.
+
+![drill query flow]({{ site.baseurl }}/docs/img/json-workaround.png)
+
+        SELECT features[0].properties.MAPBLKLOT 
+        FROM <storage location>.`citylots.json`;
+          
+        +------------+
+        |   EXPR$0   |
+        +------------+
+        | 0001001    |
+        +------------+
+        1 row selected (0.163 seconds)
+		
+To access the second geometry coordinate of the first city lot in the San Francisco city lots, use dot notation and array indexing notation:
+		
+		SELECT features[0].geometry.coordinates[0][1] 
+		FROM <storage location>.`citylots.json`;
+		+------------+
+		|   EXPR$0   |
+		+------------+
+		| 37.80848009696725 |
+		+------------+
+		1 row selected (0.19 seconds)
+
+More examples of drilling down into an array are shown in ["Selecting Nested Data for a Column"](/drill/docs/query-3-selecting-nested-data-for-a-column). 
+
+### Example: Analyze Map Fields in a Map
+This example uses a WHERE clause to drill down to a third level of the following JSON hierarchy to get the id and weight of the person whose max_hdl exceeds 160. Use dot notation, as shown in the query that follows the data:
+
+    {
+	    "SOURCE": "Allegheny County",
+	    "TIMESTAMP": 1366369334989,
+	    "birth": {
+	        "id": 35731300,
+	        "dur": 215923,
+	        "firstname": "Jane",
+	        "lastname": "Doe",
+	        "weight": "CATEGORY_1",
+	        "bearer": {
+	            "father": "John Doe",
+	            "ss": "208-55-5983",
+	            "max_ldl": 180,
+	            "max_hdl": 200
+	        }
+	    }
+	} . . .
+
+	SELECT tbl.birth.id AS Id, tbl.birth.weight AS Weight 
+	FROM dfs.`/Users/drilluser/drill/vitalstat.json` AS tbl 
+	WHERE tbl.birth.id IN (
+	SELECT tbl1.birth.id 
+	FROM dfs.`/Users/drilluser/drill/vitalstat.json` AS tbl1 
+	WHERE tbl1.birth.bearer.max_hdl > 160); 
+	
+	+------------+------------+
+	|     Id     |   Weight   |
+	+------------+------------+
+	| 35731300   | CATEGORY_1 |
+	+------------+------------+
+	1 row selected (1.424 seconds)
+
+## Querying Compressed JSON
+
+You can use Drill 0.8 and later to query compressed JSON in .gz files as well as uncompressed files having the .json extension, as described in the Reading JSON and Writing JSON sections. First, add the gz extension to a storage plugin definition, and then use that plugin to query the compressed file.
+
+      "extensions": [
+        "json",
+        "gz"
+      ]
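+
+For example, assuming a compressed version of a JSON file resides at a hypothetical path, you query it as you would an uncompressed .json file:
+
+      SELECT * FROM dfs.`/tmp/fruit.json.gz`;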
+
+## Limitations and Workarounds
+In most cases, you can use a workaround, presented in the following sections, to overcome the following limitations:
+
+* Array at the root level
+* Complex nested data
+* Empty array
+* Lengthy JSON objects
+* Nested column names
+* Schema changes
+* Selecting all in a JSON directory query 
+
+### Array at the root level
+Drill cannot read an array at the root level, outside an object.
+
+Workaround: Remove square brackets at the root of the object.
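+
+For example, Drill cannot read data framed as an array, a minimal illustration:
+
+    [
+      { "name": "Apples" },
+      { "name": "Oranges" }
+    ]
+
+Removing the enclosing square brackets and the comma between the objects produces input that Drill can read:
+
+    { "name": "Apples" }
+    { "name": "Oranges" }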
+
+### Complex nested data
+Drill cannot read some complex nested arrays unless you use a table qualifier.
+
+Workaround: To query n-level nested data, use a table alias to remove ambiguity. The table alias is required; otherwise column names such as user_info are parsed as table names by the SQL parser. The qualifier is not needed for data that is not nested, as shown in the following example:
+
+    {"dev_id": 0,
+	 "date":"07/26/2013",
+	 "time":"04:56:59",
+	 "user_info":
+	   {"user_id":28,
+	    "device":"A306",
+	    "state":"mt"
+	   },
+	   "marketing_info":
+	     {"promo_id":4,
+	      "keywords":  
+	       ["stay","to","think","watch","glasses",
+	         "joining","might","pay","in","your","buy"]
+	     },
+	     "dev_info":
+	       {"prod_id":[16],"purch_flag":"false"
+	       }
+	 }
+	. . .
+
+    SELECT dev_id, `date`, `time`, t.user_info.user_id, t.user_info.device, t.dev_info.prod_id 
+    FROM dfs.`/Users/mypath/example.json` t;
+
+### Empty array
+Drill cannot read an empty array, shown in the following example, and attempting to do so causes an error.
+
+        { "a":[] }
+
+Workaround: Remove empty arrays. 
+
+For example, you cannot query the [City Lots San Francisco in .json](https://github.com/zemirco/sf-city-lots-json) data unless you make the following modification.
+
+![drill query flow]({{ site.baseurl }}/docs/img/json-workaround.png)
+
+After removing the extraneous square brackets in the coordinates array, you can drill down to query all the data for the lots.
+
+### Lengthy JSON objects
+
+Drilling down into lengthy JSON objects that have only one or a few sets of curly braces requires flattening the data and generating keys.
+
+Workaround: 
+
+Separate lengthy objects into many objects delimited by curly braces using the following functions:
+ 
+  * FLATTEN, shown earlier in this document, separates a set of nested JSON objects into individual rows in a Drill table.
+  * KVGEN, also shown earlier, separates objects having more elements than optimal for querying.
+  
+### Nested Column Names 
+
+Avoid using reserved words for nested column names: Drill returns null if you enclose n-level nested column names in back ticks. Enclosing a column name in back ticks works only for first-level columns, such as the date and time columns in the earlier example, which are reserved words.
+
+### Schema changes
+Drill cannot read JSON files containing changes in the schema. For example, attempting to query an object having array elements of different data types causes an error:
+
+        . . .
+            "geometry": {
+                 "type": "Polygon",
+                 "coordinates": [
+                   [
+                     -122.42200352825247,
+                     37.80848009696725,
+                     0
+                   ],
+        . . .
+Drill interprets numbers that do not have a decimal point as BigInt values. In this example, Drill recognizes the first two coordinates as doubles and the third coordinate as a BigInt, which causes an error. 
+                
+Workaround: Set the `store.json.all_text_mode` property, described earlier, to true.
+
+    ALTER SYSTEM SET `store.json.all_text_mode` = true;
+
+### Selecting all in a JSON directory query
+Drill currently returns only fields common to all the files in a directory query that selects all (SELECT *) JSON files.
+
+Workaround: Query each file individually.
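+
+For example, instead of selecting all files in a directory, such as dfs.`/mydata` (a hypothetical path), query each file by name:
+
+    SELECT * FROM dfs.`/mydata/file1.json`;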
+
+
+
+
+

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/dev-custom-fcn/002-dev-aggregate.md
----------------------------------------------------------------------
diff --git a/_docs/dev-custom-fcn/002-dev-aggregate.md b/_docs/dev-custom-fcn/002-dev-aggregate.md
index d1a3cfb..4fd14d7 100644
--- a/_docs/dev-custom-fcn/002-dev-aggregate.md
+++ b/_docs/dev-custom-fcn/002-dev-aggregate.md
@@ -5,7 +5,7 @@ parent: "Develop Custom Functions"
 Create a class within a Java package that implements Drill’s aggregate
 interface into the program. Include the required information for the function.
 Your function must include data types that Drill supports, such as int or
-BigInt. For a list of supported data types, refer to the [SQL Reference](/drill/docs/sql-reference).
+BigInt. For a list of supported data types, refer to the [SQL Reference](/drill/docs/sql-reference/).
 
 Complete the following steps to create an aggregate function:
 

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/img/Untitled.png
----------------------------------------------------------------------
diff --git a/_docs/img/Untitled.png b/_docs/img/Untitled.png
deleted file mode 100644
index 7fea1e8..0000000
Binary files a/_docs/img/Untitled.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/img/json-workaround.png
----------------------------------------------------------------------
diff --git a/_docs/img/json-workaround.png b/_docs/img/json-workaround.png
new file mode 100644
index 0000000..f9f99dd
Binary files /dev/null and b/_docs/img/json-workaround.png differ

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/install/001-drill-in-10.md
----------------------------------------------------------------------
diff --git a/_docs/install/001-drill-in-10.md b/_docs/install/001-drill-in-10.md
index 13d2410..eddaf7e 100644
--- a/_docs/install/001-drill-in-10.md
+++ b/_docs/install/001-drill-in-10.md
@@ -89,7 +89,7 @@ commands. SQLLine is used as the shell for Drill. Drill follows the ANSI SQL:
 
 You must have the following software installed on your machine to run Drill:
 
-<table ><tbody><tr><td ><strong>Software</strong></td><td ><strong>Description</strong></td></tr><tr><td ><a href="http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html" class="external-link" rel="nofollow">Oracle JDK version 7</a></td><td >A set of programming tools for developing Java applications.</td></tr></tbody></table>
+<table ><tbody><tr><td ><strong>Software</strong></td><td ><strong>Description</strong></td></tr><tr><td ><a href="http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html" class="external-link" rel="nofollow">Oracle JDK version 7</a></td><td >A set of programming tools for developing Java applications.</td></tr></tbody></table></div>
 
   
 ### Prerequisite Validation

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/interfaces/001-odbc-win.md
----------------------------------------------------------------------
diff --git a/_docs/interfaces/001-odbc-win.md b/_docs/interfaces/001-odbc-win.md
index 2f08af2..86a4167 100644
--- a/_docs/interfaces/001-odbc-win.md
+++ b/_docs/interfaces/001-odbc-win.md
@@ -34,4 +34,5 @@ access data from a Hive table:
 
 The following components provide applications access to Drill data sources:
 
-<table ><tbody><tr><th >Component</th><th >Role</th></tr><tr><td valign="top">Drillbit</td><td valign="top">Accepts queries from clients, executes queries against Drill data sources, and returns the query results. </td></tr><tr><td valign="top">ODBC Data Source Administrator</td><td valign="top">The ODBC Data Source Administrator enables the creation of DSNs to Apache Drill data sources.<br /> In the figure above, the ODBC Data Source Administrator was used to create <code>Hive-DrillDataSources</code>.</td></tr><tr><td valign="top">ODBC DSN</td><td valign="top"><p>Provides applications information about how to connect to the Drill Source.</p>In the figure above, the <code>Hive-DrillDataSources</code> is a DSN that provides connection information to the Hive tables.</td></tr><tr><td colspan="1" valign="top">BI Tool</td><td colspan="1" valign="top"><p>Accesses Drill data sources using the connection information from the ODBC DSN.</p>In the figure above, the BI tool uses <code>Hive-Dri
 llDataSources</code> to access the <code>hive_student</code> table.</td></tr></tbody></table></div>
\ No newline at end of file
+<table ><tbody><tr><th >Component</th><th >Role</th></tr><tr><td valign="top">Drillbit</td><td valign="top">Accepts queries from clients, executes queries against Drill data sources, and returns the query results. </td></tr><tr><td valign="top">ODBC Data Source Administrator</td><td valign="top">The ODBC Data Source Administrator enables the creation of DSNs to Apache Drill data sources.<br /> In the figure above, the ODBC Data Source Administrator was used to create <code>Hive-DrillDataSources</code>.</td></tr><tr><td valign="top">ODBC DSN</td><td valign="top"><p>Provides applications information about how to connect to the Drill Source.</p>In the figure above, the <code>Hive-DrillDataSources</code> is a DSN that provides connection information to the Hive tables.</td></tr><tr><td colspan="1" valign="top">BI Tool</td><td colspan="1" valign="top"><p>Accesses Drill data sources using the connection information from the ODBC DSN.</p>In the figure above, the BI tool uses <code>Hive-Dri
 llDataSources</code> to access the <code>hive_student</code> table.</td></tr></tbody></table></div>
+

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/interfaces/odbc-win/003-connect-odbc-win.md
----------------------------------------------------------------------
diff --git a/_docs/interfaces/odbc-win/003-connect-odbc-win.md b/_docs/interfaces/odbc-win/003-connect-odbc-win.md
index 0d4cb8a..d60b294 100644
--- a/_docs/interfaces/odbc-win/003-connect-odbc-win.md
+++ b/_docs/interfaces/odbc-win/003-connect-odbc-win.md
@@ -14,7 +14,7 @@ Create Views](/drill/docs/using-drill-explorer-to-browse-data-and-create-views).
 In an ODBC-compliant BI tool, use the ODBC DSN to create an ODBC connection
 with one of the methods applicable to the data source type:
 
-<table ><tbody><tr><th >Data Source Type</th><th>ODBC Connection Method</th></tr><tr><td valign="top">Hive</td><td valign="top">Connect to a table.<br />Connect to the table using custom SQL.<br />Use Drill Explorer to create a view. Then use ODBC to connect to the view as if it were a table.</td></tr><tr><td valign="top">HBase<br /><span style="line-height: 1.4285715;background-color: transparent;">Parquet<br /></span><span style="line-height: 1.4285715;background-color: transparent;">JSON<br /></span><span style="line-height: 1.4285715;background-color: transparent;">CSV<br /></span><span style="line-height: 1.4285715;background-color: transparent;">TSV</span></td><td valign="top">Use Drill Explorer to create a view. Then use ODBC to connect to the view as if it were a table.<br />Connect to the data using custom SQL.</td></tr></tbody></table>
+<table ><tbody><tr><th >Data Source Type</th><th >ODBC Connection Method</th></tr><tr><td valign="top">Hive</td><td valign="top">Connect to a table.<br />Connect to the table using custom SQL.<br />Use Drill Explorer to create a view. Then use ODBC to connect to the view as if it were a table.</td></tr><tr><td valign="top">HBase<br /><span style="line-height: 1.4285715;background-color: transparent;">Parquet<br /></span><span style="line-height: 1.4285715;background-color: transparent;">JSON<br /></span><span style="line-height: 1.4285715;background-color: transparent;">CSV<br /></span><span style="line-height: 1.4285715;background-color: transparent;">TSV</span></td><td valign="top">Use Drill Explorer to create a view. Then use ODBC to connect to the view as if it were a table.<br />Connect to the data using custom SQL.</td></tr></tbody></table>
   
 For examples of how to connect to Drill data sources from a BI tool, see the
 [Step 3. Connect to Drill Data Sources from a BI Tool](/drill/docs/step-3-connect-to-drill-data-sources-from-a-bi-tool).

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/interfaces/odbc-win/004-tableau-examples.md
----------------------------------------------------------------------
diff --git a/_docs/interfaces/odbc-win/004-tableau-examples.md b/_docs/interfaces/odbc-win/004-tableau-examples.md
index d45f3f3..f25f50d 100644
--- a/_docs/interfaces/odbc-win/004-tableau-examples.md
+++ b/_docs/interfaces/odbc-win/004-tableau-examples.md
@@ -30,7 +30,7 @@ In this step, we will create a DSN that accesses a Hive table.
   3. Select **MapR Drill ODBC Driver** and click **Finish**.  
      The _MapR Drill ODBC Driver DSN Setup_ window appears.
   4. Enter a name for the data source.
-  5. Specify the connection type based on your requirements. The connection type provides the DSN access to Drill Data Sources. .  
+  5. Specify the connection type based on your requirements. The connection type provides the DSN access to Drill Data Sources.  
 In this example, we are connecting to a Zookeeper Quorum.
   6. In the **Schema** field, select the Hive schema.
      In this example, the Hive schema is named hive.default.
@@ -124,8 +124,8 @@ HBase table.
         hbase.voter
 
      HBase does not contain type information, so you need to cast the data in Drill
-Explorer. For information about SQL query support, see the [SQL
-Reference] (/drill/docs/sql-reference).
+Explorer. For information about SQL query support, see the SQL
+Reference in the [Apache Drill Wiki documentation](/drill/docs/sql-reference).
   9. To save the view, click **Create As**.
   10. Specify the schema where you want to save the view, enter a name for the view, and click **Save**.