Posted to commits@drill.apache.org by br...@apache.org on 2015/05/27 02:16:13 UTC

[1/4] drill git commit: exhume Basics Tutorial to address user question

Repository: drill
Updated Branches:
  refs/heads/gh-pages 497e61ef2 -> 446d71c24


exhume Basics Tutorial to address user question


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/0fb4e06e
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/0fb4e06e
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/0fb4e06e

Branch: refs/heads/gh-pages
Commit: 0fb4e06e8dba7d52c29ccd6ca13c2db7dc786fbb
Parents: 497e61e
Author: Kristine Hahn <kh...@maprtech.com>
Authored: Mon May 25 12:21:22 2015 -0700
Committer: Bridget Bevens <bb...@maprtech.com>
Committed: Tue May 26 17:14:28 2015 -0700

----------------------------------------------------------------------
 .../030-querying-plain-text-files.md            | 188 ++++++++++++++++++-
 .../040-querying-directories.md                 |  34 ++++
 .../030-date-time-functions-and-arithmetic.md   |   2 +-
 _docs/tutorials/020-drill-in-10-minutes.md      |   2 +-
 4 files changed, 219 insertions(+), 7 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/0fb4e06e/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md
----------------------------------------------------------------------
diff --git a/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md b/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md
index 8924835..ab73c57 100644
--- a/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md
+++ b/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md
@@ -2,16 +2,20 @@
 title: "Querying Plain Text Files"
 parent: "Querying a File System"
 ---
-You can use Drill to access both structured file types and plain text files
-(flat files). This section shows a few simple examples that work on flat
-files:
+You can use Drill to access structured file types and plain text files
+(flat files), such as the following file types:
 
   * CSV files (comma-separated values)
   * TSV files (tab-separated values)
   * PSV files (pipe-separated values)
 
-The examples here show CSV files, but queries against TSV and PSV files return
-equivalent results. However, make sure that your registered storage plugins
+Follow these general guidelines for querying a plain text file:
+
+  * Use a storage plugin that defines the file format, such as comma-separated (CSV) or tab-separated values (TSV), of the data in the plain text file.
+  * In the SELECT statement, use the `COLUMNS[n]` syntax in lieu of column names, which do not exist in a plain text file. The first column is column `0`.
+  * In the FROM clause, use the path to the plain text file instead of using a table name. Enclose the path and file name in back ticks. 
+
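The `COLUMNS[n]` convention above can be sketched outside Drill. This is a conceptual illustration only (the sample line is invented), showing that a delimited line maps to an array indexed from 0:

```python
# Conceptual sketch (not Drill code): how COLUMNS[n] maps onto one line
# of a delimited plain text file. The first field is column 0.
def columns(line, delimiter=","):
    """Split one line of a plain text file into Drill-style COLUMNS."""
    return line.rstrip("\n").split(delimiter)

row = columns("1939,Romeo and Juliet\n")
print(row[0])  # COLUMNS[0] -> '1939'
print(row[1])  # COLUMNS[1] -> 'Romeo and Juliet'
```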
+Make sure that your registered storage plugins
 recognize the appropriate file types and extensions. For example, the
 following configuration expects PSV files (files with a pipe delimiter) to
 have a `tbl` extension, not a `psv` extension. Drill returns a "file not
@@ -117,3 +121,177 @@ example:
 Note that the restriction with the use of aliases applies to queries against
 all data sources.
 
+## Example of Querying a TSV File
+
+This example uses a tab-separated value (TSV) file that you download from a
+Google internet site. The data in the file consists of phrases from books that
+Google scans and generates for its [Google Books Ngram
+Viewer](http://storage.googleapis.com/books/ngrams/books/datasetsv2.html). You
+use the data to find the relative frequencies of Ngrams. 
+
+### About the Data
+
+Each line in the TSV file has the following structure:
+
+`ngram TAB year TAB match_count TAB volume_count NEWLINE`
+
+For example, lines 1722089 and 1722090 in the file contain this data:
+
+| ngram                              | year | match_count | volume_count |
+|------------------------------------|------|-------------|--------------|
+| Zoological Journal of the Linnean  | 2007 | 284         | 101          |
+| Zoological Journal of the Linnean  | 2008 | 257         | 87           |
+  
+In 2007, "Zoological Journal of the Linnean" occurred 284 times overall in 101
+distinct books of the Google sample.
+
+### Download and Set Up the Data
+
+After downloading the file, you use the `dfs` storage plugin, and then select
+data from the file as you would from a table. In the FROM clause, enclose the
+path and name of the file in back ticks.
+
+  1. Download the compressed Google Ngram data from this location:  
+    
+     http://storage.googleapis.com/books/ngrams/books/googlebooks-eng-all-5gram-20120701-zo.gz
+
+  2. Unzip the file.  
+     A file named googlebooks-eng-all-5gram-20120701-zo appears.
+
+  3. Change the file name to add a `.tsv` extension.  
+The Drill `dfs` storage plugin definition includes a TSV format that requires
+a file to have this extension.
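The rename in step 3 can be sketched as follows. This is a generic illustration (the demonstration uses a stand-in file created in a temporary directory, not the actual download):

```python
# Sketch of step 3: add a .tsv extension so the dfs storage plugin's
# TSV format definition matches the file. The download itself is omitted.
import os
import tempfile

def add_tsv_extension(path):
    """Rename a file to carry a .tsv extension and return the new path."""
    new_path = path + ".tsv"
    os.rename(path, new_path)
    return new_path

# Demonstration with a stand-in file:
d = tempfile.mkdtemp()
f = os.path.join(d, "googlebooks-eng-all-5gram-20120701-zo")
open(f, "w").close()
renamed = add_tsv_extension(f)
print(renamed)
```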
+
+### Query the Data
+
+Get data about "Zoological Journal of the Linnean" that appears more than 250
+times a year in the books that Google scans.
+
+  1. Switch back to using the `dfs` storage plugin.
+  
+          USE dfs;
+
+  2. Issue a SELECT statement to get the first three columns in the file.  
+     * In the FROM clause of the example, substitute your path to the TSV file.  
+     * Use aliases to replace the default column headers, such as EXPR$0, with user-friendly column headers: Ngram, Publication_Date, and Frequency.
+     * In the WHERE clause, enclose the string literal "Zoological Journal of the Linnean" in single quotation marks.  
+     * Limit the output to 10 rows.  
+  
+         SELECT COLUMNS[0] AS Ngram,
+                COLUMNS[1] AS Publication_Date,
+                COLUMNS[2] AS Frequency
+         FROM `/Users/drilluser/Downloads/googlebooks-eng-all-5gram-20120701-zo.tsv`
+         WHERE ((columns[0] = 'Zoological Journal of the Linnean')
+             AND (columns[2] > 250)) LIMIT 10;
+
+     The output is:
+
+         +------------------------------------+-------------------+------------+
+         |               Ngram                | Publication_Date  | Frequency  |
+         +------------------------------------+-------------------+------------+
+         | Zoological Journal of the Linnean  | 1993              | 297        |
+         | Zoological Journal of the Linnean  | 1997              | 255        |
+         | Zoological Journal of the Linnean  | 2003              | 254        |
+         | Zoological Journal of the Linnean  | 2007              | 284        |
+         | Zoological Journal of the Linnean  | 2008              | 257        |
+         +------------------------------------+-------------------+------------+
+         5 rows selected (1.175 seconds)
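The filter the query applies can be sketched in plain code. The sample rows below are invented for illustration; note that `match_count` arrives as a string and must be converted before the numeric comparison:

```python
# Sketch (sample rows invented) of the WHERE clause above: keep rows whose
# ngram matches the phrase and whose match_count exceeds 250.
rows = [
    ["Zoological Journal of the Linnean", "1993", "297", "101"],
    ["Zoological Journal of the Linnean", "1994", "154", "77"],
    ["zero temperatures", "1993", "512", "402"],
]
phrase = "Zoological Journal of the Linnean"
hits = [r for r in rows if r[0] == phrase and int(r[2]) > 250]
print(len(hits))  # 1
```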
+
+The Drill default storage plugins support common file formats. If you need
+support for some other file format, such as GZ, create a custom storage plugin. You can also create a storage plugin to simplify querying files having long path names. A workspace name replaces the long path name.
+
+
+## Create a Storage Plugin
+
+This example covers how to create and use a storage plugin to simplify queries or to query a file type that `dfs` does not specify, GZ in this case. First, you create the storage plugin in the Drill Web UI. Next, you connect to the
+file through the plugin and query it.
+
+You can use the Apache Drill Web UI to create a storage plugin that queries the GZ file containing the compressed TSV data directly.
+
+  1. Create an `ngram` directory on your file system.
+  2. Copy the GZ file `googlebooks-eng-all-5gram-20120701-zo.gz` to the `ngram` directory.
+  3. Open the Drill Web UI by navigating to <http://localhost:8047/storage>.   
+     To open the Drill Web UI, the [Drill shell]({{site.baseurl}}/docs/starting-drill-on-linux-and-mac-os-x/) must still be running.
+  4. In New Storage Plugin, type `myplugin`.  
+     ![new plugin]({{ site.baseurl }}/docs/img/ngram_plugin.png)    
+  5. Click **Create**.  
+     The Configuration screen appears.
+  6. Replace null with the following storage plugin definition. On the location line, use the full path to your `ngram` directory instead of the drilluser's path, and give your workspace an arbitrary name, for example, ngram:
+  
+        {
+          "type": "file",
+          "enabled": true,
+          "connection": "file:///",
+          "workspaces": {
+            "ngram": {
+              "location": "/Users/drilluser/ngram",
+              "writable": false,
+              "defaultInputFormat": null
+           }
+         },
+         "formats": {
+           "tsv": {
+             "type": "text",
+             "extensions": [
+               "gz"
+             ],
+             "delimiter": "\t"
+            }
+          }
+        }
+
+  7. Click **Create**.  
+     The success message appears briefly.
+  8. Click **Back**.  
+     The new plugin appears in Enabled Storage Plugins.  
+     ![new plugin]({{ site.baseurl }}/docs/img/ngram_plugin.png) 
+  9. Go back to the Drill shell, and list the storage plugins.  
+
+          SHOW DATABASES;
+
+          +---------------------+
+          |     SCHEMA_NAME     |
+          +---------------------+
+          | INFORMATION_SCHEMA  |
+          | cp.default          |
+          | dfs.default         |
+          | dfs.root            |
+          | dfs.tmp             |
+          | myplugin.default    |
+          | myplugin.ngram      |
+          | sys                 |
+          +---------------------+
+          8 rows selected (0.105 seconds)
+
+Your custom plugin appears in the list and has two workspaces: the `ngram`
+workspace that you defined and a default workspace.
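The storage plugin definition above is plain JSON, so it can be sanity-checked before pasting it into the Web UI. A minimal sketch (not part of Drill) that confirms the definition parses and maps the `gz` extension to tab-delimited text:

```python
# Sanity-check the storage plugin definition: valid JSON, gz extension
# registered under the tsv format, and a tab delimiter.
import json

plugin = json.loads("""
{
  "type": "file",
  "enabled": true,
  "connection": "file:///",
  "workspaces": {
    "ngram": {
      "location": "/Users/drilluser/ngram",
      "writable": false,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "tsv": {
      "type": "text",
      "extensions": ["gz"],
      "delimiter": "\\t"
    }
  }
}
""")
print("gz" in plugin["formats"]["tsv"]["extensions"])  # True
print(plugin["formats"]["tsv"]["delimiter"] == "\t")   # True
```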
+
+### Connect to and Query a File
+
+When querying the same data source repeatedly, avoiding long path names is
+important. This exercise demonstrates how to simplify the query. Instead of
+using the full path to the Ngram file, you use dot notation in the FROM
+clause.
+
+``<workspace name>.`<location>```
+
+This syntax assumes you connected to a storage plugin that defines the
+location of the data. To query the data source while you are _not_ connected to
+that storage plugin, include the plugin name:
+
+``<plugin name>.<workspace name>.`<location>```
+
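The two dot-notation forms can be illustrated with a small helper. The function below is hypothetical (it is not a Drill API); it only assembles the table reference string described above:

```python
# Hypothetical helper (not a Drill API): assemble the dot-notation
# table reference <plugin>.<workspace>.`<location>`.
def table_ref(location, workspace, plugin=None):
    parts = ([plugin] if plugin else []) + [workspace, "`%s`" % location]
    return ".".join(parts)

# Connected to the plugin: workspace plus back-ticked location.
print(table_ref("/googlebooks-eng-all-5gram-20120701-zo.gz", "ngram"))
# Not connected: prepend the plugin name.
print(table_ref("/googlebooks-eng-all-5gram-20120701-zo.gz", "ngram", "myplugin"))
```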
+This exercise shows how to query Ngram data when you are connected to `myplugin`.
+
+  1. Connect to the ngram file through the custom storage plugin.  
+     `USE myplugin;`
+  2. Get data about "Zoological Journal of the Linnean" that appears more than 250 times a year in the books that Google scans. In the FROM clause, instead of using the full path to the file as you did in the last exercise, connect to the data using the storage plugin workspace name ngram.
+  
+         SELECT COLUMNS[0], 
+                COLUMNS[1], 
+                COLUMNS[2] 
+         FROM ngram.`/googlebooks-eng-all-5gram-20120701-zo.gz` 
+         WHERE ((columns[0] = 'Zoological Journal of the Linnean') 
+          AND (columns[2] > 250)) 
+         LIMIT 10;
+
+     The five rows of output appear.  
+
+To continue with this example and query multiple files in a directory, see the section, ["Example of Querying Multiple Files in a Directory"]({{site.baseurl}}/docs/querying-directories/#example-of-querying-multiple-files-in-a-directory).
+

http://git-wip-us.apache.org/repos/asf/drill/blob/0fb4e06e/_docs/query-data/query-a-file-system/040-querying-directories.md
----------------------------------------------------------------------
diff --git a/_docs/query-data/query-a-file-system/040-querying-directories.md b/_docs/query-data/query-a-file-system/040-querying-directories.md
index 1a55b75..4a5b4ae 100644
--- a/_docs/query-data/query-a-file-system/040-querying-directories.md
+++ b/_docs/query-data/query-a-file-system/040-querying-directories.md
@@ -89,4 +89,38 @@ first level down from logs, `dir1` to the next level, and so on.
     +------------+------------+------------+------------+------------+------------+------------+------------+------------+-------------+
     10 rows selected (0.583 seconds)
 
+## Example of Querying Multiple Files in a Directory
+
+This example continues the example in the section, ["Example of Querying a TSV File"]({{site.baseurl}}/docs/querying-plain-text-files/#example-of-querying-a-tsv-file). You create a subdirectory in the `ngram` directory and use the [custom plugin workspace]({{site.baseurl}}/docs/querying-plain-text-files/#create-a-storage-plugin) you created earlier.
+
+You download a second Ngram file. Next, you
+move both Ngram GZ files you downloaded to the `ngram` subdirectory. Finally, using the custom
+plugin workspace, you query both files. In the FROM clause, simply reference
+the subdirectory.
+
+  1. Download a second file of compressed Google Ngram data from this location: 
+  
+     http://storage.googleapis.com/books/ngrams/books/googlebooks-eng-all-2gram-20120701-ze.gz
+  2. Move `googlebooks-eng-all-2gram-20120701-ze.gz` to the `ngram/myfiles` subdirectory. 
+  3. Move the 5gram file you downloaded earlier `googlebooks-eng-all-5gram-20120701-zo.gz` to the `ngram/myfiles` subdirectory.
+  4. In the Drill shell, use the `myplugin.ngram` workspace. 
+   
+          USE myplugin.ngram;
+  5. Query the myfiles directory for the "Zoological Journal of the Linnean" or "zero temperatures" in books published in 1998.
+  
+          SELECT * 
+          FROM myfiles 
+          WHERE (((COLUMNS[0] = 'Zoological Journal of the Linnean')
+            OR (COLUMNS[0] = 'zero temperatures')) 
+            AND (COLUMNS[1] = '1998'));
+
+     The output lists ngrams from both files.
+
+          +----------------------------------------------------------+
+          |                         columns                          |
+          +----------------------------------------------------------+
+          | ["Zoological Journal of the Linnean","1998","157","53"]  |
+          | ["zero temperatures","1998","628","487"]                 |
+          +----------------------------------------------------------+
+          2 rows selected (7.007 seconds)
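The directory query above behaves like a union of the rows in every file in the directory, filtered afterward. A conceptual sketch (file names and contents taken from the sample output, written to a temporary directory) of that union-then-filter behavior:

```python
# Conceptual sketch: a directory query unions rows from every file in the
# directory, then applies the WHERE filter.
import csv
import glob
import os
import tempfile

d = tempfile.mkdtemp()
with open(os.path.join(d, "a.tsv"), "w") as f:
    f.write("Zoological Journal of the Linnean\t1998\t157\t53\n")
with open(os.path.join(d, "b.tsv"), "w") as f:
    f.write("zero temperatures\t1998\t628\t487\n")

rows = []
for path in sorted(glob.glob(os.path.join(d, "*.tsv"))):   # FROM myfiles
    with open(path) as f:
        rows.extend(csv.reader(f, delimiter="\t"))

hits = [r for r in rows
        if r[0] in ("Zoological Journal of the Linnean", "zero temperatures")
        and r[1] == "1998"]                                # WHERE clause
print(len(hits))  # 2
```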
+
 For more information about querying directories, see the section, ["Query Directory Functions"]({{site.baseurl}}/docs/query-directory-functions).
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/drill/blob/0fb4e06e/_docs/sql-reference/sql-functions/030-date-time-functions-and-arithmetic.md
----------------------------------------------------------------------
diff --git a/_docs/sql-reference/sql-functions/030-date-time-functions-and-arithmetic.md b/_docs/sql-reference/sql-functions/030-date-time-functions-and-arithmetic.md
index 23b0983..a6df716 100644
--- a/_docs/sql-reference/sql-functions/030-date-time-functions-and-arithmetic.md
+++ b/_docs/sql-reference/sql-functions/030-date-time-functions-and-arithmetic.md
@@ -46,7 +46,7 @@ Find the interval between midnight today, April 3, 2015, and June 13, 1957.
     +------------+
     1 row selected (0.064 seconds)
 
-Find the interval between midnight today, May 21, 2015, and hire dates of employees 578 and 761 in the employees.json file included with the Drill installation.
+Find the interval between midnight today, May 21, 2015, and hire dates of employees 578 and 761 in the `employee.json` file included with the Drill installation.
 
     SELECT AGE(CAST(hire_date AS TIMESTAMP)) FROM cp.`employee.json` where employee_id IN( '578','761');
     +------------------+

http://git-wip-us.apache.org/repos/asf/drill/blob/0fb4e06e/_docs/tutorials/020-drill-in-10-minutes.md
----------------------------------------------------------------------
diff --git a/_docs/tutorials/020-drill-in-10-minutes.md b/_docs/tutorials/020-drill-in-10-minutes.md
index 6584021..cf21743 100755
--- a/_docs/tutorials/020-drill-in-10-minutes.md
+++ b/_docs/tutorials/020-drill-in-10-minutes.md
@@ -45,7 +45,7 @@ Complete the following steps to install Drill:
 
 1. In a terminal window, change to the directory where you want to install Drill.
 
-2. To download the latest version of Apache Drill, download Drill from the [Drill web site](http://getdrill.org/drill/download/apache-drill-1.0.0.tar.gz)or run one of the following commands, depending on which you have installed on your system:
+2. To download the latest version of Apache Drill, download Drill from the [Drill web site](http://getdrill.org/drill/download/apache-drill-1.0.0.tar.gz) or run one of the following commands, depending on which you have installed on your system:
 
    * `wget http://getdrill.org/drill/download/apache-drill-1.0.0.tar.gz`  
    *  `curl -o apache-drill-1.0.0.tar.gz http://getdrill.org/drill/download/apache-drill-1.0.0.tar.gz`  


[4/4] drill git commit: DRILL-3169 multiple dir

Posted by br...@apache.org.
DRILL-3169 multiple dir


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/446d71c2
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/446d71c2
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/446d71c2

Branch: refs/heads/gh-pages
Commit: 446d71c242edf6ed6e65924e1b4089677540f151
Parents: fac8fd4
Author: Kristine Hahn <kh...@maprtech.com>
Authored: Tue May 26 16:48:37 2015 -0700
Committer: Bridget Bevens <bb...@maprtech.com>
Committed: Tue May 26 17:14:30 2015 -0700

----------------------------------------------------------------------
 .../030-querying-plain-text-files.md            | 95 ++------------------
 .../040-querying-directories.md                 | 45 ++--------
 2 files changed, 12 insertions(+), 128 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/446d71c2/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md
----------------------------------------------------------------------
diff --git a/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md b/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md
index aeb3543..f79f2b9 100644
--- a/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md
+++ b/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md
@@ -194,104 +194,23 @@ times a year in the books that Google scans.
          +------------------------------------+-------------------+------------+
          5 rows selected (1.175 seconds)
 
-The Drill default storage plugins support common file formats. If you need
-support for some other file format, such as GZ, create a custom storage plugin. You can also create a storage plugin to simplify querying files having long path names. A workspace name replaces the long path name.
+The Drill default storage plugins support common file formats. 
 
 
-## Create a Storage Plugin
+## Query the GZ File Directly
 
-This example covers how to create and use a storage plugin to simplify queries or to query a file type that `dfs` does not specify, GZ in this case. First, you create the storage plugin in the Drill Web UI. Next, you connect to the
-file through the plugin to query a file.
+This example covers how to query the GZ file containing the compressed TSV data directly. The GZ file needs to be renamed to specify the type of delimited data it contains, such as CSV or TSV. In this example, you add `.tsv` before the `.gz` extension.
 
-You can create a storage plugin using the Apache Drill Web UI to query the GZ file containing the compressed TSV data.
-
-  1. Create an `ngram` directory on your file system.
-  2. Copy the GZ file `googlebooks-eng-all-5gram-20120701-zo.gz` to the `ngram` directory.
-  3. Open the Drill Web UI by navigating to <http://localhost:8047/storage>.   
-     To open the Drill Web UI, the [Drill shell]({{site.baseurl}}/docs/starting-drill-on-linux-and-mac-os-x/) must still be running.
-  4. In New Storage Plugin, type `myplugin`.  
-     ![new plugin]({{ site.baseurl }}/docs/img/ngram_plugin.png)    
-  5. Click **Create**.  
-     The Configuration screen appears.
-  6. Replace null with the following storage plugin definition, except on the location line, use the *full* path to your `ngram` directory instead of the drilluser's path and give your workspace an arbitrary name, for example, ngram:
-  
-        {
-          "type": "file",
-          "enabled": true,
-          "connection": "file:///",
-          "workspaces": {
-            "ngram": {
-              "location": "/Users/drilluser/ngram",
-              "writable": false,
-              "defaultInputFormat": null
-           }
-         },
-         "formats": {
-           "tsv": {
-             "type": "text",
-             "extensions": [
-               "gz"
-             ],
-             "delimiter": "\t"
-            }
-          }
-        }
-
-  7. Click **Create**.  
-     The success message appears briefly.
-  8. Click **Back**.  
-     The new plugin appears in Enabled Storage Plugins.  
-     ![new plugin]({{ site.baseurl }}/docs/img/ngram_plugin.png) 
-  9. Go back to the Drill shell, and list the storage plugins.  
-          SHOW DATABASES;
-
-          +---------------------+
-          |     SCHEMA_NAME     |
-          +---------------------+
-          | INFORMATION_SCHEMA  |
-          | cp.default          |
-          | dfs.default         |
-          | dfs.root            |
-          | dfs.tmp             |
-          | myplugin.default    |
-          | myplugin.ngram      |
-          | sys                 |
-          +---------------------+
-          8 rows selected (0.105 seconds)
-
-Your custom plugin appears in the list and has two workspaces: the `ngram`
-workspace that you defined and a default workspace.
-
-### Connect to and Query a File
-
-When querying the same data source repeatedly, avoiding long path names is
-important. This exercise demonstrates how to simplify the query. Instead of
-using the full path to the Ngram file, you use dot notation in the FROM
-clause.
-
-``<workspace name>.`<location>```
-
-This syntax assumes you connected to a storage plugin that defines the
-location of the data. To query the data source while you are _not_ connected to
-that storage plugin, include the plugin name:
-
-``<plugin name>.<workspace name>.`<location>```
-
-This exercise shows how to query Ngram data when you are connected to `myplugin`.
-
-  1. Connect to the ngram file through the custom storage plugin.  
-     `USE myplugin;`
-  2. Get data about "Zoological Journal of the Linnean" that appears more than 250 times a year in the books that Google scans. In the FROM clause, instead of using the full path to the file as you did in the last exercise, connect to the data using the storage plugin workspace name ngram.
+  1. Rename the GZ file `googlebooks-eng-all-5gram-20120701-zo.gz` to `googlebooks-eng-all-5gram-20120701-zo.tsv.gz`.
+  2. Query the renamed GZ file directly to get data about "Zoological Journal of the Linnean" that appears more than 250 times a year in the books that Google scans. In the FROM clause, instead of using the full path to the file as you did in the last exercise, connect to the data using the storage plugin workspace name ngram.
   
          SELECT COLUMNS[0], 
                 COLUMNS[1], 
                 COLUMNS[2] 
-         FROM ngram.`/googlebooks-eng-all-5gram-20120701-zo.gz` 
+         FROM dfs.`/Users/drilluser/Downloads/googlebooks-eng-all-5gram-20120701-zo.tsv.gz` 
          WHERE ((columns[0] = 'Zoological Journal of the Linnean') 
          AND (columns[2] > 250)) 
          LIMIT 10;
 
-     The five rows of output appear.  
-
-To continue with this example and query multiple files in a directory, see the section, ["Example of Querying Multiple Files in a Directory"]({{site.baseurl}}/docs/querying-directories/#example-of-querying-multiple-files-in-a-directory).
+     The 5 rows of output appear.  
 

http://git-wip-us.apache.org/repos/asf/drill/blob/446d71c2/_docs/query-data/query-a-file-system/040-querying-directories.md
----------------------------------------------------------------------
diff --git a/_docs/query-data/query-a-file-system/040-querying-directories.md b/_docs/query-data/query-a-file-system/040-querying-directories.md
index 4a5b4ae..88b5b40 100644
--- a/_docs/query-data/query-a-file-system/040-querying-directories.md
+++ b/_docs/query-data/query-a-file-system/040-querying-directories.md
@@ -13,8 +13,8 @@ same structure: `plays.csv` and `moreplays.csv`. The first file contains 7
 records and the second file contains 3 records. The following query returns
 the "union" of the two files, ordered by the first column:
 
-    0: jdbc:drill:zk=local> select columns[0] as `Year`, columns[1] as Play 
-    from dfs.`/Users/brumsby/drill/testdata` order by 1;
+    0: jdbc:drill:zk=local> SELECT COLUMNS[0] AS `Year`, COLUMNS[1] AS Play 
+    FROM dfs.`/Users/brumsby/drill/testdata` order by 1;
  
     +------------+------------------------+
     |    Year    |          Play          |
@@ -49,7 +49,7 @@ You can query all of these files, or a subset, by referencing the file system
 once in a Drill query. For example, the following query counts the number of
 records in all of the files inside the `2013` directory:
 
-    0: jdbc:drill:> select count(*) from MFS.`/mapr/drilldemo/labs/clicks/logs/2013` ;
+    0: jdbc:drill:> SELECT COUNT(*) FROM MFS.`/mapr/drilldemo/labs/clicks/logs/2013` ;
     +------------+
     |   EXPR$0   |
     +------------+
@@ -64,7 +64,7 @@ subdirectories: `2012`, `2013`, and `2014`. The following query constrains
 files inside the subdirectory named `2013`. The variable `dir0` refers to the
 first level down from logs, `dir1` to the next level, and so on.
 
-    0: jdbc:drill:> use bob.logdata;
+    0: jdbc:drill:> USE bob.logdata;
     +------------+-----------------------------------------+
     |     ok     |              summary                    |
     +------------+-----------------------------------------+
@@ -72,7 +72,7 @@ first level down from logs, `dir1` to the next level, and so on.
     +------------+-----------------------------------------+
     1 row selected (0.305 seconds)
  
-    0: jdbc:drill:> select * from logs where dir0='2013' limit 10;
+    0: jdbc:drill:> SELECT * FROM logs WHERE dir0='2013' LIMIT 10;
     +------------+------------+------------+------------+------------+------------+------------+------------+------------+-------------+
     |    dir0    |    dir1    |  trans_id  |    date    |    time    |  cust_id   |   device   |   state    |  camp_id   |  keywords   |
     +------------+------------+------------+------------+------------+------------+------------+------------+------------+-------------+
@@ -89,38 +89,3 @@ first level down from logs, `dir1` to the next level, and so on.
     +------------+------------+------------+------------+------------+------------+------------+------------+------------+-------------+
     10 rows selected (0.583 seconds)
 
-## Example of Querying Multiple Files in a Directory
-
-This example continues the example in the section, ["Example of Querying a TSV File"]({{site.baseurl}}/docs/querying-plain-text-files/#example-of-querying-a-tsv-file). You create a subdirectory in the `ngram` directory and use the [custom plugin workspace]({{site.baseurl}}/docs/querying-plain-text-files/#create-a-storage-plugin) you created earlier.
-
-You download a second Ngram file. Next, you
-move both Ngram GZ files you downloaded to the `ngram` subdirectory. Finally, using the custom
-plugin workspace, you query both files. In the FROM clause, simply reference
-the subdirectory.
-
-  1. Download a second file of compressed Google Ngram data from this location: 
-  
-     http://storage.googleapis.com/books/ngrams/books/googlebooks-eng-all-2gram-20120701-ze.gz
-  2. Move `googlebooks-eng-all-2gram-20120701-ze.gz` to the `ngram/myfiles` subdirectory. 
-  3. Move the 5gram file you downloaded earlier `googlebooks-eng-all-5gram-20120701-zo.gz` to the `ngram/myfiles` subdirectory.
-  4. In the Drill shell, use the `myplugin.ngram` workspace. 
-   
-          USE myplugin.ngram;
-  5. Query the myfiles directory for the "Zoological Journal of the Linnean" or "zero temperatures" in books published in 1998.
-  
-          SELECT * 
-          FROM myfiles 
-          WHERE (((COLUMNS[0] = 'Zoological Journal of the Linnean')
-            OR (COLUMNS[0] = 'zero temperatures')) 
-            AND (COLUMNS[1] = '1998'));
-The output lists ngrams from both files.
-
-          +----------------------------------------------------------+
-          |                         columns                          |
-          +----------------------------------------------------------+
-          | ["Zoological Journal of the Linnean","1998","157","53"]  |
-          | ["zero temperatures","1998","628","487"]                 |
-          +----------------------------------------------------------+
-          2 rows selected (7.007 seconds)
-
-For more information about querying directories, see the section, ["Query Directory Functions"]({{site.baseurl}}/docs/query-directory-functions).
\ No newline at end of file


[2/4] drill git commit: DRILL-3179

Posted by br...@apache.org.
DRILL-3179


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/9ab4954d
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/9ab4954d
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/9ab4954d

Branch: refs/heads/gh-pages
Commit: 9ab4954d2e55df4c15d3c2c3768197210a7f25b5
Parents: 0fb4e06
Author: Kristine Hahn <kh...@maprtech.com>
Authored: Tue May 26 14:39:52 2015 -0700
Committer: Bridget Bevens <bb...@maprtech.com>
Committed: Tue May 26 17:14:29 2015 -0700

----------------------------------------------------------------------
 .../020-configuring-drill-memory.md             |   3 +
 .../060-configuring-a-shared-drillbit.md        |   3 +
 .../010-configuration-options-introduction.md   |   9 +-
 .../035-plugin-configuration-introduction.md    |   4 +-
 .../050-json-data-model.md                      | 106 ++++++++-----------
 .../030-querying-plain-text-files.md            |  22 ++--
 6 files changed, 64 insertions(+), 83 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/9ab4954d/_docs/configure-drill/020-configuring-drill-memory.md
----------------------------------------------------------------------
diff --git a/_docs/configure-drill/020-configuring-drill-memory.md b/_docs/configure-drill/020-configuring-drill-memory.md
index 5948bcc..30d5121 100644
--- a/_docs/configure-drill/020-configuring-drill-memory.md
+++ b/_docs/configure-drill/020-configuring-drill-memory.md
@@ -38,3 +38,6 @@ The `drill-env.sh` file contains the following options:
 * Xmx specifies the maximum memory allocation pool for a Java Virtual Machine (JVM). 
 * Xms specifies the initial memory allocation pool.
 
+If performance is an issue, replace the `-ea` flag with `-Dbounds=false`, as shown in the following example:
+
+    export DRILL_JAVA_OPTS="-Xms1G -Xmx$DRILL_MAX_HEAP -XX:MaxDirectMemorySize=$DRILL_MAX_DIRECT_MEMORY -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=1G -Dbounds=false"
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/drill/blob/9ab4954d/_docs/configure-drill/060-configuring-a-shared-drillbit.md
----------------------------------------------------------------------
diff --git a/_docs/configure-drill/060-configuring-a-shared-drillbit.md b/_docs/configure-drill/060-configuring-a-shared-drillbit.md
index 1070586..2d31cef 100644
--- a/_docs/configure-drill/060-configuring-a-shared-drillbit.md
+++ b/_docs/configure-drill/060-configuring-a-shared-drillbit.md
@@ -10,6 +10,9 @@ Set [options in sys.options]({{site.baseurl}}/docs/configuration-options-introdu
 
 * exec.queue.large  
 * exec.queue.small  
+* exec.queue.threshold
+
+The exec.queue.threshold option sets the cost threshold for determining whether a query is large or small, based on complexity. Complex queries have higher thresholds. The default, 30,000,000, represents the estimated number of rows that a query will process. To serialize incoming queries, set the small queue at 0 and the threshold at 0.
 
 For more information, see the section, ["Performance Tuning"](/docs/performance-tuning-introduction/).
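Stepping outside the diff for a moment, the large/small routing that exec.queue.threshold controls can be sketched in a few lines of Python. This is a hypothetical illustration of the rule described in the hunk above, not Drill source; the function name and the tie-breaking behavior are assumptions.

```python
def route_query(estimated_cost, threshold=30_000_000):
    """Route a query to the large or small queue by comparing the planner's
    cost estimate (roughly, estimated rows processed) against
    exec.queue.threshold. Illustrative sketch only; not Drill code."""
    return "large" if estimated_cost > threshold else "small"

# A query estimated at 45M rows is large; one at 10K rows is small.
print(route_query(45_000_000))   # large
print(route_query(10_000))       # small

# With the threshold at 0, every query routes to one queue, which together
# with a small queue of 0 serializes incoming queries as the text describes.
print(route_query(10_000, threshold=0))  # large
```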
 

http://git-wip-us.apache.org/repos/asf/drill/blob/9ab4954d/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md
----------------------------------------------------------------------
diff --git a/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md b/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md
index bdd19f3..524ff67 100644
--- a/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md
+++ b/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md
@@ -24,10 +24,10 @@ The sys.options table lists the following options that you can set as a system o
 | exec.java_compiler_janino_maxsize              | 262144           | See the exec.java_compiler option comment. Accepts inputs of type LONG.                                                                                                                                                                                                                                                                                          |
 | exec.max_hash_table_size                       | 1073741824       | Ending size for hash tables. Range: 0 - 1073741824.                                                                                                                                                                                                                                                                                                              |
 | exec.min_hash_table_size                       | 65536            | Starting size for hash tables. Increase according to available memory to improve performance. Increasing for very large aggregations or joins when you have large amounts of memory for Drill to use. Range: 0 - 1073741824.                                                                                                                                     |
-| exec.queue.enable                              | FALSE            | Changes the state of query queues to control the number of queries that run simultaneously.                                                                                                                                                                                                                                                                      |
+| exec.queue.enable                              | FALSE            | Changes the state of query queues. False allows unlimited concurrent queries.                                                                                                                                                                                                                                                                                    |
 | exec.queue.large                               | 10               | Sets the number of large queries that can run concurrently in the cluster. Range: 0-1000                                                                                                                                                                                                                                                                         |
 | exec.queue.small                               | 100              | Sets the number of small queries that can run concurrently in the cluster. Range: 0-1001                                                                                                                                                                                                                                                                         |
-| exec.queue.threshold                           | 30000000         | Sets the cost threshold, which depends on the complexity of the queries in queue, for determining whether query is large or small. Complex queries have higher thresholds. Range: 0-9223372036854775807                                                                                                                                                          |
+| exec.queue.threshold                           | 30000000         | Sets the cost threshold for determining whether a query is large or small, based on complexity. Complex queries have higher thresholds. By default, a query estimated to process more than 30,000,000 rows is considered large. Range: 0-9223372036854775807                                                                                                      |
 | exec.queue.timeout_millis                      | 300000           | Indicates how long a query can wait in queue before the query fails. Range: 0-9223372036854775807                                                                                                                                                                                                                                                                |
 | exec.schedule.assignment.old                   | FALSE            | Used to prevent query failure when no work units are assigned to a minor fragment, particularly when the number of files is much larger than the number of leaf fragments.                                                                                                                                                                                       |
 | exec.storage.enable_new_text_reader            | TRUE             | Enables the text reader that complies with the RFC 4180 standard for text/csv files.                                                                                                                                                                                                                                                                             |
@@ -80,7 +80,4 @@ The sys.options table lists the following options that you can set as a system o
 | store.parquet.enable_dictionary_encoding       | FALSE            | For internal use. Do not change.                                                                                                                                                                                                                                                                                                                                 |
 | store.parquet.use_new_reader                   | FALSE            | Not supported in this release.                                                                                                                                                                                                                                                                                                                                   |
 | store.text.estimated_row_size_bytes            | 100              | Estimate of the row size in a delimited text file, such as csv. The closer to actual, the better the query plan. Used for all csv files in the system/session where the value is set. Impacts the decision to plan a broadcast join or not.                                                                                                                      |
-| window.enable                                  | FALSE            | Not supported in this release. Coming soon.                                                                                                                                                                                                                                                                                                                      |
-
-
-
+| window.enable                                  | FALSE            | Not supported in this release. Coming soon.                                                                                                                                                                                                                                                                                                                      |
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/drill/blob/9ab4954d/_docs/connect-a-data-source/035-plugin-configuration-introduction.md
----------------------------------------------------------------------
diff --git a/_docs/connect-a-data-source/035-plugin-configuration-introduction.md b/_docs/connect-a-data-source/035-plugin-configuration-introduction.md
index 7c850ae..c6bcbf8 100644
--- a/_docs/connect-a-data-source/035-plugin-configuration-introduction.md
+++ b/_docs/connect-a-data-source/035-plugin-configuration-introduction.md
@@ -58,9 +58,9 @@ The following table describes the attributes you configure for storage plugins.
   </tr>
   <tr>
     <td>"workspaces". . . "location"</td>
-    <td>"location": "/"<br>"location": "/tmp"</td>
+    <td>"location": "/Users/johndoe/mydata"<br>"location": "/tmp"</td>
     <td>no</td>
-    <td>Path to a directory on the file system.</td>
+    <td>Full path to a directory on the file system.</td>
   </tr>
   <tr>
     <td>"workspaces". . . "writable"</td>

http://git-wip-us.apache.org/repos/asf/drill/blob/9ab4954d/_docs/data-sources-and-file-formats/050-json-data-model.md
----------------------------------------------------------------------
diff --git a/_docs/data-sources-and-file-formats/050-json-data-model.md b/_docs/data-sources-and-file-formats/050-json-data-model.md
index 70ccf94..1b1660d 100644
--- a/_docs/data-sources-and-file-formats/050-json-data-model.md
+++ b/_docs/data-sources-and-file-formats/050-json-data-model.md
@@ -126,7 +126,9 @@ Using the following techniques, you can query complex, nested JSON:
 * Generate key/value pairs for loosely structured data
 
 ## Example: Flatten and Generate Key Values for Complex JSON
-This example uses the following data that represents unit sales of tickets to events that were sold over a period of for several days in different states:
+This example uses the following data that represents unit sales of tickets to events that were sold over a period of several days in December:
+
+### ticket_sales.json Contents
 
     {
       "type": "ticket",
@@ -151,56 +153,32 @@ This example uses the following data that represents unit sales of tickets to ev
     
 Take a look at the data in Drill:
 
-    SELECT * FROM dfs.`/Users/drilluser/ticket_sales.json`;
-    +------------+------------+------------+------------+------------+
-    |    type    |  channel   |   month    |    day     |   sales    |
-    +------------+------------+------------+------------+------------+
-    | ticket     | 123455     | 12         | ["15","25","28","31"] | {"NY":"532806","PA":"112889","TX":"898999","UT":"10875"} |
-    | ticket     | 123456     | 12         | ["10","15","19","31"] | {"NY":"972880","PA":"857475","CA":"87350","OR":"49999"} |
-    +------------+------------+------------+------------+------------+
-    2 rows selected (0.041 seconds)
-
-### Flatten Arrays
-The FLATTEN function breaks the following _day arrays from the JSON example file shown earlier into separate rows.
-
-    "_day": [ 15, 25, 28, 31 ] 
-    "_day": [ 10, 15, 19, 31 ]
-
-Flatten the sales column of the ticket data onto separate rows, one row for each day in the array, for a better view of the data. FLATTEN copies the sales data related in the JSON object on each row.  Using the all (*) wildcard as the argument to flatten is not supported and returns an error.
-
-    SELECT flatten(tkt._day) AS `day`, tkt.sales FROM dfs.`/Users/drilluser/ticket_sales.json` tkt;
-
-    +------------+------------+
-    |    day     |   sales    |
-    +------------+------------+
-    | 15         | {"NY":532806,"PA":112889,"TX":898999,"UT":10875} |
-    | 25         | {"NY":532806,"PA":112889,"TX":898999,"UT":10875} |
-    | 28         | {"NY":532806,"PA":112889,"TX":898999,"UT":10875} |
-    | 31         | {"NY":532806,"PA":112889,"TX":898999,"UT":10875} |
-    | 10         | {"NY":972880,"PA":857475,"CA":87350,"OR":49999} |
-    | 15         | {"NY":972880,"PA":857475,"CA":87350,"OR":49999} |
-    | 19         | {"NY":972880,"PA":857475,"CA":87350,"OR":49999} |
-    | 31         | {"NY":972880,"PA":857475,"CA":87350,"OR":49999} |
-    +------------+------------+
-    8 rows selected (0.072 seconds)
+    +---------+---------+---------------------------------------------------------------+
+    |  type   |  venue  |                             sales                             |
+    +---------+---------+---------------------------------------------------------------+
+    | ticket  | 123455  | {"12-10":532806,"12-11":112889,"12-19":898999,"12-21":10875}  |
+    | ticket  | 123456  | {"12-10":87350,"12-19":49999,"12-21":857475,"12-15":972880}   |
+    +---------+---------+---------------------------------------------------------------+
+    2 rows selected (1.343 seconds)
+
 
 ### Generate Key/Value Pairs
-Use the KVGEN (Key Value Generator) function to generate key/value pairs from complex data. Generating key/value pairs is often helpful when working with data that contains arbitrary maps consisting of dynamic and unknown element names, such as the ticket sales data by state. For example purposes, take a look at how kvgen breaks the sales data into keys and values representing the states and number of tickets sold:
+Continuing with the data from the [previous example]({{site.baseurl}}/docs/json-data-model/#example:-flatten-and-generate-key-values-for-complex-json), use the KVGEN (Key Value Generator) function to generate key/value pairs from complex data. Generating key/value pairs is often helpful when working with data that contains arbitrary maps consisting of dynamic and unknown element names, such as the ticket sales data in this example. For example purposes, take a look at how KVGEN breaks the sales data into keys and values representing the key dates and number of tickets sold:
 
-    SELECT KVGEN(tkt.sales) AS state_sales FROM dfs.`/Users/drilluser/ticket_sales.json` tkt;
-    +-------------+
-    | state_sales |
-    +-------------+
-    | [{"key":"NY","value":532806},{"key":"PA","value":112889},{"key":"TX","value":898999},{"key":"UT","value":10875}] |
-    | [{"key":"NY","value":972880},{"key":"PA","value":857475},{"key":"CA","value":87350},{"key":"OR","value":49999}] |
-    +-------------+
-    2 rows selected (0.039 seconds)
+    SELECT KVGEN(tkt.sales) AS `key dates:tickets sold` FROM dfs.`/Users/drilluser/ticket_sales.json` tkt;
+    +---------------------------------------------------------------------------------------------------------------------------------------+
+    |                                                        key dates:tickets sold                                                         |
+    +---------------------------------------------------------------------------------------------------------------------------------------+
+    | [{"key":"12-10","value":"532806"},{"key":"12-11","value":"112889"},{"key":"12-19","value":"898999"},{"key":"12-21","value":"10875"}] |
+    | [{"key":"12-10","value":"87350"},{"key":"12-19","value":"49999"},{"key":"12-21","value":"857475"},{"key":"12-15","value":"972880"}] |
+    +---------------------------------------------------------------------------------------------------------------------------------------+
+    2 rows selected (0.106 seconds)
 
 KVGEN allows queries against maps where the keys themselves represent data rather than a schema, as shown in the next example.
 
 ### Flatten JSON Data
 
-FLATTEN breaks the list of key-value pairs into separate rows on which you can apply analytic functions. FLATTEN takes a JSON array, such as the output from kvgen(sales), as an argument. Using the all (*) wildcard as the argument is not supported and returns an error.
+FLATTEN breaks the list of key-value pairs into separate rows on which you can apply analytic functions. FLATTEN takes a JSON array, such as the output from kvgen(sales), as an argument. Using the all (*) wildcard as the argument is not supported and returns an error. The following example continues using data from the [previous example]({{site.baseurl}}/docs/json-data-model/#example:-flatten-and-generate-key-values-for-complex-json):
 
     SELECT FLATTEN(kvgen(sales)) Sales 
     FROM dfs.`/Users/drilluser/drill/ticket_sales.json`;
@@ -220,41 +198,41 @@ FLATTEN breaks the list of key-value pairs into separate rows on which you can a
     8 rows selected (0.171 seconds)
 
 ### Example: Aggregate Loosely Structured Data
-Use flatten and kvgen together to aggregate the data. Continuing with the previous example, make sure all text mode is set to false to sum numbers. Drill returns an error if you attempt to sum data in all text mode. 
+Use flatten and kvgen together to aggregate the data from the [previous example]({{site.baseurl}}/docs/json-data-model/#example:-flatten-and-generate-key-values-for-complex-json). Make sure all text mode is set to false to sum numbers. Drill returns an error if you attempt to sum data in all text mode. 
 
     ALTER SYSTEM SET `store.json.all_text_mode` = false;
     
 Sum the ticket sales by combining the `SUM`, `FLATTEN`, and `KVGEN` functions in a single query.
 
-    SELECT SUM(tkt.tot_sales.`value`) AS TotalSales FROM (SELECT flatten(kvgen(sales)) tot_sales FROM dfs.`/Users/drilluser/ticket_sales.json`) tkt;
+    SELECT SUM(tkt.tot_sales.`value`) AS TicketsSold FROM (SELECT flatten(kvgen(sales)) tot_sales FROM dfs.`/Users/drilluser/ticket_sales.json`) tkt;
 
-    +------------+
-    | TotalSales |
-    +------------+
-    | 3523273    |
-    +------------+
-    1 row selected (0.081 seconds)
+    +--------------+
+    | TicketsSold  |
+    +--------------+
+    | 3523273.0    |
+    +--------------+
+    1 row selected (0.244 seconds)
 
 ### Example: Aggregate and Sort Data
-Sum the ticket sales by state and group by state and sort in ascending order. 
+Sum the ticket sales by December date, grouping by day and sorting in ascending order. 
 
-    SELECT `right`(tkt.tot_sales.key,2) State, 
+    SELECT `right`(tkt.tot_sales.key,2) `December Date`, 
     SUM(tkt.tot_sales.`value`) AS TotalSales 
-    FROM (SELECT flatten(kvgen(sales)) tot_sales 
+    FROM (SELECT FLATTEN(kvgen(sales)) tot_sales 
     FROM dfs.`/Users/drilluser/ticket_sales.json`) tkt 
     GROUP BY `right`(tkt.tot_sales.key,2) 
     ORDER BY TotalSales;
 
-    +---------------+--------------+
-    | December_Date |  TotalSales  |
-    +---------------+--------------+
-    | 11            | 112889       |
-    | 10            | 620156       |
-    | 21            | 868350       |
-    | 19            | 948998       |
-    | 15            | 972880       |
-    +---------------+--------------+
-    5 rows selected (0.203 seconds)
+    +----------------+-------------+
+    | December Date  | TotalSales  |
+    +----------------+-------------+
+    | 11             | 112889.0    |
+    | 10             | 620156.0    |
+    | 21             | 868350.0    |
+    | 19             | 948998.0    |
+    | 15             | 972880.0    |
+    +----------------+-------------+
+    5 rows selected (0.252 seconds)
 
 ### Example: Access a Map Field in an Array
 To access a map field in an array, use dot notation to drill down through the hierarchy of the JSON data to the field. Examples are based on the following [City Lots San Francisco in .json](https://github.com/zemirco/sf-city-lots-json), modified slightly as described in the empty array workaround in ["Limitations and Workarounds."]({{ site.baseurl }}/docs/json-data-model#empty-array)
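As a sanity check on the FLATTEN/KVGEN examples in the hunks above, the same pipeline can be mimicked in plain Python. This is an illustrative sketch only; the function names mirror Drill's but this is not Drill code, and the two records are copied from the ticket_sales.json data in the diff.

```python
def kvgen(mapping):
    """Like Drill's KVGEN: turn a map into a list of {"key": k, "value": v} objects."""
    return [{"key": k, "value": v} for k, v in mapping.items()]

def flatten(rows, column):
    """Like Drill's FLATTEN: emit one output row per element of the array in `column`."""
    for row in rows:
        for element in row[column]:
            out = dict(row)
            out[column] = element
            yield out

# The two ticket_sales.json records shown in the diff above.
rows = [
    {"type": "ticket", "venue": 123455,
     "sales": {"12-10": 532806, "12-11": 112889, "12-19": 898999, "12-21": 10875}},
    {"type": "ticket", "venue": 123456,
     "sales": {"12-10": 87350, "12-19": 49999, "12-21": 857475, "12-15": 972880}},
]

# FLATTEN(KVGEN(sales)): one row per (date, tickets) pair, eight rows total.
exploded = list(flatten(
    [{**r, "sales": kvgen(r["sales"])} for r in rows], "sales"))

# Summing the values reproduces the TicketsSold result from the diff.
total = sum(r["sales"]["value"] for r in exploded)
print(total)  # 3523273
```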

http://git-wip-us.apache.org/repos/asf/drill/blob/9ab4954d/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md
----------------------------------------------------------------------
diff --git a/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md b/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md
index ab73c57..aeb3543 100644
--- a/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md
+++ b/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md
@@ -157,7 +157,7 @@ path and name of the file in back ticks.
 
   3. Change the file name to add a `.tsv` extension.  
 The Drill `dfs` storage plugin definition includes a TSV format that requires
-a file to have this extension.
+a file to have this extension. Later, you learn how to skip this step and query the GZ file directly.
 
 ### Query the Data
 
@@ -174,12 +174,12 @@ times a year in the books that Google scans.
      * In the WHERE clause, enclose the string literal "Zoological Journal of the Linnean" in single quotation marks.  
      * Limit the output to 10 rows.  
   
-         SELECT COLUMNS[0] AS Ngram,
-                COLUMNS[1] AS Publication_Date,
-                COLUMNS[2] AS Frequency
-         FROM `/Users/drilluser/Downloads/googlebooks-eng-all-5gram-20120701-zo.tsv`
-         WHERE ((columns[0] = 'Zoological Journal of the Linnean')
-             AND (columns[2] > 250)) LIMIT 10;
+            SELECT COLUMNS[0] AS Ngram,
+                   COLUMNS[1] AS Publication_Date,
+                   COLUMNS[2] AS Frequency
+            FROM `/Users/drilluser/Downloads/googlebooks-eng-all-5gram-20120701-zo.tsv`
+            WHERE ((columns[0] = 'Zoological Journal of the Linnean')
+            AND (columns[2] > 250)) LIMIT 10;
 
      The output is:
 
@@ -195,7 +195,7 @@ times a year in the books that Google scans.
          5 rows selected (1.175 seconds)
 
 The Drill default storage plugins support common file formats. If you need
-support for some other file format, such as GZ, create a custom storage plugin. You can also create a storage plugin to simplify querying file having long path names. A workspace name replaces the long path name.
+support for some other file format, such as GZ, create a custom storage plugin. You can also create a storage plugin to simplify querying files having long path names. A workspace name replaces the long path name.
 
 
 ## Create a Storage Plugin
@@ -203,7 +203,7 @@ support for some other file format, such as GZ, create a custom storage plugin.
 This example covers how to create and use a storage plugin to simplify queries or to query a file type that `dfs` does not specify, GZ in this case. First, you create the storage plugin in the Drill Web UI. Next, you connect to the
 file through the plugin to query a file.
 
-You can create a storage plugin using the Apache Drill Web UI to query the GZ file containing the compressed TSV data directly.
+You can create a storage plugin using the Apache Drill Web UI to query the GZ file containing the compressed TSV data.
 
   1. Create an `ngram` directory on your file system.
   2. Copy the GZ file `googlebooks-eng-all-5gram-20120701-zo.gz` to the `ngram` directory.
@@ -213,7 +213,7 @@ You can create a storage plugin using the Apache Drill Web UI to query the GZ fi
      ![new plugin]({{ site.baseurl }}/docs/img/ngram_plugin.png)    
   5. Click **Create**.  
      The Configuration screen appears.
-  6. Replace null with the following storage plugin definition, except on the location line, use the path to your `ngram` directory instead of the drilluser's path and give your workspace an arbitrary name, for example, ngram:
+  6. Replace null with the following storage plugin definition. On the location line, use the *full* path to your `ngram` directory instead of the drilluser's path, and give your workspace an arbitrary name, for example, ngram:
   
         {
           "type": "file",
@@ -288,7 +288,7 @@ This exercise shows how to query Ngram data when you are connected to `myplugin`
                 COLUMNS[2] 
          FROM ngram.`/googlebooks-eng-all-5gram-20120701-zo.gz` 
          WHERE ((columns[0] = 'Zoological Journal of the Linnean') 
-          AND (columns[2] > 250)) 
+         AND (columns[2] > 250)) 
          LIMIT 10;
 
      The five rows of output appear.  
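The `COLUMNS[n]` syntax in the queries above maps each tab-separated field to a positional column. A rough Python equivalent of that filter is sketched below; only the ngram string comes from the doc, and the two data rows are made up for illustration since the real Google Books file is not reproduced here.

```python
import csv
import io

# Two invented TSV rows standing in for the Google Books 5-gram data.
tsv = (
    "Zoological Journal of the Linnean\t1837\t255\n"
    "Zoological Journal of the Linnean\t1838\t100\n"
)

# Mirror the query: filter on columns[0] and columns[2] > 250, alias positionally.
rows = [
    {"Ngram": c[0], "Publication_Date": c[1], "Frequency": int(c[2])}
    for c in csv.reader(io.StringIO(tsv), delimiter="\t")
    if c[0] == "Zoological Journal of the Linnean" and int(c[2]) > 250
]
print(len(rows))  # 1
```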


[3/4] drill git commit: DRILL-3134

Posted by br...@apache.org.
DRILL-3134


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/fac8fd41
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/fac8fd41
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/fac8fd41

Branch: refs/heads/gh-pages
Commit: fac8fd415636c5abc96f1a081f5b529ed1cb4343
Parents: 9ab4954
Author: Kristine Hahn <kh...@maprtech.com>
Authored: Tue May 26 14:57:48 2015 -0700
Committer: Bridget Bevens <bb...@maprtech.com>
Committed: Tue May 26 17:14:29 2015 -0700

----------------------------------------------------------------------
 _docs/sql-reference/data-types/010-supported-data-types.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/fac8fd41/_docs/sql-reference/data-types/010-supported-data-types.md
----------------------------------------------------------------------
diff --git a/_docs/sql-reference/data-types/010-supported-data-types.md b/_docs/sql-reference/data-types/010-supported-data-types.md
index 45a4366..48397d0 100644
--- a/_docs/sql-reference/data-types/010-supported-data-types.md
+++ b/_docs/sql-reference/data-types/010-supported-data-types.md
@@ -52,11 +52,11 @@ Drill uses map and array data types internally for reading complex and nested da
 
 `a[1]`  
 
-You can refer to the value for a key in a map using this syntax:
+You can refer to the value for a key in a map using dot notation:
 
-`m['k']`
+`t.m.k`
 
-The section [“Query Complex Data”]({{ site.baseurl }}/docs/querying-complex-data-introduction) shows how to use [composite types]({{site.baseurl}}/docs/supported-data-types/#composite-types) to access nested arrays. ["Handling Different Data Types"]({{ site.baseurl }}/docs/handling-different-data-types/#handling-json-and-parquet-data) includes examples of JSON maps and arrays. Drill provides functions for handling array and map types:
+The section [“Query Complex Data”]({{ site.baseurl }}/docs/querying-complex-data-introduction) shows how to use composite types to access nested arrays. ["Handling Different Data Types"]({{ site.baseurl }}/docs/handling-different-data-types/#handling-json-and-parquet-data) includes examples of JSON maps and arrays. Drill provides functions for handling array and map types:
 
 * ["KVGEN"]({{site.baseurl}}/docs/kvgen/)
 * ["FLATTEN"]({{site.baseurl}}/docs/flatten/)
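A rough Python analogy for the array and map access syntax covered in this hunk. This is purely illustrative; Drill's `a[1]` and `t.m.k` operate on its internal complex types, not Python dicts.

```python
# One JSON-ish record, as if read from a file aliased as t in a query.
t = {"a": [10, 20, 30], "m": {"k": "v"}}

# Drill's a[1]: the second element of array column a.
print(t["a"][1])    # 20

# Drill's t.m.k: the value for key k in map column m, via dot notation.
print(t["m"]["k"])  # v
```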