You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@drill.apache.org by br...@apache.org on 2015/03/17 22:08:22 UTC
svn commit: r1667403 [3/5] - in /drill/site/trunk/content/drill: ./ docs/ docs/apache-drill-contribution-guidelines/ docs/apache-drill-in-10-minutes/ docs/connect-to-a-data-source/ docs/data-sources-and-file-formats/ docs/data-type-casting/ docs/date-t...

Added: drill/site/trunk/content/drill/docs/json-data-model/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/json-data-model/index.html?rev=1667403&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/json-data-model/index.html (added)
+++ drill/site/trunk/content/drill/docs/json-data-model/index.html Tue Mar 17 21:08:21 2015
@@ -0,0 +1,596 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>JSON Data Model - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+  <li class="logo"><a href="/"></a></li>
+  <li>
+    <a href="/overview/">Documentation</a>
+    <ul>
+      <li><a href="/overview/">Overview&nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+      <li><a href="/why/">Why Drill? &nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="/architecture/">Architecture</a></li>
+    </ul>
+  </li>
+  <li>
+    <a href="/community/">Community</a>
+    <ul>
+      <li><a href="/team/">Team</a></li>
+      <li><a href="/community/#events">Events and Meetups</a></li>
+      <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+      <li><a href="/community/#getinvolved">Get Involved</a></li>
+      <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+      <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+    </ul>
+  </li>
+  <li><a href="/faq/">FAQ</a></li>
+  <li><a href="/blog/">Blog</a></li>
+  <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+  <li class="l"><span>&nbsp;</span></li>
+  <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>JSON Data Model</h1>
+
+</div>
+
+<div class="int_text" align="left"><p>Drill supports <a href="http://www.json.org/">JSON (JavaScript Object Notation)</a>, a self-describing data format. The data itself implies its schema and has the following characteristics:</p>
+
+<ul>
+<li>Language-independent</li>
+<li>Textual format</li>
+<li>Loosely defined, weak data typing</li>
+</ul>
+
+<p>Semi-structured JSON data often consists of complex, nested elements having schema-less fields that differ type-wise from row to row. The data can constantly evolve. Applications typically add and remove fields frequently to meet business requirements.</p>
+
+<p>Using Drill you can natively query dynamic JSON data sets using SQL. Drill treats a JSON object as a SQL record. One object equals one row in a Drill table. </p>
+
+<p>Drill 0.8 and higher can <a href="/docs/drill-default-input-format#querying-compressed-json">query compressed .gz files</a> having JSON as well as uncompressed .json files. </p>
+
+<p>In addition to the examples presented later in this section, see <a href="https://www.mapr.com/blog/how-analyze-highly-dynamic-datasets-apache-drill">&quot;How to Analyze Highly Dynamic Datasets with Apache Drill&quot;</a> for information about how to analyze a JSON data set.</p>
+
+<h2 id="data-type-mapping">Data Type Mapping</h2>
+
+<p>JSON data consists of the following types:</p>
+
+<ul>
+<li>Array: ordered values, separated by commas, enclosed in square brackets</li>
+<li>Boolean: true or false</li>
+<li>Number: double-precision floating point number, including exponential numbers. No octal, hexadecimal, NaN, or Infinity </li>
+<li>null: empty value</li>
+<li>Object: unordered key/value collection enclosed in curly braces</li>
+<li>String: Unicode enclosed in double quotation marks</li>
+<li>Value: a string, number, true, false, null</li>
+<li>Whitespace: used between tokens</li>
+</ul>
+
+<p>The following table shows SQL-JSON data type mapping, assuming you use the default <code>all_text_mode</code> option setting, false: </p>
+
+<table>
+  <tr>
+    <th>SQL Type</th>
+    <th>JSON Type</th>
+    <th>Description</th>
+  </tr>
+  <tr>
+    <td>boolean</td>
+    <td>Boolean</td>
+    <td>True or false</td>
+  </tr>
+  <tr>
+    <td>bigint</td>
+    <td>Numeric</td>
+    <td>Number having no decimal point in JSON, 8-byte signed integer in Drill</td>
+  </tr>
+   <tr>
+    <td>double</td>
+    <td>Numeric</td>
+    <td>Number having a decimal point in JSON, 8-byte double precision floating point number in Drill</td>
+  </tr>
+  <tr>
+    <td>varchar</td>
+    <td>String</td>
+    <td>Character string of variable length</td>
+  </tr>
+</table>
+
+<p>JSON does not enforce types or distinguish between integers and floating point values. When reading numerical values from a JSON file, Drill distinguishes integers from floating point numbers by the presence or lack of a decimal point. If some numbers in a JSON map or array appear with and without a decimal point, such as 0 and 0.0, Drill throws a schema change error.</p>
+
+<h3 id="handling-type-differences">Handling Type Differences</h3>
+
+<p>Use the all text mode to prevent the schema change error described in the previous section. Set the <code>store.json.all_text_mode</code> property to true.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">ALTER SYSTEM SET `store.json.all_text_mode` = true;
+</code></pre></div>
+<p>When you set this option, Drill reads all data from the JSON files as VARCHAR. After reading the data, use a SELECT statement in Drill to cast data as follows:</p>
+
+<ul>
+<li>Cast JSON numeric values to <a href="/docs/data-types">SQL types</a>, such as BIGINT, DECIMAL, FLOAT, INTEGER, and SMALLINT.</li>
+<li>Cast JSON strings to <a href="/docs/supported-date-time-data-type-formats">Drill Date/Time Data Type Formats</a>.</li>
+</ul>
+
+<p>Drill uses <a href="/docs/data-types">map and array data types</a> internally for reading complex and nested data structures from JSON. You can cast data in a map or array of data to return a value from the structure, as shown in <a href="/docs/lession-2-run-queries-with-ansi-sql">âCreate a view on a MapR-DB tableâ</a>. âQuery Complex Dataâ shows how to access nested arrays.</p>
+
+<h2 id="reading-json">Reading JSON</h2>
+
+<p>To read JSON data using Drill, use a <a href="/docs/connect-to-a-data-source">file system storage plugin</a> that defines the JSON format. You can use the <code>dfs</code> storage plugin, which includes the definition. </p>
+
+<p>JSON data is often complex. Data can be deeply nested and semi-structured. but <a href="/docs/json-data-model#limitations-and-workaroumds">you can use workarounds </a> covered later.</p>
+
+<p>Drill reads tuples defined in single objects, having no comma between objects. A JSON object is an unordered set of name/value pairs. Curly braces delimit objects in the JSON file:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">{ name: &quot;Apples&quot;, desc: &quot;Delicious&quot; }
+{ name: &quot;Oranges&quot;, desc: &quot;Florida Navel&quot; }
+</code></pre></div>
+<p>To read and <a href="/docs/json-data-model#analyzing-json">analyze complex JSON</a> files, use the FLATTEN and KVGEN functions. </p>
+
+<h2 id="writing-json">Writing JSON</h2>
+
+<p>You can write data from Drill to a JSON file. The following setup is required:</p>
+
+<ul>
+<li><p>In the storage plugin definition, include a writable (mutable) workspace. For example:</p>
+
+<p>{
+. . .
+  &quot;workspaces&quot;: {
+. . .
+    &quot;myjsonstore&quot;: {
+      &quot;location&quot;: &quot;/tmp&quot;,
+      &quot;writable&quot;: true,
+    }
+. . .</p></li>
+<li><p>Set the output format to JSON. For example:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">ALTER SESSION SET `store.format`=&#39;json&#39;;
+</code></pre></div></li>
+<li><p>Use the path to the workspace location in a CTAS command. for example:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">USE myplugin.myworkspace;
+CREATE TABLE my_json AS
+SELECT my column from dfs.`&lt;path_file_name&gt;`;
+</code></pre></div></li>
+</ul>
+
+<p>Drill performs the following actions, as shown in the complete <a href="/docs/create-table-as-ctas-command">CTAS command example</a>:</p>
+
+<ul>
+<li>Creates a directory using table name.</li>
+<li>Writes the JSON data to the directory in the workspace location.</li>
+</ul>
+
+<h2 id="analyzing-json">Analyzing JSON</h2>
+
+<p>Generally, you query JSON files using the following syntax, which includes a table alias. The alias is typically required for querying complex data:</p>
+
+<ul>
+<li><p>Dot notation to drill down into a JSON map.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">SELECT t.level1.level2. . . . leveln FROM &lt;storage plugin location&gt;`myfile.json` t
+</code></pre></div></li>
+<li><p>Use square brackets, array-style notation to drill down into a JSON array.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">SELECT t.level1.level2[n][2] FROM &lt;storage plugin location&gt;`myfile.json` t;
+</code></pre></div></li>
+</ul>
+
+<p>The first index position of an array is 0.</p>
+
+<p>Drill returns null when a document does not have the specified map or level.</p>
+
+<p>Using the following techniques, you can query complex, nested JSON:</p>
+
+<ul>
+<li>Flatten nested data </li>
+<li>Generate key/value pairs for loosely structured data</li>
+</ul>
+
+<h2 id="example:-flatten-and-generate-key-values-for-complex-json">Example: Flatten and Generate Key Values for Complex JSON</h2>
+
+<p>This example uses the following data that represents unit sales of tickets to events that were sold over a period of for several days in different states:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">{
+  &quot;type&quot;: &quot;ticket&quot;,
+  &quot;venue&quot;: 123455,
+  &quot;sales&quot;: {
+    &quot;12-10&quot;: 532806,
+    &quot;12-11&quot;: 112889,
+    &quot;12-19&quot;: 898999,
+    &quot;12-21&quot;: 10875
+  }
+}
+{
+  &quot;type&quot;: &quot;ticket&quot;,
+  &quot;venue&quot;: 123456,
+  &quot;sales&quot;: {
+    &quot;12-10&quot;: 87350,
+    &quot;12-15&quot;: 972880,
+    &quot;12-19&quot;: 49999,
+    &quot;12-21&quot;: 857475
+  }
+}
+</code></pre></div>
+<p>Take a look at the data in Drill:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">SELECT * FROM dfs.`/Users/drilluser/ticket_sales.json`;
++------------+------------+------------+------------+------------+
+|    type    |  channel   |   month    |    day     |   sales    |
++------------+------------+------------+------------+------------+
+| ticket     | 123455     | 12         | [&quot;15&quot;,&quot;25&quot;,&quot;28&quot;,&quot;31&quot;] | {&quot;NY&quot;:&quot;532806&quot;,&quot;PA&quot;:&quot;112889&quot;,&quot;TX&quot;:&quot;898999&quot;,&quot;UT&quot;:&quot;10875&quot;} |
+| ticket     | 123456     | 12         | [&quot;10&quot;,&quot;15&quot;,&quot;19&quot;,&quot;31&quot;] | {&quot;NY&quot;:&quot;972880&quot;,&quot;PA&quot;:&quot;857475&quot;,&quot;CA&quot;:&quot;87350&quot;,&quot;OR&quot;:&quot;49999&quot;} |
++------------+------------+------------+------------+------------+
+2 rows selected (0.041 seconds)
+</code></pre></div>
+<h3 id="flatten-arrays">Flatten Arrays</h3>
+
+<p>The flatten function breaks the following _day arrays from the JSON example file shown earlier into separate rows.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">&quot;_day&quot;: [ 15, 25, 28, 31 ] 
+&quot;_day&quot;: [ 10, 15, 19, 31 ]
+</code></pre></div>
+<p>Flatten the sales column of the ticket data onto separate rows, one row for each day in the array, for a better view of the data. Flatten copies the sales data related in the JSON object on each row.  Using the all (*) wildcard as the argument to flatten is not supported and returns an error.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">SELECT flatten(tkt._day) AS `day`, tkt.sales FROM dfs.`/Users/drilluser/ticket_sales.json` tkt;
+
++------------+------------+
+|    day     |   sales    |
++------------+------------+
+| 15         | {&quot;NY&quot;:532806,&quot;PA&quot;:112889,&quot;TX&quot;:898999,&quot;UT&quot;:10875} |
+| 25         | {&quot;NY&quot;:532806,&quot;PA&quot;:112889,&quot;TX&quot;:898999,&quot;UT&quot;:10875} |
+| 28         | {&quot;NY&quot;:532806,&quot;PA&quot;:112889,&quot;TX&quot;:898999,&quot;UT&quot;:10875} |
+| 31         | {&quot;NY&quot;:532806,&quot;PA&quot;:112889,&quot;TX&quot;:898999,&quot;UT&quot;:10875} |
+| 10         | {&quot;NY&quot;:972880,&quot;PA&quot;:857475,&quot;CA&quot;:87350,&quot;OR&quot;:49999} |
+| 15         | {&quot;NY&quot;:972880,&quot;PA&quot;:857475,&quot;CA&quot;:87350,&quot;OR&quot;:49999} |
+| 19         | {&quot;NY&quot;:972880,&quot;PA&quot;:857475,&quot;CA&quot;:87350,&quot;OR&quot;:49999} |
+| 31         | {&quot;NY&quot;:972880,&quot;PA&quot;:857475,&quot;CA&quot;:87350,&quot;OR&quot;:49999} |
++------------+------------+
+8 rows selected (0.072 seconds)
+</code></pre></div>
+<h3 id="generate-key/value-pairs">Generate Key/Value Pairs</h3>
+
+<p>Use the kvgen (Key Value Generator) function to generate key/value pairs from complex data. Generating key/value pairs is often helpful when working with data that contains arbitrary maps consisting of dynamic and unknown element names, such as the ticket sales data by state. For example purposes, take a look at how kvgen breaks the sales data into keys and values representing the states and number of tickets sold:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">SELECT kvgen(tkt.sales) AS state_sales FROM dfs.`/Users/drilluser/ticket_sales.json` tkt;
++-------------+
+| state_sales |
++-------------+
+| [{&quot;key&quot;:&quot;NY&quot;,&quot;value&quot;:532806},{&quot;key&quot;:&quot;PA&quot;,&quot;value&quot;:112889},{&quot;key&quot;:&quot;TX&quot;,&quot;value&quot;:898999},{&quot;key&quot;:&quot;UT&quot;,&quot;value&quot;:10875}] |
+| [{&quot;key&quot;:&quot;NY&quot;,&quot;value&quot;:972880},{&quot;key&quot;:&quot;PA&quot;,&quot;value&quot;:857475},{&quot;key&quot;:&quot;CA&quot;,&quot;value&quot;:87350},{&quot;key&quot;:&quot;OR&quot;,&quot;value&quot;:49999}] |
++-------------+
+2 rows selected (0.039 seconds)
+</code></pre></div>
+<p>The purpose of using kvgen function is to allow queries against maps where the keys themselves represent data rather than a schema, as shown in the next example.</p>
+
+<h3 id="flatten-json-data">Flatten JSON Data</h3>
+
+<p><code>Flatten</code> breaks the list of key-value pairs into separate rows on which you can apply analytic functions. The flatten function takes a JSON array, such as the output from kvgen(sales), as an argument. Using the all (*) wildcard as the argument is not supported and returns an error.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">SELECT flatten(kvgen(sales)) Sales 
+FROM dfs.`/Users/drilluser/drill/apache-drill-0.8.0-SNAPSHOT/ticket_sales.json`;
+
++--------------------------------+
+|           Sales                |
++--------------------------------+
+| {&quot;key&quot;:&quot;12-10&quot;,&quot;value&quot;:532806} |
+| {&quot;key&quot;:&quot;12-11&quot;,&quot;value&quot;:112889} |
+| {&quot;key&quot;:&quot;12-19&quot;,&quot;value&quot;:898999} |
+| {&quot;key&quot;:&quot;12-21&quot;,&quot;value&quot;:10875}  |
+| {&quot;key&quot;:&quot;12-10&quot;,&quot;value&quot;:87350}  |
+| {&quot;key&quot;:&quot;12-19&quot;,&quot;value&quot;:49999}  |
+| {&quot;key&quot;:&quot;12-21&quot;,&quot;value&quot;:857475} |
+| {&quot;key&quot;:&quot;12-15&quot;,&quot;value&quot;:972880} |
++--------------------------------+
+8 rows selected (0.171 seconds)
+</code></pre></div>
+<h3 id="example:-aggregate-loosely-structured-data">Example: Aggregate Loosely Structured Data</h3>
+
+<p>Use flatten and kvgen together to aggregate the data. Continuing with the previous example, make sure all text mode is set to false to sum numerical values. Drill returns an error if you attempt to sum data in in all text mode. </p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">ALTER SYSTEM SET `store.json.all_text_mode` = false;
+</code></pre></div>
+<p>Sum the ticket sales by combining the <code>sum</code>, <code>flatten</code>, and <code>kvgen</code> functions in a single query.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">SELECT SUM(tkt.tot_sales.`value`) AS TotalSales FROM (SELECT flatten(kvgen(sales)) tot_sales FROM dfs.`/Users/drilluser/ticket_sales.json`) tkt;
+
++------------+
+| TotalSales |
++------------+
+| 3523273    |
++------------+
+1 row selected (0.081 seconds)
+</code></pre></div>
+<h3 id="example:-aggregate-and-sort-data">Example: Aggregate and Sort Data</h3>
+
+<p>Sum the ticket sales by state and group by state and sort in ascending order. </p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">SELECT `right`(tkt.tot_sales.key,2) State, 
+SUM(tkt.tot_sales.`value`) AS TotalSales 
+FROM (SELECT flatten(kvgen(sales)) tot_sales 
+FROM dfs.`/Users/drilluser/ticket_sales.json`) tkt 
+GROUP BY `right`(tkt.tot_sales.key,2) 
+ORDER BY TotalSales;
+
++---------------+--------------+
+| December_Date |  TotalSales  |
++---------------+--------------+
+| 11            | 112889       |
+| 10            | 620156       |
+| 21            | 868350       |
+| 19            | 948998       |
+| 15            | 972880       |
++---------------+--------------+
+5 rows selected (0.203 seconds)
+</code></pre></div>
+<h3 id="example:-access-a-map-field-in-an-array">Example: Access a Map Field in an Array</h3>
+
+<p>To access a map field in an array, use dot notation to drill down through the hierarchy of the JSON data to the field. Examples are based on the following <a href="https://github.com/zemirco/sf-city-lots-json">City Lots San Francisco in .json</a>, modified slightly as described in the empty array workaround in <a href="/docs/json-data-model#empty-array">&quot;Limitations and Workarounds.&quot;</a></p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">{
+  &quot;type&quot;: &quot;FeatureCollection&quot;,
+  &quot;features&quot;: [
+  { 
+    &quot;type&quot;: &quot;Feature&quot;, 
+    &quot;properties&quot;: 
+    { 
+      &quot;MAPBLKLOT&quot;: &quot;0001001&quot;, 
+      &quot;BLKLOT&quot;: &quot;0001001&quot;, 
+      &quot;BLOCK_NUM&quot;: &quot;0001&quot;, 
+      &quot;LOT_NUM&quot;: &quot;001&quot;, 
+      &quot;FROM_ST&quot;: &quot;0&quot;, 
+      &quot;TO_ST&quot;: &quot;0&quot;, 
+      &quot;STREET&quot;: &quot;UNKNOWN&quot;, 
+      &quot;ST_TYPE&quot;: null, 
+      &quot;ODD_EVEN&quot;: &quot;E&quot; }, 
+      &quot;geometry&quot;: 
+    { 
+        &quot;type&quot;: &quot;Polygon&quot;, 
+        &quot;coordinates&quot;: 
+        [ [ 
+        [ -122.422003528252475, 37.808480096967251, 0.0 ], 
+        [ -122.422076013325281, 37.808835019815085, 0.0 ], 
+        [ -122.421102174348633, 37.808803534992904, 0.0 ], 
+        [ -122.421062569067274, 37.808601056818148, 0.0 ], 
+        [ -122.422003528252475, 37.808480096967251, 0.0 ] 
+        ] ] 
+    }
+  },
+. . .
+</code></pre></div>
+<p>This example shows how to drill down using array notation plus dot notation in features[0].properties.MAPBLKLOT to get the MAPBLKLOT property value in the San Francisco city lots data:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">SELECT features[0].properties.MAPBLKLOT, FROM dfs.`/Users/drilluser/citylots.json`;
+
++------------+
+|   EXPR$0   |
++------------+
+| 0001001    |
++------------+
+1 row selected (0.163 seconds)
+</code></pre></div>
+<p>To access the second geometry coordinate of the first city lot in the San Francisco city lots, use array indexing notation for the coordinates as well as the features:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">SELECT features[0].geometry.coordinates[0][1] 
+FROM dfs.`/Users/drilluser/citylots.json`;
++-------------------+
+|      EXPR$0       |
++-------------------+
+| 37.80848009696725 |
++-------------------+
+1 row selected (0.19 seconds)
+</code></pre></div>
+<p>More examples of drilling down into an array are shown in <a href="/docs/query-3-selecting-nested-data-for-a-column">&quot;Selecting Nested Data for a Column&quot;</a>. </p>
+
+<h3 id="example:-flatten-an-array-of-maps-using-a-subquery">Example: Flatten an Array of Maps using a Subquery</h3>
+
+<p>By flattening the following JSON file, which contains an array of maps, you can evaluate the records of the flattened data. </p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">{&quot;name&quot;:&quot;classic&quot;,&quot;fillings&quot;:[ {&quot;name&quot;:&quot;sugar&quot;,&quot;cal&quot;:500} , {&quot;name&quot;:&quot;flour&quot;,&quot;cal&quot;:300} ] }
+
+SELECT flat.fill FROM (SELECT flatten(t.fillings) AS fill FROM dfs.flatten.`test.json` t) flat WHERE flat.fill.cal  &gt; 300;
+
++----------------------------+
+|           fill             |
++----------------------------+
+| {&quot;name&quot;:&quot;sugar&quot;,&quot;cal&quot;:500} |
++----------------------------+
+1 row selected (0.421 seconds)
+</code></pre></div>
+<p>Use a table alias for column fields and functions when working with complex data sets. Currently, you must use a subquery when operating on a flattened column. Eliminating the subquery and table alias in the WHERE clause, for example <code>flat.fillings[0].cal &gt; 300</code>, does not evaluate all records of the flattened data against the predicate and produces the wrong results.</p>
+
+<h3 id="example:-access-map-fields-in-a-map">Example: Access Map Fields in a Map</h3>
+
+<p>This example uses a WHERE clause to drill down to a third level of the following JSON hierarchy to get the max_hdl greater than 160:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">{
+  &quot;SOURCE&quot;: &quot;Allegheny County&quot;,
+  &quot;TIMESTAMP&quot;: 1366369334989,
+  &quot;birth&quot;: {
+    &quot;id&quot;: 35731300,
+    &quot;firstname&quot;: &quot;Jane&quot;,
+    &quot;lastname&quot;: &quot;Doe&quot;,
+    &quot;weight&quot;: &quot;CATEGORY_1&quot;,
+    &quot;bearer&quot;: {
+      &quot;father&quot;: &quot;John Doe&quot;,
+      &quot;ss&quot;: &quot;208-55-5983&quot;,
+      &quot;max_ldl&quot;: 180,
+      &quot;max_hdl&quot;: 200
+    }
+  }
+}
+{
+  &quot;SOURCE&quot;: &quot;Marin County&quot;,
+  &quot;TIMESTAMP&quot;: 1366369334,
+    &quot;birth&quot;: {
+      &quot;id&quot;: 35731309,
+      &quot;firstname&quot;: &quot;Somporn&quot;,
+      &quot;lastname&quot;: &quot;Thongnopneua&quot;,
+      &quot;weight&quot;: &quot;CATEGORY_2&quot;,
+      &quot;bearer&quot;: {
+        &quot;father&quot;: &quot;Jeiranan Thongnopneua&quot;,
+        &quot;ss&quot;: &quot;208-25-2223&quot;,
+        &quot;max_ldl&quot;: 110,
+        &quot;max_hdl&quot;: 150
+    }
+  }
+}
+</code></pre></div>
+<p>Use dot notation, for example <code>t.birth.lastname</code> and <code>t.birth.bearer.max_hdl</code> to drill down to the nested level:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">SELECT t.birth.lastname AS Name, t.birth.weight AS Weight 
+FROM dfs.`Users/drilluser/vitalstat.json` t 
+WHERE t.birth.bearer.max_hdl &lt; 160;
+
++----------------+------------+
+|    Name        |   Weight   |
++----------------+------------+
+| Thongneoupeanu | CATEGORY_2 |
++----------------+------------+
+1 row selected (0.142 seconds)
+</code></pre></div>
+<h2 id="limitations-and-workarounds">Limitations and Workarounds</h2>
+
+<p>In most cases, you can use a workaround, presented in the following sections, to overcome the following limitations:</p>
+
+<ul>
+<li>Array at the root level</li>
+<li>Complex nested data</li>
+<li>Empty array</li>
+<li>Lengthy JSON objects</li>
+<li>Complex JSON objects</li>
+<li>Nested column names</li>
+<li>Schema changes</li>
+<li>Selecting all in a JSON directory query </li>
+</ul>
+
+<h3 id="array-at-the-root-level">Array at the root level</h3>
+
+<p>Drill cannot read an array at the root level, outside an object.</p>
+
+<p>Workaround: Remove square brackets at the root of the object, as shown in the following example.</p>
+
+<p><img src="/docs/img/datasources-json-bracket.png" alt="drill query flow"></p>
+
+<h3 id="complex-nested-data">Complex nested data</h3>
+
+<p>Drill cannot read some complex nested arrays unless you use a table alias.</p>
+
+<p>Workaround: To query n-level nested data, use the table alias to remove ambiguity; otherwise, column names such as user_info are parsed as table names by the SQL parser. The alias is not needed for data that is not nested, as shown in the following example:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">{&quot;dev_id&quot;: 0,
+  &quot;date&quot;:&quot;07/26/2013&quot;,
+  &quot;time&quot;:&quot;04:56:59&quot;,
+  &quot;user_info&quot;:
+    {&quot;user_id&quot;:28,
+     &quot;device&quot;:&quot;A306&quot;,
+     &quot;state&quot;:&quot;mt&quot;
+    },
+    &quot;marketing_info&quot;:
+      {&quot;promo_id&quot;:4,
+       &quot;keywords&quot;:  
+        [&quot;stay&quot;,&quot;to&quot;,&quot;think&quot;,&quot;watch&quot;,&quot;glasses&quot;,
+         &quot;joining&quot;,&quot;might&quot;,&quot;pay&quot;,&quot;in&quot;,&quot;your&quot;,&quot;buy&quot;]
+      },
+      &quot;dev_info&quot;:
+        {&quot;prod_id&quot;:[16],&quot;purch_flag&quot;:&quot;false&quot;
+        }
+}
+. . .
+
+SELECT dev_id, `date`, `time`, t.user_info.user_id, t.user_info.device, t.dev_info.prod_id 
+FROM dfs.`/Users/mypath/example.json` t;
+</code></pre></div>
+<h3 id="empty-array">Empty array</h3>
+
+<p>Drill cannot read an empty array, shown in the following example, and attempting to do so causes an error.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">    { &quot;a&quot;:[] }
+</code></pre></div>
+<p>Workaround: Remove empty arrays. </p>
+
+<p>For example, you cannot query the <a href="https://github.com/zemirco/sf-city-lots-json">City Lots San Francisco in .json</a> data unless you make the following modification.</p>
+
+<p><img src="/docs/img/json-workaround.png" alt="drill query flow"></p>
+
+<p>After removing the extraneous square brackets in the coordinates array, you can drill down to query all the data for the lots.</p>
+
+<h3 id="lengthy-json-objects">Lengthy JSON objects</h3>
+
+<p>TBD statement about limits.</p>
+
+<h3 id="complex-json-objects">Complex JSON objects</h3>
+
+<p>Complex arrays and maps can be difficult or impossible to query.</p>
+
+<p>Workaround: Separate lengthy objects into objects delimited by curly braces using the following functions:</p>
+
+<p><a href="/docs/json-data-model#flatten-json-data">flatten</a> separates a set of nested JSON objects into individual rows in a DRILL table.
+<a href="/docs/json-data-model#generate-key-value-pairs">kvgen</a> separates objects having more elements than optimal for querying.</p>
+
+<h3 id="nested-column-names">Nested Column Names</h3>
+
+<p>You cannot use reserved words for nested column names because Drill returns null if you enclose n-level nested column names in back ticks. The previous example encloses the date and time column names in back ticks because the names are reserved words. The enclosure of column names in back ticks works because the date and time columns belong to the first level of the JSON object.</p>
+
+<p>For example, the following object contains the reserved word key, which you need to rename to <code>_key</code> or something other than non-reserved word:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">{
+  &quot;type&quot;: &quot;ticket&quot;,
+  &quot;channel&quot;: 123455,
+  &quot;_month&quot;: 12,
+  &quot;_day&quot;: [ 15, 25, 28, 31 ],
+  &quot;sales&quot;: {
+    &quot;NY&quot;: 532806,
+    &quot;PA&quot;: 112889,
+    &quot;TX&quot;: 898999,
+    &quot;UT&quot;: 10875
+    &quot;key&quot;: [ 78946, 39107, 76311 ]
+  }
+}
+</code></pre></div>
+<h3 id="schema-changes">Schema changes</h3>
+
+<p>Drill cannot read JSON files containing changes in the schema. For example, attempting to query an object having array elements of different data types cause an error:</p>
+
+<p><img src="/docs/img/data-sources-schemachg.png" alt="drill query flow"></p>
+
+<p>Drill interprets numbers that do not have a decimal point as BigInt values. In this example, Drill recognizes the first two coordinates as doubles and the third coordinate as a BigInt, which causes an error. </p>
+
+<p>Workaround: Set the <code>store.json.all_text_mode</code> property, described earlier, to true.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">ALTER SYSTEM SET `store.json.all_text_mode` = true;
+</code></pre></div>
+<h3 id="selecting-all-in-a-json-directory-query">Selecting all in a JSON directory query</h3>
+
+<p>Drill currently returns only fields common to all the files in a <a href="/docs/lesson-3-create-a-storage-plugin#query-multiple-files-in-a-directory">directory query</a> that selects all (SELECT *) JSON files.</p>
+
+<p>Workaround: Query each file individually.</p>
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright Â© 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>

Modified: drill/site/trunk/content/drill/docs/lession-1-learn-about-the-data-set/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/lession-1-learn-about-the-data-set/index.html?rev=1667403&r1=1667402&r2=1667403&view=diff
==============================================================================
--- drill/site/trunk/content/drill/docs/lession-1-learn-about-the-data-set/index.html (original)
+++ drill/site/trunk/content/drill/docs/lession-1-learn-about-the-data-set/index.html Tue Mar 17 21:08:21 2015
@@ -89,24 +89,16 @@ format.</p>
 
 <h2 id="before-you-begin">Before You Begin</h2>
 
-<h3 id="start-sqlline">Start sqlline</h3>
+<h3 id="start-sqlline">Start SQLLine</h3>
 
-<p>If sqlline is not already started, use a Terminal or Command window to log
-into the demo VM as root, then enter <code>sqlline</code>:</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">$ ssh root@10.250.0.6
-Password:
-Last login: Mon Sep 15 13:46:08 2014 from 10.250.0.28
-Welcome to your Mapr Demo virtual machine.
-[root@maprdemo ~]# sqlline
-sqlline version 1.1.6
-0: jdbc:drill:&gt;
-</code></pre></div>
-<p>You can run queries from this prompt to complete the tutorial. To exit from
-<code>sqlline</code>, type:</p>
+<p>If SQLLine is not already started, use a Terminal or Command window to log
+into the demo VM as root, then enter <code>sqlline</code>, as described in <a href="/docs/getting-to-know-the-drill-sandbox">&quot;Getting to Know the Sandbox&quot;</a>:</p>
+
+<p>You can run queries from the <code>sqlline</code> prompt to complete the tutorial. To exit from
+SQLLine, type:</p>
 <div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; !quit
 </code></pre></div>
-<p>Note that though this tutorial demonstrates the queries using SQLLine, you can
-also execute queries using the Drill Web UI.</p>
+<p>Examples in this tutorial use SQLLine. You can also execute queries using the Drill Web UI.</p>
 
 <h3 id="list-the-available-workspaces-and-databases:">List the available workspaces and databases:</h3>
 <div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; show databases;
@@ -119,7 +111,6 @@ also execute queries using the Drill Web
 | dfs.root    |
 | dfs.views   |
 | dfs.clicks  |
-| dfs.data    |
 | dfs.tmp     |
 | sys         |
 | maprdb      |
@@ -128,9 +119,9 @@ also execute queries using the Drill Web
 +-------------+
 12 rows selected
 </code></pre></div>
-<p>Note that this command exposes all the metadata available from the storage
-plugins configured with Drill as a set of schemas. This includes the Hive and
-MapR-DB databases as well as the workspaces configured in the file system. As
+<p>This command exposes all the metadata available from the storage
+plugins configured with Drill as a set of schemas. The Hive and
+MapR-DB databases, file system, and other data are configured in the file system. As
 you run queries in the tutorial, you will switch among these schemas by
 submitting the USE command. This behavior resembles the ability to use
 different database schemas (namespaces) in a relational database system.</p>
@@ -174,13 +165,13 @@ on the metadata available in the Hive me
 <h3 id="select-5-rows-from-the-orders-table:">Select 5 rows from the orders table:</h3>
 <div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; select * from orders limit 5;
 +------------+------------+------------+------------+------------+-------------+
-| order_id | month | cust_id | state | prod_id | order_total |
+|  order_id  |   month    |  cust_id   |   state    |  prod_id   | order_total |
 +------------+------------+------------+------------+------------+-------------+
-| 67212 | June | 10001 | ca | 909 | 13 |
-| 70302 | June | 10004 | ga | 420 | 11 |
-| 69090 | June | 10011 | fl | 44 | 76 |
-| 68834 | June | 10012 | ar | 0 | 81 |
-| 71220 | June | 10018 | az | 411 | 24 |
+| 67212      | June       | 10001      | ca         | 909        | 13          |
+| 70302      | June       | 10004      | ga         | 420        | 11          |
+| 69090      | June       | 10011      | fl         | 44         | 76          |
+| 68834      | June       | 10012      | ar         | 0          | 81          |
+| 71220      | June       | 10018      | az         | 411        | 24          |
 +------------+------------+------------+------------+------------+-------------+
 </code></pre></div>
 <p>Because orders is a Hive table, you can query the data in the same way that
@@ -314,7 +305,7 @@ a relational database âtable.â
 directly on files and directories without the need for up-front schema
 definitions or schema management for any model changes. The schema is
 discovered on the fly based on the query. Drill supports queries on a variety
-of file formats including text, CSV, Parquet, and JSON in the 0.5 release.</p>
+of file formats including text, CSV, Parquet, and JSON.</p>
 
 <p>In this example, the clickstream data coming from the mobile/web applications
 is in JSON format. The JSON files have the following structure:</p>
@@ -340,7 +331,7 @@ setup beyond the definition of a workspa
 </code></pre></div>
 <p>In this case, setting the workspace is a mechanism for making queries easier
 to write. When you specify a file system workspace, you can shorten references
-to files in the FROM clause of your queries. Instead of having to provide the
+to files in your queries. Instead of having to provide the
 complete path to a file, you can provide the path relative to a directory
 location specified in the workspace. For example:</p>
 <div class="highlight"><pre><code class="language-text" data-lang="text">&quot;location&quot;: &quot;/mapr/demo.mapr.com/data/nested&quot;

Modified: drill/site/trunk/content/drill/docs/lession-3-run-queries-on-complex-data-types/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/lession-3-run-queries-on-complex-data-types/index.html?rev=1667403&r1=1667402&r2=1667403&view=diff
==============================================================================
--- drill/site/trunk/content/drill/docs/lession-3-run-queries-on-complex-data-types/index.html (original)
+++ drill/site/trunk/content/drill/docs/lession-3-run-queries-on-complex-data-types/index.html Tue Mar 17 21:08:21 2015
@@ -112,19 +112,19 @@ exist. Here is a visual example of how t
 </code></pre></div>
 <h3 id="query-logs-data-for-a-specific-year:">Query logs data for a specific year:</h3>
 <div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; select * from logs where dir0=&#39;2013&#39; limit 10;
-+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+-----------+------------+
-| dir0 | dir1 | trans_id | date | time | cust_id | device | state | camp_id | keywords | prod_id | purch_flag |
-+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+-----------+------------+
-| 2013 | 11 | 12119 | 11/09/2013 | 02:24:51 | 262 | IOS5 | ny | 0 | chamber | 198 | false |
-| 2013 | 11 | 12120 | 11/19/2013 | 09:37:43 | 0 | AOS4.4 | il | 2 | outside | 511 | false |
-| 2013 | 11 | 12134 | 11/10/2013 | 23:42:47 | 60343 | IOS5 | ma | 4 | and | 421 | false |
-| 2013 | 11 | 12135 | 11/16/2013 | 01:42:13 | 46762 | AOS4.3 | ca | 4 | here&#39;s | 349 | false |
-| 2013 | 11 | 12165 | 11/26/2013 | 21:58:09 | 41987 | AOS4.2 | mn | 4 | he | 271 | false |
-| 2013 | 11 | 12168 | 11/09/2013 | 23:41:48 | 8600 | IOS5 | in | 6 | i | 459 | false |
-| 2013 | 11 | 12196 | 11/20/2013 | 02:23:06 | 15603 | IOS5 | tn | 1 | like | 324 | false |
-| 2013 | 11 | 12203 | 11/25/2013 | 23:50:29 | 221 | IOS6 | tx | 10 | if | 323 | false |
-| 2013 | 11 | 12206 | 11/09/2013 | 23:53:01 | 2488 | AOS4.2 | tx | 14 | unlike | 296 | false |
-| 2013 | 11 | 12217 | 11/06/2013 | 23:51:56 | 0 | AOS4.2 | tx | 9 | can&#39;t | 54 | false |
++------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
+|    dir0    |    dir1    |  trans_id  |    date    |    time    |  cust_id   |   device   |   state    |  camp_id   |  keywords  |  prod_id   | purch_flag |
++------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
+| 2013       | 2          | 12115      | 02/23/2013 | 19:48:24   | 3          | IOS5       | az         | 5          | who&#39;s      | 6          | false      |
+| 2013       | 2          | 12127      | 02/26/2013 | 19:42:03   | 11459      | IOS5       | wa         | 10         | for        | 331        | false      |
+| 2013       | 2          | 12138      | 02/09/2013 | 05:49:01   | 1          | IOS6       | ca         | 7          | minutes    | 500        | false      |
+| 2013       | 2          | 12139      | 02/23/2013 | 06:58:20   | 1          | AOS4.4     | ms         | 7          | i          | 20         | false      |
+| 2013       | 2          | 12145      | 02/10/2013 | 10:14:56   | 10         | IOS5       | mi         | 6          | wrong      | 42         | false      |
+| 2013       | 2          | 12157      | 02/15/2013 | 02:49:22   | 102        | IOS5       | ny         | 5          | want       | 95         | false      |
+| 2013       | 2          | 12176      | 02/19/2013 | 08:39:02   | 28         | IOS5       | or         | 0          | and        | 351        | false      |
+| 2013       | 2          | 12194      | 02/24/2013 | 08:26:17   | 125445     | IOS5       | ar         | 0          | say        | 500        | true       |
+| 2013       | 2          | 12236      | 02/05/2013 | 01:40:05   | 10         | IOS5       | nj         | 2          | sir        | 393        | false      |
+| 2013       | 2          | 12249      | 02/03/2013 | 04:45:47   | 21725      | IOS5       | nj         | 5          | no         | 414        | false      |
 +------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
 </code></pre></div>
 <p>This query constrains files inside the subdirectory named 2013. The variable
@@ -139,16 +139,13 @@ an IOS5 device in August 2013.</p>
 where dir0=&#39;2013&#39; and dir1=&#39;8&#39; and device=&#39;IOS5&#39; and purch_flag=&#39;true&#39;
 order by `date`;
 +------------+------------+------------+
-| yr | mth | cust_id |
+|     yr     |    mth     |  cust_id   |
 +------------+------------+------------+
-| 2013 | 8 | 4 |
-| 2013 | 8 | 521 |
-| 2013 | 8 | 1 |
-| 2013 | 8 | 2 |
-| 2013 | 8 | 4 |
-| 2013 | 8 | 549 |
-| 2013 | 8 | 72827 |
-| 2013 | 8 | 38127 |
+| 2013       | 8          | 4          |
+| 2013       | 8          | 521        |
+| 2013       | 8          | 1          |
+| 2013       | 8          | 2          |
+
 ...
 </code></pre></div>
 <h3 id="return-monthly-counts-per-customer-for-a-given-year:">Return monthly counts per customer for a given year:</h3>
@@ -208,13 +205,13 @@ data.</p>
 t.user_info.state as state
 from `clicks/clicks.json` t limit 5;
 +------------+------------+------------+
-| custid | device | state |
+|   custid   |   device   |   state    |
 +------------+------------+------------+
-| 22526 | IOS5 | il |
-| 16368 | AOS4.2 | nc |
-| 21449 | IOS6 | oh |
-| 20323 | IOS5 | oh |
-| 15360 | IOS5 | ca |
+| 22526      | IOS5       | il         |
+| 16368      | AOS4.2     | nc         |
+| 21449      | IOS6       | oh         |
+| 20323      | IOS5       | oh         |
+| 15360      | IOS5       | ca         |
 +------------+------------+------------+
 </code></pre></div>
 <p>This query uses a simple table.column.column notation to extract nested column
@@ -232,15 +229,14 @@ parsed as table names by the SQL parser.
 purchased
 from `clicks/clicks.json` t limit 5;
 +------------+------------+
-| prodid | purchased |
+|   prodid   | purchased  |
 +------------+------------+
-| [174,2] | false |
-| [] | false |
-| [582] | false |
-| [710,47] | false |
-| [0,8,170,173,1,124,46,764,30,711,0,3,25] | true |
-+------------+------------+
-5 rows selected
+| [174,2]    | false      |
+| []         | false      |
+| [582]      | false      |
+| [710,47]   | false      |
+| [0,8,170,173,1,124,46,764,30,711,0,3,25] | true       |
+    5 rows selected
 </code></pre></div>
 <p>Note that this result reveals that the prod_id column contains an array of IDs
 (one or more product ID values per row, separated by commas). The next step
@@ -299,15 +295,15 @@ from `clicks/clicks.json` t) sq
 where sq.prodid between 700 and 750 and sq.purchased=&#39;true&#39;
 order by sq.prodid;
 +------------+------------+------------+
-| trans_id | prodid | purchased |
+|  trans_id  |   prodid   | purchased  |
 +------------+------------+------------+
-| 21886 | 704 | true |
-| 20674 | 708 | true |
-| 22158 | 709 | true |
-| 34089 | 714 | true |
-| 22545 | 714 | true |
-| 37500 | 717 | true |
-| 36595 | 718 | true |
+| 21886      | 704        | true       |
+| 20674      | 708        | true       |
+| 22158      | 709        | true       |
+| 34089      | 714        | true       |
+| 22545      | 714        | true       |
+| 37500      | 717        | true       |
+| 36595      | 718        | true       |
 ...
 </code></pre></div>
 <p>This query assumes that there is some meaning to the array (that it is an
@@ -322,13 +318,13 @@ prod_count, t.trans_info.purch_flag as p
 from `clicks/clicks.json` t
 where t.trans_info.purch_flag = &#39;true&#39; order by prod_count desc;
 +------------+--------------+------------+------------+------------+------------+
-| trans_id | session_date | cust_id | device | prod_count | purch_flag |
+|  trans_id  | session_date |  cust_id   |   device   | prod_count | purch_flag |
 +------------+--------------+------------+------------+------------+------------+
-| 37426 | 2014-04-06 | 18709 | IOS5 | 34 | true |
-| 31589 | 2014-04-16 | 18576 | IOS6 | 31 | true |
-| 11600 | 2014-04-07 | 4260 | AOS4.2 | 28 | true |
-| 35074 | 2014-04-03 | 16697 | AOS4.3 | 27 | true |
-| 17192 | 2014-04-22 | 2501 | AOS4.2 | 26 | true |
+| 37426      | 2014-04-06   | 18709      | IOS5       | 34         | true       |
+| 31589      | 2014-04-16   | 18576      | IOS6       | 31         | true       |
+| 11600      | 2014-04-07   | 4260       | AOS4.2     | 28         | true       |
+| 35074      | 2014-04-03   | 16697      | AOS4.3     | 27         | true       |
+| 17192      | 2014-04-22   | 2501       | AOS4.2     | 26         | true       |
 ...
 </code></pre></div>
 <p>This query uses an SQL extension, the repeated_count function, to get an
@@ -357,7 +353,7 @@ where o.cust_id=t.user_info.cust_id
 and o.order_total &gt; (select avg(inord.order_total) 
 from hive.orders inord where inord.state = o.state);
 +------------+-------------+------------+
-|  cust_id   | order_total |   prod_id   |
+|  cust_id   | order_total |   prod_id  |
 +------------+-------------+------------+
 ...
 | 9650       | 69          | 16         |

Added: drill/site/trunk/content/drill/docs/lesson-1-connect-to-data-sources/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/lesson-1-connect-to-data-sources/index.html?rev=1667403&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/lesson-1-connect-to-data-sources/index.html (added)
+++ drill/site/trunk/content/drill/docs/lesson-1-connect-to-data-sources/index.html Tue Mar 17 21:08:21 2015
@@ -0,0 +1,180 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Lesson 1: Connect to Data Sources - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+  <li class="logo"><a href="/"></a></li>
+  <li>
+    <a href="/overview/">Documentation</a>
+    <ul>
+      <li><a href="/overview/">Overview&nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+      <li><a href="/why/">Why Drill? &nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="/architecture/">Architecture</a></li>
+    </ul>
+  </li>
+  <li>
+    <a href="/community/">Community</a>
+    <ul>
+      <li><a href="/team/">Team</a></li>
+      <li><a href="/community/#events">Events and Meetups</a></li>
+      <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+      <li><a href="/community/#getinvolved">Get Involved</a></li>
+      <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+      <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+    </ul>
+  </li>
+  <li><a href="/faq/">FAQ</a></li>
+  <li><a href="/blog/">Blog</a></li>
+  <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+  <li class="l"><span>&nbsp;</span></li>
+  <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>Lesson 1: Connect to Data Sources</h1>
+
+</div>
+
+<div class="int_text" align="left"><p>This lesson shows how to connect to default data sources that Drill installs
+and configures through storage plugins. You learn how to list the storage
+plugins as you would list databases in SQL.</p>
+
+<h2 id="list-the-storage-plugins">List the Storage Plugins</h2>
+
+<p>To list the default storage plugins, use the SHOW DATABASES command.</p>
+
+<ol>
+<li><p>Issue the SHOW DATABASES command.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:zk=local&gt; SHOW DATABASES;  
+</code></pre></div>
+<p>The output lists the storage plugins, which you use as a SQL database, in
+<database>.<workspace> format.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">+-------------+
+| SCHEMA_NAME |
++-------------+
+| dfs.default |
+| dfs.root    |
+| dfs.tmp     |
+| cp.default  |
+| sys         |
+| INFORMATION_SCHEMA |
++-------------+
+6 rows selected (0.977 seconds)
+</code></pre></div></li>
+<li><p>Take a look at the list of storage plugins and workspaces that Drill recognizes.</p></li>
+</ol>
+
+<ul>
+<li><code>dfs</code> is the storage plugin for connecting to the <a href="/docs/querying-a-file-system">file system</a> data source on your machine.</li>
+<li><code>cp</code> is a storage plugin for connecting to a JAR data source used with MapR.</li>
+<li><code>sys</code> is a storage plugin for connecting to Drill <a href="/docs/querying-system-tables">system tables</a>.</li>
+<li><a href="/docs/querying-the-information-schema">INFORMATION_SCHEMA</a> is a storage plugin for connecting to an ANSI standard set of metadata tables.</li>
+</ul>
+
+<h2 id="list-tables">List Tables</h2>
+
+<p>You choose a storage plugin using the USE command. The output shows the status
+and description of the operation. After connecting to a data source, you can
+list available tables.</p>
+
+<ol>
+<li><p>Select the <code>sys</code> storage plugin.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">  USE sys;
+  +------------+------------+
+  |     ok     |  summary   |
+  +------------+------------+
+  | true       | Default schema changed to &#39;sys&#39; |
+  +------------+------------+
+  1 row selected (0.034 seconds) 
+</code></pre></div></li>
+<li><p>List the tables in <code>sys</code>.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">  SHOW TABLES;
+  0: jdbc:drill:zk=local&gt; SHOW TABLES;  
+
+  +--------------+------------+
+  | TABLE_SCHEMA | TABLE_NAME |
+  +--------------+------------+
+  | sys          | drillbits  |
+  | sys          | version    |
+  | sys          | options    |
+  +--------------+------------+
+</code></pre></div></li>
+<li><p>Select the INFORMATION_SCHEMA storage plugin.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">  0: jdbc:drill:zk=local&gt; USE INFORMATION_SCHEMA;
+
+  +------------+------------+
+  |     ok     |  summary   |
+  +------------+------------+
+  | true       | Default schema changed to &#39;INFORMATION_SCHEMA&#39; |
+  +------------+------------+
+  1 row selected (0.023 seconds)
+</code></pre></div></li>
+<li><p>List the tables in INFORMATION_SCHEMA.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">  0: jdbc:drill:zk=local&gt; SHOW TABLES;  
+
+  +--------------+------------+
+  | TABLE_SCHEMA | TABLE_NAME |
+  +--------------+------------+
+  | INFORMATION_SCHEMA | VIEWS      |
+  | INFORMATION_SCHEMA | COLUMNS    |
+  | INFORMATION_SCHEMA | TABLES     |
+  | INFORMATION_SCHEMA | CATALOGS   |
+  | INFORMATION_SCHEMA | SCHEMATA   |
+  +--------------+------------+
+  5 rows selected (0.082 seconds)
+</code></pre></div></li>
+</ol>
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright Â© 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>

Added: drill/site/trunk/content/drill/docs/lesson-2-query-plain-text/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/lesson-2-query-plain-text/index.html?rev=1667403&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/lesson-2-query-plain-text/index.html (added)
+++ drill/site/trunk/content/drill/docs/lesson-2-query-plain-text/index.html Tue Mar 17 21:08:21 2015
@@ -0,0 +1,191 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Lesson 2: Query Plain Text - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+  <li class="logo"><a href="/"></a></li>
+  <li>
+    <a href="/overview/">Documentation</a>
+    <ul>
+      <li><a href="/overview/">Overview&nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+      <li><a href="/why/">Why Drill? &nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="/architecture/">Architecture</a></li>
+    </ul>
+  </li>
+  <li>
+    <a href="/community/">Community</a>
+    <ul>
+      <li><a href="/team/">Team</a></li>
+      <li><a href="/community/#events">Events and Meetups</a></li>
+      <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+      <li><a href="/community/#getinvolved">Get Involved</a></li>
+      <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+      <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+    </ul>
+  </li>
+  <li><a href="/faq/">FAQ</a></li>
+  <li><a href="/blog/">Blog</a></li>
+  <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+  <li class="l"><span>&nbsp;</span></li>
+  <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>Lesson 2: Query Plain Text</h1>
+
+</div>
+
+<div class="int_text" align="left"><p>The lesson shows you how to query a plain text file. Drill handles plain text
+files and directories like standard SQL tables and can infer knowledge about
+the schema of the data. No setup is required. For example, you do not need to
+perform extract, transform, and load (ETL) operations on the data source.
+Exercises in the tutorial demonstrate the general guidelines for querying a
+plain text file:</p>
+
+<ul>
+<li>Use a storage plugin that defines the file format, such as comma-separated (CSV) or tab-separated values (TSV), of the data in the plain text file.</li>
+<li>In the SELECT statement, use the <code>COLUMNS[n]</code> syntax in lieu of column names, which do not exist in a plain text file. The first column is column <code>0</code>.</li>
+<li>In the FROM clause, use the path to the plain text file instead of using a table name. Enclose the path and file name in backticks. </li>
+</ul>
+
+<h2 id="prerequisites">Prerequisites</h2>
+
+<p>This lesson uses a tab-separated value (TSV) files that you download from a
+Google internet site. The data in the file consists of phrases from books that
+Google scans and generates for its <a href="http://storage.googleapis.com/books/ngrams/books/datasetsv2.html">Google Books Ngram
+Viewer</a>. You
+use the data to find the relative frequencies of Ngrams.</p>
+
+<h2 id="about-the-data">About the Data</h2>
+
+<p>Each line in the TSV file has the following structure:</p>
+
+<p><code>ngram TAB year TAB match_count TAB volume_count NEWLINE</code></p>
+
+<p>For example, lines 1722089 and 1722090 in the file contain this data:</p>
+
+<table ><tbody><tr><th >ngram</th><th >year</th><th colspan="1" >match_count</th><th >volume_count</th></tr><tr><td ><p class="p1">Zoological Journal of the Linnean</p></td><td >2007</td><td colspan="1" >284</td><td >101</td></tr><tr><td colspan="1" ><p class="p1">Zoological Journal of the Linnean</p></td><td colspan="1" >2008</td><td colspan="1" >257</td><td colspan="1" >87</td></tr></tbody></table> 
+  
+
+<p>In 2007, &quot;Zoological Journal of the Linnean&quot; occurred 284 times overall in 101
+distinct books of the Google sample.</p>
+
+<h2 id="download-and-set-up-the-data">Download and Set Up the Data</h2>
+
+<p>After downloading the file, you use the <code>dfs</code> storage plugin, and then select
+data from the file as you would a table. In the SELECT statement, enclose the
+path and name of the file in backticks.</p>
+
+<ol>
+<li><p>Download the compressed Google Ngram data from this location:  </p>
+
+<p><a href="http://storage.googleapis.com/books/ngrams/books/googlebooks-eng-all-5gram-20120701-zo.gz">http://storage.googleapis.com/books/ngrams/books/googlebooks-eng-all-5gram-20120701-zo.gz</a>)</p></li>
+<li><p>Unzip the file.<br>
+ A file named googlebooks-eng-all-5gram-20120701-zo appears.</p></li>
+<li><p>Change the file name to add a <code>.tsv</code> extension.<br>
+The Drill <code>dfs</code> storage plugin definition includes a TSV format that requires
+a file to have this extension.</p></li>
+</ol>
+
+<h2 id="query-the-data">Query the Data</h2>
+
+<p>Get data about &quot;Zoological Journal of the Linnean&quot; that appears more than 250
+times a year in the books that Google scans.</p>
+
+<ol>
+<li><p>Switch back to using the <code>dfs</code> storage plugin.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">  USE dfs;
+</code></pre></div></li>
+<li><p>Issue a SELECT statement to get the first three columns in the file. In the FROM clause of the example, substitute your path to the TSV file. In the WHERE clause, enclose the string literal &quot;Zoological Journal of the Linnean&quot; in single quotation marks. Limit the output to 10 rows.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text"> SELECT COLUMNS[0], COLUMNS[1], COLUMNS[2]
+ FROM `/Users/drilluser/Downloads/googlebooks-eng-all-5gram-20120701-zo.tsv`
+ WHERE ((columns[0] = &#39;Zoological Journal of the Linnean&#39;)
+   AND (columns[2] &gt; 250)) LIMIT 10;
+</code></pre></div>
+<p>The output is:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text"> +------------+------------+------------+
+ |   EXPR$0   |   EXPR$1   |   EXPR$2   |
+ +------------+------------+------------+
+ | Zoological Journal of the Linnean | 1993       | 297        |
+ | Zoological Journal of the Linnean | 1997       | 255        |
+ | Zoological Journal of the Linnean | 2003       | 254        |
+ | Zoological Journal of the Linnean | 2007       | 284        |
+ | Zoological Journal of the Linnean | 2008       | 257        |
+ +------------+------------+------------+
+ 5 rows selected (1.599 seconds)
+</code></pre></div></li>
+<li><p>Repeat the query using aliases to replace the column headers, such as EXPR$0, with user-friendly column headers, Ngram, Publication Date, and Frequency. In the FROM clause of the example, substitute your path to the TSV file. </p>
+<div class="highlight"><pre><code class="language-text" data-lang="text"> SELECT COLUMNS[0] AS Ngram,
+        COLUMNS[1] AS Publication_Date,
+        COLUMNS[2] AS Frequency
+ FROM `/Users/drilluser/Downloads/googlebooks-eng-all-5gram-20120701-zo.tsv`
+ WHERE ((columns[0] = &#39;Zoological Journal of the Linnean&#39;)
+     AND (columns[2] &gt; 250)) LIMIT 10;
+</code></pre></div>
+<p>The improved output is:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text"> +------------+------------------+------------+
+ |   Ngram    | Publication_Date | Frequency  |
+ +------------+------------------+------------+
+ | Zoological Journal of the Linnean | 1993             | 297        |
+ | Zoological Journal of the Linnean | 1997             | 255        |
+ | Zoological Journal of the Linnean | 2003             | 254        |
+ | Zoological Journal of the Linnean | 2007             | 284        |
+ | Zoological Journal of the Linnean | 2008             | 257        |
+ +------------+------------------+------------+
+ 5 rows selected (1.628 seconds)
+</code></pre></div></li>
+</ol>
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright Â© 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>

Added: drill/site/trunk/content/drill/docs/lesson-3-create-a-storage-plugin/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/lesson-3-create-a-storage-plugin/index.html?rev=1667403&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/lesson-3-create-a-storage-plugin/index.html (added)
+++ drill/site/trunk/content/drill/docs/lesson-3-create-a-storage-plugin/index.html Tue Mar 17 21:08:21 2015
@@ -0,0 +1,232 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Lesson 3: Create a Storage Plugin - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+  <li class="logo"><a href="/"></a></li>
+  <li>
+    <a href="/overview/">Documentation</a>
+    <ul>
+      <li><a href="/overview/">Overview&nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+      <li><a href="/why/">Why Drill? &nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="/architecture/">Architecture</a></li>
+    </ul>
+  </li>
+  <li>
+    <a href="/community/">Community</a>
+    <ul>
+      <li><a href="/team/">Team</a></li>
+      <li><a href="/community/#events">Events and Meetups</a></li>
+      <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+      <li><a href="/community/#getinvolved">Get Involved</a></li>
+      <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+      <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+    </ul>
+  </li>
+  <li><a href="/faq/">FAQ</a></li>
+  <li><a href="/blog/">Blog</a></li>
+  <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+  <li class="l"><span>&nbsp;</span></li>
+  <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>Lesson 3: Create a Storage Plugin</h1>
+
+</div>
+
+<div class="int_text" align="left"><p>The Drill default storage plugins support common file formats. If you need
+support for some other file format, create a custom storage plugin. You can also create a storage plugin to simplify querying file having long path names. A workspace name replaces the long path name.</p>
+
+<p>This lesson covers how to create and use a storage plugin to simplify queries. First,
+you create the storage plugin in the Drill Web UI. Next, you connect to the
+file through the plugin to query a file, and then a directory, and finally you
+query multiple files in a directory.</p>
+
+<h2 id="create-a-storage-plugin">Create a Storage Plugin</h2>
+
+<p>You can create a storage plugin using the Apache Drill Web UI.</p>
+
+<ol>
+<li>Create an <code>ngram</code> directory on your file system.</li>
+<li>Copy <code>googlebooks-eng-all-5gram-20120701-zo.tsv</code> to the <code>ngram</code> directory.</li>
+<li>Open the Drill Web UI by navigating to <a href="http://localhost:8047/storage">http://localhost:8047/storage</a>.<br>
+ To open the Drill Web UI, SQLLine must still be running.</li>
+<li>In New Storage Plugin, type <code>myplugin</code>.<br>
+ <img src="/docs/img/ngram_plugin.png" alt="new plugin"><br></li>
+<li>Click <strong>Create</strong>.<br>
+ The Configuration screen appears.</li>
+<li><p>Replace null with the following storage plugin definition, except on the location line, use the path to your <code>ngram</code> directory instead of the drilluser&#39;s path and give your workspace an arbitrary name, for example, ngram:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">{
+  &quot;type&quot;: &quot;file&quot;,
+  &quot;enabled&quot;: true,
+  &quot;connection&quot;: &quot;file:///&quot;,
+  &quot;workspaces&quot;: {
+    &quot;ngram&quot;: {
+      &quot;location&quot;: &quot;/Users/drilluser/ngram&quot;,
+      &quot;writable&quot;: false,
+      &quot;defaultInputFormat&quot;: null
+   }
+ },
+ &quot;formats&quot;: {
+   &quot;tsv&quot;: {
+     &quot;type&quot;: &quot;text&quot;,
+     &quot;extensions&quot;: [
+       &quot;tsv&quot;
+     ],
+     &quot;delimiter&quot;: &quot;\t&quot;
+    }
+  }
+}
+</code></pre></div></li>
+<li><p>Click <strong>Create</strong>.<br>
+ The success message appears briefly.</p></li>
+<li><p>Click <strong>Back</strong>.<br>
+ The new plugin appears in Enabled Storage Plugins.<br>
+ <img src="/docs/img/ngram_plugin.png" alt="new plugin"> </p></li>
+<li><p>Go back to the SQLLine prompt in the CLI, and list the storage plugins. Press RETURN in the CLI to get a prompt if necessary.</p></li>
+</ol>
+
+<p>Your custom plugin appears in the list and has two workspaces: the <code>ngram</code>
+workspace that you defined and a default workspace.</p>
+
+<h2 id="connect-to-and-query-a-file">Connect to and Query a File</h2>
+
+<p>When querying the same data source repeatedly, avoiding long path names is
+important. This exercise demonstrates how to simplify the query. Instead of
+using the full path to the Ngram file, you use dot notation in the FROM
+clause.</p>
+
+<p><code>&lt;workspace name&gt;.`&lt;location&gt;</code>`</p>
+
+<p>This syntax assumes you connected to a storage plugin that defines the
+location of the data. To query the data source while you are <em>not</em> connected to
+that storage plugin, include the plugin name:</p>
+
+<p><code>&lt;plugin name&gt;.&lt;workspace name&gt;.`&lt;location&gt;</code>`</p>
+
+<p>This exercise shows how to query Ngram data when you are, and when you are
+not, connected to <code>myplugin</code>.</p>
+
+<ol>
+<li>Connect to the ngram file through the custom storage plugin.<br>
+ <code>USE myplugin;</code></li>
+<li><p>Get data about &quot;Zoological Journal of the Linnean&quot; that appears more than 250 times a year in the books that Google scans. In the FROM clause, instead of using the full path to the file as you did in the last exercise, connect to the data using the storage plugin workspace name ngram.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text"> SELECT COLUMNS[0], 
+        COLUMNS[1], 
+        COLUMNS[2] 
+ FROM ngram.`/googlebooks-eng-all-5gram-20120701-zo.tsv` 
+ WHERE ((columns[0] = &#39;Zoological Journal of the Linnean&#39;) 
+  AND (columns[2] &gt; 250)) 
+ LIMIT 10;
+</code></pre></div>
+<p>The output consists of 5 rows of data.  </p></li>
+<li><p>Switch to the <code>dfs</code> storage plugin.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text"> 0: jdbc:drill:zk=local&gt; USE dfs;
+ +------------+------------+
+ |     ok     |  summary   |
+ +------------+------------+
+ | true       | Default schema changed to &#39;dfs&#39; |
+ +------------+------------+
+ 1 row selected (0.019 seconds)
+</code></pre></div></li>
+<li><p>Query the TSV file again. Because you switched to <code>dfs</code>, Drill does not know the location of the file. To provide the information to Drill, preface the file name with the storage plugin and workspace names in the FROM clause.  </p>
+<div class="highlight"><pre><code class="language-text" data-lang="text"> SELECT COLUMNS[0], 
+        COLUMNS[1], 
+        COLUMNS[2] 
+ FROM myplugin.ngram.`/googlebooks-eng-all-5gram-20120701-zo.tsv` 
+ WHERE ((columns[0] = &#39;Zoological Journal of the Linnean&#39;) 
+   AND (columns[2] &gt; 250)) 
+ LIMIT 10;
+</code></pre></div></li>
+</ol>
+
+<h2 id="query-multiple-files-in-a-directory">Query Multiple Files in a Directory</h2>
+
+<p>In this exercise, first you create a subdirectory in the <code>ngram</code> directory.
+Next, you download, unzip, and add an extension to a second Ngram file. You
+move both Ngram TSV files to the subdirectory. Finally, using the custom
+plugin workspace, you query both files. In the FROM clause, simply reference
+the subdirectory.</p>
+
+<ol>
+<li><p>Download a second file of compressed Google Ngram data from this location: </p>
+
+<p><a href="http://storage.googleapis.com/books/ngrams/books/googlebooks-eng-all-2gram-20120701-ze.gz">http://storage.googleapis.com/books/ngrams/books/googlebooks-eng-all-2gram-20120701-ze.gz</a></p></li>
+<li><p>Unzip <code>googlebooks-eng-all-2gram-20120701-ze.gz</code> and move <code>googlebooks-eng-all-2gram-20120701-ze</code> to the <code>ngram/myfiles</code> subdirectory. </p></li>
+<li><p>Change the name of <code>googlebooks-eng-all-2gram-20120701-ze</code> to add a <code>.tsv</code> extension.    </p></li>
+<li><p>Move the 5gram file you worked with earlier <code>googlebooks-eng-all-5gram-20120701-zo.tsv</code> from the <code>ngram</code> directory to the <code>ngram/myfiles</code> subdirectory.</p></li>
+<li><p>At the SQLLine prompt, use the <code>myplugin.ngrams</code> workspace. </p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">  USE myplugin.ngram;
+</code></pre></div></li>
+<li><p>Query the myfiles directory for the &quot;Zoological Journal of the Linnean&quot; or &quot;zero temperatures&quot; in books published in 1998.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">  SELECT * 
+  FROM myfiles 
+  WHERE (((COLUMNS[0] = &#39;Zoological Journal of the Linnean&#39;)
+    OR (COLUMNS[0] = &#39;zero temperatures&#39;)) 
+    AND (COLUMNS[1] = &#39;1998&#39;));
+</code></pre></div>
+<p>The output lists ngrams from both files.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">  +------------+
+  |  columns   |
+  +------------+
+  | [&quot;Zoological Journal of the Linnean&quot;,&quot;1998&quot;,&quot;157&quot;,&quot;53&quot;] |
+  | [&quot;zero temperatures&quot;,&quot;1998&quot;,&quot;628&quot;,&quot;487&quot;] |
+  +------------+
+  2 rows selected (5.316 seconds)
+</code></pre></div></li>
+</ol>
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright Â© 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>

Added: drill/site/trunk/content/drill/docs/lexical-structure/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/lexical-structure/index.html?rev=1667403&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/lexical-structure/index.html (added)
+++ drill/site/trunk/content/drill/docs/lexical-structure/index.html Tue Mar 17 21:08:21 2015
@@ -0,0 +1,230 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Lexical Structure - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+  <li class="logo"><a href="/"></a></li>
+  <li>
+    <a href="/overview/">Documentation</a>
+    <ul>
+      <li><a href="/overview/">Overview&nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+      <li><a href="/why/">Why Drill? &nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="/architecture/">Architecture</a></li>
+    </ul>
+  </li>
+  <li>
+    <a href="/community/">Community</a>
+    <ul>
+      <li><a href="/team/">Team</a></li>
+      <li><a href="/community/#events">Events and Meetups</a></li>
+      <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+      <li><a href="/community/#getinvolved">Get Involved</a></li>
+      <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+      <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+    </ul>
+  </li>
+  <li><a href="/faq/">FAQ</a></li>
+  <li><a href="/blog/">Blog</a></li>
+  <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+  <li class="l"><span>&nbsp;</span></li>
+  <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>Lexical Structure</h1>
+
+</div>
+
+<div class="int_text" align="left"><p>A SQL statement used in Drill can include one or more of the following parts:</p>
+
+<ul>
+<li>Clause, such as FROM</li>
+<li>Command, such as SELECT </li>
+<li>Expression, a combination of one or more values, operators, and SQL functions that evaluates to a value. For example, users.firstname is a period expression</li>
+<li>Function, scalar and aggregate, such as sum</li>
+<li><p>Literal value</p>
+
+<ul>
+<li><a href="/docs/lexical-structure#boolean">Boolean</a></li>
+<li><a href="/docs/lexical-structure#identifier">Identifier</a></li>
+<li><a href="/docs/lexical-structure#integer">Integer</a></li>
+<li><a href="/docs/lexical-structure#numeric-constant">Numeric constant</a></li>
+<li><a href="/docs/lexical-structure#string">String</a></li>
+</ul></li>
+<li><p>Operator, such as [NOT] IN, LIKE, and AND</p></li>
+<li><p>Predicate, such as a &gt; b in <code>SELECT * FROM myfile WHERE a &gt; b</code>.</p></li>
+<li><p><a href="/docs/lexical-structure#storage-plugin-and-workspace-references">Storage plugin and workspace reference</a></p></li>
+<li><p>Whitespace</p></li>
+</ul>
+
+<p>The upper/lowercase sensitivity of the parts differs.</p>
+
+<h2 id="case-sensitivity">Case-sensitivity</h2>
+
+<p>SQL function and command names are case-insensitive. Storage plugin and workspace names are case-sensitive. Column and table names are case-insensitive unless enclosed in double quotation marks. The double-quotation mark character can be used as an escape character for the double quotation mark.</p>
+
+<p>Keywords are case-insensitive. For example, the keywords SELECT and select are equivalent. This document shows keywords in uppercase.</p>
+
+<p>The sys.options table name and values are case-sensitive. The following query works:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">SELECT * FROM sys.options where NAME like &#39;%parquet%&#39;;
+</code></pre></div>
+<p>When using the ALTER command, specify the name in lower case. For example:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">ALTER SESSION  set `store.parquet.compression`=&#39;snappy&#39;;
+</code></pre></div>
+<h2 id="storage-plugin-and-workspace-references">Storage Plugin and Workspace References</h2>
+
+<p>Storage plugin and workspace names are case-sensitive. The case of the name used in the query and the name in the storage plugin definition need to match. For example, defining a storage plugin named <code>dfs</code> and then referring to the plugin as <code>DFS</code> fails, but this query succeeds:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">SELECT * FROM dfs.`/Users/drilluser/ticket_sales.json`;
+</code></pre></div>
+<h2 id="literal-values">Literal Values</h2>
+
+<p>This section describes how to construct literals.</p>
+
+<h3 id="boolean">Boolean</h3>
+
+<p>Boolean values are true or false and are case-insensitive. Do not enclose the values in quotation marks.</p>
+
+<h3 id="identifier">Identifier</h3>
+
+<p>An identifier is a letter followed by any sequence of letters, digits, or the underscore. For example, names of tables, columns, and aliases are identifiers. Maximum length is 1024 characters. Enclose the following identifiers in back ticks:</p>
+
+<ul>
+<li>Keywords</li>
+<li>Identifiers that SQL cannot parse. </li>
+</ul>
+
+<p>For example, enclose the SQL keywords date and time in back ticks when referring to column names, but not when referring to data types:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">CREATE TABLE dfs.tmp.sampleparquet AS 
+(SELECT trans_id, 
+cast(`date` AS date) transdate, 
+cast(`time` AS time) transtime, 
+cast(amount AS double) amountm,
+user_info, marketing_info, trans_info 
+FROM dfs.`/Users/drilluser/sample.json`);
+</code></pre></div>
+<p>Table and column names are case-insensitive. Use back ticks to enclose names that contain special characters. Special characters are those other than the 52 Latin alphabet characters. For example, space and @ are special characters. </p>
+
+<p>The following example shows the keyword Year enclosed in back ticks. Because the column alias contains the special space character, also enclose the alias in back ticks, as shown in the following example:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">SELECT extract(year from transdate) AS `Year`, t.user_info.cust_id AS `Customer Number` FROM dfs.tmp.`sampleparquet` t;
+
++------------+-----------------+
+|    Year    | Customer Number |
++------------+-----------------+
+| 2013       | 28              |
+| 2013       | 86623           |
+| 2013       | 11              |
+| 2013       | 666             |
+| 2013       | 999             |
++------------+-----------------+
+5 rows selected (0.051 seconds)
+</code></pre></div>
+<h3 id="integer">Integer</h3>
+
+<p>An integer value consists of an optional minus sign, -, followed by one or more digits.</p>
+
+<h3 id="numeric-constant">Numeric constant</h3>
+
+<p>Numeric constants include integers, floats, and values in E notation.</p>
+
+<ul>
+<li>Integers: 0-9 and a minus sign prefix</li>
+<li>Float: a series of one or more decimal digits, followed by a period, ., and one or more digits in decimal places. There is no optional + sign. Leading or trailing zeros are required before and after decimal points. For example, 0.52 and 52.0. </li>
+<li><p>E notation: Approximate-value numeric literals in scientific notation consist of a mantissa and exponent. Either or both parts can be signed. For example: 1.2E3, 1.2E-3, -1.2E3, -1.2E-3. Values consist of an optional negative sign (using -), a floating point number, letters e or E, a positive or negative sign (+ or -), and an integer exponent. For example, the following JSON file has data in E notation in two records.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">{&quot;trans_id&quot;:0,
+ &quot;date&quot;:&quot;2013-07-26&quot;,
+ &quot;time&quot;:&quot;04:56:59&quot;,
+ &quot;amount&quot;:-2.6034345E+38,
+ &quot;trans_info&quot;:{&quot;prod_id&quot;:[16],
+ &quot;purch_flag&quot;:&quot;false&quot;
+}}
+
+{&quot;trans_id&quot;:1,
+ &quot;date&quot;:&quot;2013-05-16&quot;,
+ &quot;time&quot;:&quot;07:31:54&quot;,
+ &quot;amount&quot;:1.8887898E+38,
+ &quot;trans_info&quot;:{&quot;prod_id&quot;:[],
+ &quot;purch_flag&quot;:&quot;false&quot;
+}}
+</code></pre></div>
+<p>Aggregating the data in Drill produces scientific notation in the output:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">SELECT sum(amount) FROM dfs.`/Users/khahn/Documents/sample2.json`;
+
++------------+
+|   EXPR$0   |
++------------+
+| -7.146447E37 |
++------------+
+1 row selected (0.044 seconds)
+</code></pre></div></li>
+</ul>
+
+<p>Drill represents invalid values, such as the square root of a negative number, as NaN.</p>
+
+<h3 id="string">String</h3>
+
+<p>Strings are characters enclosed in single quotation marks. To use a single quotation mark itself (apostrophe) in a string, escape it using a single quotation mark. For example, the value Martha&#39;s Vineyard in the SOURCE column in the <code>vitalstat.json</code> file contains an apostrophe:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">+------------+
+|   SOURCE   |
++------------+
+| Martha&#39;s Vineyard |
+| Monroe County |
++------------+
+2 rows selected (0.053 seconds)
+</code></pre></div>
+<p>To refer to the string Martha&#39;s Vineyard in a query, use single quotation marks to enclose the string and escape the apostophe using a single quotation mark:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">SELECT * FROM dfs.`/Users/drilluser/vitalstat.json` t 
+WHERE t.source = &#39;Martha&#39;&#39;s Vineyard&#39;;
+</code></pre></div></div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright Â© 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>