Posted to commits@drill.apache.org by ts...@apache.org on 2015/01/15 06:11:48 UTC

svn commit: r1651949 [7/13] - in /drill/site/trunk/content/drill: ./ blog/2014/11/19/sql-on-mongodb/ blog/2014/12/02/drill-top-level-project/ blog/2014/12/09/running-sql-queries-on-amazon-s3/ blog/2014/12/11/apache-drill-qa-panelist-spotlight/ blog/201...

Added: drill/site/trunk/content/drill/docs/installing-drill-on-windows/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/installing-drill-on-windows/index.html?rev=1651949&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/installing-drill-on-windows/index.html (added)
+++ drill/site/trunk/content/drill/docs/installing-drill-on-windows/index.html Thu Jan 15 05:11:44 2015
@@ -0,0 +1,142 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Installing Drill on Windows - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+  <li class="logo"><a href="/"></a></li>
+  <li>
+    <a href="/overview/">Documentation</a>
+    <ul>
+      <li><a href="/overview/">Overview&nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+      <li><a href="/why/">Why Drill? &nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="/architecture/">Architecture</a></li>
+    </ul>
+  </li>
+  <li>
+    <a href="/community/">Community</a>
+    <ul>
+      <li><a href="/team/">Team</a></li>
+      <li><a href="/community/#events">Events and Meetups</a></li>
+      <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+      <li><a href="/community/#getinvolved">Get Involved</a></li>
+      <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+      <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+    </ul>
+  </li>
+  <li><a href="/faq/">FAQ</a></li>
+  <li><a href="/blog/">Blog</a></li>
+  <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+  <li class="l"><span>&nbsp;</span></li>
+  <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>Installing Drill on Windows</h1>
+
+</div>
+
+<div class="int_text" align="left"><p>You can install Drill on Windows 7 or 8. Before you install Drill, you must
+install JDK 7 and set the <code>JAVA_HOME</code> variable in the Windows Environment
+Variables. You also need a utility that can extract <code>.tar.gz</code> archives, such as
+<a href="http://www.7-zip.org/">7-Zip</a>. These instructions
+assume that 7-Zip is installed and use it to extract the Drill archive file
+that you download.</p>
+
+<h4 id="setting-java_home">Setting JAVA_HOME</h4>
+
+<p>Complete the following steps to set <code>JAVA_HOME</code>:</p>
+
+<ol>
+<li>Navigate to <code>Control Panel\All Control Panel Items\System</code>, and select <strong>Advanced System Settings</strong>. The System Properties window appears.</li>
+<li>On the Advanced tab, click <strong>Environment Variables</strong>. The Environment Variables window appears.</li>
+<li><p>Add or edit the <code>JAVA_HOME</code> variable so that it points to the directory where the JDK is installed.</p>
+
+<p><strong>Example</strong></p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">C:\Program Files\Java\jdk1.7.0_65
+</code></pre></div></li>
+<li><p>Click <strong>OK</strong> to save your changes and close the windows.</p></li>
+</ol>
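Before you install Drill, you can optionally confirm that <code>JAVA_HOME</code> points to a JDK. The following is a minimal Python sketch and is not a Drill tool. The helper name <code>java_home_looks_valid</code> is hypothetical. The sketch assumes the usual JDK layout, in which <code>bin</code> contains both the <code>java</code> launcher and the <code>javac</code> compiler. A JRE has no <code>javac</code>.

```python
import os
from pathlib import Path


def java_home_looks_valid(java_home=None):
    """Hypothetical helper: return True if java_home looks like a JDK root.

    Falls back to the JAVA_HOME environment variable when no path is given.
    Assumes a JDK (unlike a JRE) ships javac next to java in its bin folder.
    """
    if java_home is None:
        java_home = os.environ.get("JAVA_HOME")
    if not java_home:
        return False  # JAVA_HOME is unset or empty
    bin_dir = Path(java_home) / "bin"
    # Accept both the Windows (.exe) and Unix executable names.
    has_java = any((bin_dir / name).is_file() for name in ("java.exe", "java"))
    has_javac = any((bin_dir / name).is_file() for name in ("javac.exe", "javac"))
    return has_java and has_javac


if __name__ == "__main__":
    if java_home_looks_valid():
        print("JAVA_HOME points to a JDK")
    else:
        print("JAVA_HOME is missing or does not point to a JDK")
```

Run the check from a new Command Prompt window. Windows that were already open when you changed the variable do not see the new value.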
+
+<h4 id="installing-drill">Installing Drill</h4>
+
+<p>Complete the following steps to install Drill:</p>
+
+<ol>
+<li><p>Create a <code>drill</code> directory on your <code>C:\</code> drive (or in another location, if you prefer).</p>
+
+<p><strong>Example</strong>: <code>C:\drill</code></p>
+
+<p>Do not include spaces in the directory path. If the path contains spaces,
+Drill fails to run.</p></li>
+<li><p>Click the following link to download the latest stable version of Apache Drill:</p>
+
+<p><a href="http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-drill-0.7.0.tar.gz">http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-drill-0.7.0.tar.gz</a></p></li>
+<li><p>Move the <code>apache-drill-&lt;version&gt;.tar.gz</code> file to the <code>drill</code> directory that you created on your <code>C:\</code> drive.</p></li>
+<li><p>Extract the <code>TAR.GZ</code> file, and then extract the resulting <code>TAR</code> file:</p>
+
+<p>a. Right-click <code>apache-drill-&lt;version&gt;.tar.gz</code>, and select <strong>7-Zip &gt; Extract Here</strong>. The utility extracts the <code>apache-drill-&lt;version&gt;.tar</code> file.<br>
+b. Right-click <code>apache-drill-&lt;version&gt;.tar</code>, and select <strong>7-Zip &gt; Extract Here</strong>. The utility extracts the <code>apache-drill-&lt;version&gt;</code> folder.</p></li>
+<li><p>Open the <code>apache-drill-&lt;version&gt;</code> folder.</p></li>
+<li><p>Open the <code>bin</code> folder, and double-click the <code>sqlline.bat</code> file. The Windows command prompt opens.</p></li>
+<li><p>At the <code>sqlline&gt;</code> prompt, type <code>!connect jdbc:drill:zk=local</code>, and then press <strong>Enter</strong>.</p></li>
+<li><p>Enter the username and password:<br>
+a. When prompted, enter the username <code>admin</code>, and then press <strong>Enter</strong>.<br>
+b. When prompted, enter the password <code>admin</code>, and then press <strong>Enter</strong>. The cursor blinks for a few seconds, and then the <code>0: jdbc:drill:zk=local&gt;</code> prompt displays.</p></li>
+</ol>
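Step 4 uses 7-Zip to extract the archive in two passes. As an alternative, Python's standard <code>tarfile</code> module can read a <code>.tar.gz</code> file directly in a single pass. The following sketch is not part of the Drill distribution, and the helper name <code>extract_drill</code> is hypothetical. It also enforces the no-spaces rule from step 1.

```python
import tarfile
from pathlib import Path


def extract_drill(archive, target_dir):
    """Hypothetical helper: extract apache-drill-<version>.tar.gz in one step.

    tarfile decompresses the gzip layer and unpacks the tar layer together,
    so no intermediate .tar file is written. Raises ValueError if the target
    path contains a space, because Drill fails to run from such a path.
    Returns the top-level paths that were extracted.
    """
    target = Path(target_dir).resolve()
    if " " in str(target):
        raise ValueError(f"Drill directory must not contain spaces: {target}")
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:gz") as tar:
        top_level = sorted({m.name.split("/")[0] for m in tar.getmembers()})
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons support extraction filters; "data" rejects unsafe members.
            tar.extractall(path=target, filter="data")
        else:
            tar.extractall(path=target)
    return [target / name for name in top_level]


if __name__ == "__main__":
    import sys
    # Usage: python extract_drill.py apache-drill-0.7.0.tar.gz C:\drill
    if len(sys.argv) == 3:
        for path in extract_drill(sys.argv[1], sys.argv[2]):
            print(path)
```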
+
+<p>At this point, you can submit queries to Drill. Refer to the <a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes#ApacheDrillin10Minutes-QuerySampleData">Query Sample Data</a>
+section of <em>Apache Drill in 10 Minutes</em>.</p>
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright © 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>

Added: drill/site/trunk/content/drill/docs/installing-the-apache-drill-sandbox/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/installing-the-apache-drill-sandbox/index.html?rev=1651949&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/installing-the-apache-drill-sandbox/index.html (added)
+++ drill/site/trunk/content/drill/docs/installing-the-apache-drill-sandbox/index.html Thu Jan 15 05:11:44 2015
@@ -0,0 +1,141 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Installing the Apache Drill Sandbox - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+  <li class="logo"><a href="/"></a></li>
+  <li>
+    <a href="/overview/">Documentation</a>
+    <ul>
+      <li><a href="/overview/">Overview&nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+      <li><a href="/why/">Why Drill? &nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="/architecture/">Architecture</a></li>
+    </ul>
+  </li>
+  <li>
+    <a href="/community/">Community</a>
+    <ul>
+      <li><a href="/team/">Team</a></li>
+      <li><a href="/community/#events">Events and Meetups</a></li>
+      <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+      <li><a href="/community/#getinvolved">Get Involved</a></li>
+      <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+      <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+    </ul>
+  </li>
+  <li><a href="/faq/">FAQ</a></li>
+  <li><a href="/blog/">Blog</a></li>
+  <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+  <li class="l"><span>&nbsp;</span></li>
+  <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>Installing the Apache Drill Sandbox</h1>
+
+</div>
+
+<div class="int_text" align="left"><p>This tutorial uses the MapR Sandbox, which is a Hadoop environment pre-configured with Apache Drill.</p>
+
+<p>To complete the tutorial on the MapR Sandbox with Apache Drill, work through
+the following pages in order:</p>
+
+<ul>
+<li><a href="/confluence/display/DRILL/Installing+the+Apache+Drill+Sandbox">Installing the Apache Drill Sandbox</a></li>
+<li><a href="/confluence/display/DRILL/Getting+to+Know+the+Drill+Setup">Getting to Know the Drill Setup</a></li>
+<li><a href="/confluence/display/DRILL/Lesson+1%3A+Learn+About+the+Data+Set">Lesson 1: Learn About the Data Set</a></li>
+<li><a href="/confluence/display/DRILL/Lesson+2%3A+Run+Queries+with+ANSI+SQL">Lesson 2: Run Queries with ANSI SQL</a></li>
+<li><a href="/confluence/display/DRILL/Lesson+3%3A+Run+Queries+on+Complex+Data+Types">Lesson 3: Run Queries on Complex Data Types</a></li>
+<li><a href="/confluence/display/DRILL/Summary">Summary</a></li>
+</ul>
+
+<h1 id="about-apache-drill">About Apache Drill</h1>
+
+<p>Drill is an Apache open-source SQL query engine for Big Data exploration.
+Drill is designed from the ground up to support high-performance analysis on
+the semi-structured and rapidly evolving data coming from modern Big Data
+applications, while still providing the familiarity and ecosystem of ANSI SQL,
+the industry-standard query language. Drill provides plug-and-play integration
+with existing Apache Hive and Apache HBase deployments. Apache Drill 0.5 offers
+the following key features:</p>
+
+<ul>
+<li><p>Low-latency SQL queries</p></li>
+<li><p>Dynamic queries on self-describing data in files (such as JSON, Parquet, text) and MapR-DB/HBase tables, without requiring metadata definitions in the Hive metastore.</p></li>
+<li><p>ANSI SQL</p></li>
+<li><p>Nested data support</p></li>
+<li><p>Integration with Apache Hive (queries on Hive tables and views, support for all Hive file formats and Hive UDFs)</p></li>
+<li><p>BI/SQL tool integration using standard JDBC/ODBC drivers</p></li>
+</ul>
+
+<h1 id="mapr-sandbox-with-apache-drill">MapR Sandbox with Apache Drill</h1>
+
+<p>MapR includes Apache Drill as part of its Hadoop distribution. The MapR
+Sandbox with Apache Drill is a fully functional single-node cluster that you can
+use to get an overview of Apache Drill in a Hadoop environment. Business
+and technical analysts, product managers, and developers can use the sandbox
+environment to get a feel for the power and capabilities of Apache Drill by
+running various types of queries. Once you are familiar with the technology,
+refer to the <a href="http://drill.apache.org/">Apache Drill website</a> and the
+<a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+Wiki">Apache Drill documentation</a>
+for more details.</p>
+
+<p>Note that Hadoop is not a prerequisite for Drill. You can start ramping
+up with Drill by running SQL queries directly on the local file system. Refer
+to <a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes">Apache Drill in 10 Minutes</a> for an introduction to using Drill in local
+(embedded) mode.</p>
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright © 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>

Added: drill/site/trunk/content/drill/docs/installing-the-mapr-sandbox-with-apache-drill-on-virtualbox/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/installing-the-mapr-sandbox-with-apache-drill-on-virtualbox/index.html?rev=1651949&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/installing-the-mapr-sandbox-with-apache-drill-on-virtualbox/index.html (added)
+++ drill/site/trunk/content/drill/docs/installing-the-mapr-sandbox-with-apache-drill-on-virtualbox/index.html Thu Jan 15 05:11:44 2015
@@ -0,0 +1,155 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Installing the MapR Sandbox with Apache Drill on VirtualBox - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+  <li class="logo"><a href="/"></a></li>
+  <li>
+    <a href="/overview/">Documentation</a>
+    <ul>
+      <li><a href="/overview/">Overview&nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+      <li><a href="/why/">Why Drill? &nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="/architecture/">Architecture</a></li>
+    </ul>
+  </li>
+  <li>
+    <a href="/community/">Community</a>
+    <ul>
+      <li><a href="/team/">Team</a></li>
+      <li><a href="/community/#events">Events and Meetups</a></li>
+      <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+      <li><a href="/community/#getinvolved">Get Involved</a></li>
+      <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+      <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+    </ul>
+  </li>
+  <li><a href="/faq/">FAQ</a></li>
+  <li><a href="/blog/">Blog</a></li>
+  <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+  <li class="l"><span>&nbsp;</span></li>
+  <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>Installing the MapR Sandbox with Apache Drill on VirtualBox</h1>
+
+</div>
+
+<div class="int_text" align="left"><p>The MapR Sandbox for Apache Drill on VirtualBox comes with NAT port forwarding
+enabled, which allows you to access the sandbox using <code>localhost</code> as the hostname.</p>
+
+<p>Complete the following steps to install the MapR Sandbox with Apache Drill on
+VirtualBox:</p>
+
+<ol>
+<li><p>Download the MapR Sandbox with Apache Drill file to a directory on your machine:<br>
+<a href="https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill">https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill</a></p></li>
+<li><p>Open VirtualBox.</p></li>
+<li><p>Select <strong>File &gt; Import Appliance</strong>. The Import Virtual Appliance dialog appears.</p>
+
+<p><img src="../../../img/vbImport.png" alt=""></p></li>
+<li><p>Navigate to the directory where you downloaded the MapR Sandbox with Apache Drill and click <strong>Next</strong>. The Appliance Settings window appears.</p>
+
+<p><img src="../../../img/vbapplSettings.png" alt=""></p></li>
+<li><p>Select the check box at the bottom of the screen, <strong>Reinitialize the MAC address of all network cards</strong>, and then click <strong>Import</strong>. VirtualBox imports the sandbox.</p></li>
+<li><p>When the import completes, select <strong>File &gt; Preferences</strong>. The VirtualBox - Settings dialog appears.</p>
+
+<p><img src="../../../img/vbNetwork.png" alt=""></p></li>
+<li><p>Select <strong>Network</strong>.</p>
+
+<p>The correct setting depends on your network connectivity when you run the
+sandbox. In general, if you are going to use a wired Ethernet connection,
+select <strong>NAT Networks</strong> and <strong>vboxnet0</strong>. If you are going to use a wireless
+network, select <strong>Host-only Networks</strong> and the <strong>VirtualBox Host-Only Ethernet
+Adapter</strong>. If no adapters appear, click the green <strong>+</strong> button to add the
+VirtualBox adapter.</p>
+
+<p><img src="../../../img/vbMaprSetting.png" alt=""></p></li>
+<li><p>Click <strong>OK</strong> to continue.</p></li>
+<li><p>Click <img src="https://lh5.googleusercontent.com/6TjVEW28MJhPud2Nc2ButYB_GDqKTnadaluSulg0Zb259MgN1IRCgIlo-kMAEJ7lGWHf2aqc-nIjUsUFlaXP-LceAIKE5owNqXUWxXS0WXcBLWzUqg5X1VIXXswajb6oWA" alt="">. The MapR-Sandbox-For-Apache-Drill-0.6.0-r2-4.0.1 - Settings dialog appears.</p>
+
+<p><img src="../../../img/vbGenSettings.png" alt=""></p></li>
+<li><p>Click <strong>OK</strong> to continue.</p></li>
+<li><p>Click <strong>Start</strong>. It takes a few minutes for the MapR services to start. After the MapR services start and the installation completes, the following screen appears:</p>
+
+<p><img src="../../../img/vbloginSandbox.png" alt=""></p></li>
+<li><p>The client must be able to resolve the hostname of each Drill node to its IP address. Verify that a DNS entry exists on the client machine for each Drill node. If an entry does not exist, create one:</p>
+
+<ul>
+<li>On Windows, add the entry to the <code>%WINDIR%\system32\drivers\etc\hosts</code> file.</li>
+<li>On Linux and Mac, add the entry to <code>/etc/hosts</code>.</li>
+</ul>
+
+<p>Use the format <code>&lt;drill-machine-IP&gt; &lt;drill-machine-hostname&gt;</code>.<br>
+Example: <code>127.0.1.1 maprdemo</code></p></li>
+<li><p>You can navigate to the URL provided or to <a href="http://localhost:8047">localhost:8047</a> to open the Drill Web UI, or you can log in to the sandbox through the command line.</p>
+
+<p>a. To navigate to the MapR Sandbox with Apache Drill, enter the provided URL in your browser&#39;s address bar.</p>
+
+<p>b. To log in to the virtual machine and access the command line, press Alt+F2 on Windows or Option+F5 on Mac. When prompted, enter <code>mapr</code> as the login and password.</p></li>
+</ol>
+
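Step 12 asks you to verify that the client's hosts file maps the Drill node's hostname to its IP address. The following minimal Python sketch performs that check. It is not a MapR or Drill tool, and the function names <code>hosts_path</code> and <code>hosts_lookup</code> are hypothetical. It assumes the common hosts-file layout: an IP address, followed by whitespace-separated names, with <code>#</code> starting a comment. It reads only the hosts file and does not query DNS.

```python
import os
import sys
from pathlib import Path


def hosts_path():
    """Hosts-file location for the current platform, as listed in step 12."""
    if sys.platform.startswith("win"):
        windir = os.environ.get("WINDIR", r"C:\Windows")
        return Path(windir) / "system32" / "drivers" / "etc" / "hosts"
    return Path("/etc/hosts")


def hosts_lookup(hosts_text, hostname):
    """Return every IP address that hosts_text maps to hostname.

    Text after '#' is ignored. Hostnames are compared case-insensitively.
    """
    wanted = hostname.lower()
    addresses = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and wanted in (name.lower() for name in fields[1:]):
            addresses.append(fields[0])
    return addresses


if __name__ == "__main__":
    # "maprdemo" is the example hostname from this page. Use your node's hostname.
    path = hosts_path()
    text = path.read_text(errors="replace") if path.is_file() else ""
    found = hosts_lookup(text, "maprdemo")
    print(f"maprdemo -> {found}" if found else f"No entry for maprdemo in {path}")
```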
+<h1 id="what&#39;s-next">What&#39;s Next</h1>
+
+<p>After downloading and installing the sandbox, continue with the tutorial by
+<a href="/confluence/display/DRILL/Getting+to+Know+the+Drill+Setup">Getting to Know the Drill
+Setup</a>.</p>
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright © 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>

Added: drill/site/trunk/content/drill/docs/installing-the-mapr-sandbox-with-apache-drill-on-vmware-player-vmware-fusion/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/installing-the-mapr-sandbox-with-apache-drill-on-vmware-player-vmware-fusion/index.html?rev=1651949&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/installing-the-mapr-sandbox-with-apache-drill-on-vmware-player-vmware-fusion/index.html (added)
+++ drill/site/trunk/content/drill/docs/installing-the-mapr-sandbox-with-apache-drill-on-vmware-player-vmware-fusion/index.html Thu Jan 15 05:11:44 2015
@@ -0,0 +1,153 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Installing the MapR Sandbox with Apache Drill on VMware Player/VMware Fusion - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+  <li class="logo"><a href="/"></a></li>
+  <li>
+    <a href="/overview/">Documentation</a>
+    <ul>
+      <li><a href="/overview/">Overview&nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+      <li><a href="/why/">Why Drill? &nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="/architecture/">Architecture</a></li>
+    </ul>
+  </li>
+  <li>
+    <a href="/community/">Community</a>
+    <ul>
+      <li><a href="/team/">Team</a></li>
+      <li><a href="/community/#events">Events and Meetups</a></li>
+      <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+      <li><a href="/community/#getinvolved">Get Involved</a></li>
+      <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+      <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+    </ul>
+  </li>
+  <li><a href="/faq/">FAQ</a></li>
+  <li><a href="/blog/">Blog</a></li>
+  <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+  <li class="l"><span>&nbsp;</span></li>
+  <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>Installing the MapR Sandbox with Apache Drill on VMware Player/VMware Fusion</h1>
+
+</div>
+
+<div class="int_text" align="left"><p>Complete the following steps to install the MapR Sandbox with Apache Drill on
+VMware Player or VMware Fusion:</p>
+
+<ol>
+<li><p>Download the MapR Sandbox with Apache Drill file to a directory on your machine:<br>
+<a href="https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill">https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill</a></p></li>
+<li><p>Open the virtual machine player, and select the <strong>Open a Virtual Machine</strong> option.</p>
+
+<p><strong>Tip for VMware Fusion:</strong> If you are running VMware Fusion, select <strong>Import</strong>.</p>
+
+<p><img src="../../../img/vmWelcome.png" alt=""></p></li>
+<li><p>Navigate to the directory where you downloaded the MapR Sandbox with Apache Drill file, and select <code>MapR-Sandbox-For-Apache-Drill-4.0.1_VM.ova</code>.</p>
+
+<p><img src="../../../img/vmShare.png" alt=""></p>
+
+<p>The Import Virtual Machine dialog appears.</p></li>
+<li><p>Click <strong>Import</strong>. The virtual machine player imports the sandbox.</p>
+
+<p><img src="../../../img/vmLibrary.png" alt=""></p></li>
+<li><p>Select <code>MapR-Sandbox-For-Apache-Drill-4.0.1_VM</code>, and click <strong>Play virtual machine</strong>. It takes a few minutes for the MapR services to start. After the MapR services start and the installation completes, the following screen appears:</p>
+
+<p><img src="../../../img/loginSandbox.png" alt=""></p>
+
+<p>Note the URL provided on the screen, which corresponds to the Apache Drill Web UI.</p></li>
+<li><p>Verify that a DNS entry was created on the host machine for the virtual machine. If not, create the entry:</p>
+
+<ul>
+<li>On Linux and Mac, add the entry to <code>/etc/hosts</code>.</li>
+<li>On Windows, add the entry to the <code>%WINDIR%\system32\drivers\etc\hosts</code> file.</li>
+</ul>
+
+<p>Example: <code>127.0.1.1 &lt;vm_hostname&gt;</code></p></li>
+<li><p>You can navigate to the URL provided to open the Drill Web UI, or you can log in to the sandbox through the command line.</p>
+
+<p>a. To navigate to the MapR Sandbox with Apache Drill, enter the provided URL in your browser&#39;s address bar.</p>
+
+<p>b. To log in to the virtual machine and access the command line, press Alt+F2 on Windows or Option+F5 on Mac. When prompted, enter <code>mapr</code> as the login and password.</p></li>
+</ol>
+
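If you need to create the hosts entry for the virtual machine, the edit can be done idempotently, so that running it twice does not add a duplicate line. The following Python sketch works on the file's text and returns the updated text. It does not write the file. The function name <code>ensure_hosts_entry</code> and the sample IP address and hostname are hypothetical, and hostnames are matched exactly. Writing the result back to <code>/etc/hosts</code> or the Windows hosts file requires administrator (root) privileges.

```python
def ensure_hosts_entry(hosts_text, ip, hostname):
    """Return hosts-file text in which hostname maps to ip.

    If an uncommented line already maps this hostname to this ip, the text
    is returned unchanged, so calling this twice is harmless. Otherwise an
    "ip hostname" line is appended. Existing lines are never edited or removed.
    """
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if fields and fields[0] == ip and hostname in fields[1:]:
            return hosts_text
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"  # keep the new entry on its own line
    return hosts_text + f"{ip} {hostname}\n"


if __name__ == "__main__":
    # Hypothetical values: use the IP address and hostname of your VM.
    before = "127.0.0.1 localhost\n"
    after = ensure_hosts_entry(before, "127.0.1.1", "maprdemo")
    print(after, end="")
```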
+<h1 id="what&#39;s-next">What&#39;s Next</h1>
+
+<p>After downloading and installing the sandbox, continue with the tutorial by
+<a href="/confluence/display/DRILL/Getting+to+Know+the+Drill+Setup">Getting to Know the Drill
+Setup</a>.</p>
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright © 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>

Added: drill/site/trunk/content/drill/docs/kvgen-function/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/kvgen-function/index.html?rev=1651949&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/kvgen-function/index.html (added)
+++ drill/site/trunk/content/drill/docs/kvgen-function/index.html Thu Jan 15 05:11:44 2015
@@ -0,0 +1,226 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>KVGEN Function - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+  <li class="logo"><a href="/"></a></li>
+  <li>
+    <a href="/overview/">Documentation</a>
+    <ul>
+      <li><a href="/overview/">Overview&nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+      <li><a href="/why/">Why Drill? &nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="/architecture/">Architecture</a></li>
+    </ul>
+  </li>
+  <li>
+    <a href="/community/">Community</a>
+    <ul>
+      <li><a href="/team/">Team</a></li>
+      <li><a href="/community/#events">Events and Meetups</a></li>
+      <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+      <li><a href="/community/#getinvolved">Get Involved</a></li>
+      <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+      <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+    </ul>
+  </li>
+  <li><a href="/faq/">FAQ</a></li>
+  <li><a href="/blog/">Blog</a></li>
+  <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+  <li class="l"><span>&nbsp;</span></li>
+  <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>KVGEN Function</h1>
+
+</div>
+
+<div class="int_text" align="left"><p>KVGEN stands for <em>key-value generation</em>. This function is useful when complex
+data files contain arbitrary maps whose column names are not known in advance.
+Instead of having to specify the columns in a map to access its data, you can
+use KVGEN to return the key-value pairs that exist in the map. KVGEN turns a
+map with a wide set of columns into an array of key-value pairs.</p>
+
+<p>In turn, you can write analytic queries that return a subset of the generated
+keys or constrain the keys in some way. For example, you can use the
+<a href="/confluence/display/DRILL/FLATTEN+Function">FLATTEN</a> function to break the
+array down into multiple distinct rows and further query those rows.</p>
+
+<p>For example, assume that a JSON file contains this data:  </p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">{&quot;a&quot;: &quot;valA&quot;, &quot;b&quot;: &quot;valB&quot;}
+{&quot;c&quot;: &quot;valC&quot;, &quot;d&quot;: &quot;valD&quot;}
+</code></pre></div>
+<p>Applying KVGEN to each of these maps generates:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">[{&quot;key&quot;: &quot;a&quot;, &quot;value&quot;: &quot;valA&quot;}, {&quot;key&quot;: &quot;b&quot;, &quot;value&quot;: &quot;valB&quot;}]
+[{&quot;key&quot;: &quot;c&quot;, &quot;value&quot;: &quot;valC&quot;}, {&quot;key&quot;: &quot;d&quot;, &quot;value&quot;: &quot;valD&quot;}]
+</code></pre></div>
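+<p>To make the transformation concrete, here is a minimal Python sketch of the
+KVGEN semantics described above. It is an illustration only, not Drill's
+implementation: each map becomes a list of entries with "key" and "value" fields.</p>

```python
# Minimal sketch of KVGEN semantics for illustration only (not Drill's
# implementation): a map becomes a list of {"key": ..., "value": ...} entries.
def kvgen(m: dict) -> list:
    """Return the map's entries as key-value pair dicts, in key order."""
    return [{"key": k, "value": v} for k, v in m.items()]


# The two example maps from above.
maps = [
    {"a": "valA", "b": "valB"},
    {"c": "valC", "d": "valD"},
]

for m in maps:
    print(kvgen(m))
```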
+<p>Applying the <a href="/confluence/display/DRILL/FLATTEN+Function">FLATTEN</a> function to
+this data would return:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">{&quot;key&quot;: &quot;a&quot;, &quot;value&quot;: &quot;valA&quot;}
+{&quot;key&quot;: &quot;b&quot;, &quot;value&quot;: &quot;valB&quot;}
+{&quot;key&quot;: &quot;c&quot;, &quot;value&quot;: &quot;valC&quot;}
+{&quot;key&quot;: &quot;d&quot;, &quot;value&quot;: &quot;valD&quot;}
+</code></pre></div>
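+<p>The combination of KVGEN and FLATTEN shown above can be sketched in Python as
+follows. This is a simplified model under the assumption that FLATTEN emits one
+row per array element; the function names mirror the SQL functions but are
+plain Python helpers, not a Drill API.</p>

```python
from typing import Iterable, Iterator


def kvgen(m: dict) -> list:
    """Return the map's entries as key-value pair dicts, in key order."""
    return [{"key": k, "value": v} for k, v in m.items()]


def flatten(arrays: Iterable[list]) -> Iterator[dict]:
    """Emit one output row per element of each input array."""
    for arr in arrays:
        yield from arr


# The two example maps from above.
maps = [
    {"a": "valA", "b": "valB"},
    {"c": "valC", "d": "valD"},
]

rows = list(flatten(kvgen(m) for m in maps))
for row in rows:
    print(row)  # one key-value pair per line, as in the FLATTEN output above
```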
+<p>Assume that a JSON file called <code>kvgendata.json</code> includes multiple records like
+the following:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">{
+    &quot;rownum&quot;: 1,
+    &quot;bigintegercol&quot;: {
+        &quot;int_1&quot;: 1,
+        &quot;int_2&quot;: 2,
+        &quot;int_3&quot;: 3
+    },
+    &quot;varcharcol&quot;: {
+        &quot;varchar_1&quot;: &quot;abc&quot;,
+        &quot;varchar_2&quot;: &quot;def&quot;,
+        &quot;varchar_3&quot;: &quot;xyz&quot;
+    },
+    &quot;boolcol&quot;: {
+        &quot;boolean_1&quot;: true,
+        &quot;boolean_2&quot;: false,
+        &quot;boolean_3&quot;: true
+    },
+    &quot;float8col&quot;: {
+        &quot;f8_1&quot;: 1.1,
+        &quot;f8_2&quot;: 2.2
+    },
+    &quot;complex&quot;: [
+        {
+            &quot;col1&quot;: 3
+        },
+        {
+            &quot;col2&quot;: 2,
+            &quot;col3&quot;: 1
+        },
+        {
+            &quot;col1&quot;: 7
+        }
+    ]
+}
+
+{
+    &quot;rownum&quot;: 3,
+    &quot;bigintegercol&quot;: {
+        &quot;int_1&quot;: 1,
+        &quot;int_3&quot;: 3
+    },
+    &quot;varcharcol&quot;: {
+        &quot;varchar_1&quot;: &quot;abcde&quot;,
+        &quot;varchar_2&quot;: null,
+        &quot;varchar_3&quot;: &quot;xyz&quot;,
+        &quot;varchar_4&quot;: &quot;xyz2&quot;
+    },
+    &quot;boolcol&quot;: {
+        &quot;boolean_1&quot;: true,
+        &quot;boolean_2&quot;: false
+    },
+    &quot;float8col&quot;: {
+        &quot;f8_1&quot;: 1.1,
+        &quot;f8_3&quot;: 6.6
+    },
+    &quot;complex&quot;: [
+        {
+            &quot;col1&quot;: 2,
+            &quot;col3&quot;: 1
+        }
+    ]
+}
+...
+</code></pre></div>
+<p>A SELECT * query against the first record (rownum=1) returns the following row:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:zk=local&gt; select * from dfs.yelp.`kvgendata.json` where rownum=1;
+
++------------+---------------+------------+------------+------------+------------+
+|   rownum   | bigintegercol | varcharcol |  boolcol   | float8col  |  complex   |
++------------+---------------+------------+------------+------------+------------+
+| 1          | {&quot;int_1&quot;:1,&quot;int_2&quot;:2,&quot;int_3&quot;:3} | {&quot;varchar_1&quot;:&quot;abc&quot;,&quot;varchar_2&quot;:&quot;def&quot;,&quot;varchar_3&quot;:&quot;xyz&quot;} | {&quot;boolean_1&quot;:true,&quot;boolean_2&quot;:false,&quot;boolean_3&quot;:true} | {&quot;f8_1&quot;:1.1,&quot;f8_2&quot;:2.2} | [{&quot;col1&quot;:3},{&quot;col2&quot;:2,&quot;col3&quot;:1},{&quot;col1&quot;:7}] |
++------------+---------------+------------+------------+------------+------------+
+1 row selected (0.122 seconds)
+</code></pre></div>
+<p>You can use the KVGEN function to turn the maps in this data into key-value
+pairs. For example:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:zk=local&gt; select kvgen(varcharcol) from dfs.yelp.`kvgendata.json`;
++------------+
+|   EXPR$0   |
++------------+
+| [{&quot;key&quot;:&quot;varchar_1&quot;,&quot;value&quot;:&quot;abc&quot;},{&quot;key&quot;:&quot;varchar_2&quot;,&quot;value&quot;:&quot;def&quot;},{&quot;key&quot;:&quot;varchar_3&quot;,&quot;value&quot;:&quot;xyz&quot;}] |
+| [{&quot;key&quot;:&quot;varchar_1&quot;,&quot;value&quot;:&quot;abcd&quot;}] |
+| [{&quot;key&quot;:&quot;varchar_1&quot;,&quot;value&quot;:&quot;abcde&quot;},{&quot;key&quot;:&quot;varchar_3&quot;,&quot;value&quot;:&quot;xyz&quot;},{&quot;key&quot;:&quot;varchar_4&quot;,&quot;value&quot;:&quot;xyz2&quot;}] |
+| [{&quot;key&quot;:&quot;varchar_1&quot;,&quot;value&quot;:&quot;abc&quot;},{&quot;key&quot;:&quot;varchar_2&quot;,&quot;value&quot;:&quot;def&quot;}] |
++------------+
+4 rows selected (0.091 seconds)
+</code></pre></div>
+<p>Note that the third row omits varchar_2, whose value is null in the source
+record. Now you can apply the FLATTEN function to break out the key-value pairs into
+distinct rows:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:zk=local&gt; select flatten(kvgen(varcharcol)) from dfs.yelp.`kvgendata.json`;
++------------+
+|   EXPR$0   |
++------------+
+| {&quot;key&quot;:&quot;varchar_1&quot;,&quot;value&quot;:&quot;abc&quot;} |
+| {&quot;key&quot;:&quot;varchar_2&quot;,&quot;value&quot;:&quot;def&quot;} |
+| {&quot;key&quot;:&quot;varchar_3&quot;,&quot;value&quot;:&quot;xyz&quot;} |
+| {&quot;key&quot;:&quot;varchar_1&quot;,&quot;value&quot;:&quot;abcd&quot;} |
+| {&quot;key&quot;:&quot;varchar_1&quot;,&quot;value&quot;:&quot;abcde&quot;} |
+| {&quot;key&quot;:&quot;varchar_3&quot;,&quot;value&quot;:&quot;xyz&quot;} |
+| {&quot;key&quot;:&quot;varchar_4&quot;,&quot;value&quot;:&quot;xyz2&quot;} |
+| {&quot;key&quot;:&quot;varchar_1&quot;,&quot;value&quot;:&quot;abc&quot;} |
+| {&quot;key&quot;:&quot;varchar_2&quot;,&quot;value&quot;:&quot;def&quot;} |
++------------+
+9 rows selected (0.151 seconds)
+</code></pre></div>
+<p>See the description of <a href="/confluence/display/DRILL/FLATTEN+Function">FLATTEN</a>
+for an example of a query against the flattened data.</p>
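+<p>Once the pairs are flattened, analytic work such as counting or filtering by key
+becomes straightforward. The following Python sketch reproduces the nine flattened
+rows above and counts how often each key occurs. The varcharcol maps are
+reconstructed from the KVGEN output shown earlier (the second record is not listed
+in the source data), and skipping null values is an assumption chosen to match that
+output, where varchar_2 is missing from the third row.</p>

```python
from collections import Counter

# varcharcol maps for the four records, reconstructed from the KVGEN output above.
varcharcol = [
    {"varchar_1": "abc", "varchar_2": "def", "varchar_3": "xyz"},
    {"varchar_1": "abcd"},
    {"varchar_1": "abcde", "varchar_2": None, "varchar_3": "xyz", "varchar_4": "xyz2"},
    {"varchar_1": "abc", "varchar_2": "def"},
]


def kvgen(m: dict) -> list:
    # Skip null values to mirror the output above (assumption, see lead-in).
    return [{"key": k, "value": v} for k, v in m.items() if v is not None]


def flatten(arrays) -> list:
    """One output row per element of each input array."""
    return [element for arr in arrays for element in arr]


pairs = flatten(kvgen(m) for m in varcharcol)
key_counts = Counter(p["key"] for p in pairs)

print(len(pairs))  # 9, matching the row count of the FLATTEN query
print(key_counts.most_common())
print([p["value"] for p in pairs if p["key"] == "varchar_1"])
```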
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright © 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>

Added: drill/site/trunk/content/drill/docs/lession-1-learn-about-the-data-set/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/lession-1-learn-about-the-data-set/index.html?rev=1651949&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/lession-1-learn-about-the-data-set/index.html (added)
+++ drill/site/trunk/content/drill/docs/lession-1-learn-about-the-data-set/index.html Thu Jan 15 05:11:44 2015
@@ -0,0 +1,515 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Lesson 1: Learn about the Data Set - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+  <li class="logo"><a href="/"></a></li>
+  <li>
+    <a href="/overview/">Documentation</a>
+    <ul>
+      <li><a href="/overview/">Overview&nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+      <li><a href="/why/">Why Drill? &nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="/architecture/">Architecture</a></li>
+    </ul>
+  </li>
+  <li>
+    <a href="/community/">Community</a>
+    <ul>
+      <li><a href="/team/">Team</a></li>
+      <li><a href="/community/#events">Events and Meetups</a></li>
+      <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+      <li><a href="/community/#getinvolved">Get Involved</a></li>
+      <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+      <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+    </ul>
+  </li>
+  <li><a href="/faq/">FAQ</a></li>
+  <li><a href="/blog/">Blog</a></li>
+  <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+  <li class="l"><span>&nbsp;</span></li>
+  <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>Lesson 1: Learn about the Data Set</h1>
+
+</div>
+
+<div class="int_text" align="left"><h2 id="goal">Goal</h2>
+
+<p>This lesson uses simple SQL SELECT statements to discover what data is
+available and in what format. Drill can analyze data without prior knowledge
+or definition of its schema, so you can start querying data immediately (even
+as it changes), regardless of its format.</p>
+
+<p>The data set for the tutorial consists of:</p>
+
+<ul>
+<li><p>Transactional data: stored as a Hive table</p></li>
+<li><p>Product catalog and master customer data: stored as MapR-DB tables</p></li>
+<li><p>Clickstream and logs data: stored in the MapR file system as JSON files</p></li>
+</ul>
+
+<h2 id="queries-in-this-lesson">Queries in This Lesson</h2>
+
+<p>This lesson consists of SELECT * queries on each data source.</p>
+
+<h2 id="before-you-begin">Before You Begin</h2>
+
+<h3 id="start-sqlline">Start sqlline</h3>
+
+<p>If sqlline is not already running, use a Terminal or Command window to log
+in to the demo VM as root, then enter <code>sqlline</code>:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">$ ssh root@10.250.0.6
+Password:
+Last login: Mon Sep 15 13:46:08 2014 from 10.250.0.28
+Welcome to your Mapr Demo virtual machine.
+[root@maprdemo ~]# sqlline
+sqlline version 1.1.6
+0: jdbc:drill:&gt;
+</code></pre></div>
+<p>You can run queries from this prompt to complete the tutorial. To exit from
+<code>sqlline</code>, type:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; !quit
+</code></pre></div>
+<p>Although this tutorial demonstrates the queries in SQLLine, you can also
+run queries from the Drill Web UI.</p>
+
+<h3 id="list-the-available-workspaces-and-databases:">List the available workspaces and databases:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; show databases;
++-------------+
+| SCHEMA_NAME |
++-------------+
+| hive.default |
+| dfs.default |
+| dfs.logs    |
+| dfs.root    |
+| dfs.views   |
+| dfs.clicks  |
+| dfs.data    |
+| dfs.tmp     |
+| sys         |
+| maprdb      |
+| cp.default  |
+| INFORMATION_SCHEMA |
++-------------+
+12 rows selected
+</code></pre></div>
+<p>Note that this command exposes all the metadata available from the storage
+plugins configured with Drill as a set of schemas. This includes the Hive and
+MapR-DB databases as well as the workspaces configured in the file system. As
+you run queries in the tutorial, you will switch among these schemas by
+submitting the USE command. This behavior resembles the ability to use
+different database schemas (namespaces) in a relational database system.</p>
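+<p>As a rough illustration of how the schema names listed above combine a storage
+plugin with a workspace (or, for Hive, a database), this Python sketch groups the
+names by the part before the first dot. Treating the first dot as the separator is
+an assumption that holds for the names in this listing.</p>

```python
from collections import defaultdict

# SCHEMA_NAME values from the "show databases" output above.
schema_names = [
    "hive.default", "dfs.default", "dfs.logs", "dfs.root", "dfs.views",
    "dfs.clicks", "dfs.data", "dfs.tmp", "sys", "maprdb", "cp.default",
    "INFORMATION_SCHEMA",
]


def group_by_plugin(names):
    """Map each plugin prefix to the workspaces (or databases) listed under it."""
    groups = defaultdict(list)
    for name in names:
        plugin, _, workspace = name.partition(".")
        workspaces = groups[plugin]  # creates the entry even with no workspace
        if workspace:
            workspaces.append(workspace)
    return dict(groups)


for plugin, workspaces in group_by_plugin(schema_names).items():
    print(plugin, workspaces)
```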
+
+<h2 id="query-hive-tables">Query Hive Tables</h2>
+
+<p>The orders table is a six-column Hive table defined in the Hive metastore.
+This is a Hive external table pointing to the data stored in flat files on the
+MapR file system. The orders table contains 122,000 rows.</p>
+
+<h3 id="set-the-schema-to-hive:">Set the schema to hive:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; use hive;
++------------+------------+
+| ok | summary |
++------------+------------+
+| true | Default schema changed to &#39;hive&#39; |
++------------+------------+
+</code></pre></div>
+<p>You will run the USE command throughout this tutorial. The USE command sets
+the schema for the current session.</p>
+
+<h3 id="describe-the-table:">Describe the table:</h3>
+
+<p>You can use the DESCRIBE command to show the columns and data types for a Hive
+table:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; describe orders;
++-------------+------------+-------------+
+| COLUMN_NAME | DATA_TYPE  | IS_NULLABLE |
++-------------+------------+-------------+
+| order_id    | BIGINT     | YES         |
+| month       | VARCHAR    | YES         |
+| cust_id     | BIGINT     | YES         |
+| state       | VARCHAR    | YES         |
+| prod_id     | BIGINT     | YES         |
+| order_total | INTEGER    | YES         |
++-------------+------------+-------------+
+</code></pre></div>
+<p>The DESCRIBE command returns complete schema information for Hive tables based
+on the metadata available in the Hive metastore.</p>
+
+<h3 id="select-5-rows-from-the-orders-table:">Select 5 rows from the orders table:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; select * from orders limit 5;
++------------+------------+------------+------------+------------+-------------+
+| order_id | month | cust_id | state | prod_id | order_total |
++------------+------------+------------+------------+------------+-------------+
+| 67212 | June | 10001 | ca | 909 | 13 |
+| 70302 | June | 10004 | ga | 420 | 11 |
+| 69090 | June | 10011 | fl | 44 | 76 |
+| 68834 | June | 10012 | ar | 0 | 81 |
+| 71220 | June | 10018 | az | 411 | 24 |
++------------+------------+------------+------------+------------+-------------+
+</code></pre></div>
+<p>Because orders is a Hive table, you can query the data in the same way that
+you would query the columns in a relational database table. Note the use of
+the standard LIMIT clause, which limits the result set to the specified number
+of rows. You can use LIMIT with or without an ORDER BY clause.</p>
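+<p>The difference between LIMIT alone and LIMIT with ORDER BY can be sketched in
+Python using the five rows shown above. In SQL, LIMIT without ORDER BY does not
+guarantee which rows come back; this sketch simply models it as "the first n rows
+in arrival order", while ORDER BY sorts before truncating.</p>

```python
from itertools import islice

# The five rows shown above:
# (order_id, month, cust_id, state, prod_id, order_total)
orders = [
    (67212, "June", 10001, "ca", 909, 13),
    (70302, "June", 10004, "ga", 420, 11),
    (69090, "June", 10011, "fl", 44, 76),
    (68834, "June", 10012, "ar", 0, 81),
    (71220, "June", 10018, "az", 411, 24),
]


def limit(rows, n):
    """LIMIT n: keep the first n rows in whatever order they arrive."""
    return list(islice(rows, n))


def order_by_limit(rows, key, n):
    """ORDER BY key LIMIT n: sort first, then keep the first n rows."""
    return list(islice(sorted(rows, key=key), n))


print(limit(orders, 2))
print(order_by_limit(orders, key=lambda r: r[5], n=2))  # two smallest order totals
```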
+
+<p>Drill integrates with Hive by allowing queries on Hive tables defined in the
+metastore with no extra configuration. Note that Hive is not a prerequisite
+for Drill; it simply serves as a storage plugin or data source for Drill.
+Drill can also query all Hive file formats (including custom SerDes), and any
+UDFs defined in Hive can be used in Drill queries.</p>
+
+<p>Because Drill has its own low-latency SQL query execution engine, you can
+query Hive tables with high performance and support for interactive and ad-hoc
+data exploration.</p>
+
+<h2 id="query-mapr-db-and-hbase-tables">Query MapR-DB and HBase Tables</h2>
+
+<p>The customers and products tables are MapR-DB tables. MapR-DB is an enterprise
+in-Hadoop NoSQL database. It exposes the HBase API to support application
+development. Every MapR-DB table has a row_key, in addition to one or more
+column families. Each column family contains one or more specific columns. The
+row_key value is a primary key that uniquely identifies each row.</p>
+
+<p>Drill allows direct queries on MapR-DB and HBase tables. Unlike other SQL-on-Hadoop
+options, Drill requires no overlay schema definitions in Hive to work
+with this data. Think about a MapR-DB or HBase table with thousands of
+columns, such as a time-series database, and the pain of having to manage
+duplicate schemas for it in Hive!</p>
+
+<h3 id="products-table">Products Table</h3>
+
+<p>The products table has two column families.</p>
+
+<table><thead>
+<tr>
+<th>Column Family</th>
+<th>Columns</th>
+</tr>
+</thead><tbody>
+<tr>
+<td>details</td>
+<td>name</td>
+</tr>
+<tr>
+<td></td>
+<td>category</td>
+</tr>
+<tr>
+<td>pricing</td>
+<td>price</td>
+</tr>
+</tbody></table>
+
+<p>The products table contains 965 rows.</p>
+
+<h3 id="customers-table">Customers Table</h3>
+
+<p>The customers table has three column families.</p>
+
+<table><thead>
+<tr>
+<th>Column Family</th>
+<th>Columns</th>
+</tr>
+</thead><tbody>
+<tr>
+<td>address</td>
+<td>state</td>
+</tr>
+<tr>
+<td>loyalty</td>
+<td>agg_rev</td>
+</tr>
+<tr>
+<td></td>
+<td>membership</td>
+</tr>
+<tr>
+<td>personal</td>
+<td>age</td>
+</tr>
+<tr>
+<td></td>
+<td>gender</td>
+</tr>
+</tbody></table>
+
+<p>The customers table contains 993 rows.</p>
+
+<h3 id="set-the-workspace-to-maprdb:">Set the workspace to maprdb:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; use maprdb;
++------------+------------+
+| ok | summary |
++------------+------------+
+| true | Default schema changed to &#39;maprdb&#39; |
++------------+------------+
+</code></pre></div>
+<h3 id="describe-the-tables:">Describe the tables:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; describe customers;
++-------------+------------+-------------+
+| COLUMN_NAME | DATA_TYPE  | IS_NULLABLE |
++-------------+------------+-------------+
+| row_key     | ANY        | NO          |
+| address     | (VARCHAR(1), ANY) MAP | NO          |
+| loyalty     | (VARCHAR(1), ANY) MAP | NO          |
+| personal    | (VARCHAR(1), ANY) MAP | NO          |
++-------------+------------+-------------+
+
+0: jdbc:drill:&gt; describe products;
++-------------+------------+-------------+
+| COLUMN_NAME | DATA_TYPE  | IS_NULLABLE |
++-------------+------------+-------------+
+| row_key     | ANY        | NO          |
+| details     | (VARCHAR(1), ANY) MAP | NO          |
+| pricing     | (VARCHAR(1), ANY) MAP | NO          |
++-------------+------------+-------------+
+</code></pre></div>
+<p>Unlike the Hive example, the DESCRIBE command does not return the full schema
+up to the column level. Wide-column NoSQL databases such as MapR-DB and HBase
+can be schema-less by design; every row has its own set of column name-value
+pairs in a given column family, and the column value can be of any data type,
+as determined by the application inserting the data.</p>
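+<p>To see why DESCRIBE can only report a MAP of ANY values for each column family,
+consider this Python sketch of the wide-column model: row_key, then column family,
+then column name, then raw bytes. The two rows, their keys, and the referrals column
+are hypothetical. The set of column names in a family is known only after scanning
+the rows, and the values are untyped bytes.</p>

```python
from typing import Dict, Set

# Hypothetical rows: row_key -> column family -> column name -> raw bytes.
# The rows, keys, and the "referrals" column are made up for illustration.
rows: Dict[bytes, Dict[str, Dict[str, bytes]]] = {
    b"row1": {"loyalty": {"agg_rev": b"0-100", "membership": b"gold"}},
    b"row2": {"loyalty": {"membership": b"basic", "referrals": b"3"}},
}


def columns_per_family(table) -> Dict[str, Set[str]]:
    """Union of the column names seen in each column family across all rows."""
    seen: Dict[str, Set[str]] = {}
    for families in table.values():
        for family, cols in families.items():
            seen.setdefault(family, set()).update(cols)
    return seen


print(columns_per_family(rows))
```

+<p>Each row contributes its own column names, so no single row describes the
+family; a reader that does not scan the data can only say "a map of byte values".</p>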
+
+<p>A “MAP” complex type in Drill represents this variable column name-value
+structure, and “ANY” represents the fact that the column value can be of any
+data type. Observe the row_key, which is also simply bytes and has the type
+ANY.</p>
+
+<h3 id="select-5-rows-from-the-products-table:">Select 5 rows from the products table:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; select * from products limit 5;
++------------+------------+------------+
+| row_key | details | pricing |
++------------+------------+------------+
+| [B@a1a3e25 | {&quot;category&quot;:&quot;bGFwdG9w&quot;,&quot;name&quot;:&quot;IlNvbnkgbm90ZWJvb2si&quot;} | {&quot;price&quot;:&quot;OTU5&quot;} |
+| [B@103a43af | {&quot;category&quot;:&quot;RW52ZWxvcGVz&quot;,&quot;name&quot;:&quot;IzEwLTQgMS84IHggOSAxLzIgUHJlbWl1bSBEaWFnb25hbCBTZWFtIEVudmVsb3Blcw==&quot;} | {&quot;price&quot;:&quot;MT |
+| [B@61319e7b | {&quot;category&quot;:&quot;U3RvcmFnZSAmIE9yZ2FuaXphdGlvbg==&quot;,&quot;name&quot;:&quot;MjQgQ2FwYWNpdHkgTWF4aSBEYXRhIEJpbmRlciBSYWNrc1BlYXJs&quot;} | {&quot;price&quot; |
+| [B@9bcf17 | {&quot;category&quot;:&quot;TGFiZWxz&quot;,&quot;name&quot;:&quot;QXZlcnkgNDk4&quot;} | {&quot;price&quot;:&quot;Mw==&quot;} |
+| [B@7538ef50 | {&quot;category&quot;:&quot;TGFiZWxz&quot;,&quot;name&quot;:&quot;QXZlcnkgNDk=&quot;} | {&quot;price&quot;:&quot;Mw==&quot;} |
+</code></pre></div>
+<p>Because Drill requires no up-front schema definitions indicating data
+types, the query returns the raw byte arrays for column values, just as they
+are stored in MapR-DB (or HBase). The row_key values, such as [B@a1a3e25, are the
+default Java string form of a byte array. Observe that the column families (details
+and pricing) have the map data type and appear as JSON strings.</p>
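+<p>The map values in this output appear to be the stored bytes rendered as Base64
+text. That is an inference from the output, not something this page states. Decoding
+a few of them in Python shows what the application stored, ahead of the CAST-based
+approach in Lesson 2.</p>

```python
import base64

# Values copied from the first row of the products output above.
# Assumption: they are Base64 renderings of the stored bytes.
details = {"category": "bGFwdG9w", "name": "IlNvbnkgbm90ZWJvb2si"}
pricing = {"price": "OTU5"}


def decode_map(m: dict) -> dict:
    """Base64-decode every value in a column-family map to a UTF-8 string."""
    return {k: base64.b64decode(v).decode("utf-8") for k, v in m.items()}


print(decode_map(details))  # {'category': 'laptop', 'name': '"Sony notebook"'}
print(decode_map(pricing))  # {'price': '959'}
```

+<p>Note that some decoded values, such as the product name, include their own double
+quotes, which suggests the loading application stored them as quoted strings.</p>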
+
+<p>In Lesson 2, you will use CAST functions to return typed data for each column.</p>
+
+<h3 id="select-5-rows-from-the-customers-table:">Select 5 rows from the customers table:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; select * from customers limit 5;
++------------+------------+------------+------------+
+| row_key | address | loyalty | personal |
++------------+------------+------------+------------+
+| [B@284bae62 | {&quot;state&quot;:&quot;Imt5Ig==&quot;} | {&quot;agg_rev&quot;:&quot;IjEwMDEtMzAwMCI=&quot;,&quot;membership&quot;:&quot;ImJhc2ljIg==&quot;} | {&quot;age&quot;:&quot;IjI2LTM1Ig==&quot;,&quot;gender&quot;:&quot;Ik1B |
+| [B@7ffa4523 | {&quot;state&quot;:&quot;ImNhIg==&quot;} | {&quot;agg_rev&quot;:&quot;IjAtMTAwIg==&quot;,&quot;membership&quot;:&quot;ImdvbGQi&quot;} | {&quot;age&quot;:&quot;IjI2LTM1Ig==&quot;,&quot;gender&quot;:&quot;IkZFTUFMRSI= |
+| [B@7d13e79 | {&quot;state&quot;:&quot;Im9rIg==&quot;} | {&quot;agg_rev&quot;:&quot;IjUwMS0xMDAwIg==&quot;,&quot;membership&quot;:&quot;InNpbHZlciI=&quot;} | {&quot;age&quot;:&quot;IjI2LTM1Ig==&quot;,&quot;gender&quot;:&quot;IkZFT |
+| [B@3a5c7df1 | {&quot;state&quot;:&quot;ImtzIg==&quot;} | {&quot;agg_rev&quot;:&quot;IjMwMDEtMTAwMDAwIg==&quot;,&quot;membership&quot;:&quot;ImdvbGQi&quot;} | {&quot;age&quot;:&quot;IjUxLTEwMCI=&quot;,&quot;gender&quot;:&quot;IkZF |
+| [B@e507726 | {&quot;state&quot;:&quot;Im5qIg==&quot;} | {&quot;agg_rev&quot;:&quot;IjAtMTAwIg==&quot;,&quot;membership&quot;:&quot;ImJhc2ljIg==&quot;} | {&quot;age&quot;:&quot;IjIxLTI1Ig==&quot;,&quot;gender&quot;:&quot;Ik1BTEUi&quot; |
++------------+------------+------------+------------+
+</code></pre></div>
+<p>Again, the query returns byte data that must be cast to readable data
+types.</p>
+
+<h2 id="query-the-file-system">Query the File System</h2>
+
+<p>Along with querying data sources with full schemas (such as Hive) and partial
+schemas (such as MapR-DB and HBase), Drill can run SQL queries directly on a
+file system. The file system can be a local file system or a distributed file
+system such as MapR-FS, HDFS, or S3.</p>
+
+<p>In the context of Drill, a file or a directory is synonymous with a
+relational database “table.” Therefore, you can perform SQL operations
+directly on files and directories without up-front schema definitions or
+schema management when the data model changes. Drill discovers the schema on
+the fly based on the query. As of the 0.5 release, Drill supports queries on a
+variety of file formats, including text, CSV, Parquet, and JSON.</p>
+
+<p>In this example, the clickstream data coming from the mobile/web applications
+is in JSON format. The JSON files have the following structure:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">{&quot;trans_id&quot;:31920,&quot;date&quot;:&quot;2014-04-26&quot;,&quot;time&quot;:&quot;12:17:12&quot;,&quot;user_info&quot;:{&quot;cust_id&quot;:22526,&quot;device&quot;:&quot;IOS5&quot;,&quot;state&quot;:&quot;il&quot;},&quot;trans_info&quot;:{&quot;prod_id&quot;:[174,2],&quot;purch_flag&quot;:&quot;false&quot;}}
+{&quot;trans_id&quot;:31026,&quot;date&quot;:&quot;2014-04-20&quot;,&quot;time&quot;:&quot;13:50:29&quot;,&quot;user_info&quot;:{&quot;cust_id&quot;:16368,&quot;device&quot;:&quot;AOS4.2&quot;,&quot;state&quot;:&quot;nc&quot;},&quot;trans_info&quot;:{&quot;prod_id&quot;:[],&quot;purch_flag&quot;:&quot;false&quot;}}
+{&quot;trans_id&quot;:33848,&quot;date&quot;:&quot;2014-04-10&quot;,&quot;time&quot;:&quot;04:44:42&quot;,&quot;user_info&quot;:{&quot;cust_id&quot;:21449,&quot;device&quot;:&quot;IOS6&quot;,&quot;state&quot;:&quot;oh&quot;},&quot;trans_info&quot;:{&quot;prod_id&quot;:[582],&quot;purch_flag&quot;:&quot;false&quot;}}
+</code></pre></div>
+<p>The clicks.json and clicks.campaign.json files contain metadata as part of the
+data itself (referred to as “self-describing” data). Also note that the data
+elements are complex, or nested. The initial queries below do not show how to
+unpack the nested data, but they show that easy access to the data requires no
+setup beyond the definition of a workspace.</p>
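+<p>Because each record carries its own field names, a reader can discover the columns
+simply by parsing the data. The following Python sketch does this for the three
+clickstream records shown above by collecting each top-level field and the JSON value
+types seen for it. It is a rough illustration of schema-on-read, not a description of
+how Drill's JSON reader works internally.</p>

```python
import json

# The three clickstream records shown above, one JSON object per line.
lines = [
    '{"trans_id":31920,"date":"2014-04-26","time":"12:17:12","user_info":{"cust_id":22526,"device":"IOS5","state":"il"},"trans_info":{"prod_id":[174,2],"purch_flag":"false"}}',
    '{"trans_id":31026,"date":"2014-04-20","time":"13:50:29","user_info":{"cust_id":16368,"device":"AOS4.2","state":"nc"},"trans_info":{"prod_id":[],"purch_flag":"false"}}',
    '{"trans_id":33848,"date":"2014-04-10","time":"04:44:42","user_info":{"cust_id":21449,"device":"IOS6","state":"oh"},"trans_info":{"prod_id":[582],"purch_flag":"false"}}',
]


def infer_columns(json_lines):
    """Collect each top-level field name and the Python types of its values."""
    columns = {}
    for line in json_lines:
        for name, value in json.loads(line).items():
            columns.setdefault(name, set()).add(type(value).__name__)
    return columns


for name, types in infer_columns(lines).items():
    print(name, sorted(types))  # nested maps show up as "dict"
```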
+
+<h3 id="query-nested-clickstream-data">Query nested clickstream data</h3>
+
+<h4 id="set-the-workspace-to-dfs.clicks:">Set the workspace to dfs.clicks:</h4>
+<div class="highlight"><pre><code class="language-text" data-lang="text"> 0: jdbc:drill:&gt; use dfs.clicks;
++------------+------------+
+| ok | summary |
++------------+------------+
+| true | Default schema changed to &#39;dfs.clicks&#39; |
++------------+------------+
+</code></pre></div>
+<p>In this case, setting the workspace makes queries easier to write. When you
+specify a file system workspace, you can shorten references to files in the
+FROM clause of your queries. Instead of having to provide the complete path to
+a file, you can provide the path relative to the directory location specified
+in the workspace. For example, the dfs.clicks workspace specifies this location:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">&quot;location&quot;: &quot;/mapr/demo.mapr.com/data/nested&quot;
+</code></pre></div>
+<p>Any file or directory that you want to query in this path can be referenced
+relative to this path. The clicks directory referred to in the following query
+is directly below the nested directory.</p>
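+<p>The relative-path rule can be sketched in Python: the FROM-clause path is joined
+onto the workspace location given above. The resolve helper is hypothetical and only
+models the path arithmetic, not how Drill locates files.</p>

```python
import posixpath

# Location of the dfs.clicks workspace, as given above.
WORKSPACE_LOCATION = "/mapr/demo.mapr.com/data/nested"


def resolve(workspace_location: str, relative_path: str) -> str:
    """Hypothetical helper: join a workspace-relative FROM path onto the location."""
    return posixpath.normpath(posixpath.join(workspace_location, relative_path))


print(resolve(WORKSPACE_LOCATION, "clicks/clicks.json"))
# /mapr/demo.mapr.com/data/nested/clicks/clicks.json
```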
+
+<h4 id="select-2-rows-from-the-clicks.json-file:">Select 2 rows from the clicks.json file:</h4>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; select * from `clicks/clicks.json` limit 2;
++------------+------------+------------+------------+------------+
+|  trans_id  |    date    |    time    | user_info  | trans_info |
++------------+------------+------------+------------+------------+
+| 31920      | 2014-04-26 | 12:17:12   | {&quot;cust_id&quot;:22526,&quot;device&quot;:&quot;IOS5&quot;,&quot;state&quot;:&quot;il&quot;} | {&quot;prod_id&quot;:[174,2],&quot;purch_flag&quot;:&quot;false&quot;} |
+| 31026      | 2014-04-20 | 13:50:29   | {&quot;cust_id&quot;:16368,&quot;device&quot;:&quot;AOS4.2&quot;,&quot;state&quot;:&quot;nc&quot;} | {&quot;prod_id&quot;:[],&quot;purch_flag&quot;:&quot;false&quot;} |
++------------+------------+------------+------------+------------+
+2 rows selected
+</code></pre></div>
+<p>Note that the FROM clause reference points to a specific file. Drill expands
+the traditional concept of a “table reference” in a standard SQL FROM clause
+to refer to a file in a local or distributed file system.</p>
+
+<p>The only special requirement is that you enclose the file path in backticks.
+Backticks are necessary whenever the file path contains Drill reserved words or
+special characters, such as the slash and periods in clicks/clicks.json.</p>
+
+<h4 id="select-2-rows-from-the-campaign.json-file:">Select 2 rows from the clicks.campaign.json file:</h4>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; select * from `clicks/clicks.campaign.json` limit 2;
++------------+------------+------------+------------+------------+------------+
+|  trans_id  |    date    |    time    | user_info  |  ad_info   | trans_info |
++------------+------------+------------+------------+------------+------------+
+| 35232      | 2014-05-10 | 00:13:03   | {&quot;cust_id&quot;:18520,&quot;device&quot;:&quot;AOS4.3&quot;,&quot;state&quot;:&quot;tx&quot;} | {&quot;camp_id&quot;:&quot;null&quot;} | {&quot;prod_id&quot;:[7,7],&quot;purch_flag&quot;:&quot;true&quot;} |
+| 31995      | 2014-05-22 | 16:06:38   | {&quot;cust_id&quot;:17182,&quot;device&quot;:&quot;IOS6&quot;,&quot;state&quot;:&quot;fl&quot;} | {&quot;camp_id&quot;:&quot;null&quot;} | {&quot;prod_id&quot;:[],&quot;purch_flag&quot;:&quot;false&quot;} |
++------------+------------+------------+------------+------------+------------+
+2 rows selected
+</code></pre></div>
+<p>Notice that with a SELECT * query, any complex data types such as maps and
+arrays are returned as JSON strings. You will see how to unpack this data using
+various SQL functions and operators in the next lesson.</p>
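+<p>Before looking at the SQL operators, it may help to see the navigation path into
+one nested record. This Python sketch uses plain dictionary and list indexing on the
+first clickstream record shown earlier. It only illustrates the structure; the Drill
+syntax for the same access is covered in the next lesson.</p>

```python
import json

# First clickstream record from the sample data above.
line = '{"trans_id":31920,"date":"2014-04-26","time":"12:17:12","user_info":{"cust_id":22526,"device":"IOS5","state":"il"},"trans_info":{"prod_id":[174,2],"purch_flag":"false"}}'
record = json.loads(line)

cust_id = record["user_info"]["cust_id"]                  # member of a nested map
prod_ids = record["trans_info"]["prod_id"]                # nested array
first_prod = prod_ids[0] if prod_ids else None            # arrays can be empty (record 31026)
purchased = record["trans_info"]["purch_flag"] == "true"  # stored as a string, not a boolean

print(cust_id, first_prod, purchased)  # 22526 174 False
```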
+
+<h2 id="query-logs-data">Query Logs Data</h2>
+
+<p>Unlike the previous example, which queried clicks data in a single file, the
+logs data is stored in partitioned directories on the file system. The logs
+directory has three subdirectories:</p>
+
+<ul>
+<li><p>2012</p></li>
+<li><p>2013</p></li>
+<li><p>2014</p></li>
+</ul>
+
+<p>Each of these year directories fans out to a set of numbered month
+directories, and each month directory contains a JSON file with log records
+for that month. The total number of records in all log files is 48,000.</p>
+
+<p>The files in the logs directory and its subdirectories are JSON files. There
+are many of these files, but you can use Drill to query them all as a single
+data source, or to query a subset of the files.</p>
+
+<h4 id="set-the-workspace-to-dfs.logs:">Set the workspace to dfs.logs:</h4>
+<div class="highlight"><pre><code class="language-text" data-lang="text"> 0: jdbc:drill:&gt; use dfs.logs;
++------------+------------+
+| ok | summary |
++------------+------------+
+| true | Default schema changed to &#39;dfs.logs&#39; |
++------------+------------+
+</code></pre></div>
+<h4 id="select-2-rows-from-the-logs-directory:">Select 2 rows from the logs directory:</h4>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; select * from logs limit 2;
++------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+----------+
+| dir0 | dir1 | trans_id | date | time | cust_id | device | state | camp_id | keywords | prod_id | purch_fl |
++------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+----------+
+| 2014 | 8 | 24181 | 08/02/2014 | 09:23:52 | 0 | IOS5 | il | 2 | wait | 128 | false |
+| 2014 | 8 | 24195 | 08/02/2014 | 07:58:19 | 243 | IOS5 | mo | 6 | hmm | 107 | false |
++------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+----------+
+</code></pre></div>
+<p>Note that this is flat JSON data. The dfs.logs workspace location property
+points to a directory that contains the logs directory, which keeps the FROM clause
+of this query very simple: you do not have to specify the complete
+directory path on the file system.</p>
+
+<p>The column names dir0 and dir1 are special Drill variables that identify
+the subdirectories below the logs directory: dir0 holds the year directory and
+dir1 holds the month directory. In Lesson 3, you will run more complex
+queries that use these dynamic variables.</p>
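+
+<p>The mapping from directory levels to these variables can be sketched in a few lines of Python. This is a hypothetical helper: <code>implicit_dir_columns</code> is not a Drill API, and the file name log.json is assumed because the lesson does not name the files. It only reproduces the dir0 and dir1 values shown above and is not Drill's implementation.</p>
+
```python
from pathlib import PurePosixPath


def implicit_dir_columns(file_path: str, root: str) -> dict:
    """Map each directory level between root and the file to dir0, dir1, ..."""
    rel = PurePosixPath(file_path).relative_to(root)
    # rel.parts ends with the file name; the earlier parts are directory levels.
    return {f"dir{i}": part for i, part in enumerate(rel.parts[:-1])}


# Hypothetical file name; the year and month directories match the output above.
print(implicit_dir_columns("logs/2014/8/log.json", "logs"))  # {'dir0': '2014', 'dir1': '8'}
```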
+
+<h4 id="find-the-total-number-of-rows-in-the-logs-directory-(all-files):">Find the total number of rows in the logs directory (all files):</h4>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; select count(*) from logs;
++------------+
+| EXPR$0 |
++------------+
+| 48000 |
++------------+
+</code></pre></div>
+<p>This query traverses all of the files in the logs directory and its
+subdirectories to return the total number of rows in those files.</p>
+
+<h1 id="what&#39;s-next">What&#39;s Next</h1>
+
+<p>Go to <a href="/confluence/display/DRILL/Lesson+2%3A+Run+Queries+with+ANSI+SQL">Lesson 2: Run Queries with ANSI
+SQL</a>.</p>
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright © 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>

Added: drill/site/trunk/content/drill/docs/lession-2-run-queries-with-ansi-sql/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/lession-2-run-queries-with-ansi-sql/index.html?rev=1651949&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/lession-2-run-queries-with-ansi-sql/index.html (added)
+++ drill/site/trunk/content/drill/docs/lession-2-run-queries-with-ansi-sql/index.html Thu Jan 15 05:11:44 2015
@@ -0,0 +1,459 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Lesson 2: Run Queries with ANSI SQL - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+  <li class="logo"><a href="/"></a></li>
+  <li>
+    <a href="/overview/">Documentation</a>
+    <ul>
+      <li><a href="/overview/">Overview&nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+      <li><a href="/why/">Why Drill? &nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="/architecture/">Architecture</a></li>
+    </ul>
+  </li>
+  <li>
+    <a href="/community/">Community</a>
+    <ul>
+      <li><a href="/team/">Team</a></li>
+      <li><a href="/community/#events">Events and Meetups</a></li>
+      <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+      <li><a href="/community/#getinvolved">Get Involved</a></li>
+      <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+      <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+    </ul>
+  </li>
+  <li><a href="/faq/">FAQ</a></li>
+  <li><a href="/blog/">Blog</a></li>
+  <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+  <li class="l"><span>&nbsp;</span></li>
+  <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>Lesson 2: Run Queries with ANSI SQL</h1>
+
+</div>
+
+<div class="int_text" align="left"><h2 id="goal">Goal</h2>
+
+<p>This lesson shows how to do some standard SQL analysis in Apache Drill: for
+example, summarizing data by using simple aggregate functions and connecting
+data sources by using joins. Note that Apache Drill provides ANSI SQL support,
+not a “SQL-like” interface.</p>
+
+<h2 id="queries-in-this-lesson">Queries in This Lesson</h2>
+
+<p>Now that you have used select * queries to see what the data sources look like
+in their raw form, try running some simple but more useful queries on each data
+source. These queries demonstrate how Drill supports ANSI SQL constructs and
+how you can combine data from different data sources in a single SELECT
+statement.</p>
+
+<ul>
+<li><p>Show an aggregate query on a single file or table. Use GROUP BY, WHERE, HAVING, and ORDER BY clauses.</p></li>
+<li><p>Perform joins between Hive, MapR-DB, and file system data sources.</p></li>
+<li><p>Use table and column aliases.</p></li>
+<li><p>Create a Drill view.</p></li>
+</ul>
+
+<h2 id="aggregation">Aggregation</h2>
+
+<h3 id="set-the-schema-to-hive:">Set the schema to hive:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; use hive;
++------------+------------+
+|     ok     |  summary   |
++------------+------------+
+| true       | Default schema changed to &#39;hive&#39; |
++------------+------------+
+1 row selected
+</code></pre></div>
+<h3 id="return-sales-totals-by-month:">Return sales totals by month:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; select `month`, sum(order_total)
+from orders group by `month` order by 2 desc;
++------------+------------+
+|   month    |   EXPR$1   |
++------------+------------+
+| June       | 950481     |
+| May        | 947796     |
+| March      | 836809     |
+| April      | 807291     |
+| July       | 757395     |
+| October    | 676236     |
+| August     | 572269     |
+| February   | 532901     |
+| September  | 373100     |
+| January    | 346536     |
++------------+------------+
+</code></pre></div>
+<p>Drill supports SQL aggregate functions such as SUM, MAX, AVG, and MIN.
+Standard SQL clauses work in the same way in Drill queries as in relational
+database queries.</p>
+
+<p>Note that backticks are required around the “month” column only because “month”
+is a reserved word in SQL.</p>
+
+<h3 id="return-the-top-20-sales-totals-by-month-and-state:">Return the top 20 sales totals by month and state:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; select `month`, state, sum(order_total) as sales from orders group by `month`, state
+order by 3 desc limit 20;
++------------+------------+------------+
+|   month    |   state    |   sales    |
++------------+------------+------------+
+| May        | ca         | 119586     |
+| June       | ca         | 116322     |
+| April      | ca         | 101363     |
+| March      | ca         | 99540      |
+| July       | ca         | 90285      |
+| October    | ca         | 80090      |
+| June       | tx         | 78363      |
+| May        | tx         | 77247      |
+| March      | tx         | 73815      |
+| August     | ca         | 71255      |
+| April      | tx         | 68385      |
+| July       | tx         | 63858      |
+| February   | ca         | 63527      |
+| June       | fl         | 62199      |
+| June       | ny         | 62052      |
+| May        | fl         | 61651      |
+| May        | ny         | 59369      |
+| October    | tx         | 55076      |
+| March      | fl         | 54867      |
+| March      | ny         | 52101      |
++------------+------------+------------+
+20 rows selected
+</code></pre></div>
+<p>Note the alias (sales) for the result of the SUM function. Drill supports both
+column aliases and table aliases.</p>
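+
+<p>Because these are standard SQL clauses, you can sketch their behavior outside Drill. The following example uses Python's built-in <code>sqlite3</code> module, not Drill. For illustration only, it treats four rows from the output above as if they were individual orders. It shows the backtick-quoted month column (SQLite accepts backticks for MySQL compatibility), the sales alias, and ORDER BY 2 DESC.</p>
+
```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table orders (`month` text, state text, order_total int)")
# Four rows from the month/state output above, treated as single orders.
con.executemany(
    "insert into orders values (?, ?, ?)",
    [("May", "ca", 119586), ("June", "ca", 116322), ("June", "tx", 78363), ("May", "tx", 77247)],
)

# Same shape as the Drill query: quoted month, a column alias, ORDER BY ordinal.
cur = con.execute(
    "select `month`, sum(order_total) as sales from orders group by `month` order by 2 desc"
)
rows = cur.fetchall()
print(cur.description[1][0])  # sales
print(rows)  # [('May', 196833), ('June', 194685)]
```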
+
+<h2 id="having-clause">HAVING Clause</h2>
+
+<p>This query uses the HAVING clause to constrain an aggregate result.</p>
+
+<h3 id="set-the-workspace-to-dfs.clicks">Set the workspace to dfs.clicks:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; use dfs.clicks;
++------------+------------+
+|     ok     |  summary   |
++------------+------------+
+| true       | Default schema changed to &#39;dfs.clicks&#39; |
++------------+------------+
+1 row selected
+</code></pre></div>
+<h3 id="return-total-number-of-clicks-for-devices-that-indicate-high-click-throughs:">Return total number of clicks for devices that indicate high click-throughs:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; select t.user_info.device, count(*) from `clicks/clicks.json` t 
+group by t.user_info.device
+having count(*) &gt; 1000;
++------------+------------+
+|   EXPR$0   |   EXPR$1   |
++------------+------------+
+| IOS5       | 11814      |
+| AOS4.2     | 5986       |
+| IOS6       | 4464       |
+| IOS7       | 3135       |
+| AOS4.4     | 1562       |
+| AOS4.3     | 3039       |
++------------+------------+
+</code></pre></div>
+<p>The aggregate is a count of the records for each mobile device in
+the clickstream data. Only the devices that registered more
+than 1000 transactions qualify for the result set.</p>
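+
+<p>The effect of HAVING can be sketched with Python's <code>sqlite3</code> module, not Drill. This sketch rebuilds a table with one row per click, using the per-device counts from the output above. It also adds one hypothetical device, HYPO1, with only 500 clicks, so HAVING filters it out.</p>
+
```python
import sqlite3

# Click counts per device from the output above, plus one hypothetical device
# (HYPO1) with too few clicks to pass the HAVING filter.
counts = {"IOS5": 11814, "AOS4.2": 5986, "IOS6": 4464, "IOS7": 3135,
          "AOS4.4": 1562, "AOS4.3": 3039, "HYPO1": 500}

con = sqlite3.connect(":memory:")
con.execute("create table clicks (device text)")
for device, n in counts.items():
    # One row per click, so count(*) per device reproduces the counts above.
    con.executemany("insert into clicks values (?)", [(device,)] * n)

rows = con.execute(
    "select device, count(*) from clicks group by device having count(*) > 1000"
).fetchall()
print(sorted(rows))  # HYPO1 is absent: its group fails the HAVING condition
```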
+
+<h2 id="union-operator">UNION Operator</h2>
+
+<p>Use the same workspace as before (dfs.clicks).</p>
+
+<h3 id="combine-clicks-activity-from-before-and-after-the-marketing-campaign">Combine clicks activity from before and after the marketing campaign</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; select t.trans_id transaction, t.user_info.cust_id customer from `clicks/clicks.campaign.json` t 
+union all 
+select u.trans_id, u.user_info.cust_id  from `clicks/clicks.json` u limit 5;
++-------------+------------+
+| transaction |  customer  |
++-------------+------------+
+| 35232       | 18520      |
+| 31995       | 17182      |
+| 35760       | 18228      |
+| 37090       | 17015      |
+| 37838       | 18737      |
++-------------+------------+
+</code></pre></div>
+<p>This UNION ALL query combines the rows from two files, <code>clicks.campaign.json</code> and
+<code>clicks.json</code>, and keeps any duplicate rows that appear in both.</p>
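+
+<p>The difference between UNION ALL and UNION can be sketched in standard SQL with Python's <code>sqlite3</code> module. The table names and rows below are hypothetical, and one row is deliberately present in both tables. The sketch runs on SQLite, so it does not show whether your Drill release supports UNION without ALL.</p>
+
```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table campaign (trans_id int, cust_id int)")
con.execute("create table clicks (trans_id int, cust_id int)")
# Hypothetical rows: (35232, 18520) appears in both tables.
con.executemany("insert into campaign values (?, ?)", [(35232, 18520), (31995, 17182)])
con.executemany("insert into clicks values (?, ?)", [(35232, 18520), (35760, 18228)])

# UNION ALL keeps the duplicate row; UNION returns distinct rows only.
union_all = con.execute(
    "select trans_id, cust_id from campaign union all select trans_id, cust_id from clicks"
).fetchall()
distinct_rows = con.execute(
    "select trans_id, cust_id from campaign union select trans_id, cust_id from clicks"
).fetchall()
print(len(union_all), len(distinct_rows))  # 4 3
```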
+
+<h2 id="subqueries">Subqueries</h2>
+
+<h3 id="set-the-workspace-to-hive:">Set the workspace to hive:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; use hive;
++------------+------------+
+|     ok     |  summary   |
++------------+------------+
+| true       | Default schema changed to &#39;hive&#39; |
++------------+------------+
+</code></pre></div>
+<h3 id="compare-order-totals-across-states:">Compare order totals across states:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; select o1.cust_id, sum(o1.order_total) as ny_sales,
+(select sum(o2.order_total) from hive.orders o2
+where o1.cust_id=o2.cust_id and state=&#39;ca&#39;) as ca_sales
+from hive.orders o1 where o1.state=&#39;ny&#39; group by o1.cust_id
+order by cust_id limit 20;
++------------+------------+------------+
+|  cust_id   |  ny_sales  |  ca_sales  |
++------------+------------+------------+
+| 1001       | 72         | 47         |
+| 1002       | 108        | 198        |
+| 1003       | 83         | null       |
+| 1004       | 86         | 210        |
+| 1005       | 168        | 153        |
+| 1006       | 29         | 326        |
+| 1008       | 105        | 168        |
+| 1009       | 443        | 127        |
+| 1010       | 75         | 18         |
+| 1012       | 110        | null       |
+| 1013       | 19         | null       |
+| 1014       | 106        | 162        |
+| 1015       | 220        | 153        |
+| 1016       | 85         | 159        |
+| 1017       | 82         | 56         |
+| 1019       | 37         | 196        |
+| 1020       | 193        | 165        |
+| 1022       | 124        | null       |
+| 1023       | 166        | 149        |
+| 1024       | 233        | null       |
++------------+------------+------------+
+</code></pre></div>
+<p>This example demonstrates Drill support for correlated subqueries. This query
+uses a subquery in the select list and correlates the result of the subquery
+with the outer query, using the cust_id column reference. The subquery returns
+the sum of order totals for California, and the outer query returns the
+equivalent sum, for the same cust_id, for New York.</p>
+
+<p>The result set is sorted by cust_id and presents the sales totals side by
+side for easy comparison. A null value in the ca_sales column indicates a customer
+who did not register any sales in California.</p>
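+
+<p>Here is a minimal sketch of the same correlated subquery, run with Python's <code>sqlite3</code> module rather than Drill. The rows are simplified so that each customer has a single order per state, with totals copied from customers 1001 and 1003 above. Customer 1003 has no California orders, so the subquery returns NULL (None in Python).</p>
+
```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table orders (cust_id int, state text, order_total int)")
# Simplified rows: one order per customer and state, totals from the output above.
con.executemany(
    "insert into orders values (?, ?, ?)",
    [(1001, "ny", 72), (1001, "ca", 47), (1003, "ny", 83)],
)

# The inner query is re-evaluated for each outer cust_id (correlation on o1.cust_id).
query = """
select o1.cust_id, sum(o1.order_total) as ny_sales,
       (select sum(o2.order_total) from orders o2
        where o2.cust_id = o1.cust_id and o2.state = 'ca') as ca_sales
from orders o1
where o1.state = 'ny'
group by o1.cust_id
order by o1.cust_id
"""
result = con.execute(query).fetchall()
print(result)  # [(1001, 72, 47), (1003, 83, None)]
```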
+
+<h2 id="cast-function">CAST Function</h2>
+
+<h3 id="use-the-maprdb-workspace:">Use the maprdb workspace:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; use maprdb;
++------------+------------+
+|     ok     |  summary   |
++------------+------------+
+| true       | Default schema changed to &#39;maprdb&#39; |
++------------+------------+
+1 row selected
+</code></pre></div>
+<h3 id="return-customer-data-with-appropriate-data-types">Return customer data with appropriate data types</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; select cast(row_key as int) as cust_id, cast(t.personal.name as varchar(20)) as name, 
+cast(t.personal.gender as varchar(10)) as gender, cast(t.personal.age as varchar(10)) as age,
+cast(t.address.state as varchar(4)) as state, cast(t.loyalty.agg_rev as dec(7,2)) as agg_rev, 
+cast(t.loyalty.membership as varchar(20)) as membership
+from customers t limit 5;
++------------+------------+------------+------------+------------+------------+------------+
+|  cust_id   |    name    |   gender   |    age     |   state    |  agg_rev   | membership |
++------------+------------+------------+------------+------------+------------+------------+
+| 10001      | &quot;Corrine Mecham&quot; | &quot;FEMALE&quot;   | &quot;15-20&quot;    | &quot;va&quot;       | 197.00     | &quot;silver&quot;   |
+| 10005      | &quot;Brittany Park&quot; | &quot;MALE&quot;     | &quot;26-35&quot;    | &quot;in&quot;       | 230.00     | &quot;silver&quot;   |
+| 10006      | &quot;Rose Lokey&quot; | &quot;MALE&quot;     | &quot;26-35&quot;    | &quot;ca&quot;       | 250.00     | &quot;silver&quot;   |
+| 10007      | &quot;James Fowler&quot; | &quot;FEMALE&quot;   | &quot;51-100&quot;   | &quot;me&quot;       | 263.00     | &quot;silver&quot;   |
+| 10010      | &quot;Guillermo Koehler&quot; | &quot;OTHER&quot;    | &quot;51-100&quot;   | &quot;mn&quot;       | 202.00     | &quot;silver&quot;   |
++------------+------------+------------+------------+------------+------------+------------+
+5 rows selected
+</code></pre></div>
+<p>Note the following features of this query:</p>
+
+<ul>
+<li>The CAST function is required for every column that the query returns. It converts the MapR-DB/HBase binary data into readable integers and strings. Alternatively, you can use the CONVERT_FROM function to decode the columns (CONVERT_TO performs the reverse conversion, encoding values as binary). CONVERT_TO and CONVERT_FROM are more efficient than CAST in most cases.</li>
+<li>The row_key column functions as the primary key of the table (a customer ID in this case).</li>
+<li>The table alias t is required; otherwise the column family names would be parsed as table names and the query would return an error.</li>
+</ul>
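+
+<p>Conceptually, these casts decode raw bytes into typed values. The Python sketch below makes two assumptions, both suggested by the quoted output: the row key is stored as UTF-8 digits, and string values were stored with their surrounding double quotes. The byte values are hypothetical, and this is not how Drill implements CAST.</p>
+
```python
# Hypothetical raw cell values as bytes (MapR-DB/HBase store untyped byte arrays).
row_key = b"10001"
state_bytes = b'"va"'

# Decode the bytes as UTF-8 text, then convert the key text to an integer.
cust_id = int(row_key.decode("utf-8"))
state_text = state_bytes.decode("utf-8")  # the stored quotes survive decoding
print(cust_id, state_text)  # 10001 "va"
```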
+
+<h3 id="remove-the-quotes-from-the-strings:">Remove the quotes from the strings:</h3>
+
+<p>You can use the regexp_replace function to remove the quotes around the
+strings in the query results. For example, to return the state as va instead
+of “va”:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; select cast(row_key as int), regexp_replace(cast(t.address.state as varchar(10)),&#39;&quot;&#39;,&#39;&#39;)
+from customers t limit 1;
++------------+------------+
+|   EXPR$0   |   EXPR$1   |
++------------+------------+
+| 10001      | va         |
++------------+------------+
+1 row selected
+</code></pre></div>
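+
+<p>The regexp_replace call above matches the pattern " and replaces it with an empty string. Python's <code>re.sub</code> behaves the same way and replaces every match, which is why both quotes disappear. This is only an analogy outside Drill.</p>
+
```python
import re

# A state value as it appears in the query output, including the double quotes.
raw_state = '"va"'

# Pattern '"' with an empty replacement, mirroring regexp_replace(..., '"', '').
state = re.sub('"', "", raw_state)
print(state)  # va
```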
+<h2 id="create-view-command">CREATE VIEW Command</h2>
+
+<h3 id="use-a-mutable-workspace:">Use a mutable workspace:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; use dfs.views;
++------------+------------+
+|     ok     |  summary   |
++------------+------------+
+| true       | Default schema changed to &#39;dfs.views&#39; |
++------------+------------+
+</code></pre></div>
+<p>A mutable (or writable) workspace is a workspace that is enabled for “write”
+operations. This attribute is part of the storage plugin configuration. You
+can create Drill views and tables in mutable workspaces.</p>
+
+<h3 id="create-a-view-on-a-mapr-db-table">Create a view on a MapR-DB table</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; create or replace view custview as select cast(row_key as int) as cust_id,
+cast(t.personal.name as varchar(20)) as name, 
+cast(t.personal.gender as varchar(10)) as gender, 
+cast(t.personal.age as varchar(10)) as age, 
+cast(t.address.state as varchar(4)) as state,
+cast(t.loyalty.agg_rev as dec(7,2)) as agg_rev,
+cast(t.loyalty.membership as varchar(20)) as membership
+from maprdb.customers t;
++------------+------------+
+|     ok     |  summary   |
++------------+------------+
+| true       | View &#39;custview&#39; replaced successfully in &#39;dfs.views&#39; schema |
++------------+------------+
+1 row selected
+</code></pre></div>
+<p>Drill provides CREATE OR REPLACE VIEW syntax, similar to that of relational
+databases, for creating views. The OR REPLACE option lets you update the
+view later without having to drop it first. Note that the FROM clause in
+this example must refer to maprdb.customers, because MapR-DB tables are not
+directly visible from the dfs.views workspace.</p>
+
+<p>Unlike a traditional database, where views are typically created by DBAs or
+developers, file system-based views in Drill are very lightweight. A view is
+simply a special file with a specific extension (.drill). You can store views
+in your local file system or in any mutable workspace. The body of the CREATE VIEW
+statement can contain any query against any Drill data source.</p>
+
+<p>Drill provides a decentralized metadata model. Drill can use metadata
+defined in data sources such as Hive, HBase, and the file system, and it also
+supports creating metadata, such as views, in the file system.</p>
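+
+<p>The OR REPLACE behavior can be sketched with Python's <code>sqlite3</code> module. SQLite has no CREATE OR REPLACE VIEW, so the hypothetical helper <code>create_or_replace_view</code> drops any existing definition before creating the new one. The customers table here is a one-row stand-in, not the MapR-DB table, and SQLite stores the view in the database rather than in a .drill file.</p>
+
```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical stand-in for the customers table: text values stored with quotes.
con.execute("create table customers (row_key text, state text)")
con.execute("insert into customers values (?, ?)", ("10001", '"va"'))


def create_or_replace_view(con, name, select_sql):
    """Emulate CREATE OR REPLACE VIEW, which SQLite lacks (name must be trusted)."""
    con.execute(f"drop view if exists {name}")
    con.execute(f"create view {name} as {select_sql}")


create_or_replace_view(
    con, "custview", "select cast(row_key as integer) as cust_id, state from customers"
)
# Redefine the view in place; the second definition also strips the quotes.
create_or_replace_view(
    con,
    "custview",
    """select cast(row_key as integer) as cust_id,
              replace(state, '"', '') as state
       from customers""",
)
rows = con.execute("select * from custview").fetchall()
print(rows)  # [(10001, 'va')]
```
+
+<p>Downstream code queries custview exactly as it would query a table.</p>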
+
+<h3 id="query-data-from-the-view:">Query data from the view:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; select * from custview limit 1;
++------------+------------+------------+------------+------------+------------+------------+
+|  cust_id   |    name    |   gender   |    age     |   state    |  agg_rev   | membership |
++------------+------------+------------+------------+------------+------------+------------+
+| 10001      | &quot;Corrine Mecham&quot; | &quot;FEMALE&quot;   | &quot;15-20&quot;    | &quot;va&quot;       | 197.00     | &quot;silver&quot;   |
++------------+------------+------------+------------+------------+------------+------------+
+</code></pre></div>
+<p>Once users have explored the data directly in the file system and know what is
+available, they can use views to expose that data to downstream tools such as
+Tableau and MicroStrategy for further analysis and
+visualization. To these tools, a view appears simply as a “table” with
+selectable “columns” in it.</p>
+
+<h2 id="query-across-data-sources">Query Across Data Sources</h2>
+
+<p>Continue using dfs.views for this query.</p>
+
+<h3 id="join-the-customers-view-and-the-orders-table:">Join the customers view and the orders table:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; select membership, sum(order_total) as sales from hive.orders, custview
+where orders.cust_id=custview.cust_id
+group by membership order by 2;
++------------+------------+
+| membership |   sales    |
++------------+------------+
+| &quot;basic&quot;    | 380665     |
+| &quot;silver&quot;   | 708438     |
+| &quot;gold&quot;     | 2787682    |
++------------+------------+
+3 rows selected
+</code></pre></div>
+<p>In this query, we are reading data from a MapR-DB table (represented by
+custview) and combining it with the order information in Hive. When running
+cross-data-source queries such as this one, you need to use fully qualified names
+for any table or view outside the current workspace. For example, the orders table
+is prefixed by “hive,” which is the storage plugin name registered with Drill.
+We are not using any prefix for “custview” because we explicitly switched to the
+dfs.views workspace, where custview is stored.</p>
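+
+<p>Schema-qualified names follow the same pattern in other SQL engines. In this <code>sqlite3</code> sketch (not Drill), an attached in-memory database named hive stands in for the Hive storage plugin. A custview table in the main schema stands in for the view in the current workspace. All rows are hypothetical.</p>
+
```python
import sqlite3

con = sqlite3.connect(":memory:")
# Attach a second in-memory database under the schema name "hive" (a stand-in
# for the Hive storage plugin; SQLite schemas are not Drill plugins).
con.execute("attach database ':memory:' as hive")
con.execute("create table hive.orders (cust_id int, order_total int)")
con.execute("create table custview (cust_id int, membership text)")  # current schema
con.executemany("insert into hive.orders values (?, ?)", [(10001, 50), (10001, 25), (10005, 40)])
con.executemany("insert into custview values (?, ?)", [(10001, "gold"), (10005, "basic")])

# hive.orders needs its schema prefix; custview resolves in the current schema.
rows = con.execute("""
    select membership, sum(order_total) as sales
    from hive.orders, custview
    where orders.cust_id = custview.cust_id
    group by membership order by 2
""").fetchall()
print(rows)  # [('basic', 40), ('gold', 75)]
```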
+
+<p>Note: If the results of any of your queries appear to be truncated because the
+rows are wide, set the maximum width of the display to 10000 with the following
+command: <code>!set maxwidth 10000</code></p>
+
+<p>Do not use a semicolon for this SET command.</p>
+
+<h3 id="join-the-customers,-orders,-and-clickstream-data:">Join the customers, orders, and clickstream data:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:&gt; select custview.membership, sum(orders.order_total) as sales from hive.orders, custview,
+dfs.`/mapr/demo.mapr.com/data/nested/clicks/clicks.json` c 
+where orders.cust_id=custview.cust_id and orders.cust_id=c.user_info.cust_id 
+group by custview.membership order by 2;
++------------+------------+
+| membership |   sales    |
++------------+------------+
+| &quot;basic&quot;    | 372866     |
+| &quot;silver&quot;   | 728424     |
+| &quot;gold&quot;     | 7050198    |
++------------+------------+
+3 rows selected
+</code></pre></div>
+<p>This three-way join selects from three different data sources in one query:</p>
+
+<ul>
+<li>hive.orders table</li>
+<li>custview (a view of the MapR-DB customers table)</li>
+<li>clicks.json file</li>
+</ul>
+
+<p>The join column for both sets of join conditions is the cust_id column. The
+query runs in the dfs.views workspace so that custview can be referenced without
+a prefix; the hive.orders table is accessible because its name is fully qualified.</p>
+
+<p>However, note that the JSON file is not directly visible from the views
+workspace, so the query specifies the full path to the file:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">dfs.`/mapr/demo.mapr.com/data/nested/clicks/clicks.json`
+</code></pre></div>
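+
+<p>Note that the totals differ from those of the previous two-way join. Gold rises from 2787682 to 7050198, while basic falls from 380665 to 372866. This is consistent with how inner joins work. Each order row is repeated once for every clicks row with the same cust_id, so SUM(order_total) counts an order once per matching click, and customers with no clicks drop out. The following <code>sqlite3</code> sketch (not Drill) shows this fan-out with hypothetical data.</p>
+
```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table orders (cust_id int, order_total int)")
con.execute("create table clicks (cust_id int)")
# Hypothetical data: customer 1 has one order and three clicks; customer 2 has no clicks.
con.executemany("insert into orders values (?, ?)", [(1, 100), (2, 60)])
con.executemany("insert into clicks values (?)", [(1,), (1,), (1,)])

without_clicks = con.execute("select sum(order_total) from orders").fetchone()[0]
# The join repeats customer 1's order once per click and drops customer 2.
with_clicks = con.execute(
    "select sum(o.order_total) from orders o, clicks c where o.cust_id = c.cust_id"
).fetchone()[0]
print(without_clicks, with_clicks)  # 160 300
```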
+<h1 id="what&#39;s-next">What&#39;s Next</h1>
+
+<p>Go to <a href="/confluence/display/DRILL/%0ALesson+3%3A+Run+Queries+on+Complex+Data+Types">Lesson 3: Run Queries on Complex Data Types</a>. </p>
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright © 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>