Posted to commits@drill.apache.org by ts...@apache.org on 2015/01/15 06:11:48 UTC
svn commit: r1651949 [7/13] - in /drill/site/trunk/content/drill: ./
blog/2014/11/19/sql-on-mongodb/ blog/2014/12/02/drill-top-level-project/
blog/2014/12/09/running-sql-queries-on-amazon-s3/
blog/2014/12/11/apache-drill-qa-panelist-spotlight/ blog/201...
Added: drill/site/trunk/content/drill/docs/installing-drill-on-windows/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/installing-drill-on-windows/index.html?rev=1651949&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/installing-drill-on-windows/index.html (added)
+++ drill/site/trunk/content/drill/docs/installing-drill-on-windows/index.html Thu Jan 15 05:11:44 2015
@@ -0,0 +1,142 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Installing Drill on Windows - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+ <li class="logo"><a href="/"></a></li>
+ <li>
+ <a href="/overview/">Documentation</a>
+ <ul>
+ <li><a href="/overview/">Overview </a></li>
+ <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+ <li><a href="/why/">Why Drill? </a></li>
+ <li><a href="/architecture/">Architecture</a></li>
+ </ul>
+ </li>
+ <li>
+ <a href="/community/">Community</a>
+ <ul>
+ <li><a href="/team/">Team</a></li>
+ <li><a href="/community/#events">Events and Meetups</a></li>
+ <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+ <li><a href="/community/#getinvolved">Get Involved</a></li>
+ <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+ <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+ </ul>
+ </li>
+ <li><a href="/faq/">FAQ</a></li>
+ <li><a href="/blog/">Blog</a></li>
+ <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+ <li class="l"><span> </span></li>
+ <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>Installing Drill on Windows</h1>
+
+</div>
+
+<div class="int_text" align="left"><p>You can install Drill on Windows 7 or 8. To install Drill on Windows, you must
+have JDK 7, and you must set the <code>JAVA_HOME</code> path in the Windows Environment
+Variables. You must also have a decompression utility, such as
+<a href="http://www.7-zip.org/">7-zip</a>, installed on your machine to extract the
+Drill archive file that you download. These instructions assume that you use
+7-zip.</p>
+
+<h4 id="setting-java_home">Setting JAVA_HOME</h4>
+
+<p>Complete the following steps to set <code>JAVA_HOME</code>:</p>
+
+<ol>
+<li>Navigate to <code>Control Panel\All Control Panel Items\System</code>, and select <strong>Advanced System Settings</strong>. The System Properties window appears.</li>
+<li>On the Advanced tab, click <strong>Environment Variables</strong>. The Environment Variables window appears.</li>
+<li><p>Add/Edit <code>JAVA_HOME</code> to point to the location where the JDK software is located.</p>
+
+<p><strong>Example</strong></p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">C:\Program Files\Java\jdk1.7.0_65
+</code></pre></div></li>
+<li><p>Click <strong>OK</strong> to exit the windows.</p></li>
+</ol>
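+
+<p>After you set the variable, you may want to confirm that it points at a JDK rather than a JRE. The following Python sketch is one way to check. The function name <code>check_java_home</code> and the test for <code>javac</code> under <code>bin</code> are illustrative assumptions, not part of Drill.</p>

```python
import os
from pathlib import Path


def check_java_home(env=None):
    """Return a list of problems with JAVA_HOME; an empty list means it looks usable."""
    env = os.environ if env is None else env
    value = env.get("JAVA_HOME", "").strip()
    if not value:
        return ["JAVA_HOME is not set"]
    home = Path(value)
    if not home.is_dir():
        return [f"JAVA_HOME does not point to a directory: {value}"]
    # Assumption: a JDK ships the compiler in bin, while a JRE-only install does not.
    names = ("javac.exe", "javac")
    if not any((home / "bin" / name).is_file() for name in names):
        return [f"no javac found under {home / 'bin'}; is this a JDK?"]
    return []
```

+<p>Calling <code>check_java_home()</code> with no argument inspects the current environment. Run it from a newly opened command prompt, because processes that were already running do not see the updated variable.</p>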
+
+<h4 id="installing-drill">Installing Drill</h4>
+
+<p>Complete the following steps to install Drill:</p>
+
+<ol>
+<li><p>Create a <code>drill</code> directory on your <code>C:\</code> drive (or in some other location if you prefer).</p>
+
+<p><strong>Example</strong>: <code>C:\drill</code></p>
+
+<p>Do not include spaces in your directory path. If you include spaces in the
+directory path, Drill fails to run.</p></li>
+<li><p>Click the following link to download the latest, stable version of Apache Drill:</p>
+
+<p><a href="http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-drill-0.7.0.tar.gz">http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-drill-0.7.0.tar.gz</a></p></li>
+<li><p>Move the <code>apache-drill-&lt;version&gt;.tar.gz</code> file to the <code>drill</code> directory that you created on your <code>C:\</code> drive.</p></li>
+<li><p>Unzip the <code>TAR.GZ</code> file and the resulting <code>TAR</code> file. </p>
+
+<p>a. Right-click <code>apache-drill-&lt;version&gt;.tar.gz</code>, and select <code>7-Zip&gt;Extract Here</code>. The utility extracts the <code>apache-drill-&lt;version&gt;.tar</code> file.<br>
+b. Right-click <code>apache-drill-&lt;version&gt;.tar</code>, and select <code>7-Zip&gt;Extract Here</code>. The utility extracts the <code>apache-drill-&lt;version&gt;</code> folder.</p></li>
+<li><p>Open the <code>apache-drill-&lt;version&gt;</code> folder.</p></li>
+<li><p>Open the <code>bin</code> folder, and double-click on the <code>sqlline.bat</code> file. The Windows command prompt opens.</p></li>
+<li><p>At the <code>sqlline&gt;</code> prompt, type <code>!connect jdbc:drill:zk=local</code> and then press <code>Enter</code>.</p></li>
+<li><p>Enter the user name and password.<br>
+a. When prompted, enter the user name <code>admin</code> and then press Enter.<br>
+b. When prompted, enter the password <code>admin</code> and then press Enter. The cursor blinks for a few seconds and then <code>0: jdbc:drill:zk=local&gt;</code> displays in the prompt.</p></li>
+</ol>
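+
+<p>The download-and-extract steps above can also be scripted. The following Python sketch uses the standard <code>tarfile</code> module in place of 7-zip. The function name <code>extract_drill</code> is hypothetical, and the check for spaces simply mirrors the warning in step 1.</p>

```python
import tarfile
from pathlib import Path


def extract_drill(archive_path, install_dir):
    """Extract a .tar.gz archive into install_dir and return the new top-level entries."""
    install_dir = Path(install_dir)
    # The steps above warn that Drill fails to run from a path containing spaces.
    if " " in str(install_dir):
        raise ValueError(f"install path contains spaces: {install_dir}")
    install_dir.mkdir(parents=True, exist_ok=True)
    before = {p.name for p in install_dir.iterdir()}
    # Mode "r:gz" handles gzip decompression and tar unpacking together,
    # so the two separate 7-Zip extractions become a single call.
    with tarfile.open(archive_path, "r:gz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters; "data" is the safe one.
            archive.extractall(install_dir, filter="data")
        else:
            archive.extractall(install_dir)
    return sorted({p.name for p in install_dir.iterdir()} - before)
```

+<p>For example, <code>extract_drill(r"C:\drill\apache-drill-0.7.0.tar.gz", r"C:\drill")</code> would unpack the archive named in step 2 into the directory from step 1, assuming the file was saved there.</p>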
+
+<p>At this point, you can submit queries to Drill. Refer to the <a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes#ApacheDrillin10Minutes-QuerySampleData">Query Sample Data</a>
+section of Apache Drill in 10 Minutes.</p>
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright © 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>
Added: drill/site/trunk/content/drill/docs/installing-the-apache-drill-sandbox/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/installing-the-apache-drill-sandbox/index.html?rev=1651949&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/installing-the-apache-drill-sandbox/index.html (added)
+++ drill/site/trunk/content/drill/docs/installing-the-apache-drill-sandbox/index.html Thu Jan 15 05:11:44 2015
@@ -0,0 +1,141 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Installing the Apache Drill Sandbox - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+ <li class="logo"><a href="/"></a></li>
+ <li>
+ <a href="/overview/">Documentation</a>
+ <ul>
+ <li><a href="/overview/">Overview </a></li>
+ <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+ <li><a href="/why/">Why Drill? </a></li>
+ <li><a href="/architecture/">Architecture</a></li>
+ </ul>
+ </li>
+ <li>
+ <a href="/community/">Community</a>
+ <ul>
+ <li><a href="/team/">Team</a></li>
+ <li><a href="/community/#events">Events and Meetups</a></li>
+ <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+ <li><a href="/community/#getinvolved">Get Involved</a></li>
+ <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+ <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+ </ul>
+ </li>
+ <li><a href="/faq/">FAQ</a></li>
+ <li><a href="/blog/">Blog</a></li>
+ <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+ <li class="l"><span> </span></li>
+ <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>Installing the Apache Drill Sandbox</h1>
+
+</div>
+
+<div class="int_text" align="left"><p>This tutorial uses the MapR Sandbox, which is a Hadoop environment pre-configured with Apache Drill.</p>
+
+<p>To complete the tutorial on the MapR Sandbox with Apache Drill, work through
+the following pages in order:</p>
+
+<ul>
+<li><a href="/confluence/display/DRILL/Installing+the+Apache+Drill+Sandbox">Installing the Apache Drill Sandbox</a></li>
+<li><a href="/confluence/display/DRILL/Getting+to+Know+the+Drill+Setup">Getting to Know the Drill Setup</a></li>
+<li><a href="/confluence/display/DRILL/Lesson+1%3A+Learn+About+the+Data+Set">Lesson 1: Learn About the Data Set</a></li>
+<li><a href="/confluence/display/DRILL/Lesson+2%3A+Run+Queries+with+ANSI+SQL">Lesson 2: Run Queries with ANSI SQL</a></li>
+<li><a href="/confluence/display/DRILL/Lesson+3%3A+Run+Queries+on+Complex+Data+Types">Lesson 3: Run Queries on Complex Data Types</a></li>
+<li><a href="/confluence/display/DRILL/Summary">Summary</a></li>
+</ul>
+
+<h1 id="about-apache-drill">About Apache Drill</h1>
+
+<p>Drill is an Apache open-source SQL query engine for Big Data exploration.
+Drill is designed from the ground up to support high-performance analysis on
+the semi-structured and rapidly evolving data coming from modern Big Data
+applications, while still providing the familiarity and ecosystem of ANSI SQL,
+the industry-standard query language. Drill provides plug-and-play integration
+with existing Apache Hive and Apache HBase deployments. Apache Drill 0.5 offers
+the following key features:</p>
+
+<ul>
+<li><p>Low-latency SQL queries</p></li>
+<li><p>Dynamic queries on self-describing data in files (such as JSON, Parquet, text) and MapR-DB/HBase tables, without requiring metadata definitions in the Hive metastore.</p></li>
+<li><p>ANSI SQL</p></li>
+<li><p>Nested data support</p></li>
+<li><p>Integration with Apache Hive (queries on Hive tables and views, support for all Hive file formats and Hive UDFs)</p></li>
+<li><p>BI/SQL tool integration using standard JDBC/ODBC drivers</p></li>
+</ul>
+
+<h1 id="mapr-sandbox-with-apache-drill">MapR Sandbox with Apache Drill</h1>
+
+<p>MapR includes Apache Drill as part of the Hadoop distribution. The MapR
+Sandbox with Apache Drill is a fully functional single-node cluster that can
+be used to get an overview of Apache Drill in a Hadoop environment. Business
+and technical analysts, product managers, and developers can use the sandbox
+environment to get a feel for the power and capabilities of Apache Drill by
+performing various types of queries. Once you get a flavor for the technology,
+refer to the <a href="http://incubator.apache.org/drill/">Apache Drill web site</a> and
+<a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+Wiki">Apache Drill documentation</a>
+for more
+details.</p>
+
+<p>Note that Hadoop is not a prerequisite for Drill and users can start ramping
+up with Drill by running SQL queries directly on the local file system. Refer
+to <a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes">Apache Drill in 10 Minutes</a> for an introduction to using Drill in local
+(embedded) mode.</p>
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright © 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>
Added: drill/site/trunk/content/drill/docs/installing-the-mapr-sandbox-with-apache-drill-on-virtualbox/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/installing-the-mapr-sandbox-with-apache-drill-on-virtualbox/index.html?rev=1651949&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/installing-the-mapr-sandbox-with-apache-drill-on-virtualbox/index.html (added)
+++ drill/site/trunk/content/drill/docs/installing-the-mapr-sandbox-with-apache-drill-on-virtualbox/index.html Thu Jan 15 05:11:44 2015
@@ -0,0 +1,155 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Installing the MapR Sandbox with Apache Drill on VirtualBox - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+ <li class="logo"><a href="/"></a></li>
+ <li>
+ <a href="/overview/">Documentation</a>
+ <ul>
+ <li><a href="/overview/">Overview </a></li>
+ <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+ <li><a href="/why/">Why Drill? </a></li>
+ <li><a href="/architecture/">Architecture</a></li>
+ </ul>
+ </li>
+ <li>
+ <a href="/community/">Community</a>
+ <ul>
+ <li><a href="/team/">Team</a></li>
+ <li><a href="/community/#events">Events and Meetups</a></li>
+ <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+ <li><a href="/community/#getinvolved">Get Involved</a></li>
+ <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+ <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+ </ul>
+ </li>
+ <li><a href="/faq/">FAQ</a></li>
+ <li><a href="/blog/">Blog</a></li>
+ <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+ <li class="l"><span> </span></li>
+ <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>Installing the MapR Sandbox with Apache Drill on VirtualBox</h1>
+
+</div>
+
+<div class="int_text" align="left"><p>The MapR Sandbox for Apache Drill on VirtualBox comes with NAT port forwarding
+enabled, which allows you to access the sandbox using localhost as hostname.</p>
+
+<p>Complete the following steps to install the MapR Sandbox with Apache Drill on
+VirtualBox:</p>
+
+<ol>
+<li><p>Download the MapR Sandbox with Apache Drill file to a directory on your machine:<br>
+<a href="https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill">https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill</a></p></li>
+<li><p>Open the virtual machine player.</p></li>
+<li><p>Select <strong>File > Import Appliance</strong>. The Import Virtual Appliance dialog appears.</p>
+
+<p><img src="../../../img/vbImport.png" alt=""></p></li>
+<li><p>Navigate to the directory where you downloaded the MapR Sandbox with Apache Drill and click <strong>Next</strong>. The Appliance Settings window appears.</p>
+
+<p><img src="../../../img/vbapplSettings.png" alt=""></p></li>
+<li><p>Select the check box at the bottom of the screen: <strong>Reinitialize the MAC address of all network cards</strong>, then click <strong>Import</strong>. The Import Appliance imports the sandbox.</p></li>
+<li><p>When the import completes, select <strong>File > Preferences</strong>. The VirtualBox - Settings dialog appears.</p>
+
+<p><img src="../../../img/vbNetwork.png" alt=""></p>
+
+</li>
+<li><p>Select <strong>Network</strong>.</p>
+
+<p>The correct setting depends on your network connectivity when you run the
+Sandbox. In general, if you are going to use a wired Ethernet connection,
+select <strong>NAT Networks</strong> and <strong>vboxnet0</strong>. If you are going to use a wireless
+network, select <strong>Host-only Networks</strong> and the <strong>VirtualBox Host-Only Ethernet
+Adapter</strong>. If no adapters appear, click the green <strong>+</strong> button to add the
+VirtualBox adapter.</p>
+
+<p><img src="../../../img/vbMaprSetting.png" alt=""></p></li>
+<li><p>Click <strong>OK</strong> to continue.</p></li>
+<li><p>Click <img src="https://lh5.googleusercontent.com/6TjVEW28MJhPud2Nc2ButYB_GDqKTnadaluSulg0Zb259MgN1IRCgIlo-kMAEJ7lGWHf2aqc-nIjUsUFlaXP-LceAIKE5owNqXUWxXS0WXcBLWzUqg5X1VIXXswajb6oWA" alt="">. The MapR-Sandbox-For-Apache-Drill-0.6.0-r2-4.0.1 - Settings dialog appears.</p>
+
+<p><img src="../../../img/vbGenSettings.png" alt=""></p></li>
+<li><p>Click <strong>OK</strong> to continue.</p></li>
+<li><p>Click <strong>Start</strong>. It takes a few minutes for the MapR services to start. After the MapR services start and installation completes, the following screen appears:</p>
+
+<p><img src="../../../img/vbloginSandbox.png" alt=""></p></li>
+<li><p>The client must be able to resolve the actual hostname of the Drill node(s) with the IP(s). Verify that a DNS entry was created on the client machine for the Drill node(s).<br>
+If a DNS entry does not exist, create the entry for the Drill node(s).</p></li>
+</ol>
+<div class="highlight"><pre><code class="language-text" data-lang="text">* For Windows, create the entry in the %WINDIR%\system32\drivers\etc\hosts file.
+
+* For Linux and Mac, create the entry in /etc/hosts.
+</code></pre></div>
+<p>&lt;drill-machine-IP&gt; &lt;drill-machine-hostname&gt;<br>
+Example: <code>127.0.1.1 maprdemo</code></p>
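+
+<p>To verify that the entry exists, you can scan the hosts file for the sandbox hostname. The following Python sketch parses hosts-file text and returns the IP address mapped to a given name. The function name <code>hosts_lookup</code> is hypothetical, and the sample text is illustrative only.</p>

```python
def hosts_lookup(hosts_text, hostname):
    """Return the IP address mapped to hostname in hosts-file text, or None."""
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        # Everything after '#' is a comment in the hosts file format.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        ip, names = fields[0], fields[1:]
        # Host names are compared case-insensitively.
        if wanted in (name.lower() for name in names):
            return ip
    return None


sample = "127.0.0.1 localhost\n127.0.1.1 maprdemo  # sandbox entry\n"
print(hosts_lookup(sample, "maprdemo"))
```

+<p>To check a real machine, read the contents of <code>/etc/hosts</code> or <code>%WINDIR%\system32\drivers\etc\hosts</code> and pass the text to the function. A result of <code>None</code> means that you still need to add the entry.</p>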
+
+<ol start="13">
+<li><p>You can navigate to the URL provided or to <a href="http://localhost:8047">localhost:8047</a> to experience the Drill Web UI, or you can log into the sandbox through the command line.</p>
+
+<p>a. To navigate to the MapR Sandbox with Apache Drill, enter the provided URL in your browser's address bar.</p>
+
+<p>b. To log into the virtual machine and access the command line, press Alt+F2 on Windows or Option+F5 on Mac. When prompted, enter <code>mapr</code> as the login and password.</p></li>
+</ol>
+
+<h1 id="what's-next">What's Next</h1>
+
+<p>After downloading and installing the sandbox, continue with the tutorial by
+<a href="/confluence/display/DRILL/Getting+to+Know+the+Drill+Setup">Getting to Know the Drill
+Setup</a>.</p>
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright © 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>
Added: drill/site/trunk/content/drill/docs/installing-the-mapr-sandbox-with-apache-drill-on-vmware-player-vmware-fusion/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/installing-the-mapr-sandbox-with-apache-drill-on-vmware-player-vmware-fusion/index.html?rev=1651949&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/installing-the-mapr-sandbox-with-apache-drill-on-vmware-player-vmware-fusion/index.html (added)
+++ drill/site/trunk/content/drill/docs/installing-the-mapr-sandbox-with-apache-drill-on-vmware-player-vmware-fusion/index.html Thu Jan 15 05:11:44 2015
@@ -0,0 +1,153 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Installing the MapR Sandbox with Apache Drill on VMware Player/VMware Fusion - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+ <li class="logo"><a href="/"></a></li>
+ <li>
+ <a href="/overview/">Documentation</a>
+ <ul>
+ <li><a href="/overview/">Overview </a></li>
+ <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+ <li><a href="/why/">Why Drill? </a></li>
+ <li><a href="/architecture/">Architecture</a></li>
+ </ul>
+ </li>
+ <li>
+ <a href="/community/">Community</a>
+ <ul>
+ <li><a href="/team/">Team</a></li>
+ <li><a href="/community/#events">Events and Meetups</a></li>
+ <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+ <li><a href="/community/#getinvolved">Get Involved</a></li>
+ <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+ <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+ </ul>
+ </li>
+ <li><a href="/faq/">FAQ</a></li>
+ <li><a href="/blog/">Blog</a></li>
+ <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+ <li class="l"><span> </span></li>
+ <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>Installing the MapR Sandbox with Apache Drill on VMware Player/VMware Fusion</h1>
+
+</div>
+
+<div class="int_text" align="left"><p>Complete the following steps to install the MapR Sandbox with Apache Drill on
+VMware Player or VMware Fusion:</p>
+
+<ol>
+<li><p>Download the MapR Sandbox with Drill file to a directory on your machine:<br>
+<a href="https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill">https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill</a></p></li>
+<li><p>Open the virtual machine player, and select the <strong>Open a Virtual Machine</strong> option.</p></li>
+</ol>
+
+<p>Tip for VMware Fusion</p>
+
+<p>If you are running VMware Fusion, select <strong>Import</strong>.</p>
+
+<p><img src="../../../img/vmWelcome.png" alt=""></p>
+
+<ol start="3">
+<li>Navigate to the directory where you downloaded the MapR Sandbox with Apache Drill file, and select <code>MapR-Sandbox-For-Apache-Drill-4.0.1_VM.ova</code>.</li>
+</ol>
+
+<p><img src="../../../img/vmShare.png" alt=""></p>
+
+<p>The Import Virtual Machine dialog appears.</p>
+
+<ol start="4">
+<li>Click <strong>Import</strong>. The virtual machine player imports the sandbox.</li>
+</ol>
+
+<p><img src="../../../img/vmLibrary.png" alt=""></p>
+
+<ol start="5">
+<li>Select <code>MapR-Sandbox-For-Apache-Drill-4.0.1_VM</code>, and click <strong>Play virtual machine</strong>. It takes a few minutes for the MapR services to start.<br>
+After the MapR services start and installation completes, the following screen
+appears:</li>
+</ol>
+
+<p><img src="../../../img/loginSandbox.png" alt=""></p>
+
+<p>Note the URL provided in the screen, which corresponds to the Web UI in Apache
+Drill.</p>
+
+<ol start="6">
+<li>Verify that a DNS entry was created on the host machine for the virtual machine. If not, create the entry.</li>
+</ol>
+<div class="highlight"><pre><code class="language-text" data-lang="text">* For Linux and Mac, create the entry in /etc/hosts.
+
+* For Windows, create the entry in the %WINDIR%\system32\drivers\etc\hosts file.
+</code></pre></div>
+<p>Example: <code>127.0.1.1 &lt;vm_hostname&gt;</code></p>
+
+<ol start="7">
+<li><p>You can navigate to the URL provided to experience the Drill Web UI, or you can log in to the sandbox through the command line.</p>
+
+<p>a. To navigate to the MapR Sandbox with Apache Drill, enter the provided URL in your browser's address bar. </p>
+
+<p>b. To log in to the virtual machine and access the command line, press Alt+F2 on Windows or Option+F5 on Mac. When prompted, enter <code>mapr</code> as the login and password.</p></li>
+</ol>
+
+<h1 id="what's-next">What's Next</h1>
+
+<p>After downloading and installing the sandbox, continue with the tutorial by
+<a href="/confluence/display/DRILL/Getting+to+Know+the+Drill+Setup">Getting to Know the Drill
+Setup</a>.</p>
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright © 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>
Added: drill/site/trunk/content/drill/docs/kvgen-function/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/kvgen-function/index.html?rev=1651949&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/kvgen-function/index.html (added)
+++ drill/site/trunk/content/drill/docs/kvgen-function/index.html Thu Jan 15 05:11:44 2015
@@ -0,0 +1,226 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>KVGEN Function - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+ <li class="logo"><a href="/"></a></li>
+ <li>
+ <a href="/overview/">Documentation</a>
+ <ul>
+ <li><a href="/overview/">Overview </a></li>
+ <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+ <li><a href="/why/">Why Drill? </a></li>
+ <li><a href="/architecture/">Architecture</a></li>
+ </ul>
+ </li>
+ <li>
+ <a href="/community/">Community</a>
+ <ul>
+ <li><a href="/team/">Team</a></li>
+ <li><a href="/community/#events">Events and Meetups</a></li>
+ <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+ <li><a href="/community/#getinvolved">Get Involved</a></li>
+ <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+ <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+ </ul>
+ </li>
+ <li><a href="/faq/">FAQ</a></li>
+ <li><a href="/blog/">Blog</a></li>
+ <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+ <li class="l"><span> </span></li>
+ <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>KVGEN Function</h1>
+
+</div>
+
+<div class="int_text" align="left"><p>KVGEN stands for <em>key-value generation</em>. This function is useful when complex
+data files contain arbitrary maps that consist of relatively "unknown" column
+names. Instead of having to specify columns in the map to access the data, you
+can use KVGEN to return a list of the keys that exist in the map. KVGEN turns
+a map with a wide set of columns into an array of key-value pairs.</p>
+
+<p>In turn, you can write analytic queries that return a subset of the generated
+keys or constrain the keys in some way. For example, you can use the
+<a href="/confluence/display/DRILL/FLATTEN+Function">FLATTEN</a> function to break the
+array down into multiple distinct rows and further query those rows.</p>
+
+<p>For example, assume that a JSON file contains this data: </p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">{"a": "valA", "b": "valB"}
+{"c": "valC", "d": "valD"}
+</code></pre></div>
+<p>KVGEN would operate on this data to generate:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">[{"key": "a", "value": "valA"}, {"key": "b", "value": "valB"}]
+[{"key": "c", "value": "valC"}, {"key": "d", "value": "valD"}]
+</code></pre></div>
+<p>Applying the <a href="/confluence/display/DRILL/FLATTEN+Function">FLATTEN</a> function to
+this data would return:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">{"key": "a", "value": "valA"}
+{"key": "b", "value": "valB"}
+{"key": "c", "value": "valC"}
+{"key": "d", "value": "valD"}
+</code></pre></div>
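+<p>The two steps above can be modeled outside Drill. The following Python sketch
+only illustrates the behavior described here; the helper names <code>kvgen</code> and
+<code>flatten</code> are hypothetical stand-ins and are not Drill APIs.</p>

```python
# Illustrative model only: these helpers mimic the behavior described for
# Drill's KVGEN and FLATTEN functions; they are not Drill code.

def kvgen(record):
    """Turn a map into a list of {"key": ..., "value": ...} pairs."""
    return [{"key": k, "value": v} for k, v in record.items()]


def flatten(arrays):
    """Emit every element of every array as its own row."""
    return [element for array in arrays for element in array]


# The two JSON records from the example above.
records = [
    {"a": "valA", "b": "valB"},
    {"c": "valC", "d": "valD"},
]

generated = [kvgen(record) for record in records]  # one array per record
rows = flatten(generated)                          # one row per key-value pair

for row in rows:
    print(row)
```

+<p>The sketch prints four rows, one per key-value pair, in the same order as the
+FLATTEN output above.</p>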
+<p>Assume that a JSON file called <code>kvgendata.json</code> includes multiple records that
+look like this one:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">{
+ "rownum": 1,
+ "bigintegercol": {
+ "int_1": 1,
+ "int_2": 2,
+ "int_3": 3
+ },
+ "varcharcol": {
+ "varchar_1": "abc",
+ "varchar_2": "def",
+ "varchar_3": "xyz"
+ },
+ "boolcol": {
+ "boolean_1": true,
+ "boolean_2": false,
+ "boolean_3": true
+ },
+ "float8col": {
+ "f8_1": 1.1,
+ "f8_2": 2.2
+ },
+ "complex": [
+ {
+ "col1": 3
+ },
+ {
+ "col2": 2,
+ "col3": 1
+ },
+ {
+ "col1": 7
+ }
+ ]
+}
+
+{
+ "rownum": 3,
+ "bigintegercol": {
+ "int_1": 1,
+ "int_3": 3
+ },
+ "varcharcol": {
+ "varchar_1": "abcde",
+ "varchar_2": null,
+ "varchar_3": "xyz",
+ "varchar_4": "xyz2"
+ },
+ "boolcol": {
+ "boolean_1": true,
+ "boolean_2": false
+ },
+ "float8col": {
+ "f8_1": 1.1,
+ "f8_3": 6.6
+ },
+ "complex": [
+ {
+ "col1": 2,
+ "col3": 1
+ }
+ ]
+}
+...
+</code></pre></div>
+<p>A SELECT * query restricted to the first record (rownum=1) returns the following row:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:zk=local> select * from dfs.yelp.`kvgendata.json` where rownum=1;
+
++------------+---------------+------------+------------+------------+------------+
+| rownum | bigintegercol | varcharcol | boolcol | float8col | complex |
++------------+---------------+------------+------------+------------+------------+
+| 1 | {"int_1":1,"int_2":2,"int_3":3} | {"varchar_1":"abc","varchar_2":"def","varchar_3":"xyz"} | {"boolean_1":true,"boolean_2":false,"boolean_3":true} | {"f8_1":1.1,"f8_2":2.2} | [{"col1":3},{"col2":2,"col3":1},{"col1":7}] |
++------------+---------------+------------+------------+------------+------------+
+1 row selected (0.122 seconds)
+</code></pre></div>
+<p>You can use the KVGEN function to turn the maps in this data into key-value
+pairs. For example:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:zk=local> select kvgen(varcharcol) from dfs.yelp.`kvgendata.json`;
++------------+
+| EXPR$0 |
++------------+
+| [{"key":"varchar_1","value":"abc"},{"key":"varchar_2","value":"def"},{"key":"varchar_3","value":"xyz"}] |
+| [{"key":"varchar_1","value":"abcd"}] |
+| [{"key":"varchar_1","value":"abcde"},{"key":"varchar_3","value":"xyz"},{"key":"varchar_4","value":"xyz2"}] |
+| [{"key":"varchar_1","value":"abc"},{"key":"varchar_2","value":"def"}] |
++------------+
+4 rows selected (0.091 seconds)
+</code></pre></div>
+<p>Now you can apply the FLATTEN function to break out the key-value pairs into
+distinct rows:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:zk=local> select flatten(kvgen(varcharcol)) from dfs.yelp.`kvgendata.json`;
++------------+
+| EXPR$0 |
++------------+
+| {"key":"varchar_1","value":"abc"} |
+| {"key":"varchar_2","value":"def"} |
+| {"key":"varchar_3","value":"xyz"} |
+| {"key":"varchar_1","value":"abcd"} |
+| {"key":"varchar_1","value":"abcde"} |
+| {"key":"varchar_3","value":"xyz"} |
+| {"key":"varchar_4","value":"xyz2"} |
+| {"key":"varchar_1","value":"abc"} |
+| {"key":"varchar_2","value":"def"} |
++------------+
+9 rows selected (0.151 seconds)
+</code></pre></div>
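+<p>In the sample output above, the record with rownum 3 sets <code>varchar_2</code> to
+null and no key-value pair appears for it. The following Python sketch models
+that observation; it is an assumption drawn from this sample output, and the
+helper name <code>kvgen_skip_nulls</code> is hypothetical.</p>

```python
# Illustrative model only, based on the sample output above: keys whose value
# is null do not show up among the generated key-value pairs.

def kvgen_skip_nulls(record):
    """Build key-value pairs, leaving out keys whose value is None (null)."""
    return [
        {"key": key, "value": value}
        for key, value in record.items()
        if value is not None
    ]


# The varcharcol map of the record with rownum 3.
varcharcol = {
    "varchar_1": "abcde",
    "varchar_2": None,
    "varchar_3": "xyz",
    "varchar_4": "xyz2",
}

pairs = kvgen_skip_nulls(varcharcol)
for pair in pairs:
    print(pair)
```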
+<p>See the description of <a href="/confluence/display/DRILL/FLATTEN+Function">FLATTEN</a>
+for an example of a query against the flattened data.</p>
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright © 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>
Added: drill/site/trunk/content/drill/docs/lession-1-learn-about-the-data-set/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/lession-1-learn-about-the-data-set/index.html?rev=1651949&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/lession-1-learn-about-the-data-set/index.html (added)
+++ drill/site/trunk/content/drill/docs/lession-1-learn-about-the-data-set/index.html Thu Jan 15 05:11:44 2015
@@ -0,0 +1,515 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Lesson 1: Learn about the Data Set - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+ <li class="logo"><a href="/"></a></li>
+ <li>
+ <a href="/overview/">Documentation</a>
+ <ul>
+ <li><a href="/overview/">Overview </a></li>
+ <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+ <li><a href="/why/">Why Drill? </a></li>
+ <li><a href="/architecture/">Architecture</a></li>
+ </ul>
+ </li>
+ <li>
+ <a href="/community/">Community</a>
+ <ul>
+ <li><a href="/team/">Team</a></li>
+ <li><a href="/community/#events">Events and Meetups</a></li>
+ <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+ <li><a href="/community/#getinvolved">Get Involved</a></li>
+ <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+ <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+ </ul>
+ </li>
+ <li><a href="/faq/">FAQ</a></li>
+ <li><a href="/blog/">Blog</a></li>
+ <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+ <li class="l"><span> </span></li>
+ <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>Lesson 1: Learn about the Data Set</h1>
+
+</div>
+
+<div class="int_text" align="left"><h2 id="goal">Goal</h2>
+
+<p>This lesson is simply about discovering what data is available, in what
+format, using simple SQL SELECT statements. Drill is capable of analyzing data
+without prior knowledge or definition of its schema. This means that you can
+start querying data immediately (and even as it changes), regardless of its
+format.</p>
+
+<p>The data set for the tutorial consists of:</p>
+
+<ul>
+<li><p>Transactional data: stored as a Hive table</p></li>
+<li><p>Product catalog and master customer data: stored as MapR-DB tables</p></li>
+<li><p>Clickstream and logs data: stored in the MapR file system as JSON files</p></li>
+</ul>
+
+<h2 id="queries-in-this-lesson">Queries in This Lesson</h2>
+
+<p>This lesson consists of select * queries on each data source.</p>
+
+<h2 id="before-you-begin">Before You Begin</h2>
+
+<h3 id="start-sqlline">Start sqlline</h3>
+
+<p>If sqlline is not already started, use a Terminal or Command window to log
+into the demo VM as root, then enter <code>sqlline</code>:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">$ ssh root@10.250.0.6
+Password:
+Last login: Mon Sep 15 13:46:08 2014 from 10.250.0.28
+Welcome to your Mapr Demo virtual machine.
+[root@maprdemo ~]# sqlline
+sqlline version 1.1.6
+0: jdbc:drill:>
+</code></pre></div>
+<p>You can run queries from this prompt to complete the tutorial. To exit from
+<code>sqlline</code>, type:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> !quit
+</code></pre></div>
+<p>Although this tutorial demonstrates the queries using sqlline, you can also
+execute them using the Drill Web UI.</p>
+
+<h3 id="list-the-available-workspaces-and-databases:">List the available workspaces and databases:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> show databases;
++-------------+
+| SCHEMA_NAME |
++-------------+
+| hive.default |
+| dfs.default |
+| dfs.logs |
+| dfs.root |
+| dfs.views |
+| dfs.clicks |
+| dfs.data |
+| dfs.tmp |
+| sys |
+| maprdb |
+| cp.default |
+| INFORMATION_SCHEMA |
++-------------+
+12 rows selected
+</code></pre></div>
+<p>Note that this command exposes all the metadata available from the storage
+plugins configured with Drill as a set of schemas. This includes the Hive and
+MapR-DB databases as well as the workspaces configured in the file system. As
+you run queries in the tutorial, you will switch among these schemas by
+submitting the USE command. This behavior resembles the ability to use
+different database schemas (namespaces) in a relational database system.</p>
+
+<h2 id="query-hive-tables">Query Hive Tables</h2>
+
+<p>The orders table is a six-column Hive table defined in the Hive metastore.
+This is a Hive external table pointing to the data stored in flat files on the
+MapR file system. The orders table contains 122,000 rows.</p>
+
+<h3 id="set-the-schema-to-hive:">Set the schema to hive:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> use hive;
++------------+------------+
+| ok | summary |
++------------+------------+
+| true | Default schema changed to 'hive' |
++------------+------------+
+</code></pre></div>
+<p>You will run the USE command throughout this tutorial. The USE command sets
+the schema for the current session.</p>
+
+<h3 id="describe-the-table:">Describe the table:</h3>
+
+<p>You can use the DESCRIBE command to show the columns and data types for a Hive
+table:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> describe orders;
++-------------+------------+-------------+
+| COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
++-------------+------------+-------------+
+| order_id | BIGINT | YES |
+| month | VARCHAR | YES |
+| cust_id | BIGINT | YES |
+| state | VARCHAR | YES |
+| prod_id | BIGINT | YES |
+| order_total | INTEGER | YES |
++-------------+------------+-------------+
+</code></pre></div>
+<p>The DESCRIBE command returns complete schema information for Hive tables based
+on the metadata available in the Hive metastore.</p>
+
+<h3 id="select-5-rows-from-the-orders-table:">Select 5 rows from the orders table:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> select * from orders limit 5;
++------------+------------+------------+------------+------------+-------------+
+| order_id | month | cust_id | state | prod_id | order_total |
++------------+------------+------------+------------+------------+-------------+
+| 67212 | June | 10001 | ca | 909 | 13 |
+| 70302 | June | 10004 | ga | 420 | 11 |
+| 69090 | June | 10011 | fl | 44 | 76 |
+| 68834 | June | 10012 | ar | 0 | 81 |
+| 71220 | June | 10018 | az | 411 | 24 |
++------------+------------+------------+------------+------------+-------------+
+</code></pre></div>
+<p>Because orders is a Hive table, you can query the data in the same way that
+you would query the columns in a relational database table. Note the use of
+the standard LIMIT clause, which limits the result set to the specified number
+of rows. You can use LIMIT with or without an ORDER BY clause.</p>
+
+<p>Drill provides seamless integration with Hive by allowing queries on Hive
+tables defined in the metastore with no extra configuration. Note that Hive is
+not a prerequisite for Drill, but simply serves as a storage plugin or data
+source for Drill. Drill also lets users query all Hive file formats (including
+custom serdes). Additionally, any UDFs defined in Hive can be leveraged as
+part of Drill queries.</p>
+
+<p>Because Drill has its own low-latency SQL query execution engine, you can
+query Hive tables with high performance and support for interactive and ad-hoc
+data exploration.</p>
+
+<h2 id="query-mapr-db-and-hbase-tables">Query MapR-DB and HBase Tables</h2>
+
+<p>The customers and products tables are MapR-DB tables. MapR-DB is an enterprise
+in-Hadoop NoSQL database. It exposes the HBase API to support application
+development. Every MapR-DB table has a row_key, in addition to one or more
+column families. Each column family contains one or more specific columns. The
+row_key value is a primary key that uniquely identifies each row.</p>
+
+<p>Drill allows direct queries on MapR-DB and HBase tables. Unlike other
+SQL-on-Hadoop options, Drill requires no overlay schema definitions in Hive to work
+with this data. Think about a MapR-DB or HBase table with thousands of
+columns, such as a time-series database, and the pain of having to manage
+duplicate schemas for it in Hive!</p>
+
+<h3 id="products-table">Products Table</h3>
+
+<p>The products table has two column families.</p>
+
+<table><thead>
+<tr>
+<th>Column Family</th>
+<th>Columns</th>
+</tr>
+</thead><tbody>
+<tr>
+<td>details</td>
+<td>name</td>
+</tr>
+<tr>
+<td></td>
+<td>category</td>
+</tr>
+<tr>
+<td>pricing</td>
+<td>price</td>
+</tr>
+</tbody></table>
+
+<p>The products table contains 965 rows.</p>
+
+<h3 id="customers-table">Customers Table</h3>
+
+<p>The customers table has three column families.</p>
+
+<table><thead>
+<tr>
+<th>Column Family</th>
+<th>Columns</th>
+</tr>
+</thead><tbody>
+<tr>
+<td>address</td>
+<td>state</td>
+</tr>
+<tr>
+<td>loyalty</td>
+<td>agg_rev</td>
+</tr>
+<tr>
+<td></td>
+<td>membership</td>
+</tr>
+<tr>
+<td>personal</td>
+<td>age</td>
+</tr>
+<tr>
+<td></td>
+<td>gender</td>
+</tr>
+</tbody></table>
+
+<p>The customers table contains 993 rows.</p>
+
+<h3 id="set-the-workspace-to-maprdb:">Set the workspace to maprdb:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> use maprdb;
++------------+------------+
+| ok | summary |
++------------+------------+
+| true | Default schema changed to 'maprdb' |
++------------+------------+
+</code></pre></div>
+<h3 id="describe-the-tables:">Describe the tables:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> describe customers;
++-------------+------------+-------------+
+| COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
++-------------+------------+-------------+
+| row_key | ANY | NO |
+| address | (VARCHAR(1), ANY) MAP | NO |
+| loyalty | (VARCHAR(1), ANY) MAP | NO |
+| personal | (VARCHAR(1), ANY) MAP | NO |
++-------------+------------+-------------+
+
+0: jdbc:drill:> describe products;
++-------------+------------+-------------+
+| COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
++-------------+------------+-------------+
+| row_key | ANY | NO |
+| details | (VARCHAR(1), ANY) MAP | NO |
+| pricing | (VARCHAR(1), ANY) MAP | NO |
++-------------+------------+-------------+
+</code></pre></div>
+<p>Unlike the Hive example, the DESCRIBE command does not return the full schema
+up to the column level. Wide-column NoSQL databases such as MapR-DB and HBase
+can be schema-less by design; every row has its own set of column name-value
+pairs in a given column family, and the column value can be of any data type,
+as determined by the application inserting the data.</p>
+
+<p>A "MAP" complex type in Drill represents this variable column name-value
+structure, and "ANY" represents the fact that the column value can be of any
+data type. Observe the row_key, which is also simply bytes and has the type
+ANY.</p>
+
+<h3 id="select-5-rows-from-the-products-table:">Select 5 rows from the products table:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> select * from products limit 5;
++------------+------------+------------+
+| row_key | details | pricing |
++------------+------------+------------+
+| [B@a1a3e25 | {"category":"bGFwdG9w","name":"IlNvbnkgbm90ZWJvb2si"} | {"price":"OTU5"} |
+| [B@103a43af | {"category":"RW52ZWxvcGVz","name":"IzEwLTQgMS84IHggOSAxLzIgUHJlbWl1bSBEaWFnb25hbCBTZWFtIEVudmVsb3Blcw=="} | {"price":"MT |
+| [B@61319e7b | {"category":"U3RvcmFnZSAmIE9yZ2FuaXphdGlvbg==","name":"MjQgQ2FwYWNpdHkgTWF4aSBEYXRhIEJpbmRlciBSYWNrc1BlYXJs"} | {"price" |
+| [B@9bcf17 | {"category":"TGFiZWxz","name":"QXZlcnkgNDk4"} | {"price":"Mw=="} |
+| [B@7538ef50 | {"category":"TGFiZWxz","name":"QXZlcnkgNDk="} | {"price":"Mw=="} |
+</code></pre></div>
+<p>Given that Drill requires no up-front schema definitions indicating data
+types, the query returns the raw byte arrays for column values, just as they
+are stored in MapR-DB (or HBase). Observe that the column families (details
+and pricing) have the map data type and appear as JSON strings.</p>
+
+<p>In Lesson 2, you will use CAST functions to return typed data for each column.</p>
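+<p>The column values in the output above appear to be Base64-encoded text. This is
+an observation about the sample output rather than something the tutorial
+states. As a minimal sketch outside Drill, the following Python code decodes
+the first row, assuming the underlying bytes are UTF-8 text.</p>

```python
import base64

# Values copied from the first row of the products query output above.
row = {
    "details": {"category": "bGFwdG9w", "name": "IlNvbnkgbm90ZWJvb2si"},
    "pricing": {"price": "OTU5"},
}


def decode(value):
    """Decode one Base64 string to text (assumes the bytes are UTF-8)."""
    return base64.b64decode(value).decode("utf-8")


# Decode every column of every column family.
decoded = {
    family: {column: decode(value) for column, value in columns.items()}
    for family, columns in row.items()
}

print(decoded)
```

+<p>The decoded values are still strings: the price decodes to the text
+<code>959</code>, not to a number, so a cast to a numeric type is still needed before
+doing arithmetic on it.</p>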
+
+<h3 id="select-5-rows-from-the-customers-table:">Select 5 rows from the customers table:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> select * from customers limit 5;
++------------+------------+------------+------------+
+| row_key | address | loyalty | personal |
++------------+------------+------------+------------+
+| [B@284bae62 | {"state":"Imt5Ig=="} | {"agg_rev":"IjEwMDEtMzAwMCI=","membership":"ImJhc2ljIg=="} | {"age":"IjI2LTM1Ig==","gender":"Ik1B |
+| [B@7ffa4523 | {"state":"ImNhIg=="} | {"agg_rev":"IjAtMTAwIg==","membership":"ImdvbGQi"} | {"age":"IjI2LTM1Ig==","gender":"IkZFTUFMRSI= |
+| [B@7d13e79 | {"state":"Im9rIg=="} | {"agg_rev":"IjUwMS0xMDAwIg==","membership":"InNpbHZlciI="} | {"age":"IjI2LTM1Ig==","gender":"IkZFT |
+| [B@3a5c7df1 | {"state":"ImtzIg=="} | {"agg_rev":"IjMwMDEtMTAwMDAwIg==","membership":"ImdvbGQi"} | {"age":"IjUxLTEwMCI=","gender":"IkZF |
+| [B@e507726 | {"state":"Im5qIg=="} | {"agg_rev":"IjAtMTAwIg==","membership":"ImJhc2ljIg=="} | {"age":"IjIxLTI1Ig==","gender":"Ik1BTEUi" |
++------------+------------+------------+------------+
+</code></pre></div>
+<p>Again, the query returns byte data that needs to be cast to readable data
+types.</p>
+
+<h2 id="query-the-file-system">Query the File System</h2>
+
+<p>Along with querying a data source with full schemas (such as Hive) and partial
+schemas (such as MapR-DB and HBase), Drill offers the unique capability to
+perform SQL queries directly on a file system. The file system could be a local
+file system, or a distributed file system such as MapR-FS, HDFS, or S3.</p>
+
+<p>In the context of Drill, a file or a directory is treated as the equivalent of
+a relational database "table." Therefore, you can perform SQL operations
+directly on files and directories without the need for up-front schema
+definitions or schema management for any model changes. The schema is
+discovered on the fly based on the query. Drill supports queries on a variety
+of file formats including text, CSV, Parquet, and JSON in the 0.5 release.</p>
+
+<p>In this example, the clickstream data coming from the mobile/web applications
+is in JSON format. The JSON files have the following structure:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">{"trans_id":31920,"date":"2014-04-26","time":"12:17:12","user_info":{"cust_id":22526,"device":"IOS5","state":"il"},"trans_info":{"prod_id":[174,2],"purch_flag":"false"}}
+{"trans_id":31026,"date":"2014-04-20","time":"13:50:29","user_info":{"cust_id":16368,"device":"AOS4.2","state":"nc"},"trans_info":{"prod_id":[],"purch_flag":"false"}}
+{"trans_id":33848,"date":"2014-04-10","time":"04:44:42","user_info":{"cust_id":21449,"device":"IOS6","state":"oh"},"trans_info":{"prod_id":[582],"purch_flag":"false"}}
+</code></pre></div>
+<p>The clicks.json and clicks.campaign.json files contain metadata as part of the
+data itself (referred to as "self-describing" data). Also note that the data
+elements are complex, or nested. The initial queries below do not show how to
+unpack the nested data, but they show that easy access to the data requires no
+setup beyond the definition of a workspace.</p>
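+<p>To make "self-describing" and "nested" concrete, the following Python sketch
+parses the first sample record shown above with the standard <code>json</code> module.
+It only illustrates the structure of the data and says nothing about how Drill
+reads it.</p>

```python
import json

# First record from the clickstream sample above, copied verbatim.
line = (
    '{"trans_id":31920,"date":"2014-04-26","time":"12:17:12",'
    '"user_info":{"cust_id":22526,"device":"IOS5","state":"il"},'
    '"trans_info":{"prod_id":[174,2],"purch_flag":"false"}}'
)

record = json.loads(line)

# The field names travel with the data, so no separate schema is needed.
top_level = list(record)

# user_info and trans_info are nested maps; prod_id is an array inside a map.
cust_id = record["user_info"]["cust_id"]
prod_ids = record["trans_info"]["prod_id"]

print(top_level)
print(cust_id, prod_ids)
```

+<p>Note that <code>purch_flag</code> is stored as the string <code>"false"</code> in this data, not
+as a JSON boolean.</p>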
+
+<h3 id="query-nested-clickstream-data">Query nested clickstream data</h3>
+
+<h4 id="set-the-workspace-to-dfs.clicks:">Set the workspace to dfs.clicks:</h4>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> use dfs.clicks;
++------------+------------+
+| ok | summary |
++------------+------------+
+| true | Default schema changed to 'dfs.clicks' |
++------------+------------+
+</code></pre></div>
+<p>In this case, setting the workspace is a mechanism for making queries easier
+to write. When you specify a file system workspace, you can shorten references
+to files in the FROM clause of your queries. Instead of having to provide the
+complete path to a file, you can provide the path relative to a directory
+location specified in the workspace. For example:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">"location": "/mapr/demo.mapr.com/data/nested"
+</code></pre></div>
+<p>Any file or directory that you want to query in this path can be referenced
+relative to this path. The clicks directory referred to in the following query
+is directly below the nested directory.</p>
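+<p>The path shortening described above amounts to joining the workspace location
+with the relative path used in the FROM clause. The following Python sketch
+shows that idea; the <code>resolve</code> helper is hypothetical, and the sketch assumes
+the location shown above is the one configured for the current workspace.</p>

```python
import posixpath

# Workspace location from the example above (assumed to belong to the
# workspace selected with USE).
workspace_location = "/mapr/demo.mapr.com/data/nested"


def resolve(relative_path, location=workspace_location):
    """Join a workspace location with a path given relative to it."""
    return posixpath.join(location, relative_path)


full_path = resolve("clicks/clicks.json")
print(full_path)
```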
+
+<h4 id="select-2-rows-from-the-clicks.json-file:">Select 2 rows from the clicks.json file:</h4>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> select * from `clicks/clicks.json` limit 2;
++------------+------------+------------+------------+------------+
+| trans_id | date | time | user_info | trans_info |
++------------+------------+------------+------------+------------+
+| 31920 | 2014-04-26 | 12:17:12 | {"cust_id":22526,"device":"IOS5","state":"il"} | {"prod_id":[174,2],"purch_flag":"false"} |
+| 31026 | 2014-04-20 | 13:50:29 | {"cust_id":16368,"device":"AOS4.2","state":"nc"} | {"prod_id":[],"purch_flag":"false"} |
++------------+------------+------------+------------+------------+
+2 rows selected
+</code></pre></div>
+<p>Note that the FROM clause reference points to a specific file. Drill expands
+the traditional concept of a "table reference" in a standard SQL FROM clause
+to refer to a file in a local or distributed file system.</p>
+
+<p>The only special requirement is the use of backticks to enclose the file
+path. This is necessary whenever the file path contains Drill reserved words
+or characters.</p>
+
+<h4 id="select-2-rows-from-the-campaign.json-file:">Select 2 rows from the clicks.campaign.json file:</h4>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> select * from `clicks/clicks.campaign.json` limit 2;
++------------+------------+------------+------------+------------+------------+
+| trans_id | date | time | user_info | ad_info | trans_info |
++------------+------------+------------+------------+------------+------------+
+| 35232 | 2014-05-10 | 00:13:03 | {"cust_id":18520,"device":"AOS4.3","state":"tx"} | {"camp_id":"null"} | {"prod_id":[7,7],"purch_flag":"true"} |
+| 31995 | 2014-05-22 | 16:06:38 | {"cust_id":17182,"device":"IOS6","state":"fl"} | {"camp_id":"null"} | {"prod_id":[],"purch_flag":"false"} |
++------------+------------+------------+------------+------------+------------+
+2 rows selected
+</code></pre></div>
+<p>Notice that with a select * query, any complex data types such as maps and
+arrays return as JSON strings. You will see how to unpack this data using
+various SQL functions and operators in the next lesson.</p>
+
+<h2 id="query-logs-data">Query Logs Data</h2>
+
+<p>Unlike the previous example where we performed queries against clicks data in
+one file, logs data is stored as partitioned directories on the file system.
+The logs directory has three subdirectories:</p>
+
+<ul>
+<li><p>2012</p></li>
+<li><p>2013</p></li>
+<li><p>2014</p></li>
+</ul>
+
+<p>Each of these year directories fans out to a set of numbered month
+directories, and each month directory contains a JSON file with log records
+for that month. The total number of records in all log files is 48000.</p>
+
+<p>The files in the logs directory and its subdirectories are JSON files. There
+are many of these files, but you can use Drill to query them all as a single
+data source, or to query a subset of the files.</p>
+
+<h4 id="set-the-workspace-to-dfs.logs:">Set the workspace to dfs.logs:</h4>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> use dfs.logs;
++------------+------------+
+| ok | summary |
++------------+------------+
+| true | Default schema changed to 'dfs.logs' |
++------------+------------+
+</code></pre></div>
+<h4 id="select-2-rows-from-the-logs-directory:">Select 2 rows from the logs directory:</h4>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> select * from logs limit 2;
++------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+----------+
+| dir0 | dir1 | trans_id | date | time | cust_id | device | state | camp_id | keywords | prod_id | purch_fl |
++------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+----------+
+| 2014 | 8 | 24181 | 08/02/2014 | 09:23:52 | 0 | IOS5 | il | 2 | wait | 128 | false |
+| 2014 | 8 | 24195 | 08/02/2014 | 07:58:19 | 243 | IOS5 | mo | 6 | hmm | 107 | false |
++------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+----------+
+</code></pre></div>
+<p>Note that this is flat JSON data. The dfs.logs workspace location property
+points to a directory that contains the logs directory, making the FROM clause
+reference for this query very simple. You do not have to refer to the complete
+directory path on the file system.</p>
+
+<p>The column names dir0 and dir1 are special Drill variables that identify
+subdirectories below the logs directory. In Lesson 3, you will do more complex
+queries that leverage these dynamic variables.</p>
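+<p>One way to picture dir0 and dir1 is as the successive directory names on the
+path from the queried directory down to each file. The following Python sketch
+models that mapping; the helper name and the file name <code>log.json</code> are
+hypothetical, and the code is not how Drill itself computes these columns.</p>

```python
from pathlib import PurePosixPath


def directory_columns(relative_file):
    """Map each directory level above a file to dir0, dir1, and so on."""
    levels = PurePosixPath(relative_file).parts[:-1]  # drop the file name
    return {f"dir{index}": name for index, name in enumerate(levels)}


# Hypothetical file inside the logs directory: year 2014, month 8.
columns = directory_columns("2014/8/log.json")
print(columns)
```

+<p>For a file in the 2014 year directory and the 8 month directory, the sketch
+yields dir0 = 2014 and dir1 = 8, matching the two rows in the query output
+above.</p>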
+
+<h4 id="find-the-total-number-of-rows-in-the-logs-directory-(all-files):">Find the total number of rows in the logs directory (all files):</h4>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> select count(*) from logs;
++------------+
+| EXPR$0 |
++------------+
+| 48000 |
++------------+
+</code></pre></div>
+<p>This query traverses all of the files in the logs directory and its
+subdirectories to return the total number of rows in those files.</p>
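+<p>The traversal described above can be sketched in Python as a recursive walk that
+counts one record per non-empty line in every JSON file. The directory layout,
+file names, and record contents below are made up for the example and are much
+smaller than the tutorial data set.</p>

```python
import json
import os
import tempfile


def count_records(root):
    """Count JSON records (one per non-empty line) in all files under root."""
    total = 0
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            if name.endswith(".json"):
                with open(os.path.join(folder, name)) as handle:
                    total += sum(1 for line in handle if line.strip())
    return total


with tempfile.TemporaryDirectory() as root:
    # Hypothetical year/month layout with a few records per month.
    layout = {"2013/12": 2, "2014/8": 3}
    for subdir, record_count in layout.items():
        folder = os.path.join(root, *subdir.split("/"))
        os.makedirs(folder)
        with open(os.path.join(folder, "log.json"), "w") as handle:
            for trans_id in range(record_count):
                handle.write(json.dumps({"trans_id": trans_id}) + "\n")

    total = count_records(root)

print(total)
```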
+
+<h1 id="what's-next">What's Next</h1>
+
+<p>Go to <a href="/confluence/display/DRILL/Lesson+2%3A+Run+Queries+with+ANSI+SQL">Lesson 2: Run Queries with ANSI
+SQL</a>.</p>
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright © 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>
Added: drill/site/trunk/content/drill/docs/lession-2-run-queries-with-ansi-sql/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/lession-2-run-queries-with-ansi-sql/index.html?rev=1651949&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/lession-2-run-queries-with-ansi-sql/index.html (added)
+++ drill/site/trunk/content/drill/docs/lession-2-run-queries-with-ansi-sql/index.html Thu Jan 15 05:11:44 2015
@@ -0,0 +1,459 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Lesson 2: Run Queries with ANSI SQL - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+ <li class="logo"><a href="/"></a></li>
+ <li>
+ <a href="/overview/">Documentation</a>
+ <ul>
+ <li><a href="/overview/">Overview </a></li>
+ <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+ <li><a href="/why/">Why Drill? </a></li>
+ <li><a href="/architecture/">Architecture</a></li>
+ </ul>
+ </li>
+ <li>
+ <a href="/community/">Community</a>
+ <ul>
+ <li><a href="/team/">Team</a></li>
+ <li><a href="/community/#events">Events and Meetups</a></li>
+ <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+ <li><a href="/community/#getinvolved">Get Involved</a></li>
+ <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+ <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+ </ul>
+ </li>
+ <li><a href="/faq/">FAQ</a></li>
+ <li><a href="/blog/">Blog</a></li>
+ <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+ <li class="l"><span> </span></li>
+ <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>Lesson 2: Run Queries with ANSI SQL</h1>
+
+</div>
+
+<div class="int_text" align="left"><h2 id="goal">Goal</h2>
+
+<p>This lesson shows how to do some standard SQL analysis in Apache Drill: for
+example, summarizing data by using simple aggregate functions and connecting
+data sources by using joins. Note that Apache Drill provides ANSI SQL support,
+not a “SQL-like” interface.</p>
+
+<h2 id="queries-in-this-lesson">Queries in This Lesson</h2>
+
+<p>Now that you know what the data sources look like in their raw form, using
+select * queries, try running some simple but more useful queries on each data
+source. These queries demonstrate how Drill supports ANSI SQL constructs and
+also how you can combine data from different data sources in a single SELECT
+statement.</p>
+
+<ul>
+<li><p>Show an aggregate query on a single file or table. Use GROUP BY, WHERE, HAVING, and ORDER BY clauses.</p></li>
+<li><p>Perform joins between Hive, MapR-DB, and file system data sources.</p></li>
+<li><p>Use table and column aliases.</p></li>
+<li><p>Create a Drill view.</p></li>
+</ul>
+
+<h2 id="aggregation">Aggregation</h2>
+
+<h3 id="set-the-schema-to-hive:">Set the schema to hive:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> use hive;
++------------+------------+
+| ok | summary |
++------------+------------+
+| true | Default schema changed to 'hive' |
++------------+------------+
+1 row selected
+</code></pre></div>
+<h3 id="return-sales-totals-by-month:">Return sales totals by month:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> select `month`, sum(order_total)
+from orders group by `month` order by 2 desc;
++------------+------------+
+| month | EXPR$1 |
++------------+------------+
+| June | 950481 |
+| May | 947796 |
+| March | 836809 |
+| April | 807291 |
+| July | 757395 |
+| October | 676236 |
+| August | 572269 |
+| February | 532901 |
+| September | 373100 |
+| January | 346536 |
++------------+------------+
+</code></pre></div>
+<p>Drill supports SQL aggregate functions such as SUM, MAX, AVG, and MIN.
+Standard SQL clauses work in the same way in Drill queries as in relational
+database queries.</p>
+
+<p>Note that backticks are required for the “month” column only because “month”
+is a reserved word in SQL.</p>
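+
+<p>As a further sketch of the other aggregate functions named above, the following query applies MIN, MAX, and AVG to the same orders table. It assumes that hive is still the default schema. The column aliases are illustrative and are not part of the tutorial data, and no output is shown because the query was not run for this lesson.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">-- Sketch only: assumes "use hive;" is still in effect. Aliases are illustrative.
+select `month`, min(order_total) as min_order,
+max(order_total) as max_order, avg(order_total) as avg_order
+from orders group by `month` order by `month`;
+</code></pre></div>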
+
+<h3 id="return-the-top-20-sales-totals-by-month-and-state:">Return the top 20 sales totals by month and state:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> select `month`, state, sum(order_total) as sales from orders group by `month`, state
+order by 3 desc limit 20;
++------------+------------+------------+
+| month | state | sales |
++------------+------------+------------+
+| May | ca | 119586 |
+| June | ca | 116322 |
+| April | ca | 101363 |
+| March | ca | 99540 |
+| July | ca | 90285 |
+| October | ca | 80090 |
+| June | tx | 78363 |
+| May | tx | 77247 |
+| March | tx | 73815 |
+| August | ca | 71255 |
+| April | tx | 68385 |
+| July | tx | 63858 |
+| February | ca | 63527 |
+| June | fl | 62199 |
+| June | ny | 62052 |
+| May | fl | 61651 |
+| May | ny | 59369 |
+| October | tx | 55076 |
+| March | fl | 54867 |
+| March | ny | 52101 |
++------------+------------+------------+
+20 rows selected
+</code></pre></div>
+<p>Note the alias for the result of the SUM function. Drill supports column
+aliases and table aliases.</p>
+
+<h2 id="having-clause">HAVING Clause</h2>
+
+<p>This query uses the HAVING clause to constrain an aggregate result.</p>
+
+<h3 id="set-the-workspace-to-dfs.clicks">Set the workspace to dfs.clicks</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> use dfs.clicks;
++------------+------------+
+| ok | summary |
++------------+------------+
+| true | Default schema changed to 'dfs.clicks' |
++------------+------------+
+1 row selected
+</code></pre></div>
+<h3 id="return-total-number-of-clicks-for-devices-that-indicate-high-click-throughs:">Return total number of clicks for devices that indicate high click-throughs:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> select t.user_info.device, count(*) from `clicks/clicks.json` t
+group by t.user_info.device
+having count(*) > 1000;
++------------+------------+
+| EXPR$0 | EXPR$1 |
++------------+------------+
+| IOS5 | 11814 |
+| AOS4.2 | 5986 |
+| IOS6 | 4464 |
+| IOS7 | 3135 |
+| AOS4.4 | 1562 |
+| AOS4.3 | 3039 |
++------------+------------+
+</code></pre></div>
+<p>The aggregate is a count of the records for each different mobile device in
+the clickstream data. Only the devices that registered more than 1000
+transactions qualify for the result set.</p>
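+
+<p>The same HAVING filter can be combined with column aliases and an ORDER BY clause so that the result is labeled and sorted. The following is a minimal sketch that assumes dfs.clicks is still the default workspace. The aliases device and clicks are illustrative.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">-- Sketch only: same filter as the query above, with aliases and sorting added.
+select t.user_info.device as device, count(*) as clicks
+from `clicks/clicks.json` t
+group by t.user_info.device
+having count(*) > 1000
+order by 2 desc;
+</code></pre></div>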
+
+<h2 id="union-operator">UNION Operator</h2>
+
+<p>Use the same workspace as before (dfs.clicks).</p>
+
+<h3 id="combine-clicks-activity-from-before-and-after-the-marketing-campaign">Combine clicks activity from before and after the marketing campaign</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> select t.trans_id transaction, t.user_info.cust_id customer from `clicks/clicks.campaign.json` t
+union all
+select u.trans_id, u.user_info.cust_id from `clicks/clicks.json` u limit 5;
++-------------+------------+
+| transaction | customer |
++-------------+------------+
+| 35232 | 18520 |
+| 31995 | 17182 |
+| 35760 | 18228 |
+| 37090 | 17015 |
+| 37838 | 18737 |
++-------------+------------+
+</code></pre></div>
+<p>This UNION ALL query returns rows that exist in two files (and includes any
+duplicate rows from those files): <code>clicks.campaign.json</code> and <code>clicks.json</code>.</p>
+
+<h2 id="subqueries">Subqueries</h2>
+
+<h3 id="set-the-workspace-to-hive:">Set the workspace to hive:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> use hive;
++------------+------------+
+| ok | summary |
++------------+------------+
+| true | Default schema changed to 'hive' |
++------------+------------+
+</code></pre></div>
+<h3 id="compare-order-totals-across-states:">Compare order totals across states:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> select o1.cust_id, sum(o1.order_total) as ny_sales,
+(select sum(o2.order_total) from hive.orders o2
+where o1.cust_id=o2.cust_id and state='ca') as ca_sales
+from hive.orders o1 where o1.state='ny' group by o1.cust_id
+order by cust_id limit 20;
++------------+------------+------------+
+| cust_id | ny_sales | ca_sales |
++------------+------------+------------+
+| 1001 | 72 | 47 |
+| 1002 | 108 | 198 |
+| 1003 | 83 | null |
+| 1004 | 86 | 210 |
+| 1005 | 168 | 153 |
+| 1006 | 29 | 326 |
+| 1008 | 105 | 168 |
+| 1009 | 443 | 127 |
+| 1010 | 75 | 18 |
+| 1012 | 110 | null |
+| 1013 | 19 | null |
+| 1014 | 106 | 162 |
+| 1015 | 220 | 153 |
+| 1016 | 85 | 159 |
+| 1017 | 82 | 56 |
+| 1019 | 37 | 196 |
+| 1020 | 193 | 165 |
+| 1022 | 124 | null |
+| 1023 | 166 | 149 |
+| 1024 | 233 | null |
++------------+------------+------------+
+</code></pre></div>
+<p>This example demonstrates Drill support for correlated subqueries. This query
+uses a subquery in the select list and correlates the result of the subquery
+with the outer query, using the cust_id column reference. The subquery returns
+the sum of order totals for California, and the outer query returns the
+equivalent sum, for the same cust_id, for New York.</p>
+
+<p>The result set is sorted by the cust_id and presents the sales totals side by
+side for easy comparison. Null values indicate customer IDs that did not
+register any sales in that state.</p>
+
+<h2 id="cast-function">CAST Function</h2>
+
+<h3 id="use-the-maprdb-workspace:">Use the maprdb workspace:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> use maprdb;
++------------+------------+
+| ok | summary |
++------------+------------+
+| true | Default schema changed to 'maprdb' |
++------------+------------+
+1 row selected
+</code></pre></div>
+<h3 id="return-customer-data-with-appropriate-data-types">Return customer data with appropriate data types</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> select cast(row_key as int) as cust_id, cast(t.personal.name as varchar(20)) as name,
+cast(t.personal.gender as varchar(10)) as gender, cast(t.personal.age as varchar(10)) as age,
+cast(t.address.state as varchar(4)) as state, cast(t.loyalty.agg_rev as dec(7,2)) as agg_rev,
+cast(t.loyalty.membership as varchar(20)) as membership
+from customers t limit 5;
++------------+------------+------------+------------+------------+------------+------------+
+| cust_id | name | gender | age | state | agg_rev | membership |
++------------+------------+------------+------------+------------+------------+------------+
+| 10001 | "Corrine Mecham" | "FEMALE" | "15-20" | "va" | 197.00 | "silver" |
+| 10005 | "Brittany Park" | "MALE" | "26-35" | "in" | 230.00 | "silver" |
+| 10006 | "Rose Lokey" | "MALE" | "26-35" | "ca" | 250.00 | "silver" |
+| 10007 | "James Fowler" | "FEMALE" | "51-100" | "me" | 263.00 | "silver" |
+| 10010 | "Guillermo Koehler" | "OTHER" | "51-100" | "mn" | 202.00 | "silver" |
++------------+------------+------------+------------+------------+------------+------------+
+5 rows selected
+</code></pre></div>
+<p>Note the following features of this query:</p>
+
+<ul>
+<li>The CAST function is required for every column in the table. This function returns the MapR-DB/HBase binary data as readable integers and strings. Alternatively, you can use CONVERT_TO/CONVERT_FROM functions to decode the columns. CONVERT_TO and CONVERT_FROM are more efficient than CAST in most cases.</li>
+<li>The row_key column functions as the primary key of the table (a customer ID in this case).</li>
+<li>The table alias t is required; otherwise the column family names would be parsed as table names and the query would return an error.</li>
+</ul>
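+
+<p>As a sketch of the CONVERT_FROM alternative mentioned in the first item above, the following query decodes three of the columns. It assumes that the values are stored as UTF-8 encoded strings, which this lesson does not state. Adjust the encoding argument to match how your table was written.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">-- Sketch only: 'UTF8' is an assumed encoding for these columns.
+select convert_from(row_key, 'UTF8') as cust_id,
+convert_from(t.personal.name, 'UTF8') as name,
+convert_from(t.address.state, 'UTF8') as state
+from customers t limit 5;
+</code></pre></div>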
+
+<h3 id="remove-the-quotes-from-the-strings:">Remove the quotes from the strings:</h3>
+
+<p>You can use the regexp_replace function to remove the quotes around the
+strings in the query results. For example, to return a state name va instead
+of “va”:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> select cast(row_key as int), regexp_replace(cast(t.address.state as varchar(10)),'"','')
+from customers t limit 1;
++------------+------------+
+| EXPR$0 | EXPR$1 |
++------------+------------+
+| 10001 | va |
++------------+------------+
+1 row selected
+</code></pre></div>
+<h2 id="create-view-command">CREATE VIEW Command</h2>
+
+<h3 id="use-a-mutable-workspace:">Use a mutable workspace:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> use dfs.views;
++------------+------------+
+| ok | summary |
++------------+------------+
+| true | Default schema changed to 'dfs.views' |
++------------+------------+
+</code></pre></div>
+
+<p>A mutable (or writable) workspace is a workspace that is enabled for “write”
+operations. This attribute is part of the storage plugin configuration. You
+can create Drill views and tables in mutable workspaces.</p>
+
+<h3 id="create-a-view-on-a-mapr-db-table">Create a view on a MapR-DB table</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> create or replace view custview as select cast(row_key as int) as cust_id,
+cast(t.personal.name as varchar(20)) as name,
+cast(t.personal.gender as varchar(10)) as gender,
+cast(t.personal.age as varchar(10)) as age,
+cast(t.address.state as varchar(4)) as state,
+cast(t.loyalty.agg_rev as dec(7,2)) as agg_rev,
+cast(t.loyalty.membership as varchar(20)) as membership
+from maprdb.customers t;
++------------+------------+
+| ok | summary |
++------------+------------+
+| true | View 'custview' replaced successfully in 'dfs.views' schema |
++------------+------------+
+1 row selected
+</code></pre></div>
+<p>Drill provides CREATE OR REPLACE VIEW syntax similar to relational databases
+to create views. Use the OR REPLACE option to make it easier to update the
+view later without having to remove it first. Note that the FROM clause in
+this example must refer to maprdb.customers. The MapR-DB tables are not
+directly visible to the dfs.views workspace.</p>
+
+<p>Unlike a traditional database where views typically are DBA/developer-driven
+operations, file system-based views in Drill are very lightweight. A view is
+simply a special file with a specific extension (.drill). You can store views
+even in your local file system or point to a specific workspace. You can
+specify any query against any Drill data source in the body of the CREATE VIEW
+statement.</p>
+
+<p>Drill provides a decentralized metadata model. Drill is able to query metadata
+defined in data sources such as Hive, HBase, and the file system. Drill also
+supports the creation of metadata in the file system.</p>
+
+<h3 id="query-data-from-the-view:">Query data from the view:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> select * from custview limit 1;
++------------+------------+------------+------------+------------+------------+------------+
+| cust_id | name | gender | age | state | agg_rev | membership |
++------------+------------+------------+------------+------------+------------+------------+
+| 10001 | "Corrine Mecham" | "FEMALE" | "15-20" | "va" | 197.00 | "silver" |
++------------+------------+------------+------------+------------+------------+------------+
+</code></pre></div>
+<p>Once users have an idea of what data is available by exploring it directly
+from the file system, views can be used to bring the data into downstream
+tools such as Tableau and MicroStrategy for further analysis and
+visualization. For these tools, a view appears simply as a “table” with
+selectable “columns” in it.</p>
+
+<h2 id="query-across-data-sources">Query Across Data Sources</h2>
+
+<p>Continue using dfs.views for this query.</p>
+
+<h3 id="join-the-customers-view-and-the-orders-table:">Join the customers view and the orders table:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> select membership, sum(order_total) as sales from hive.orders, custview
+where orders.cust_id=custview.cust_id
+group by membership order by 2;
++------------+------------+
+| membership | sales |
++------------+------------+
+| "basic" | 380665 |
+| "silver" | 708438 |
+| "gold" | 2787682 |
++------------+------------+
+3 rows selected
+</code></pre></div>
+<p>In this query, we are reading data from a MapR-DB table (represented by
+custview) and combining it with the order information in Hive. When doing
+cross data source queries such as this, you need to use fully qualified
+table/view names. For example, the orders table is prefixed by “hive,” which
+is the storage plugin name registered with Drill. We are not using any prefix
+for “custview” because we explicitly switched to the dfs.views workspace where
+custview is stored.</p>
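+
+<p>The same join can also be written with table aliases and a fully qualified view name, so that it does not depend on the current workspace. This is a sketch only. The aliases o and v are illustrative, and dfs.views.custview assumes that the view was created in the dfs.views workspace as shown earlier.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">-- Sketch only: fully qualified names with table aliases.
+select v.membership, sum(o.order_total) as sales
+from hive.orders o, dfs.views.custview v
+where o.cust_id = v.cust_id
+group by v.membership order by 2;
+</code></pre></div>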
+
+<p>Note: If the results of any of your queries appear to be truncated because the
+rows are wide, set the maximum width of the display to 10000 with the sqlline command <code>!set maxwidth 10000</code>.</p>
+
+<p>Do not use a semicolon for this SET command.</p>
+
+<h3 id="join-the-customers,-orders,-and-clickstream-data:">Join the customers, orders, and clickstream data:</h3>
+<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:> select custview.membership, sum(orders.order_total) as sales from hive.orders, custview,
+dfs.`/mapr/demo.mapr.com/data/nested/clicks/clicks.json` c
+where orders.cust_id=custview.cust_id and orders.cust_id=c.user_info.cust_id
+group by custview.membership order by 2;
++------------+------------+
+| membership | sales |
++------------+------------+
+| "basic" | 372866 |
+| "silver" | 728424 |
+| "gold" | 7050198 |
++------------+------------+
+3 rows selected
+</code></pre></div>
+<p>This three-way join selects from three different data sources in one query:</p>
+
+<ul>
+<li>hive.orders table</li>
+<li>custview (a view of the MapR-DB customers table)</li>
+<li>clicks.json file</li>
+</ul>
+
+<p>The join column for both sets of join conditions is the cust_id column. The
+views workspace is used for this query so that custview can be accessed. The
+hive.orders table is also visible to the query.</p>
+
+<p>However, note that the JSON file is not directly visible from the views
+workspace, so the query specifies the full path to the file:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">dfs.`/mapr/demo.mapr.com/data/nested/clicks/clicks.json`
+</code></pre></div>
+<h1 id="what's-next">What's Next</h1>
+
+<p>Go to <a href="/confluence/display/DRILL/%0ALesson+3%3A+Run+Queries+on+Complex+Data+Types">Lesson 3: Run Queries on Complex Data Types</a>. </p>
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright © 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>