Posted to commits@drill.apache.org by bu...@apache.org on 2014/11/09 08:16:19 UTC
svn commit: r928467 - in /websites/staging/drill/trunk/content: ./ drill/download.html drill/top-10-reasons-for-using-drill.html

Author: buildbot
Date: Sun Nov  9 07:16:18 2014
New Revision: 928467

Log:
Staging update by buildbot for drill

Modified:
    websites/staging/drill/trunk/content/   (props changed)
    websites/staging/drill/trunk/content/drill/download.html
    websites/staging/drill/trunk/content/drill/top-10-reasons-for-using-drill.html

Propchange: websites/staging/drill/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Sun Nov  9 07:16:18 2014
@@ -1 +1 @@
-1637518
+1637630

Modified: websites/staging/drill/trunk/content/drill/download.html
==============================================================================
--- websites/staging/drill/trunk/content/drill/download.html (original)
+++ websites/staging/drill/trunk/content/drill/download.html Sun Nov  9 07:16:18 2014
@@ -67,7 +67,7 @@
         
         <div class="int_text download">
         	
-            <h2>The latest release is Drill 0.6.0, released November 7, 2014</h2>
+            <h2>The latest release is Drill 0.6.0, released November 1, 2014</h2>
           <br>
             
             <table>

Modified: websites/staging/drill/trunk/content/drill/top-10-reasons-for-using-drill.html
==============================================================================
--- websites/staging/drill/trunk/content/drill/top-10-reasons-for-using-drill.html (original)
+++ websites/staging/drill/trunk/content/drill/top-10-reasons-for-using-drill.html Sun Nov  9 07:16:18 2014
@@ -92,45 +92,39 @@ font-family: Consolas, "Liberation Mono"
            <!-- Blog -->
 						
 
-<p>There are several options available for SQL-on-Hadoop today. What makes Drill different? </p>
-<p>Here are the top 10 reasons why Drill is a valuable and innovative technology in your toolset for interactive data exploration on big data</p>
-<div align="center">
-<p><img alt="Apache Drill" src="https://www.mapr.com/sites/default/files/blogimages/Apache-Drill.png" style="height:39px; width:551px"></p>
-<p style="margin-left:40px"><img alt="quick and easy ramp up for apache drill" src="https://www.mapr.com/sites/default/files/blogimages/Quick-Easy-Ramp-Up-2.png" style="height:329px; width:550px; padding-right:35px"></p>
-</div>
-
-<h2>1. Quick and easy ramp up</h2>
-<p>First and foremost, it takes just minutes to start working with Apache Drill. Install it on a local Windows or Mac machine and do queries right away - you don't even need Hadoop.</p><p>Here are three simple steps to run your first query with Drill.</p>
 
+<h2>1. Get started in minutes</h2>
+<p>It only takes a couple minutes to start working with Drill. Untar it on your Mac or Windows laptop and run a query on a local file. No need to set up any infrastructure. No need to define schemas. Just point at the data and drill!</p>
 <pre>
- 
-// Install, launch SQLLine CLI and query a JSON file on local file system
-$ tar -xvf apache-drill-0.5.0-incubating.tar  
-                 
-$ apache-drill-0.5.0-incubating/bin/sqlline -u jdbc:drill:zk=local
-
-0: jdbc:drill:zk=local> SELECT * FROM cp.`employee.json` limit 5;
+$ tar -xvf apache-drill-0.6.0-incubating.tar.gz
+$ apache-drill-0.6.0-incubating/bin/sqlline -u jdbc:drill:zk=local
+0: jdbc:drill:zk=local> SELECT * FROM dfs.root.`path/to/employee.json` limit 5;
 +-------------+------------------+------------+------------+-------------+----------------------+------------+---------------+-----
 | employee_id | full_name        | first_name | last_name  | position_id | position_title       |  store_id  | department_id | birt 
 +-------------+------------------+------------+------------+-------------+----------------------+------------+---------------+------+
-
 | 1           | Sheri Nowmer     | Sheri      | Nowmer     | 1           | President            | 0          | 1             | 19   
 | 2           | Derrick Whelply  | Derrick    | Whelply    | 2           | VP Country Manager   | 0          | 1             |
 | 4           | Michael Spence   | Michael    | Spence     | 2           | VP Country Manager   | 0          | 1             |
 | 5           | Maya Gutierrez   | Maya       | Gutierrez  | 2           | VP Country Manager   | 0          | 1             |
 | 6           | Roberta Damstra  | Roberta    | Damstra    | 3           | VP Information Systems | 0        | 2             |
 +-------------+------------------+------------+------------+-------------+----------------------+------------+---------------+-----
- 
 </pre>
+<h2>2. Schema-free JSON model</h2>
+<p>Drill is the world's first and only distributed SQL engine that doesn't require schemas. It shares the same schema-free JSON model as MongoDB and Elasticsearch. Instead of spending weeks or months defining schemas, transforming data (ETL) and maintaining those schemas, simply point Drill at your data (file, directory, HBase table, etc.) and run your queries. Drill automatically understands the structure of the data. Drill's self-service approach reduces the burden on IT and increases the productivity and agility of analysts and developers.</p>
 
+<h2>3. Query complex, semi-structured data in-situ</h2><p>Drill's schema-free JSON model allows you to query complex, semi-structured data in situ. No need to flatten or transform the data prior to or during query execution. Drill also provides intuitive extensions to SQL to work with nested data. Here's a simple query on a JSON file demonstrating how to access nested elements and arrays:</p>
+<pre>
+SELECT * FROM (SELECT t.trans_id,
+                      t.trans_info.prod_id[0] AS prod_id,
+                      t.trans_info.purch_flag AS purchased
+               FROM `clicks/clicks.json` t) sq
+WHERE sq.prod_id BETWEEN 700 AND 750 AND
+      sq.purchased = 'true'
+ORDER BY sq.prod_id;
+</pre>
 
-
-
-
-
-
-
-<h2>2. Supports ANSI SQL - as you know it</h2><p>Apache Drill is compatible with ANSI SQL standards. This means that users don't need to learn a new query language or know the nuances of "SQL Like" to work with Drill or migrate existing workloads to Drill.  </p><p>Drill supports SQL 2003 syntax and provides all the key SQL data types (such as DATE, INTERVAL, TIMESTAMP, VARCHAR, DECIMAL) and query constructs (such as correlated sub-queries, joins in WHERE clause) to provide a smooth and familiar analytics experience.  </p><p>Here is an example of a TPC-H standard query that runs in Drill "as is".  </p>
+<h2>4. Real SQL - not "SQL-like"</h2>
+<p>Drill supports the standard SQL:2003 syntax. No need to learn a new "SQL-like" language or struggle with a semi-functional BI tool. Drill supports many data types including DATE, INTERVAL, TIMESTAMP, VARCHAR and DECIMAL, as well as complex query constructs such as correlated sub-queries and joins in WHERE clauses. Here is an example of a TPC-H standard query that runs in Drill "as is":</p>
 <pre>
 # TPC-H query 4
 SELECT  o.o_orderpriority, count(*) AS order_count
@@ -146,62 +140,32 @@ WHERE o.o_orderdate >= date '1996-10-01'
       ORDER BY o.o_orderpriority;
 </pre>
 
-<h2>3. Works with your BI tools</h2><p>Apache Drill integrates with the BI/SQL tools such as Tableau, MicroStrategy, Pentaho and Jaspersoft using JDBC/ODBC drivers. This means that users can now use same BI/Analytics tools they are deeply familiar with in order to perform proactive business intelligence using more raw data, up-to-date data and new types of data available in Hadoop/NoSQL stores at a significantly low cost and rapid time to market.  </p><p>Here is a quick look at the Drill ODBC Driver DSN UI - Drill explorer - a data exploration environment to understand Drill data and create views along with a BI visualization using Drill as a data source.  </p><p style="margin-left:40px"><img alt="MapR Drill ODBC Driver DSN Setup" src="https://www.mapr.com/sites/default/files/blogimages/MapR-Drill-ODBC-Driver-DSN-Setup.png" style="height:498px; width:450px"></p><p style="margin-left:40px"><img alt="data exploration enviroment" src="https://www.mapr.com/sites/default/files/blogimages
 /Data-exploration-enviroment.png" style="height:354px; width:600px"></p><p style="margin-left:40px"><img alt="Tableau example" src="https://www.mapr.com/sites/default/files/blogimages/Tableau-example.png" style="height:583px; width:600px"></p><h2>4. Supports self-describing data with no ETL</h2><p>Self-describing data is where schema is specified as part of the data itself. File formats such as Parquet, JSON, Protobuf, XML, Avro and NoSQL databases are all examples of self-describing data. Some of these data formats are also dynamic and complex in that every record in the data can have its own set of columns/attributes and each column can be semi-structured/nested.  </p><p>Think about a JSON document with multiple levels of nesting and optional/repeated elements at each level or a wide HBase table with 100s-1000s of columns with varying schema across rows. How about third party data that you are looking to leverage in BI/Analytics, but you have no control on how schemas will evolve?
   </p><p>Drill supports querying self-describing data without defining and managing any centralized schema definitions in Hive metastore. Schema is discovered dynamically on the fly when the queries come in.  </p><p>Dynamic schema discovery with no upfront modeling/schema management means that companies now can eliminate time delays of weeks/months of ETL before data is available to users for data exploration. Users can get more up-to-date/real-time data in order to make informed and timely decisions.  </p><p>Here are a few quick examples on querying files and directories using Drill.  </p>
-<pre>
-//clicks.json is a file and logs is a partitioned directory by year & month on Hadoop
+<h2>5. Leverage standard BI tools</h2>
+<p>Drill works with standard BI tools. You can keep using the tools you love, such as Tableau, MicroStrategy, QlikView and Excel. No need to introduce yet another visualization or dashboard tool. Combine a self-service BI tool with the only self-service SQL engine to enable true self-service data exploration.</p>
 
-0: jdbc:drill:> select * from `clicks/clicks.json` limit 2;
-0: jdbc:drill:> select cust_id, dir1 month_no, count(*) month_count from logs 
-where dir0=2014 group by cust_id, dir1 order by cust_id, month_no limit 10;
-</pre>
-
-
-
-
-<h2>5. Handles Complex Data Types</h2><p>Drill comes with a flexible JSON-like data model to natively query and process complex/multi-structured data. The data doesn't need to be flattened or transformed either at the design time or runtime providing high performance for queries on complex data. Drill provides intuitive extensions to SQL to work with nested data using MAP and ARRAY data types.  </p><p>Here is an example indicating how Drill queries a JSON file and accesses the nested maps and array fields.  </p>
+<h2>6. Interactive queries on Hive tables</h2><p>Apache Drill lets you leverage your investments in Hive. You can run interactive queries with Drill on your Hive tables and access all Hive input/output formats (including custom SerDes). You can join tables associated with different Hive metastores, and you can join a Hive table with an HBase table or a directory of log files. Here's a simple query in Drill on a Hive table:</p>
 <pre>
-// prod_id is an array field in clicks.json file  
-
-select * from (select t.trans_id, t.trans_info.prod_id[0] as prodid,
-t.trans_info.purch_flag as purchased
-from `clicks/clicks.json` t) sq
-where sq.prodid between 700 and 750 and sq.purchased='true' order by sq.prodid;
+SELECT `month`, state, sum(order_total) AS sales
+FROM hive.orders 
+GROUP BY `month`, state
+ORDER BY 3 DESC LIMIT 5;
 </pre>
 
-<h2>6. Plays Well with Hive</h2><p>Apache Drill lets you reuse investments made in existing Hive deployments. You can do queries on Hive tables and access 100+ Hive input/output formats (including custom serdes) with no re-work. Drill serves as a complement to Hive deployments by offering low latency queries.</p><p>Here is a sample Hive storage plugin configuration looks like in Drill, followed by a query on a Hive table.  </p>
+<h2>7. Access multiple data sources</h2><p>Drill is designed with extensibility in mind. It provides out-of-the-box connectivity to file systems (local or distributed file systems such as S3, HDFS and MapR-FS), HBase and Hive. You can implement a storage plugin to make Drill work with any other data source. Drill can combine data from multiple data sources on the fly in a single query, with no centralized metadata definitions. Here's a query that combines data from a Hive table, an HBase table (view) and a JSON file:</p>
 <pre>
-//Storage plugin configuration for Hive
-hive
-
-{
- "type": "hive",
- "enabled": true,
- "configProps": {
-   "hive.metastore.uris": "thrift://localhost:9083",
-   "hive.metastore.sasl.enabled": "false"
- }
-}
-
-//Query on a Hive table 'orders'
-0: jdbc:drill:> select `month`, state, sum(order_total) as sales from hive.orders 
-group by `month`, state order by 3 desc limit 5;
-
+SELECT custview.membership, sum(orders.order_total) AS sales
+FROM hive.orders, custview, dfs.`clicks/clicks.json` c 
+WHERE orders.cust_id = custview.cust_id AND orders.cust_id = c.user_info.cust_id 
+GROUP BY custview.membership
+ORDER BY 2;
 </pre>
 
+<h2>8. User-Defined Functions (UDFs)</h2><p>Drill exposes a simple and high-performance Java API to build custom functions (UDFs and UDAFs) so that you can add your own business logic. If you have already built UDFs in Hive, you can reuse them with Drill with no modifications. Refer to <a href="https://cwiki.apache.org/confluence/display/DRILL/Develop+Custom+Functions">Developing Custom Functions</a> for more information.
+</p>
 
+<h2>9. High performance</h2><p>Drill is designed from the ground up for high throughput and low latency. It doesn't use a general purpose execution engine like MapReduce, Tez or Spark. As a result, Drill is able to deliver its unparalleled flexibility (schema-free JSON model) without compromising performance. Drill's optimizer leverages rule- and cost-based techniques, as well as data locality and operator push-down (the ability to push down query fragments into the back-end data sources). Drill also provides a columnar and vectorized execution engine, resulting in higher memory and CPU efficiency.</p>
 
-<h2>7. Works with Hadoop and Beyond</h2><p>Drill is designed with extensibility in mind. It provides out-of-the-box connectivity to file systems (local or distributed file systems such as S3, HDFS, MapR-FS), HBase, or Hive. The storage plugin interface is extensible to other NoSQL stores (such as Couchbase, Elasticsearch, MongoDB) or relational databases (such as Postgres, MySQL, etc.) or your own custom store. Drill can also combine data from all these data sources in a single query on the fly without any central metadata definitions.</p><p>Here is an example Drill that combines data from Hive, HBase and JSON. </p>
-
-<pre>
-// Hive table 'orders', HBase view 'custview' and JSON file 'clicks.json' are joined together
-
-select custview.membership, sum(orders.order_total) 
-as sales from hive.orders, custview, dfs.`/mapr/demo.mapr.com/data/nested/clicks/clicks.json` c 
-where orders.cust_id=custview.cust_id and orders.cust_id=c.user_info.cust_id 
-group by custview.membership order by 2;
-</pre>
-
-<h2>8. Ease of UDFs</h2><p>Drill exposes an easy and high performance Java API to build custom functions (UDFs and UDAFs) and extend SQL for the data and the business logic that is specific to your organization. If you have already built UDFs in Hive, you can reuse them with Drill with no modifications. Refer to <a href="https://cwiki.apache.org/confluence/display/DRILL/Develop+Custom+Functions">Developing Custom Functions</a> for more information.  </p><h2>9. Provides low latency queries</h2><p>Drill is built from the ground up for short and low-latency queries on large datasets. Drill doesn't use MapReduce; instead it comes with a distributed SQL MPP engine to execute queries in parallel on a cluster. Any of the Drillbits (core service in Drill) is capable of receiving requests from users. The optimizer in Drill is sophisticated and leverages various rule- based and cost-based techniques, optimization capabilities of the data sources, along with data locality to determine the most
  efficient query plan and then distribute the execution across multiple nodes in the cluster. Drill also provides a columnar and vectorized execution engine to offer high memory and CPU efficiencies along with rapid performance for a wide variety of analytic queries.  </p><h2>10. Supports large datasets</h2><p>Drill is built to scale to big data needs and is not restricted by memory available on the cluster nodes. For performance, Drill tries to do query execution in-memory when possible, using an optimistic/pipelined model and spills to disk only if the working dataset doesn't fit in memory.  </p><p>For more examples on how to use Drill, download  <a href="https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill">Apache Drill sandbox</a>  and try out the  <a href="http://doc.mapr.com/display/MapR/Apache+Drill+Tutorial">sandbox tutorial</a>. 
+<h2>10. Scales from a single laptop to a 1000-node cluster</h2><p>Drill is available as a simple download you can run on your laptop. When you're ready to analyze larger datasets, simply deploy Drill on your Hadoop cluster (up to 1000 commodity servers). Drill leverages the aggregate memory in the cluster to execute queries using an optimistic pipelined model, and automatically spills to disk when the working set doesn't fit in memory.</p>
 						
 						<!-- Last Line -->
             </div>