Posted to commits@drill.apache.org by ts...@apache.org on 2015/01/28 00:28:48 UTC

svn commit: r1655190 - in /drill/site/trunk/content/drill: blog/2015/ blog/2015/01/ blog/2015/01/27/ blog/2015/01/27/schema-free-json-data-infrastructure/ blog/2015/01/27/schema-free-json-data-infrastructure/index.html blog/index.html feed.xml

Author: tshiran
Date: Tue Jan 27 23:28:48 2015
New Revision: 1655190

URL: http://svn.apache.org/r1655190
Log:
New blog post

Added:
    drill/site/trunk/content/drill/blog/2015/
    drill/site/trunk/content/drill/blog/2015/01/
    drill/site/trunk/content/drill/blog/2015/01/27/
    drill/site/trunk/content/drill/blog/2015/01/27/schema-free-json-data-infrastructure/
    drill/site/trunk/content/drill/blog/2015/01/27/schema-free-json-data-infrastructure/index.html
Modified:
    drill/site/trunk/content/drill/blog/index.html
    drill/site/trunk/content/drill/feed.xml

Added: drill/site/trunk/content/drill/blog/2015/01/27/schema-free-json-data-infrastructure/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/blog/2015/01/27/schema-free-json-data-infrastructure/index.html?rev=1655190&view=auto
==============================================================================
--- drill/site/trunk/content/drill/blog/2015/01/27/schema-free-json-data-infrastructure/index.html (added)
+++ drill/site/trunk/content/drill/blog/2015/01/27/schema-free-json-data-infrastructure/index.html Tue Jan 27 23:28:48 2015
@@ -0,0 +1,172 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Schema-free JSON Data Infrastructure - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+  <li class="logo"><a href="/"></a></li>
+  <li>
+    <a href="/overview/">Documentation</a>
+    <ul>
+      <li><a href="/overview/">Overview&nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+      <li><a href="/why/">Why Drill? &nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="/architecture/">Architecture</a></li>
+    </ul>
+  </li>
+  <li>
+    <a href="/community/">Community</a>
+    <ul>
+      <li><a href="/team/">Team</a></li>
+      <li><a href="/community/#events">Events and Meetups</a></li>
+      <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+      <li><a href="/community/#getinvolved">Get Involved</a></li>
+      <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+      <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+    </ul>
+  </li>
+  <li><a href="/faq/">FAQ</a></li>
+  <li><a href="/blog/">Blog</a></li>
+  <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+  <li class="l"><span>&nbsp;</span></li>
+  <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="post int_text">
+
+  <header class="post-header">
+    <h1 class="post-title">Schema-free JSON Data Infrastructure</h1>
+    <p class="post-meta">
+    
+      
+      
+      <strong>Author:</strong> Tomer Shiran (Founder, PMC Member and Committer, Apache Drill)
+    
+<br/><strong>Date:</strong> Jan 27, 2015
+</p>
+  </header>
+  <div class="addthis_sharing_toolbox"></div>
+
+  <article class="post-content">
+    <p>JSON has emerged in recent years as the de facto standard data exchange format. It is used everywhere. Front-end Web applications use JSON to maintain data and communicate with back-end applications. Web APIs are JSON-based (e.g., <a href="https://dev.twitter.com/rest/public">Twitter REST APIs</a>, <a href="http://developers.marketo.com/documentation/rest/">Marketo REST APIs</a>, <a href="https://developer.github.com/v3/">GitHub API</a>). It&#39;s also the format of choice for public datasets, operational log files and more.</p>
+
+<h1 id="why-is-json-a-convenient-data-exchange-format?">Why is JSON a Convenient Data Exchange Format?</h1>
+
+<p>While I won&#39;t dive into the historical roots of JSON (JavaScript Object Notation, <a href="http://en.wikipedia.org/wiki/JSON#JavaScript_eval.28.29"><code>eval()</code></a>, etc.), I do want to highlight several attributes of JSON that make it a convenient data exchange format:</p>
+
+<ul>
+<li><strong>JSON is self-describing</strong>. You can look at a JSON document and understand what it represents. The field names are included in the document. You don&#39;t need an external schema or definition to interpret JSON-encoded data. This makes life easier for anyone who wants to deal with the data, and it also means that a collection of JSON documents represents what many people call a &quot;schema-less dataset&quot; (where structure can evolve, and different records can have different fields).</li>
+<li><strong>JSON is simple</strong>. Other self-describing formats such as XML are much more complicated. A JSON document is made up of arrays, maps (or objects, in JSON terminology) and a handful of scalar values (strings, numbers, booleans and null), and that&#39;s about it.</li>
+<li><strong>JSON can naturally represent real-world objects</strong>. Try representing your application&#39;s <code>Customer</code> object (with the person&#39;s address, order history, etc.) in a CSV file or a relational database. It&#39;s hard. In fact, ORM systems were invented to help alleviate this issue.</li>
+<li><strong>JSON libraries are available in virtually every programming language</strong>. Take a look at <a href="http://www.json.org/">the list of supported languages on JSON.org</a>. I counted 15 languages that start with the letters A, B or C.</li>
+<li><strong>JSON is idiomatic in dynamically typed languages</strong>. Many dynamically typed languages, such as Python, Ruby and JavaScript, have built-in data structures that closely resemble JSON objects, making it very natural to handle JSON data in those languages. For example, a Python dictionary looks almost exactly like a JSON object. This makes it easy for developers to use JSON in their applications.</li>
+</ul>
+
+<h1 id="json-data-infrastructure">JSON Data Infrastructure</h1>
+
+<p>Traditional data infrastructure, such as relational databases, has some features that make it easier to store and process JSON-encoded data. For example, Oracle Database 12c lets you <a href="https://docs.oracle.com/database/121/ADXDB/json.htm">store JSON in standard column types and provides a set of SQL functions for handling JSON data</a>.</p>
+
+<p>However, a new class of data infrastructure is providing a much more seamless experience via a full-fledged JSON data model. For example:</p>
+
+<ul>
+<li>Drill is a SQL engine in which each record is conceptually a JSON document.</li>
+<li>Elasticsearch is a search engine in which each indexed document is conceptually a JSON document.</li>
+<li>MongoDB is an operational database in which each record is conceptually a JSON document.</li>
+</ul>
+
+<p>These systems view JSON as a data model as opposed to one of many data types, realizing that JSON offers a simple way to represent real-world objects.</p>
+
+<table><thead>
+<tr>
+<th></th>
+<th>Traditional Infrastructure</th>
+<th>JSON Infrastructure</th>
+</tr>
+</thead><tbody>
+<tr>
+<td><strong>Examples:</strong></td>
+<td>Oracle, SQL Server</td>
+<td>Drill, Elasticsearch, MongoDB</td>
+</tr>
+<tr>
+<td><strong>Record:</strong></td>
+<td>Tuple</td>
+<td>JSON document</td>
+</tr>
+<tr>
+<td><strong>Variable schema:</strong></td>
+<td>No</td>
+<td>Yes</td>
+</tr>
+</tbody></table>
+
+<p>If you happen to be in the Bay Area tomorrow, please join Gaurav Gupta (VP Product Management, Elasticsearch), Paul Pedersen (Deputy CTO, MongoDB), Robert Greene (Senior Principal Product Manager, Oracle), Sukanta Ganguly (VP Solutions Architecture, Aerospike) and me for a panel moderated by Gartner&#39;s Nick Heudecker on this new world of schema-free JSON. Check out <a href="http://www.meetup.com/SF-Bay-Areas-Big-Data-Think-Tank/">The Hive Big Data Think Tank</a> for more information.</p>
+
+  </article>
+ <div id="disqus_thread"></div>
+    <script type="text/javascript">
+        /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */
+        var disqus_shortname = 'drill'; // required: replace example with your forum shortname
+
+        /* * * DON'T EDIT BELOW THIS LINE * * */
+        (function() {
+            var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
+            dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
+            (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
+        })();
+    </script>
+    <noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
+    
+</div>
+<script type="text/javascript" src="//s7.addthis.com/js/300/addthis_widget.js#pubid=ra-548b2caa33765e8d" async="async"></script>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright © 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>

Modified: drill/site/trunk/content/drill/blog/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/blog/index.html?rev=1655190&r1=1655189&r2=1655190&view=diff
==============================================================================
--- drill/site/trunk/content/drill/blog/index.html (original)
+++ drill/site/trunk/content/drill/blog/index.html Tue Jan 27 23:28:48 2015
@@ -68,6 +68,8 @@
 </div>
 
 <div class="int_text" align="left"><!-- previously: site.posts -->
+<p><a class="post-link" href="/blog/2015/01/27/schema-free-json-data-infrastructure/">Schema-free JSON Data Infrastructure</a> (Jan 27, 2015)<br/>JSON has emerged as the de facto standard data exchange format. Data infrastructure technologies such as Apache Drill, MongoDB and Elasticsearch are embracing JSON as their native data model, bringing game-changing ease of use and agility to developers and analysts.</p>
+<!-- previously: site.posts -->
 <p><a class="post-link" href="/blog/2014/12/23/drill-0.7-released/">Drill 0.7 Released</a> (Dec 23, 2014)<br/>The community has just released Drill 0.7, which includes 228 resolved JIRAs and numerous enhancements.</p>
 <!-- previously: site.posts -->
 <p><a class="post-link" href="/blog/2014/12/16/whats-coming-in-2015/">What's Coming in 2015?</a> (Dec 16, 2014)<br/>Drill is now a top-level project, and the community is expanding rapidly. Find out more about some of the new features planned for 2015.</p>

Modified: drill/site/trunk/content/drill/feed.xml
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/feed.xml?rev=1655190&r1=1655189&r2=1655190&view=diff
==============================================================================
--- drill/site/trunk/content/drill/feed.xml (original)
+++ drill/site/trunk/content/drill/feed.xml Tue Jan 27 23:28:48 2015
@@ -6,11 +6,76 @@
 </description>
     <link>/</link>
     <atom:link href="/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Wed, 14 Jan 2015 21:01:22 -0800</pubDate>
-    <lastBuildDate>Wed, 14 Jan 2015 21:01:22 -0800</lastBuildDate>
+    <pubDate>Tue, 27 Jan 2015 15:28:02 -0800</pubDate>
+    <lastBuildDate>Tue, 27 Jan 2015 15:28:02 -0800</lastBuildDate>
     <generator>Jekyll v2.5.1</generator>
     
       <item>
+        <title>Schema-free JSON Data Infrastructure</title>
+        <description>&lt;p&gt;JSON has emerged in recent years as the de facto standard data exchange format. It is used everywhere. Front-end Web applications use JSON to maintain data and communicate with back-end applications. Web APIs are JSON-based (e.g., &lt;a href=&quot;https://dev.twitter.com/rest/public&quot;&gt;Twitter REST APIs&lt;/a&gt;, &lt;a href=&quot;http://developers.marketo.com/documentation/rest/&quot;&gt;Marketo REST APIs&lt;/a&gt;, &lt;a href=&quot;https://developer.github.com/v3/&quot;&gt;GitHub API&lt;/a&gt;). It&amp;#39;s also the format of choice for public datasets, operational log files and more.&lt;/p&gt;
+
+&lt;h1 id=&quot;why-is-json-a-convenient-data-exchange-format?&quot;&gt;Why is JSON a Convenient Data Exchange Format?&lt;/h1&gt;
+
+&lt;p&gt;While I won&amp;#39;t dive into the historical roots of JSON (JavaScript Object Notation, &lt;a href=&quot;http://en.wikipedia.org/wiki/JSON#JavaScript_eval.28.29&quot;&gt;&lt;code&gt;eval()&lt;/code&gt;&lt;/a&gt;, etc.), I do want to highlight several attributes of JSON that make it a convenient data exchange format:&lt;/p&gt;
+
+&lt;ul&gt;
+&lt;li&gt;&lt;strong&gt;JSON is self-describing&lt;/strong&gt;. You can look at a JSON document and understand what it represents. The field names are included in the document. You don&amp;#39;t need an external schema or definition to interpret JSON-encoded data. This makes life easier for anyone who wants to deal with the data, and it also means that a collection of JSON documents represents what many people call a &amp;quot;schema-less dataset&amp;quot; (where structure can evolve, and different records can have different fields).&lt;/li&gt;
+&lt;li&gt;&lt;strong&gt;JSON is simple&lt;/strong&gt;. Other self-describing formats such as XML are much more complicated. A JSON document is made up of arrays, maps (or objects, in JSON terminology) and a handful of scalar values (strings, numbers, booleans and null), and that&amp;#39;s about it.&lt;/li&gt;
+&lt;li&gt;&lt;strong&gt;JSON can naturally represent real-world objects&lt;/strong&gt;. Try representing your application&amp;#39;s &lt;code&gt;Customer&lt;/code&gt; object (with the person&amp;#39;s address, order history, etc.) in a CSV file or a relational database. It&amp;#39;s hard. In fact, ORM systems were invented to help alleviate this issue.&lt;/li&gt;
+&lt;li&gt;&lt;strong&gt;JSON libraries are available in virtually every programming language&lt;/strong&gt;. Take a look at &lt;a href=&quot;http://www.json.org/&quot;&gt;the list of supported languages on JSON.org&lt;/a&gt;. I counted 15 languages that start with the letters A, B or C.&lt;/li&gt;
+&lt;li&gt;&lt;strong&gt;JSON is idiomatic in dynamically typed languages&lt;/strong&gt;. Many dynamically typed languages, such as Python, Ruby and JavaScript, have built-in data structures that closely resemble JSON objects, making it very natural to handle JSON data in those languages. For example, a Python dictionary looks almost exactly like a JSON object. This makes it easy for developers to use JSON in their applications.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;h1 id=&quot;json-data-infrastructure&quot;&gt;JSON Data Infrastructure&lt;/h1&gt;
+
+&lt;p&gt;Traditional data infrastructure, such as relational databases, has some features that make it easier to store and process JSON-encoded data. For example, Oracle Database 12c lets you &lt;a href=&quot;https://docs.oracle.com/database/121/ADXDB/json.htm&quot;&gt;store JSON in standard column types and provides a set of SQL functions for handling JSON data&lt;/a&gt;.&lt;/p&gt;
+
+&lt;p&gt;However, a new class of data infrastructure is providing a much more seamless experience via a full-fledged JSON data model. For example:&lt;/p&gt;
+
+&lt;ul&gt;
+&lt;li&gt;Drill is a SQL engine in which each record is conceptually a JSON document.&lt;/li&gt;
+&lt;li&gt;Elasticsearch is a search engine in which each indexed document is conceptually a JSON document.&lt;/li&gt;
+&lt;li&gt;MongoDB is an operational database in which each record is conceptually a JSON document.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;These systems view JSON as a data model as opposed to one of many data types, realizing that JSON offers a simple way to represent real-world objects.&lt;/p&gt;
+
+&lt;table&gt;&lt;thead&gt;
+&lt;tr&gt;
+&lt;th&gt;&lt;/th&gt;
+&lt;th&gt;Traditional Infrastructure&lt;/th&gt;
+&lt;th&gt;JSON Infrastructure&lt;/th&gt;
+&lt;/tr&gt;
+&lt;/thead&gt;&lt;tbody&gt;
+&lt;tr&gt;
+&lt;td&gt;&lt;strong&gt;Examples:&lt;/strong&gt;&lt;/td&gt;
+&lt;td&gt;Oracle, SQL Server&lt;/td&gt;
+&lt;td&gt;Drill, Elasticsearch, MongoDB&lt;/td&gt;
+&lt;/tr&gt;
+&lt;tr&gt;
+&lt;td&gt;&lt;strong&gt;Record:&lt;/strong&gt;&lt;/td&gt;
+&lt;td&gt;Tuple&lt;/td&gt;
+&lt;td&gt;JSON document&lt;/td&gt;
+&lt;/tr&gt;
+&lt;tr&gt;
+&lt;td&gt;&lt;strong&gt;Variable schema:&lt;/strong&gt;&lt;/td&gt;
+&lt;td&gt;No&lt;/td&gt;
+&lt;td&gt;Yes&lt;/td&gt;
+&lt;/tr&gt;
+&lt;/tbody&gt;&lt;/table&gt;
+
+&lt;p&gt;If you happen to be in the Bay Area tomorrow, please join Gaurav Gupta (VP Product Management, Elasticsearch), Paul Pedersen (Deputy CTO, MongoDB), Robert Greene (Senior Principal Product Manager, Oracle), Sukanta Ganguly (VP Solutions Architecture, Aerospike) and me for a panel moderated by Gartner&amp;#39;s Nick Heudecker on this new world of schema-free JSON. Check out &lt;a href=&quot;http://www.meetup.com/SF-Bay-Areas-Big-Data-Think-Tank/&quot;&gt;The Hive Big Data Think Tank&lt;/a&gt; for more information.&lt;/p&gt;
+</description>
+        <pubDate>Tue, 27 Jan 2015 00:50:01 -0800</pubDate>
+        <link>/blog/2015/01/27/schema-free-json-data-infrastructure/</link>
+        <guid isPermaLink="true">/blog/2015/01/27/schema-free-json-data-infrastructure/</guid>
+        
+        
+        <category>blog</category>
+        
+      </item>
+    
+      <item>
         <title>Drill 0.7 Released</title>
         <description>&lt;p&gt;I&amp;#39;m excited to announce that the community has just released Drill 0.7, which includes &lt;a href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&amp;amp;version=12327473&quot;&gt;228 resolved JIRAs&lt;/a&gt; and numerous enhancements such as: &lt;/p&gt;