Posted to by on 2015/06/22 23:07:27 UTC

[3/8] orc git commit: Publish first version of site.
diff --git a/docs/stripes.html b/docs/stripes.html
new file mode 100644
index 0000000..6302fa4
--- /dev/null
+++ b/docs/stripes.html
@@ -0,0 +1,1066 @@
+<html lang="en-US">
+  <meta charset="UTF-8">
+  <title>Stripes</title>
+  <meta name="viewport" content="width=device-width,initial-scale=1">
+  <meta name="generator" content="Jekyll v2.4.0">
+  <link rel="stylesheet" href="//,300italic,400,400italic,700,700italic,900">
+  <link rel="stylesheet" href="/css/screen.css">
+  <link rel="icon" type="image/x-icon" href="/favicon.ico">
+  <!--[if lt IE 9]>
+  <script src="/js/html5shiv.min.js"></script>
+  <script src="/js/respond.min.js"></script>
+  <![endif]-->
+<body class="wrap">
+  <header role="banner">
+  <nav class="mobile-nav show-on-mobiles">
+    <ul>
+  <li class="">
+    <a href="/">Home</a>
+  </li>
+  <li class="current">
+    <a href="/docs/">Documentation</a>
+  </li>
+  <li class="">
+    <a href="/talks/">Talks</a>
+  </li>
+  <li class="">
+    <a href="/news/">News</a>
+  </li>
+  <li class="">
+    <a href="/help/">Help</a>
+  </li>
+  <li class="">
+    <a href="/develop/">Develop</a>
+  </li>
+  </nav>
+  <div class="grid">
+    <div class="unit one-third center-on-mobiles">
+      <h1>
+        <a href="/">
+          <span class="sr-only">Apache ORC</span>
+          <img src="/img/logo.png" width="249" height="115" alt="ORC Logo">
+        </a>
+      </h1>
+    </div>
+    <nav class="main-nav unit two-thirds hide-on-mobiles">
+      <ul>
+  <li class="">
+    <a href="/">Home</a>
+  </li>
+  <li class="current">
+    <a href="/docs/">Documentation</a>
+  </li>
+  <li class="">
+    <a href="/talks/">Talks</a>
+  </li>
+  <li class="">
+    <a href="/news/">News</a>
+  </li>
+  <li class="">
+    <a href="/help/">Help</a>
+  </li>
+  <li class="">
+    <a href="/develop/">Develop</a>
+  </li>
+    </nav>
+  </div>
+    <section class="docs">
+    <div class="grid">
+      <div class="docs-nav-mobile unit whole show-on-mobiles">
+  <select onchange="if (this.value) window.location.href=this.value">
+    <option value="">Navigate the docs…</option>
+    <optgroup label="Overview">
+      <option value="/docs/index.html">Background</option>
+      <option value="/docs/types.html">Types</option>
+      <option value="/docs/indexes.html">Indexes</option>
+      <option value="/docs/acid.html">ACID support</option>
+    </optgroup>
+    <optgroup label="Hive Usage">
+      <option value="/docs/hive-ddl.html">Hive DDL</option>
+      <option value="/docs/hive-config.html">Hive Configuration</option>
+    </optgroup>
+    <optgroup label="Format Specification">
+      <option value="/docs/spec-intro.html">Introduction</option>
+      <option value="/docs/file-tail.html">File Tail</option>
+      <option value="/docs/compression.html">Compression</option>
+      <option value="/docs/run-length.html">Run Length Encoding</option>
+      <option value="/docs/stripes.html">Stripes</option>
+      <option value="/docs/encodings.html">Column Encodings</option>
+      <option value="/docs/spec-index.html">Indexes</option>
+    </optgroup>
+  </select>
+      <div class="unit four-fifths">
+        <article>
+          <h1>Stripes</h1>
+          <p>The body of an ORC file consists of a series of stripes. Stripes are
+large (typically ~200MB), are independent of each other, and are often
+processed by different tasks. The defining characteristic of columnar
+storage formats is that the data for each column is stored separately,
+so the cost of reading data out of the file should be proportional to the
+number of columns read.</p>
+<p>In ORC files, each column is stored in several streams that are stored
+next to each other in the file. For example, an integer column is
+represented as two streams: PRESENT, which uses one bit per value to
+record whether the value is non-null, and DATA, which records the
+non-null values. If all of a column’s values in a stripe are non-null,
+the PRESENT stream is omitted from the stripe. For binary data, ORC
+uses three streams: PRESENT, DATA, and LENGTH, where LENGTH stores the
+length of each value. The details of each type are presented in the
+following sections.</p>
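To make the stream split concrete, here is a minimal Python sketch of the logical content of each stream for a nullable integer column and a nullable binary column. It is illustrative only: the function names are hypothetical, and it deliberately ignores the run-length encoding, bit packing of PRESENT, and compression that a real ORC writer applies.

```python
from typing import Dict, List, Optional, Sequence

# Logical stream contents only. A real ORC writer also run-length encodes
# each stream, packs PRESENT into bits, and may compress the result.


def split_int_column(values: Sequence[Optional[int]]) -> Dict[str, list]:
    """Split a nullable integer column into PRESENT and DATA streams."""
    present = [v is not None for v in values]
    streams: Dict[str, list] = {"DATA": [v for v in values if v is not None]}
    if not all(present):
        # PRESENT is omitted when every value in the stripe is non-null.
        streams["PRESENT"] = present
    return streams


def split_binary_column(values: Sequence[Optional[bytes]]) -> Dict[str, object]:
    """Split a nullable binary column into PRESENT, DATA and LENGTH streams."""
    present = [v is not None for v in values]
    non_null = [v for v in values if v is not None]
    streams: Dict[str, object] = {
        "DATA": b"".join(non_null),            # value bytes, back to back
        "LENGTH": [len(v) for v in non_null],  # one length per non-null value
    }
    if not all(present):
        streams["PRESENT"] = present
    return streams


def rebuild_binary_column(streams: Dict[str, object]) -> List[Optional[bytes]]:
    """Invert split_binary_column: LENGTH slices DATA, PRESENT places nulls."""
    data = streams["DATA"]
    values: List[bytes] = []
    offset = 0
    for n in streams["LENGTH"]:
        values.append(data[offset:offset + n])
        offset += n
    present = streams.get("PRESENT")
    if present is None:
        return list(values)
    it = iter(values)
    return [next(it) if p else None for p in present]


print(split_int_column([1, None, 3]))  # {'DATA': [1, 3], 'PRESENT': [True, False, True]}
```

Note that DATA never contains placeholders for nulls; PRESENT alone tells the reader where the gaps are.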
+<h1 id="stripe-footer">Stripe Footer</h1>
+<p>The stripe footer contains the encoding of each column and the
+directory of the streams including their location.</p>
+<p><code>message StripeFooter {
+ // the location of each stream
+ repeated Stream streams = 1;
+ // the encoding of each column
+ repeated ColumnEncoding columns = 2;
+}</code></p>
+<p>To describe each stream, ORC stores the kind of stream, the column id,
+and the stream’s size in bytes. The details of what is stored in each stream
+depend on the type and encoding of the column.</p>
+<p><code>message Stream {
+ enum Kind {
+ // boolean stream of whether the next value is non-null
+ PRESENT = 0;
+ // the primary data stream
+ DATA = 1;
+ // the length of each value for variable length data
+ LENGTH = 2;
+ // the dictionary blob
+ DICTIONARY_DATA = 3;
+ // deprecated prior to Hive 0.11
+ // It was used to store the number of instances of each value in the
+ // dictionary
+ DICTIONARY_COUNT = 4;
+ // a secondary data stream
+ SECONDARY = 5;
+ // the index for seeking to particular row groups
+ ROW_INDEX = 6;
+ }
+ required Kind kind = 1;
+ // the column id
+ optional uint32 column = 2;
+ // the number of bytes in the file
+ optional uint64 length = 3;
+}</code></p>
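The stream directory is what lets a reader fetch only the columns it needs. The sketch below assumes, as we understand the format, that a stripe's streams are stored back to back in the order the footer lists them, starting at the stripe's offset (which is recorded in the file tail). The function ranges_to_read, the DIRECTORY sample, and its byte counts are hypothetical illustrations; only the kind numbers come from the message above.

```python
from enum import IntEnum
from typing import Iterable, List, NamedTuple, Set, Tuple


class Kind(IntEnum):
    # Subset of Stream.Kind, using the numbers from the message above.
    PRESENT = 0
    DATA = 1
    LENGTH = 2
    ROW_INDEX = 6


class Stream(NamedTuple):
    kind: Kind
    column: int  # the column id
    length: int  # the number of bytes in the file


def ranges_to_read(
    stripe_offset: int,
    streams: Iterable[Stream],
    columns: Set[int],
    include_index: bool = False,
) -> List[Tuple[int, int]]:
    """Return [start, end) byte ranges of the streams needed for `columns`."""
    wanted: List[Tuple[int, int]] = []
    position = stripe_offset
    for s in streams:
        # Assumed layout: each stream starts where the previous one ended.
        start, position = position, position + s.length
        if s.column not in columns:
            continue  # unselected columns cost no I/O
        if s.kind == Kind.ROW_INDEX and not include_index:
            continue  # row indexes are only needed to seek to row groups
        wanted.append((start, position))
    return wanted


# Hypothetical directory for a stripe with an int column 1 and a string column 2.
DIRECTORY = [
    Stream(Kind.ROW_INDEX, 1, 10),
    Stream(Kind.ROW_INDEX, 2, 10),
    Stream(Kind.DATA, 1, 50),
    Stream(Kind.PRESENT, 2, 4),
    Stream(Kind.DATA, 2, 30),
    Stream(Kind.LENGTH, 2, 8),
]

print(ranges_to_read(0, DIRECTORY, {2}))  # [(70, 74), (74, 104), (104, 112)]
```

Selecting only column 2 touches 42 of the stripe's 112 bytes, which is the "proportional to the number of columns read" property in concrete form.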
+<p>Depending on the column type, several encoding options are possible. The
+encodings are divided into direct and dictionary-based categories and are
+further refined by whether they use RLE v1 or v2.</p>
+<p><code>message ColumnEncoding {
+ enum Kind {
+ // the encoding is mapped directly to the stream using RLE v1
+ DIRECT = 0;
+ // the encoding uses a dictionary of unique values using RLE v1
+ DICTIONARY = 1;
+ // the encoding is direct using RLE v2
+ DIRECT_V2 = 2;
+ // the encoding is dictionary-based using RLE v2
+ DICTIONARY_V2 = 3;
+ }
+ required Kind kind = 1;
+ // for dictionary encodings, record the size of the dictionary
+ optional uint32 dictionarySize = 2;
+}</code></p>
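As an illustration of a dictionary-based encoding, the sketch below splits a string column into a dictionary blob (the DICTIONARY_DATA stream kind in the protobuf definition), a LENGTH stream with one entry per dictionary word, and a DATA stream of dictionary ids, with dictionarySize equal to the number of entries. This stream assignment reflects our reading of the Column Encodings page. The function names are hypothetical, nulls (which would add a PRESENT stream) are not handled, RLE is omitted, and sorting the entries is simply a choice made by this sketch.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[Dict[str, object], int]:
    """Return (streams, dictionary_size) for a non-null string column."""
    entries = sorted(set(values))  # sketch choice: sorted unique values
    index = {v: i for i, v in enumerate(entries)}
    encoded = [e.encode("utf-8") for e in entries]
    streams: Dict[str, object] = {
        "DICTIONARY_DATA": b"".join(encoded),  # dictionary blob
        "LENGTH": [len(b) for b in encoded],   # byte length of each entry
        "DATA": [index[v] for v in values],    # one dictionary id per row
    }
    return streams, len(entries)


def dictionary_decode(streams: Dict[str, object], dictionary_size: int) -> List[str]:
    """Rebuild the column from the dictionary blob, entry lengths and ids."""
    lengths = streams["LENGTH"]
    if len(lengths) != dictionary_size:
        raise ValueError("LENGTH must have one entry per dictionary word")
    blob = streams["DICTIONARY_DATA"]
    entries: List[str] = []
    offset = 0
    for n in lengths:
        entries.append(blob[offset:offset + n].decode("utf-8"))
        offset += n
    return [entries[i] for i in streams["DATA"]]


streams, size = dictionary_encode(["ny", "sf", "ny", "la"])
print(size, streams["DATA"])  # 3 [1, 2, 1, 0]
```

The payoff is that a repeated string is stored once in the blob, while each row only costs a small integer in DATA.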
+    <div class="section-nav">
+      <div class="left align-right">
+            <a href="/docs/run-length.html" class="prev">Back</a>
+      </div>
+      <div class="right align-left">
+            <a href="/docs/encodings.html" class="next">Next</a>
+      </div>
+    </div>
+    <div class="clear"></div>
+        </article>
+      </div>
+      <div class="unit one-fifth hide-on-mobiles">
+  <aside>
+    <h4>Overview</h4>
+      <li class=""><a href="/docs/index.html">Background</a></li>
+      <li class=""><a href="/docs/types.html">Types</a></li>
+      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
+      <li class=""><a href="/docs/acid.html">ACID support</a></li>
+    <h4>Hive Usage</h4>
+      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
+      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
+    <h4>Format Specification</h4>
+      <li class=""><a href="/docs/spec-intro.html">Introduction</a></li>
+      <li class=""><a href="/docs/file-tail.html">File Tail</a></li>
+      <li class=""><a href="/docs/compression.html">Compression</a></li>
+      <li class=""><a href="/docs/run-length.html">Run Length Encoding</a></li>
+      <li class="current"><a href="/docs/stripes.html">Stripes</a></li>
+      <li class=""><a href="/docs/encodings.html">Column Encodings</a></li>
+      <li class=""><a href="/docs/spec-index.html">Indexes</a></li>
+  </aside>
+      <div class="clear"></div>
+    </div>
+  </section>
+  <footer role="contentinfo">
+  <p>The contents of this website are &copy;&nbsp;2015
+     <a href="">Apache Software Foundation</a>
+     under the terms of the <a
+      href="">
+      Apache&nbsp;License&nbsp;v2</a>. Apache ORC and its logo are trademarks
+      of the Apache Software Foundation.</p>
+  <script>
+  var anchorForId = function (id) {
+    var anchor = document.createElement("a");
+    anchor.className = "header-link";
+    anchor.href      = "#" + id;
+    anchor.innerHTML = "<span class=\"sr-only\">Permalink</span><i class=\"fa fa-link\"></i>";
+    anchor.title = "Permalink";
+    return anchor;
+  };
+  var linkifyAnchors = function (level, containingElement) {
+    var headers = containingElement.getElementsByTagName("h" + level);
+    for (var h = 0; h < headers.length; h++) {
+      var header = headers[h];
+      if (typeof header.id !== "undefined" && header.id !== "") {
+        header.appendChild(anchorForId(header.id));
+      }
+    }
+  };
+  document.onreadystatechange = function () {
+    if (this.readyState === "complete") {
+      var contentBlock = document.getElementsByClassName("docs")[0] || document.getElementsByClassName("news")[0];
+      if (!contentBlock) {
+        return;
+      }
+      for (var level = 1; level <= 6; level++) {
+        linkifyAnchors(level, contentBlock);
+      }
+    }
+  };
diff --git a/docs/types.html b/docs/types.html
new file mode 100644
index 0000000..4535269
--- /dev/null
+++ b/docs/types.html
@@ -0,0 +1,1035 @@
+<html lang="en-US">
+  <meta charset="UTF-8">
+  <title>Types</title>
+  <meta name="viewport" content="width=device-width,initial-scale=1">
+  <meta name="generator" content="Jekyll v2.4.0">
+  <link rel="stylesheet" href="//,300italic,400,400italic,700,700italic,900">
+  <link rel="stylesheet" href="/css/screen.css">
+  <link rel="icon" type="image/x-icon" href="/favicon.ico">
+  <!--[if lt IE 9]>
+  <script src="/js/html5shiv.min.js"></script>
+  <script src="/js/respond.min.js"></script>
+  <![endif]-->
+<body class="wrap">
+  <header role="banner">
+  <nav class="mobile-nav show-on-mobiles">
+    <ul>
+  <li class="">
+    <a href="/">Home</a>
+  </li>
+  <li class="current">
+    <a href="/docs/">Documentation</a>
+  </li>
+  <li class="">
+    <a href="/talks/">Talks</a>
+  </li>
+  <li class="">
+    <a href="/news/">News</a>
+  </li>
+  <li class="">
+    <a href="/help/">Help</a>
+  </li>
+  <li class="">
+    <a href="/develop/">Develop</a>
+  </li>
+  </nav>
+  <div class="grid">
+    <div class="unit one-third center-on-mobiles">
+      <h1>
+        <a href="/">
+          <span class="sr-only">Apache ORC</span>
+          <img src="/img/logo.png" width="249" height="115" alt="ORC Logo">
+        </a>
+      </h1>
+    </div>
+    <nav class="main-nav unit two-thirds hide-on-mobiles">
+      <ul>
+  <li class="">
+    <a href="/">Home</a>
+  </li>
+  <li class="current">
+    <a href="/docs/">Documentation</a>
+  </li>
+  <li class="">
+    <a href="/talks/">Talks</a>
+  </li>
+  <li class="">
+    <a href="/news/">News</a>
+  </li>
+  <li class="">
+    <a href="/help/">Help</a>
+  </li>
+  <li class="">
+    <a href="/develop/">Develop</a>
+  </li>
+    </nav>
+  </div>
+    <section class="docs">
+    <div class="grid">
+      <div class="docs-nav-mobile unit whole show-on-mobiles">
+  <select onchange="if (this.value) window.location.href=this.value">
+    <option value="">Navigate the docs…</option>
+    <optgroup label="Overview">
+      <option value="/docs/index.html">Background</option>
+      <option value="/docs/types.html">Types</option>
+      <option value="/docs/indexes.html">Indexes</option>
+      <option value="/docs/acid.html">ACID support</option>
+    </optgroup>
+    <optgroup label="Hive Usage">
+      <option value="/docs/hive-ddl.html">Hive DDL</option>
+      <option value="/docs/hive-config.html">Hive Configuration</option>
+    </optgroup>
+    <optgroup label="Format Specification">
+      <option value="/docs/spec-intro.html">Introduction</option>
+      <option value="/docs/file-tail.html">File Tail</option>
+      <option value="/docs/compression.html">Compression</option>
+      <option value="/docs/run-length.html">Run Length Encoding</option>
+      <option value="/docs/stripes.html">Stripes</option>
+      <option value="/docs/encodings.html">Column Encodings</option>
+      <option value="/docs/spec-index.html">Indexes</option>
+    </optgroup>
+  </select>
+      <div class="unit four-fifths">
+        <article>
+          <h1>Types</h1>
+          <p>ORC files are completely self-describing and do not depend on the Hive
+Metastore or any other external metadata. The file includes all of the
+type and encoding information for the objects stored in the file. Because the
+file is self-contained, it does not depend on the user’s environment to
+correctly interpret the file’s contents.</p>
+<p>ORC provides a rich set of scalar and compound types:</p>
+  <li>Integer
+    <ul>
+      <li>boolean (1 bit)</li>
+      <li>tinyint (8 bit)</li>
+      <li>smallint (16 bit)</li>
+      <li>int (32 bit)</li>
+      <li>bigint (64 bit)</li>
+    </ul>
+  </li>
+  <li>Floating point
+    <ul>
+      <li>float</li>
+      <li>double</li>
+    </ul>
+  </li>
+  <li>String types
+    <ul>
+      <li>string</li>
+      <li>char</li>
+      <li>varchar</li>
+    </ul>
+  </li>
+  <li>Binary blobs
+    <ul>
+      <li>binary</li>
+    </ul>
+  </li>
+  <li>Date/time
+    <ul>
+      <li>timestamp</li>
+      <li>date</li>
+    </ul>
+  </li>
+  <li>Compound types
+    <ul>
+      <li>struct</li>
+      <li>list</li>
+      <li>map</li>
+      <li>union</li>
+    </ul>
+  </li>
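The bit widths listed above fix the range each integer type can hold. A small sketch, assuming the usual signed two's-complement ranges that Hive uses for these types (boolean holds only 0 or 1); integer_range and fits are hypothetical helpers, not part of any ORC API.

```python
from typing import Tuple

# Bit widths from the list above; signed two's-complement ranges assumed.
INTEGER_BITS = {"tinyint": 8, "smallint": 16, "int": 32, "bigint": 64}


def integer_range(type_name: str) -> Tuple[int, int]:
    """Inclusive (min, max) for an integer type name."""
    if type_name == "boolean":
        return (0, 1)
    bits = INTEGER_BITS[type_name]
    return (-(1 << (bits - 1)), (1 << (bits - 1)) - 1)


def fits(type_name: str, value: int) -> bool:
    """True if `value` can be stored in a column of the given integer type."""
    low, high = integer_range(type_name)
    return low <= value <= high


print(integer_range("smallint"))   # (-32768, 32767)
print(fits("int", 3_000_000_000))  # False
```

A value such as 3,000,000,000 therefore needs a bigint column.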
+<p>All ORC files are logically sequences of identically typed objects. Hive
+always uses a struct with a field for each of the top-level columns as
+the root object type, but that is not required. All types in ORC can take
+null values, including the compound types.</p>
+<p>Compound types have child columns that hold the values for their
+sub-elements. For example, a struct column has one child column for
+each field of the struct. Lists always have a single child column for
+the element values, and maps always have two child columns: one for the
+keys and one for the values. Union columns have one child column for
+each of the variants.</p>
+<p>Given the following definition of the table Foobar, the columns in the
+file form the tree shown below.</p>
+<p><code>create table Foobar (
+ myInt int,
+ myMap map&lt;string,
+ struct&lt;myString: string,
+ myDouble: double&gt;&gt;,
+ myTime timestamp
+);</code></p>
+<p><img src="/img/TreeWriters.png" alt="ORC column structure" /></p>
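The child-column rules and the Foobar tree can be checked with a short sketch. It assumes, to our understanding of the format, that ORC numbers columns by a pre-order walk of the type tree, starting from 0 at the root. OrcType, check_children, assign_ids and FOOBAR are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

COMPOUND = {"struct", "list", "map", "union"}


@dataclass
class OrcType:
    kind: str
    children: List["OrcType"] = field(default_factory=list)
    field_names: List[str] = field(default_factory=list)  # used by struct only


def check_children(t: OrcType) -> None:
    """Enforce the child-column counts described above."""
    n = len(t.children)
    if t.kind == "struct" and n != len(t.field_names):
        raise ValueError("struct needs one child per field")
    if t.kind == "list" and n != 1:
        raise ValueError("list needs exactly one child")
    if t.kind == "map" and n != 2:
        raise ValueError("map needs a key child and a value child")
    if t.kind not in COMPOUND and n != 0:
        raise ValueError(t.kind + " is a scalar type and has no children")
    # union: one child per variant, so any number of children is accepted here


def assign_ids(root: OrcType) -> List[Tuple[int, str]]:
    """Number columns in pre-order: parent first, then its children in order."""
    ids: List[Tuple[int, str]] = []

    def visit(t: OrcType) -> None:
        check_children(t)
        ids.append((len(ids), t.kind))
        for child in t.children:
            visit(child)

    visit(root)
    return ids


FOOBAR = OrcType(
    "struct",
    [
        OrcType("int"),
        OrcType("map", [
            OrcType("string"),
            OrcType("struct", [OrcType("string"), OrcType("double")],
                    ["myString", "myDouble"]),
        ]),
        OrcType("timestamp"),
    ],
    ["myInt", "myMap", "myTime"],
)

for column_id, kind in assign_ids(FOOBAR):
    print(column_id, kind)
```

For Foobar this yields ids 0 through 7 in the order: root struct, myInt, myMap, the map's key column, the map's value struct, myString, myDouble, and myTime.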
+    <div class="section-nav">
+      <div class="left align-right">
+            <a href="/docs/index.html" class="prev">Back</a>
+      </div>
+      <div class="right align-left">
+            <a href="/docs/indexes.html" class="next">Next</a>
+      </div>
+    </div>
+    <div class="clear"></div>
+        </article>
+      </div>
+      <div class="unit one-fifth hide-on-mobiles">
+  <aside>
+    <h4>Overview</h4>
+      <li class=""><a href="/docs/index.html">Background</a></li>
+      <li class="current"><a href="/docs/types.html">Types</a></li>
+      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
+      <li class=""><a href="/docs/acid.html">ACID support</a></li>
+    <h4>Hive Usage</h4>
+      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
+      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
+    <h4>Format Specification</h4>
+      <li class=""><a href="/docs/spec-intro.html">Introduction</a></li>
+      <li class=""><a href="/docs/file-tail.html">File Tail</a></li>
+      <li class=""><a href="/docs/compression.html">Compression</a></li>
+      <li class=""><a href="/docs/run-length.html">Run Length Encoding</a></li>
+      <li class=""><a href="/docs/stripes.html">Stripes</a></li>
+      <li class=""><a href="/docs/encodings.html">Column Encodings</a></li>
+      <li class=""><a href="/docs/spec-index.html">Indexes</a></li>
+  </aside>
+      <div class="clear"></div>
+    </div>
+  </section>
+  <footer role="contentinfo">
+  <p>The contents of this website are &copy;&nbsp;2015
+     <a href="">Apache Software Foundation</a>
+     under the terms of the <a
+      href="">
+      Apache&nbsp;License&nbsp;v2</a>. Apache ORC and its logo are trademarks
+      of the Apache Software Foundation.</p>
+  <script>
+  var anchorForId = function (id) {
+    var anchor = document.createElement("a");
+    anchor.className = "header-link";
+    anchor.href      = "#" + id;
+    anchor.innerHTML = "<span class=\"sr-only\">Permalink</span><i class=\"fa fa-link\"></i>";
+    anchor.title = "Permalink";
+    return anchor;
+  };
+  var linkifyAnchors = function (level, containingElement) {
+    var headers = containingElement.getElementsByTagName("h" + level);
+    for (var h = 0; h < headers.length; h++) {
+      var header = headers[h];
+      if (typeof header.id !== "undefined" && header.id !== "") {
+        header.appendChild(anchorForId(header.id));
+      }
+    }
+  };
+  document.onreadystatechange = function () {
+    if (this.readyState === "complete") {
+      var contentBlock = document.getElementsByClassName("docs")[0] || document.getElementsByClassName("news")[0];
+      if (!contentBlock) {
+        return;
+      }
+      for (var level = 1; level <= 6; level++) {
+        linkifyAnchors(level, contentBlock);
+      }
+    }
+  };
diff --git a/downloads/index.html b/downloads/index.html
new file mode 100644
index 0000000..7a317d7
--- /dev/null
+++ b/downloads/index.html
@@ -0,0 +1,142 @@
+<html lang="en-US">
+  <meta charset="UTF-8">
+  <title>Downloads</title>
+  <meta name="viewport" content="width=device-width,initial-scale=1">
+  <meta name="generator" content="Jekyll v2.4.0">
+  <link rel="stylesheet" href="//,300italic,400,400italic,700,700italic,900">
+  <link rel="stylesheet" href="/css/screen.css">
+  <link rel="icon" type="image/x-icon" href="/favicon.ico">
+  <!--[if lt IE 9]>
+  <script src="/js/html5shiv.min.js"></script>
+  <script src="/js/respond.min.js"></script>
+  <![endif]-->
+<body class="wrap">
+  <header role="banner">
+  <nav class="mobile-nav show-on-mobiles">
+    <ul>
+  <li class="">
+    <a href="/">Home</a>
+  </li>
+  <li class="">
+    <a href="/docs/">Documentation</a>
+  </li>
+  <li class="">
+    <a href="/talks/">Talks</a>
+  </li>
+  <li class="">
+    <a href="/news/">News</a>
+  </li>
+  <li class="">
+    <a href="/help/">Help</a>
+  </li>
+  <li class="">
+    <a href="/develop/">Develop</a>
+  </li>
+  </nav>
+  <div class="grid">
+    <div class="unit one-third center-on-mobiles">
+      <h1>
+        <a href="/">
+          <span class="sr-only">Apache ORC</span>
+          <img src="/img/logo.png" width="249" height="115" alt="ORC Logo">
+        </a>
+      </h1>
+    </div>
+    <nav class="main-nav unit two-thirds hide-on-mobiles">
+      <ul>
+  <li class="">
+    <a href="/">Home</a>
+  </li>
+  <li class="">
+    <a href="/docs/">Documentation</a>
+  </li>
+  <li class="">
+    <a href="/talks/">Talks</a>
+  </li>
+  <li class="">
+    <a href="/news/">News</a>
+  </li>
+  <li class="">
+    <a href="/help/">Help</a>
+  </li>
+  <li class="">
+    <a href="/develop/">Develop</a>
+  </li>
+    </nav>
+  </div>
+  <section class="standalone">
+  <div class="grid">
+    <div class="unit whole">
+      <article>
+        <h1>Downloads</h1>
+        <p>We haven’t made any releases as a separate project yet. Please download
+the Hive 1.1 release and use the hive-exec.jar.</p>
+      </article>
+    </div>
+    <div class="clear"></div>
+  </div>
+  <footer role="contentinfo">
+  <p>The contents of this website are &copy;&nbsp;2015
+     <a href="">Apache Software Foundation</a>
+     under the terms of the <a
+      href="">
+      Apache&nbsp;License&nbsp;v2</a>. Apache ORC and its logo are trademarks
+      of the Apache Software Foundation.</p>
+  <script>
+  var anchorForId = function (id) {
+    var anchor = document.createElement("a");
+    anchor.className = "header-link";
+    anchor.href      = "#" + id;
+    anchor.innerHTML = "<span class=\"sr-only\">Permalink</span><i class=\"fa fa-link\"></i>";
+    anchor.title = "Permalink";
+    return anchor;
+  };
+  var linkifyAnchors = function (level, containingElement) {
+    var headers = containingElement.getElementsByTagName("h" + level);
+    for (var h = 0; h < headers.length; h++) {
+      var header = headers[h];
+      if (typeof header.id !== "undefined" && header.id !== "") {
+        header.appendChild(anchorForId(header.id));
+      }
+    }
+  };
+  document.onreadystatechange = function () {
+    if (this.readyState === "complete") {
+      var contentBlock = document.getElementsByClassName("docs")[0] || document.getElementsByClassName("news")[0];
+      if (!contentBlock) {
+        return;
+      }
+      for (var level = 1; level <= 6; level++) {
+        linkifyAnchors(level, contentBlock);
+      }
+    }
+  };
diff --git a/favicon.ico b/favicon.ico
new file mode 100644
index 0000000..d877215
Binary files /dev/null and b/favicon.ico differ
diff --git a/fonts/fontawesome-webfont.eot b/fonts/fontawesome-webfont.eot
new file mode 100755
index 0000000..84677bc
Binary files /dev/null and b/fonts/fontawesome-webfont.eot differ