Posted to commits@orc.apache.org by om...@apache.org on 2018/04/17 17:49:48 UTC

[1/9] orc git commit: Pushing ORC-339 reorganize the ORC file format spec.

Repository: orc
Updated Branches:
  refs/heads/asf-site c63412b1b -> c6e290902


http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/specification/ORCv2.html
----------------------------------------------------------------------
diff --git a/specification/ORCv2.html b/specification/ORCv2.html
new file mode 100644
index 0000000..b78fc0a
--- /dev/null
+++ b/specification/ORCv2.html
@@ -0,0 +1,1769 @@
+<!DOCTYPE HTML>
+<html lang="en-US">
+<head>
+  <meta charset="UTF-8">
+  <title>Evolving Draft for ORC Specification v2</title>
+  <meta name="viewport" content="width=device-width,initial-scale=1">
+  <meta name="generator" content="Jekyll v2.4.0">
+  <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
+  <link rel="stylesheet" href="/css/screen.css">
+  <link rel="icon" type="image/x-icon" href="/favicon.ico">
+  <!--[if lt IE 9]>
+  <script src="/js/html5shiv.min.js"></script>
+  <script src="/js/respond.min.js"></script>
+  <![endif]-->
+</head>
+
+
+<body class="wrap">
+  <header role="banner">
+  <nav class="mobile-nav show-on-mobiles">
+    <ul>
+  <li class="">
+    <a href="/">Home</a>
+  </li>
+  <li class="">
+    <a href="/docs/"><span class="show-on-mobiles">Docs</span>
+                     <span class="hide-on-mobiles">Documentation</span></a>
+  </li>
+  <li class="">
+    <a href="/talks/">Talks</a>
+  </li>
+  <li class="">
+    <a href="/news/">News</a>
+  </li>
+  <li class="">
+    <a href="/help/">Help</a>
+  </li>
+  <li class="">
+    <a href="/develop/">Develop</a>
+  </li>
+</ul>
+
+  </nav>
+  <div class="grid">
+    <div class="unit one-third center-on-mobiles">
+      <h1>
+        <a href="/">
+          <span class="sr-only">Apache ORC</span>
+          <img src="/img/logo.png" width="249" height="101" alt="ORC Logo">
+        </a>
+      </h1>
+    </div>
+    <nav class="main-nav unit two-thirds hide-on-mobiles">
+      <ul>
+  <li class="">
+    <a href="/">Home</a>
+  </li>
+  <li class="">
+    <a href="/docs/"><span class="show-on-mobiles">Docs</span>
+                     <span class="hide-on-mobiles">Documentation</span></a>
+  </li>
+  <li class="">
+    <a href="/talks/">Talks</a>
+  </li>
+  <li class="">
+    <a href="/news/">News</a>
+  </li>
+  <li class="">
+    <a href="/help/">Help</a>
+  </li>
+  <li class="">
+    <a href="/develop/">Develop</a>
+  </li>
+</ul>
+
+    </nav>
+  </div>
+</header>
+
+
+  <section class="standalone">
+  <div class="grid">
+
+    <div class="unit whole">
+      <article>
+        <h1>Evolving Draft for ORC Specification v2</h1>
+        <p>This specification is rapidly evolving and should only be used by
+developers on the project.</p>
+
+<h1 id="to-do-items">TO DO items</h1>
+
+<p>The list of things that we plan to change:</p>
+
+<ul>
+  <li>Create a decimal representation with fixed scale using rle.</li>
+  <li>Create a better float/double encoding that splits mantissa and
+exponent.</li>
+  <li>Create a dictionary encoding for float, double, and decimal.</li>
+  <li>Create RLEv3:
+    <ul>
+      <li>64 and 128 bit variants</li>
+      <li>Zero suppression</li>
+      <li>Evaluate the rle subformats</li>
+    </ul>
+  </li>
+  <li>Group stripe data into stripelets to enable Async IO for reads.</li>
+  <li>Reorder stripe data into (stripe metadata, index, dictionary, data)</li>
+  <li>Stop sorting dictionaries and record the sort order separately in the index.</li>
+  <li>Remove use of RLEv1 and RLEv2.</li>
+  <li>Remove non-utf8 bloom filter.</li>
+  <li>Use numeric value for decimal statistics and bloom filter.</li>
+  <li>Add Zstd with dictionary.</li>
+</ul>
+
+<h1 id="motivation">Motivation</h1>
+
+<p>Hive’s RCFile was the standard format for storing tabular data in
+Hadoop for several years. However, RCFile has limitations because it
+treats each column as a binary blob without semantics. In Hive 0.11 we
+added a new file format named Optimized Row Columnar (ORC) file that
+uses and retains the type information from the table definition. ORC
+uses type-specific readers and writers that provide lightweight
+compression techniques such as dictionary encoding, bit packing, delta
+encoding, and run length encoding – resulting in dramatically smaller
+files. Additionally, ORC can apply generic compression using zlib or
+Snappy on top of the lightweight compression for even smaller
+files. However, storage savings are only part of the gain. ORC
+supports projection, which selects subsets of the columns for reading,
+so that queries reading only one column read only the required
+bytes. Furthermore, ORC files include lightweight indexes that
+include the minimum and maximum values for each column in each set of
+10,000 rows and the entire file. Using pushdown filters from Hive, the
+file reader can skip entire sets of rows that aren’t relevant to
+the query.</p>
+
+<p><img src="/img/OrcFileLayout.png" alt="ORC file structure" /></p>
+
+<h1 id="file-tail">File Tail</h1>
+
+<p>Since HDFS does not support changing the data in a file after it is
+written, ORC stores the top level index at the end of the file. The
+overall structure of the file is given in the figure above. The
+file’s tail consists of three parts: the file metadata, the file footer, and
+the postscript.</p>
+
+<p>The metadata for ORC is stored using
+<a href="https://s.apache.org/protobuf_encoding">Protocol Buffers</a>, which provides
+the ability to add new fields without breaking readers. This document
+incorporates the Protobuf definition from the
+<a href="https://s.apache.org/orc_proto">ORC source code</a> and the
+reader is encouraged to review the Protobuf encoding if they need to
+understand the byte-level encoding.</p>
+
+<h2 id="postscript">Postscript</h2>
+
+<p>The Postscript section provides the necessary information to interpret
+the rest of the file including the length of the file’s Footer and
+Metadata sections, the version of the file, and the kind of general
+compression used (e.g. none, zlib, or snappy). The Postscript is never
+compressed and ends one byte before the end of the file. The version
+stored in the Postscript is the lowest version of Hive that is
+guaranteed to be able to read the file and it is stored as a sequence of
+the major and minor version. This file version is encoded as [0,12].</p>
+
+<p>The process of reading an ORC file works backwards through the
+file. Rather than making multiple short reads, the ORC reader reads
+the last 16k bytes of the file with the hope that it will contain both
+the Footer and Postscript sections. The final byte of the file
+contains the serialized length of the Postscript, which must be less
+than 256 bytes. Once the Postscript is parsed, the compressed
+serialized length of the Footer is known and it can be decompressed
+and parsed.</p>
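+
+<p>The first step of that backwards read can be sketched as follows. This is
+a minimal illustration, not the reader's actual code: the function name
+<code>split_tail</code> is hypothetical, and the Protobuf parsing of the
+Postscript itself is left out.</p>
+
```python
def split_tail(tail: bytes):
    """Split the trailing bytes of an ORC file (hypothetical helper).

    Returns (postscript_bytes, bytes_before_postscript). `tail` is assumed
    to be the last chunk of the file, for example its final 16k bytes.
    """
    if not tail:
        raise ValueError("empty file tail")
    ps_len = tail[-1]  # the final byte holds the serialized Postscript length
    if ps_len + 1 > len(tail):
        raise ValueError("tail buffer is too short to hold the Postscript")
    ps_start = len(tail) - 1 - ps_len
    # The Postscript is never compressed, so these bytes can be parsed directly.
    return tail[ps_start:-1], tail[:ps_start]
```
+
+<p>Once the Postscript bytes are parsed, <code>footerLength</code> tells the
+reader how many of the bytes before the Postscript belong to the Footer.</p>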
+
+<p><code>message PostScript {
+ // the length of the footer section in bytes
+ optional uint64 footerLength = 1;
+ // the kind of generic compression used
+ optional CompressionKind compression = 2;
+ // the maximum size of each compression chunk
+ optional uint64 compressionBlockSize = 3;
+ // the version of the writer
+ repeated uint32 version = 4 [packed = true];
+ // the length of the metadata section in bytes
+ optional uint64 metadataLength = 5;
+ // the fixed string "ORC"
+ optional string magic = 8000;
+}
+</code></p>
+
+<p><code>enum CompressionKind {
+ NONE = 0;
+ ZLIB = 1;
+ SNAPPY = 2;
+ LZO = 3;
+ LZ4 = 4;
+ ZSTD = 5;
+}
+</code></p>
+
+<h2 id="footer">Footer</h2>
+
+<p>The Footer section contains the layout of the body of the file, the
+type schema information, the number of rows, and the statistics about
+each of the columns.</p>
+
+<p>The file is broken into three parts: Header, Body, and Tail. The
+Header consists of the bytes “ORC” to support tools that want to
+scan the front of the file to determine the type of the file. The Body
+contains the rows and indexes, and the Tail gives the file level
+information as described in this section.</p>
+
+<p><code>message Footer {
+ // the length of the file header in bytes (always 3)
+ optional uint64 headerLength = 1;
+ // the length of the file header and body in bytes
+ optional uint64 contentLength = 2;
+ // the information about the stripes
+ repeated StripeInformation stripes = 3;
+ // the schema information
+ repeated Type types = 4;
+ // the user metadata that was added
+ repeated UserMetadataItem metadata = 5;
+ // the total number of rows in the file
+ optional uint64 numberOfRows = 6;
+ // the statistics of each column across the file
+ repeated ColumnStatistics statistics = 7;
+ // the maximum number of rows in each index entry
+ optional uint32 rowIndexStride = 8;
+}
+</code></p>
+
+<h3 id="stripe-information">Stripe Information</h3>
+
+<p>The body of the file is divided into stripes. Each stripe is self
+contained and may be read using only its own bytes combined with the
+file’s Footer and Postscript. Each stripe contains only entire rows so
+that rows never straddle stripe boundaries. Stripes have three
+sections: a set of indexes for the rows within the stripe, the data
+itself, and a stripe footer. Both the indexes and the data sections
+are divided by columns so that only the data for the required columns
+needs to be read.</p>
+
+<p><code>message StripeInformation {
+ // the start of the stripe within the file
+ optional uint64 offset = 1;
+ // the length of the indexes in bytes
+ optional uint64 indexLength = 2;
+ // the length of the data in bytes
+ optional uint64 dataLength = 3;
+ // the length of the footer in bytes
+ optional uint64 footerLength = 4;
+ // the number of rows in the stripe
+ optional uint64 numberOfRows = 5;
+}
+</code></p>
+
+<h3 id="type-information">Type Information</h3>
+
+<p>All of the rows in an ORC file must have the same schema. Logically
+the schema is expressed as a tree as in the figure below, where
+the compound types have subcolumns under them.</p>
+
+<p><img src="/img/TreeWriters.png" alt="ORC column structure" /></p>
+
+<p>The equivalent Hive DDL would be:</p>
+
+<p><code>create table Foobar (
+ myInt int,
+ myMap map&lt;string,
+ struct&lt;myString : string,
+ myDouble: double&gt;&gt;,
+ myTime timestamp
+);
+</code></p>
+
+<p>The type tree is flattened into a list via a pre-order traversal
+where each type is assigned the next id. Clearly the root of the type
+tree is always type id 0. Compound types have a field named subtypes
+that contains the list of their children’s type ids.</p>
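+
+<p>The pre-order numbering can be sketched as below. The nested
+<code>(kind, children)</code> tuples and the name <code>flatten_schema</code>
+are assumptions made for illustration only; they are not part of the ORC
+API.</p>
+
```python
def flatten_schema(root):
    """Assign type ids by pre-order traversal.

    `root` is a hypothetical (kind, children) tuple. The result is a list of
    (type_id, kind, subtype_ids) entries ordered by type id.
    """
    entries = []

    def visit(node):
        kind, children = node
        type_id = len(entries)  # each type takes the next unused id
        subtypes = []
        entries.append((type_id, kind, subtypes))
        for child in children:
            subtypes.append(visit(child))
        return type_id

    visit(root)
    return entries


# The Foobar table from the DDL above, written as nested tuples.
foobar = ("STRUCT", [
    ("INT", []),
    ("MAP", [("STRING", []), ("STRUCT", [("STRING", []), ("DOUBLE", [])])]),
    ("TIMESTAMP", []),
])
for type_id, kind, subtypes in flatten_schema(foobar):
    print(type_id, kind, subtypes)
```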
+
+<p><code>message Type {
+ enum Kind {
+ BOOLEAN = 0;
+ BYTE = 1;
+ SHORT = 2;
+ INT = 3;
+ LONG = 4;
+ FLOAT = 5;
+ DOUBLE = 6;
+ STRING = 7;
+ BINARY = 8;
+ TIMESTAMP = 9;
+ LIST = 10;
+ MAP = 11;
+ STRUCT = 12;
+ UNION = 13;
+ DECIMAL = 14;
+ DATE = 15;
+ VARCHAR = 16;
+ CHAR = 17;
+ }
+ // the kind of this type
+ required Kind kind = 1;
+ // the type ids of any subcolumns for list, map, struct, or union
+ repeated uint32 subtypes = 2 [packed=true];
+ // the list of field names for struct
+ repeated string fieldNames = 3;
+ // the maximum length of the type for varchar or char in UTF-8 characters
+ optional uint32 maximumLength = 4;
+ // the precision and scale for decimal
+ optional uint32 precision = 5;
+ optional uint32 scale = 6;
+}
+</code></p>
+
+<h3 id="column-statistics">Column Statistics</h3>
+
+<p>For each column, the writer records the count of values and, depending
+on the type, other useful fields. For
+most of the primitive types, it records the minimum and maximum
+values; and for numeric types it additionally stores the sum.
+From Hive 1.1.0 onwards, the column statistics also record whether
+there are any null values within the row group by setting the hasNull flag.
+The hasNull flag is used by ORC’s predicate pushdown to better answer
+‘IS NULL’ queries.</p>
+
+<p><code>message ColumnStatistics {
+ // the number of values
+ optional uint64 numberOfValues = 1;
+ // At most one of these has a value for any column
+ optional IntegerStatistics intStatistics = 2;
+ optional DoubleStatistics doubleStatistics = 3;
+ optional StringStatistics stringStatistics = 4;
+ optional BucketStatistics bucketStatistics = 5;
+ optional DecimalStatistics decimalStatistics = 6;
+ optional DateStatistics dateStatistics = 7;
+ optional BinaryStatistics binaryStatistics = 8;
+ optional TimestampStatistics timestampStatistics = 9;
+ optional bool hasNull = 10;
+}
+</code></p>
+
+<p>For integer types (tinyint, smallint, int, bigint), the column
+statistics include the minimum, maximum, and sum. If the sum
+overflows long at any point during the calculation, no sum is
+recorded.</p>
+
+<p><code>message IntegerStatistics {
+ optional sint64 minimum = 1;
+ optional sint64 maximum = 2;
+ optional sint64 sum = 3;
+}
+</code></p>
+
+<p>For floating point types (float, double), the column statistics
+include the minimum, maximum, and sum. If the sum overflows a double,
+no sum is recorded.</p>
+
+<p><code>message DoubleStatistics {
+ optional double minimum = 1;
+ optional double maximum = 2;
+ optional double sum = 3;
+}
+</code></p>
+
+<p>For strings, the minimum value, maximum value, and the sum of the
+lengths of the values are recorded.</p>
+
+<p><code>message StringStatistics {
+ optional string minimum = 1;
+ optional string maximum = 2;
+ // sum will store the total length of all strings
+ optional sint64 sum = 3;
+}
+</code></p>
+
+<p>For booleans, the statistics include the count of false and true values.</p>
+
+<p><code>message BucketStatistics {
+ repeated uint64 count = 1 [packed=true];
+}
+</code></p>
+
+<p>For decimals, the minimum, maximum, and sum are stored.</p>
+
+<p><code>message DecimalStatistics {
+ optional string minimum = 1;
+ optional string maximum = 2;
+ optional string sum = 3;
+}
+</code></p>
+
+<p>Date columns record the minimum and maximum values as the number of
+days since the UNIX epoch (1/1/1970 in UTC).</p>
+
+<p><code>message DateStatistics {
+ // min,max values saved as days since epoch
+ optional sint32 minimum = 1;
+ optional sint32 maximum = 2;
+}
+</code></p>
+
+<p>Timestamp columns record the minimum and maximum values as the number of
+milliseconds since the UNIX epoch (1/1/1970 00:00:00 UTC).</p>
+
+<p><code>message TimestampStatistics {
+ // min,max values saved as milliseconds since epoch
+ optional sint64 minimum = 1;
+ optional sint64 maximum = 2;
+}
+</code></p>
+
+<p>Binary columns store the aggregate number of bytes across all of the values.</p>
+
+<p><code>message BinaryStatistics {
+ // sum will store the total binary blob length
+ optional sint64 sum = 1;
+}
+</code></p>
+
+<h3 id="user-metadata">User Metadata</h3>
+
+<p>The user can add arbitrary key/value pairs to an ORC file as it is
+written. The contents of the keys and values are completely
+application defined, but the key is a string and the value is
+binary. Care should be taken by applications to make sure that their
+keys are unique and in general should be prefixed with an organization
+code.</p>
+
+<p><code>message UserMetadataItem {
+ // the user defined key
+ required string name = 1;
+ // the user defined binary value
+ required bytes value = 2;
+}
+</code></p>
+
+<h3 id="file-metadata">File Metadata</h3>
+
+<p>The file Metadata section contains column statistics at the stripe
+level granularity. These statistics enable input split elimination
+based on the predicate push-down evaluated per stripe.</p>
+
+<p><code>message StripeStatistics {
+ repeated ColumnStatistics colStats = 1;
+}
+</code></p>
+
+<p><code>message Metadata {
+ repeated StripeStatistics stripeStats = 1;
+}
+</code></p>
+
+<h1 id="compression">Compression</h1>
+
+<p>If the ORC file writer selects a generic compression codec (zlib or
+snappy), every part of the ORC file except for the Postscript is
+compressed with that codec. However, one of the requirements for ORC
+is that the reader be able to skip over compressed bytes without
+decompressing the entire stream. To manage this, ORC writes compressed
+streams in chunks with headers as in the figure below.
+To handle incompressible data, if the compressed data is larger than
+the original, the original is stored and the isOriginal flag is
+set. Each header is 3 bytes long with (compressedLength * 2 +
+isOriginal) stored as a little endian value. For example, the header
+for a chunk that compressed to 100,000 bytes would be [0x40, 0x0d,
+0x03]. The header for 5 bytes that did not compress would be [0x0b,
+0x00, 0x00]. Each compression chunk is compressed independently so
+that as long as a decompressor starts at the top of a header, it can
+start decompressing without the previous bytes.</p>
+
+<p><img src="/img/CompressionStream.png" alt="compression streams" /></p>
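+
+<p>The 3 byte chunk header can be sketched as follows. The function names
+are hypothetical and chosen for illustration; only the header arithmetic
+comes from the description above.</p>
+
```python
def encode_chunk_header(length: int, is_original: bool) -> bytes:
    """Build the 3 byte header for a chunk of `length` stored bytes."""
    if not 0 <= length < (1 << 23):
        raise ValueError("chunk length does not fit in a 3 byte header")
    # (compressedLength * 2 + isOriginal) as a little endian value
    return (length * 2 + int(is_original)).to_bytes(3, "little")


def decode_chunk_header(header: bytes):
    """Return (length, is_original) from the first 3 bytes of `header`."""
    value = int.from_bytes(header[:3], "little")
    return value >> 1, bool(value & 1)


print(encode_chunk_header(100000, False).hex())  # 400d03
print(encode_chunk_header(5, True).hex())        # 0b0000
```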
+
+<p>The default compression chunk size is 256K, but writers can choose
+their own value. Larger chunks lead to better compression, but require
+more memory. The chunk size is recorded in the Postscript so that
+readers can allocate appropriately sized buffers. Readers are
+guaranteed that no chunk will expand to more than the compression chunk
+size.</p>
+
+<p>ORC files without generic compression write each stream directly
+with no headers.</p>
+
+<h1 id="run-length-encoding">Run Length Encoding</h1>
+
+<h2 id="base-128-varint">Base 128 Varint</h2>
+
+<p>Variable width integer encodings take advantage of the fact that most
+numbers are small and that having smaller encodings for small numbers
+shrinks the overall size of the data. ORC uses the varint format from
+Protocol Buffers, which writes data in little endian format using the
+low 7 bits of each byte. The high bit in each byte is set if the
+number continues into the next byte.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Unsigned Original</th>
+      <th style="text-align: left">Serialized</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">0</td>
+      <td style="text-align: left">0x00</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">1</td>
+      <td style="text-align: left">0x01</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">127</td>
+      <td style="text-align: left">0x7f</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">128</td>
+      <td style="text-align: left">0x80, 0x01</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">129</td>
+      <td style="text-align: left">0x81, 0x01</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">16,383</td>
+      <td style="text-align: left">0xff, 0x7f</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">16,384</td>
+      <td style="text-align: left">0x80, 0x80, 0x01</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">16,385</td>
+      <td style="text-align: left">0x81, 0x80, 0x01</td>
+    </tr>
+  </tbody>
+</table>
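+
+<p>A minimal sketch of the unsigned varint encoding, with hypothetical
+function names, reproduces the rows of the table above:</p>
+
```python
def encode_varint(value: int) -> bytes:
    """Serialize a non-negative integer as a base 128 varint."""
    if value < 0:
        raise ValueError("zigzag encode signed values first")
    out = bytearray()
    while True:
        low = value & 0x7F          # low 7 bits go out first (little endian)
        value >>= 7
        if value:
            out.append(low | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low)
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0):
    """Return (value, next_position) for the varint starting at `pos`."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:             # high bit clear: last byte of the number
            return result, pos
        shift += 7
```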
+
+<p>For signed integer types, the number is converted into an unsigned
+number using a zigzag encoding. Zigzag encoding moves the sign bit to
+the least significant bit using the expression (val &lt;&lt; 1) ^ (val &gt;&gt;
+63) and derives its name from the fact that positive and negative
+numbers alternate once encoded. The unsigned number is then serialized
+as above.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Signed Original</th>
+      <th style="text-align: left">Unsigned</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">0</td>
+      <td style="text-align: left">0</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">-1</td>
+      <td style="text-align: left">1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">1</td>
+      <td style="text-align: left">2</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">-2</td>
+      <td style="text-align: left">3</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">2</td>
+      <td style="text-align: left">4</td>
+    </tr>
+  </tbody>
+</table>
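+
+<p>The zigzag mapping for 64 bit values can be sketched as below. The
+function names are hypothetical, and the mask is needed only because Python
+integers are unbounded.</p>
+
```python
MASK64 = (1 << 64) - 1


def zigzag_encode(val: int) -> int:
    """Map a signed 64 bit integer to an unsigned one."""
    # Python's >> on a negative int is an arithmetic shift, so (val >> 63)
    # is -1 for negative values and 0 otherwise.
    return ((val << 1) ^ (val >> 63)) & MASK64


def zigzag_decode(n: int) -> int:
    """Inverse of zigzag_encode."""
    return (n >> 1) ^ -(n & 1)


print([zigzag_encode(v) for v in (0, -1, 1, -2, 2)])  # [0, 1, 2, 3, 4]
```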
+
+<h2 id="byte-run-length-encoding">Byte Run Length Encoding</h2>
+
+<p>For byte streams, ORC uses a very light weight encoding of identical
+values.</p>
+
+<ul>
+  <li>Run - a sequence of at least 3 identical values</li>
+  <li>Literals - a sequence of non-identical values</li>
+</ul>
+
+<p>The first byte of each group of values is a header that determines
+whether it is a run (value between 0 and 127) or literal list (value
+between -128 and -1). For runs, the control byte is the length of the
+run minus the length of the minimal run (3) and the control byte for
+literal lists is the negative length of the list. For example, a
+hundred 0’s is encoded as [0x61, 0x00] and the sequence 0x44, 0x45
+would be encoded as [0xfe, 0x44, 0x45]. The next group can choose
+either of the encodings.</p>
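+
+<p>A decoder for this scheme can be sketched in a few lines. The function
+name is hypothetical and the sketch assumes well-formed input.</p>
+
```python
def decode_byte_rle(data: bytes) -> bytes:
    """Expand a byte run length encoded stream."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        control = data[pos]
        pos += 1
        if control < 0x80:
            # Run: control byte is (run length - 3), followed by the value.
            out.extend(data[pos:pos + 1] * (control + 3))
            pos += 1
        else:
            # Literals: control byte is the negated count as a signed byte.
            count = 0x100 - control
            out.extend(data[pos:pos + count])
            pos += count
    return bytes(out)
```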
+
+<h2 id="boolean-run-length-encoding">Boolean Run Length Encoding</h2>
+
+<p>For encoding boolean types, the bits are put in the bytes from most
+significant to least significant. The bytes are encoded using byte run
+length encoding as described in the previous section. For example,
+the byte sequence [0xff, 0x80] would be one true followed by
+seven false values.</p>
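+
+<p>Once the byte run length layer has been decoded, unpacking the bits is
+straightforward. In this sketch the function name and the explicit
+<code>count</code> parameter are assumptions; the count is needed because
+the last byte may carry padding bits.</p>
+
```python
def unpack_booleans(packed: bytes, count: int) -> list:
    """Unpack `count` booleans from bytes, most significant bit first."""
    bits = []
    for byte in packed:
        for shift in range(7, -1, -1):
            bits.append(bool((byte >> shift) & 1))
    return bits[:count]


# [0xff, 0x80] is a literal list holding the single byte 0x80.
print(unpack_booleans(bytes([0x80]), 8))
```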
+
+<h2 id="integer-run-length-encoding-version-1">Integer Run Length Encoding, version 1</h2>
+
+<p>In Hive 0.11 ORC files used Run Length Encoding version 1 (RLEv1),
+which provides a lightweight compression of signed or unsigned integer
+sequences. RLEv1 has two sub-encodings:</p>
+
+<ul>
+  <li>Run - a sequence of values that differ by a small fixed delta</li>
+  <li>Literals - a sequence of varint encoded values</li>
+</ul>
+
+<p>Runs start with an initial byte of 0x00 to 0x7f, which encodes the
+length of the run - 3. A second byte provides the fixed delta in the
+range of -128 to 127. Finally, the first value of the run is encoded
+as a base 128 varint.</p>
+
+<p>For example, if the sequence is 100 instances of 7 the encoding would
+start with 100 - 3, followed by a delta of 0, and a varint of 7 for
+an encoding of [0x61, 0x00, 0x07]. To encode the sequence of numbers
+running from 100 to 1, the first byte is 100 - 3, the delta is -1,
+and the varint is 100 for an encoding of [0x61, 0xff, 0x64].</p>
+
+<p>Literals start with an initial byte of 0x80 to 0xff, which corresponds
+to the negative of the number of literals in the sequence. Following the
+header byte, the list of N varints is encoded. Thus, if there are
+no runs, the overhead is 1 byte for each 128 integers. The first 5
+prime numbers [2, 3, 5, 7, 11] would be encoded as [0xfb, 0x02, 0x03,
+0x05, 0x07, 0x0b].</p>
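+
+<p>Both RLEv1 sub-encodings can be decoded with a short loop. The following
+is a sketch with a hypothetical function name, assuming well-formed input;
+the <code>signed</code> flag selects zigzag decoding of the varints.</p>
+
```python
def decode_rle_v1(data: bytes, signed: bool) -> list:
    """Expand an RLEv1 stream into a list of integers."""

    def read_varint(pos):
        result = 0
        shift = 0
        while True:
            byte = data[pos]
            pos += 1
            result |= (byte & 0x7F) << shift
            if byte < 0x80:
                break
            shift += 7
        if signed:
            result = (result >> 1) ^ -(result & 1)  # undo zigzag
        return result, pos

    out = []
    pos = 0
    while pos < len(data):
        control = data[pos]
        pos += 1
        if control < 0x80:
            # Run: length - 3, then a signed one byte delta, then the base.
            length = control + 3
            delta = data[pos] - 0x100 if data[pos] >= 0x80 else data[pos]
            pos += 1
            base, pos = read_varint(pos)
            out.extend(base + i * delta for i in range(length))
        else:
            # Literals: negated count, then that many varints.
            for _ in range(0x100 - control):
                value, pos = read_varint(pos)
                out.append(value)
    return out
```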
+
+<h2 id="integer-run-length-encoding-version-2">Integer Run Length Encoding, version 2</h2>
+
+<p>In Hive 0.12, ORC introduced Run Length Encoding version 2 (RLEv2),
+which has improved compression and fixed bit width encodings for
+faster expansion. RLEv2 uses four sub-encodings based on the data:</p>
+
+<ul>
+  <li>Short Repeat - used for short sequences with repeated values</li>
+  <li>Direct - used for random sequences with a fixed bit width</li>
+  <li>Patched Base - used for random sequences with a variable bit width</li>
+  <li>Delta - used for monotonically increasing or decreasing sequences</li>
+</ul>
+
+<h3 id="short-repeat">Short Repeat</h3>
+
+<p>The short repeat encoding is used for short repeating integer
+sequences with the goal of minimizing the overhead of the header. All
+of the bits listed in the header are from the first byte to the last
+and from most significant bit to least significant bit. If the type is
+signed, the value is zigzag encoded.</p>
+
+<ul>
+  <li>1 byte header
+    <ul>
+      <li>2 bits for encoding type (0)</li>
+      <li>3 bits for width (W) of repeating value (1 to 8 bytes)</li>
+      <li>3 bits for repeat count (3 to 10 values)</li>
+    </ul>
+  </li>
+  <li>W bytes in big endian format, which is zigzag encoded if the type
+is signed</li>
+</ul>
+
+<p>The unsigned sequence of [10000, 10000, 10000, 10000, 10000] would be
+serialized with short repeat encoding (0), a width of 2 bytes (1), and
+repeat count of 5 (2) as [0x0a, 0x27, 0x10].</p>
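+
+<p>Decoding a short repeat run amounts to unpacking one header byte. A
+sketch with a hypothetical function name:</p>
+
```python
def decode_short_repeat(data: bytes, signed: bool) -> list:
    """Decode a single RLEv2 short repeat run starting at data[0]."""
    header = data[0]
    if header >> 6 != 0:
        raise ValueError("not a short repeat run")
    width = ((header >> 3) & 0x7) + 1   # value width in bytes (1 to 8)
    count = (header & 0x7) + 3          # repeat count (3 to 10)
    value = int.from_bytes(data[1:1 + width], "big")
    if signed:
        value = (value >> 1) ^ -(value & 1)  # undo zigzag
    return [value] * count


print(decode_short_repeat(bytes([0x0a, 0x27, 0x10]), signed=False))
```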
+
+<h3 id="direct">Direct</h3>
+
+<p>The direct encoding is used for integer sequences whose values have a
+relatively constant bit width. It encodes the values directly using a
+fixed width big endian encoding. The width of the values is encoded
+using the table below.</p>
+
+<p>The 5 bit width encoding table for RLEv2:</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Width in Bits</th>
+      <th style="text-align: left">Encoded Value</th>
+      <th style="text-align: left">Notes</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">0</td>
+      <td style="text-align: left">0</td>
+      <td style="text-align: left">for delta encoding</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">1</td>
+      <td style="text-align: left">0</td>
+      <td style="text-align: left">for non-delta encoding</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">2</td>
+      <td style="text-align: left">1</td>
+      <td style="text-align: left"> </td>
+    </tr>
+    <tr>
+      <td style="text-align: left">4</td>
+      <td style="text-align: left">3</td>
+      <td style="text-align: left"> </td>
+    </tr>
+    <tr>
+      <td style="text-align: left">8</td>
+      <td style="text-align: left">7</td>
+      <td style="text-align: left"> </td>
+    </tr>
+    <tr>
+      <td style="text-align: left">16</td>
+      <td style="text-align: left">15</td>
+      <td style="text-align: left"> </td>
+    </tr>
+    <tr>
+      <td style="text-align: left">24</td>
+      <td style="text-align: left">23</td>
+      <td style="text-align: left"> </td>
+    </tr>
+    <tr>
+      <td style="text-align: left">32</td>
+      <td style="text-align: left">27</td>
+      <td style="text-align: left"> </td>
+    </tr>
+    <tr>
+      <td style="text-align: left">40</td>
+      <td style="text-align: left">28</td>
+      <td style="text-align: left"> </td>
+    </tr>
+    <tr>
+      <td style="text-align: left">48</td>
+      <td style="text-align: left">29</td>
+      <td style="text-align: left"> </td>
+    </tr>
+    <tr>
+      <td style="text-align: left">56</td>
+      <td style="text-align: left">30</td>
+      <td style="text-align: left"> </td>
+    </tr>
+    <tr>
+      <td style="text-align: left">64</td>
+      <td style="text-align: left">31</td>
+      <td style="text-align: left"> </td>
+    </tr>
+    <tr>
+      <td style="text-align: left">3</td>
+      <td style="text-align: left">2</td>
+      <td style="text-align: left">deprecated</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">5 &lt;= x &lt;= 7</td>
+      <td style="text-align: left">x - 1</td>
+      <td style="text-align: left">deprecated</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">9 &lt;= x &lt;= 15</td>
+      <td style="text-align: left">x - 1</td>
+      <td style="text-align: left">deprecated</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">17 &lt;= x &lt;= 21</td>
+      <td style="text-align: left">x - 1</td>
+      <td style="text-align: left">deprecated</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">26</td>
+      <td style="text-align: left">24</td>
+      <td style="text-align: left">deprecated</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">28</td>
+      <td style="text-align: left">25</td>
+      <td style="text-align: left">deprecated</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">30</td>
+      <td style="text-align: left">26</td>
+      <td style="text-align: left">deprecated</td>
+    </tr>
+  </tbody>
+</table>
+
+<ul>
+  <li>2 bytes header
+    <ul>
+      <li>2 bits for encoding type (1)</li>
+      <li>5 bits for encoded width (W) of values (1 to 64 bits) using the 5 bit
+width encoding table</li>
+      <li>9 bits for length (L) (1 to 512 values)</li>
+    </ul>
+  </li>
+  <li>W * L bits (padded to the next byte) encoded in big endian format, which is
+zigzag encoded if the type is signed</li>
+</ul>
+
+<p>The unsigned sequence of [23713, 43806, 57005, 48879] would be
+serialized with direct encoding (1), a width of 16 bits (15), and
+length of 4 (3) as [0x5e, 0x03, 0x5c, 0xa1, 0xab, 0x1e, 0xde, 0xad,
+0xbe, 0xef].</p>
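+
+<p>A direct run can be decoded as sketched below. The function names are
+hypothetical. The width table above does not list the codes 21 and 22, so
+this sketch assumes they follow the same x - 1 pattern as their
+neighbours.</p>
+
```python
def decode_width(code: int) -> int:
    """Map a 5 bit width code to a bit width for non-delta encodings."""
    if code <= 23:
        # Codes 0 to 23 map to widths 1 to 24 (codes 21 and 22 are assumed).
        return code + 1
    return {24: 26, 25: 28, 26: 30, 27: 32, 28: 40, 29: 48, 30: 56, 31: 64}[code]


def decode_direct(data: bytes, signed: bool) -> list:
    """Decode a single RLEv2 direct run starting at data[0]."""
    if data[0] >> 6 != 1:
        raise ValueError("not a direct run")
    width = decode_width((data[0] >> 1) & 0x1F)
    length = (((data[0] & 1) << 8) | data[1]) + 1
    nbytes = (width * length + 7) // 8          # payload padded to a byte
    bits = int.from_bytes(data[2:2 + nbytes], "big")
    out = []
    for i in range(length):
        shift = nbytes * 8 - (i + 1) * width    # values are packed MSB first
        value = (bits >> shift) & ((1 << width) - 1)
        if signed:
            value = (value >> 1) ^ -(value & 1)  # undo zigzag
        out.append(value)
    return out
```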
+
+<h3 id="patched-base">Patched Base</h3>
+
+<p>The patched base encoding is used for integer sequences whose bit
+widths vary a lot. The minimum signed value of the sequence is found
+and subtracted from the other values. The bit width of those adjusted
+values is analyzed and the 90th percentile of the bit width is chosen
+as W. The 10% of values larger than W use patches from a patch list
+to set the additional bits. Patches are encoded as a list of gaps in
+the index values and the additional value bits.</p>
+
+<ul>
+  <li>4 bytes header
+    <ul>
+      <li>2 bits for encoding type (2)</li>
+      <li>5 bits for encoded width (W) of values (1 to 64 bits) using the 5 bit
+  width encoding table</li>
+      <li>9 bits for length (L) (1 to 512 values)</li>
+      <li>3 bits for base value width (BW) (1 to 8 bytes)</li>
+      <li>5 bits for patch width (PW) (1 to 64 bits) using  the 5 bit width
+encoding table</li>
+      <li>3 bits for patch gap width (PGW) (1 to 8 bits)</li>
+      <li>5 bits for patch list length (PLL) (0 to 31 patches)</li>
+    </ul>
+  </li>
+  <li>Base value (BW bytes) - The base value is stored as a big endian value
+with negative values marked by the most significant bit set. If that
+bit is set, the entire value is negated.</li>
+  <li>Data values (W * L bits padded to the byte) - A sequence of W bit positive
+values that are added to the base value.</li>
+  <li>Patch list (PLL * (PGW + PW) bits padded to the byte) - A list of patches for values
+that didn’t fit within W bits. Each entry in the list consists of a
+gap, which is the number of elements skipped from the previous
+patch, and a patch value. Patches are applied by logically OR-ing
+the data values with the relevant patch shifted W bits left. If a
+patch is 0, it was introduced to skip over more than 255 items. The
+combined length of each patch (PGW + PW) must be less than or equal to
+64.</li>
+</ul>
+
+<p>The unsigned sequence of [2030, 2000, 2020, 1000000, 2040, 2050, 2060, 2070,
+2080, 2090, 2100, 2110, 2120, 2130, 2140, 2150, 2160, 2170, 2180, 2190]
+has a minimum of 2000, which makes the adjusted
+sequence [30, 0, 20, 998000, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140,
+150, 160, 170, 180, 190]. It has an
+encoding of patched base (2), a bit width of 8 (7), a length of 20
+(19), a base value width of 2 bytes (1), a patch width of 12 bits (11),
+patch gap width of 2 bits (1), and a patch list length of 1 (1). The
+base value is 2000 and the combined result is [0x8e, 0x13, 0x2b, 0x21, 0x07,
+0xd0, 0x1e, 0x00, 0x14, 0x70, 0x28, 0x32, 0x3c, 0x46, 0x50, 0x5a, 0x64, 0x6e,
+0x78, 0x82, 0x8c, 0x96, 0xa0, 0xaa, 0xb4, 0xbe, 0xfc, 0xe8].</p>
+
+<h3 id="delta">Delta</h3>
+
+<p>The Delta encoding is used for monotonically increasing or decreasing
+sequences. The first two numbers in the sequence cannot be identical,
+because the encoding uses the sign of the first delta to determine
+if the series is increasing or decreasing.</p>
+
+<ul>
+  <li>2 bytes header
+    <ul>
+      <li>2 bits for encoding type (3)</li>
+      <li>5 bits for encoded width (W) of deltas (0 to 64 bits) using the 5 bit
+width encoding table</li>
+      <li>9 bits for run length (L) (1 to 512 values)</li>
+    </ul>
+  </li>
+  <li>Base value - encoded as (signed or unsigned) varint</li>
+  <li>Delta base - encoded as signed varint</li>
+  <li>Delta values (W * (L - 2) bits padded to the byte) - encode each delta after the first
+one. If the delta base is positive, the sequence is increasing and if it is
+negative the sequence is decreasing.</li>
+</ul>
+
+<p>The unsigned sequence of [2, 3, 5, 7, 11, 13, 17, 19, 23, 29] would be
+serialized with delta encoding (3), a width of 4 bits (3), a length of
+10 (9), a base of 2 (2), and a first delta of 1 (2, because the delta base
+is a signed varint). The resulting
+sequence is [0xc6, 0x09, 0x02, 0x02, 0x22, 0x42, 0x42, 0x46].</p>
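+
+<p>The example above can be decoded with the following Python sketch. The
+function names are illustrative only. It assumes signed varints use zigzag
+encoding, that a non-zero width code maps to code plus one (valid for small
+codes only), and that a width of 0 means every delta equals the delta
+base.</p>

```python
def read_bits(data, bit_pos, width):
    """Read `width` bits, most significant bit first, starting at bit_pos."""
    value = 0
    for i in range(bit_pos, bit_pos + width):
        value = (value << 1) | ((data[i // 8] >> (7 - i % 8)) & 1)
    return value


def read_varint(data, pos):
    """Read an unsigned base 128 varint; return (value, next byte position)."""
    value, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def zigzag_decode(n):
    """Undo zigzag encoding: 0, 1, 2, 3 -> 0, -1, 1, -2."""
    return (n >> 1) ^ -(n & 1)


def decode_delta(data, signed=False):
    """Decode one delta run (illustrative sketch, small widths only)."""
    header = (data[0] << 8) | data[1]
    assert header >> 14 == 3                   # encoding type: delta
    code = (header >> 9) & 0x1F
    w = 0 if code == 0 else code + 1           # assumed width mapping
    length = (header & 0x1FF) + 1

    base, pos = read_varint(data, 2)
    if signed:
        base = zigzag_decode(base)
    raw, pos = read_varint(data, pos)
    delta_base = zigzag_decode(raw)            # its sign gives the direction

    values = [base, base + delta_base]
    sign = -1 if delta_base < 0 else 1
    for i in range(length - 2):
        if w == 0:
            delta = abs(delta_base)            # assumed: fixed delta run
        else:
            delta = read_bits(data, pos * 8 + i * w, w)
        values.append(values[-1] + sign * delta)
    return values


primes = decode_delta(bytes([0xc6, 0x09, 0x02, 0x02, 0x22, 0x42, 0x42, 0x46]))
```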
+
+<h1 id="stripes">Stripes</h1>
+
+<p>The body of ORC files consists of a series of stripes. Stripes are
+large (typically ~200MB), independent of each other, and often
+processed by different tasks. The defining characteristic for columnar
+storage formats is that the data for each column is stored separately
+and that the amount of data read from the file should be proportional to the
+number of columns read.</p>
+
+<p>In ORC files, each column is stored in several streams that are stored
+next to each other in the file. For example, an integer column is
+represented as two streams: PRESENT, which uses one bit per
+value to record whether the value is non-null, and DATA, which records the
+non-null values. If all of a column’s values in a stripe are non-null,
+the PRESENT stream is omitted from the stripe. For binary data, ORC
+uses three streams: PRESENT, DATA, and LENGTH, which stores the length
+of each value. The details of each type will be presented in the
+following subsections.</p>
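+
+<p>A minimal Python sketch of how an integer column splits into the two
+streams, before any run length encoding is applied. The function names are
+hypothetical and None stands in for a null value.</p>

```python
def split_nullable_column(values):
    """Split a column into (present, data); present is None when omitted."""
    present = [v is not None for v in values]
    data = [v for v in values if v is not None]
    if all(present):
        present = None                 # no nulls: the PRESENT stream is omitted
    return present, data


def merge_nullable_column(present, data):
    """Rebuild the column from its PRESENT and DATA contents."""
    if present is None:
        return list(data)
    remaining = iter(data)
    return [next(remaining) if bit else None for bit in present]


present, data = split_nullable_column([7, None, 9])
```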
+
+<h2 id="stripe-footer">Stripe Footer</h2>
+
+<p>The stripe footer contains the encoding of each column and the
+directory of the streams including their location.</p>
+
+<p><code>message StripeFooter {
+ // the location of each stream
+ repeated Stream streams = 1;
+ // the encoding of each column
+ repeated ColumnEncoding columns = 2;
+}
+</code></p>
+
+<p>To describe each stream, ORC stores the kind of stream, the column id,
+and the stream’s size in bytes. The details of what is stored in each stream
+depend on the type and encoding of the column.</p>
+
+<p><code>message Stream {
+ enum Kind {
+ // boolean stream of whether the next value is non-null
+ PRESENT = 0;
+ // the primary data stream
+ DATA = 1;
+ // the length of each value for variable length data
+ LENGTH = 2;
+ // the dictionary blob
+ DICTIONARY_DATA = 3;
+ // deprecated prior to Hive 0.11
+ // It was used to store the number of instances of each value in the
+ // dictionary
+ DICTIONARY_COUNT = 4;
+ // a secondary data stream
+ SECONDARY = 5;
+ // the index for seeking to particular row groups
+ ROW_INDEX = 6;
+ // original bloom filters used before ORC-101
+ BLOOM_FILTER = 7;
+ // bloom filters that consistently use utf8
+ BLOOM_FILTER_UTF8 = 8;
+ }
+ required Kind kind = 1;
+ // the column id
+ optional uint32 column = 2;
+ // the number of bytes in the file
+ optional uint64 length = 3;
+}
+</code></p>
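+
+<p>Because a Stream records only its length, a reader has to derive each
+stream’s location. The Python sketch below assumes that streams are stored
+back to back in the order they are listed in the footer, starting at the
+first byte of the stripe. The Stream tuple and the function name are
+illustrative and are not part of any ORC API.</p>

```python
from collections import namedtuple

# Illustrative stand-in for the protobuf Stream message.
Stream = namedtuple("Stream", ["kind", "column", "length"])


def stream_offsets(stripe_offset, streams):
    """Return (kind, column, offset, length) for each stream in footer order."""
    located = []
    position = stripe_offset
    for stream in streams:
        located.append((stream.kind, stream.column, position, stream.length))
        position += stream.length      # assumed: streams are contiguous
    return located


footer_streams = [
    Stream("ROW_INDEX", 1, 40),
    Stream("PRESENT", 1, 12),
    Stream("DATA", 1, 300),
]
located = stream_offsets(1000, footer_streams)
```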
+
+<p>Depending on the column type, several encodings are possible. The
+encodings are divided into direct and dictionary-based categories and
+further refined as to whether they use RLE v1 or v2.</p>
+
+<p><code>message ColumnEncoding {
+ enum Kind {
+ // the encoding is mapped directly to the stream using RLE v1
+ DIRECT = 0;
+ // the encoding uses a dictionary of unique values using RLE v1
+ DICTIONARY = 1;
+ // the encoding is direct using RLE v2
+ DIRECT_V2 = 2;
+ // the encoding is dictionary-based using RLE v2
+ DICTIONARY_V2 = 3;
+ }
+ required Kind kind = 1;
+ // for dictionary encodings, record the size of the dictionary
+ optional uint32 dictionarySize = 2;
+}
+</code></p>
+
+<h1 id="column-encodings">Column Encodings</h1>
+
+<h2 id="smallint-int-and-bigint-columns">SmallInt, Int, and BigInt Columns</h2>
+
+<p>All of the 16, 32, and 64 bit integer column types use the same set of
+potential encodings, which differ only in whether they use RLE v1 or
+v2. If the PRESENT stream is not included, all of the values are
+present. For values that have false bits in the PRESENT stream, no
+values are included in the DATA stream.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Signed Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">DIRECT_V2</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Signed Integer RLE v2</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="float-and-double-columns">Float and Double Columns</h2>
+
+<p>Floating point types are stored using IEEE 754 floating point bit
+layout. Float columns use 4 bytes per value and double columns use 8
+bytes.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">IEEE 754 floating point representation</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="string-char-and-varchar-columns">String, Char, and VarChar Columns</h2>
+
+<p>String, char, and varchar columns may be encoded either using a
+dictionary encoding or a direct encoding. A direct encoding should be
+preferred when there are many distinct values. In all of the
+encodings, the PRESENT stream encodes whether the value is null. The
+Java ORC writer automatically picks the encoding after the first row
+group (10,000 rows).</p>
+
+<p>For direct encoding the UTF-8 bytes are saved in the DATA stream and
+the length of each value is written into the LENGTH stream. In direct
+encoding, if the values were [“Nevada”, “California”], the DATA
+would be “NevadaCalifornia” and the LENGTH would be [6, 10].</p>
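+
+<p>A small Python sketch of the direct layout, before RLE is applied to the
+lengths. The function names are hypothetical.</p>

```python
def encode_direct(values):
    """Return (DATA bytes, LENGTH list) for a list of strings."""
    encoded = [v.encode("utf-8") for v in values]
    return b"".join(encoded), [len(e) for e in encoded]


def decode_direct(data, lengths):
    """Slice the DATA bytes back into strings using the LENGTH values."""
    values, start = [], 0
    for n in lengths:
        values.append(data[start:start + n].decode("utf-8"))
        start += n
    return values


data, lengths = encode_direct(["Nevada", "California"])
```

+<p>Lengths count UTF-8 bytes rather than characters, so a value with
+non-ASCII characters has a length larger than its character count.</p>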
+
+<p>For dictionary encodings the dictionary is sorted and UTF-8 bytes of
+each unique value are placed into DICTIONARY_DATA. The length of each
+item in the dictionary is put into the LENGTH stream. The DATA stream
+consists of the sequence of references to the dictionary elements.</p>
+
+<p>In dictionary encoding, if the values were [“Nevada”,
+“California”, “Nevada”, “California”, “Florida”], the
+DICTIONARY_DATA would be “CaliforniaFloridaNevada” and LENGTH would
+be [10, 7, 6]. The DATA would be [2, 0, 2, 0, 1].</p>
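+
+<p>The same example as a Python sketch, before RLE is applied to the integer
+streams. The function name is hypothetical, and sorting by encoded bytes is
+an assumption about how the dictionary is ordered.</p>

```python
def encode_dictionary(values):
    """Return (DICTIONARY_DATA bytes, LENGTH list, DATA list) for strings."""
    encoded = [v.encode("utf-8") for v in values]
    dictionary = sorted(set(encoded))              # unique values, sorted
    index = {entry: i for i, entry in enumerate(dictionary)}
    dictionary_data = b"".join(dictionary)
    lengths = [len(entry) for entry in dictionary]
    data = [index[e] for e in encoded]             # references into the dictionary
    return dictionary_data, lengths, data


dictionary_data, lengths, data = encode_dictionary(
    ["Nevada", "California", "Nevada", "California", "Florida"]
)
```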
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">String contents</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">DICTIONARY</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DICTIONARY_DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">String contents</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">DIRECT_V2</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">String contents</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v2</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">DICTIONARY_V2</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v2</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DICTIONARY_DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">String contents</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v2</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="boolean-columns">Boolean Columns</h2>
+
+<p>Boolean columns are rare, but have a simple encoding.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="tinyint-columns">TinyInt Columns</h2>
+
+<p>TinyInt (byte) columns use byte run length encoding.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Byte RLE</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="binary-columns">Binary Columns</h2>
+
+<p>Binary data is encoded with a PRESENT stream, a DATA stream that records
+the contents, and a LENGTH stream that records the number of bytes per
+value.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">String contents</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">DIRECT_V2</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">String contents</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v2</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="decimal-columns">Decimal Columns</h2>
+
+<p>Decimal was introduced in Hive 0.11 with infinite precision (the total
+number of digits). In Hive 0.13, the definition was changed to limit
+the precision to a maximum of 38 digits, which conveniently uses 127
+bits plus a sign bit. The current encoding of decimal columns stores
+the integer representation of the value in the DATA stream as an unbounded
+length zigzag encoded base 128 varint. The scale is stored in the SECONDARY
+stream as a signed integer.</p>
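+
+<p>A Python sketch of the DATA and SECONDARY contents for a single finite
+decimal value, before RLE is applied to the scale. The function names are
+hypothetical and Python’s decimal module is used only to split a value into
+its unscaled integer and scale.</p>

```python
from decimal import Decimal


def zigzag(n):
    """Map a signed integer of any size to unsigned: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return (n << 1) if n >= 0 else ((-n) << 1) - 1


def encode_varint(n):
    """Encode a non-negative integer as a base 128 varint, low group first."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)     # more groups follow
        else:
            out.append(low)
            return bytes(out)


def encode_decimal(value):
    """Return (varint bytes for DATA, scale for SECONDARY) for a finite Decimal."""
    sign, digits, exponent = value.as_tuple()
    unscaled = int("".join(str(d) for d in digits))
    if sign:
        unscaled = -unscaled
    return encode_varint(zigzag(unscaled)), -exponent


data, scale = encode_decimal(Decimal("1.50"))
```

+<p>For 1.50 the unscaled integer is 150 with a scale of 2, so the zigzag
+value is 300 and the varint is two bytes long.</p>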
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unbounded base 128 varints</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">SECONDARY</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Signed Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">DIRECT_V2</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unbounded base 128 varints</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">SECONDARY</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Signed Integer RLE v2</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="date-columns">Date Columns</h2>
+
+<p>Date data is encoded with a PRESENT stream and a DATA stream that records
+the number of days after January 1, 1970 in UTC.</p>
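+
+<p>As a small illustration in Python (the function names are hypothetical),
+the stored value is the day count relative to the epoch, and dates before
+1970 come out negative, which is why the DATA stream is signed.</p>

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def date_to_days(d):
    """Number of days after January 1, 1970 (negative before the epoch)."""
    return (d - EPOCH).days


def days_to_date(days):
    """Inverse of date_to_days."""
    return EPOCH + timedelta(days=days)


days = date_to_days(date(2015, 1, 1))
```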
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Signed Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">DIRECT_V2</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Signed Integer RLE v2</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="timestamp-columns">Timestamp Columns</h2>
+
+<p>Timestamp records times down to nanoseconds as a PRESENT stream that
+records which values are non-null, a DATA stream that records the number of
+seconds after 1 January 2015, and a SECONDARY stream that records the
+number of nanoseconds.</p>
+
+<p>Because the number of nanoseconds often has a large number of trailing
+zeros, the number has trailing decimal zero digits removed and the
+last three bits are used to record how many zeros were removed. Zeros are
+removed only when there are at least two of them, and the three bits then
+hold the number of removed zeros minus one. Thus
+1000 nanoseconds would be serialized as 0x0a and 100000 would be
+serialized as 0x0c.</p>
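+
+<p>The trimming of trailing zeros can be sketched in Python as follows. The
+function names are illustrative. The sketch assumes, based on the Java
+implementation rather than on anything stated normatively here, that zeros
+are trimmed only when there are at least two of them and that the three low
+bits then hold the number of removed zeros minus one.</p>

```python
def format_nanos(nanos):
    """Serialize a nanosecond count (0 to 999,999,999) for the SECONDARY stream."""
    if nanos == 0:
        return 0
    if nanos % 100 != 0:
        return nanos << 3              # fewer than two trailing zeros: keep as is
    nanos //= 100                      # two zeros removed, recorded as 1
    trailing = 1
    while nanos % 10 == 0 and trailing < 7:
        nanos //= 10
        trailing += 1
    return (nanos << 3) | trailing


def parse_nanos(serialized):
    """Inverse of format_nanos."""
    zeros = serialized & 7
    result = serialized >> 3
    if zeros:
        result *= 10 ** (zeros + 1)    # the field holds removed zeros minus one
    return result


samples = [0, 7, 10, 1000, 100000, 123456789, 500000000, 999999999]
round_trip = [parse_nanos(format_nanos(n)) for n in samples]
```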
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Signed Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">SECONDARY</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">DIRECT_V2</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Signed Integer RLE v2</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">SECONDARY</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v2</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="struct-columns">Struct Columns</h2>
+
+<p>Structs have no data themselves and delegate everything to their child
+columns except for their PRESENT stream. They have a child column
+for each of the fields.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="list-columns">List Columns</h2>
+
+<p>Lists are encoded as the PRESENT stream and a LENGTH stream with the
+number of items in each list. They have a single child column for the
+element values.</p>
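+
+<p>A Python sketch of how rows of lists flatten into the PRESENT and LENGTH
+contents plus the child column, before RLE is applied. The function name is
+hypothetical, None stands in for a null list, and it is assumed that null
+lists contribute no LENGTH entry.</p>

```python
def encode_list_column(rows):
    """Return (present, lengths, child_values) for a list-typed column."""
    present = [row is not None for row in rows]
    lengths = [len(row) for row in rows if row is not None]
    child_values = [item for row in rows if row is not None for item in row]
    return present, lengths, child_values


present, lengths, child_values = encode_list_column([[1, 2], None, [], [3]])
```

+<p>An empty list and a null list are distinguished here: the empty list has a
+true PRESENT bit and a length of 0, while the null list has only a false
+PRESENT bit.</p>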
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">DIRECT_V2</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v2</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="map-columns">Map Columns</h2>
+
+<p>Maps are encoded as the PRESENT stream and a LENGTH stream with the number
+of items in each map. They have a child column for the key and
+another child column for the value.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">DIRECT_V2</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v2</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="union-columns">Union Columns</h2>
+
+<p>Unions are encoded as the PRESENT stream and a tag stream that controls which
+potential variant is used. They have a child column for each variant of the
+union. Currently ORC union types are limited to 256 variants, which matches
+the Hive type model.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Byte RLE</td>
+    </tr>
+  </tbody>
+</table>
+
+<h1 id="indexes">Indexes</h1>
+
+<h2 id="row-group-index">Row Group Index</h2>
+
+<p>The row group indexes consist of a ROW_INDEX stream for each primitive
+column that has an entry for each row group. Row groups are controlled
+by the writer and default to 10,000 rows. Each RowIndexEntry gives the
+position of each stream for the column and the statistics for that row
+group.</p>
+
+<p>The index streams are placed at the front of the stripe, because in
+the default case of streaming they do not need to be read. They are
+only loaded when either predicate push down is being used or the
+reader seeks to a particular row.</p>
+
+<p><code>message RowIndexEntry {
+ repeated uint64 positions = 1 [packed=true];
+ optional ColumnStatistics statistics = 2;
+}
+</code></p>
+
+<p><code>message RowIndex {
+ repeated RowIndexEntry entry = 1;
+}
+</code></p>
+
+<p>To record positions, each stream needs a sequence of numbers. For
+uncompressed streams, the position is the byte offset of the RLE run’s
+start location followed by the number of values that need to be
+consumed from the run. In compressed streams, the first number is the
+start of the compression chunk in the stream, followed by the number
+of decompressed bytes that need to be consumed, and finally the number
+of values consumed in the RLE.</p>
+
+<p>For columns with multiple streams, the sequences of positions in each
+stream are concatenated. That was an unfortunate decision on my part
+that we should fix at some point, because it makes code that uses the
+indexes error-prone.</p>
+
+<p>Because dictionaries are accessed randomly, there is no position to
+record for the dictionary and the entire dictionary must be read even
+if only part of a stripe is being read.</p>
+
+<h2 id="bloom-filter-index">Bloom Filter Index</h2>
+
+<p>Bloom filters have been part of ORC indexes since Hive 1.2.0.
+Predicate pushdown can make use of bloom filters to better prune
+the row groups that do not satisfy the filter condition.
+The bloom filter indexes consist of a BLOOM_FILTER stream for each
+column specified through the ‘orc.bloom.filter.columns’ table property.
+A BLOOM_FILTER stream records a bloom filter entry for each row
+group (10,000 rows by default) in a column. Only the row groups that
+satisfy min/max row index evaluation will be evaluated against the
+bloom filter index.</p>
+
+<p>Each BloomFilter entry stores the number of hash functions (‘k’) used
+and the bitset backing the bloom filter. The original encoding (pre
+ORC-101) of bloom filters stored the bitset as a repeated
+sequence of longs in the bitset field with a little endian encoding
+(0x1 is bit 0 and 0x2 is bit 1). After ORC-101, the encoding is a
+sequence of bytes with a little endian encoding in the utf8bitset field.</p>
+
+<p><code>message BloomFilter {
+ optional uint32 numHashFunctions = 1;
+ repeated fixed64 bitset = 2;
+ optional bytes utf8bitset = 3;
+}
+</code></p>
+
+<p><code>message BloomFilterIndex {
+ repeated BloomFilter bloomFilter = 1;
+}
+</code></p>
+
+<p>The bloom filter internally uses two different hash functions to map a key
+to a position in the bit set. For tinyint, smallint, int, bigint, float
+and double types, Thomas Wang’s 64-bit integer hash function is used.
+Floats are converted to IEEE-754 32 bit representation
+(using Java’s Float.floatToIntBits(float)). Similarly, doubles are
+converted to IEEE-754 64 bit representation (using Java’s
+Double.doubleToLongBits(double)). All these primitive types
+are cast to the long base type before being passed on to the hash function.
+For strings and binary types, the Murmur3 64 bit hash algorithm is used.
+The 64 bit variant of Murmur3 considers only the most significant
+8 bytes of the Murmur3 128-bit algorithm. The 64 bit hashcode generated
+from the above algorithms is used as a base to derive ‘k’ different
+hash functions. We use the idea mentioned in the paper “Less Hashing,
+Same Performance: Building a Better Bloom Filter” by Kirsch et al. to
+quickly compute the k hashcodes.</p>
+
+<p>The algorithm for computing k hashcodes and setting the bit position
+in a bloom filter is as follows:</p>
+
+<ol>
+  <li>Get 64 bit base hash code from Murmur3 or Thomas Wang’s hash algorithm.</li>
+  <li>Split the above hashcode into two 32-bit hashcodes (say hash1 and hash2).</li>
+  <li>The i’th hashcode is obtained by (for i from 1 to k):
+    <ul>
+      <li>combinedHash = hash1 + (i * hash2)</li>
+    </ul>
+  </li>
+  <li>If combinedHash is negative flip all the bits:
+    <ul>
+      <li>combinedHash = ~combinedHash</li>
+    </ul>
+  </li>
+  <li>Bit set position is obtained by performing modulo with m, the number of bits in the bit set:
+    <ul>
+      <li>position = combinedHash % m</li>
+    </ul>
+  </li>
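+  <li>Set the position in bit set. The upper bits of the position (position &gt;&gt;&gt; 6)
+identify the long within bitset and the 6 least significant bits identify the bit
+within that long, using little endian order. Java uses only the low 6 bits of the
+shift count for a long shift.
+    <ul>
+      <li>bitset[position &gt;&gt;&gt; 6] |= (1L &lt;&lt; position);</li>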
+    </ul>
+  </li>
+</ol>
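+
+<p>The steps above can be sketched in Python, starting from an already
+computed 64 bit base hash. The function names are illustrative. The sketch
+assumes that hash1 is the low 32 bits and hash2 the high 32 bits of the base
+hash, that the arithmetic wraps like 32 bit signed Java ints, and that m is
+64 times the number of longs in the bitset.</p>

```python
def to_int32(x):
    """Wrap an integer to a signed 32 bit value, as a Java int would."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def bit_positions(hash64, k, m):
    """Yield the k bit positions derived from one 64 bit base hash."""
    hash1 = to_int32(hash64)               # assumed: low 32 bits
    hash2 = to_int32(hash64 >> 32)         # assumed: high 32 bits
    for i in range(1, k + 1):
        combined = to_int32(hash1 + i * hash2)
        if combined < 0:
            combined = ~combined           # flip all bits of a negative value
        yield combined % m


def add_hash(bitset, hash64, k):
    """Set the k bits for hash64 in a list of 64 bit words."""
    for position in bit_positions(hash64, k, len(bitset) * 64):
        bitset[position >> 6] |= 1 << (position & 63)


def might_contain(bitset, hash64, k):
    """Return False only if the hash was definitely never added."""
    return all(
        bitset[position >> 6] & (1 << (position & 63))
        for position in bit_positions(hash64, k, len(bitset) * 64)
    )


bitset = [0, 0]                            # m = 128 bits
add_hash(bitset, 0x0000000200000001, 3)    # hash1 = 1, hash2 = 2
```

+<p>Python integers are unbounded, so a word with bit 63 set shows up as a
+large positive number here rather than as a negative long.</p>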
+
+<p>Bloom filter streams are interlaced with row group indexes. This placement
+makes it convenient to read the bloom filter stream and row index stream
+together in a single read operation.</p>
+
+<p><img src="/img/BloomFilter.png" alt="bloom filter" /></p>
+
+      </article>
+    </div>
+
+    <div class="clear"></div>
+
+  </div>
+</section>
+
+
+  <footer role="contentinfo">
+  <p>The contents of this website are &copy;&nbsp;2018
+     <a href="https://www.apache.org/">Apache Software Foundation</a>
+     under the terms of the <a
+      href="https://www.apache.org/licenses/LICENSE-2.0.html">
+      Apache&nbsp;License&nbsp;v2</a>. Apache ORC and its logo are trademarks
+      of the Apache Software Foundation.</p>
+</footer>
+
+  <script>
+  var anchorForId = function (id) {
+    var anchor = document.createElement("a");
+    anchor.className = "header-link";
+    anchor.href      = "#" + id;
+    anchor.innerHTML = "<span class=\"sr-only\">Permalink</span><i class=\"fa fa-link\"></i>";
+    anchor.title = "Permalink";
+    return anchor;
+  };
+
+  var linkifyAnchors = function (level, containingElement) {
+    var headers = containingElement.getElementsByTagName("h" + level);
+    for (var h = 0; h < headers.length; h++) {
+      var header = headers[h];
+
+      if (typeof header.id !== "undefined" && header.id !== "") {
+        header.appendChild(anchorForId(header.id));
+      }
+    }
+  };
+
+  document.onreadystatechange = function () {
+    if (this.readyState === "complete") {
+      var contentBlock = document.getElementsByClassName("docs")[0] || document.getElementsByClassName("news")[0];
+      if (!contentBlock) {
+        return;
+      }
+      for (var level = 1; level <= 6; level++) {
+        linkifyAnchors(level, contentBlock);
+      }
+    }
+  };
+</script>
+
+
+</body>
+</html>

http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/specification/index.html
----------------------------------------------------------------------
diff --git a/specification/index.html b/specification/index.html
new file mode 100644
index 0000000..3c3a5fe
--- /dev/null
+++ b/specification/index.html
@@ -0,0 +1,159 @@
+<!DOCTYPE HTML>
+<html lang="en-US">
+<head>
+  <meta charset="UTF-8">
+  <title>ORC Specification</title>
+  <meta name="viewport" content="width=device-width,initial-scale=1">
+  <meta name="generator" content="Jekyll v2.4.0">
+  <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
+  <link rel="stylesheet" href="/css/screen.css">
+  <link rel="icon" type="image/x-icon" href="/favicon.ico">
+  <!--[if lt IE 9]>
+  <script src="/js/html5shiv.min.js"></script>
+  <script src="/js/respond.min.js"></script>
+  <![endif]-->
+</head>
+
+
+<body class="wrap">
+  <header role="banner">
+  <nav class="mobile-nav show-on-mobiles">
+    <ul>
+  <li class="">
+    <a href="/">Home</a>
+  </li>
+  <li class="">
+    <a href="/docs/"><span class="show-on-mobiles">Docs</span>
+                     <span class="hide-on-mobiles">Documentation</span></a>
+  </li>
+  <li class="">
+    <a href="/talks/">Talks</a>
+  </li>
+  <li class="">
+    <a href="/news/">News</a>
+  </li>
+  <li class="">
+    <a href="/help/">Help</a>
+  </li>
+  <li class="">
+    <a href="/develop/">Develop</a>
+  </li>
+</ul>
+
+  </nav>
+  <div class="grid">
+    <div class="unit one-third center-on-mobiles">
+      <h1>
+        <a href="/">
+          <span class="sr-only">Apache ORC</span>
+          <img src="/img/logo.png" width="249" height="101" alt="ORC Logo">
+        </a>
+      </h1>
+    </div>
+    <nav class="main-nav unit two-thirds hide-on-mobiles">
+      <ul>
+  <li class="">
+    <a href="/">Home</a>
+  </li>
+  <li class="">
+    <a href="/docs/"><span class="show-on-mobiles">Docs</span>
+                     <span class="hide-on-mobiles">Documentation</span></a>
+  </li>
+  <li class="">
+    <a href="/talks/">Talks</a>
+  </li>
+  <li class="">
+    <a href="/news/">News</a>
+  </li>
+  <li class="">
+    <a href="/help/">Help</a>
+  </li>
+  <li class="">
+    <a href="/develop/">Develop</a>
+  </li>
+</ul>
+
+    </nav>
+  </div>
+</header>
+
+
+  <section class="standalone">
+  <div class="grid">
+
+    <div class="unit whole">
+      <article>
+        <h1>ORC Specification</h1>
+        <p>There have been two released ORC file versions:</p>
+
+<ul>
+  <li><a href="ORCv0.html">ORC v0</a> was released in Hive 0.11.</li>
+  <li><a href="ORCv1.html">ORC v1</a> was released in Hive 0.12 and ORC 1.x.</li>
+</ul>
+
+<p>Each version of the library will detect the format version and use
+the appropriate reader. The library can also write the older versions
+of the file format to ensure that users can write files that all of their
+clusters can read correctly.</p>
+
+<p>We are working on a new version of the file format:</p>
+
+<ul>
+  <li><a href="ORCv2.html">ORC v2</a> is a work in progress and is rapidly evolving.</li>
+</ul>
+
+      </article>
+    </div>
+
+    <div class="clear"></div>
+
+  </div>
+</section>
+
+
+  <footer role="contentinfo">
+  <p>The contents of this website are &copy;&nbsp;2018
+     <a href="https://www.apache.org/">Apache Software Foundation</a>
+     under the terms of the <a
+      href="https://www.apache.org/licenses/LICENSE-2.0.html">
+      Apache&nbsp;License&nbsp;v2</a>. Apache ORC and its logo are trademarks
+      of the Apache Software Foundation.</p>
+</footer>
+
+  <script>
+  var anchorForId = function (id) {
+    var anchor = document.createElement("a");
+    anchor.className = "header-link";
+    anchor.href      = "#" + id;
+    anchor.innerHTML = "<span class=\"sr-only\">Permalink</span><i class=\"fa fa-link\"></i>";
+    anchor.title = "Permalink";
+    return anchor;
+  };
+
+  var linkifyAnchors = function (level, containingElement) {
+    var headers = containingElement.getElementsByTagName("h" + level);
+    for (var h = 0; h < headers.length; h++) {
+      var header = headers[h];
+
+      if (typeof header.id !== "undefined" && header.id !== "") {
+        header.appendChild(anchorForId(header.id));
+      }
+    }
+  };
+
+  document.onreadystatechange = function () {
+    if (this.readyState === "complete") {
+      var contentBlock = document.getElementsByClassName("docs")[0] || document.getElementsByClassName("news")[0];
+      if (!contentBlock) {
+        return;
+      }
+      for (var level = 1; level <= 6; level++) {
+        linkifyAnchors(level, contentBlock);
+      }
+    }
+  };
+</script>
+
+
+</body>
+</html>


[6/9] orc git commit: Pushing ORC-339 reorganize the ORC file format spec.

Posted by om...@apache.org.
http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/docs/hive-ddl.html
----------------------------------------------------------------------
diff --git a/docs/hive-ddl.html b/docs/hive-ddl.html
index 0da9356..8c360d3 100644
--- a/docs/hive-ddl.html
+++ b/docs/hive-ddl.html
@@ -109,12 +109,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/index.html">Background</option>
     
   
@@ -130,14 +124,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -174,20 +160,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -221,20 +193,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
       <option value="/docs/types.html">Types</option>
     
   
@@ -261,12 +219,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/indexes.html">Indexes</option>
     
   
@@ -280,14 +232,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -324,20 +268,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -381,20 +311,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -426,25 +342,11 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/releases.html">Releases</option>
     
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -471,12 +373,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-ddl.html">Hive DDL</option>
     
   
@@ -494,14 +390,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -519,12 +407,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-config.html">Hive Configuration</option>
     
   
@@ -544,14 +426,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -586,12 +460,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapred.html">Using in MapRed</option>
     
   
@@ -601,14 +469,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -638,12 +498,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapreduce.html">Using in MapReduce</option>
     
   
@@ -651,14 +505,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -679,8 +525,6 @@
     
   
     
-  
-    
       <option value="/docs/core-java.html">Using Core Java</option>
     
   
@@ -704,18 +548,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -727,8 +559,6 @@
     
   
     
-  
-    
       <option value="/docs/core-cpp.html">Using Core C++</option>
     
   
@@ -754,18 +584,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -788,8 +606,6 @@
     
   
     
-  
-    
       <option value="/docs/cpp-tools.html">C++ Tools</option>
     
   
@@ -811,18 +627,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -848,12 +652,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/java-tools.html">Java Tools</option>
     
   
@@ -865,695 +663,104 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
     
-    <optgroup label="Format Specification">
-      
+  </select>
+</div>
+
+
+      <div class="unit four-fifths">
+        <article>
+          <h1>Hive DDL</h1>
+          <p>ORC is well integrated into Hive, so storing your istari table as ORC
+is done by adding “STORED AS ORC”.</p>
+
+<p><code>CREATE TABLE istari (
+  name STRING,
+  color STRING
+) STORED AS ORC;
+</code></p>
+
+<p>To modify a table so that new partitions of the istari table are
+stored as ORC files:</p>
+
+<p><code>ALTER TABLE istari SET FILEFORMAT ORC;
+</code></p>
+
+<p>As of Hive 0.14, users can request an efficient merge of small ORC files
+together by issuing a CONCATENATE command on their table or partition. The
+files will be merged at the stripe level without reserialization.</p>
+
+<p><code>ALTER TABLE istari [PARTITION partition_spec] CONCATENATE;
+</code></p>
+
+<p>To get information about an ORC file, use the orcfiledump command.</p>
+
+<p><code>% hive --orcfiledump &lt;path_to_file&gt;
+</code></p>
+
+<p>As of Hive 1.1, to display the data in the ORC file, use:</p>
+
+<p><code>% hive --orcfiledump -d &lt;path_to_file&gt;
+</code></p>
+
+          
+
+
+
 
 
   
+  
 
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-intro.html">Introduction</option>
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/file-tail.html">File Tail</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/compression.html">Compression</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/run-length.html">Run Length Encoding</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/stripes.html">Stripes</option>
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/encodings.html">Column Encodings</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-index.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-  </select>
-</div>
-
-
-      <div class="unit four-fifths">
-        <article>
-          <h1>Hive DDL</h1>
-          <p>ORC is well integrated into Hive, so storing your istari table as ORC
-is done by adding “STORED AS ORC”.</p>
-
-<p><code>CREATE TABLE istari (
-  name STRING,
-  color STRING
-) STORED AS ORC;
-</code></p>
-
-<p>To modify a table so that new partitions of the istari table are
-stored as ORC files:</p>
-
-<p><code>ALTER TABLE istari SET FILEFORMAT ORC;
-</code></p>
-
-<p>As of Hive 0.14, users can request an efficient merge of small ORC files
-together by issuing a CONCATENATE command on their table or partition. The
-files will be merged at the stripe level without reserialization.</p>
-
-<p><code>ALTER TABLE istari [PARTITION partition_spec] CONCATENATE;
-</code></p>
-
-<p>To get information about an ORC file, use the orcfiledump command.</p>
-
-<p><code>% hive --orcfiledump &lt;path_to_file&gt;
-</code></p>
-
-<p>As of Hive 1.1, to display the data in the ORC file, use:</p>
-
-<p><code>% hive --orcfiledump -d &lt;path_to_file&gt;
-</code></p>
-
-          
-
-
-
-
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-    <div class="section-nav">
-      <div class="left align-right">
-          
-            
-            
-            <a href="/docs/releases.html" class="prev">Back</a>
-          
-      </div>
-      <div class="right align-left">
-          
-            
-            
-            <a href="/docs/hive-config.html" class="next">Next</a>
-          
-      </div>
-    </div>
-    <div class="clear"></div>
-    
-
-        </article>
-      </div>
-
-      <div class="unit one-fifth hide-on-mobiles">
-  <aside>
-    
-    <h4>Overview</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/index.html">Background</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/types.html">Types</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-      <li class=""><a href="/docs/acid.html">ACID support</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Installing</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/building.html">Building ORC</a></li>
-      
-
-
-  
-
-  
-    
   
 
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
+    <div class="section-nav">
+      <div class="left align-right">
+          
+            
+            
+            <a href="/docs/releases.html" class="prev">Back</a>
+          
+      </div>
+      <div class="right align-left">
+          
+            
+            
+            <a href="/docs/hive-config.html" class="next">Next</a>
+          
+      </div>
+    </div>
+    <div class="clear"></div>
     
-      <li class=""><a href="/docs/releases.html">Releases</a></li>
-      
 
+        </article>
+      </div>
 
-</ul>
-
+      <div class="unit one-fifth hide-on-mobiles">
+  <aside>
     
-    <h4>Using in Hive</h4>
+    <h4>Overview</h4>
     
 
 <ul>
@@ -1582,11 +789,7 @@ files will be merged at the stripe level without reserialization.</p>
     
   
     
-  
-    
-  
-    
-      <li class="current"><a href="/docs/hive-ddl.html">Hive DDL</a></li>
+      <li class=""><a href="/docs/index.html">Background</a></li>
       
 
 
@@ -1600,34 +803,10 @@ files will be merged at the stripe level without reserialization.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
+      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
       
 
 
-</ul>
-
-    
-    <h4>Using in MapReduce</h4>
-    
-
-<ul>
-
   
 
   
@@ -1664,7 +843,7 @@ files will be merged at the stripe level without reserialization.</p>
     
   
     
-      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
+      <li class=""><a href="/docs/types.html">Types</a></li>
       
 
 
@@ -1694,49 +873,7 @@ files will be merged at the stripe level without reserialization.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using ORC Core</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
+      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
       
 
 
@@ -1748,22 +885,14 @@ files will be merged at the stripe level without reserialization.</p>
 
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      <li class=""><a href="/docs/acid.html">ACID support</a></li>
       
 
 
 </ul>
 
     
-    <h4>Tools</h4>
+    <h4>Installing</h4>
     
 
 <ul>
@@ -1780,15 +909,7 @@ files will be merged at the stripe level without reserialization.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      <li class=""><a href="/docs/building.html">Building ORC</a></li>
       
 
 
@@ -1826,14 +947,14 @@ files will be merged at the stripe level without reserialization.</p>
     
   
     
-      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
+      <li class=""><a href="/docs/releases.html">Releases</a></li>
       
 
 
 </ul>
 
     
-    <h4>Format Specification</h4>
+    <h4>Using in Hive</h4>
     
 
 <ul>
@@ -1860,31 +981,7 @@ files will be merged at the stripe level without reserialization.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-intro.html">Introduction</a></li>
+      <li class="current"><a href="/docs/hive-ddl.html">Hive DDL</a></li>
       
 
 
@@ -1908,31 +1005,17 @@ files will be merged at the stripe level without reserialization.</p>
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/file-tail.html">File Tail</a></li>
+      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
       
 
 
-  
-
-  
-    
-  
+</ul>
 
-  
-    
-  
     
-  
-    
-  
+    <h4>Using in MapReduce</h4>
     
-      <li class=""><a href="/docs/compression.html">Compression</a></li>
-      
 
+<ul>
 
   
 
@@ -1964,19 +1047,7 @@ files will be merged at the stripe level without reserialization.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/run-length.html">Run Length Encoding</a></li>
+      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
       
 
 
@@ -2012,13 +1083,25 @@ files will be merged at the stripe level without reserialization.</p>
     
   
     
-  
+      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Using ORC Core</h4>
     
+
+<ul>
+
+  
+
   
     
   
+
+  
     
   
     
@@ -2028,7 +1111,7 @@ files will be merged at the stripe level without reserialization.</p>
     
   
     
-      <li class=""><a href="/docs/stripes.html">Stripes</a></li>
+      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
       
 
 
@@ -2046,17 +1129,17 @@ files will be merged at the stripe level without reserialization.</p>
     
   
     
-  
-    
-  
-    
-  
+      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Tools</h4>
     
-      <li class=""><a href="/docs/encodings.html">Column Encodings</a></li>
-      
 
+<ul>
 
   
 
@@ -2076,11 +1159,17 @@ files will be merged at the stripe level without reserialization.</p>
     
   
     
+      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      
+
+
   
-    
+
   
     
   
+
+  
     
   
     
@@ -2102,7 +1191,7 @@ files will be merged at the stripe level without reserialization.</p>
     
   
     
-      <li class=""><a href="/docs/spec-index.html">Indexes</a></li>
+      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
       
 
 

http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/docs/index.html
----------------------------------------------------------------------
diff --git a/docs/index.html b/docs/index.html
index 6014e66..0d344dc 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -109,12 +109,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/index.html">Background</option>
     
   
@@ -130,14 +124,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -174,9 +160,9 @@
   
     
   
-    
+
   
-    
+
   
     
   
@@ -188,9 +174,9 @@
   
     
   
-
+    
   
-
+    
   
     
   
@@ -207,6 +193,12 @@
     
   
     
+      <option value="/docs/types.html">Types</option>
+    
+  
+
+  
+
   
     
   
@@ -227,6 +219,8 @@
     
   
     
+      <option value="/docs/indexes.html">Indexes</option>
+    
   
     
   
@@ -235,7 +229,7 @@
     
   
     
-      <option value="/docs/types.html">Types</option>
+  
     
   
 
@@ -243,7 +237,7 @@
 
   
     
-  
+      <option value="/docs/acid.html">ACID support</option>
     
   
     
@@ -267,8 +261,6 @@
     
   
     
-      <option value="/docs/indexes.html">Indexes</option>
-    
   
     
   
@@ -276,24 +268,35 @@
   
     
   
+
+
+    </optgroup>
     
+    <optgroup label="Installing">
+      
+
+
+  
+
   
     
   
     
   
     
+      <option value="/docs/building.html">Building ORC</option>
+    
   
     
   
     
   
-
+    
   
-
+    
   
     
-      <option value="/docs/acid.html">ACID support</option>
+  
     
   
     
@@ -308,6 +311,10 @@
   
     
   
+
+  
+
+  
     
   
     
@@ -335,6 +342,8 @@
     
   
     
+      <option value="/docs/releases.html">Releases</option>
+    
   
     
   
@@ -342,7 +351,7 @@
 
     </optgroup>
     
-    <optgroup label="Installing">
+    <optgroup label="Using in Hive">
       
 
 
@@ -354,8 +363,6 @@
     
   
     
-      <option value="/docs/building.html">Building ORC</option>
-    
   
     
   
@@ -366,7 +373,7 @@
     
   
     
-  
+      <option value="/docs/hive-ddl.html">Hive DDL</option>
     
   
     
@@ -383,7 +390,9 @@
   
     
   
-    
+
+  
+
   
     
   
@@ -395,9 +404,11 @@
   
     
   
-
+    
   
-
+    
+      <option value="/docs/hive-config.html">Hive Configuration</option>
+    
   
     
   
@@ -415,7 +426,16 @@
   
     
   
+
+
+    </optgroup>
     
+    <optgroup label="Using in MapReduce">
+      
+
+
+  
+
   
     
   
@@ -432,7 +452,7 @@
     
   
     
-      <option value="/docs/releases.html">Releases</option>
+  
     
   
     
@@ -440,18 +460,15 @@
     
   
     
+      <option value="/docs/mapred.html">Using in MapRed</option>
+    
   
     
   
     
   
-
-
-    </optgroup>
     
-    <optgroup label="Using in Hive">
-      
-
+  
 
   
 
@@ -477,20 +494,27 @@
     
   
     
-      <option value="/docs/hive-ddl.html">Hive DDL</option>
-    
   
     
   
     
-  
+      <option value="/docs/mapreduce.html">Using in MapReduce</option>
     
   
     
   
     
   
+
+
+    </optgroup>
     
+    <optgroup label="Using ORC Core">
+      
+
+
+  
+
   
     
   
@@ -501,10 +525,12 @@
     
   
     
+      <option value="/docs/core-java.html">Using Core Java</option>
+    
   
-
+    
   
-
+    
   
     
   
@@ -522,13 +548,19 @@
   
     
   
+
+  
+
+  
     
   
     
-      <option value="/docs/hive-config.html">Hive Configuration</option>
+  
     
   
     
+      <option value="/docs/core-cpp.html">Using Core C++</option>
+    
   
     
   
@@ -556,7 +588,7 @@
 
     </optgroup>
     
-    <optgroup label="Using in MapReduce">
+    <optgroup label="Tools">
       
 
 
@@ -574,7 +606,7 @@
     
   
     
-  
+      <option value="/docs/cpp-tools.html">C++ Tools</option>
     
   
     
@@ -592,14 +624,12 @@
     
   
     
-      <option value="/docs/mapred.html">Using in MapRed</option>
-    
   
     
   
-    
+
   
-    
+
   
     
   
@@ -609,10 +639,6 @@
   
     
   
-
-  
-
-  
     
   
     
@@ -626,7 +652,7 @@
     
   
     
-  
+      <option value="/docs/java-tools.html">Java Tools</option>
     
   
     
@@ -637,14 +663,94 @@
   
     
   
+
+
+    </optgroup>
     
+  </select>
+</div>
+
+
+      <div class="unit four-fifths">
+        <article>
+          <h1>Background</h1>
+          <p>Back in January 2013, we created ORC files as part of the initiative
+to massively speed up Apache Hive and improve the storage efficiency
+of data stored in Apache Hadoop. The focus was on enabling high speed
+processing and reducing file sizes.</p>
+
+<p>ORC is a self-describing type-aware columnar file format designed for 
+Hadoop workloads. It is optimized for large streaming reads, but with
+integrated support for finding required rows quickly. Storing data in
+a columnar format lets the reader read, decompress, and process only
+the values that are required for the current query. Because ORC files
+are type-aware, the writer chooses the most appropriate encoding for
+the type and builds an internal index as the file is written.</p>
+
+<p>Predicate pushdown uses those indexes to determine which stripes in a
+file need to be read for a particular query and the row indexes can
+narrow the search to a particular set of 10,000 rows. ORC supports the
+complete set of types in Hive, including the complex types: structs,
+lists, maps, and unions.</p>
+
+<p>Many large Hadoop users have adopted ORC. For instance, Facebook uses
+ORC to <a href="https://s.apache.org/fb-scaling-300-pb">save tens of petabytes</a>
+in their data warehouse and demonstrated that ORC is <a href="https://s.apache.org/presto-orc">significantly
+faster</a> than RC File or Parquet. Yahoo
+uses ORC to store their production data and has released some of their
+<a href="https://s.apache.org/yahoo-orc">benchmark results</a>.</p>
+
+<p>ORC files are divided into <em>stripes</em> that are roughly 64MB by
+default. The stripes in a file are independent of each other and form
+the natural unit of distributed work. Within each stripe, the columns
+are separated from each other so the reader can read just the columns
+that are required.</p>
+
+          
+
+
+
+
+
   
+  
+    <div class="section-nav">
+      <div class="left align-right">
+          
+            <span class="prev disabled">Back</span>
+          
+      </div>
+      <div class="right align-left">
+          
+            
+            
+            <a href="/docs/adopters.html" class="next">Next</a>
+          
+      </div>
+    </div>
+    <div class="clear"></div>
+    
+
+        </article>
+      </div>
+
+      <div class="unit one-fifth hide-on-mobiles">
+  <aside>
+    
+    <h4>Overview</h4>
     
+
+<ul>
+
+  
+
   
     
   
+
+  
     
-      <option value="/docs/mapreduce.html">Using in MapReduce</option>
+  
     
   
     
@@ -659,11 +765,8 @@
   
     
   
-
-
-    </optgroup>
     
-    <optgroup label="Using ORC Core">
+      <li class="current"><a href="/docs/index.html">Background</a></li>
       
 
 
@@ -672,19 +775,21 @@
   
     
   
-    
+
   
     
   
     
+      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
+      
+
+
   
-    
+
   
     
-      <option value="/docs/core-java.html">Using Core Java</option>
-    
   
-    
+
   
     
   
@@ -715,22 +820,20 @@
     
   
     
-  
+      <li class=""><a href="/docs/types.html">Types</a></li>
+      
 
-  
 
   
-    
+
   
     
   
-    
+
   
     
   
     
-      <option value="/docs/core-cpp.html">Using Core C++</option>
-    
   
     
   
@@ -747,860 +850,26 @@
     
   
     
+      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
+      
+
+
   
-    
+
   
     
   
-    
+
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Tools">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/cpp-tools.html">C++ Tools</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/java-tools.html">Java Tools</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Format Specification">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-intro.html">Introduction</option>
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/file-tail.html">File Tail</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/compression.html">Compression</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/run-length.html">Run Length Encoding</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/stripes.html">Stripes</option>
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/encodings.html">Column Encodings</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-index.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-  </select>
-</div>
-
-
-      <div class="unit four-fifths">
-        <article>
-          <h1>Background</h1>
-          <p>Back in January 2013, we created ORC files as part of the initiative
-to massively speed up Apache Hive and improve the storage efficiency
-of data stored in Apache Hadoop. The focus was on enabling high speed
-processing and reducing file sizes.</p>
-
-<p>ORC is a self-describing type-aware columnar file format designed for 
-Hadoop workloads. It is optimized for large streaming reads, but with
-integrated support for finding required rows quickly. Storing data in
-a columnar format lets the reader read, decompress, and process only
-the values that are required for the current query. Because ORC files
-are type-aware, the writer chooses the most appropriate encoding for
-the type and builds an internal index as the file is written.</p>
-
-<p>Predicate pushdown uses those indexes to determine which stripes in a
-file need to be read for a particular query and the row indexes can
-narrow the search to a particular set of 10,000 rows. ORC supports the
-complete set of types in Hive, including the complex types: structs,
-lists, maps, and unions.</p>
-
-<p>Many large Hadoop users have adopted ORC. For instance, Facebook uses
-ORC to <a href="https://s.apache.org/fb-scaling-300-pb">save tens of petabytes</a>
-in their data warehouse and demonstrated that ORC is <a href="https://s.apache.org/presto-orc">significantly
-faster</a> than RC File or Parquet. Yahoo
-uses ORC to store their production data and has released some of their
-<a href="https://s.apache.org/yahoo-orc">benchmark results</a>.</p>
-
-<p>ORC files are divided into <em>stripes</em> that are roughly 64MB by
-default. The stripes in a file are independent of each other and form
-the natural unit of distributed work. Within each stripe, the columns
-are separated from each other so the reader can read just the columns
-that are required.</p>
-
-          
-
-
-
-
-
-  
-  
-    <div class="section-nav">
-      <div class="left align-right">
-          
-            <span class="prev disabled">Back</span>
-          
-      </div>
-      <div class="right align-left">
-          
-            
-            
-            <a href="/docs/adopters.html" class="next">Next</a>
-          
-      </div>
-    </div>
-    <div class="clear"></div>
-    
-
-        </article>
-      </div>
-
-      <div class="unit one-fifth hide-on-mobiles">
-  <aside>
-    
-    <h4>Overview</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class="current"><a href="/docs/index.html">Background</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/types.html">Types</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-      <li class=""><a href="/docs/acid.html">ACID support</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Installing</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/building.html">Building ORC</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/releases.html">Releases</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using in Hive</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
+      <li class=""><a href="/docs/acid.html">ACID support</a></li>
       
 
 
 </ul>
 
     
-    <h4>Using in MapReduce</h4>
+    <h4>Installing</h4>
     
 
 <ul>
@@ -1617,31 +886,7 @@ that are required.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
+      <li class=""><a href="/docs/building.html">Building ORC</a></li>
       
 
 
@@ -1679,18 +924,14 @@ that are required.</p>
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
+      <li class=""><a href="/docs/releases.html">Releases</a></li>
       
 
 
 </ul>
 
     
-    <h4>Using ORC Core</h4>
+    <h4>Using in Hive</h4>
     
 
 <ul>
@@ -1713,59 +954,11 @@ that are required.</p>
     
   
     
-      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Tools</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
   
     
   
     
-      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
       
 
 
@@ -1789,28 +982,14 @@ that are required.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
+      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
       
 
 
 </ul>
 
     
-    <h4>Format Specification</h4>
+    <h4>Using in MapReduce</h4>
     
 
 <ul>
@@ -1845,23 +1024,7 @@ that are required.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-intro.html">Introduction</a></li>
+      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
       
 
 
@@ -1889,16 +1052,6 @@ that are required.</p>
     
   
     
-      <li class=""><a href="/docs/file-tail.html">File Tail</a></li>
-      
-
-
-  
-
-  
-    
-  
-
   
     
   
@@ -1907,42 +1060,24 @@ that are required.</p>
     
   
     
-      <li class=""><a href="/docs/compression.html">Compression</a></li>
+      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
       
 
 
-  
-
-  
-    
-  
+</ul>
 
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
     
-  
+    <h4>Using ORC Core</h4>
     
+
+<ul>
+
   
-    
+
   
     
   
-    
+
   
     
   
@@ -1953,7 +1088,7 @@ that are required.</p>
     
   
     
-      <li class=""><a href="/docs/run-length.html">Run Length Encoding</a></li>
+      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
       
 
 
@@ -1971,54 +1106,24 @@ that are required.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/stripes.html">Stripes</a></li>
+      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
       
 
 
-  
+</ul>
 
-  
     
+    <h4>Tools</h4>
+    
+
+<ul>
+
   
 
   
     
   
-    
+
   
     
   
@@ -2031,7 +1136,7 @@ that are required.</p>
     
   
     
-      <li class=""><a href="/docs/encodings.html">Column Encodings</a></li>
+      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
       
 
 
@@ -2063,23 +1168,7 @@ that are required.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-index.html">Indexes</a></li>
+      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
       
 
 

http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/docs/indexes.html
----------------------------------------------------------------------
diff --git a/docs/indexes.html b/docs/indexes.html
index 5654a47..0a81f43 100644
--- a/docs/indexes.html
+++ b/docs/indexes.html
@@ -109,12 +109,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/index.html">Background</option>
     
   
@@ -130,14 +124,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -174,20 +160,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -221,20 +193,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
       <option value="/docs/types.html">Types</option>
     
   
@@ -261,12 +219,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/indexes.html">Indexes</option>
     
   
@@ -280,14 +232,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -324,20 +268,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -381,20 +311,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -426,25 +342,11 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/releases.html">Releases</option>
     
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -471,12 +373,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-ddl.html">Hive DDL</option>
     
   
@@ -494,14 +390,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -519,12 +407,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-config.html">Hive Configuration</option>
     
   
@@ -544,14 +426,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -586,12 +460,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapred.html">Using in MapRed</option>
     
   
@@ -601,14 +469,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -638,12 +498,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapreduce.html">Using in MapReduce</option>
     
   
@@ -651,14 +505,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -679,8 +525,6 @@
     
   
     
-  
-    
       <option value="/docs/core-java.html">Using Core Java</option>
     
   
@@ -704,18 +548,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -727,8 +559,6 @@
     
   
     
-  
-    
       <option value="/docs/core-cpp.html">Using Core C++</option>
     
   
@@ -754,18 +584,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -788,8 +606,6 @@
     
   
     
-  
-    
       <option value="/docs/cpp-tools.html">C++ Tools</option>
     
   
@@ -811,18 +627,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -848,12 +652,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/java-tools.html">Java Tools</option>
     
   
@@ -865,680 +663,89 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
     
-    <optgroup label="Format Specification">
-      
+  </select>
+</div>
+
+
+      <div class="unit four-fifths">
+        <article>
+          <h1>Indexes</h1>
+          <p>ORC provides three levels of indexes within each file:</p>
+
+<ul>
+  <li>file level - statistics about the values in each column across the entire 
+file</li>
+  <li>stripe level - statistics about the values in each column for each stripe</li>
+  <li>row level - statistics about the values in each column for each set of
+10,000 rows within a stripe</li>
+</ul>
+
+<p>The file and stripe level column statistics are in the file footer so
+that they are easy to access to determine if the rest of the file
+needs to be read at all. Row level indexes include both the column
+statistics for each row group and the position for seeking to the
+start of the row group.</p>
+
+<p>Column statistics always contain the count of values and whether there
+are null values present. For most primitive types they also include the
+minimum and maximum values and, for numeric types, the sum. As of Hive
+1.2, the indexes can include bloom filters, which provide a much more
+selective filter.</p>
+
+<p>The reader applies the indexes at all levels through Search
+ARGuments (SARGs), which are simplified expressions that restrict the
+rows that are of interest. For example, if a query is looking for
+people older than 100 years, the SARG would be “age &gt; 100” and
+only files, stripes, or row groups that contain people over 100 years
+old would be read.</p>
+
+          
 
 
+
+
+
+  
   
 
   
-    
   
-    
+
   
-    
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-intro.html">Introduction</option>
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/file-tail.html">File Tail</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/compression.html">Compression</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/run-length.html">Run Length Encoding</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/stripes.html">Stripes</option>
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/encodings.html">Column Encodings</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-index.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-  </select>
-</div>
-
-
-      <div class="unit four-fifths">
-        <article>
-          <h1>Indexes</h1>
-          <p>ORC provides three levels of indexes within each file:</p>
-
-<ul>
-  <li>file level - statistics about the values in each column across the entire 
-file</li>
-  <li>stripe level - statistics about the values in each column for each stripe</li>
-  <li>row level - statistics about the values in each column for each set of
-10,000 rows within a stripe</li>
-</ul>
-
-<p>The file and stripe level column statistics are in the file footer so
-that they are easy to access to determine if the rest of the file
-needs to be read at all. Row level indexes include both the column
-statistics for each row group and the position for seeking to the
-start of the row group.</p>
-
-<p>Column statistics always contain the count of values and whether there
-are null values present. Most other primitive types include the
-minimum and maximum values and for numeric types the sum. As of Hive
-1.2, the indexes can include bloom filters, which provide a much more
-selective filter.</p>
-
-<p>The indexes at all levels are used by the reader using Search
-ARGuments or SARGs, which are simplified expressions that restrict the
-rows that are of interest. For example, if a query was looking for
-people older than 100 years old, the SARG would be “age &gt; 100” and
-only files, stripes, or row groups that had people over 100 years old
-would be read.</p>
-
-          
-
-
-
-
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-    <div class="section-nav">
-      <div class="left align-right">
-          
-            
-            
-            <a href="/docs/types.html" class="prev">Back</a>
-          
-      </div>
-      <div class="right align-left">
-          
-            
-            
-            <a href="/docs/acid.html" class="next">Next</a>
-          
-      </div>
-    </div>
-    <div class="clear"></div>
-    
-
-        </article>
-      </div>
-
-      <div class="unit one-fifth hide-on-mobiles">
-  <aside>
-    
-    <h4>Overview</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/index.html">Background</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/types.html">Types</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class="current"><a href="/docs/indexes.html">Indexes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-      <li class=""><a href="/docs/acid.html">ACID support</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Installing</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/building.html">Building ORC</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
+
   
-    
   
+    <div class="section-nav">
+      <div class="left align-right">
+          
+            
+            
+            <a href="/docs/types.html" class="prev">Back</a>
+          
+      </div>
+      <div class="right align-left">
+          
+            
+            
+            <a href="/docs/acid.html" class="next">Next</a>
+          
+      </div>
+    </div>
+    <div class="clear"></div>
     
-      <li class=""><a href="/docs/releases.html">Releases</a></li>
-      
 
+        </article>
+      </div>
 
-</ul>
-
+      <div class="unit one-fifth hide-on-mobiles">
+  <aside>
     
-    <h4>Using in Hive</h4>
+    <h4>Overview</h4>
     
 
 <ul>
@@ -1567,11 +774,7 @@ would be read.</p>
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
+      <li class=""><a href="/docs/index.html">Background</a></li>
       
 
 
@@ -1585,34 +788,10 @@ would be read.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
+      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
       
 
 
-</ul>
-
-    
-    <h4>Using in MapReduce</h4>
-    
-
-<ul>
-
   
 
   
@@ -1649,7 +828,7 @@ would be read.</p>
     
   
     
-      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
+      <li class=""><a href="/docs/types.html">Types</a></li>
       
 
 
@@ -1679,49 +858,7 @@ would be read.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using ORC Core</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
+      <li class="current"><a href="/docs/indexes.html">Indexes</a></li>
       
 
 
@@ -1733,22 +870,14 @@ would be read.</p>
 
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      <li class=""><a href="/docs/acid.html">ACID support</a></li>
       
 
 
 </ul>
 
     
-    <h4>Tools</h4>
+    <h4>Installing</h4>
     
 
 <ul>
@@ -1765,15 +894,7 @@ would be read.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      <li class=""><a href="/docs/building.html">Building ORC</a></li>
       
 
 
@@ -1811,14 +932,14 @@ would be read.</p>
     
   
     
-      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
+      <li class=""><a href="/docs/releases.html">Releases</a></li>
       
 
 
 </ul>
 
     
-    <h4>Format Specification</h4>
+    <h4>Using in Hive</h4>
     
 
 <ul>
@@ -1845,31 +966,7 @@ would be read.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-intro.html">Introduction</a></li>
+      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
       
 
 
@@ -1893,31 +990,17 @@ would be read.</p>
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/file-tail.html">File Tail</a></li>
+      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
       
 
 
-  
-
-  
-    
-  
+</ul>
 
-  
-    
-  
     
-  
-    
-  
+    <h4>Using in MapReduce</h4>
     
-      <li class=""><a href="/docs/compression.html">Compression</a></li>
-      
 
+<ul>
 
   
 
@@ -1949,19 +1032,7 @@ would be read.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/run-length.html">Run Length Encoding</a></li>
+      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
       
 
 
@@ -1997,13 +1068,25 @@ would be read.</p>
     
   
     
-  
+      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Using ORC Core</h4>
     
+
+<ul>
+
+  
+
   
     
   
+
+  
     
   
     
@@ -2013,7 +1096,7 @@ would be read.</p>
     
   
     
-      <li class=""><a href="/docs/stripes.html">Stripes</a></li>
+      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
       
 
 
@@ -2031,17 +1114,17 @@ would be read.</p>
     
   
     
-  
-    
-  
-    
-  
+      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Tools</h4>
     
-      <li class=""><a href="/docs/encodings.html">Column Encodings</a></li>
-      
 
+<ul>
 
   
 
@@ -2061,11 +1144,17 @@ would be read.</p>
     
   
     
+      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      
+
+
   
-    
+
   
     
   
+
+  
     
   
     
@@ -2087,7 +1176,7 @@ would be read.</p>
     
   
     
-      <li class=""><a href="/docs/spec-index.html">Indexes</a></li>
+      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
       
 
 

http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/docs/java-tools.html
----------------------------------------------------------------------
diff --git a/docs/java-tools.html b/docs/java-tools.html
index 7d38769..25efb43 100644
--- a/docs/java-tools.html
+++ b/docs/java-tools.html
@@ -109,12 +109,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/index.html">Background</option>
     
   
@@ -130,14 +124,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -174,20 +160,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -221,20 +193,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
       <option value="/docs/types.html">Types</option>
     
   
@@ -261,12 +219,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/indexes.html">Indexes</option>
     
   
@@ -280,14 +232,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -324,20 +268,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -381,20 +311,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -426,25 +342,11 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/releases.html">Releases</option>
     
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -471,12 +373,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-ddl.html">Hive DDL</option>
     
   
@@ -494,14 +390,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -519,12 +407,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-config.html">Hive Configuration</option>
     
   
@@ -544,14 +426,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -586,12 +460,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapred.html">Using in MapRed</option>
     
   
@@ -601,14 +469,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -638,12 +498,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapreduce.html">Using in MapReduce</option>
     
   
@@ -651,14 +505,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -679,8 +525,6 @@
     
   
     
-  
-    
       <option value="/docs/core-java.html">Using Core Java</option>
     
   
@@ -704,18 +548,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -727,8 +559,6 @@
     
   
     
-  
-    
       <option value="/docs/core-cpp.html">Using Core C++</option>
     
   
@@ -754,18 +584,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -788,8 +606,6 @@
     
   
     
-  
-    
       <option value="/docs/cpp-tools.html">C++ Tools</option>
     
   
@@ -811,18 +627,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -848,12 +652,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/java-tools.html">Java Tools</option>
     
   
@@ -865,992 +663,329 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
     
-    <optgroup label="Format Specification">
-      
+  </select>
+</div>
 
 
-  
+      <div class="unit four-fifths">
+        <article>
+          <h1>Java Tools</h1>
+          <p>In addition to the C++ tools, there is an ORC tools jar that packages
+several useful utilities and the necessary Java dependencies
+(including Hadoop) into a single package. The Java ORC tool jar
+supports both the local file system and HDFS.</p>
 
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-intro.html">Introduction</option>
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/file-tail.html">File Tail</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/compression.html">Compression</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/run-length.html">Run Length Encoding</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/stripes.html">Stripes</option>
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/encodings.html">Column Encodings</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-index.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-  </select>
-</div>
-
-
-      <div class="unit four-fifths">
-        <article>
-          <h1>Java Tools</h1>
-          <p>In addition to the C++ tools, there is an ORC tools jar that packages
-several useful utilities and the necessary Java dependencies
-(including Hadoop) into a single package. The Java ORC tool jar
-supports both the local file system and HDFS.</p>
-
-<p>The subcommands for the tools are:</p>
-
-<ul>
-  <li>meta - print the metadata of an ORC file</li>
-  <li>data - print the data of an ORC file</li>
-  <li>scan (since ORC 1.3) - scan the data for benchmarking</li>
-  <li>convert (since ORC 1.4) - convert JSON files to ORC</li>
-  <li>json-schema (since ORC 1.4) - determine the schema of JSON documents</li>
-</ul>
-
-<p>The command line looks like:</p>
-
-<pre><code class="language-shell">% java -jar orc-tools-X.Y.Z-uber.jar &lt;sub-command&gt; &lt;args&gt;
-</code></pre>
-
-<h2 id="java-meta">Java Meta</h2>
-
-<p>The meta command prints the metadata about the given ORC file and is
-equivalent to the Hive ORC File Dump command.</p>
-
-<dl>
-  <dt>-j</dt>
-  <dd>format the output in JSON</dd>
-  <dt>-p</dt>
-  <dd>pretty print the output</dd>
-  <dt>-t</dt>
-  <dd>print the timezone of the writer</dd>
-  <dt>--rowindex</dt>
-  <dd>print the row indexes for the comma-separated list of column ids</dd>
-  <dt>--recover</dt>
-  <dd>skip over corrupted values in the ORC file</dd>
-  <dt>--skip-dump</dt>
-  <dd>skip dumping the metadata</dd>
-  <dt>--backup-path</dt>
-  <dd>when used with --recover, specifies the path where the recovered file is written</dd>
-</dl>
-
-<p>An example of the output is given below:</p>
-
-<pre><code class="language-shell">% java -jar orc-tools-X.Y.Z-uber.jar meta examples/TestOrcFile.test1.orc
-Processing data file examples/TestOrcFile.test1.orc [length: 1711]
-Structure for examples/TestOrcFile.test1.orc
-File Version: 0.12 with HIVE_8732
-Rows: 2
-Compression: ZLIB
-Compression size: 10000
-Type: struct&lt;boolean1:boolean,byte1:tinyint,short1:smallint,int1:int,
-long1:bigint,float1:float,double1:double,bytes1:binary,string1:string,
-middle:struct&lt;list:array&lt;struct&lt;int1:int,string1:string&gt;&gt;&gt;,list:array&lt;
-struct&lt;int1:int,string1:string&gt;&gt;,map:map&lt;string,struct&lt;int1:int,string1:
-string&gt;&gt;&gt;
-
-Stripe Statistics:
-  Stripe 1:
-    Column 0: count: 2 hasNull: false
-    Column 1: count: 2 hasNull: false true: 1
-    Column 2: count: 2 hasNull: false min: 1 max: 100 sum: 101
-    Column 3: count: 2 hasNull: false min: 1024 max: 2048 sum: 3072
-    Column 4: count: 2 hasNull: false min: 65536 max: 65536 sum: 131072
-    Column 5: count: 2 hasNull: false min: 9223372036854775807 max: 9223372036854775807
-    Column 6: count: 2 hasNull: false min: 1.0 max: 2.0 sum: 3.0
-    Column 7: count: 2 hasNull: false min: -15.0 max: -5.0 sum: -20.0
-    Column 8: count: 2 hasNull: false sum: 5
-    Column 9: count: 2 hasNull: false min: bye max: hi sum: 5
-    Column 10: count: 2 hasNull: false
-    Column 11: count: 2 hasNull: false
-    Column 12: count: 4 hasNull: false
-    Column 13: count: 4 hasNull: false min: 1 max: 2 sum: 6
-    Column 14: count: 4 hasNull: false min: bye max: sigh sum: 14
-    Column 15: count: 2 hasNull: false
-    Column 16: count: 5 hasNull: false
-    Column 17: count: 5 hasNull: false min: -100000 max: 100000000 sum: 99901241
-    Column 18: count: 5 hasNull: false min: bad max: in sum: 15
-    Column 19: count: 2 hasNull: false
-    Column 20: count: 2 hasNull: false min: chani max: mauddib sum: 12
-    Column 21: count: 2 hasNull: false
-    Column 22: count: 2 hasNull: false min: 1 max: 5 sum: 6
-    Column 23: count: 2 hasNull: false min: chani max: mauddib sum: 12
-
-File Statistics:
-  Column 0: count: 2 hasNull: false
-  Column 1: count: 2 hasNull: false true: 1
-  Column 2: count: 2 hasNull: false min: 1 max: 100 sum: 101
-  Column 3: count: 2 hasNull: false min: 1024 max: 2048 sum: 3072
-  Column 4: count: 2 hasNull: false min: 65536 max: 65536 sum: 131072
-  Column 5: count: 2 hasNull: false min: 9223372036854775807 max: 9223372036854775807
-  Column 6: count: 2 hasNull: false min: 1.0 max: 2.0 sum: 3.0
-  Column 7: count: 2 hasNull: false min: -15.0 max: -5.0 sum: -20.0
-  Column 8: count: 2 hasNull: false sum: 5
-  Column 9: count: 2 hasNull: false min: bye max: hi sum: 5
-  Column 10: count: 2 hasNull: false
-  Column 11: count: 2 hasNull: false
-  Column 12: count: 4 hasNull: false
-  Column 13: count: 4 hasNull: false min: 1 max: 2 sum: 6
-  Column 14: count: 4 hasNull: false min: bye max: sigh sum: 14
-  Column 15: count: 2 hasNull: false
-  Column 16: count: 5 hasNull: false
-  Column 17: count: 5 hasNull: false min: -100000 max: 100000000 sum: 99901241
-  Column 18: count: 5 hasNull: false min: bad max: in sum: 15
-  Column 19: count: 2 hasNull: false
-  Column 20: count: 2 hasNull: false min: chani max: mauddib sum: 12
-  Column 21: count: 2 hasNull: false
-  Column 22: count: 2 hasNull: false min: 1 max: 5 sum: 6
-  Column 23: count: 2 hasNull: false min: chani max: mauddib sum: 12
-
-Stripes:
-  Stripe: offset: 3 data: 243 rows: 2 tail: 199 index: 570
-    Stream: column 0 section ROW_INDEX start: 3 length 11
-    Stream: column 1 section ROW_INDEX start: 14 length 22
-    Stream: column 2 section ROW_INDEX start: 36 length 26
-    Stream: column 3 section ROW_INDEX start: 62 length 27
-    Stream: column 4 section ROW_INDEX start: 89 length 30
-    Stream: column 5 section ROW_INDEX start: 119 length 28
-    Stream: column 6 section ROW_INDEX start: 147 length 34
-    Stream: column 7 section ROW_INDEX start: 181 length 34
-    Stream: column 8 section ROW_INDEX start: 215 length 21
-    Stream: column 9 section ROW_INDEX start: 236 length 30
-    Stream: column 10 section ROW_INDEX start: 266 length 11
-    Stream: column 11 section ROW_INDEX start: 277 length 16
-    Stream: column 12 section ROW_INDEX start: 293 length 11
-    Stream: column 13 section ROW_INDEX start: 304 length 24
-    Stream: column 14 section ROW_INDEX start: 328 length 31
-    Stream: column 15 section ROW_INDEX start: 359 length 16
-    Stream: column 16 section ROW_INDEX start: 375 length 11
-    Stream: column 17 section ROW_INDEX start: 386 length 32
-    Stream: column 18 section ROW_INDEX start: 418 length 30
-    Stream: column 19 section ROW_INDEX start: 448 length 16
-    Stream: column 20 section ROW_INDEX start: 464 length 37
-    Stream: column 21 section ROW_INDEX start: 501 length 11
-    Stream: column 22 section ROW_INDEX start: 512 length 24
-    Stream: column 23 section ROW_INDEX start: 536 length 37
-    Stream: column 1 section DATA start: 573 length 5
-    Stream: column 2 section DATA start: 578 length 6
-    Stream: column 3 section DATA start: 584 length 9
-    Stream: column 4 section DATA start: 593 length 11
-    Stream: column 5 section DATA start: 604 length 12
-    Stream: column 6 section DATA start: 616 length 11
-    Stream: column 7 section DATA start: 627 length 15
-    Stream: column 8 section DATA start: 642 length 8
-    Stream: column 8 section LENGTH start: 650 length 6
-    Stream: column 9 section DATA start: 656 length 8
-    Stream: column 9 section LENGTH start: 664 length 6
-    Stream: column 11 section LENGTH start: 670 length 6
-    Stream: column 13 section DATA start: 676 length 7
-    Stream: column 14 section DATA start: 683 length 6
-    Stream: column 14 section LENGTH start: 689 length 6
-    Stream: column 14 section DICTIONARY_DATA start: 695 length 10
-    Stream: column 15 section LENGTH start: 705 length 6
-    Stream: column 17 section DATA start: 711 length 25
-    Stream: column 18 section DATA start: 736 length 18
-    Stream: column 18 section LENGTH start: 754 length 8
-    Stream: column 19 section LENGTH start: 762 length 6
-    Stream: column 20 section DATA start: 768 length 15
-    Stream: column 20 section LENGTH start: 783 length 6
-    Stream: column 22 section DATA start: 789 length 6
-    Stream: column 23 section DATA start: 795 length 15
-    Stream: column 23 section LENGTH start: 810 length 6
-    Encoding column 0: DIRECT
-    Encoding column 1: DIRECT
-    Encoding column 2: DIRECT
-    Encoding column 3: DIRECT_V2
-    Encoding column 4: DIRECT_V2
-    Encoding column 5: DIRECT_V2
-    Encoding column 6: DIRECT
-    Encoding column 7: DIRECT
-    Encoding column 8: DIRECT_V2
-    Encoding column 9: DIRECT_V2
-    Encoding column 10: DIRECT
-    Encoding column 11: DIRECT_V2
-    Encoding column 12: DIRECT
-    Encoding column 13: DIRECT_V2
-    Encoding column 14: DICTIONARY_V2[2]
-    Encoding column 15: DIRECT_V2
-    Encoding column 16: DIRECT
-    Encoding column 17: DIRECT_V2
-    Encoding column 18: DIRECT_V2
-    Encoding column 19: DIRECT_V2
-    Encoding column 20: DIRECT_V2
-    Encoding column 21: DIRECT
-    Encoding column 22: DIRECT_V2
-    Encoding column 23: DIRECT_V2
-
-File length: 1711 bytes
-Padding length: 0 bytes
-Padding ratio: 0%
-______________________________________________________________________
-</code></pre>
-
-<h2 id="java-data">Java Data</h2>
-
-<p>The data command prints the data in an ORC file as a JSON document. Each
-record is printed as a JSON object on a line. Each record is annotated with
-the fieldnames and a JSON representation that depends on the field’s type.</p>
-
-<h2 id="java-scan">Java Scan</h2>
-
-<p>The scan command reads the contents of the file without printing anything. It
-is primarily intended for benchmarking the Java reader without including the
-cost of printing the data out.</p>
-
-<h2 id="java-convert">Java Convert</h2>
-
-<p>The convert command reads several JSON files and converts them into a
-single ORC file.</p>
-
-<dl>
-  <dt>-o &lt;filename&gt;</dt>
-  <dd>Sets the output ORC filename, which defaults to output.orc</dd>
-  <dt>-s &lt;schema&gt;</dt>
-  <dd>Sets the schema for the ORC file. By default, the schema is automatically discovered.</dd>
-  <dt>-h</dt>
-  <dd>Print help</dd>
-</dl>
-
-<p>The automatic JSON schema discovery is equivalent to the json-schema tool
-below.</p>
-
-<h2 id="java-json-schema">Java JSON Schema</h2>
-
-<p>The JSON Schema discovery tool processes a set of JSON documents and
-produces a schema that encompasses all of the records in all of the
-documents. It works by computing the enclosing type and promoting it
-to include all of the observed values.</p>
-
-<dl>
-  <dt>-f</dt>
-  <dd>Print the schema as a list of flat types for each subfield</dd>
-  <dt>-t</dt>
-  <dd>Print the schema as a Hive table declaration</dd>
-  <dt>-h</dt>
-  <dd>Print help</dd>
-</dl>
-
-          
-
-
-
-
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-    <div class="section-nav">
-      <div class="left align-right">
-          
-            
-            
-            <a href="/docs/cpp-tools.html" class="prev">Back</a>
-          
-      </div>
-      <div class="right align-left">
-          
-            
-            
-            <a href="/docs/spec-intro.html" class="next">Next</a>
-          
-      </div>
-    </div>
-    <div class="clear"></div>
-    
-
-        </article>
-      </div>
-
-      <div class="unit one-fifth hide-on-mobiles">
-  <aside>
-    
-    <h4>Overview</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/index.html">Background</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/types.html">Types</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
-      
+<p>The subcommands for the tools are:</p>
 
+<ul>
+  <li>meta - print the metadata of an ORC file</li>
+  <li>data - print the data of an ORC file</li>
+  <li>scan (since ORC 1.3) - scan the data for benchmarking</li>
+  <li>convert (since ORC 1.4) - convert JSON files to ORC</li>
+  <li>json-schema (since ORC 1.4) - determine the schema of JSON documents</li>
+</ul>
 
-  
+<p>The command line looks like:</p>
 
-  
-    
-  
+<pre><code class="language-shell">% java -jar orc-tools-X.Y.Z-uber.jar &lt;sub-command&gt; &lt;args&gt;
+</code></pre>
 
-  
-    
-      <li class=""><a href="/docs/acid.html">ACID support</a></li>
-      
+<h2 id="java-meta">Java Meta</h2>
 
+<p>The meta command prints the metadata about the given ORC file and is
+equivalent to the Hive ORC File Dump command.</p>
 
-</ul>
+<dl>
+  <dt>-j</dt>
+  <dd>format the output in JSON</dd>
+  <dt>-p</dt>
+  <dd>pretty print the output</dd>
+  <dt>-t</dt>
+  <dd>print the timezone of the writer</dd>
+  <dt>--rowindex</dt>
+  <dd>print the row indexes for the comma separated list of column ids</dd>
+  <dt>--recover</dt>
+  <dd>skip over corrupted values in the ORC file</dd>
+  <dt>--skip-dump</dt>
+  <dd>skip dumping the metadata</dd>
+  <dt>--backup-path</dt>
+  <dd>when used with --recover specifies the path where the recovered file is written</dd>
+</dl>
 
-    
-    <h4>Installing</h4>
-    
+<p>An example of the output is given below:</p>
 
-<ul>
+<pre><code class="language-shell">% java -jar orc-tools-X.Y.Z-uber.jar meta examples/TestOrcFile.test1.orc
+Processing data file examples/TestOrcFile.test1.orc [length: 1711]
+Structure for examples/TestOrcFile.test1.orc
+File Version: 0.12 with HIVE_8732
+Rows: 2
+Compression: ZLIB
+Compression size: 10000
+Type: struct&lt;boolean1:boolean,byte1:tinyint,short1:smallint,int1:int,
+long1:bigint,float1:float,double1:double,bytes1:binary,string1:string,
+middle:struct&lt;list:array&lt;struct&lt;int1:int,string1:string&gt;&gt;&gt;,list:array&lt;
+struct&lt;int1:int,string1:string&gt;&gt;,map:map&lt;string,struct&lt;int1:int,string1:
+string&gt;&gt;&gt;
 
-  
+Stripe Statistics:
+  Stripe 1:
+    Column 0: count: 2 hasNull: false
+    Column 1: count: 2 hasNull: false true: 1
+    Column 2: count: 2 hasNull: false min: 1 max: 100 sum: 101
+    Column 3: count: 2 hasNull: false min: 1024 max: 2048 sum: 3072
+    Column 4: count: 2 hasNull: false min: 65536 max: 65536 sum: 131072
+    Column 5: count: 2 hasNull: false min: 9223372036854775807 max: 9223372036854775807
+    Column 6: count: 2 hasNull: false min: 1.0 max: 2.0 sum: 3.0
+    Column 7: count: 2 hasNull: false min: -15.0 max: -5.0 sum: -20.0
+    Column 8: count: 2 hasNull: false sum: 5
+    Column 9: count: 2 hasNull: false min: bye max: hi sum: 5
+    Column 10: count: 2 hasNull: false
+    Column 11: count: 2 hasNull: false
+    Column 12: count: 4 hasNull: false
+    Column 13: count: 4 hasNull: false min: 1 max: 2 sum: 6
+    Column 14: count: 4 hasNull: false min: bye max: sigh sum: 14
+    Column 15: count: 2 hasNull: false
+    Column 16: count: 5 hasNull: false
+    Column 17: count: 5 hasNull: false min: -100000 max: 100000000 sum: 99901241
+    Column 18: count: 5 hasNull: false min: bad max: in sum: 15
+    Column 19: count: 2 hasNull: false
+    Column 20: count: 2 hasNull: false min: chani max: mauddib sum: 12
+    Column 21: count: 2 hasNull: false
+    Column 22: count: 2 hasNull: false min: 1 max: 5 sum: 6
+    Column 23: count: 2 hasNull: false min: chani max: mauddib sum: 12
 
-  
-    
-  
+File Statistics:
+  Column 0: count: 2 hasNull: false
+  Column 1: count: 2 hasNull: false true: 1
+  Column 2: count: 2 hasNull: false min: 1 max: 100 sum: 101
+  Column 3: count: 2 hasNull: false min: 1024 max: 2048 sum: 3072
+  Column 4: count: 2 hasNull: false min: 65536 max: 65536 sum: 131072
+  Column 5: count: 2 hasNull: false min: 9223372036854775807 max: 9223372036854775807
+  Column 6: count: 2 hasNull: false min: 1.0 max: 2.0 sum: 3.0
+  Column 7: count: 2 hasNull: false min: -15.0 max: -5.0 sum: -20.0
+  Column 8: count: 2 hasNull: false sum: 5
+  Column 9: count: 2 hasNull: false min: bye max: hi sum: 5
+  Column 10: count: 2 hasNull: false
+  Column 11: count: 2 hasNull: false
+  Column 12: count: 4 hasNull: false
+  Column 13: count: 4 hasNull: false min: 1 max: 2 sum: 6
+  Column 14: count: 4 hasNull: false min: bye max: sigh sum: 14
+  Column 15: count: 2 hasNull: false
+  Column 16: count: 5 hasNull: false
+  Column 17: count: 5 hasNull: false min: -100000 max: 100000000 sum: 99901241
+  Column 18: count: 5 hasNull: false min: bad max: in sum: 15
+  Column 19: count: 2 hasNull: false
+  Column 20: count: 2 hasNull: false min: chani max: mauddib sum: 12
+  Column 21: count: 2 hasNull: false
+  Column 22: count: 2 hasNull: false min: 1 max: 5 sum: 6
+  Column 23: count: 2 hasNull: false min: chani max: mauddib sum: 12
 
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/building.html">Building ORC</a></li>
-      
+Stripes:
+  Stripe: offset: 3 data: 243 rows: 2 tail: 199 index: 570
+    Stream: column 0 section ROW_INDEX start: 3 length 11
+    Stream: column 1 section ROW_INDEX start: 14 length 22
+    Stream: column 2 section ROW_INDEX start: 36 length 26
+    Stream: column 3 section ROW_INDEX start: 62 length 27
+    Stream: column 4 section ROW_INDEX start: 89 length 30
+    Stream: column 5 section ROW_INDEX start: 119 length 28
+    Stream: column 6 section ROW_INDEX start: 147 length 34
+    Stream: column 7 section ROW_INDEX start: 181 length 34
+    Stream: column 8 section ROW_INDEX start: 215 length 21
+    Stream: column 9 section ROW_INDEX start: 236 length 30
+    Stream: column 10 section ROW_INDEX start: 266 length 11
+    Stream: column 11 section ROW_INDEX start: 277 length 16
+    Stream: column 12 section ROW_INDEX start: 293 length 11
+    Stream: column 13 section ROW_INDEX start: 304 length 24
+    Stream: column 14 section ROW_INDEX start: 328 length 31
+    Stream: column 15 section ROW_INDEX start: 359 length 16
+    Stream: column 16 section ROW_INDEX start: 375 length 11
+    Stream: column 17 section ROW_INDEX start: 386 length 32
+    Stream: column 18 section ROW_INDEX start: 418 length 30
+    Stream: column 19 section ROW_INDEX start: 448 length 16
+    Stream: column 20 section ROW_INDEX start: 464 length 37
+    Stream: column 21 section ROW_INDEX start: 501 length 11
+    Stream: column 22 section ROW_INDEX start: 512 length 24
+    Stream: column 23 section ROW_INDEX start: 536 length 37
+    Stream: column 1 section DATA start: 573 length 5
+    Stream: column 2 section DATA start: 578 length 6
+    Stream: column 3 section DATA start: 584 length 9
+    Stream: column 4 section DATA start: 593 length 11
+    Stream: column 5 section DATA start: 604 length 12
+    Stream: column 6 section DATA start: 616 length 11
+    Stream: column 7 section DATA start: 627 length 15
+    Stream: column 8 section DATA start: 642 length 8
+    Stream: column 8 section LENGTH start: 650 length 6
+    Stream: column 9 section DATA start: 656 length 8
+    Stream: column 9 section LENGTH start: 664 length 6
+    Stream: column 11 section LENGTH start: 670 length 6
+    Stream: column 13 section DATA start: 676 length 7
+    Stream: column 14 section DATA start: 683 length 6
+    Stream: column 14 section LENGTH start: 689 length 6
+    Stream: column 14 section DICTIONARY_DATA start: 695 length 10
+    Stream: column 15 section LENGTH start: 705 length 6
+    Stream: column 17 section DATA start: 711 length 25
+    Stream: column 18 section DATA start: 736 length 18
+    Stream: column 18 section LENGTH start: 754 length 8
+    Stream: column 19 section LENGTH start: 762 length 6
+    Stream: column 20 section DATA start: 768 length 15
+    Stream: column 20 section LENGTH start: 783 length 6
+    Stream: column 22 section DATA start: 789 length 6
+    Stream: column 23 section DATA start: 795 length 15
+    Stream: column 23 section LENGTH start: 810 length 6
+    Encoding column 0: DIRECT
+    Encoding column 1: DIRECT
+    Encoding column 2: DIRECT
+    Encoding column 3: DIRECT_V2
+    Encoding column 4: DIRECT_V2
+    Encoding column 5: DIRECT_V2
+    Encoding column 6: DIRECT
+    Encoding column 7: DIRECT
+    Encoding column 8: DIRECT_V2
+    Encoding column 9: DIRECT_V2
+    Encoding column 10: DIRECT
+    Encoding column 11: DIRECT_V2
+    Encoding column 12: DIRECT
+    Encoding column 13: DIRECT_V2
+    Encoding column 14: DICTIONARY_V2[2]
+    Encoding column 15: DIRECT_V2
+    Encoding column 16: DIRECT
+    Encoding column 17: DIRECT_V2
+    Encoding column 18: DIRECT_V2
+    Encoding column 19: DIRECT_V2
+    Encoding column 20: DIRECT_V2
+    Encoding column 21: DIRECT
+    Encoding column 22: DIRECT_V2
+    Encoding column 23: DIRECT_V2
 
+File length: 1711 bytes
+Padding length: 0 bytes
+Padding ratio: 0%
+______________________________________________________________________
+</code></pre>
 
-  
+<h2 id="java-data">Java Data</h2>
 
-  
-    
-  
+<p>The data command prints the data in an ORC file as a JSON document. Each
+record is printed as a JSON object on a line. Each record is annotated with
+the fieldnames and a JSON representation that depends on the field’s type.</p>
 
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/releases.html">Releases</a></li>
-      
+<h2 id="java-scan">Java Scan</h2>
+
+<p>The scan command reads the contents of the file without printing anything. It
+is primarily intended for benchmarking the Java reader without including the
+cost of printing the data out.</p>
+
+<h2 id="java-convert">Java Convert</h2>
 
+<p>The convert command reads several JSON files and converts them into a
+single ORC file.</p>
+
+<dl>
+  <dt>-o &lt;filename&gt;</dt>
+  <dd>Sets the output ORC filename, which defaults to output.orc</dd>
+  <dt>-s &lt;schema&gt;</dt>
+  <dd>Sets the schema for the ORC file. By default, the schema is automatically discovered.</dd>
+  <dt>-h</dt>
+  <dd>Print help</dd>
+</dl>
+
+<p>The automatic JSON schema discovery is equivalent to the json-schema tool
+below.</p>
+
+<h2 id="java-json-schema">Java JSON Schema</h2>
+
+<p>The JSON Schema discovery tool processes a set of JSON documents and
+produces a schema that encompasses all of the records in all of the
+documents. It works by computing the enclosing type and promoting it
+to include all of the observed values.</p>
+
+<dl>
+  <dt>-f</dt>
+  <dd>Print the schema as a list of flat types for each subfield</dd>
+  <dt>-t</dt>
+  <dd>Print the schema as a Hive table declaration</dd>
+  <dt>-h</dt>
+  <dd>Print help</dd>
+</dl>
+
+          
 
-</ul>
 
-    
-    <h4>Using in Hive</h4>
-    
 
-<ul>
 
-  
 
   
-    
   
 
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
+  
   
-    
-      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
-      
 
+  
+  
 
   
+  
 
   
-    
   
 
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
+    <div class="section-nav">
+      <div class="left align-right">
+          
+            
+            
+            <a href="/docs/cpp-tools.html" class="prev">Back</a>
+          
+      </div>
+      <div class="right align-left">
+          
+            <span class="next disabled">Next</span>
+          
+      </div>
+    </div>
+    <div class="clear"></div>
     
-      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
-      
 
+        </article>
+      </div>
 
-</ul>
-
+      <div class="unit one-fifth hide-on-mobiles">
+  <aside>
     
-    <h4>Using in MapReduce</h4>
+    <h4>Overview</h4>
     
 
 <ul>
@@ -1879,19 +1014,21 @@ to include all of the observed values.</p>
     
   
     
+      <li class=""><a href="/docs/index.html">Background</a></li>
+      
+
+
   
-    
-  
-    
+
   
     
   
-    
+
   
     
   
     
-      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
+      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
       
 
 
@@ -1931,20 +1068,10 @@ to include all of the observed values.</p>
     
   
     
-  
-    
-      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
+      <li class=""><a href="/docs/types.html">Types</a></li>
       
 
 
-</ul>
-
-    
-    <h4>Using ORC Core</h4>
-    
-
-<ul>
-
   
 
   
@@ -1963,34 +1090,34 @@ to include all of the observed values.</p>
     
   
     
-      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
-      
-
-
-  
-
   
     
   
-
-  
     
   
     
   
     
+      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
+      
+
+
+  
+
   
     
   
+
+  
     
-      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      <li class=""><a href="/docs/acid.html">ACID support</a></li>
       
 
 
 </ul>
 
     
-    <h4>Tools</h4>
+    <h4>Installing</h4>
     
 
 <ul>
@@ -2007,15 +1134,7 @@ to include all of the observed values.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      <li class=""><a href="/docs/building.html">Building ORC</a></li>
       
 
 
@@ -2053,14 +1172,14 @@ to include all of the observed values.</p>
     
   
     
-      <li class="current"><a href="/docs/java-tools.html">Java Tools</a></li>
+      <li class=""><a href="/docs/releases.html">Releases</a></li>
       
 
 
 </ul>
 
     
-    <h4>Format Specification</h4>
+    <h4>Using in Hive</h4>
     
 
 <ul>
@@ -2087,31 +1206,7 @@ to include all of the observed values.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-intro.html">Introduction</a></li>
+      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
       
 
 
@@ -2135,31 +1230,17 @@ to include all of the observed values.</p>
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/file-tail.html">File Tail</a></li>
+      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
       
 
 
-  
-
-  
-    
-  
+</ul>
 
-  
-    
-  
     
-  
-    
-  
+    <h4>Using in MapReduce</h4>
     
-      <li class=""><a href="/docs/compression.html">Compression</a></li>
-      
 
+<ul>
 
   
 
@@ -2191,19 +1272,7 @@ to include all of the observed values.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/run-length.html">Run Length Encoding</a></li>
+      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
       
 
 
@@ -2239,13 +1308,25 @@ to include all of the observed values.</p>
     
   
     
-  
+      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Using ORC Core</h4>
     
+
+<ul>
+
+  
+
   
     
   
+
+  
     
   
     
@@ -2255,7 +1336,7 @@ to include all of the observed values.</p>
     
   
     
-      <li class=""><a href="/docs/stripes.html">Stripes</a></li>
+      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
       
 
 
@@ -2273,17 +1354,17 @@ to include all of the observed values.</p>
     
   
     
-  
-    
-  
-    
-  
+      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Tools</h4>
     
-      <li class=""><a href="/docs/encodings.html">Column Encodings</a></li>
-      
 
+<ul>
 
   
 
@@ -2303,11 +1384,17 @@ to include all of the observed values.</p>
     
   
     
+      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      
+
+
   
-    
+
   
     
   
+
+  
     
   
     
@@ -2329,7 +1416,7 @@ to include all of the observed values.</p>
     
   
     
-      <li class=""><a href="/docs/spec-index.html">Indexes</a></li>
+      <li class="current"><a href="/docs/java-tools.html">Java Tools</a></li>
       
 
 


[2/9] orc git commit: Pushing ORC-339 reorganize the ORC file format spec.

Posted by om...@apache.org.
http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/specification/ORCv1.html
----------------------------------------------------------------------
diff --git a/specification/ORCv1.html b/specification/ORCv1.html
new file mode 100644
index 0000000..e3cad2e
--- /dev/null
+++ b/specification/ORCv1.html
@@ -0,0 +1,1744 @@
+<!DOCTYPE HTML>
+<html lang="en-US">
+<head>
+  <meta charset="UTF-8">
+  <title>ORC Specification v1</title>
+  <meta name="viewport" content="width=device-width,initial-scale=1">
+  <meta name="generator" content="Jekyll v2.4.0">
+  <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
+  <link rel="stylesheet" href="/css/screen.css">
+  <link rel="icon" type="image/x-icon" href="/favicon.ico">
+  <!--[if lt IE 9]>
+  <script src="/js/html5shiv.min.js"></script>
+  <script src="/js/respond.min.js"></script>
+  <![endif]-->
+</head>
+
+
+<body class="wrap">
+  <header role="banner">
+  <nav class="mobile-nav show-on-mobiles">
+    <ul>
+  <li class="">
+    <a href="/">Home</a>
+  </li>
+  <li class="">
+    <a href="/docs/"><span class="show-on-mobiles">Docs</span>
+                     <span class="hide-on-mobiles">Documentation</span></a>
+  </li>
+  <li class="">
+    <a href="/talks/">Talks</a>
+  </li>
+  <li class="">
+    <a href="/news/">News</a>
+  </li>
+  <li class="">
+    <a href="/help/">Help</a>
+  </li>
+  <li class="">
+    <a href="/develop/">Develop</a>
+  </li>
+</ul>
+
+  </nav>
+  <div class="grid">
+    <div class="unit one-third center-on-mobiles">
+      <h1>
+        <a href="/">
+          <span class="sr-only">Apache ORC</span>
+          <img src="/img/logo.png" width="249" height="101" alt="ORC Logo">
+        </a>
+      </h1>
+    </div>
+    <nav class="main-nav unit two-thirds hide-on-mobiles">
+      <ul>
+  <li class="">
+    <a href="/">Home</a>
+  </li>
+  <li class="">
+    <a href="/docs/"><span class="show-on-mobiles">Docs</span>
+                     <span class="hide-on-mobiles">Documentation</span></a>
+  </li>
+  <li class="">
+    <a href="/talks/">Talks</a>
+  </li>
+  <li class="">
+    <a href="/news/">News</a>
+  </li>
+  <li class="">
+    <a href="/help/">Help</a>
+  </li>
+  <li class="">
+    <a href="/develop/">Develop</a>
+  </li>
+</ul>
+
+    </nav>
+  </div>
+</header>
+
+
+  <section class="standalone">
+  <div class="grid">
+
+    <div class="unit whole">
+      <article>
+        <h1>ORC Specification v1</h1>
+        <p>This version of the file format was originally released as part of
+Hive 0.12.</p>
+
+<h1 id="motivation">Motivation</h1>
+
+<p>Hive’s RCFile was the standard format for storing tabular data in
+Hadoop for several years. However, RCFile has limitations because it
+treats each column as a binary blob without semantics. In Hive 0.11 we
+added a new file format named Optimized Row Columnar (ORC) file that
+uses and retains the type information from the table definition. ORC
+uses type-specific readers and writers that provide lightweight
+compression techniques such as dictionary encoding, bit packing, delta
+encoding, and run length encoding, resulting in dramatically smaller
+files. Additionally, ORC can apply generic compression using zlib or
+Snappy on top of the lightweight compression for even smaller
+files. However, storage savings are only part of the gain. ORC
+supports projection, which selects subsets of the columns for reading,
+so that queries reading only one column read only the required
+bytes. Furthermore, ORC files include lightweight indexes that
+include the minimum and maximum values for each column in each set of
+10,000 rows and the entire file. Using pushdown filters from Hive, the
+file reader can skip entire sets of rows that cannot match the
+query.</p>
+
+<p><img src="/img/OrcFileLayout.png" alt="ORC file structure" /></p>
+
+<h1 id="file-tail">File Tail</h1>
+
+<p>Since HDFS does not support changing the data in a file after it is
+written, ORC stores the top level index at the end of the file. The
+overall structure of the file is given in the figure above.  The
+file’s tail consists of three parts: the file metadata, the file footer, and
+the postscript.</p>
+
+<p>The metadata for ORC is stored using
+<a href="https://s.apache.org/protobuf_encoding">Protocol Buffers</a>, which provides
+the ability to add new fields without breaking readers. This document
+incorporates the Protobuf definition from the
+<a href="https://s.apache.org/orc_proto">ORC source code</a> and the
+reader is encouraged to review the Protobuf encoding if they need to
+understand the byte-level encoding.</p>
+
+<h2 id="postscript">Postscript</h2>
+
+<p>The Postscript section provides the necessary information to interpret
+the rest of the file including the length of the file’s Footer and
+Metadata sections, the version of the file, and the kind of general
+compression used (e.g. none, zlib, or snappy). The Postscript is never
+compressed and ends one byte before the end of the file. The version
+stored in the Postscript is the lowest version of Hive that is
+guaranteed to be able to read the file and it is stored as a sequence of
+the major and minor version. This file version is encoded as [0,12].</p>
+
+<p>The process of reading an ORC file works backwards through the
+file. Rather than making multiple short reads, the ORC reader reads
+the last 16k bytes of the file with the hope that it will contain both
+the Footer and Postscript sections. The final byte of the file
+contains the serialized length of the Postscript, which must be less
+than 256 bytes. Once the Postscript is parsed, the compressed
+serialized length of the Footer is known and it can be decompressed
+and parsed.</p>
+
+<p><code>message PostScript {
+ // the length of the footer section in bytes
+ optional uint64 footerLength = 1;
+ // the kind of generic compression used
+ optional CompressionKind compression = 2;
+ // the maximum size of each compression chunk
+ optional uint64 compressionBlockSize = 3;
+ // the version of the writer
+ repeated uint32 version = 4 [packed = true];
+ // the length of the metadata section in bytes
+ optional uint64 metadataLength = 5;
+ // the fixed string "ORC"
+ optional string magic = 8000;
+}
+</code></p>
+
+<p><code>enum CompressionKind {
+ NONE = 0;
+ ZLIB = 1;
+ SNAPPY = 2;
+ LZO = 3;
+ LZ4 = 4;
+ ZSTD = 5;
+}
+</code></p>
+
+<h2 id="footer">Footer</h2>
+
+<p>The Footer section contains the layout of the body of the file, the
+type schema information, the number of rows, and the statistics about
+each of the columns.</p>
+
+<p>The file is broken into three parts: Header, Body, and Tail. The
+Header consists of the bytes “ORC” to support tools that want to
+scan the front of the file to determine the type of the file. The Body
+contains the rows and indexes, and the Tail gives the file level
+information as described in this section.</p>
+
+<p><code>message Footer {
+ // the length of the file header in bytes (always 3)
+ optional uint64 headerLength = 1;
+ // the length of the file header and body in bytes
+ optional uint64 contentLength = 2;
+ // the information about the stripes
+ repeated StripeInformation stripes = 3;
+ // the schema information
+ repeated Type types = 4;
+ // the user metadata that was added
+ repeated UserMetadataItem metadata = 5;
+ // the total number of rows in the file
+ optional uint64 numberOfRows = 6;
+ // the statistics of each column across the file
+ repeated ColumnStatistics statistics = 7;
+ // the maximum number of rows in each index entry
+ optional uint32 rowIndexStride = 8;
+}
+</code></p>
+
+<h3 id="stripe-information">Stripe Information</h3>
+
+<p>The body of the file is divided into stripes. Each stripe is
+self-contained and may be read using only its own bytes combined with the
+file’s Footer and Postscript. Each stripe contains only entire rows so
+that rows never straddle stripe boundaries. Stripes have three
+sections: a set of indexes for the rows within the stripe, the data
+itself, and a stripe footer. Both the indexes and the data sections
+are divided by columns so that only the data for the required columns
+needs to be read.</p>
+
+<p><code>message StripeInformation {
+ // the start of the stripe within the file
+ optional uint64 offset = 1;
+ // the length of the indexes in bytes
+ optional uint64 indexLength = 2;
+ // the length of the data in bytes
+ optional uint64 dataLength = 3;
+ // the length of the footer in bytes
+ optional uint64 footerLength = 4;
+ // the number of rows in the stripe
+ optional uint64 numberOfRows = 5;
+}
+</code></p>
+
+<h3 id="type-information">Type Information</h3>
+
+<p>All of the rows in an ORC file must have the same schema. Logically
+the schema is expressed as a tree as in the figure below, where
+the compound types have subcolumns under them.</p>
+
+<p><img src="/img/TreeWriters.png" alt="ORC column structure" /></p>
+
+<p>The equivalent Hive DDL would be:</p>
+
+<p><code>create table Foobar (
+ myInt int,
+ myMap map&lt;string,
+ struct&lt;myString : string,
+ myDouble: double&gt;&gt;,
+ myTime timestamp
+);
+</code></p>
+
+<p>The type tree is flattened into a list via a pre-order traversal
+where each type is assigned the next id. Clearly the root of the type
+tree is always type id 0. Compound types have a field named subtypes
+that contains the list of their children’s type ids.</p>
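+
+<p>A small sketch of the pre-order id assignment, applied to the Foobar schema above. The tuple-based schema representation and the <code>flatten</code> helper are assumptions made for illustration only.</p>

```python
def flatten(schema):
    """Flatten a (kind, children) tree into a list of types in pre-order.

    The index of each entry in the returned list is its type id.
    """
    types = []

    def visit(node):
        kind, children = node
        type_id = len(types)          # next free id, assigned before children
        entry = {"kind": kind, "subtypes": []}
        types.append(entry)
        for child in children:
            entry["subtypes"].append(visit(child))
        return type_id

    visit(schema)
    return types


# The Foobar table: the root struct wraps the three top-level columns.
foobar = ("STRUCT", [
    ("INT", []),
    ("MAP", [("STRING", []),
             ("STRUCT", [("STRING", []), ("DOUBLE", [])])]),
    ("TIMESTAMP", []),
])
types = flatten(foobar)
```

+<p>Here the root struct receives id 0 with subtypes [1, 2, 7], the map receives id 2 with subtypes [3, 4], and the inner struct receives id 4 with subtypes [5, 6].</p>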
+
+<p><code>message Type {
+ enum Kind {
+ BOOLEAN = 0;
+ BYTE = 1;
+ SHORT = 2;
+ INT = 3;
+ LONG = 4;
+ FLOAT = 5;
+ DOUBLE = 6;
+ STRING = 7;
+ BINARY = 8;
+ TIMESTAMP = 9;
+ LIST = 10;
+ MAP = 11;
+ STRUCT = 12;
+ UNION = 13;
+ DECIMAL = 14;
+ DATE = 15;
+ VARCHAR = 16;
+ CHAR = 17;
+ }
+ // the kind of this type
+ required Kind kind = 1;
+ // the type ids of any subcolumns for list, map, struct, or union
+ repeated uint32 subtypes = 2 [packed=true];
+ // the list of field names for struct
+ repeated string fieldNames = 3;
+ // the maximum length of the type for varchar or char in UTF-8 characters
+ optional uint32 maximumLength = 4;
+ // the precision and scale for decimal
+ optional uint32 precision = 5;
+ optional uint32 scale = 6;
+}
+</code></p>
+
+<h3 id="column-statistics">Column Statistics</h3>
+
+<p>For each column, the writer records the count of values and,
+depending on the type, other useful fields. For
+most of the primitive types, it records the minimum and maximum
+values; and for numeric types it additionally stores the sum.
+From Hive 1.1.0 onwards, the column statistics will also record if
+there are any null values within the row group by setting the hasNull flag.
+The hasNull flag is used by ORC’s predicate pushdown to better answer
+‘IS NULL’ queries.</p>
+
+<p><code>message ColumnStatistics {
+ // the number of values
+ optional uint64 numberOfValues = 1;
+ // At most one of these has a value for any column
+ optional IntegerStatistics intStatistics = 2;
+ optional DoubleStatistics doubleStatistics = 3;
+ optional StringStatistics stringStatistics = 4;
+ optional BucketStatistics bucketStatistics = 5;
+ optional DecimalStatistics decimalStatistics = 6;
+ optional DateStatistics dateStatistics = 7;
+ optional BinaryStatistics binaryStatistics = 8;
+ optional TimestampStatistics timestampStatistics = 9;
+ optional bool hasNull = 10;
+}
+</code></p>
+
+<p>For integer types (tinyint, smallint, int, bigint), the column
+statistics include the minimum, maximum, and sum. If the sum
+overflows long at any point during the calculation, no sum is
+recorded.</p>
+
+<p><code>message IntegerStatistics {
+ optional sint64 minimum = 1;
+ optional sint64 maximum = 2;
+ optional sint64 sum = 3;
+}
+</code></p>
+
+<p>For floating point types (float, double), the column statistics
+include the minimum, maximum, and sum. If the sum overflows a double,
+no sum is recorded.</p>
+
+<p><code>message DoubleStatistics {
+ optional double minimum = 1;
+ optional double maximum = 2;
+ optional double sum = 3;
+}
+</code></p>
+
+<p>For strings, the minimum value, maximum value, and the sum of the
+lengths of the values are recorded.</p>
+
+<p><code>message StringStatistics {
+ optional string minimum = 1;
+ optional string maximum = 2;
+ // sum will store the total length of all strings
+ optional sint64 sum = 3;
+}
+</code></p>
+
+<p>For booleans, the statistics include the count of false and true values.</p>
+
+<p><code>message BucketStatistics {
+ repeated uint64 count = 1 [packed=true];
+}
+</code></p>
+
+<p>For decimals, the minimum, maximum, and sum are stored.</p>
+
+<p><code>message DecimalStatistics {
+ optional string minimum = 1;
+ optional string maximum = 2;
+ optional string sum = 3;
+}
+</code></p>
+
+<p>Date columns record the minimum and maximum values as the number of
+days since the UNIX epoch (1/1/1970).</p>
+
+<p><code>message DateStatistics {
+ // min,max values saved as days since epoch
+ optional sint32 minimum = 1;
+ optional sint32 maximum = 2;
+}
+</code></p>
+
+<p>Timestamp columns record the minimum and maximum values as the number of
+milliseconds since the UNIX epoch (1/1/1970).</p>
+
+<p><code>message TimestampStatistics {
+ // min,max values saved as milliseconds since epoch
+ optional sint64 minimum = 1;
+ optional sint64 maximum = 2;
+}
+</code></p>
+
+<p>Binary columns store the aggregate number of bytes across all of the values.</p>
+
+<p><code>message BinaryStatistics {
+ // sum will store the total binary blob length
+ optional sint64 sum = 1;
+}
+</code></p>
+
+<h3 id="user-metadata">User Metadata</h3>
+
+<p>The user can add arbitrary key/value pairs to an ORC file as it is
+written. The contents of the keys and values are completely
+application defined, but the key is a string and the value is
+binary. Care should be taken by applications to make sure that their
+keys are unique and in general should be prefixed with an organization
+code.</p>
+
+<p><code>message UserMetadataItem {
+ // the user defined key
+ required string name = 1;
+ // the user defined binary value
+ required bytes value = 2;
+}
+</code></p>
+
+<h3 id="file-metadata">File Metadata</h3>
+
+<p>The file Metadata section contains column statistics at the stripe
+level granularity. These statistics enable input split elimination
+based on the predicate push-down evaluated per stripe.</p>
+
+<p><code>message StripeStatistics {
+ repeated ColumnStatistics colStats = 1;
+}
+</code></p>
+
+<p><code>message Metadata {
+ repeated StripeStatistics stripeStats = 1;
+}
+</code></p>
+
+<h1 id="compression">Compression</h1>
+
+<p>If the ORC file writer selects a generic compression codec (zlib or
+snappy), every part of the ORC file except for the Postscript is
+compressed with that codec. However, one of the requirements for ORC
+is that the reader be able to skip over compressed bytes without
+decompressing the entire stream. To manage this, ORC writes compressed
+streams in chunks with headers as in the figure below.
+To handle incompressible data, if the compressed data is larger than
+the original, the original is stored and the isOriginal flag is
+set. Each header is 3 bytes long with (compressedLength * 2 +
+isOriginal) stored as a little endian value. For example, the header
+for a chunk that compressed to 100,000 bytes would be [0x40, 0x0d,
+0x03]. The header for 5 bytes that did not compress would be [0x0b,
+0x00, 0x00]. Each compression chunk is compressed independently so
+that as long as a decompressor starts at the top of a header, it can
+start decompressing without the previous bytes.</p>
+
+<p><img src="/img/CompressionStream.png" alt="compression streams" /></p>
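+
+<p>The 3 byte chunk header can be sketched as below. The function names are hypothetical; the arithmetic follows the (compressedLength * 2 + isOriginal) rule and reproduces both header examples given above.</p>

```python
def encode_chunk_header(length: int, is_original: bool) -> bytes:
    """Build the 3 byte little endian header for one compression chunk."""
    if not 0 <= length < (1 << 23):
        raise ValueError("chunk length does not fit in 23 bits")
    return (length * 2 + int(is_original)).to_bytes(3, "little")


def decode_chunk_header(header: bytes):
    """Return (chunk length in bytes, is_original flag) from a header."""
    value = int.from_bytes(header[:3], "little")
    return value >> 1, bool(value & 1)
```

+<p>A reader that only wants to skip a chunk can decode its header and advance by the returned length without decompressing anything.</p>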
+
+<p>The default compression chunk size is 256K, but writers can choose
+their own value. Larger chunks lead to better compression, but require
+more memory. The chunk size is recorded in the Postscript so that
+readers can allocate appropriately sized buffers. Readers are
+guaranteed that no chunk will expand to more than the compression chunk
+size.</p>
+
+<p>ORC files without generic compression write each stream directly
+with no headers.</p>
+
+<h1 id="run-length-encoding">Run Length Encoding</h1>
+
+<h2 id="base-128-varint">Base 128 Varint</h2>
+
+<p>Variable width integer encodings take advantage of the fact that most
+numbers are small and that having smaller encodings for small numbers
+shrinks the overall size of the data. ORC uses the varint format from
+Protocol Buffers, which writes data in little endian format using the
+low 7 bits of each byte. The high bit in each byte is set if the
+number continues into the next byte.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Unsigned Original</th>
+      <th style="text-align: left">Serialized</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">0</td>
+      <td style="text-align: left">0x00</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">1</td>
+      <td style="text-align: left">0x01</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">127</td>
+      <td style="text-align: left">0x7f</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">128</td>
+      <td style="text-align: left">0x80, 0x01</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">129</td>
+      <td style="text-align: left">0x81, 0x01</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">16,383</td>
+      <td style="text-align: left">0xff, 0x7f</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">16,384</td>
+      <td style="text-align: left">0x80, 0x80, 0x01</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">16,385</td>
+      <td style="text-align: left">0x81, 0x80, 0x01</td>
+    </tr>
+  </tbody>
+</table>
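+
+<p>A minimal sketch of the unsigned varint encoding shown in the table. The helper names are illustrative and are not taken from any ORC implementation.</p>

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base 128 varint."""
    if value < 0:
        raise ValueError("unsigned varints cannot encode negative numbers")
    out = bytearray()
    while True:
        low = value & 0x7f            # low 7 bits go out first
        value >>= 7
        if value:
            out.append(low | 0x80)    # high bit set: more bytes follow
        else:
            out.append(low)
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0):
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7f) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```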
+
+<p>For signed integer types, the number is converted into an unsigned
+number using a zigzag encoding. Zigzag encoding moves the sign bit to
+the least significant bit using the expression (val &lt;&lt; 1) ^ (val &gt;&gt;
+63) and derives its name from the fact that positive and negative
+numbers alternate once encoded. The unsigned number is then serialized
+as above.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Signed Original</th>
+      <th style="text-align: left">Unsigned</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">0</td>
+      <td style="text-align: left">0</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">-1</td>
+      <td style="text-align: left">1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">1</td>
+      <td style="text-align: left">2</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">-2</td>
+      <td style="text-align: left">3</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">2</td>
+      <td style="text-align: left">4</td>
+    </tr>
+  </tbody>
+</table>
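+
+<p>The zigzag mapping in the table can be sketched as follows. Python integers are unbounded, so this sketch masks the result to 64 bits to mimic a 64 bit long; the function names are illustrative.</p>

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF


def zigzag_encode(val: int) -> int:
    """Map a signed 64 bit value to an unsigned one (sign in the low bit)."""
    # val >> 63 is an arithmetic shift: 0 for non-negative, -1 for negative.
    return ((val << 1) ^ (val >> 63)) & MASK_64


def zigzag_decode(encoded: int) -> int:
    """Invert zigzag_encode."""
    return (encoded >> 1) ^ -(encoded & 1)
```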
+
+<h2 id="byte-run-length-encoding">Byte Run Length Encoding</h2>
+
+<p>For byte streams, ORC uses a very lightweight encoding of identical
+values.</p>
+
+<ul>
+  <li>Run - a sequence of at least 3 identical values</li>
+  <li>Literals - a sequence of non-identical values</li>
+</ul>
+
+<p>The first byte of each group of values is a header that determines
+whether it is a run (value between 0 and 127) or literal list (value
+between -128 and -1). For runs, the control byte is the length of the
+run minus the length of the minimal run (3) and the control byte for
+literal lists is the negative length of the list. For example, a
+hundred 0’s is encoded as [0x61, 0x00] and the sequence 0x44, 0x45
+would be encoded as [0xfe, 0x44, 0x45]. The next group can choose
+either of the encodings.</p>
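+
+<p>A decoder for this scheme might look like the sketch below. The name <code>decode_byte_rle</code> is hypothetical, and the sketch performs no validation of truncated input.</p>

```python
def decode_byte_rle(data: bytes) -> bytes:
    """Expand a byte run length encoded stream."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        control = data[pos]
        pos += 1
        if control < 0x80:
            # Run: control is (run length - 3), followed by the repeated byte.
            out.extend(data[pos:pos + 1] * (control + 3))
            pos += 1
        else:
            # Literals: control is the negated count as a signed byte.
            count = 0x100 - control
            out.extend(data[pos:pos + count])
            pos += count
    return bytes(out)
```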
+
+<h2 id="boolean-run-length-encoding">Boolean Run Length Encoding</h2>
+
+<p>For encoding boolean types, the bits are put in the bytes from most
+significant to least significant. The bytes are encoded using byte run
+length encoding as described in the previous section. For example,
+the byte sequence [0xff, 0x80] would be one true followed by
+seven false values.</p>
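+
+<p>The bit unpacking can be sketched as follows. The sketch repeats a compact byte run length decoder so that it stands alone, and it assumes the caller knows how many values to expect (the <code>count</code> parameter is an assumption of this example), since the last byte may carry padding bits.</p>

```python
def decode_boolean_rle(data: bytes, count: int) -> list:
    """Decode `count` booleans from a boolean run length encoded stream."""
    raw = bytearray()
    pos = 0
    while pos < len(data):
        control = data[pos]
        if control < 0x80:                      # run of (control + 3) bytes
            raw.extend(data[pos + 1:pos + 2] * (control + 3))
            pos += 2
        else:                                   # (256 - control) literal bytes
            n = 0x100 - control
            raw.extend(data[pos + 1:pos + 1 + n])
            pos += 1 + n
    # Bits are taken from most significant to least significant.
    bits = [bool((byte >> shift) & 1)
            for byte in raw for shift in range(7, -1, -1)]
    return bits[:count]
```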
+
+<h2 id="integer-run-length-encoding-version-1">Integer Run Length Encoding, version 1</h2>
+
+<p>In Hive 0.11 ORC files used Run Length Encoding version 1 (RLEv1),
+which provides a lightweight compression of signed or unsigned integer
+sequences. RLEv1 has two sub-encodings:</p>
+
+<ul>
+  <li>Run - a sequence of values that differ by a small fixed delta</li>
+  <li>Literals - a sequence of varint encoded values</li>
+</ul>
+
+<p>Runs start with an initial byte of 0x00 to 0x7f, which encodes the
+length of the run - 3. A second byte provides the fixed delta in the
+range of -128 to 127. Finally, the first value of the run is encoded
+as a base 128 varint.</p>
+
+<p>For example, if the sequence is 100 instances of 7 the encoding would
+start with 100 - 3, followed by a delta of 0, and a varint of 7 for
+an encoding of [0x61, 0x00, 0x07]. To encode the sequence of numbers
+running from 100 to 1, the first byte is 100 - 3, the delta is -1,
+and the varint is 100 for an encoding of [0x61, 0xff, 0x64].</p>
+
+<p>Literals start with an initial byte of 0x80 to 0xff, which corresponds
+to the negative of the number of literals in the sequence. Following the
+header byte, the list of N varints is encoded. Thus, if there are
+no runs, the overhead is 1 byte for each 128 integers. The first 5
+prime numbers [2, 3, 5, 7, 11] would be encoded as [0xfb, 0x02, 0x03,
+0x05, 0x07, 0x0b].</p>
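+
+<p>A hedged sketch of an RLEv1 decoder covering both sub-encodings. The function names are illustrative. The <code>signed</code> flag assumes that signed streams zigzag encode their varints, as described in the varint section; the examples in this section are unsigned.</p>

```python
def read_varint(data: bytes, pos: int):
    """Read one base 128 varint; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7f) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def unzigzag(value: int) -> int:
    return (value >> 1) ^ -(value & 1)


def decode_rle_v1(data: bytes, signed: bool = False) -> list:
    """Expand an RLEv1 stream into a list of integers."""
    out = []
    pos = 0
    while pos < len(data):
        control = data[pos]
        pos += 1
        if control < 0x80:
            # Run: length - 3, then a signed delta byte, then the base varint.
            length = control + 3
            delta = data[pos] - 0x100 if data[pos] >= 0x80 else data[pos]
            pos += 1
            base, pos = read_varint(data, pos)
            if signed:
                base = unzigzag(base)
            out.extend(base + i * delta for i in range(length))
        else:
            # Literals: negated count, then that many varints.
            for _ in range(0x100 - control):
                value, pos = read_varint(data, pos)
                out.append(unzigzag(value) if signed else value)
    return out
```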
+
+<h2 id="integer-run-length-encoding-version-2">Integer Run Length Encoding, version 2</h2>
+
+<p>In Hive 0.12, ORC introduced Run Length Encoding version 2 (RLEv2),
+which has improved compression and fixed bit width encodings for
+faster expansion. RLEv2 uses four sub-encodings based on the data:</p>
+
+<ul>
+  <li>Short Repeat - used for short sequences with repeated values</li>
+  <li>Direct - used for random sequences with a fixed bit width</li>
+  <li>Patched Base - used for random sequences with a variable bit width</li>
+  <li>Delta - used for monotonically increasing or decreasing sequences</li>
+</ul>
+
+<h3 id="short-repeat">Short Repeat</h3>
+
+<p>The short repeat encoding is used for short repeating integer
+sequences with the goal of minimizing the overhead of the header. All
+of the bits listed in the header are from the first byte to the last
+and from most significant bit to least significant bit. If the type is
+signed, the value is zigzag encoded.</p>
+
+<ul>
+  <li>1 byte header
+    <ul>
+      <li>2 bits for encoding type (0)</li>
+      <li>3 bits for width (W) of repeating value (1 to 8 bytes)</li>
+      <li>3 bits for repeat count (3 to 10 values)</li>
+    </ul>
+  </li>
+  <li>W bytes in big endian format, which is zigzag encoded if the type
+is signed</li>
+</ul>
+
+<p>The unsigned sequence of [10000, 10000, 10000, 10000, 10000] would be
+serialized with short repeat encoding (0), a width of 2 bytes (1), and
+repeat count of 5 (2) as [0x0a, 0x27, 0x10].</p>
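+
+<p>Decoding a short repeat run can be sketched as below. The function name is hypothetical and the sketch handles a single run only.</p>

```python
def decode_short_repeat(data: bytes, signed: bool = False) -> list:
    """Decode one RLEv2 short repeat run."""
    header = data[0]
    if header >> 6 != 0:
        raise ValueError("not a short repeat header")
    width = ((header >> 3) & 0x07) + 1      # value width in bytes, 1 to 8
    count = (header & 0x07) + 3             # repeat count, 3 to 10
    value = int.from_bytes(data[1:1 + width], "big")
    if signed:
        value = (value >> 1) ^ -(value & 1)  # undo zigzag
    return [value] * count
```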
+
+<h3 id="direct">Direct</h3>
+
+<p>The direct encoding is used for integer sequences whose values have a
+relatively constant bit width. It encodes the values directly using a
+fixed width big endian encoding. The width of the values is encoded
+using the table below.</p>
+
+<p>The 5 bit width encoding table for RLEv2:</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Width in Bits</th>
+      <th style="text-align: left">Encoded Value</th>
+      <th style="text-align: left">Notes</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">0</td>
+      <td style="text-align: left">0</td>
+      <td style="text-align: left">for delta encoding</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">1</td>
+      <td style="text-align: left">0</td>
+      <td style="text-align: left">for non-delta encoding</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">2</td>
+      <td style="text-align: left">1</td>
+      <td style="text-align: left"> </td>
+    </tr>
+    <tr>
+      <td style="text-align: left">4</td>
+      <td style="text-align: left">3</td>
+      <td style="text-align: left"> </td>
+    </tr>
+    <tr>
+      <td style="text-align: left">8</td>
+      <td style="text-align: left">7</td>
+      <td style="text-align: left"> </td>
+    </tr>
+    <tr>
+      <td style="text-align: left">16</td>
+      <td style="text-align: left">15</td>
+      <td style="text-align: left"> </td>
+    </tr>
+    <tr>
+      <td style="text-align: left">24</td>
+      <td style="text-align: left">23</td>
+      <td style="text-align: left"> </td>
+    </tr>
+    <tr>
+      <td style="text-align: left">32</td>
+      <td style="text-align: left">27</td>
+      <td style="text-align: left"> </td>
+    </tr>
+    <tr>
+      <td style="text-align: left">40</td>
+      <td style="text-align: left">28</td>
+      <td style="text-align: left"> </td>
+    </tr>
+    <tr>
+      <td style="text-align: left">48</td>
+      <td style="text-align: left">29</td>
+      <td style="text-align: left"> </td>
+    </tr>
+    <tr>
+      <td style="text-align: left">56</td>
+      <td style="text-align: left">30</td>
+      <td style="text-align: left"> </td>
+    </tr>
+    <tr>
+      <td style="text-align: left">64</td>
+      <td style="text-align: left">31</td>
+      <td style="text-align: left"> </td>
+    </tr>
+    <tr>
+      <td style="text-align: left">3</td>
+      <td style="text-align: left">2</td>
+      <td style="text-align: left">deprecated</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">5 &lt;= x &lt;= 7</td>
+      <td style="text-align: left">x - 1</td>
+      <td style="text-align: left">deprecated</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">9 &lt;= x &lt;= 15</td>
+      <td style="text-align: left">x - 1</td>
+      <td style="text-align: left">deprecated</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">17 &lt;= x &lt;= 21</td>
+      <td style="text-align: left">x - 1</td>
+      <td style="text-align: left">deprecated</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">26</td>
+      <td style="text-align: left">24</td>
+      <td style="text-align: left">deprecated</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">28</td>
+      <td style="text-align: left">25</td>
+      <td style="text-align: left">deprecated</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">30</td>
+      <td style="text-align: left">26</td>
+      <td style="text-align: left">deprecated</td>
+    </tr>
+  </tbody>
+</table>
+
+<ul>
+  <li>2 bytes header
+    <ul>
+      <li>2 bits for encoding type (1)</li>
+      <li>5 bits for encoded width (W) of values (1 to 64 bits) using the 5 bit
+width encoding table</li>
+      <li>9 bits for length (L) (1 to 512 values)</li>
+    </ul>
+  </li>
+  <li>W * L bits (padded to the next byte) encoded in big endian format, which is
+zigzag encoded if the type is signed</li>
+</ul>
+
+<p>The unsigned sequence of [23713, 43806, 57005, 48879] would be
+serialized with direct encoding (1), a width of 16 bits (15), and
+length of 4 (3) as [0x5e, 0x03, 0x5c, 0xa1, 0xab, 0x1e, 0xde, 0xad,
+0xbe, 0xef].</p>
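+
+<p>A sketch of a direct decoder for a single run. The helper names are illustrative. <code>decode_width</code> inverts the 5 bit width table for non-delta encodings; treating every code up to 23 as width code + 1 is an assumption that also covers codes the table does not list.</p>

```python
def decode_width(code: int) -> int:
    """Invert the 5 bit width table (non-delta case)."""
    if code <= 23:
        return code + 1
    return {24: 26, 25: 28, 26: 30, 27: 32,
            28: 40, 29: 48, 30: 56, 31: 64}[code]


def read_bits(data: bytes, bit_pos: int, width: int) -> int:
    """Read `width` bits, most significant first, starting at bit_pos."""
    value = 0
    for i in range(bit_pos, bit_pos + width):
        value = (value << 1) | ((data[i >> 3] >> (7 - (i & 7))) & 1)
    return value


def decode_direct(data: bytes, signed: bool = False) -> list:
    """Decode one RLEv2 direct run."""
    if data[0] >> 6 != 1:
        raise ValueError("not a direct header")
    width = decode_width((data[0] >> 1) & 0x1f)
    length = (((data[0] & 1) << 8) | data[1]) + 1
    values = [read_bits(data, 16 + i * width, width) for i in range(length)]
    if signed:
        values = [(v >> 1) ^ -(v & 1) for v in values]  # undo zigzag
    return values
```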
+
+<h3 id="patched-base">Patched Base</h3>
+
+<p>The patched base encoding is used for integer sequences whose bit
+widths vary a lot. The minimum signed value of the sequence is found
+and subtracted from the other values. The bit width of those adjusted
+values is analyzed and the 90th percentile of the bit width is chosen
+as W. The 10% of values that need more than W bits use patches from a patch list
+to set the additional bits. Patches are encoded as a list of gaps in
+the index values and the additional value bits.</p>
+
+<ul>
+  <li>4 bytes header
+    <ul>
+      <li>2 bits for encoding type (2)</li>
+      <li>5 bits for encoded width (W) of values (1 to 64 bits) using the 5 bit
+  width encoding table</li>
+      <li>9 bits for length (L) (1 to 512 values)</li>
+      <li>3 bits for base value width (BW) (1 to 8 bytes)</li>
+      <li>5 bits for patch width (PW) (1 to 64 bits) using  the 5 bit width
+encoding table</li>
+      <li>3 bits for patch gap width (PGW) (1 to 8 bits)</li>
+      <li>5 bits for patch list length (PLL) (0 to 31 patches)</li>
+    </ul>
+  </li>
+  <li>Base value (BW bytes) - The base value is stored as a big endian value
+with negative values marked by the most significant bit set. If that
+bit is set, the entire value is negated.</li>
+  <li>Data values (W * L bits padded to the byte) - A sequence of W bit positive
+values that are added to the base value.</li>
+  <li>Patch list (PLL * (PGW + PW) bits padded to the byte) - A list of patches for values
+that didn’t fit within W bits. Each entry in the list consists of a
+gap, which is the number of elements skipped from the previous
+patch, and a patch value. Patches are applied by logically or’ing
+the data values with the relevant patch shifted W bits left. If a
+patch is 0, it was introduced to skip over more than 255 items. The
+combined length of each patch (PGW + PW) must be less than or equal to
+64.</li>
+</ul>
+
+<p>The unsigned sequence of [2030, 2000, 2020, 1000000, 2040, 2050, 2060, 2070,
+2080, 2090, 2100, 2110, 2120, 2130, 2140, 2150, 2160, 2170, 2180, 2190]
+has a minimum of 2000, which makes the adjusted
+sequence [30, 0, 20, 998000, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140,
+150, 160, 170, 180, 190]. It has an
+encoding of patched base (2), a bit width of 8 (7), a length of 20
+(19), a base value width of 2 bytes (1), a patch width of 12 bits (11),
+patch gap width of 2 bits (1), and a patch list length of 1 (1). The
+base value is 2000 and the combined result is [0x8e, 0x13, 0x2b, 0x21, 0x07,
+0xd0, 0x1e, 0x00, 0x14, 0x70, 0x28, 0x32, 0x3c, 0x46, 0x50, 0x5a, 0x64, 0x6e,
+0x78, 0x82, 0x8c, 0x96, 0xa0, 0xaa, 0xb4, 0xbe, 0xfc, 0xe8].</p>
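+
+<p>The example above can be checked with a small decoder sketch. The helper names are illustrative. The sketch reads each patch entry as exactly PGW + PW bits, following the description in this section literally; whether real writers round that entry width for some combinations is not covered here.</p>

```python
def decode_width(code: int) -> int:
    """Invert the 5 bit width table (non-delta case)."""
    if code <= 23:
        return code + 1
    return {24: 26, 25: 28, 26: 30, 27: 32,
            28: 40, 29: 48, 30: 56, 31: 64}[code]


def read_bits(data: bytes, bit_pos: int, width: int) -> int:
    """Read `width` bits, most significant first, starting at bit_pos."""
    value = 0
    for i in range(bit_pos, bit_pos + width):
        value = (value << 1) | ((data[i >> 3] >> (7 - (i & 7))) & 1)
    return value


def decode_patched_base(data: bytes) -> list:
    """Decode one RLEv2 patched base run."""
    b0, b1, b2, b3 = data[:4]
    if b0 >> 6 != 2:
        raise ValueError("not a patched base header")
    width = decode_width((b0 >> 1) & 0x1f)       # W, in bits
    length = (((b0 & 1) << 8) | b1) + 1          # L
    base_bytes = (b2 >> 5) + 1                   # BW, in bytes
    patch_width = decode_width(b2 & 0x1f)        # PW, in bits
    gap_width = (b3 >> 5) + 1                    # PGW, in bits
    patch_count = b3 & 0x1f                      # PLL

    # Base value: big endian, most significant bit is the sign flag.
    base = int.from_bytes(data[4:4 + base_bytes], "big")
    sign_bit = 1 << (base_bytes * 8 - 1)
    if base & sign_bit:
        base = -(base & (sign_bit - 1))

    bit = (4 + base_bytes) * 8
    values = [read_bits(data, bit + i * width, width) for i in range(length)]
    bit += (width * length + 7) // 8 * 8         # data is padded to a byte

    index = 0
    for _ in range(patch_count):
        gap = read_bits(data, bit, gap_width)
        patch = read_bits(data, bit + gap_width, patch_width)
        bit += gap_width + patch_width
        index += gap                             # gaps accumulate
        values[index] |= patch << width          # patch supplies the high bits
    return [base + v for v in values]
```

+<p>In the example, the single patch has a gap of 3 and a value of 0xf3a, which restores the fourth adjusted value to 998000 before the base of 2000 is added back.</p>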
+
+<h3 id="delta">Delta</h3>
+
+<p>The Delta encoding is used for monotonically increasing or decreasing
+sequences. The first two numbers in the sequence cannot be identical,
+because the encoding is using the sign of the first delta to determine
+if the series is increasing or decreasing.</p>
+
+<ul>
+  <li>2 bytes header
+    <ul>
+      <li>2 bits for encoding type (3)</li>
+      <li>5 bits for encoded width (W) of deltas (0 to 64 bits) using the 5 bit
+width encoding table</li>
+      <li>9 bits for run length (L) (1 to 512 values)</li>
+    </ul>
+  </li>
+  <li>Base value - encoded as (signed or unsigned) varint</li>
+  <li>Delta base - encoded as signed varint</li>
+  <li>Delta values (W * (L - 2) bits padded to the byte) - encode each delta after the first
+one. If the delta base is positive, the sequence is increasing and if it is
+negative the sequence is decreasing.</li>
+</ul>
+
+<p>The unsigned sequence of [2, 3, 5, 7, 11, 13, 17, 19, 23, 29] would be
+serialized with delta encoding (3), a width of 4 bits (3), length of
+10 (9), a base of 2 (2), and first delta of 1 (2). The resulting
+sequence is [0xc6, 0x09, 0x02, 0x02, 0x22, 0x42, 0x42, 0x46].</p>
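+
+<p>A sketch of a delta decoder for a single run, checked against the example above. The helper names are illustrative. It assumes that a width code of 0 means every delta equals the delta base, and that the packed deltas are magnitudes whose sign is taken from the delta base.</p>

```python
def read_varint(data: bytes, pos: int):
    """Read one base 128 varint; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7f) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def unzigzag(value: int) -> int:
    return (value >> 1) ^ -(value & 1)


def read_bits(data: bytes, bit_pos: int, width: int) -> int:
    """Read `width` bits, most significant first, starting at bit_pos."""
    value = 0
    for i in range(bit_pos, bit_pos + width):
        value = (value << 1) | ((data[i >> 3] >> (7 - (i & 7))) & 1)
    return value


def decode_delta(data: bytes, signed: bool = False) -> list:
    """Decode one RLEv2 delta run."""
    if data[0] >> 6 != 3:
        raise ValueError("not a delta header")
    code = (data[0] >> 1) & 0x1f
    wide = {24: 26, 25: 28, 26: 30, 27: 32, 28: 40, 29: 48, 30: 56, 31: 64}
    width = 0 if code == 0 else (code + 1 if code <= 23 else wide[code])
    length = (((data[0] & 1) << 8) | data[1]) + 1

    base, pos = read_varint(data, 2)
    if signed:
        base = unzigzag(base)
    raw, pos = read_varint(data, pos)
    delta_base = unzigzag(raw)                   # always a signed varint

    values = [base]
    if length > 1:
        values.append(base + delta_base)
    sign = -1 if delta_base < 0 else 1
    bit = pos * 8
    for i in range(length - 2):
        if width == 0:
            step = delta_base                    # fixed delta run
        else:
            step = sign * read_bits(data, bit + i * width, width)
        values.append(values[-1] + step)
    return values
```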
+
+<h1 id="stripes">Stripes</h1>
+
+<p>The body of ORC files consists of a series of stripes. Stripes are
+large (typically ~200MB) and independent of each other and are often
+processed by different tasks. The defining characteristic for columnar
+storage formats is that the data for each column is stored separately
+and that reading data out of the file should be proportional to the
+number of columns read.</p>
+
+<p>In ORC files, each column is stored in several streams that are stored
+next to each other in the file. For example, an integer column is
+represented as two streams: PRESENT, which uses one bit per
+value to record whether the value is non-null, and DATA, which records the
+non-null values. If all of a column’s values in a stripe are non-null,
+the PRESENT stream is omitted from the stripe. For binary data, ORC
+uses three streams: PRESENT, DATA, and LENGTH, which stores the length
+of each value. The details of each type will be presented in the
+following subsections.</p>
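+
+<p>The split between PRESENT and DATA can be sketched as follows, before any run length encoding is applied. The function name and the use of <code>None</code> for null are assumptions of this example.</p>

```python
def split_present(values: list):
    """Split a column with nulls into (present bits, non-null values).

    The present list is returned as None when every value is non-null,
    mirroring the omission of the PRESENT stream.
    """
    present = [v is not None for v in values]
    data = [v for v in values if v is not None]
    if all(present):
        return None, data
    return present, data
```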
+
+<h2 id="stripe-footer">Stripe Footer</h2>
+
+<p>The stripe footer contains the encoding of each column and the
+directory of the streams including their location.</p>
+
+<p><code>message StripeFooter {
+ // the location of each stream
+ repeated Stream streams = 1;
+ // the encoding of each column
+ repeated ColumnEncoding columns = 2;
+}
+</code></p>
+
+<p>To describe each stream, ORC stores the kind of stream, the column id,
+and the stream’s size in bytes. The details of what is stored in each stream
+depend on the type and encoding of the column.</p>
+
+<p><code>message Stream {
+ enum Kind {
+ // boolean stream of whether the next value is non-null
+ PRESENT = 0;
+ // the primary data stream
+ DATA = 1;
+ // the length of each value for variable length data
+ LENGTH = 2;
+ // the dictionary blob
+ DICTIONARY_DATA = 3;
+ // deprecated prior to Hive 0.11
+ // It was used to store the number of instances of each value in the
+ // dictionary
+ DICTIONARY_COUNT = 4;
+ // a secondary data stream
+ SECONDARY = 5;
+ // the index for seeking to particular row groups
+ ROW_INDEX = 6;
+ // original bloom filters used before ORC-101
+ BLOOM_FILTER = 7;
+ // bloom filters that consistently use utf8
+ BLOOM_FILTER_UTF8 = 8;
+ }
+ required Kind kind = 1;
+ // the column id
+ optional uint32 column = 2;
+ // the number of bytes in the file
+ optional uint64 length = 3;
+}
+</code></p>
+
+<p>Depending on the column’s type, several encodings are possible. The
+encodings are divided into direct or dictionary-based categories and
+further refined as to whether they use RLE v1 or v2.</p>
+
+<p><code>message ColumnEncoding {
+ enum Kind {
+ // the encoding is mapped directly to the stream using RLE v1
+ DIRECT = 0;
+ // the encoding uses a dictionary of unique values using RLE v1
+ DICTIONARY = 1;
+ // the encoding is direct using RLE v2
+ DIRECT_V2 = 2;
+ // the encoding is dictionary-based using RLE v2
+ DICTIONARY_V2 = 3;
+ }
+ required Kind kind = 1;
+ // for dictionary encodings, record the size of the dictionary
+ optional uint32 dictionarySize = 2;
+}
+</code></p>
+
+<h1 id="column-encodings">Column Encodings</h1>
+
+<h2 id="smallint-int-and-bigint-columns">SmallInt, Int, and BigInt Columns</h2>
+
+<p>All of the 16, 32, and 64 bit integer column types use the same set of
+potential encodings, which is basically whether they use RLE v1 or
+v2. If the PRESENT stream is not included, all of the values are
+present. For values that have false bits in the present stream, no
+values are included in the data stream.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Signed Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">DIRECT_V2</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Signed Integer RLE v2</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="float-and-double-columns">Float and Double Columns</h2>
+
+<p>Floating point types are stored using IEEE 754 floating point bit
+layout. Float columns use 4 bytes per value and double columns use 8
+bytes.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">IEEE 754 floating point representation</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="string-char-and-varchar-columns">String, Char, and VarChar Columns</h2>
+
+<p>String, char, and varchar columns may be encoded either using a
+dictionary encoding or a direct encoding. A direct encoding should be
+preferred when there are many distinct values. In all of the
+encodings, the PRESENT stream encodes whether the value is null. The
+Java ORC writer automatically picks the encoding after the first row
+group (10,000 rows).</p>
+
+<p>For direct encoding the UTF-8 bytes are saved in the DATA stream and
+the length of each value is written into the LENGTH stream. In direct
+encoding, if the values were [“Nevada”, “California”]; the DATA
+would be “NevadaCalifornia” and the LENGTH would be [6, 10].</p>
+
+<p>For dictionary encodings the dictionary is sorted and UTF-8 bytes of
+each unique value are placed into DICTIONARY_DATA. The length of each
+item in the dictionary is put into the LENGTH stream. The DATA stream
+consists of the sequence of references to the dictionary elements.</p>
+
+<p>In dictionary encoding, if the values were [“Nevada”,
+“California”, “Nevada”, “California”, and “Florida”]; the
+DICTIONARY_DATA would be “CaliforniaFloridaNevada” and LENGTH would
+be [10, 7, 6]. The DATA would be [2, 0, 2, 0, 1].</p>
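+
+<p>The dictionary example can be reproduced with the sketch below, which produces the three logical streams before run length encoding. The function name is hypothetical, and sorting by UTF-8 bytes is an assumption about the sort order.</p>

```python
def dictionary_encode(values: list):
    """Return (DICTIONARY_DATA bytes, LENGTH list, DATA list) for strings."""
    # Sorted unique values; byte-wise ordering is assumed here.
    dictionary = sorted(set(values), key=lambda s: s.encode("utf-8"))
    index = {value: i for i, value in enumerate(dictionary)}
    encoded = [value.encode("utf-8") for value in dictionary]
    dictionary_data = b"".join(encoded)
    lengths = [len(item) for item in encoded]
    data = [index[value] for value in values]   # references into dictionary
    return dictionary_data, lengths, data
```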
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">String contents</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">DICTIONARY</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DICTIONARY_DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">String contents</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">DIRECT_V2</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">String contents</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v2</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">DICTIONARY_V2</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v2</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DICTIONARY_DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">String contents</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v2</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="boolean-columns">Boolean Columns</h2>
+
+<p>Boolean columns are rare, but have a simple encoding.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="tinyint-columns">TinyInt Columns</h2>
+
+<p>TinyInt (byte) columns use byte run length encoding.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Byte RLE</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="binary-columns">Binary Columns</h2>
+
+<p>Binary data is encoded with a PRESENT stream, a DATA stream that records
+the contents, and a LENGTH stream that records the number of bytes per
+value.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">String contents</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">DIRECT_V2</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">String contents</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v2</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="decimal-columns">Decimal Columns</h2>
+
+<p>Decimal was introduced in Hive 0.11 with infinite precision (the total
+number of digits). In Hive 0.13, the definition was changed to limit
+the precision to a maximum of 38 digits, which conveniently uses 127
+bits plus a sign bit. The current encoding of decimal columns stores
+the integer representation of the value as an unbounded length zigzag
+encoded base 128 varint. The scale is stored in the SECONDARY stream
+as a signed integer.</p>
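+
+<p>A rough Python sketch of the DATA stream encoding for a single
+decimal value. The function names are hypothetical, and the sketch
+assumes the usual varint convention of seven payload bits per byte,
+least significant group first, with the high bit marking continuation.</p>
+

```python
def zigzag(n):
    """Map a signed integer of any size to a non-negative one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return n * 2 if n >= 0 else -n * 2 - 1


def encode_varint(u):
    """Encode a non-negative integer as base 128 bytes, low group first."""
    out = bytearray()
    while True:
        low = u & 0x7F
        u >>= 7
        if u:
            out.append(low | 0x80)  # more groups follow
        else:
            out.append(low)
            return bytes(out)


def decode_varint(buf):
    """Inverse of encode_varint followed by the inverse zigzag mapping."""
    u = 0
    for shift, byte in enumerate(buf):
        u |= (byte & 0x7F) << (7 * shift)
    return u // 2 if u % 2 == 0 else -(u + 1) // 2


# 123.45 with scale 2 has the unscaled integer 12345; the scale (2)
# would go to the SECONDARY stream.
encoded = encode_varint(zigzag(12345))
print(encoded.hex(), decode_varint(encoded))
```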
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unbounded base 128 varints</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">SECONDARY</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Signed Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">DIRECT_V2</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unbounded base 128 varints</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">SECONDARY</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Signed Integer RLE v2</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="date-columns">Date Columns</h2>
+
+<p>Date data is encoded with a PRESENT stream and a DATA stream that records
+the number of days after January 1, 1970 in UTC.</p>
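+
+<p>As a small illustration, the value written to the DATA stream for a
+date can be computed as below. The function name
+<code>days_since_epoch</code> is hypothetical; dates before 1970 give
+negative values, which is why the stream uses signed integers.</p>
+

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def days_since_epoch(d):
    """Number of days between January 1, 1970 and the given calendar date."""
    return (d - EPOCH).days


print(days_since_epoch(date(1970, 1, 2)), days_since_epoch(date(1969, 12, 31)))
```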
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Signed Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">DIRECT_V2</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Signed Integer RLE v2</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="timestamp-columns">Timestamp Columns</h2>
+
+<p>Timestamp records times down to nanoseconds as a PRESENT stream that
+records non-null values, a DATA stream that records the number of
+seconds after 1 January 2015, and a SECONDARY stream that records the
+number of nanoseconds.</p>
+
+<p>Because the number of nanoseconds often has a large number of trailing
+zeros, the number has trailing decimal zero digits removed and the
+last three bits are used to record how many zeros were removed. Zeros
+are only removed when there are at least two of them, and the three
+bits store one less than the number removed. Thus 1000 nanoseconds
+would be serialized as 0x0a and 100000 would be serialized as 0x0c.</p>
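+
+<p>A minimal Python sketch of this packing, modeled on my reading of
+the Java writer. It assumes that zeros are stripped only when at least
+two are present, that at most eight are stripped, and that the low
+three bits hold one less than the number removed. The function names
+are hypothetical.</p>
+

```python
def format_nanos(nanos):
    """Pack a nanosecond count (0..999,999,999) into the SECONDARY stream value."""
    if nanos == 0:
        return 0
    if nanos % 100 != 0:
        return nanos << 3          # fewer than two trailing zeros: store as-is
    nanos //= 100                  # two zeros removed, recorded as 1
    trailing = 1
    while nanos % 10 == 0 and trailing < 7:
        nanos //= 10
        trailing += 1
    return (nanos << 3) | trailing


def parse_nanos(serialized):
    """Inverse of format_nanos."""
    zeros = serialized & 7
    result = serialized >> 3
    if zeros:
        result *= 10 ** (zeros + 1)  # restore the removed zeros
    return result


for n in (1000, 100000, 123456789):
    print(n, hex(format_nanos(n)), parse_nanos(format_nanos(n)))
```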
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Signed Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">SECONDARY</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">DIRECT_V2</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Signed Integer RLE v2</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">SECONDARY</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v2</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="struct-columns">Struct Columns</h2>
+
+<p>Structs have no data themselves and delegate everything to their child
+columns except for their PRESENT stream. They have a child column
+for each of the fields.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="list-columns">List Columns</h2>
+
+<p>Lists are encoded as the PRESENT stream and a LENGTH stream with the
+number of items in each list. They have a single child column for the
+element values.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">DIRECT_V2</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v2</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="map-columns">Map Columns</h2>
+
+<p>Maps are encoded as the PRESENT stream and a LENGTH stream with the number
+of items in each map. They have a child column for the key and
+another child column for the value.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">DIRECT_V2</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v2</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="union-columns">Union Columns</h2>
+
+<p>Unions are encoded as the PRESENT stream and a tag stream that controls which
+potential variant is used. They have a child column for each variant of the
+union. Currently ORC union types are limited to 256 variants, which matches
+the Hive type model.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Byte RLE</td>
+    </tr>
+  </tbody>
+</table>
+
+<h1 id="indexes">Indexes</h1>
+
+<h2 id="row-group-index">Row Group Index</h2>
+
+<p>The row group indexes consist of a ROW_INDEX stream for each primitive
+column that has an entry for each row group. Row groups are controlled
+by the writer and default to 10,000 rows. Each RowIndexEntry gives the
+position of each stream for the column and the statistics for that row
+group.</p>
+
+<p>The index streams are placed at the front of the stripe, because in
+the default case of streaming they do not need to be read. They are
+only loaded when either predicate push down is being used or the
+reader seeks to a particular row.</p>
+
+<p><code>message RowIndexEntry {
+ repeated uint64 positions = 1 [packed=true];
+ optional ColumnStatistics statistics = 2;
+}
+</code></p>
+
+<p><code>message RowIndex {
+ repeated RowIndexEntry entry = 1;
+}
+</code></p>
+
+<p>To record positions, each stream needs a sequence of numbers. For
+uncompressed streams, the position is the byte offset of the RLE run’s
+start location followed by the number of values that need to be
+consumed from the run. In compressed streams, the first number is the
+start of the compression chunk in the stream, followed by the number
+of decompressed bytes that need to be consumed, and finally the number
+of values consumed in the RLE.</p>
+
+<p>For columns with multiple streams, the sequences of positions in each
+stream are concatenated. That was an unfortunate decision on my part
+that we should fix at some point, because it makes code that uses the
+indexes error-prone.</p>
+
+<p>Because dictionaries are accessed randomly, there is no position to
+record for the dictionary and the entire dictionary must be read even
+if only part of a stripe is being read.</p>
+
+<h2 id="bloom-filter-index">Bloom Filter Index</h2>
+
+<p>Bloom filters were added to ORC indexes in Hive 1.2.0.
+Predicate pushdown can make use of bloom filters to better prune
+the row groups that do not satisfy the filter condition.
+The bloom filter indexes consist of a BLOOM_FILTER stream for each
+column specified through the ‘orc.bloom.filter.columns’ table property.
+A BLOOM_FILTER stream records a bloom filter entry for each row
+group (10,000 rows by default) in a column. Only the row groups that
+satisfy min/max row index evaluation will be evaluated against the
+bloom filter index.</p>
+
+<p>Each BloomFilter message stores the number of hash functions (‘k’) used
+and the bitset backing the bloom filter. The original encoding (pre
+ORC-101) stored the bitset as a repeated sequence of longs in the
+bitset field with a little endian encoding (0x1 is bit 0 and 0x2 is
+bit 1). After ORC-101, the bitset is stored as a sequence of bytes
+with a little endian encoding in the utf8bitset field.</p>
+
+<p><code>message BloomFilter {
+ optional uint32 numHashFunctions = 1;
+ repeated fixed64 bitset = 2;
+ optional bytes utf8bitset = 3;
+}
+</code></p>
+
+<p><code>message BloomFilterIndex {
+ repeated BloomFilter bloomFilter = 1;
+}
+</code></p>
+
+<p>The bloom filter internally uses two different hash functions to map a key
+to a position in the bit set. For tinyint, smallint, int, bigint, float
+and double types, Thomas Wang’s 64-bit integer hash function is used.
+Floats are converted to their IEEE-754 32 bit representation
+(using Java’s Float.floatToIntBits(float)). Similarly, doubles are
+converted to their IEEE-754 64 bit representation (using Java’s
+Double.doubleToLongBits(double)). All these primitive types
+are cast to the long base type before being passed on to the hash function.
+For strings and binary types, the Murmur3 64 bit hash algorithm is used.
+The 64 bit variant of Murmur3 considers only the most significant
+8 bytes of the Murmur3 128-bit algorithm. The 64 bit hashcode generated
+from the above algorithms is used as a base to derive ‘k’ different
+hash functions. We use the idea mentioned in the paper “Less Hashing,
+Same Performance: Building a Better Bloom Filter” by Kirsch et al. to
+quickly compute the k hashcodes.</p>
+
+<p>The algorithm for computing k hashcodes and setting the bit position
+in a bloom filter is as follows:</p>
+
+<ol>
+  <li>Get 64 bit base hash code from Murmur3 or Thomas Wang’s hash algorithm.</li>
+  <li>Split the above hashcode into two 32-bit hashcodes (say hash1 and hash2).</li>
+  <li>k’th hashcode is obtained by (where k &gt; 0):
+    <ul>
+      <li>combinedHash = hash1 + (k * hash2)</li>
+    </ul>
+  </li>
+  <li>If combinedHash is negative flip all the bits:
+    <ul>
+      <li>combinedHash = ~combinedHash</li>
+    </ul>
+  </li>
+  <li>Bit set position is obtained by performing modulo with m:
+    <ul>
+      <li>position = combinedHash % m</li>
+    </ul>
+  </li>
+  <li>Set the position in bit set. The low 6 bits of the position select
+the bit within a long (in little endian order) and the remaining upper
+bits identify the long index within bitset.
+    <ul>
+      <li>bitset[position &gt;&gt;&gt; 6] |= (1L &lt;&lt; position);</li>
+    </ul>
+  </li>
+</ol>
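+
+<p>The steps above can be sketched in Python as follows. This is only an
+illustration: the function names are hypothetical, the base hash is
+taken as an input rather than computed, and the sketch assumes that
+hash1 is the low 32 bits and hash2 the high 32 bits of the base hash
+and that the arithmetic wraps like Java’s 32-bit int.</p>
+

```python
def to_int32(x):
    """Wrap an integer to a signed 32-bit value, as Java int arithmetic would."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def bit_positions(hash64, k, m):
    """Derive k bit positions in a filter of m bits from one 64-bit base hash."""
    hash1 = to_int32(hash64)        # assumption: low half
    hash2 = to_int32(hash64 >> 32)  # assumption: high half
    positions = []
    for i in range(1, k + 1):
        combined = to_int32(hash1 + i * hash2)
        if combined < 0:
            combined = ~combined    # flip all the bits so the value is non-negative
        positions.append(combined % m)
    return positions


def set_bits(bitset, positions):
    """Set each position in a list of 64-bit words (one list item per long)."""
    for pos in positions:
        bitset[pos >> 6] |= 1 << (pos & 63)


bitset = [0, 0]                     # m = 128 bits
set_bits(bitset, bit_positions(0x0000000200000001, 3, 128))
print(bitset)
```

+<p>Testing membership uses the same positions: a key may be present only
+if every one of its k bits is set.</p>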
+
+<p>Bloom filter streams are interlaced with row group indexes. This placement
+makes it convenient to read the bloom filter stream and row index stream
+together in a single read operation.</p>
+
+<p><img src="/img/BloomFilter.png" alt="bloom filter" /></p>
+
+      </article>
+    </div>
+
+    <div class="clear"></div>
+
+  </div>
+</section>
+
+
+  <footer role="contentinfo">
+  <p>The contents of this website are &copy;&nbsp;2018
+     <a href="https://www.apache.org/">Apache Software Foundation</a>
+     under the terms of the <a
+      href="https://www.apache.org/licenses/LICENSE-2.0.html">
+      Apache&nbsp;License&nbsp;v2</a>. Apache ORC and its logo are trademarks
+      of the Apache Software Foundation.</p>
+</footer>
+
+  <script>
+  var anchorForId = function (id) {
+    var anchor = document.createElement("a");
+    anchor.className = "header-link";
+    anchor.href      = "#" + id;
+    anchor.innerHTML = "<span class=\"sr-only\">Permalink</span><i class=\"fa fa-link\"></i>";
+    anchor.title = "Permalink";
+    return anchor;
+  };
+
+  var linkifyAnchors = function (level, containingElement) {
+    var headers = containingElement.getElementsByTagName("h" + level);
+    for (var h = 0; h < headers.length; h++) {
+      var header = headers[h];
+
+      if (typeof header.id !== "undefined" && header.id !== "") {
+        header.appendChild(anchorForId(header.id));
+      }
+    }
+  };
+
+  document.onreadystatechange = function () {
+    if (this.readyState === "complete") {
+      var contentBlock = document.getElementsByClassName("docs")[0] || document.getElementsByClassName("news")[0];
+      if (!contentBlock) {
+        return;
+      }
+      for (var level = 1; level <= 6; level++) {
+        linkifyAnchors(level, contentBlock);
+      }
+    }
+  };
+</script>
+
+
+</body>
+</html>


[4/9] orc git commit: Pushing ORC-339 reorganize the ORC file format spec.

Posted by om...@apache.org.
http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/docs/spec-index.html
----------------------------------------------------------------------
diff --git a/docs/spec-index.html b/docs/spec-index.html
deleted file mode 100644
index 25ba64d..0000000
--- a/docs/spec-index.html
+++ /dev/null
@@ -1,2298 +0,0 @@
-<!DOCTYPE HTML>
-<html lang="en-US">
-<head>
-  <meta charset="UTF-8">
-  <title>Indexes</title>
-  <meta name="viewport" content="width=device-width,initial-scale=1">
-  <meta name="generator" content="Jekyll v2.4.0">
-  <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
-  <link rel="stylesheet" href="/css/screen.css">
-  <link rel="icon" type="image/x-icon" href="/favicon.ico">
-  <!--[if lt IE 9]>
-  <script src="/js/html5shiv.min.js"></script>
-  <script src="/js/respond.min.js"></script>
-  <![endif]-->
-</head>
-
-
-<body class="wrap">
-  <header role="banner">
-  <nav class="mobile-nav show-on-mobiles">
-    <ul>
-  <li class="">
-    <a href="/">Home</a>
-  </li>
-  <li class="current">
-    <a href="/docs/"><span class="show-on-mobiles">Docs</span>
-                     <span class="hide-on-mobiles">Documentation</span></a>
-  </li>
-  <li class="">
-    <a href="/talks/">Talks</a>
-  </li>
-  <li class="">
-    <a href="/news/">News</a>
-  </li>
-  <li class="">
-    <a href="/help/">Help</a>
-  </li>
-  <li class="">
-    <a href="/develop/">Develop</a>
-  </li>
-</ul>
-
-  </nav>
-  <div class="grid">
-    <div class="unit one-third center-on-mobiles">
-      <h1>
-        <a href="/">
-          <span class="sr-only">Apache ORC</span>
-          <img src="/img/logo.png" width="249" height="101" alt="ORC Logo">
-        </a>
-      </h1>
-    </div>
-    <nav class="main-nav unit two-thirds hide-on-mobiles">
-      <ul>
-  <li class="">
-    <a href="/">Home</a>
-  </li>
-  <li class="current">
-    <a href="/docs/"><span class="show-on-mobiles">Docs</span>
-                     <span class="hide-on-mobiles">Documentation</span></a>
-  </li>
-  <li class="">
-    <a href="/talks/">Talks</a>
-  </li>
-  <li class="">
-    <a href="/news/">News</a>
-  </li>
-  <li class="">
-    <a href="/help/">Help</a>
-  </li>
-  <li class="">
-    <a href="/develop/">Develop</a>
-  </li>
-</ul>
-
-    </nav>
-  </div>
-</header>
-
-
-    <section class="docs">
-    <div class="grid">
-
-      <div class="docs-nav-mobile unit whole show-on-mobiles">
-  <select onchange="if (this.value) window.location.href=this.value">
-    <option value="">Navigate the docs…</option>
-    
-    <optgroup label="Overview">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/index.html">Background</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-      <option value="/docs/adopters.html">ORC Adopters</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/types.html">Types</option>
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/indexes.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-      <option value="/docs/acid.html">ACID support</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Installing">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-      <option value="/docs/building.html">Building ORC</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/releases.html">Releases</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Using in Hive">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/hive-ddl.html">Hive DDL</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/hive-config.html">Hive Configuration</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Using in MapReduce">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/mapred.html">Using in MapRed</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/mapreduce.html">Using in MapReduce</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Using ORC Core">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/core-java.html">Using Core Java</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/core-cpp.html">Using Core C++</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Tools">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/cpp-tools.html">C++ Tools</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/java-tools.html">Java Tools</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Format Specification">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-intro.html">Introduction</option>
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/file-tail.html">File Tail</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/compression.html">Compression</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/run-length.html">Run Length Encoding</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/stripes.html">Stripes</option>
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/encodings.html">Column Encodings</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-index.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-  </select>
-</div>
-
-
-      <div class="unit four-fifths">
-        <article>
-          <h1>Indexes</h1>
-          <h1 id="row-group-index">Row Group Index</h1>
-
-<p>The row group indexes consist of a ROW_INDEX stream for each primitive
-column that has an entry for each row group. Row groups are controlled
-by the writer and default to 10,000 rows. Each RowIndexEntry gives the
-position of each stream for the column and the statistics for that row
-group.</p>
-
-<p>The index streams are placed at the front of the stripe, because in
-the default case of streaming they do not need to be read. They are
-only loaded when either predicate push down is being used or the
-reader seeks to a particular row.</p>
-
-<p><code>message RowIndexEntry {
- repeated uint64 positions = 1 [packed=true];
- optional ColumnStatistics statistics = 2;
-}
-</code></p>
-
-<p><code>message RowIndex {
- repeated RowIndexEntry entry = 1;
-}
-</code></p>
-
-<p>To record positions, each stream needs a sequence of numbers. For
-uncompressed streams, the position is the byte offset of the RLE run’s
-start location followed by the number of values that need to be
-consumed from the run. In compressed streams, the first number is the
-start of the compression chunk in the stream, followed by the number
-of decompressed bytes that need to be consumed, and finally the number
-of values consumed in the RLE.</p>
-
-<p>For columns with multiple streams, the sequences of positions in each
-stream are concatenated. That was an unfortunate decision on my part
-that we should fix at some point, because it makes code that uses the
-indexes error-prone.</p>
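
A minimal sketch of how a reader might walk the concatenated position list of one RowIndexEntry. It follows only the description above: two numbers per stream when uncompressed, three when compressed. The names `PositionCursor`, `StreamPosition`, and `next` are hypothetical and not part of ORC's API, and streams that record a different number of positions are not modeled.

```java
/** Hypothetical helper for illustration; not part of the ORC API. */
final class PositionCursor {
    /** Decoded position of one stream within a row group. */
    static final class StreamPosition {
        final long chunkStart;   // start of the compression chunk (0 when uncompressed)
        final long byteOffset;   // bytes to consume (decompressed bytes when compressed)
        final long valuesToSkip; // values to consume from the RLE run

        StreamPosition(long chunkStart, long byteOffset, long valuesToSkip) {
            this.chunkStart = chunkStart;
            this.byteOffset = byteOffset;
            this.valuesToSkip = valuesToSkip;
        }
    }

    private final long[] positions; // RowIndexEntry.positions, all streams concatenated
    private final boolean compressed;
    private int cursor = 0;

    PositionCursor(long[] positions, boolean compressed) {
        this.positions = positions;
        this.compressed = compressed;
    }

    /** Consumes the numbers belonging to the next stream of the column. */
    StreamPosition next() {
        long chunkStart = compressed ? positions[cursor++] : 0L;
        long byteOffset = positions[cursor++];
        long valuesToSkip = positions[cursor++];
        return new StreamPosition(chunkStart, byteOffset, valuesToSkip);
    }
}
```

Because the list carries no per-stream markers, the caller has to invoke `next()` once per stream in the same order the writer used, which is the error-prone part noted above.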
-
-<p>Because dictionaries are accessed randomly, there is not a position to
-record for the dictionary and the entire dictionary must be read even
-if only part of a stripe is being read.</p>
-
-<h1 id="bloom-filter-index">Bloom Filter Index</h1>
-
-<p>Bloom Filters are added to ORC indexes from Hive 1.2.0 onwards.
-Predicate pushdown can make use of bloom filters to better prune
-the row groups that do not satisfy the filter condition.
-The bloom filter indexes consist of a BLOOM_FILTER stream for each
-column specified through ‘orc.bloom.filter.columns’ table properties.
-A BLOOM_FILTER stream records a bloom filter entry for each row
-group (default to 10,000 rows) in a column. Only the row groups that
-satisfy min/max row index evaluation will be evaluated against the
-bloom filter index.</p>
-
-<p>Each BloomFilter entry stores the number of hash functions (‘k’) used
-and the bitset backing the bloom filter. The original encoding (pre
-ORC-101) of bloom filters stored the bitset as a repeating sequence of
-longs in the bitset field with a little endian encoding (0x1 is bit 0
-and 0x2 is bit 1). After ORC-101, the encoding is a sequence of bytes
-with a little endian encoding in the utf8bitset field.</p>
-
-<p><code>message BloomFilter {
- optional uint32 numHashFunctions = 1;
- repeated fixed64 bitset = 2;
- optional bytes utf8bitset = 3;
-}
-</code></p>
-
-<p><code>message BloomFilterIndex {
- repeated BloomFilter bloomFilter = 1;
-}
-</code></p>
-
-<p>The bloom filter internally uses two different hash functions to map a key
-to a position in the bit set. For tinyint, smallint, int, bigint, float
-and double types, Thomas Wang’s 64-bit integer hash function is used.
-Floats are converted to their IEEE-754 32-bit representation
-(using Java’s Float.floatToIntBits(float)). Similarly, doubles are
-converted to their IEEE-754 64-bit representation (using Java’s
-Double.doubleToLongBits(double)). All these primitive types
-are cast to the long base type before being passed to the hash function.
-For string and binary types, the Murmur3 64-bit hash algorithm is used.
-The 64-bit variant of Murmur3 considers only the most significant
-8 bytes of the Murmur3 128-bit algorithm. The 64-bit hashcode generated
-by the above algorithms is used as a base to derive ‘k’ different
-hash functions. We use the idea mentioned in the paper “Less Hashing,
-Same Performance: Building a Better Bloom Filter” by Kirsch et al. to
-quickly compute the k hashcodes.</p>
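
A small sketch of the conversion step for numeric keys, assuming the behavior described in the paragraph above: each value is widened to a long before it is hashed. The class and method names (`BloomKeyBits`, `intKey`, `floatKey`, `doubleKey`) are hypothetical, and the hash functions themselves are left out.

```java
/** Hypothetical helper for illustration; not part of the ORC API. */
final class BloomKeyBits {
    private BloomKeyBits() {}

    /** tinyint, smallint, int and bigint keys are simply widened to long. */
    static long intKey(int value) {
        return value;
    }

    /** Floats are hashed by their IEEE-754 32-bit pattern, cast to long. */
    static long floatKey(float value) {
        return (long) Float.floatToIntBits(value);
    }

    /** Doubles are hashed by their IEEE-754 64-bit pattern. */
    static long doubleKey(double value) {
        return Double.doubleToLongBits(value);
    }
}
```

Note that the int-to-long cast sign-extends, so a float with the sign bit set produces a negative long.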
-
-<p>The algorithm for computing k hashcodes and setting the bit position
-in a bloom filter is as follows:</p>
-
-<ol>
-  <li>Get 64 bit base hash code from Murmur3 or Thomas Wang’s hash algorithm.</li>
-  <li>Split the above hashcode into two 32-bit hashcodes (say hash1 and hash2).</li>
-  <li>k’th hashcode is obtained by (where k &gt; 0):
-    <ul>
-      <li>combinedHash = hash1 + (k * hash2)</li>
-    </ul>
-  </li>
-  <li>If combinedHash is negative, flip all the bits:
-    <ul>
-      <li>combinedHash = ~combinedHash</li>
-    </ul>
-  </li>
-  <li>Bit set position is obtained by performing modulo with m:
-    <ul>
-      <li>position = combinedHash % m</li>
-    </ul>
-  </li>
-  <li>Set the position in the bit set. The bits above the low 6 bits of the position identify the long index
-within the bitset, and the low 6 bits give the bit within that long in little endian order.
-    <ul>
-      <li>bitset[position &gt;&gt;&gt; 6] |= (1L &lt;&lt; position);</li>
-    </ul>
-  </li>
-</ol>
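
The steps above can be sketched in Java as follows. This is an illustrative sketch, not the ORC implementation: the 64-bit base hash is passed in rather than computed, the class name `BloomFilterSketch` and its methods are hypothetical, and the choice of the low 32 bits as hash1 and the high 32 bits as hash2 is an assumption the text does not pin down.

```java
/** Hypothetical sketch of the k-hash scheme; not part of the ORC API. */
final class BloomFilterSketch {
    private final long[] bitset;
    private final int numBits;          // m, rounded up to a multiple of 64
    private final int numHashFunctions; // k

    BloomFilterSketch(int requestedBits, int numHashFunctions) {
        int words = Math.max(1, (requestedBits + 63) / 64);
        this.bitset = new long[words];
        this.numBits = words * 64;
        this.numHashFunctions = numHashFunctions;
    }

    /** Sets the k bits derived from a 64-bit base hash (steps 2 to 6). */
    void addHash(long hash64) {
        int hash1 = (int) hash64;          // assumed: low 32 bits
        int hash2 = (int) (hash64 >>> 32); // assumed: high 32 bits
        for (int i = 1; i <= numHashFunctions; i++) {
            int pos = position(hash1, hash2, i);
            // Java uses only the low 6 bits of pos as the shift distance for a long.
            bitset[pos >>> 6] |= (1L << pos);
        }
    }

    /** Returns false if the base hash was definitely never added. */
    boolean mightContain(long hash64) {
        int hash1 = (int) hash64;
        int hash2 = (int) (hash64 >>> 32);
        for (int i = 1; i <= numHashFunctions; i++) {
            int pos = position(hash1, hash2, i);
            if ((bitset[pos >>> 6] & (1L << pos)) == 0) {
                return false;
            }
        }
        return true;
    }

    /** Steps 3 to 5: combine, flip the bits if negative, reduce modulo m. */
    private int position(int hash1, int hash2, int i) {
        int combined = hash1 + i * hash2;
        if (combined < 0) {
            combined = ~combined;
        }
        return combined % numBits;
    }

    /** Exposes one backing long, for inspection only. */
    long word(int index) {
        return bitset[index];
    }
}
```

Flipping the bits of a negative combinedHash always yields a non-negative int, so the modulo in step 5 cannot produce a negative position.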
-
-<p>Bloom filter streams are interlaced with row group indexes. This placement
-makes it convenient to read the bloom filter stream and row index stream
-together in a single read operation.</p>
-
-<p><img src="/img/BloomFilter.png" alt="bloom filter" /></p>
-
-          
-
-
-
-
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-    <div class="section-nav">
-      <div class="left align-right">
-          
-            
-            
-            <a href="/docs/encodings.html" class="prev">Back</a>
-          
-      </div>
-      <div class="right align-left">
-          
-            <span class="next disabled">Next</span>
-          
-      </div>
-    </div>
-    <div class="clear"></div>
-    
-
-        </article>
-      </div>
-
-      <div class="unit one-fifth hide-on-mobiles">
-  <aside>
-    
-    <h4>Overview</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/index.html">Background</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/types.html">Types</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-      <li class=""><a href="/docs/acid.html">ACID support</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Installing</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/building.html">Building ORC</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/releases.html">Releases</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using in Hive</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using in MapReduce</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using ORC Core</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Tools</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Format Specification</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-intro.html">Introduction</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/file-tail.html">File Tail</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/compression.html">Compression</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/run-length.html">Run Length Encoding</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/stripes.html">Stripes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/encodings.html">Column Encodings</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class="current"><a href="/docs/spec-index.html">Indexes</a></li>
-      
-
-
-</ul>
-
-    
-  </aside>
-</div>
-
-
-      <div class="clear"></div>
-
-    </div>
-  </section>
-
-
-  <footer role="contentinfo">
-  <p>The contents of this website are &copy;&nbsp;2018
-     <a href="https://www.apache.org/">Apache Software Foundation</a>
-     under the terms of the <a
-      href="https://www.apache.org/licenses/LICENSE-2.0.html">
-      Apache&nbsp;License&nbsp;v2</a>. Apache ORC and its logo are trademarks
-      of the Apache Software Foundation.</p>
-</footer>
-
-  <script>
-  var anchorForId = function (id) {
-    var anchor = document.createElement("a");
-    anchor.className = "header-link";
-    anchor.href      = "#" + id;
-    anchor.innerHTML = "<span class=\"sr-only\">Permalink</span><i class=\"fa fa-link\"></i>";
-    anchor.title = "Permalink";
-    return anchor;
-  };
-
-  var linkifyAnchors = function (level, containingElement) {
-    var headers = containingElement.getElementsByTagName("h" + level);
-    for (var h = 0; h < headers.length; h++) {
-      var header = headers[h];
-
-      if (typeof header.id !== "undefined" && header.id !== "") {
-        header.appendChild(anchorForId(header.id));
-      }
-    }
-  };
-
-  document.onreadystatechange = function () {
-    if (this.readyState === "complete") {
-      var contentBlock = document.getElementsByClassName("docs")[0] || document.getElementsByClassName("news")[0];
-      if (!contentBlock) {
-        return;
-      }
-      for (var level = 1; level <= 6; level++) {
-        linkifyAnchors(level, contentBlock);
-      }
-    }
-  };
-</script>
-
-
-</body>
-</html>

http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/docs/spec-intro.html
----------------------------------------------------------------------
diff --git a/docs/spec-intro.html b/docs/spec-intro.html
deleted file mode 100644
index 3468dd0..0000000
--- a/docs/spec-intro.html
+++ /dev/null
@@ -1,2180 +0,0 @@
-<!DOCTYPE HTML>
-<html lang="en-US">
-<head>
-  <meta charset="UTF-8">
-  <title>Introduction</title>
-  <meta name="viewport" content="width=device-width,initial-scale=1">
-  <meta name="generator" content="Jekyll v2.4.0">
-  <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
-  <link rel="stylesheet" href="/css/screen.css">
-  <link rel="icon" type="image/x-icon" href="/favicon.ico">
-  <!--[if lt IE 9]>
-  <script src="/js/html5shiv.min.js"></script>
-  <script src="/js/respond.min.js"></script>
-  <![endif]-->
-</head>
-
-
-<body class="wrap">
-  <header role="banner">
-  <nav class="mobile-nav show-on-mobiles">
-    <ul>
-  <li class="">
-    <a href="/">Home</a>
-  </li>
-  <li class="current">
-    <a href="/docs/"><span class="show-on-mobiles">Docs</span>
-                     <span class="hide-on-mobiles">Documentation</span></a>
-  </li>
-  <li class="">
-    <a href="/talks/">Talks</a>
-  </li>
-  <li class="">
-    <a href="/news/">News</a>
-  </li>
-  <li class="">
-    <a href="/help/">Help</a>
-  </li>
-  <li class="">
-    <a href="/develop/">Develop</a>
-  </li>
-</ul>
-
-  </nav>
-  <div class="grid">
-    <div class="unit one-third center-on-mobiles">
-      <h1>
-        <a href="/">
-          <span class="sr-only">Apache ORC</span>
-          <img src="/img/logo.png" width="249" height="101" alt="ORC Logo">
-        </a>
-      </h1>
-    </div>
-    <nav class="main-nav unit two-thirds hide-on-mobiles">
-      <ul>
-  <li class="">
-    <a href="/">Home</a>
-  </li>
-  <li class="current">
-    <a href="/docs/"><span class="show-on-mobiles">Docs</span>
-                     <span class="hide-on-mobiles">Documentation</span></a>
-  </li>
-  <li class="">
-    <a href="/talks/">Talks</a>
-  </li>
-  <li class="">
-    <a href="/news/">News</a>
-  </li>
-  <li class="">
-    <a href="/help/">Help</a>
-  </li>
-  <li class="">
-    <a href="/develop/">Develop</a>
-  </li>
-</ul>
-
-    </nav>
-  </div>
-</header>
-
-
-    <section class="docs">
-    <div class="grid">
-
-      <div class="docs-nav-mobile unit whole show-on-mobiles">
-  <select onchange="if (this.value) window.location.href=this.value">
-    <option value="">Navigate the docs…</option>
-    
-    <optgroup label="Overview">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/index.html">Background</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-      <option value="/docs/adopters.html">ORC Adopters</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/types.html">Types</option>
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/indexes.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-      <option value="/docs/acid.html">ACID support</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Installing">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-      <option value="/docs/building.html">Building ORC</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/releases.html">Releases</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Using in Hive">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/hive-ddl.html">Hive DDL</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/hive-config.html">Hive Configuration</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Using in MapReduce">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/mapred.html">Using in MapRed</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/mapreduce.html">Using in MapReduce</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Using ORC Core">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/core-java.html">Using Core Java</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/core-cpp.html">Using Core C++</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Tools">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/cpp-tools.html">C++ Tools</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/java-tools.html">Java Tools</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Format Specification">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-intro.html">Introduction</option>
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/file-tail.html">File Tail</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/compression.html">Compression</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/run-length.html">Run Length Encoding</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/stripes.html">Stripes</option>
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/encodings.html">Column Encodings</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-index.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-  </select>
-</div>
-
-
-      <div class="unit four-fifths">
-        <article>
-          <h1>Introduction</h1>
-          <p>Hive’s RCFile was the standard format for storing tabular data in
-Hadoop for several years. However, RCFile has limitations because it
-treats each column as a binary blob without semantics. In Hive 0.11 we
-added a new file format named Optimized Row Columnar (ORC) file that
-uses and retains the type information from the table definition. ORC
-uses type-specific readers and writers that provide lightweight
-compression techniques such as dictionary encoding, bit packing, delta
-encoding, and run length encoding – resulting in dramatically smaller
-files. Additionally, ORC can apply generic compression using zlib or
-Snappy on top of the lightweight compression for even smaller
-files. However, storage savings are only part of the gain. ORC
-supports projection, which selects subsets of the columns for reading,
-so that queries reading only one column read only the required
-bytes. Furthermore, ORC files include lightweight indexes that
-include the minimum and maximum values for each column in each set of
-10,000 rows and the entire file. Using pushdown filters from Hive, the
-file reader can skip entire sets of rows that aren’t relevant to
-the query.</p>
-
-<p><img src="/img/OrcFileLayout.png" alt="ORC file structure" /></p>
-
-          
-
-
-
-
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-    <div class="section-nav">
-      <div class="left align-right">
-          
-            
-            
-            <a href="/docs/java-tools.html" class="prev">Back</a>
-          
-      </div>
-      <div class="right align-left">
-          
-            
-            
-            <a href="/docs/file-tail.html" class="next">Next</a>
-          
-      </div>
-    </div>
-    <div class="clear"></div>
-    
-
-        </article>
-      </div>
-
-      <div class="unit one-fifth hide-on-mobiles">
-  <aside>
-    
-    <h4>Overview</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/index.html">Background</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/types.html">Types</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-      <li class=""><a href="/docs/acid.html">ACID support</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Installing</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/building.html">Building ORC</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/releases.html">Releases</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using in Hive</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using in MapReduce</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using ORC Core</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Tools</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Format Specification</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class="current"><a href="/docs/spec-intro.html">Introduction</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/file-tail.html">File Tail</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/compression.html">Compression</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/run-length.html">Run Length Encoding</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/stripes.html">Stripes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/encodings.html">Column Encodings</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-index.html">Indexes</a></li>
-      
-
-
-</ul>
-
-    
-  </aside>
-</div>
-
-
-      <div class="clear"></div>
-
-    </div>
-  </section>
-
-
-  <footer role="contentinfo">
-  <p>The contents of this website are &copy;&nbsp;2018
-     <a href="https://www.apache.org/">Apache Software Foundation</a>
-     under the terms of the <a
-      href="https://www.apache.org/licenses/LICENSE-2.0.html">
-      Apache&nbsp;License&nbsp;v2</a>. Apache ORC and its logo are trademarks
-      of the Apache Software Foundation.</p>
-</footer>
-
-  <script>
-  var anchorForId = function (id) {
-    var anchor = document.createElement("a");
-    anchor.className = "header-link";
-    anchor.href      = "#" + id;
-    anchor.innerHTML = "<span class=\"sr-only\">Permalink</span><i class=\"fa fa-link\"></i>";
-    anchor.title = "Permalink";
-    return anchor;
-  };
-
-  var linkifyAnchors = function (level, containingElement) {
-    var headers = containingElement.getElementsByTagName("h" + level);
-    for (var h = 0; h < headers.length; h++) {
-      var header = headers[h];
-
-      if (typeof header.id !== "undefined" && header.id !== "") {
-        header.appendChild(anchorForId(header.id));
-      }
-    }
-  };
-
-  document.onreadystatechange = function () {
-    if (this.readyState === "complete") {
-      var contentBlock = document.getElementsByClassName("docs")[0] || document.getElementsByClassName("news")[0];
-      if (!contentBlock) {
-        return;
-      }
-      for (var level = 1; level <= 6; level++) {
-        linkifyAnchors(level, contentBlock);
-      }
-    }
-  };
-</script>
-
-
-</body>
-</html>

http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/docs/stripes.html
----------------------------------------------------------------------
diff --git a/docs/stripes.html b/docs/stripes.html
deleted file mode 100644
index 401c0d9..0000000
--- a/docs/stripes.html
+++ /dev/null
@@ -1,2257 +0,0 @@
-<!DOCTYPE HTML>
-<html lang="en-US">
-<head>
-  <meta charset="UTF-8">
-  <title>Stripes</title>
-  <meta name="viewport" content="width=device-width,initial-scale=1">
-  <meta name="generator" content="Jekyll v2.4.0">
-  <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
-  <link rel="stylesheet" href="/css/screen.css">
-  <link rel="icon" type="image/x-icon" href="/favicon.ico">
-  <!--[if lt IE 9]>
-  <script src="/js/html5shiv.min.js"></script>
-  <script src="/js/respond.min.js"></script>
-  <![endif]-->
-</head>
-
-
-<body class="wrap">
-  <header role="banner">
-  <nav class="mobile-nav show-on-mobiles">
-    <ul>
-  <li class="">
-    <a href="/">Home</a>
-  </li>
-  <li class="current">
-    <a href="/docs/"><span class="show-on-mobiles">Docs</span>
-                     <span class="hide-on-mobiles">Documentation</span></a>
-  </li>
-  <li class="">
-    <a href="/talks/">Talks</a>
-  </li>
-  <li class="">
-    <a href="/news/">News</a>
-  </li>
-  <li class="">
-    <a href="/help/">Help</a>
-  </li>
-  <li class="">
-    <a href="/develop/">Develop</a>
-  </li>
-</ul>
-
-  </nav>
-  <div class="grid">
-    <div class="unit one-third center-on-mobiles">
-      <h1>
-        <a href="/">
-          <span class="sr-only">Apache ORC</span>
-          <img src="/img/logo.png" width="249" height="101" alt="ORC Logo">
-        </a>
-      </h1>
-    </div>
-    <nav class="main-nav unit two-thirds hide-on-mobiles">
-      <ul>
-  <li class="">
-    <a href="/">Home</a>
-  </li>
-  <li class="current">
-    <a href="/docs/"><span class="show-on-mobiles">Docs</span>
-                     <span class="hide-on-mobiles">Documentation</span></a>
-  </li>
-  <li class="">
-    <a href="/talks/">Talks</a>
-  </li>
-  <li class="">
-    <a href="/news/">News</a>
-  </li>
-  <li class="">
-    <a href="/help/">Help</a>
-  </li>
-  <li class="">
-    <a href="/develop/">Develop</a>
-  </li>
-</ul>
-
-    </nav>
-  </div>
-</header>
-
-
-    <section class="docs">
-    <div class="grid">
-
-      <div class="docs-nav-mobile unit whole show-on-mobiles">
-  <select onchange="if (this.value) window.location.href=this.value">
-    <option value="">Navigate the docs…</option>
-    
-    <optgroup label="Overview">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/index.html">Background</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-      <option value="/docs/adopters.html">ORC Adopters</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/types.html">Types</option>
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/indexes.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-      <option value="/docs/acid.html">ACID support</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Installing">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-      <option value="/docs/building.html">Building ORC</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/releases.html">Releases</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Using in Hive">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/hive-ddl.html">Hive DDL</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/hive-config.html">Hive Configuration</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Using in MapReduce">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/mapred.html">Using in MapRed</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/mapreduce.html">Using in MapReduce</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Using ORC Core">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/core-java.html">Using Core Java</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/core-cpp.html">Using Core C++</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Tools">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/cpp-tools.html">C++ Tools</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/java-tools.html">Java Tools</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Format Specification">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-intro.html">Introduction</option>
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/file-tail.html">File Tail</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/compression.html">Compression</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/run-length.html">Run Length Encoding</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/stripes.html">Stripes</option>
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/encodings.html">Column Encodings</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-index.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-  </select>
-</div>
-
-
-      <div class="unit four-fifths">
-        <article>
-          <h1>Stripes</h1>
-          <p>The body of ORC files consists of a series of stripes. Stripes are
-large (typically ~200MB) and independent of each other and are often
-processed by different tasks. The defining characteristic for columnar
-storage formats is that the data for each column is stored separately
-and that the amount of data read from the file should be proportional to
-the number of columns read.</p>
-
-<p>In ORC files, each column is stored in several streams that are stored
-next to each other in the file. For example, an integer column is
-represented as two streams: PRESENT, which uses one bit per value to
-record whether the value is non-null, and DATA, which records the
-non-null values. If all of a column’s values in a stripe are non-null,
-the PRESENT stream is omitted from the stripe. For binary data, ORC
-uses three streams: PRESENT, DATA, and LENGTH, which stores the length
-of each value. The details of each type will be presented in the
-following subsections.</p>
-
-<h1 id="stripe-footer">Stripe Footer</h1>
-
-<p>The stripe footer contains the encoding of each column and the
-directory of the streams including their location.</p>
-
-<p><code>message StripeFooter {
- // the location of each stream
- repeated Stream streams = 1;
- // the encoding of each column
- repeated ColumnEncoding columns = 2;
-}
-</code></p>
-
-<p>To describe each stream, ORC stores the kind of stream, the column id,
-and the stream’s size in bytes. The details of what is stored in each stream
-depend on the type and encoding of the column.</p>
-
-<p><code>message Stream {
- enum Kind {
- // boolean stream of whether the next value is non-null
- PRESENT = 0;
- // the primary data stream
- DATA = 1;
- // the length of each value for variable length data
- LENGTH = 2;
- // the dictionary blob
- DICTIONARY_DATA = 3;
- // deprecated prior to Hive 0.11
- // It was used to store the number of instances of each value in the
- // dictionary
- DICTIONARY_COUNT = 4;
- // a secondary data stream
- SECONDARY = 5;
- // the index for seeking to particular row groups
- ROW_INDEX = 6;
- // original bloom filters used before ORC-101
- BLOOM_FILTER = 7;
- // bloom filters that consistently use utf8
- BLOOM_FILTER_UTF8 = 8;
- }
- required Kind kind = 1;
- // the column id
- optional uint32 column = 2;
- // the number of bytes in the file
- optional uint64 length = 3;
-}
-</code></p>
-
-<p>Depending on the column’s type, several encoding options are possible.
-The encodings are divided into direct and dictionary-based categories and
-further refined by whether they use RLE v1 or v2.</p>
-
-<p><code>message ColumnEncoding {
- enum Kind {
- // the encoding is mapped directly to the stream using RLE v1
- DIRECT = 0;
- // the encoding uses a dictionary of unique values using RLE v1
- DICTIONARY = 1;
- // the encoding is direct using RLE v2
- DIRECT_V2 = 2;
- // the encoding is dictionary-based using RLE v2
- DICTIONARY_V2 = 3;
- }
- required Kind kind = 1;
- // for dictionary encodings, record the size of the dictionary
- optional uint32 dictionarySize = 2;
-}
-</code></p>
-
-          
-
-
-
-
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-    <div class="section-nav">
-      <div class="left align-right">
-          
-            
-            
-            <a href="/docs/run-length.html" class="prev">Back</a>
-          
-      </div>
-      <div class="right align-left">
-          
-            
-            
-            <a href="/docs/encodings.html" class="next">Next</a>
-          
-      </div>
-    </div>
-    <div class="clear"></div>
-    
-
-        </article>
-      </div>
-
-      <div class="unit one-fifth hide-on-mobiles">
-  <aside>
-    
-    <h4>Overview</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/index.html">Background</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/types.html">Types</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-      <li class=""><a href="/docs/acid.html">ACID support</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Installing</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/building.html">Building ORC</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/releases.html">Releases</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using in Hive</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using in MapReduce</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using ORC Core</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Tools</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Format Specification</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-intro.html">Introduction</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/file-tail.html">File Tail</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/compression.html">Compression</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/run-length.html">Run Length Encoding</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class="current"><a href="/docs/stripes.html">Stripes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/encodings.html">Column Encodings</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-index.html">Indexes</a></li>
-      
-
-
-</ul>
-
-    
-  </aside>
-</div>
-
-
-      <div class="clear"></div>
-
-    </div>
-  </section>
-
-
-  <footer role="contentinfo">
-  <p>The contents of this website are &copy;&nbsp;2018
-     <a href="https://www.apache.org/">Apache Software Foundation</a>
-     under the terms of the <a
-      href="https://www.apache.org/licenses/LICENSE-2.0.html">
-      Apache&nbsp;License&nbsp;v2</a>. Apache ORC and its logo are trademarks
-      of the Apache Software Foundation.</p>
-</footer>
-
-  <script>
-  var anchorForId = function (id) {
-    var anchor = document.createElement("a");
-    anchor.className = "header-link";
-    anchor.href      = "#" + id;
-    anchor.innerHTML = "<span class=\"sr-only\">Permalink</span><i class=\"fa fa-link\"></i>";
-    anchor.title = "Permalink";
-    return anchor;
-  };
-
-  var linkifyAnchors = function (level, containingElement) {
-    var headers = containingElement.getElementsByTagName("h" + level);
-    for (var h = 0; h < headers.length; h++) {
-      var header = headers[h];
-
-      if (typeof header.id !== "undefined" && header.id !== "") {
-        header.appendChild(anchorForId(header.id));
-      }
-    }
-  };
-
-  document.onreadystatechange = function () {
-    if (this.readyState === "complete") {
-      var contentBlock = document.getElementsByClassName("docs")[0] || document.getElementsByClassName("news")[0];
-      if (!contentBlock) {
-        return;
-      }
-      for (var level = 1; level <= 6; level++) {
-        linkifyAnchors(level, contentBlock);
-      }
-    }
-  };
-</script>
-
-
-</body>
-</html>
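
Note on the removed Stripes page: the stripe footer described above lists each stream's kind, column id, and length in bytes, and states that streams are stored next to each other. A minimal Python sketch of how a reader might use that directory follows. It assumes streams are laid out contiguously in the order the footer lists them, so each offset is the running sum of the preceding lengths. The names `Stream`, `stream_offsets`, `bytes_to_read`, and `stripe_start` are hypothetical and are not part of any ORC library.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass
class Stream:
    """Illustrative stand-in for the Stream message in the stripe footer."""
    kind: str    # e.g. "PRESENT", "DATA", "LENGTH"
    column: int  # the column id
    length: int  # the number of bytes in the file


def stream_offsets(streams: Iterable[Stream],
                   stripe_start: int = 0) -> List[Tuple[Stream, int]]:
    """Pair each stream with its byte offset.

    Assumption: streams are contiguous and appear in footer order,
    so the offset is a running sum of the earlier stream lengths.
    """
    located = []
    offset = stripe_start
    for stream in streams:
        located.append((stream, offset))
        offset += stream.length
    return located


def bytes_to_read(streams: Iterable[Stream], wanted_columns: Set[int]) -> int:
    """Total bytes that must be read to fetch only the wanted columns."""
    return sum(s.length for s in streams if s.column in wanted_columns)


# An integer column (id 1) with nulls, and a binary column (id 2) without.
streams = [
    Stream("PRESENT", 1, 20),
    Stream("DATA", 1, 400),
    Stream("DATA", 2, 1000),
    Stream("LENGTH", 2, 50),
]
for stream, offset in stream_offsets(streams, stripe_start=3):
    print(stream.kind, stream.column, offset)
print(bytes_to_read(streams, {1}))
```

Under these assumptions, reading only column 1 touches its two streams and skips the rest of the stripe, which is the columnar property the page describes.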

http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/docs/types.html
----------------------------------------------------------------------
diff --git a/docs/types.html b/docs/types.html
index dda60a4..149fa88 100644
--- a/docs/types.html
+++ b/docs/types.html
@@ -109,12 +109,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/index.html">Background</option>
     
   
@@ -130,14 +124,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -174,20 +160,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -221,20 +193,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
       <option value="/docs/types.html">Types</option>
     
   
@@ -261,12 +219,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/indexes.html">Indexes</option>
     
   
@@ -280,14 +232,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -324,20 +268,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -381,20 +311,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -426,25 +342,11 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/releases.html">Releases</option>
     
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -471,12 +373,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-ddl.html">Hive DDL</option>
     
   
@@ -494,14 +390,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -519,12 +407,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-config.html">Hive Configuration</option>
     
   
@@ -544,14 +426,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -586,12 +460,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapred.html">Using in MapRed</option>
     
   
@@ -601,14 +469,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -638,12 +498,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapreduce.html">Using in MapReduce</option>
     
   
@@ -651,14 +505,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -679,8 +525,6 @@
     
   
     
-  
-    
       <option value="/docs/core-java.html">Using Core Java</option>
     
   
@@ -704,18 +548,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -727,8 +559,6 @@
     
   
     
-  
-    
       <option value="/docs/core-cpp.html">Using Core C++</option>
     
   
@@ -754,18 +584,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -788,8 +606,6 @@
     
   
     
-  
-    
       <option value="/docs/cpp-tools.html">C++ Tools</option>
     
   
@@ -811,18 +627,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -848,12 +652,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/java-tools.html">Java Tools</option>
     
   
@@ -865,726 +663,135 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
     
-    <optgroup label="Format Specification">
-      
+  </select>
+</div>
 
 
-  
+      <div class="unit four-fifths">
+        <article>
+          <h1>Types</h1>
+          <p>ORC files are completely self-describing and do not depend on the Hive
+Metastore or any other external metadata. The file includes all of the
+type and encoding information for the objects stored in the file. Because the
+file is self-contained, it does not depend on the user’s environment to
+correctly interpret the file’s contents.</p>
 
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-intro.html">Introduction</option>
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/file-tail.html">File Tail</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/compression.html">Compression</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/run-length.html">Run Length Encoding</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/stripes.html">Stripes</option>
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/encodings.html">Column Encodings</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-index.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-  </select>
-</div>
-
-
-      <div class="unit four-fifths">
-        <article>
-          <h1>Types</h1>
-          <p>ORC files are completely self-describing and do not depend on the Hive
-Metastore or any other external metadata. The file includes all of the
-type and encoding information for the objects stored in the file. Because the
-file is self-contained, it does not depend on the user’s environment to
-correctly interpret the file’s contents.</p>
-
-<p>ORC provides a rich set of scalar and compound types:</p>
-
-<ul>
-  <li>Integer
-    <ul>
-      <li>boolean (1 bit)</li>
-      <li>tinyint (8 bit)</li>
-      <li>smallint (16 bit)</li>
-      <li>int (32 bit)</li>
-      <li>bigint (64 bit)</li>
-    </ul>
-  </li>
-  <li>Floating point
-    <ul>
-      <li>float</li>
-      <li>double</li>
-    </ul>
-  </li>
-  <li>String types
-    <ul>
-      <li>string</li>
-      <li>char</li>
-      <li>varchar</li>
-    </ul>
-  </li>
-  <li>Binary blobs
-    <ul>
-      <li>binary</li>
-    </ul>
-  </li>
-  <li>Date/time
-    <ul>
-      <li>timestamp</li>
-      <li>date</li>
-    </ul>
-  </li>
-  <li>Compound types
-    <ul>
-      <li>struct</li>
-      <li>list</li>
-      <li>map</li>
-      <li>union</li>
-    </ul>
-  </li>
-</ul>
-
-<p>All ORC files are logically sequences of identically typed objects. Hive
-always uses a struct with a field for each of the top-level columns as
-the root object type, but that is not required. All types in ORC can take
-null values including the compound types.</p>
-
-<p>Compound types have child columns that hold the values for their
-sub-elements. For example, a struct column has one child column for
-each field of the struct. Lists always have a single child column for
-the element values and maps always have two child columns. Union
-columns have one child column for each of the variants.</p>
-
-<p>Given the following definition of the table Foobar, the columns in the
-file would form the given tree.</p>
-
-<p><code>create table Foobar (
- myInt int,
- myMap map&lt;string,
- struct&lt;myString : string,
- myDouble: double&gt;&gt;,
- myTime timestamp
-);
-</code></p>
-
-<p><img src="/img/TreeWriters.png" alt="ORC column structure" /></p>
-
-
-          
-
-
-
-
-
-  
-  
-
-  
-  
-
-  
-  
-    <div class="section-nav">
-      <div class="left align-right">
-          
-            
-            
-            <a href="/docs/adopters.html" class="prev">Back</a>
-          
-      </div>
-      <div class="right align-left">
-          
-            
-            
-            <a href="/docs/indexes.html" class="next">Next</a>
-          
-      </div>
-    </div>
-    <div class="clear"></div>
-    
-
-        </article>
-      </div>
-
-      <div class="unit one-fifth hide-on-mobiles">
-  <aside>
-    
-    <h4>Overview</h4>
-    
+<p>ORC provides a rich set of scalar and compound types:</p>
 
 <ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/index.html">Background</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class="current"><a href="/docs/types.html">Types</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-      <li class=""><a href="/docs/acid.html">ACID support</a></li>
-      
-
-
+  <li>Integer
+    <ul>
+      <li>boolean (1 bit)</li>
+      <li>tinyint (8 bit)</li>
+      <li>smallint (16 bit)</li>
+      <li>int (32 bit)</li>
+      <li>bigint (64 bit)</li>
+    </ul>
+  </li>
+  <li>Floating point
+    <ul>
+      <li>float</li>
+      <li>double</li>
+    </ul>
+  </li>
+  <li>String types
+    <ul>
+      <li>string</li>
+      <li>char</li>
+      <li>varchar</li>
+    </ul>
+  </li>
+  <li>Binary blobs
+    <ul>
+      <li>binary</li>
+    </ul>
+  </li>
+  <li>Date/time
+    <ul>
+      <li>timestamp</li>
+      <li>date</li>
+    </ul>
+  </li>
+  <li>Compound types
+    <ul>
+      <li>struct</li>
+      <li>list</li>
+      <li>map</li>
+      <li>union</li>
+    </ul>
+  </li>
 </ul>
 
-    
-    <h4>Installing</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/building.html">Building ORC</a></li>
-      
-
-
-  
+<p>All ORC files are logically sequences of identically typed objects. Hive
+always uses a struct with a field for each of the top-level columns as
+the root object type, but that is not required. All types in ORC can take
+null values including the compound types.</p>
+
+<p>Compound types have child columns that hold the values for their
+sub-elements. For example, a struct column has one child column for
+each field of the struct. Lists always have a single child column for
+the element values, and maps always have two child columns (keys and values).
+Union columns have one child column for each of the variants.</p>
+
+<p>Given the following definition of the table Foobar, the columns in the
+file would form the given tree.</p>
+
+<p><code>create table Foobar (
+ myInt int,
+ myMap map&lt;string,
+ struct&lt;myString : string,
+ myDouble: double&gt;&gt;,
+ myTime timestamp
+);
+</code></p>
+
+<p><img src="/img/TreeWriters.png" alt="ORC column structure" /></p>
+
+
+          
+
+
+
 
-  
-    
-  
 
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
+    <div class="section-nav">
+      <div class="left align-right">
+          
+            
+            
+            <a href="/docs/adopters.html" class="prev">Back</a>
+          
+      </div>
+      <div class="right align-left">
+          
+            
+            
+            <a href="/docs/indexes.html" class="next">Next</a>
+          
+      </div>
+    </div>
+    <div class="clear"></div>
     
-      <li class=""><a href="/docs/releases.html">Releases</a></li>
-      
 
+        </article>
+      </div>
 
-</ul>
-
+      <div class="unit one-fifth hide-on-mobiles">
+  <aside>
     
-    <h4>Using in Hive</h4>
+    <h4>Overview</h4>
     
 
 <ul>
@@ -1613,11 +820,7 @@ file would form the given tree.</p>
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
+      <li class=""><a href="/docs/index.html">Background</a></li>
       
 
 
@@ -1631,34 +834,10 @@ file would form the given tree.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
+      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
       
 
 
-</ul>
-
-    
-    <h4>Using in MapReduce</h4>
-    
-
-<ul>
-
   
 
   
@@ -1695,7 +874,7 @@ file would form the given tree.</p>
     
   
     
-      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
+      <li class="current"><a href="/docs/types.html">Types</a></li>
       
 
 
@@ -1725,49 +904,7 @@ file would form the given tree.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using ORC Core</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
+      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
       
 
 
@@ -1779,22 +916,14 @@ file would form the given tree.</p>
 
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      <li class=""><a href="/docs/acid.html">ACID support</a></li>
       
 
 
 </ul>
 
     
-    <h4>Tools</h4>
+    <h4>Installing</h4>
     
 
 <ul>
@@ -1811,15 +940,7 @@ file would form the given tree.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      <li class=""><a href="/docs/building.html">Building ORC</a></li>
       
 
 
@@ -1857,14 +978,14 @@ file would form the given tree.</p>
     
   
     
-      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
+      <li class=""><a href="/docs/releases.html">Releases</a></li>
       
 
 
 </ul>
 
     
-    <h4>Format Specification</h4>
+    <h4>Using in Hive</h4>
     
 
 <ul>
@@ -1891,31 +1012,7 @@ file would form the given tree.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-intro.html">Introduction</a></li>
+      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
       
 
 
@@ -1939,31 +1036,17 @@ file would form the given tree.</p>
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/file-tail.html">File Tail</a></li>
+      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
       
 
 
-  
-
-  
-    
-  
+</ul>
 
-  
-    
-  
     
-  
-    
-  
+    <h4>Using in MapReduce</h4>
     
-      <li class=""><a href="/docs/compression.html">Compression</a></li>
-      
 
+<ul>
 
   
 
@@ -1995,19 +1078,7 @@ file would form the given tree.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/run-length.html">Run Length Encoding</a></li>
+      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
       
 
 
@@ -2043,13 +1114,25 @@ file would form the given tree.</p>
     
   
     
-  
+      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Using ORC Core</h4>
     
+
+<ul>
+
+  
+
   
     
   
+
+  
     
   
     
@@ -2059,7 +1142,7 @@ file would form the given tree.</p>
     
   
     
-      <li class=""><a href="/docs/stripes.html">Stripes</a></li>
+      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
       
 
 
@@ -2077,17 +1160,17 @@ file would form the given tree.</p>
     
   
     
-  
-    
-  
-    
-  
+      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Tools</h4>
     
-      <li class=""><a href="/docs/encodings.html">Column Encodings</a></li>
-      
 
+<ul>
 
   
 
@@ -2107,11 +1190,17 @@ file would form the given tree.</p>
     
   
     
+      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      
+
+
   
-    
+
   
     
   
+
+  
     
   
     
@@ -2133,7 +1222,7 @@ file would form the given tree.</p>
     
   
     
-      <li class=""><a href="/docs/spec-index.html">Indexes</a></li>
+      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
       
 
 


[5/9] orc git commit: Pushing ORC-339 reorganize the ORC file format spec.

Posted by om...@apache.org.
http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/docs/mapred.html
----------------------------------------------------------------------
diff --git a/docs/mapred.html b/docs/mapred.html
index f0ab622..ab932db 100644
--- a/docs/mapred.html
+++ b/docs/mapred.html
@@ -109,12 +109,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/index.html">Background</option>
     
   
@@ -130,14 +124,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -174,20 +160,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -221,20 +193,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
       <option value="/docs/types.html">Types</option>
     
   
@@ -261,12 +219,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/indexes.html">Indexes</option>
     
   
@@ -280,14 +232,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -324,20 +268,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -381,20 +311,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -426,25 +342,11 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/releases.html">Releases</option>
     
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -471,12 +373,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-ddl.html">Hive DDL</option>
     
   
@@ -494,14 +390,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -519,12 +407,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-config.html">Hive Configuration</option>
     
   
@@ -544,14 +426,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -586,12 +460,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapred.html">Using in MapRed</option>
     
   
@@ -601,14 +469,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -638,12 +498,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapreduce.html">Using in MapReduce</option>
     
   
@@ -651,14 +505,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -679,8 +525,6 @@
     
   
     
-  
-    
       <option value="/docs/core-java.html">Using Core Java</option>
     
   
@@ -704,18 +548,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -727,8 +559,6 @@
     
   
     
-  
-    
       <option value="/docs/core-cpp.html">Using Core C++</option>
     
   
@@ -754,18 +584,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -788,8 +606,6 @@
     
   
     
-  
-    
       <option value="/docs/cpp-tools.html">C++ Tools</option>
     
   
@@ -811,18 +627,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -848,12 +652,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/java-tools.html">Java Tools</option>
     
   
@@ -865,386 +663,21 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
     
-    <optgroup label="Format Specification">
-      
+  </select>
+</div>
 
 
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-intro.html">Introduction</option>
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/file-tail.html">File Tail</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/compression.html">Compression</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/run-length.html">Run Length Encoding</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/stripes.html">Stripes</option>
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/encodings.html">Column Encodings</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-index.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-  </select>
-</div>
-
-
-      <div class="unit four-fifths">
-        <article>
-          <h1>Using in MapRed</h1>
-          <p>This page describes how to read and write ORC files from Hadoop’s
-older org.apache.hadoop.mapred MapReduce APIs. If you want to use the
-new org.apache.hadoop.mapreduce API, please look at the <a href="/docs/mapreduce.html">next
-page</a>.</p>
+      <div class="unit four-fifths">
+        <article>
+          <h1>Using in MapRed</h1>
+          <p>This page describes how to read and write ORC files from Hadoop’s
+older org.apache.hadoop.mapred MapReduce APIs. If you want to use the
+new org.apache.hadoop.mapreduce API, please look at the <a href="/docs/mapreduce.html">next
+page</a>.</p>
 
 <h2 id="reading-orc-files">Reading ORC files</h2>
 
@@ -1506,282 +939,56 @@ OrcKey.key and OrcValue.value fields.</p>
 
   
   
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-    <div class="section-nav">
-      <div class="left align-right">
-          
-            
-            
-            <a href="/docs/hive-config.html" class="prev">Back</a>
-          
-      </div>
-      <div class="right align-left">
-          
-            
-            
-            <a href="/docs/mapreduce.html" class="next">Next</a>
-          
-      </div>
-    </div>
-    <div class="clear"></div>
-    
-
-        </article>
-      </div>
-
-      <div class="unit one-fifth hide-on-mobiles">
-  <aside>
-    
-    <h4>Overview</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/index.html">Background</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/types.html">Types</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-      <li class=""><a href="/docs/acid.html">ACID support</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Installing</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/building.html">Building ORC</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
+  
+    <div class="section-nav">
+      <div class="left align-right">
+          
+            
+            
+            <a href="/docs/hive-config.html" class="prev">Back</a>
+          
+      </div>
+      <div class="right align-left">
+          
+            
+            
+            <a href="/docs/mapreduce.html" class="next">Next</a>
+          
+      </div>
+    </div>
+    <div class="clear"></div>
     
-      <li class=""><a href="/docs/releases.html">Releases</a></li>
-      
 
+        </article>
+      </div>
 
-</ul>
-
+      <div class="unit one-fifth hide-on-mobiles">
+  <aside>
     
-    <h4>Using in Hive</h4>
+    <h4>Overview</h4>
     
 
 <ul>
@@ -1810,11 +1017,7 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
+      <li class=""><a href="/docs/index.html">Background</a></li>
       
 
 
@@ -1828,34 +1031,10 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
+      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
       
 
 
-</ul>
-
-    
-    <h4>Using in MapReduce</h4>
-    
-
-<ul>
-
   
 
   
@@ -1892,7 +1071,7 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-      <li class="current"><a href="/docs/mapred.html">Using in MapRed</a></li>
+      <li class=""><a href="/docs/types.html">Types</a></li>
       
 
 
@@ -1922,49 +1101,7 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using ORC Core</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
+      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
       
 
 
@@ -1976,22 +1113,14 @@ OrcKey.key and OrcValue.value fields.</p>
 
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      <li class=""><a href="/docs/acid.html">ACID support</a></li>
       
 
 
 </ul>
 
     
-    <h4>Tools</h4>
+    <h4>Installing</h4>
     
 
 <ul>
@@ -2008,15 +1137,7 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      <li class=""><a href="/docs/building.html">Building ORC</a></li>
       
 
 
@@ -2054,14 +1175,14 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
+      <li class=""><a href="/docs/releases.html">Releases</a></li>
       
 
 
 </ul>
 
     
-    <h4>Format Specification</h4>
+    <h4>Using in Hive</h4>
     
 
 <ul>
@@ -2088,31 +1209,7 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-intro.html">Introduction</a></li>
+      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
       
 
 
@@ -2136,31 +1233,17 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/file-tail.html">File Tail</a></li>
+      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
       
 
 
-  
-
-  
-    
-  
+</ul>
 
-  
-    
-  
     
-  
-    
-  
+    <h4>Using in MapReduce</h4>
     
-      <li class=""><a href="/docs/compression.html">Compression</a></li>
-      
 
+<ul>
 
   
 
@@ -2192,19 +1275,7 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/run-length.html">Run Length Encoding</a></li>
+      <li class="current"><a href="/docs/mapred.html">Using in MapRed</a></li>
       
 
 
@@ -2240,13 +1311,25 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-  
+      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Using ORC Core</h4>
     
+
+<ul>
+
+  
+
   
     
   
+
+  
     
   
     
@@ -2256,7 +1339,7 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-      <li class=""><a href="/docs/stripes.html">Stripes</a></li>
+      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
       
 
 
@@ -2274,17 +1357,17 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-  
-    
-  
-    
-  
+      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Tools</h4>
     
-      <li class=""><a href="/docs/encodings.html">Column Encodings</a></li>
-      
 
+<ul>
 
   
 
@@ -2304,11 +1387,17 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
+      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      
+
+
   
-    
+
   
     
   
+
+  
     
   
     
@@ -2330,7 +1419,7 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-      <li class=""><a href="/docs/spec-index.html">Indexes</a></li>
+      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
       
 
 

http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/docs/mapreduce.html
----------------------------------------------------------------------
diff --git a/docs/mapreduce.html b/docs/mapreduce.html
index 2423f01..63fcd9c 100644
--- a/docs/mapreduce.html
+++ b/docs/mapreduce.html
@@ -109,12 +109,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/index.html">Background</option>
     
   
@@ -130,14 +124,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -174,20 +160,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -221,20 +193,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
       <option value="/docs/types.html">Types</option>
     
   
@@ -261,12 +219,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/indexes.html">Indexes</option>
     
   
@@ -280,14 +232,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -324,20 +268,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -381,20 +311,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -426,25 +342,11 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/releases.html">Releases</option>
     
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -471,12 +373,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-ddl.html">Hive DDL</option>
     
   
@@ -494,14 +390,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -519,12 +407,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-config.html">Hive Configuration</option>
     
   
@@ -544,14 +426,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -586,12 +460,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapred.html">Using in MapRed</option>
     
   
@@ -601,14 +469,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -638,12 +498,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapreduce.html">Using in MapReduce</option>
     
   
@@ -651,14 +505,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -679,8 +525,6 @@
     
   
     
-  
-    
       <option value="/docs/core-java.html">Using Core Java</option>
     
   
@@ -704,18 +548,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -727,8 +559,6 @@
     
   
     
-  
-    
       <option value="/docs/core-cpp.html">Using Core C++</option>
     
   
@@ -754,18 +584,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -788,8 +606,6 @@
     
   
     
-  
-    
       <option value="/docs/cpp-tools.html">C++ Tools</option>
     
   
@@ -811,18 +627,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -848,12 +652,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/java-tools.html">Java Tools</option>
     
   
@@ -865,386 +663,21 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
     
-    <optgroup label="Format Specification">
-      
+  </select>
+</div>
 
 
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-intro.html">Introduction</option>
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/file-tail.html">File Tail</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/compression.html">Compression</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/run-length.html">Run Length Encoding</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/stripes.html">Stripes</option>
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/encodings.html">Column Encodings</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-index.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-  </select>
-</div>
-
-
-      <div class="unit four-fifths">
-        <article>
-          <h1>Using in MapReduce</h1>
-          <p>This page describes how to read and write ORC files from Hadoop’s
-newer org.apache.hadoop.mapreduce MapReduce APIs. If you want to use the
-older org.apache.hadoop.mapred API, please look at the <a href="/docs/mapred.html">previous
-page</a>.</p>
+      <div class="unit four-fifths">
+        <article>
+          <h1>Using in MapReduce</h1>
+          <p>This page describes how to read and write ORC files from Hadoop’s
+newer org.apache.hadoop.mapreduce MapReduce APIs. If you want to use the
+older org.apache.hadoop.mapred API, please look at the <a href="/docs/mapred.html">previous
+page</a>.</p>
 
 <h2 id="reading-orc-files">Reading ORC files</h2>
 
@@ -1483,289 +916,63 @@ OrcKey.key and OrcValue.value fields.</p>
 
 
   
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-    <div class="section-nav">
-      <div class="left align-right">
-          
-            
-            
-            <a href="/docs/mapred.html" class="prev">Back</a>
-          
-      </div>
-      <div class="right align-left">
-          
-            
-            
-            <a href="/docs/core-java.html" class="next">Next</a>
-          
-      </div>
-    </div>
-    <div class="clear"></div>
-    
-
-        </article>
-      </div>
-
-      <div class="unit one-fifth hide-on-mobiles">
-  <aside>
-    
-    <h4>Overview</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/index.html">Background</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/types.html">Types</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-      <li class=""><a href="/docs/acid.html">ACID support</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Installing</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/building.html">Building ORC</a></li>
-      
-
-
-  
+  
 
   
-    
   
 
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
+  
+    <div class="section-nav">
+      <div class="left align-right">
+          
+            
+            
+            <a href="/docs/mapred.html" class="prev">Back</a>
+          
+      </div>
+      <div class="right align-left">
+          
+            
+            
+            <a href="/docs/core-java.html" class="next">Next</a>
+          
+      </div>
+    </div>
+    <div class="clear"></div>
     
-      <li class=""><a href="/docs/releases.html">Releases</a></li>
-      
 
+        </article>
+      </div>
 
-</ul>
-
+      <div class="unit one-fifth hide-on-mobiles">
+  <aside>
     
-    <h4>Using in Hive</h4>
+    <h4>Overview</h4>
     
 
 <ul>
@@ -1794,11 +1001,7 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
+      <li class=""><a href="/docs/index.html">Background</a></li>
       
 
 
@@ -1812,34 +1015,10 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
+      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
       
 
 
-</ul>
-
-    
-    <h4>Using in MapReduce</h4>
-    
-
-<ul>
-
   
 
   
@@ -1876,7 +1055,7 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
+      <li class=""><a href="/docs/types.html">Types</a></li>
       
 
 
@@ -1906,49 +1085,7 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class="current"><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using ORC Core</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
+      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
       
 
 
@@ -1960,22 +1097,14 @@ OrcKey.key and OrcValue.value fields.</p>
 
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      <li class=""><a href="/docs/acid.html">ACID support</a></li>
       
 
 
 </ul>
 
     
-    <h4>Tools</h4>
+    <h4>Installing</h4>
     
 
 <ul>
@@ -1992,15 +1121,7 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      <li class=""><a href="/docs/building.html">Building ORC</a></li>
       
 
 
@@ -2038,14 +1159,14 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
+      <li class=""><a href="/docs/releases.html">Releases</a></li>
       
 
 
 </ul>
 
     
-    <h4>Format Specification</h4>
+    <h4>Using in Hive</h4>
     
 
 <ul>
@@ -2072,31 +1193,7 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-intro.html">Introduction</a></li>
+      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
       
 
 
@@ -2120,31 +1217,17 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/file-tail.html">File Tail</a></li>
+      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
       
 
 
-  
-
-  
-    
-  
+</ul>
 
-  
-    
-  
     
-  
-    
-  
+    <h4>Using in MapReduce</h4>
     
-      <li class=""><a href="/docs/compression.html">Compression</a></li>
-      
 
+<ul>
 
   
 
@@ -2176,19 +1259,7 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/run-length.html">Run Length Encoding</a></li>
+      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
       
 
 
@@ -2224,13 +1295,25 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-  
+      <li class="current"><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Using ORC Core</h4>
     
+
+<ul>
+
+  
+
   
     
   
+
+  
     
   
     
@@ -2240,7 +1323,7 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-      <li class=""><a href="/docs/stripes.html">Stripes</a></li>
+      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
       
 
 
@@ -2258,17 +1341,17 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-  
-    
-  
-    
-  
+      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Tools</h4>
     
-      <li class=""><a href="/docs/encodings.html">Column Encodings</a></li>
-      
 
+<ul>
 
   
 
@@ -2288,11 +1371,17 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
+      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      
+
+
   
-    
+
   
     
   
+
+  
     
   
     
@@ -2314,7 +1403,7 @@ OrcKey.key and OrcValue.value fields.</p>
     
   
     
-      <li class=""><a href="/docs/spec-index.html">Indexes</a></li>
+      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
       
 
 

http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/docs/releases.html
----------------------------------------------------------------------
diff --git a/docs/releases.html b/docs/releases.html
index 8a2406f..3b96cec 100644
--- a/docs/releases.html
+++ b/docs/releases.html
@@ -109,12 +109,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/index.html">Background</option>
     
   
@@ -130,14 +124,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -174,20 +160,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -221,20 +193,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
       <option value="/docs/types.html">Types</option>
     
   
@@ -261,12 +219,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/indexes.html">Indexes</option>
     
   
@@ -280,14 +232,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -324,20 +268,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -381,20 +311,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -426,25 +342,11 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/releases.html">Releases</option>
     
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -471,12 +373,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-ddl.html">Hive DDL</option>
     
   
@@ -494,14 +390,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -519,12 +407,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-config.html">Hive Configuration</option>
     
   
@@ -544,14 +426,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -586,12 +460,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapred.html">Using in MapRed</option>
     
   
@@ -601,14 +469,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -638,12 +498,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapreduce.html">Using in MapReduce</option>
     
   
@@ -651,14 +505,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -679,8 +525,6 @@
     
   
     
-  
-    
       <option value="/docs/core-java.html">Using Core Java</option>
     
   
@@ -704,18 +548,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -727,8 +559,6 @@
     
   
     
-  
-    
       <option value="/docs/core-cpp.html">Using Core C++</option>
     
   
@@ -754,18 +584,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -788,8 +606,6 @@
     
   
     
-  
-    
       <option value="/docs/cpp-tools.html">C++ Tools</option>
     
   
@@ -811,18 +627,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -848,12 +652,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/java-tools.html">Java Tools</option>
     
   
@@ -865,384 +663,19 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
     
-    <optgroup label="Format Specification">
-      
+  </select>
+</div>
 
 
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-intro.html">Introduction</option>
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/file-tail.html">File Tail</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/compression.html">Compression</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/run-length.html">Run Length Encoding</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/stripes.html">Stripes</option>
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/encodings.html">Column Encodings</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-index.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-  </select>
-</div>
-
-
-      <div class="unit four-fifths">
-        <article>
-          <h1>Releases</h1>
-          
-<h2 id="current-release---143">Current Release - 1.4.3:</h2>
+      <div class="unit four-fifths">
+        <article>
+          <h1>Releases</h1>
+          
+<h2 id="current-release---143">Current Release - 1.4.3:</h2>
 
 <p>ORC 1.4.3 contains both the Java reader and writer and the C++
 reader for ORC files. It also contains tools for working with ORC
@@ -1483,273 +916,47 @@ committers’ <a href="https://dist.apache.org/repos/dist/release/orc/KEYS">key
 
   
   
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-    <div class="section-nav">
-      <div class="left align-right">
-          
-            
-            
-            <a href="/docs/building.html" class="prev">Back</a>
-          
-      </div>
-      <div class="right align-left">
-          
-            
-            
-            <a href="/docs/hive-ddl.html" class="next">Next</a>
-          
-      </div>
-    </div>
-    <div class="clear"></div>
-    
-
-        </article>
-      </div>
-
-      <div class="unit one-fifth hide-on-mobiles">
-  <aside>
-    
-    <h4>Overview</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/index.html">Background</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/types.html">Types</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-      <li class=""><a href="/docs/acid.html">ACID support</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Installing</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/building.html">Building ORC</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
+    <div class="section-nav">
+      <div class="left align-right">
+          
+            
+            
+            <a href="/docs/building.html" class="prev">Back</a>
+          
+      </div>
+      <div class="right align-left">
+          
+            
+            
+            <a href="/docs/hive-ddl.html" class="next">Next</a>
+          
+      </div>
+    </div>
+    <div class="clear"></div>
     
-      <li class="current"><a href="/docs/releases.html">Releases</a></li>
-      
 
+        </article>
+      </div>
 
-</ul>
-
+      <div class="unit one-fifth hide-on-mobiles">
+  <aside>
     
-    <h4>Using in Hive</h4>
+    <h4>Overview</h4>
     
 
 <ul>
@@ -1778,11 +985,7 @@ committers’ <a href="https://dist.apache.org/repos/dist/release/orc/KEYS">key
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
+      <li class=""><a href="/docs/index.html">Background</a></li>
       
 
 
@@ -1796,34 +999,10 @@ committers’ <a href="https://dist.apache.org/repos/dist/release/orc/KEYS">key
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
+      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
       
 
 
-</ul>
-
-    
-    <h4>Using in MapReduce</h4>
-    
-
-<ul>
-
   
 
   
@@ -1860,7 +1039,7 @@ committers’ <a href="https://dist.apache.org/repos/dist/release/orc/KEYS">key
     
   
     
-      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
+      <li class=""><a href="/docs/types.html">Types</a></li>
       
 
 
@@ -1890,49 +1069,7 @@ committers’ <a href="https://dist.apache.org/repos/dist/release/orc/KEYS">key
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using ORC Core</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
+      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
       
 
 
@@ -1944,22 +1081,14 @@ committers’ <a href="https://dist.apache.org/repos/dist/release/orc/KEYS">key
 
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      <li class=""><a href="/docs/acid.html">ACID support</a></li>
       
 
 
 </ul>
 
     
-    <h4>Tools</h4>
+    <h4>Installing</h4>
     
 
 <ul>
@@ -1976,15 +1105,7 @@ committers’ <a href="https://dist.apache.org/repos/dist/release/orc/KEYS">key
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      <li class=""><a href="/docs/building.html">Building ORC</a></li>
       
 
 
@@ -2022,14 +1143,14 @@ committers’ <a href="https://dist.apache.org/repos/dist/release/orc/KEYS">key
     
   
     
-      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
+      <li class="current"><a href="/docs/releases.html">Releases</a></li>
       
 
 
 </ul>
 
     
-    <h4>Format Specification</h4>
+    <h4>Using in Hive</h4>
     
 
 <ul>
@@ -2056,31 +1177,7 @@ committers’ <a href="https://dist.apache.org/repos/dist/release/orc/KEYS">key
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-intro.html">Introduction</a></li>
+      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
       
 
 
@@ -2104,31 +1201,17 @@ committers’ <a href="https://dist.apache.org/repos/dist/release/orc/KEYS">key
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/file-tail.html">File Tail</a></li>
+      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
       
 
 
-  
-
-  
-    
-  
+</ul>
 
-  
-    
-  
     
-  
-    
-  
+    <h4>Using in MapReduce</h4>
     
-      <li class=""><a href="/docs/compression.html">Compression</a></li>
-      
 
+<ul>
 
   
 
@@ -2160,19 +1243,7 @@ committers’ <a href="https://dist.apache.org/repos/dist/release/orc/KEYS">key
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/run-length.html">Run Length Encoding</a></li>
+      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
       
 
 
@@ -2208,13 +1279,25 @@ committers’ <a href="https://dist.apache.org/repos/dist/release/orc/KEYS">key
     
   
     
-  
+      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Using ORC Core</h4>
     
+
+<ul>
+
+  
+
   
     
   
+
+  
     
   
     
@@ -2224,7 +1307,7 @@ committers’ <a href="https://dist.apache.org/repos/dist/release/orc/KEYS">key
     
   
     
-      <li class=""><a href="/docs/stripes.html">Stripes</a></li>
+      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
       
 
 
@@ -2242,17 +1325,17 @@ committers’ <a href="https://dist.apache.org/repos/dist/release/orc/KEYS">key
     
   
     
-  
-    
-  
-    
-  
+      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Tools</h4>
     
-      <li class=""><a href="/docs/encodings.html">Column Encodings</a></li>
-      
 
+<ul>
 
   
 
@@ -2272,11 +1355,17 @@ committers’ <a href="https://dist.apache.org/repos/dist/release/orc/KEYS">key
     
   
     
+      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      
+
+
   
-    
+
   
     
   
+
+  
     
   
     
@@ -2298,7 +1387,7 @@ committers’ <a href="https://dist.apache.org/repos/dist/release/orc/KEYS">key
     
   
     
-      <li class=""><a href="/docs/spec-index.html">Indexes</a></li>
+      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
       
 
 

http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/docs/run-length.html
----------------------------------------------------------------------
diff --git a/docs/run-length.html b/docs/run-length.html
deleted file mode 100644
index 5ca06d6..0000000
--- a/docs/run-length.html
+++ /dev/null
@@ -1,2566 +0,0 @@
-<!DOCTYPE HTML>
-<html lang="en-US">
-<head>
-  <meta charset="UTF-8">
-  <title>Run Length Encoding</title>
-  <meta name="viewport" content="width=device-width,initial-scale=1">
-  <meta name="generator" content="Jekyll v2.4.0">
-  <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
-  <link rel="stylesheet" href="/css/screen.css">
-  <link rel="icon" type="image/x-icon" href="/favicon.ico">
-  <!--[if lt IE 9]>
-  <script src="/js/html5shiv.min.js"></script>
-  <script src="/js/respond.min.js"></script>
-  <![endif]-->
-</head>
-
-
-<body class="wrap">
-  <header role="banner">
-  <nav class="mobile-nav show-on-mobiles">
-    <ul>
-  <li class="">
-    <a href="/">Home</a>
-  </li>
-  <li class="current">
-    <a href="/docs/"><span class="show-on-mobiles">Docs</span>
-                     <span class="hide-on-mobiles">Documentation</span></a>
-  </li>
-  <li class="">
-    <a href="/talks/">Talks</a>
-  </li>
-  <li class="">
-    <a href="/news/">News</a>
-  </li>
-  <li class="">
-    <a href="/help/">Help</a>
-  </li>
-  <li class="">
-    <a href="/develop/">Develop</a>
-  </li>
-</ul>
-
-  </nav>
-  <div class="grid">
-    <div class="unit one-third center-on-mobiles">
-      <h1>
-        <a href="/">
-          <span class="sr-only">Apache ORC</span>
-          <img src="/img/logo.png" width="249" height="101" alt="ORC Logo">
-        </a>
-      </h1>
-    </div>
-    <nav class="main-nav unit two-thirds hide-on-mobiles">
-      <ul>
-  <li class="">
-    <a href="/">Home</a>
-  </li>
-  <li class="current">
-    <a href="/docs/"><span class="show-on-mobiles">Docs</span>
-                     <span class="hide-on-mobiles">Documentation</span></a>
-  </li>
-  <li class="">
-    <a href="/talks/">Talks</a>
-  </li>
-  <li class="">
-    <a href="/news/">News</a>
-  </li>
-  <li class="">
-    <a href="/help/">Help</a>
-  </li>
-  <li class="">
-    <a href="/develop/">Develop</a>
-  </li>
-</ul>
-
-    </nav>
-  </div>
-</header>
-
-
-    <section class="docs">
-    <div class="grid">
-
-      <div class="docs-nav-mobile unit whole show-on-mobiles">
-  <select onchange="if (this.value) window.location.href=this.value">
-    <option value="">Navigate the docs…</option>
-    
-    <optgroup label="Overview">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/index.html">Background</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-      <option value="/docs/adopters.html">ORC Adopters</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/types.html">Types</option>
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/indexes.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-      <option value="/docs/acid.html">ACID support</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Installing">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-      <option value="/docs/building.html">Building ORC</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/releases.html">Releases</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Using in Hive">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/hive-ddl.html">Hive DDL</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/hive-config.html">Hive Configuration</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Using in MapReduce">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/mapred.html">Using in MapRed</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/mapreduce.html">Using in MapReduce</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Using ORC Core">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/core-java.html">Using Core Java</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/core-cpp.html">Using Core C++</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Tools">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/cpp-tools.html">C++ Tools</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/java-tools.html">Java Tools</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Format Specification">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-intro.html">Introduction</option>
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/file-tail.html">File Tail</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/compression.html">Compression</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/run-length.html">Run Length Encoding</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/stripes.html">Stripes</option>
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/encodings.html">Column Encodings</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-index.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-  </select>
-</div>
-
-
-      <div class="unit four-fifths">
-        <article>
-          <h1>Run Length Encoding</h1>
-          <h1 id="base-128-varint">Base 128 Varint</h1>
-
-<p>Variable width integer encodings take advantage of the fact that most
-numbers are small and that having smaller encodings for small numbers
-shrinks the overall size of the data. ORC uses the varint format from
-Protocol Buffers, which writes data in little endian format using the
-low 7 bits of each byte. The high bit in each byte is set if the
-number continues into the next byte.</p>
-
-<table>
-  <thead>
-    <tr>
-      <th style="text-align: left">Unsigned Original</th>
-      <th style="text-align: left">Serialized</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td style="text-align: left">0</td>
-      <td style="text-align: left">0x00</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">1</td>
-      <td style="text-align: left">0x01</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">127</td>
-      <td style="text-align: left">0x7f</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">128</td>
-      <td style="text-align: left">0x80, 0x01</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">129</td>
-      <td style="text-align: left">0x81, 0x01</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">16,383</td>
-      <td style="text-align: left">0xff, 0x7f</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">16,384</td>
-      <td style="text-align: left">0x80, 0x80, 0x01</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">16,385</td>
-      <td style="text-align: left">0x81, 0x80, 0x01</td>
-    </tr>
-  </tbody>
-</table>
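The varint table above can be reproduced with a few lines of Python. This is a minimal sketch, not ORC's own implementation; the function names `encode_varint` and `decode_varint` are hypothetical and chosen only for illustration.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base 128 varint (little endian groups of 7 bits)."""
    if value < 0:
        raise ValueError("varint input must be unsigned; zigzag-encode signed values first")
    out = bytearray()
    while True:
        low = value & 0x7F          # low 7 bits of the remaining value
        value >>= 7
        if value:
            out.append(low | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a single varint from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result
        shift += 7
    raise ValueError("truncated varint")
```

For instance, `encode_varint(128)` yields the two bytes 0x80, 0x01 listed in the table, and `decode_varint` reverses the process.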
-
-<p>For signed integer types, the number is converted into an unsigned
-number using a zigzag encoding. Zigzag encoding moves the sign bit to
-the least significant bit using the expression (val &lt;&lt; 1) ^ (val &gt;&gt;
-63) and derives its name from the fact that positive and negative
-numbers alternate once encoded. The unsigned number is then serialized
-as above.</p>
-
-<table>
-  <thead>
-    <tr>
-      <th style="text-align: left">Signed Original</th>
-      <th style="text-align: left">Unsigned</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td style="text-align: left">0</td>
-      <td style="text-align: left">0</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">-1</td>
-      <td style="text-align: left">1</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">1</td>
-      <td style="text-align: left">2</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">-2</td>
-      <td style="text-align: left">3</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">2</td>
-      <td style="text-align: left">4</td>
-    </tr>
-  </tbody>
-</table>
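The zigzag mapping in the table above can be sketched in Python as follows. Python integers are unbounded, so the sketch masks the result to 64 bits to mimic a 64-bit shift; the function names are hypothetical, and the input is assumed to fit in a signed 64-bit integer.

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF


def zigzag_encode(val: int) -> int:
    """Map a signed 64-bit integer to an unsigned one, moving the sign to bit 0."""
    # val >> 63 is an arithmetic shift: 0 for non-negative values, -1 for negative ones.
    return ((val << 1) ^ (val >> 63)) & MASK_64


def zigzag_decode(encoded: int) -> int:
    """Invert zigzag_encode."""
    return (encoded >> 1) ^ -(encoded & 1)
```

The unsigned result would then be written as a base 128 varint.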
-
-<h1 id="byte-run-length-encoding">Byte Run Length Encoding</h1>
-
-<p>For byte streams, ORC uses a very lightweight encoding of identical
-values.</p>
-
-<ul>
-  <li>Run - a sequence of at least 3 identical values</li>
-  <li>Literals - a sequence of non-identical values</li>
-</ul>
-
-<p>The first byte of each group of values is a header that determines
-whether it is a run (value between 0 and 127) or literal list (value
-between -128 and -1). For runs, the control byte is the length of the
-run minus the length of the minimal run (3) and the control byte for
-literal lists is the negative length of the list. For example, a
-hundred 0’s is encoded as [0x61, 0x00] and the sequence 0x44, 0x45
-would be encoded as [0xfe, 0x44, 0x45]. The next group can choose
-either of the encodings.</p>
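A decoder for the byte run length encoding described above might look like the following. This is a sketch under the rules stated in the paragraph, not the ORC reader itself; `decode_byte_rle` is a hypothetical name.

```python
def decode_byte_rle(data: bytes) -> bytes:
    """Expand a byte-RLE stream of runs and literal lists into raw bytes."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        header = data[pos]
        pos += 1
        if header < 0x80:
            # Run: header is (run length - 3), followed by the repeated value.
            if pos >= len(data):
                raise ValueError("truncated run")
            out.extend(data[pos:pos + 1] * (header + 3))
            pos += 1
        else:
            # Literal list: header is the negative count as a signed byte.
            count = 0x100 - header
            if pos + count > len(data):
                raise ValueError("truncated literal list")
            out.extend(data[pos:pos + count])
            pos += count
    return bytes(out)
```

Both examples from the text decode as expected: [0x61, 0x00] expands to one hundred zero bytes, and [0xfe, 0x44, 0x45] yields 0x44, 0x45.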
-
-<h1 id="boolean-run-length-encoding">Boolean Run Length Encoding</h1>
-
-<p>For encoding boolean types, the bits are put in the bytes from most
-significant to least significant. The bytes are encoded using byte run
-length encoding as described in the previous section. For example,
-the byte sequence [0xff, 0x80] would be one true followed by
-seven false values.</p>
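The bit order can be illustrated with a short Python sketch. It assumes the bytes have already been expanded by a byte run length decoder, and the name `unpack_booleans` and its `count` parameter (the number of valid values) are hypothetical.

```python
from typing import List


def unpack_booleans(decoded: bytes, count: int) -> List[bool]:
    """Unpack booleans from bytes, most significant bit first."""
    bits = []
    for byte in decoded:
        for shift in range(7, -1, -1):   # bit 7 (MSB) down to bit 0 (LSB)
            bits.append(bool((byte >> shift) & 1))
    return bits[:count]                  # drop padding bits in the final byte
```

In the example from the text, the stream [0xff, 0x80] is a literal list holding the single byte 0x80, which unpacks to one true value followed by seven false values.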
-
-<h1 id="integer-run-length-encoding-version-1">Integer Run Length Encoding, version 1</h1>
-
-<p>In Hive 0.11, ORC files used Run Length Encoding version 1 (RLEv1),
-which provides a lightweight compression of signed or unsigned integer
-sequences. RLEv1 has two sub-encodings:</p>
-
-<ul>
-  <li>Run - a sequence of values that differ by a small fixed delta</li>
-  <li>Literals - a sequence of varint encoded values</li>
-</ul>
-
-<p>Runs start with an initial byte of 0x00 to 0x7f, which encodes the
-length of the run - 3. A second byte provides the fixed delta in the
-range of -128 to 127. Finally, the first value of the run is encoded
-as a base 128 varint.</p>
-
-<p>For example, if the sequence is 100 instances of 7 the encoding would
-start with 100 - 3, followed by a delta of 0, and a varint of 7 for
-an encoding of [0x61, 0x00, 0x07]. To encode the sequence of numbers
-running from 100 to 1, the first byte is 100 - 3, the delta is -1,
-and the varint is 100 for an encoding of [0x61, 0xff, 0x64].</p>
-
-<p>Literals start with an initial byte of 0x80 to 0xff, which corresponds
-to the negative of the number of literals in the sequence. Following the
-header byte, the list of N varints is encoded. Thus, if there are
-no runs, the overhead is 1 byte for each 128 integers. The first 5
-prime numbers [2, 3, 5, 7, 11] would be encoded as [0xfb, 0x02, 0x03,
-0x05, 0x07, 0x0b].</p>
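The RLEv1 rules above can be sketched as a small decoder. This is an illustration for unsigned sequences only; a signed column would presumably also zigzag-decode each varint, which is left out here. The names `read_varint` and `decode_rle_v1` are hypothetical.

```python
from typing import List, Tuple


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Read one base 128 varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while pos < len(data):
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7
    raise ValueError("truncated varint")


def decode_rle_v1(data: bytes) -> List[int]:
    """Decode an unsigned RLEv1 stream of runs and literal lists."""
    out: List[int] = []
    pos = 0
    while pos < len(data):
        header = data[pos]
        pos += 1
        if header < 0x80:
            # Run: header is (length - 3), then a signed one-byte delta, then the base varint.
            length = header + 3
            if pos >= len(data):
                raise ValueError("truncated run")
            delta = data[pos] - 0x100 if data[pos] >= 0x80 else data[pos]
            pos += 1
            base, pos = read_varint(data, pos)
            out.extend(base + i * delta for i in range(length))
        else:
            # Literals: header is the negative count, followed by that many varints.
            for _ in range(0x100 - header):
                value, pos = read_varint(data, pos)
                out.append(value)
    return out
```

Decoding [0x61, 0xff, 0x64] with this sketch gives the numbers running from 100 down to 1, matching the run example in the text.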
-
-<h1 id="integer-run-length-encoding-version-2">Integer Run Length Encoding, version 2</h1>
-
-<p>In Hive 0.12, ORC introduced Run Length Encoding version 2 (RLEv2),
-which has improved compression and fixed bit width encodings for
-faster expansion. RLEv2 uses four sub-encodings based on the data:</p>
-
-<ul>
-  <li>Short Repeat - used for short sequences with repeated values</li>
-  <li>Direct - used for random sequences with a fixed bit width</li>
-  <li>Patched Base - used for random sequences with a variable bit width</li>
-  <li>Delta - used for monotonically increasing or decreasing sequences</li>
-</ul>
-
-<h2 id="short-repeat">Short Repeat</h2>
-
-<p>The short repeat encoding is used for short repeating integer
-sequences with the goal of minimizing the overhead of the header. All
-of the bits listed in the header are from the first byte to the last
-and from most significant bit to least significant bit. If the type is
-signed, the value is zigzag encoded.</p>
-
-<ul>
-  <li>1 byte header
-    <ul>
-      <li>2 bits for encoding type (0)</li>
-      <li>3 bits for width (W) of repeating value (1 to 8 bytes)</li>
-      <li>3 bits for repeat count (3 to 10 values)</li>
-    </ul>
-  </li>
-  <li>W bytes in big endian format, which is zigzag encoded if the type
-is signed</li>
-</ul>
-
-<p>The unsigned sequence of [10000, 10000, 10000, 10000, 10000] would be
-serialized with short repeat encoding (0), a width of 2 bytes (1), and
-repeat count of 5 (2) as [0x0a, 0x27, 0x10].</p>
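-
-<p>The header packing can be sketched as below. The function name
-<code>encode_short_repeat</code> is hypothetical, and the sketch assumes an
-unsigned value; a signed value would be zigzag encoded first.</p>

```python
def encode_short_repeat(value, count):
    """Encode an RLEv2 short repeat of an unsigned value (illustrative sketch)."""
    if not 3 <= count <= 10:
        raise ValueError("repeat count must be between 3 and 10")
    if not 0 <= value < 1 << 64:
        raise ValueError("value must fit in 8 bytes")

    # Smallest number of bytes (1 to 8) that holds the value.
    width = max(1, (value.bit_length() + 7) // 8)

    # 2 bits encoding type (0), 3 bits (width - 1), 3 bits (count - 3).
    header = (0 << 6) | ((width - 1) << 3) | (count - 3)
    return bytes([header]) + value.to_bytes(width, "big")
```

-<p>With these assumptions, <code>encode_short_repeat(10000, 5)</code> gives
-the bytes 0x0a, 0x27, 0x10.</p>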
-
-<h2 id="direct">Direct</h2>
-
-<p>The direct encoding is used for integer sequences whose values have a
-relatively constant bit width. It encodes the values directly using a
-fixed width big endian encoding. The width of the values is encoded
-using the table below.</p>
-
-<p>The 5 bit width encoding table for RLEv2:</p>
-
-<table>
-  <thead>
-    <tr>
-      <th style="text-align: left">Width in Bits</th>
-      <th style="text-align: left">Encoded Value</th>
-      <th style="text-align: left">Notes</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td style="text-align: left">0</td>
-      <td style="text-align: left">0</td>
-      <td style="text-align: left">for delta encoding</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">1</td>
-      <td style="text-align: left">0</td>
-      <td style="text-align: left">for non-delta encoding</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">2</td>
-      <td style="text-align: left">1</td>
-      <td style="text-align: left"> </td>
-    </tr>
-    <tr>
-      <td style="text-align: left">4</td>
-      <td style="text-align: left">3</td>
-      <td style="text-align: left"> </td>
-    </tr>
-    <tr>
-      <td style="text-align: left">8</td>
-      <td style="text-align: left">7</td>
-      <td style="text-align: left"> </td>
-    </tr>
-    <tr>
-      <td style="text-align: left">16</td>
-      <td style="text-align: left">15</td>
-      <td style="text-align: left"> </td>
-    </tr>
-    <tr>
-      <td style="text-align: left">24</td>
-      <td style="text-align: left">23</td>
-      <td style="text-align: left"> </td>
-    </tr>
-    <tr>
-      <td style="text-align: left">32</td>
-      <td style="text-align: left">27</td>
-      <td style="text-align: left"> </td>
-    </tr>
-    <tr>
-      <td style="text-align: left">40</td>
-      <td style="text-align: left">28</td>
-      <td style="text-align: left"> </td>
-    </tr>
-    <tr>
-      <td style="text-align: left">48</td>
-      <td style="text-align: left">29</td>
-      <td style="text-align: left"> </td>
-    </tr>
-    <tr>
-      <td style="text-align: left">56</td>
-      <td style="text-align: left">30</td>
-      <td style="text-align: left"> </td>
-    </tr>
-    <tr>
-      <td style="text-align: left">64</td>
-      <td style="text-align: left">31</td>
-      <td style="text-align: left"> </td>
-    </tr>
-    <tr>
-      <td style="text-align: left">3</td>
-      <td style="text-align: left">2</td>
-      <td style="text-align: left">deprecated</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">5 &lt;= x &lt;= 7</td>
-      <td style="text-align: left">x - 1</td>
-      <td style="text-align: left">deprecated</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">9 &lt;= x &lt;= 15</td>
-      <td style="text-align: left">x - 1</td>
-      <td style="text-align: left">deprecated</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">17 &lt;= x &lt;= 21</td>
-      <td style="text-align: left">x - 1</td>
-      <td style="text-align: left">deprecated</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">26</td>
-      <td style="text-align: left">24</td>
-      <td style="text-align: left">deprecated</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">28</td>
-      <td style="text-align: left">25</td>
-      <td style="text-align: left">deprecated</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">30</td>
-      <td style="text-align: left">26</td>
-      <td style="text-align: left">deprecated</td>
-    </tr>
-  </tbody>
-</table>
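-
-<p>Read in the decoding direction, the table can be sketched as a small
-lookup. The name <code>decode_width</code> and the <code>is_delta</code>
-flag are hypothetical. The table lists no rows for encoded values 21 and 22;
-this sketch assumes they follow the same x - 1 rule as their neighbours.</p>

```python
def decode_width(encoded, is_delta=False):
    """Map a 5 bit encoded width back to a width in bits (illustrative sketch)."""
    if not 0 <= encoded <= 31:
        raise ValueError("encoded width must fit in 5 bits")
    if encoded == 0 and is_delta:
        # Only the delta encoding uses a width of 0 bits.
        return 0
    if encoded <= 23:
        # Widths 1 to 24 are stored as (width - 1).
        return encoded + 1
    # Larger widths come from the remaining table rows.
    return {24: 26, 25: 28, 26: 30, 27: 32,
            28: 40, 29: 48, 30: 56, 31: 64}[encoded]
```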
-
-<ul>
-  <li>2 bytes header
-    <ul>
-      <li>2 bits for encoding type (1)</li>
-      <li>5 bits for encoded width (W) of values (1 to 64 bits) using the 5 bit
-width encoding table</li>
-      <li>9 bits for length (L) (1 to 512 values)</li>
-    </ul>
-  </li>
-  <li>W * L bits (padded to the next byte) encoded in big endian format, which is
-zigzag encoded if the type is signed</li>
-</ul>
-
-<p>The unsigned sequence of [23713, 43806, 57005, 48879] would be
-serialized with direct encoding (1), a width of 16 bits (15), and
-length of 4 (3) as [0x5e, 0x03, 0x5c, 0xa1, 0xab, 0x1e, 0xde, 0xad,
-0xbe, 0xef].</p>
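-
-<p>Unpacking a direct run can be sketched as follows. The name
-<code>decode_direct</code> is hypothetical; the sketch assumes unsigned
-values and only looks up the non-deprecated widths from the table.</p>

```python
def decode_direct(data):
    """Decode one RLEv2 direct run of unsigned values (illustrative sketch)."""
    header = (data[0] << 8) | data[1]
    if header >> 14 != 1:
        raise ValueError("not a direct run")
    encoded_width = (header >> 9) & 0x1F
    length = (header & 0x1FF) + 1

    # Non-deprecated rows of the 5 bit width encoding table.
    widths = {0: 1, 1: 2, 3: 4, 7: 8, 15: 16, 23: 24,
              27: 32, 28: 40, 29: 48, 30: 56, 31: 64}
    width = widths[encoded_width]

    total_bits = width * length
    nbytes = (total_bits + 7) // 8
    payload = int.from_bytes(data[2:2 + nbytes], "big")
    payload >>= nbytes * 8 - total_bits  # drop the padding bits

    mask = (1 << width) - 1
    return [(payload >> (width * (length - 1 - i))) & mask
            for i in range(length)]
```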
-
-<h2 id="patched-base">Patched Base</h2>
-
-<p>The patched base encoding is used for integer sequences whose bit
-widths vary a lot. The minimum signed value of the sequence is found
-and subtracted from the other values. The bit width of those adjusted
-values is analyzed and the 90th percentile of the bit width is chosen
-as W. The 10% of values larger than W use patches from a patch list
-to set the additional bits. Patches are encoded as a list of gaps in
-the index values and the additional value bits.</p>
-
-<ul>
-  <li>4 bytes header
-    <ul>
-      <li>2 bits for encoding type (2)</li>
-      <li>5 bits for encoded width (W) of values (1 to 64 bits) using the 5 bit
-  width encoding table</li>
-      <li>9 bits for length (L) (1 to 512 values)</li>
-      <li>3 bits for base value width (BW) (1 to 8 bytes)</li>
-      <li>5 bits for patch width (PW) (1 to 64 bits) using  the 5 bit width
-encoding table</li>
-      <li>3 bits for patch gap width (PGW) (1 to 8 bits)</li>
-      <li>5 bits for patch list length (PLL) (0 to 31 patches)</li>
-    </ul>
-  </li>
-  <li>Base value (BW bytes) - The base value is stored as a big endian value
-with negative values marked by the most significant bit set. If that
-bit is set, the entire value is negated.</li>
-  <li>Data values (W * L bits padded to the byte) - A sequence of W bit positive
-values that are added to the base value.</li>
-  <li>Patch list (PLL * (PGW + PW) bits) - A list of patches for values
-that didn’t fit within W bits. Each entry in the list consists of a
-gap, which is the number of elements skipped from the previous
-patch, and a patch value. Patches are applied by logically or’ing
-the data values with the relevant patch shifted W bits left. If a
-patch is 0, it was introduced to skip over more than 255 items. The
-combined length of each patch (PGW + PW) must be less than or equal to
-64.</li>
-</ul>
-
-<p>The unsigned sequence of [2030, 2000, 2020, 1000000, 2040, 2050, 2060, 2070,
-2080, 2090, 2100, 2110, 2120, 2130, 2140, 2150, 2160, 2170, 2180, 2190]
-has a minimum of 2000, which makes the adjusted
-sequence [30, 0, 20, 998000, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140,
-150, 160, 170, 180, 190]. It has an
-encoding of patched base (2), a bit width of 8 (7), a length of 20
-(19), a base value width of 2 bytes (1), a patch width of 12 bits (11),
-a patch gap width of 2 bits (1), and a patch list length of 1 (1). The
-base value is 2000 and the combined result is [0x8e, 0x13, 0x2b, 0x21, 0x07,
-0xd0, 0x1e, 0x00, 0x14, 0x70, 0x28, 0x32, 0x3c, 0x46, 0x50, 0x5a, 0x64, 0x6e,
-0x78, 0x82, 0x8c, 0x96, 0xa0, 0xaa, 0xb4, 0xbe, 0xfc, 0xe8].</p>
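-
-<p>Decoding a patched base run can be sketched as below. All names are
-hypothetical. The sketch assumes each patch entry occupies exactly PGW + PW
-bits with the gap in the high bits, which matches the example; other
-packings are not handled.</p>

```python
def _width(encoded):
    """Width in bits for a 5 bit encoded width (non-delta)."""
    wide = {24: 26, 25: 28, 26: 30, 27: 32, 28: 40, 29: 48, 30: 56, 31: 64}
    return encoded + 1 if encoded <= 23 else wide[encoded]


def _read_bits(data, bit_pos, width):
    """Read `width` bits, most significant first, from bit offset `bit_pos`."""
    value = 0
    for i in range(bit_pos, bit_pos + width):
        value = (value << 1) | ((data[i // 8] >> (7 - i % 8)) & 1)
    return value


def decode_patched_base(data):
    """Decode one RLEv2 patched base run (illustrative sketch)."""
    if data[0] >> 6 != 2:
        raise ValueError("not a patched base run")
    w = _width((data[0] >> 1) & 0x1F)
    length = (((data[0] & 1) << 8) | data[1]) + 1
    bw = (data[2] >> 5) + 1          # base value width in bytes
    pw = _width(data[2] & 0x1F)      # patch width in bits
    pgw = (data[3] >> 5) + 1         # patch gap width in bits
    pll = data[3] & 0x1F             # number of patch entries

    # Base value: big endian, top bit is a sign flag.
    base = int.from_bytes(data[4:4 + bw], "big")
    sign_bit = 1 << (bw * 8 - 1)
    if base & sign_bit:
        base = -(base & (sign_bit - 1))

    # Data values: W bits each, padded to a whole byte.
    bit = (4 + bw) * 8
    values = [_read_bits(data, bit + i * w, w) for i in range(length)]
    bit += (w * length + 7) // 8 * 8

    # Patches: each gap advances the index from the previous patch, and
    # the patch supplies the bits above the low W bits.
    index = 0
    for p in range(pll):
        entry = _read_bits(data, bit + p * (pgw + pw), pgw + pw)
        index += entry >> pw
        values[index] |= (entry & ((1 << pw) - 1)) << w

    return [base + v for v in values]
```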
-
-<h2 id="delta">Delta</h2>
-
-<p>The Delta encoding is used for monotonically increasing or decreasing
-sequences. The first two numbers in the sequence cannot be identical,
-because the encoding uses the sign of the first delta to determine
-if the series is increasing or decreasing.</p>
-
-<ul>
-  <li>2 bytes header
-    <ul>
-      <li>2 bits for encoding type (3)</li>
-      <li>5 bits for encoded width (W) of deltas (0 to 64 bits) using the 5 bit
-width encoding table</li>
-      <li>9 bits for run length (L) (1 to 512 values)</li>
-    </ul>
-  </li>
-  <li>Base value - encoded as (signed or unsigned) varint</li>
-  <li>Delta base - encoded as signed varint</li>
-  <li>Delta values (W * (L - 2) bits, padded to the byte) - encode each delta
-after the first one. If the delta base is positive, the sequence is
-increasing and if it is negative the sequence is decreasing.</li>
-</ul>
-
-<p>The unsigned sequence of [2, 3, 5, 7, 11, 13, 17, 19, 23, 29] would be
-serialized with delta encoding (3), a width of 4 bits (3), length of
-10 (9), a base of 2 (2), and first delta of 1 (2). The resulting
-sequence is [0xc6, 0x09, 0x02, 0x02, 0x22, 0x42, 0x42, 0x46].</p>
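-
-<p>A delta run can be decoded along these lines. The name
-<code>decode_delta</code> and its <code>signed</code> flag are hypothetical.
-The sketch assumes that a width of 0 means every later delta equals the
-delta base, and that the packed deltas are unsigned magnitudes applied in
-the direction given by the sign of the delta base.</p>

```python
def decode_delta(data, signed=False):
    """Decode one RLEv2 delta run (illustrative sketch)."""
    if data[0] >> 6 != 3:
        raise ValueError("not a delta run")
    encoded_width = (data[0] >> 1) & 0x1F
    wide = {24: 26, 25: 28, 26: 30, 27: 32, 28: 40, 29: 48, 30: 56, 31: 64}
    if encoded_width == 0:
        width = 0
    elif encoded_width <= 23:
        width = encoded_width + 1
    else:
        width = wide[encoded_width]
    length = (((data[0] & 1) << 8) | data[1]) + 1
    pos = 2

    def read_varint():
        nonlocal pos
        result = shift = 0
        while True:
            byte = data[pos]
            pos += 1
            result |= (byte & 0x7F) << shift
            if not byte & 0x80:
                return result
            shift += 7

    def unzigzag(n):
        return (n >> 1) ^ -(n & 1)

    base = read_varint()
    if signed:
        base = unzigzag(base)
    delta_base = unzigzag(read_varint())  # always a signed varint

    values = [base]
    if length > 1:
        values.append(base + delta_base)
    sign = -1 if delta_base < 0 else 1
    bit = pos * 8
    for _ in range(length - 2):
        if width == 0:
            step = delta_base
        else:
            magnitude = 0
            for i in range(bit, bit + width):
                magnitude = (magnitude << 1) | ((data[i // 8] >> (7 - i % 8)) & 1)
            bit += width
            step = sign * magnitude
        values.append(values[-1] + step)
    return values
```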
-
-          
-
-
-
-
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-    <div class="section-nav">
-      <div class="left align-right">
-          
-            
-            
-            <a href="/docs/compression.html" class="prev">Back</a>
-          
-      </div>
-      <div class="right align-left">
-          
-            
-            
-            <a href="/docs/stripes.html" class="next">Next</a>
-          
-      </div>
-    </div>
-    <div class="clear"></div>
-    
-
-        </article>
-      </div>
-
-      <div class="unit one-fifth hide-on-mobiles">
-  <aside>
-    
-    <h4>Overview</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/index.html">Background</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/types.html">Types</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-      <li class=""><a href="/docs/acid.html">ACID support</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Installing</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/building.html">Building ORC</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/releases.html">Releases</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using in Hive</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using in MapReduce</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using ORC Core</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Tools</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Format Specification</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-intro.html">Introduction</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/file-tail.html">File Tail</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/compression.html">Compression</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class="current"><a href="/docs/run-length.html">Run Length Encoding</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/stripes.html">Stripes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/encodings.html">Column Encodings</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-index.html">Indexes</a></li>
-      
-
-
-</ul>
-
-    
-  </aside>
-</div>
-
-
-      <div class="clear"></div>
-
-    </div>
-  </section>
-
-
-  <footer role="contentinfo">
-  <p>The contents of this website are &copy;&nbsp;2018
-     <a href="https://www.apache.org/">Apache Software Foundation</a>
-     under the terms of the <a
-      href="https://www.apache.org/licenses/LICENSE-2.0.html">
-      Apache&nbsp;License&nbsp;v2</a>. Apache ORC and its logo are trademarks
-      of the Apache Software Foundation.</p>
-</footer>
-
-  <script>
-  var anchorForId = function (id) {
-    var anchor = document.createElement("a");
-    anchor.className = "header-link";
-    anchor.href      = "#" + id;
-    anchor.innerHTML = "<span class=\"sr-only\">Permalink</span><i class=\"fa fa-link\"></i>";
-    anchor.title = "Permalink";
-    return anchor;
-  };
-
-  var linkifyAnchors = function (level, containingElement) {
-    var headers = containingElement.getElementsByTagName("h" + level);
-    for (var h = 0; h < headers.length; h++) {
-      var header = headers[h];
-
-      if (typeof header.id !== "undefined" && header.id !== "") {
-        header.appendChild(anchorForId(header.id));
-      }
-    }
-  };
-
-  document.onreadystatechange = function () {
-    if (this.readyState === "complete") {
-      var contentBlock = document.getElementsByClassName("docs")[0] || document.getElementsByClassName("news")[0];
-      if (!contentBlock) {
-        return;
-      }
-      for (var level = 1; level <= 6; level++) {
-        linkifyAnchors(level, contentBlock);
-      }
-    }
-  };
-</script>
-
-
-</body>
-</html>


[9/9] orc git commit: Pushing ORC-339 reorganize the ORC file format spec.

Posted by om...@apache.org.
Pushing ORC-339 reorganize the ORC file format spec.

Signed-off-by: Owen O'Malley <om...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/orc/repo
Commit: http://git-wip-us.apache.org/repos/asf/orc/commit/c6e29090
Tree: http://git-wip-us.apache.org/repos/asf/orc/tree/c6e29090
Diff: http://git-wip-us.apache.org/repos/asf/orc/diff/c6e29090

Branch: refs/heads/asf-site
Commit: c6e2909025381446398961f4ac1da61550cd13b5
Parents: c63412b
Author: Owen O'Malley <om...@apache.org>
Authored: Tue Apr 17 10:49:12 2018 -0700
Committer: Owen O'Malley <om...@apache.org>
Committed: Tue Apr 17 10:49:12 2018 -0700

----------------------------------------------------------------------
 develop/index.html       |    3 +
 docs/acid.html           | 1073 ++--------------
 docs/adopters.html       | 1185 ++---------------
 docs/building.html       | 1071 ++--------------
 docs/compression.html    | 2193 --------------------------------
 docs/core-cpp.html       | 1429 ++++-----------------
 docs/core-java.html      | 1083 ++--------------
 docs/cpp-tools.html      | 1523 +++++-----------------
 docs/encodings.html      | 2790 -----------------------------------------
 docs/file-tail.html      | 2477 ------------------------------------
 docs/hive-config.html    | 1075 ++--------------
 docs/hive-ddl.html       | 1145 ++---------------
 docs/index.html          | 1329 +++-----------------
 docs/indexes.html        | 1133 ++---------------
 docs/java-tools.html     | 1549 +++++------------------
 docs/mapred.html         | 1083 ++--------------
 docs/mapreduce.html      | 1085 ++--------------
 docs/releases.html       | 1071 ++--------------
 docs/run-length.html     | 2566 -------------------------------------
 docs/spec-index.html     | 2298 ---------------------------------
 docs/spec-intro.html     | 2180 --------------------------------
 docs/stripes.html        | 2257 ---------------------------------
 docs/types.html          | 1215 +++---------------
 specification/ORCv0.html | 1260 +++++++++++++++++++
 specification/ORCv1.html | 1744 ++++++++++++++++++++++++++
 specification/ORCv2.html | 1769 ++++++++++++++++++++++++++
 specification/index.html |  159 +++
 27 files changed, 7126 insertions(+), 32619 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/develop/index.html
----------------------------------------------------------------------
diff --git a/develop/index.html b/develop/index.html
index e920320..d9224d0 100644
--- a/develop/index.html
+++ b/develop/index.html
@@ -87,6 +87,9 @@
         <p>Information about the ORC project that is most important for
 developers working on the project.</p>
 
+<p>The <a href="/specification">ORC format specification</a> defines the format
+to promote compatibility between implementations.</p>
+
 <h2 id="development-community">Development community</h2>
 
 <p>We have committers from many different companies. The full

http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/docs/acid.html
----------------------------------------------------------------------
diff --git a/docs/acid.html b/docs/acid.html
index c460d41..71c980c 100644
--- a/docs/acid.html
+++ b/docs/acid.html
@@ -109,12 +109,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/index.html">Background</option>
     
   
@@ -130,14 +124,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -174,20 +160,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -221,20 +193,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
       <option value="/docs/types.html">Types</option>
     
   
@@ -261,12 +219,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/indexes.html">Indexes</option>
     
   
@@ -280,14 +232,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -324,20 +268,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -381,20 +311,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -426,25 +342,11 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/releases.html">Releases</option>
     
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -471,12 +373,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-ddl.html">Hive DDL</option>
     
   
@@ -494,14 +390,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -519,12 +407,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-config.html">Hive Configuration</option>
     
   
@@ -544,14 +426,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -586,12 +460,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapred.html">Using in MapRed</option>
     
   
@@ -601,14 +469,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -638,12 +498,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapreduce.html">Using in MapReduce</option>
     
   
@@ -651,14 +505,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -679,8 +525,6 @@
     
   
     
-  
-    
       <option value="/docs/core-java.html">Using Core Java</option>
     
   
@@ -704,18 +548,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -727,8 +559,6 @@
     
   
     
-  
-    
       <option value="/docs/core-cpp.html">Using Core C++</option>
     
   
@@ -754,18 +584,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -788,8 +606,6 @@
     
   
     
-  
-    
       <option value="/docs/cpp-tools.html">C++ Tools</option>
     
   
@@ -811,18 +627,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -848,12 +652,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/java-tools.html">Java Tools</option>
     
   
@@ -865,386 +663,21 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
     
-    <optgroup label="Format Specification">
-      
+  </select>
+</div>
 
 
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-intro.html">Introduction</option>
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/file-tail.html">File Tail</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/compression.html">Compression</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/run-length.html">Run Length Encoding</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/stripes.html">Stripes</option>
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/encodings.html">Column Encodings</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-index.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-  </select>
-</div>
-
-
-      <div class="unit four-fifths">
-        <article>
-          <h1>ACID support</h1>
-          <p>Historically, the only way to atomically add data to a table in Hive
-was to add a new partition. Updating or deleting data in partition
-required removing the old partition and adding it back with the new
-data and it wasn’t possible to do atomically.</p>
+      <div class="unit four-fifths">
+        <article>
+          <h1>ACID support</h1>
+          <p>Historically, the only way to atomically add data to a table in Hive
+was to add a new partition. Updating or deleting data in partition
+required removing the old partition and adding it back with the new
+data and it wasn’t possible to do atomically.</p>
 
 <p>However, user’s data is continually changing and as Hive matured,
 users required reliability guarantees despite the churning data
@@ -1465,270 +898,44 @@ file that don’t need to be read in this task.</p>
 
   
   
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-    <div class="section-nav">
-      <div class="left align-right">
-          
-            
-            
-            <a href="/docs/indexes.html" class="prev">Back</a>
-          
-      </div>
-      <div class="right align-left">
-          
-            
-            
-            <a href="/docs/building.html" class="next">Next</a>
-          
-      </div>
-    </div>
-    <div class="clear"></div>
-    
-
-        </article>
-      </div>
-
-      <div class="unit one-fifth hide-on-mobiles">
-  <aside>
-    
-    <h4>Overview</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/index.html">Background</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/types.html">Types</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-      <li class="current"><a href="/docs/acid.html">ACID support</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Installing</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/building.html">Building ORC</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
+    <div class="section-nav">
+      <div class="left align-right">
+          
+            
+            
+            <a href="/docs/indexes.html" class="prev">Back</a>
+          
+      </div>
+      <div class="right align-left">
+          
+            
+            
+            <a href="/docs/building.html" class="next">Next</a>
+          
+      </div>
+    </div>
+    <div class="clear"></div>
     
-      <li class=""><a href="/docs/releases.html">Releases</a></li>
-      
 
+        </article>
+      </div>
 
-</ul>
-
+      <div class="unit one-fifth hide-on-mobiles">
+  <aside>
     
-    <h4>Using in Hive</h4>
+    <h4>Overview</h4>
     
 
 <ul>
@@ -1757,11 +964,7 @@ file that don’t need to be read in this task.</p>
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
+      <li class=""><a href="/docs/index.html">Background</a></li>
       
 
 
@@ -1775,34 +978,10 @@ file that don’t need to be read in this task.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
+      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
       
 
 
-</ul>
-
-    
-    <h4>Using in MapReduce</h4>
-    
-
-<ul>
-
   
 
   
@@ -1839,7 +1018,7 @@ file that don’t need to be read in this task.</p>
     
   
     
-      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
+      <li class=""><a href="/docs/types.html">Types</a></li>
       
 
 
@@ -1869,49 +1048,7 @@ file that don’t need to be read in this task.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using ORC Core</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
+      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
       
 
 
@@ -1923,22 +1060,14 @@ file that don’t need to be read in this task.</p>
 
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      <li class="current"><a href="/docs/acid.html">ACID support</a></li>
       
 
 
 </ul>
 
     
-    <h4>Tools</h4>
+    <h4>Installing</h4>
     
 
 <ul>
@@ -1955,15 +1084,7 @@ file that don’t need to be read in this task.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      <li class=""><a href="/docs/building.html">Building ORC</a></li>
       
 
 
@@ -2001,14 +1122,14 @@ file that don’t need to be read in this task.</p>
     
   
     
-      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
+      <li class=""><a href="/docs/releases.html">Releases</a></li>
       
 
 
 </ul>
 
     
-    <h4>Format Specification</h4>
+    <h4>Using in Hive</h4>
     
 
 <ul>
@@ -2035,31 +1156,7 @@ file that don’t need to be read in this task.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-intro.html">Introduction</a></li>
+      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
       
 
 
@@ -2083,31 +1180,17 @@ file that don’t need to be read in this task.</p>
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/file-tail.html">File Tail</a></li>
+      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
       
 
 
-  
-
-  
-    
-  
+</ul>
 
-  
-    
-  
     
-  
-    
-  
+    <h4>Using in MapReduce</h4>
     
-      <li class=""><a href="/docs/compression.html">Compression</a></li>
-      
 
+<ul>
 
   
 
@@ -2139,19 +1222,7 @@ file that don’t need to be read in this task.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/run-length.html">Run Length Encoding</a></li>
+      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
       
 
 
@@ -2187,13 +1258,25 @@ file that don’t need to be read in this task.</p>
     
   
     
-  
+      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Using ORC Core</h4>
     
+
+<ul>
+
+  
+
   
     
   
+
+  
     
   
     
@@ -2203,7 +1286,7 @@ file that don’t need to be read in this task.</p>
     
   
     
-      <li class=""><a href="/docs/stripes.html">Stripes</a></li>
+      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
       
 
 
@@ -2221,17 +1304,17 @@ file that don’t need to be read in this task.</p>
     
   
     
-  
-    
-  
-    
-  
+      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Tools</h4>
     
-      <li class=""><a href="/docs/encodings.html">Column Encodings</a></li>
-      
 
+<ul>
 
   
 
@@ -2251,11 +1334,17 @@ file that don’t need to be read in this task.</p>
     
   
     
+      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      
+
+
   
-    
+
   
     
   
+
+  
     
   
     
@@ -2277,7 +1366,7 @@ file that don’t need to be read in this task.</p>
     
   
     
-      <li class=""><a href="/docs/spec-index.html">Indexes</a></li>
+      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
       
 
 

http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/docs/adopters.html
----------------------------------------------------------------------
diff --git a/docs/adopters.html b/docs/adopters.html
index b30ef6e..7ee402b 100644
--- a/docs/adopters.html
+++ b/docs/adopters.html
@@ -109,12 +109,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/index.html">Background</option>
     
   
@@ -130,14 +124,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -174,20 +160,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -221,20 +193,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
       <option value="/docs/types.html">Types</option>
     
   
@@ -261,12 +219,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/indexes.html">Indexes</option>
     
   
@@ -280,14 +232,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -324,20 +268,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -381,20 +311,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -426,25 +342,11 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/releases.html">Releases</option>
     
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -471,12 +373,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-ddl.html">Hive DDL</option>
     
   
@@ -494,14 +390,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -519,12 +407,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-config.html">Hive Configuration</option>
     
   
@@ -544,14 +426,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -586,12 +460,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapred.html">Using in MapRed</option>
     
   
@@ -601,14 +469,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -638,12 +498,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapreduce.html">Using in MapReduce</option>
     
   
@@ -651,14 +505,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -679,8 +525,6 @@
     
   
     
-  
-    
       <option value="/docs/core-java.html">Using Core Java</option>
     
   
@@ -704,18 +548,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -727,8 +559,6 @@
     
   
     
-  
-    
       <option value="/docs/core-cpp.html">Using Core C++</option>
     
   
@@ -754,18 +584,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -788,8 +606,6 @@
     
   
     
-  
-    
       <option value="/docs/cpp-tools.html">C++ Tools</option>
     
   
@@ -811,18 +627,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -848,12 +652,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/java-tools.html">Java Tools</option>
     
   
@@ -865,717 +663,126 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
     
-    <optgroup label="Format Specification">
-      
+  </select>
+</div>
 
 
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-intro.html">Introduction</option>
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/file-tail.html">File Tail</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/compression.html">Compression</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/run-length.html">Run Length Encoding</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/stripes.html">Stripes</option>
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/encodings.html">Column Encodings</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-index.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-  </select>
-</div>
-
-
-      <div class="unit four-fifths">
-        <article>
-          <h1>ORC Adopters</h1>
-          <p>If your company or tool uses ORC, please let us know so that we can update
-this page.</p>
-
-<h3 id="apache-hadoophttpshadoopapacheorg"><a href="https://hadoop.apache.org/">Apache Hadoop</a></h3>
-
-<p>ORC files have always supported reading and writing from Hadoop’s MapReduce,
-but with the ORC 1.1.0 release it is now easier than ever without pulling in
-Hive’s exec jar and all of its dependencies. OrcStruct now also implements
-WritableComparable and can be serialized through the MapReduce shuffle.</p>
-
-<h3 id="apache-hivehttpshiveapacheorg"><a href="https://hive.apache.org/">Apache Hive</a></h3>
-
-<p>Apache Hive was the original use case and home for ORC.  ORC’s strong
-type system, advanced compression, column projection, predicate push
-down, and vectorization support make Hive <a href="https://hortonworks.com/blog/orcfile-in-hdp-2-better-compression-better-performance/">perform
-better</a>
-than any other format for your data.</p>
-
-<h3 id="apache-nifihttpsnifiapacheorg"><a href="https://nifi.apache.org/">Apache Nifi</a></h3>
-
-<p>Apache Nifi is <a href="https://issues.apache.org/jira/browse/NIFI-1663">adding
-support</a> for writing
-ORC files.</p>
-
-<h3 id="apache-pighttpspigapacheorg"><a href="https://pig.apache.org/">Apache Pig</a></h3>
-
-<p>Apache Pig added support for reading and writing ORC files in <a href="https://hortonworks.com/blog/announcing-apache-pig-0-14-0/">Pig
-0.14.0</a>.</p>
-
-<h3 id="apache-sparkhttpssparkapacheorg"><a href="https://spark.apache.org/">Apache Spark</a></h3>
-
-<p>Apache Spark has <a href="https://hortonworks.com/blog/bringing-orc-support-into-apache-spark/">added
-support</a>
-for reading and writing ORC files with support for column projection and
-predicate push down.</p>
-
-<h3 id="eelhttpsgithubcom51zeroeel-sdk"><a href="https://github.com/51zero/eel-sdk">EEL</a></h3>
-
-<p>EEL is a Scala BigData API that supports reading and writing data for
-various file formats and storage systems, including ORC. It
-is designed as an in-process, low-level API for manipulating data. Data
-is lazily streamed from source to sink using standard Scala
-operations such as map, flatMap and filter, which makes it especially suited
-for ETL-style applications. EEL supports ORC predicate and projection
-pushdowns and correctly handles conversions from other formats, including
-complex types such as maps, lists or nested structs. A typical use
-case would be to extract data from JDBC to ORC files housed in HDFS,
-or directly into Hive tables backed by an ORC file format.</p>
-
-<h3 id="facebookhttpsfacebookcom"><a href="https://facebook.com">Facebook</a></h3>
-
-<p>With more than 300 PB of data, Facebook was an <a href="https://code.facebook.com/posts/229861827208629/scaling-the-facebook-data-warehouse-to-300-pb/">early adopter of
-ORC</a> and quickly put it into production.</p>
-
-<h3 id="prestohttpsprestodbio"><a href="https://prestodb.io/">Presto</a></h3>
-
-<p>The Presto team has done a lot of work <a href="https://code.facebook.com/posts/370832626374903/even-faster-data-at-the-speed-of-presto-orc/">integrating
-ORC</a> into their SQL engine.</p>
-
-<h3 id="timberhttpstimberio"><a href="https://timber.io/">Timber</a></h3>
-
-<p>Timber adopted ORC for its S3-based logging platform that stores
-petabytes of log data. ORC has been key in ensuring a fast,
-cost-effective strategy for persisting and querying that data.</p>
-
-<h3 id="verticahttpwww8hpcomusensoftware-solutionsadvanced-sql-big-data-analytics"><a href="http://www8.hp.com/us/en/software-solutions/advanced-sql-big-data-analytics/">Vertica</a></h3>
-
-<p>HPE Vertica has contributed significantly to the ORC C++ library. ORC
-is a significant part of Vertica SQL-on-Hadoop (VSQLoH) which brings
-the performance, reliability and standards compliance of the Vertica
-Analytic Database to the Hadoop ecosystem.</p>
-
-          
-
-
-
-
-
-  
-  
-
-  
-  
-    <div class="section-nav">
-      <div class="left align-right">
-          
-            
-            
-            <a href="/docs/index.html" class="prev">Back</a>
-          
-      </div>
-      <div class="right align-left">
-          
-            
-            
-            <a href="/docs/types.html" class="next">Next</a>
-          
-      </div>
-    </div>
-    <div class="clear"></div>
-    
-
-        </article>
-      </div>
-
-      <div class="unit one-fifth hide-on-mobiles">
-  <aside>
-    
-    <h4>Overview</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/index.html">Background</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-      <li class="current"><a href="/docs/adopters.html">ORC Adopters</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/types.html">Types</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
-      
+      <div class="unit four-fifths">
+        <article>
+          <h1>ORC Adopters</h1>
+          <p>If your company or tool uses ORC, please let us know so that we can update
+this page.</p>
 
+<h3 id="apache-hadoophttpshadoopapacheorg"><a href="https://hadoop.apache.org/">Apache Hadoop</a></h3>
 
-  
+<p>ORC files have always supported reading and writing from Hadoop’s MapReduce,
+but with the ORC 1.1.0 release it is now easier than ever without pulling in
+Hive’s exec jar and all of its dependencies. OrcStruct now also implements
+WritableComparable and can be serialized through the MapReduce shuffle.</p>
 
-  
-    
-  
+<h3 id="apache-hivehttpshiveapacheorg"><a href="https://hive.apache.org/">Apache Hive</a></h3>
 
-  
-    
-      <li class=""><a href="/docs/acid.html">ACID support</a></li>
-      
+<p>Apache Hive was the original use case and home for ORC.  ORC’s strong
+type system, advanced compression, column projection, predicate push
+down, and vectorization support make Hive <a href="https://hortonworks.com/blog/orcfile-in-hdp-2-better-compression-better-performance/">perform
+better</a>
+than any other format for your data.</p>
 
+<h3 id="apache-nifihttpsnifiapacheorg"><a href="https://nifi.apache.org/">Apache Nifi</a></h3>
 
-</ul>
+<p>Apache Nifi is <a href="https://issues.apache.org/jira/browse/NIFI-1663">adding
+support</a> for writing
+ORC files.</p>
 
-    
-    <h4>Installing</h4>
-    
+<h3 id="apache-pighttpspigapacheorg"><a href="https://pig.apache.org/">Apache Pig</a></h3>
 
-<ul>
+<p>Apache Pig added support for reading and writing ORC files in <a href="https://hortonworks.com/blog/announcing-apache-pig-0-14-0/">Pig
+0.14.0</a>.</p>
 
-  
+<h3 id="apache-sparkhttpssparkapacheorg"><a href="https://spark.apache.org/">Apache Spark</a></h3>
 
-  
-    
-  
+<p>Apache Spark has <a href="https://hortonworks.com/blog/bringing-orc-support-into-apache-spark/">added
+support</a>
+for reading and writing ORC files with support for column projection and
+predicate push down.</p>
+
+<h3 id="eelhttpsgithubcom51zeroeel-sdk"><a href="https://github.com/51zero/eel-sdk">EEL</a></h3>
+
+<p>EEL is a Scala BigData API that supports reading and writing data for
+various file formats and storage systems, including ORC. It
+is designed as an in-process, low-level API for manipulating data. Data
+is lazily streamed from source to sink using standard Scala
+operations such as map, flatMap and filter, which makes it especially suited
+for ETL-style applications. EEL supports ORC predicate and projection
+pushdowns and correctly handles conversions from other formats, including
+complex types such as maps, lists or nested structs. A typical use
+case would be to extract data from JDBC to ORC files housed in HDFS,
+or directly into Hive tables backed by an ORC file format.</p>
+
+<h3 id="facebookhttpsfacebookcom"><a href="https://facebook.com">Facebook</a></h3>
+
+<p>With more than 300 PB of data, Facebook was an <a href="https://code.facebook.com/posts/229861827208629/scaling-the-facebook-data-warehouse-to-300-pb/">early adopter of
+ORC</a> and quickly put it into production.</p>
+
+<h3 id="prestohttpsprestodbio"><a href="https://prestodb.io/">Presto</a></h3>
+
+<p>The Presto team has done a lot of work <a href="https://code.facebook.com/posts/370832626374903/even-faster-data-at-the-speed-of-presto-orc/">integrating
+ORC</a> into their SQL engine.</p>
+
+<h3 id="timberhttpstimberio"><a href="https://timber.io/">Timber</a></h3>
+
+<p>Timber adopted ORC for its S3-based logging platform that stores
+petabytes of log data. ORC has been key in ensuring a fast,
+cost-effective strategy for persisting and querying that data.</p>
+
+<h3 id="verticahttpwww8hpcomusensoftware-solutionsadvanced-sql-big-data-analytics"><a href="http://www8.hp.com/us/en/software-solutions/advanced-sql-big-data-analytics/">Vertica</a></h3>
+
+<p>HPE Vertica has contributed significantly to the ORC C++ library. ORC
+is a significant part of Vertica SQL-on-Hadoop (VSQLoH) which brings
+the performance, reliability and standards compliance of the Vertica
+Analytic Database to the Hadoop ecosystem.</p>
+
+          
 
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/building.html">Building ORC</a></li>
-      
 
 
-  
 
-  
-    
-  
 
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
   
-    
+
   
-    
   
+    <div class="section-nav">
+      <div class="left align-right">
+          
+            
+            
+            <a href="/docs/index.html" class="prev">Back</a>
+          
+      </div>
+      <div class="right align-left">
+          
+            
+            
+            <a href="/docs/types.html" class="next">Next</a>
+          
+      </div>
+    </div>
+    <div class="clear"></div>
     
-      <li class=""><a href="/docs/releases.html">Releases</a></li>
-      
 
+        </article>
+      </div>
 
-</ul>
-
+      <div class="unit one-fifth hide-on-mobiles">
+  <aside>
     
-    <h4>Using in Hive</h4>
+    <h4>Overview</h4>
     
 
 <ul>
@@ -1604,11 +811,7 @@ Analytic Database to the Hadoop ecosystem.</p>
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
+      <li class=""><a href="/docs/index.html">Background</a></li>
       
 
 
@@ -1622,34 +825,10 @@ Analytic Database to the Hadoop ecosystem.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
+      <li class="current"><a href="/docs/adopters.html">ORC Adopters</a></li>
       
 
 
-</ul>
-
-    
-    <h4>Using in MapReduce</h4>
-    
-
-<ul>
-
   
 
   
@@ -1686,7 +865,7 @@ Analytic Database to the Hadoop ecosystem.</p>
     
   
     
-      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
+      <li class=""><a href="/docs/types.html">Types</a></li>
       
 
 
@@ -1716,49 +895,7 @@ Analytic Database to the Hadoop ecosystem.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using ORC Core</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
+      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
       
 
 
@@ -1770,22 +907,14 @@ Analytic Database to the Hadoop ecosystem.</p>
 
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      <li class=""><a href="/docs/acid.html">ACID support</a></li>
       
 
 
 </ul>
 
     
-    <h4>Tools</h4>
+    <h4>Installing</h4>
     
 
 <ul>
@@ -1802,15 +931,7 @@ Analytic Database to the Hadoop ecosystem.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      <li class=""><a href="/docs/building.html">Building ORC</a></li>
       
 
 
@@ -1848,14 +969,14 @@ Analytic Database to the Hadoop ecosystem.</p>
     
   
     
-      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
+      <li class=""><a href="/docs/releases.html">Releases</a></li>
       
 
 
 </ul>
 
     
-    <h4>Format Specification</h4>
+    <h4>Using in Hive</h4>
     
 
 <ul>
@@ -1882,31 +1003,7 @@ Analytic Database to the Hadoop ecosystem.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-intro.html">Introduction</a></li>
+      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
       
 
 
@@ -1930,31 +1027,17 @@ Analytic Database to the Hadoop ecosystem.</p>
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/file-tail.html">File Tail</a></li>
+      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
       
 
 
-  
-
-  
-    
-  
+</ul>
 
-  
-    
-  
     
-  
-    
-  
+    <h4>Using in MapReduce</h4>
     
-      <li class=""><a href="/docs/compression.html">Compression</a></li>
-      
 
+<ul>
 
   
 
@@ -1986,19 +1069,7 @@ Analytic Database to the Hadoop ecosystem.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/run-length.html">Run Length Encoding</a></li>
+      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
       
 
 
@@ -2034,13 +1105,25 @@ Analytic Database to the Hadoop ecosystem.</p>
     
   
     
-  
+      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Using ORC Core</h4>
     
+
+<ul>
+
+  
+
   
     
   
+
+  
     
   
     
@@ -2050,7 +1133,7 @@ Analytic Database to the Hadoop ecosystem.</p>
     
   
     
-      <li class=""><a href="/docs/stripes.html">Stripes</a></li>
+      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
       
 
 
@@ -2068,17 +1151,17 @@ Analytic Database to the Hadoop ecosystem.</p>
     
   
     
-  
-    
-  
-    
-  
+      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Tools</h4>
     
-      <li class=""><a href="/docs/encodings.html">Column Encodings</a></li>
-      
 
+<ul>
 
   
 
@@ -2098,11 +1181,17 @@ Analytic Database to the Hadoop ecosystem.</p>
     
   
     
+      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      
+
+
   
-    
+
   
     
   
+
+  
     
   
     
@@ -2124,7 +1213,7 @@ Analytic Database to the Hadoop ecosystem.</p>
     
   
     
-      <li class=""><a href="/docs/spec-index.html">Indexes</a></li>
+      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
       
 
 

http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/docs/building.html
----------------------------------------------------------------------
diff --git a/docs/building.html b/docs/building.html
index bbe1ec4..378f541 100644
--- a/docs/building.html
+++ b/docs/building.html
@@ -109,12 +109,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/index.html">Background</option>
     
   
@@ -130,14 +124,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -174,20 +160,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -221,20 +193,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
       <option value="/docs/types.html">Types</option>
     
   
@@ -261,12 +219,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/indexes.html">Indexes</option>
     
   
@@ -280,14 +232,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -324,20 +268,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -381,20 +311,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -426,25 +342,11 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/releases.html">Releases</option>
     
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -471,12 +373,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-ddl.html">Hive DDL</option>
     
   
@@ -494,14 +390,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -519,12 +407,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-config.html">Hive Configuration</option>
     
   
@@ -544,14 +426,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -586,12 +460,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapred.html">Using in MapRed</option>
     
   
@@ -601,14 +469,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -638,12 +498,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapreduce.html">Using in MapReduce</option>
     
   
@@ -651,14 +505,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -679,8 +525,6 @@
     
   
     
-  
-    
       <option value="/docs/core-java.html">Using Core Java</option>
     
   
@@ -704,18 +548,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -727,8 +559,6 @@
     
   
     
-  
-    
       <option value="/docs/core-cpp.html">Using Core C++</option>
     
   
@@ -754,18 +584,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -788,8 +606,6 @@
     
   
     
-  
-    
       <option value="/docs/cpp-tools.html">C++ Tools</option>
     
   
@@ -811,18 +627,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -848,12 +652,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/java-tools.html">Java Tools</option>
     
   
@@ -865,383 +663,18 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
     
-    <optgroup label="Format Specification">
-      
+  </select>
+</div>
 
 
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-intro.html">Introduction</option>
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/file-tail.html">File Tail</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/compression.html">Compression</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/run-length.html">Run Length Encoding</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/stripes.html">Stripes</option>
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/encodings.html">Column Encodings</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-index.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-  </select>
-</div>
-
-
-      <div class="unit four-fifths">
-        <article>
-          <h1>Building ORC</h1>
-          <h2 id="building-both-c-and-java">Building both C++ and Java</h2>
+      <div class="unit four-fifths">
+        <article>
+          <h1>Building ORC</h1>
+          <h2 id="building-both-c-and-java">Building both C++ and Java</h2>
 
 <p>The C++ library is supported on the following operating systems:</p>
 
@@ -1338,276 +771,50 @@ is invoking:</p>
 
 
 
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-    <div class="section-nav">
-      <div class="left align-right">
-          
-            
-            
-            <a href="/docs/acid.html" class="prev">Back</a>
-          
-      </div>
-      <div class="right align-left">
-          
-            
-            
-            <a href="/docs/releases.html" class="next">Next</a>
-          
-      </div>
-    </div>
-    <div class="clear"></div>
-    
-
-        </article>
-      </div>
-
-      <div class="unit one-fifth hide-on-mobiles">
-  <aside>
-    
-    <h4>Overview</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/index.html">Background</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/types.html">Types</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-      <li class=""><a href="/docs/acid.html">ACID support</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Installing</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-      <li class="current"><a href="/docs/building.html">Building ORC</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
+    <div class="section-nav">
+      <div class="left align-right">
+          
+            
+            
+            <a href="/docs/acid.html" class="prev">Back</a>
+          
+      </div>
+      <div class="right align-left">
+          
+            
+            
+            <a href="/docs/releases.html" class="next">Next</a>
+          
+      </div>
+    </div>
+    <div class="clear"></div>
     
-      <li class=""><a href="/docs/releases.html">Releases</a></li>
-      
 
+        </article>
+      </div>
 
-</ul>
-
+      <div class="unit one-fifth hide-on-mobiles">
+  <aside>
     
-    <h4>Using in Hive</h4>
+    <h4>Overview</h4>
     
 
 <ul>
@@ -1636,11 +843,7 @@ is invoking:</p>
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
+      <li class=""><a href="/docs/index.html">Background</a></li>
       
 
 
@@ -1654,34 +857,10 @@ is invoking:</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
+      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
       
 
 
-</ul>
-
-    
-    <h4>Using in MapReduce</h4>
-    
-
-<ul>
-
   
 
   
@@ -1718,7 +897,7 @@ is invoking:</p>
     
   
     
-      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
+      <li class=""><a href="/docs/types.html">Types</a></li>
       
 
 
@@ -1748,49 +927,7 @@ is invoking:</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using ORC Core</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
+      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
       
 
 
@@ -1802,22 +939,14 @@ is invoking:</p>
 
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      <li class=""><a href="/docs/acid.html">ACID support</a></li>
       
 
 
 </ul>
 
     
-    <h4>Tools</h4>
+    <h4>Installing</h4>
     
 
 <ul>
@@ -1834,15 +963,7 @@ is invoking:</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      <li class="current"><a href="/docs/building.html">Building ORC</a></li>
       
 
 
@@ -1880,14 +1001,14 @@ is invoking:</p>
     
   
     
-      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
+      <li class=""><a href="/docs/releases.html">Releases</a></li>
       
 
 
 </ul>
 
     
-    <h4>Format Specification</h4>
+    <h4>Using in Hive</h4>
     
 
 <ul>
@@ -1914,31 +1035,7 @@ is invoking:</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-intro.html">Introduction</a></li>
+      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
       
 
 
@@ -1962,31 +1059,17 @@ is invoking:</p>
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/file-tail.html">File Tail</a></li>
+      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
       
 
 
-  
-
-  
-    
-  
+</ul>
 
-  
-    
-  
     
-  
-    
-  
+    <h4>Using in MapReduce</h4>
     
-      <li class=""><a href="/docs/compression.html">Compression</a></li>
-      
 
+<ul>
 
   
 
@@ -2018,19 +1101,7 @@ is invoking:</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/run-length.html">Run Length Encoding</a></li>
+      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
       
 
 
@@ -2066,13 +1137,25 @@ is invoking:</p>
     
   
     
-  
+      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Using ORC Core</h4>
     
+
+<ul>
+
+  
+
   
     
   
+
+  
     
   
     
@@ -2082,7 +1165,7 @@ is invoking:</p>
     
   
     
-      <li class=""><a href="/docs/stripes.html">Stripes</a></li>
+      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
       
 
 
@@ -2100,17 +1183,17 @@ is invoking:</p>
     
   
     
-  
-    
-  
-    
-  
+      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Tools</h4>
     
-      <li class=""><a href="/docs/encodings.html">Column Encodings</a></li>
-      
 
+<ul>
 
   
 
@@ -2130,11 +1213,17 @@ is invoking:</p>
     
   
     
+      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      
+
+
   
-    
+
   
     
   
+
+  
     
   
     
@@ -2156,7 +1245,7 @@ is invoking:</p>
     
   
     
-      <li class=""><a href="/docs/spec-index.html">Indexes</a></li>
+      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
       
 
 

http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/docs/compression.html
----------------------------------------------------------------------
diff --git a/docs/compression.html b/docs/compression.html
deleted file mode 100644
index 2c70cb8..0000000
--- a/docs/compression.html
+++ /dev/null
@@ -1,2193 +0,0 @@
-<!DOCTYPE HTML>
-<html lang="en-US">
-<head>
-  <meta charset="UTF-8">
-  <title>Compression</title>
-  <meta name="viewport" content="width=device-width,initial-scale=1">
-  <meta name="generator" content="Jekyll v2.4.0">
-  <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
-  <link rel="stylesheet" href="/css/screen.css">
-  <link rel="icon" type="image/x-icon" href="/favicon.ico">
-  <!--[if lt IE 9]>
-  <script src="/js/html5shiv.min.js"></script>
-  <script src="/js/respond.min.js"></script>
-  <![endif]-->
-</head>
-
-
-<body class="wrap">
-  <header role="banner">
-  <nav class="mobile-nav show-on-mobiles">
-    <ul>
-  <li class="">
-    <a href="/">Home</a>
-  </li>
-  <li class="current">
-    <a href="/docs/"><span class="show-on-mobiles">Docs</span>
-                     <span class="hide-on-mobiles">Documentation</span></a>
-  </li>
-  <li class="">
-    <a href="/talks/">Talks</a>
-  </li>
-  <li class="">
-    <a href="/news/">News</a>
-  </li>
-  <li class="">
-    <a href="/help/">Help</a>
-  </li>
-  <li class="">
-    <a href="/develop/">Develop</a>
-  </li>
-</ul>
-
-  </nav>
-  <div class="grid">
-    <div class="unit one-third center-on-mobiles">
-      <h1>
-        <a href="/">
-          <span class="sr-only">Apache ORC</span>
-          <img src="/img/logo.png" width="249" height="101" alt="ORC Logo">
-        </a>
-      </h1>
-    </div>
-    <nav class="main-nav unit two-thirds hide-on-mobiles">
-      <ul>
-  <li class="">
-    <a href="/">Home</a>
-  </li>
-  <li class="current">
-    <a href="/docs/"><span class="show-on-mobiles">Docs</span>
-                     <span class="hide-on-mobiles">Documentation</span></a>
-  </li>
-  <li class="">
-    <a href="/talks/">Talks</a>
-  </li>
-  <li class="">
-    <a href="/news/">News</a>
-  </li>
-  <li class="">
-    <a href="/help/">Help</a>
-  </li>
-  <li class="">
-    <a href="/develop/">Develop</a>
-  </li>
-</ul>
-
-    </nav>
-  </div>
-</header>
-
-
-    <section class="docs">
-    <div class="grid">
-
-      <div class="docs-nav-mobile unit whole show-on-mobiles">
-  <select onchange="if (this.value) window.location.href=this.value">
-    <option value="">Navigate the docs…</option>
-    
-    <optgroup label="Overview">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/index.html">Background</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-      <option value="/docs/adopters.html">ORC Adopters</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/types.html">Types</option>
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/indexes.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-      <option value="/docs/acid.html">ACID support</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Installing">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-      <option value="/docs/building.html">Building ORC</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/releases.html">Releases</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Using in Hive">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/hive-ddl.html">Hive DDL</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/hive-config.html">Hive Configuration</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Using in MapReduce">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/mapred.html">Using in MapRed</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/mapreduce.html">Using in MapReduce</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Using ORC Core">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/core-java.html">Using Core Java</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/core-cpp.html">Using Core C++</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Tools">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/cpp-tools.html">C++ Tools</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/java-tools.html">Java Tools</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Format Specification">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-intro.html">Introduction</option>
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/file-tail.html">File Tail</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/compression.html">Compression</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/run-length.html">Run Length Encoding</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/stripes.html">Stripes</option>
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/encodings.html">Column Encodings</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-index.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-  </select>
-</div>
-
-
-      <div class="unit four-fifths">
-        <article>
-          <h1>Compression</h1>
-          <p>If the ORC file writer selects a generic compression codec (zlib or
-snappy), every part of the ORC file except for the Postscript is
-compressed with that codec. However, one of the requirements for ORC
-is that the reader be able to skip over compressed bytes without
-decompressing the entire stream. To manage this, ORC writes compressed
-streams in chunks with headers as in the figure below.
-To handle incompressible data, if the compressed data is larger than
-the original, the original is stored and the isOriginal flag is
-set. Each header is 3 bytes long with (compressedLength * 2 +
-isOriginal) stored as a little endian value. For example, the header
-for a chunk that compressed to 100,000 bytes would be [0x40, 0x0d,
-0x03]. The header for 5 bytes that did not compress would be [0x0b,
-0x00, 0x00]. Each compression chunk is compressed independently so
-that as long as a decompressor starts at the top of a header, it can
-start decompressing without the previous bytes.</p>
-
-<p><img src="/img/CompressionStream.png" alt="compression streams" /></p>
-
-<p>The default compression chunk size is 256K, but writers can choose
-their own value. Larger chunks lead to better compression, but require
-more memory. The chunk size is recorded in the Postscript so that
-readers can allocate appropriately sized buffers. Readers are
-guaranteed that no chunk will expand to more than the compression chunk
-size.</p>
-
-<p>ORC files without generic compression write each stream directly
-with no headers.</p>
-
-          
-
-
-
-
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-    <div class="section-nav">
-      <div class="left align-right">
-          
-            
-            
-            <a href="/docs/file-tail.html" class="prev">Back</a>
-          
-      </div>
-      <div class="right align-left">
-          
-            
-            
-            <a href="/docs/run-length.html" class="next">Next</a>
-          
-      </div>
-    </div>
-    <div class="clear"></div>
-    
-
-        </article>
-      </div>
-
-      <div class="unit one-fifth hide-on-mobiles">
-  <aside>
-    
-    <h4>Overview</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/index.html">Background</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/types.html">Types</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-      <li class=""><a href="/docs/acid.html">ACID support</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Installing</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/building.html">Building ORC</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/releases.html">Releases</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using in Hive</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using in MapReduce</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using ORC Core</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Tools</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Format Specification</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-intro.html">Introduction</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/file-tail.html">File Tail</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class="current"><a href="/docs/compression.html">Compression</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/run-length.html">Run Length Encoding</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/stripes.html">Stripes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/encodings.html">Column Encodings</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-index.html">Indexes</a></li>
-      
-
-
-</ul>
-
-    
-  </aside>
-</div>
-
-
-      <div class="clear"></div>
-
-    </div>
-  </section>
-
-
-  <footer role="contentinfo">
-  <p>The contents of this website are &copy;&nbsp;2018
-     <a href="https://www.apache.org/">Apache Software Foundation</a>
-     under the terms of the <a
-      href="https://www.apache.org/licenses/LICENSE-2.0.html">
-      Apache&nbsp;License&nbsp;v2</a>. Apache ORC and its logo are trademarks
-      of the Apache Software Foundation.</p>
-</footer>
-
-  <script>
-  var anchorForId = function (id) {
-    var anchor = document.createElement("a");
-    anchor.className = "header-link";
-    anchor.href      = "#" + id;
-    anchor.innerHTML = "<span class=\"sr-only\">Permalink</span><i class=\"fa fa-link\"></i>";
-    anchor.title = "Permalink";
-    return anchor;
-  };
-
-  var linkifyAnchors = function (level, containingElement) {
-    var headers = containingElement.getElementsByTagName("h" + level);
-    for (var h = 0; h < headers.length; h++) {
-      var header = headers[h];
-
-      if (typeof header.id !== "undefined" && header.id !== "") {
-        header.appendChild(anchorForId(header.id));
-      }
-    }
-  };
-
-  document.onreadystatechange = function () {
-    if (this.readyState === "complete") {
-      var contentBlock = document.getElementsByClassName("docs")[0] || document.getElementsByClassName("news")[0];
-      if (!contentBlock) {
-        return;
-      }
-      for (var level = 1; level <= 6; level++) {
-        linkifyAnchors(level, contentBlock);
-      }
-    }
-  };
-</script>
-
-
-</body>
-</html>
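
Note on the removed docs/compression.html text above: the chunk header it describes (3 bytes holding compressedLength * 2 + isOriginal as a little endian value) can be illustrated with a minimal sketch. The function names below are hypothetical and are not part of any ORC library; the sketch only assumes what the removed paragraph states, and it reproduces the two worked headers given there.

```python
from typing import Iterator, Tuple


def encode_chunk_header(length: int, is_original: bool) -> bytes:
    """Pack (length * 2 + isOriginal) into 3 little-endian bytes."""
    # 3 bytes hold 24 bits; one bit is the isOriginal flag, so 23 remain.
    if not 0 <= length < (1 << 23):
        raise ValueError("chunk length must fit in 23 bits")
    value = length * 2 + (1 if is_original else 0)
    return value.to_bytes(3, "little")


def decode_chunk_header(header: bytes) -> Tuple[int, bool]:
    """Return (length, is_original) from a 3-byte chunk header."""
    if len(header) != 3:
        raise ValueError("chunk header must be exactly 3 bytes")
    value = int.from_bytes(header, "little")
    return value >> 1, bool(value & 1)


def iter_chunks(stream: bytes) -> Iterator[Tuple[bytes, bool]]:
    """Walk a stream header by header without decompressing anything.

    Yields (chunk_bytes, is_original) for each chunk. This is a
    hypothetical helper showing why a reader can skip over chunks:
    each header gives the exact byte length of the chunk that follows.
    """
    pos = 0
    while pos < len(stream):
        length, is_original = decode_chunk_header(stream[pos:pos + 3])
        pos += 3
        yield stream[pos:pos + length], is_original
        pos += length


# The two examples from the text:
# 100,000 compressed bytes -> [0x40, 0x0d, 0x03]
# 5 bytes stored as original -> [0x0b, 0x00, 0x00]
header_compressed = encode_chunk_header(100000, False)
header_original = encode_chunk_header(5, True)
```

Because every chunk is compressed independently, a reader built along these lines could seek to any header boundary and start decompressing from there.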


[7/9] orc git commit: Pushing ORC-339 reorganize the ORC file format spec.

Posted by om...@apache.org.
http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/docs/encodings.html
----------------------------------------------------------------------
diff --git a/docs/encodings.html b/docs/encodings.html
deleted file mode 100644
index 0a2a3f7..0000000
--- a/docs/encodings.html
+++ /dev/null
@@ -1,2790 +0,0 @@
-<!DOCTYPE HTML>
-<html lang="en-US">
-<head>
-  <meta charset="UTF-8">
-  <title>Column Encodings</title>
-  <meta name="viewport" content="width=device-width,initial-scale=1">
-  <meta name="generator" content="Jekyll v2.4.0">
-  <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
-  <link rel="stylesheet" href="/css/screen.css">
-  <link rel="icon" type="image/x-icon" href="/favicon.ico">
-  <!--[if lt IE 9]>
-  <script src="/js/html5shiv.min.js"></script>
-  <script src="/js/respond.min.js"></script>
-  <![endif]-->
-</head>
-
-
-<body class="wrap">
-  <header role="banner">
-  <nav class="mobile-nav show-on-mobiles">
-    <ul>
-  <li class="">
-    <a href="/">Home</a>
-  </li>
-  <li class="current">
-    <a href="/docs/"><span class="show-on-mobiles">Docs</span>
-                     <span class="hide-on-mobiles">Documentation</span></a>
-  </li>
-  <li class="">
-    <a href="/talks/">Talks</a>
-  </li>
-  <li class="">
-    <a href="/news/">News</a>
-  </li>
-  <li class="">
-    <a href="/help/">Help</a>
-  </li>
-  <li class="">
-    <a href="/develop/">Develop</a>
-  </li>
-</ul>
-
-  </nav>
-  <div class="grid">
-    <div class="unit one-third center-on-mobiles">
-      <h1>
-        <a href="/">
-          <span class="sr-only">Apache ORC</span>
-          <img src="/img/logo.png" width="249" height="101" alt="ORC Logo">
-        </a>
-      </h1>
-    </div>
-    <nav class="main-nav unit two-thirds hide-on-mobiles">
-      <ul>
-  <li class="">
-    <a href="/">Home</a>
-  </li>
-  <li class="current">
-    <a href="/docs/"><span class="show-on-mobiles">Docs</span>
-                     <span class="hide-on-mobiles">Documentation</span></a>
-  </li>
-  <li class="">
-    <a href="/talks/">Talks</a>
-  </li>
-  <li class="">
-    <a href="/news/">News</a>
-  </li>
-  <li class="">
-    <a href="/help/">Help</a>
-  </li>
-  <li class="">
-    <a href="/develop/">Develop</a>
-  </li>
-</ul>
-
-    </nav>
-  </div>
-</header>
-
-
-    <section class="docs">
-    <div class="grid">
-
-      <div class="docs-nav-mobile unit whole show-on-mobiles">
-  <select onchange="if (this.value) window.location.href=this.value">
-    <option value="">Navigate the docs…</option>
-    
-    <optgroup label="Overview">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/index.html">Background</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-      <option value="/docs/adopters.html">ORC Adopters</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/types.html">Types</option>
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/indexes.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-      <option value="/docs/acid.html">ACID support</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Installing">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-      <option value="/docs/building.html">Building ORC</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/releases.html">Releases</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Using in Hive">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/hive-ddl.html">Hive DDL</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/hive-config.html">Hive Configuration</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Using in MapReduce">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/mapred.html">Using in MapRed</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/mapreduce.html">Using in MapReduce</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Using ORC Core">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/core-java.html">Using Core Java</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/core-cpp.html">Using Core C++</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Tools">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/cpp-tools.html">C++ Tools</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/java-tools.html">Java Tools</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Format Specification">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-intro.html">Introduction</option>
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/file-tail.html">File Tail</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/compression.html">Compression</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/run-length.html">Run Length Encoding</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/stripes.html">Stripes</option>
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/encodings.html">Column Encodings</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-index.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-  </select>
-</div>
-
-
-      <div class="unit four-fifths">
-        <article>
-          <h1>Column Encodings</h1>
-          <h2 id="smallint-int-and-bigint-columns">SmallInt, Int, and BigInt Columns</h2>
-
-<p>All of the 16, 32, and 64 bit integer column types use the same set of
-potential encodings, which differ only in whether they use RLE v1 or
-v2. If the PRESENT stream is not included, all of the values are
-present. For rows whose bit in the PRESENT stream is false (null), no
-value is written to the DATA stream.</p>
-
-<table>
-  <thead>
-    <tr>
-      <th style="text-align: left">Encoding</th>
-      <th style="text-align: left">Stream Kind</th>
-      <th style="text-align: left">Optional</th>
-      <th style="text-align: left">Contents</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td style="text-align: left">DIRECT</td>
-      <td style="text-align: left">PRESENT</td>
-      <td style="text-align: left">Yes</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">DATA</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Signed Integer RLE v1</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">DIRECT_V2</td>
-      <td style="text-align: left">PRESENT</td>
-      <td style="text-align: left">Yes</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">DATA</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Signed Integer RLE v2</td>
-    </tr>
-  </tbody>
-</table>
-
-<h2 id="float-and-double-columns">Float and Double Columns</h2>
-
-<p>Floating point types are stored using IEEE 754 floating point bit
-layout. Float columns use 4 bytes per value and double columns use 8
-bytes.</p>
-
-<table>
-  <thead>
-    <tr>
-      <th style="text-align: left">Encoding</th>
-      <th style="text-align: left">Stream Kind</th>
-      <th style="text-align: left">Optional</th>
-      <th style="text-align: left">Contents</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td style="text-align: left">DIRECT</td>
-      <td style="text-align: left">PRESENT</td>
-      <td style="text-align: left">Yes</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">DATA</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">IEEE 754 floating point representation</td>
-    </tr>
-  </tbody>
-</table>
-
-<h2 id="string-char-and-varchar-columns">String, Char, and VarChar Columns</h2>
-
-<p>String, char, and varchar columns may be encoded either using a
-dictionary encoding or a direct encoding. A direct encoding should be
-preferred when there are many distinct values. In all of the
-encodings, the PRESENT stream encodes whether the value is null. The
-Java ORC writer automatically picks the encoding after the first row
-group (10,000 rows).</p>
-
-<p>For direct encoding the UTF-8 bytes are saved in the DATA stream and
-the length of each value in bytes is written into the LENGTH stream. In
-direct encoding, if the values were [“Nevada”, “California”], the DATA
-would be “NevadaCalifornia” and the LENGTH would be [6, 10].</p>
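
<p>The direct layout can be sketched in a few lines of Python. This is only an illustration of how the two streams relate: the function name <code>encode_direct</code> is hypothetical, null handling (the PRESENT stream) is left out, and the run length encoding that would be applied to LENGTH is not shown.</p>

```python
def encode_direct(values):
    """Return (DATA bytes, LENGTH list) for a list of non-null strings."""
    encoded = [v.encode("utf-8") for v in values]
    data = b"".join(encoded)                 # DATA: all values back to back
    lengths = [len(b) for b in encoded]      # LENGTH: byte counts, not character counts
    return data, lengths


data, lengths = encode_direct(["Nevada", "California"])
print(data, lengths)
```

<p>A reader recovers each value by slicing DATA with the running sum of LENGTH.</p>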
-
-<p>For dictionary encodings the dictionary is sorted and UTF-8 bytes of
-each unique value are placed into DICTIONARY_DATA. The length of each
-item in the dictionary is put into the LENGTH stream. The DATA stream
-consists of the sequence of references to the dictionary elements.</p>
-
-<p>In dictionary encoding, if the values were [“Nevada”,
-“California”, “Nevada”, “California”, “Florida”], the
-DICTIONARY_DATA would be “CaliforniaFloridaNevada” and LENGTH would
-be [10, 7, 6]. The DATA would be [2, 0, 2, 0, 1].</p>
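
<p>A minimal Python sketch of the dictionary layout follows. The name <code>encode_dictionary</code> is hypothetical, and sorting the entries by their UTF-8 bytes is an assumption made here for illustration; nulls and the run length encoding of LENGTH and DATA are omitted.</p>

```python
def encode_dictionary(values):
    """Return (DICTIONARY_DATA bytes, LENGTH list, DATA list) for non-null strings."""
    encoded = [v.encode("utf-8") for v in values]
    dictionary = sorted(set(encoded))        # unique entries, sorted bytewise (assumed order)
    index = {entry: i for i, entry in enumerate(dictionary)}
    dictionary_data = b"".join(dictionary)   # DICTIONARY_DATA: entries back to back
    lengths = [len(entry) for entry in dictionary]  # LENGTH: one per dictionary entry
    data = [index[b] for b in encoded]       # DATA: one dictionary reference per row
    return dictionary_data, lengths, data


print(encode_dictionary(["Nevada", "California", "Nevada", "California", "Florida"]))
```

<p>Note that LENGTH describes the dictionary entries here, not the rows, so its size is the number of distinct values.</p>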
-
-<table>
-  <thead>
-    <tr>
-      <th style="text-align: left">Encoding</th>
-      <th style="text-align: left">Stream Kind</th>
-      <th style="text-align: left">Optional</th>
-      <th style="text-align: left">Contents</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td style="text-align: left">DIRECT</td>
-      <td style="text-align: left">PRESENT</td>
-      <td style="text-align: left">Yes</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">DATA</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">String contents</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">LENGTH</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Unsigned Integer RLE v1</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">DICTIONARY</td>
-      <td style="text-align: left">PRESENT</td>
-      <td style="text-align: left">Yes</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">DATA</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Unsigned Integer RLE v1</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">DICTIONARY_DATA</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">String contents</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">LENGTH</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Unsigned Integer RLE v1</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">DIRECT_V2</td>
-      <td style="text-align: left">PRESENT</td>
-      <td style="text-align: left">Yes</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">DATA</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">String contents</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">LENGTH</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Unsigned Integer RLE v2</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">DICTIONARY_V2</td>
-      <td style="text-align: left">PRESENT</td>
-      <td style="text-align: left">Yes</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">DATA</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Unsigned Integer RLE v2</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">DICTIONARY_DATA</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">String contents</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">LENGTH</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Unsigned Integer RLE v2</td>
-    </tr>
-  </tbody>
-</table>
-
-<h2 id="boolean-columns">Boolean Columns</h2>
-
-<p>Boolean columns are rare, but have a simple encoding.</p>
-
-<table>
-  <thead>
-    <tr>
-      <th style="text-align: left">Encoding</th>
-      <th style="text-align: left">Stream Kind</th>
-      <th style="text-align: left">Optional</th>
-      <th style="text-align: left">Contents</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td style="text-align: left">DIRECT</td>
-      <td style="text-align: left">PRESENT</td>
-      <td style="text-align: left">Yes</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">DATA</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-  </tbody>
-</table>
-
-<h2 id="tinyint-columns">TinyInt Columns</h2>
-
-<p>TinyInt (byte) columns use byte run length encoding.</p>
-
-<table>
-  <thead>
-    <tr>
-      <th style="text-align: left">Encoding</th>
-      <th style="text-align: left">Stream Kind</th>
-      <th style="text-align: left">Optional</th>
-      <th style="text-align: left">Contents</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td style="text-align: left">DIRECT</td>
-      <td style="text-align: left">PRESENT</td>
-      <td style="text-align: left">Yes</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">DATA</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Byte RLE</td>
-    </tr>
-  </tbody>
-</table>
-
-<h2 id="binary-columns">Binary Columns</h2>
-
-<p>Binary data is encoded with a PRESENT stream, a DATA stream that records
-the contents, and a LENGTH stream that records the number of bytes per
-value.</p>
-
-<table>
-  <thead>
-    <tr>
-      <th style="text-align: left">Encoding</th>
-      <th style="text-align: left">Stream Kind</th>
-      <th style="text-align: left">Optional</th>
-      <th style="text-align: left">Contents</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td style="text-align: left">DIRECT</td>
-      <td style="text-align: left">PRESENT</td>
-      <td style="text-align: left">Yes</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">DATA</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">String contents</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">LENGTH</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Unsigned Integer RLE v1</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">DIRECT_V2</td>
-      <td style="text-align: left">PRESENT</td>
-      <td style="text-align: left">Yes</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">DATA</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">String contents</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">LENGTH</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Unsigned Integer RLE v2</td>
-    </tr>
-  </tbody>
-</table>
-
-<h2 id="decimal-columns">Decimal Columns</h2>
-
-<p>Decimal was introduced in Hive 0.11 with infinite precision (the total
-number of digits). In Hive 0.13, the definition was changed to limit
-the precision to a maximum of 38 digits, which conveniently uses 127
-bits plus a sign bit. The current encoding of decimal columns stores
-the integer representation of the value as an unbounded length zigzag
-encoded base 128 varint. The scale is stored in the SECONDARY stream
-as a signed integer.</p>
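
<p>The following Python sketch shows one way to produce the two pieces for a single finite decimal value. The helper names are hypothetical, and the varint byte order (low-order 7-bit group first, high bit set on every byte except the last, as in Protocol Buffers) is an assumption rather than something stated in this section.</p>

```python
from decimal import Decimal


def zigzag(n):
    """Map a signed integer to an unsigned one: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return (n << 1) if n >= 0 else ((-n) << 1) - 1


def varint(u):
    """Encode a non-negative integer of any size as a base 128 varint."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)   # more groups follow
        else:
            out.append(byte)          # last group, high bit clear
            return bytes(out)


def encode_decimal(value):
    """Return (DATA bytes, scale) for one finite Decimal value."""
    sign, digits, exponent = value.as_tuple()
    unscaled = int("".join(map(str, digits)))
    if sign:
        unscaled = -unscaled
    return varint(zigzag(unscaled)), -exponent


print(encode_decimal(Decimal("12.34")))
```

<p>Because Python integers are unbounded, the same code works for values wider than 64 bits, which is the case the unbounded varint is meant to cover.</p>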
-
-<table>
-  <thead>
-    <tr>
-      <th style="text-align: left">Encoding</th>
-      <th style="text-align: left">Stream Kind</th>
-      <th style="text-align: left">Optional</th>
-      <th style="text-align: left">Contents</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td style="text-align: left">DIRECT</td>
-      <td style="text-align: left">PRESENT</td>
-      <td style="text-align: left">Yes</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">DATA</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Unbounded base 128 varints</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">SECONDARY</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Signed Integer RLE v1</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">DIRECT_V2</td>
-      <td style="text-align: left">PRESENT</td>
-      <td style="text-align: left">Yes</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">DATA</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Unbounded base 128 varints</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">SECONDARY</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Signed Integer RLE v2</td>
-    </tr>
-  </tbody>
-</table>
-
-<h2 id="date-columns">Date Columns</h2>
-
-<p>Date data is encoded with a PRESENT stream and a DATA stream that records
-the number of days after January 1, 1970 in UTC.</p>
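
<p>As a small illustration, the day count can be computed as below. The function name <code>days_since_epoch</code> is hypothetical; dates before 1970 yield negative numbers, which is why the DATA stream uses a signed integer encoding.</p>

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def days_since_epoch(d):
    """Number of days between 1970-01-01 and the given calendar date."""
    return (d - EPOCH).days


print(days_since_epoch(date(1970, 1, 2)))
print(days_since_epoch(date(1969, 12, 31)))
```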
-
-<table>
-  <thead>
-    <tr>
-      <th style="text-align: left">Encoding</th>
-      <th style="text-align: left">Stream Kind</th>
-      <th style="text-align: left">Optional</th>
-      <th style="text-align: left">Contents</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td style="text-align: left">DIRECT</td>
-      <td style="text-align: left">PRESENT</td>
-      <td style="text-align: left">Yes</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">DATA</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Signed Integer RLE v1</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">DIRECT_V2</td>
-      <td style="text-align: left">PRESENT</td>
-      <td style="text-align: left">Yes</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">DATA</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Signed Integer RLE v2</td>
-    </tr>
-  </tbody>
-</table>
-
-<h2 id="timestamp-columns">Timestamp Columns</h2>
-
-<p>Timestamp records times down to nanoseconds as a PRESENT stream that
-records non-null values, a DATA stream that records the number of
-seconds after 1 January 2015, and a SECONDARY stream that records the
-number of nanoseconds.</p>
-
-<p>Because the number of nanoseconds often has a large number of trailing
-zeros, the number has trailing decimal zero digits removed and the
-last three bits are used to record how many zeros were removed. Zeros
-are only removed when there are at least two of them, and the three
-bits store the number removed minus one. Thus 1000 nanoseconds would be
-serialized as 0x0a and 100000 would be serialized as 0x0c.</p>
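
<p>A hedged Python sketch of this packing is shown below. It assumes the convention believed to be used by the reference Java writer: zeros are stripped only when at least two can be removed, at most eight are stripped, and the low three bits hold the number removed minus one (0 means nothing was removed). The function names are hypothetical.</p>

```python
def pack_nanos(nanos):
    """Pack a nanosecond count (0..999999999) into value << 3 | zero code."""
    if nanos == 0:
        return 0
    if nanos % 100 != 0:
        return nanos << 3             # fewer than two trailing zeros: store as is
    nanos //= 100                     # two zeros removed so far
    code = 1
    while nanos % 10 == 0 and code < 7:
        nanos //= 10
        code += 1
    return (nanos << 3) | code


def unpack_nanos(packed):
    """Inverse of pack_nanos."""
    code = packed & 0x7
    value = packed >> 3
    if code:
        value *= 10 ** (code + 1)     # restore the removed zeros
    return value


for n in (0, 1, 10, 100, 1000, 100000, 500000000, 999999999):
    assert unpack_nanos(pack_nanos(n)) == n
```

<p>The round trip in the last lines is the property a reader relies on; the exact bit convention should be checked against the writer in use.</p>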
-
-<table>
-  <thead>
-    <tr>
-      <th style="text-align: left">Encoding</th>
-      <th style="text-align: left">Stream Kind</th>
-      <th style="text-align: left">Optional</th>
-      <th style="text-align: left">Contents</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td style="text-align: left">DIRECT</td>
-      <td style="text-align: left">PRESENT</td>
-      <td style="text-align: left">Yes</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">DATA</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Signed Integer RLE v1</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">SECONDARY</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Unsigned Integer RLE v1</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">DIRECT_V2</td>
-      <td style="text-align: left">PRESENT</td>
-      <td style="text-align: left">Yes</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">DATA</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Signed Integer RLE v2</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">SECONDARY</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Unsigned Integer RLE v2</td>
-    </tr>
-  </tbody>
-</table>
-
-<h2 id="struct-columns">Struct Columns</h2>
-
-<p>Structs have no data themselves and delegate everything to their child
-columns except for their PRESENT stream. They have a child column
-for each of the fields.</p>
-
-<table>
-  <thead>
-    <tr>
-      <th style="text-align: left">Encoding</th>
-      <th style="text-align: left">Stream Kind</th>
-      <th style="text-align: left">Optional</th>
-      <th style="text-align: left">Contents</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td style="text-align: left">DIRECT</td>
-      <td style="text-align: left">PRESENT</td>
-      <td style="text-align: left">Yes</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-  </tbody>
-</table>
-
-<h2 id="list-columns">List Columns</h2>
-
-<p>Lists are encoded as the PRESENT stream and a LENGTH stream with the
-number of items in each list. They have a single child column for the
-element values.</p>
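
<p>The relationship between the parent's streams and the child column can be sketched as follows. The function name <code>encode_list_column</code> is hypothetical, and the sketch assumes that a null list contributes a false PRESENT bit and no LENGTH entry, in line with the rule given for integer columns.</p>

```python
def encode_list_column(rows):
    """Split rows (each a list or None) into PRESENT bits, LENGTH values, and child values."""
    present = [row is not None for row in rows]
    lengths = [len(row) for row in rows if row is not None]
    child = [item for row in rows if row is not None for item in row]
    return present, lengths, child


print(encode_list_column([[1, 2], None, [], [3]]))
```

<p>An empty list and a null list are distinguishable: the former has a true PRESENT bit and a LENGTH of 0, the latter only a false PRESENT bit.</p>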
-
-<table>
-  <thead>
-    <tr>
-      <th style="text-align: left">Encoding</th>
-      <th style="text-align: left">Stream Kind</th>
-      <th style="text-align: left">Optional</th>
-      <th style="text-align: left">Contents</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td style="text-align: left">DIRECT</td>
-      <td style="text-align: left">PRESENT</td>
-      <td style="text-align: left">Yes</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">LENGTH</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Unsigned Integer RLE v1</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">DIRECT_V2</td>
-      <td style="text-align: left">PRESENT</td>
-      <td style="text-align: left">Yes</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">LENGTH</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Unsigned Integer RLE v2</td>
-    </tr>
-  </tbody>
-</table>
-
-<h2 id="map-columns">Map Columns</h2>
-
-<p>Maps are encoded as the PRESENT stream and a LENGTH stream with the number
-of entries in each map. They have a child column for the key and
-another child column for the value.</p>
-
-<table>
-  <thead>
-    <tr>
-      <th style="text-align: left">Encoding</th>
-      <th style="text-align: left">Stream Kind</th>
-      <th style="text-align: left">Optional</th>
-      <th style="text-align: left">Contents</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td style="text-align: left">DIRECT</td>
-      <td style="text-align: left">PRESENT</td>
-      <td style="text-align: left">Yes</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">LENGTH</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Unsigned Integer RLE v1</td>
-    </tr>
-    <tr>
-      <td style="text-align: left">DIRECT_V2</td>
-      <td style="text-align: left">PRESENT</td>
-      <td style="text-align: left">Yes</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">LENGTH</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Unsigned Integer RLE v2</td>
-    </tr>
-  </tbody>
-</table>
-
-<h2 id="union-columns">Union Columns</h2>
-
-<p>Unions are encoded as the PRESENT stream and a tag stream that controls which
-potential variant is used. They have a child column for each variant of the
-union. Currently ORC union types are limited to 256 variants, which matches
-the Hive type model.</p>
-
-<table>
-  <thead>
-    <tr>
-      <th style="text-align: left">Encoding</th>
-      <th style="text-align: left">Stream Kind</th>
-      <th style="text-align: left">Optional</th>
-      <th style="text-align: left">Contents</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td style="text-align: left">DIRECT</td>
-      <td style="text-align: left">PRESENT</td>
-      <td style="text-align: left">Yes</td>
-      <td style="text-align: left">Boolean RLE</td>
-    </tr>
-    <tr>
-      <td style="text-align: left"> </td>
-      <td style="text-align: left">DATA</td>
-      <td style="text-align: left">No</td>
-      <td style="text-align: left">Byte RLE</td>
-    </tr>
-  </tbody>
-</table>
-
-          
-
-
-
-
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-    <div class="section-nav">
-      <div class="left align-right">
-          
-            
-            
-            <a href="/docs/stripes.html" class="prev">Back</a>
-          
-      </div>
-      <div class="right align-left">
-          
-            
-            
-            <a href="/docs/spec-index.html" class="next">Next</a>
-          
-      </div>
-    </div>
-    <div class="clear"></div>
-    
-
-        </article>
-      </div>
-
-      <div class="unit one-fifth hide-on-mobiles">
-  <aside>
-    
-    <h4>Overview</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/index.html">Background</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/types.html">Types</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-      <li class=""><a href="/docs/acid.html">ACID support</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Installing</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/building.html">Building ORC</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/releases.html">Releases</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using in Hive</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using in MapReduce</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using ORC Core</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Tools</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Format Specification</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-intro.html">Introduction</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/file-tail.html">File Tail</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/compression.html">Compression</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/run-length.html">Run Length Encoding</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/stripes.html">Stripes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class="current"><a href="/docs/encodings.html">Column Encodings</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-index.html">Indexes</a></li>
-      
-
-
-</ul>
-
-    
-  </aside>
-</div>
-
-
-      <div class="clear"></div>
-
-    </div>
-  </section>
-
-
-  <footer role="contentinfo">
-  <p>The contents of this website are &copy;&nbsp;2018
-     <a href="https://www.apache.org/">Apache Software Foundation</a>
-     under the terms of the <a
-      href="https://www.apache.org/licenses/LICENSE-2.0.html">
-      Apache&nbsp;License&nbsp;v2</a>. Apache ORC and its logo are trademarks
-      of the Apache Software Foundation.</p>
-</footer>
-
-  <script>
-  var anchorForId = function (id) {
-    var anchor = document.createElement("a");
-    anchor.className = "header-link";
-    anchor.href      = "#" + id;
-    anchor.innerHTML = "<span class=\"sr-only\">Permalink</span><i class=\"fa fa-link\"></i>";
-    anchor.title = "Permalink";
-    return anchor;
-  };
-
-  var linkifyAnchors = function (level, containingElement) {
-    var headers = containingElement.getElementsByTagName("h" + level);
-    for (var h = 0; h < headers.length; h++) {
-      var header = headers[h];
-
-      if (typeof header.id !== "undefined" && header.id !== "") {
-        header.appendChild(anchorForId(header.id));
-      }
-    }
-  };
-
-  document.onreadystatechange = function () {
-    if (this.readyState === "complete") {
-      var contentBlock = document.getElementsByClassName("docs")[0] || document.getElementsByClassName("news")[0];
-      if (!contentBlock) {
-        return;
-      }
-      for (var level = 1; level <= 6; level++) {
-        linkifyAnchors(level, contentBlock);
-      }
-    }
-  };
-</script>
-
-
-</body>
-</html>

http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/docs/file-tail.html
----------------------------------------------------------------------
diff --git a/docs/file-tail.html b/docs/file-tail.html
deleted file mode 100644
index 3e4c9a4..0000000
--- a/docs/file-tail.html
+++ /dev/null
@@ -1,2477 +0,0 @@
-<!DOCTYPE HTML>
-<html lang="en-US">
-<head>
-  <meta charset="UTF-8">
-  <title>File Tail</title>
-  <meta name="viewport" content="width=device-width,initial-scale=1">
-  <meta name="generator" content="Jekyll v2.4.0">
-  <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
-  <link rel="stylesheet" href="/css/screen.css">
-  <link rel="icon" type="image/x-icon" href="/favicon.ico">
-  <!--[if lt IE 9]>
-  <script src="/js/html5shiv.min.js"></script>
-  <script src="/js/respond.min.js"></script>
-  <![endif]-->
-</head>
-
-
-<body class="wrap">
-  <header role="banner">
-  <nav class="mobile-nav show-on-mobiles">
-    <ul>
-  <li class="">
-    <a href="/">Home</a>
-  </li>
-  <li class="current">
-    <a href="/docs/"><span class="show-on-mobiles">Docs</span>
-                     <span class="hide-on-mobiles">Documentation</span></a>
-  </li>
-  <li class="">
-    <a href="/talks/">Talks</a>
-  </li>
-  <li class="">
-    <a href="/news/">News</a>
-  </li>
-  <li class="">
-    <a href="/help/">Help</a>
-  </li>
-  <li class="">
-    <a href="/develop/">Develop</a>
-  </li>
-</ul>
-
-  </nav>
-  <div class="grid">
-    <div class="unit one-third center-on-mobiles">
-      <h1>
-        <a href="/">
-          <span class="sr-only">Apache ORC</span>
-          <img src="/img/logo.png" width="249" height="101" alt="ORC Logo">
-        </a>
-      </h1>
-    </div>
-    <nav class="main-nav unit two-thirds hide-on-mobiles">
-      <ul>
-  <li class="">
-    <a href="/">Home</a>
-  </li>
-  <li class="current">
-    <a href="/docs/"><span class="show-on-mobiles">Docs</span>
-                     <span class="hide-on-mobiles">Documentation</span></a>
-  </li>
-  <li class="">
-    <a href="/talks/">Talks</a>
-  </li>
-  <li class="">
-    <a href="/news/">News</a>
-  </li>
-  <li class="">
-    <a href="/help/">Help</a>
-  </li>
-  <li class="">
-    <a href="/develop/">Develop</a>
-  </li>
-</ul>
-
-    </nav>
-  </div>
-</header>
-
-
-    <section class="docs">
-    <div class="grid">
-
-      <div class="docs-nav-mobile unit whole show-on-mobiles">
-  <select onchange="if (this.value) window.location.href=this.value">
-    <option value="">Navigate the docs…</option>
-    
-    <optgroup label="Overview">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/index.html">Background</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-      <option value="/docs/adopters.html">ORC Adopters</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/types.html">Types</option>
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/indexes.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-      <option value="/docs/acid.html">ACID support</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Installing">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-      <option value="/docs/building.html">Building ORC</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/releases.html">Releases</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Using in Hive">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/hive-ddl.html">Hive DDL</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/hive-config.html">Hive Configuration</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Using in MapReduce">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/mapred.html">Using in MapRed</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/mapreduce.html">Using in MapReduce</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Using ORC Core">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/core-java.html">Using Core Java</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/core-cpp.html">Using Core C++</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Tools">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/cpp-tools.html">C++ Tools</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/java-tools.html">Java Tools</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-    <optgroup label="Format Specification">
-      
-
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-intro.html">Introduction</option>
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/file-tail.html">File Tail</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/compression.html">Compression</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/run-length.html">Run Length Encoding</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/stripes.html">Stripes</option>
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/encodings.html">Column Encodings</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-index.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-  </select>
-</div>
-
-
-      <div class="unit four-fifths">
-        <article>
-          <h1>File Tail</h1>
-          <p>Since HDFS does not support changing the data in a file after it is
-written, ORC stores the top-level index at the end of the file. The
-overall structure of the file is given in the figure above. The
-file’s tail consists of three parts: the file metadata, the file footer,
-and the postscript.</p>
-
-<p>The metadata for ORC is stored using
-<a href="https://s.apache.org/protobuf_encoding">Protocol Buffers</a>, which provides
-the ability to add new fields without breaking readers. This document
-incorporates the Protobuf definition from the
-<a href="https://s.apache.org/orc_proto">ORC source code</a>; readers who
-need to understand the byte-level layout are encouraged to review the
-Protobuf encoding.</p>
-
-<h1 id="postscript">Postscript</h1>
-
-<p>The Postscript section provides the necessary information to interpret
-the rest of the file including the length of the file’s Footer and
-Metadata sections, the version of the file, and the kind of general
-compression used (e.g. none, zlib, or snappy). The Postscript is never
-compressed and ends one byte before the end of the file. The version
-stored in the Postscript is the lowest version of Hive that is
-guaranteed to be able to read the file, and it is stored as a sequence of
-the major and minor version numbers. Two versions are currently in
-use: [0,11] for Hive 0.11, and [0,12] for Hive 0.12 or later.</p>
-
-<p>The process of reading an ORC file works backwards through the
-file. Rather than making multiple short reads, the ORC reader reads
-the last 16k bytes of the file with the hope that it will contain both
-the Footer and Postscript sections. The final byte of the file
-contains the serialized length of the Postscript, which must be less
-than 256 bytes. Once the Postscript is parsed, the compressed
-serialized length of the Footer is known and it can be decompressed
-and parsed.</p>
-
-<p><code>message PostScript {
- // the length of the footer section in bytes
- optional uint64 footerLength = 1;
- // the kind of generic compression used
- optional CompressionKind compression = 2;
- // the maximum size of each compression chunk
- optional uint64 compressionBlockSize = 3;
- // the version of the writer
- repeated uint32 version = 4 [packed = true];
- // the length of the metadata section in bytes
- optional uint64 metadataLength = 5;
- // the fixed string "ORC"
- optional string magic = 8000;
-}
-</code></p>
-
-<p><code>enum CompressionKind {
- NONE = 0;
- ZLIB = 1;
- SNAPPY = 2;
- LZO = 3;
- LZ4 = 4;
- ZSTD = 5;
-}
-</code></p>
-
-<h1 id="footer">Footer</h1>
-
-<p>The Footer section contains the layout of the body of the file, the
-type schema information, the number of rows, and the statistics about
-each of the columns.</p>
-
-<p>The file is broken into three parts: Header, Body, and Tail. The
-Header consists of the bytes “ORC” to support tools that want to
-scan the front of the file to determine the type of the file. The Body
-contains the rows and indexes, and the Tail gives the file level
-information as described in this section.</p>
-
-<p><code>message Footer {
- // the length of the file header in bytes (always 3)
- optional uint64 headerLength = 1;
- // the length of the file header and body in bytes
- optional uint64 contentLength = 2;
- // the information about the stripes
- repeated StripeInformation stripes = 3;
- // the schema information
- repeated Type types = 4;
- // the user metadata that was added
- repeated UserMetadataItem metadata = 5;
- // the total number of rows in the file
- optional uint64 numberOfRows = 6;
- // the statistics of each column across the file
- repeated ColumnStatistics statistics = 7;
- // the maximum number of rows in each index entry
- optional uint32 rowIndexStride = 8;
-}
-</code></p>
-
-<h2 id="stripe-information">Stripe Information</h2>
-
-<p>The body of the file is divided into stripes. Each stripe is
-self-contained and may be read using only its own bytes combined with the
-file’s Footer and Postscript. Each stripe contains only entire rows so
-that rows never straddle stripe boundaries. Stripes have three
-sections: a set of indexes for the rows within the stripe, the data
-itself, and a stripe footer. Both the indexes and the data sections
-are divided by columns so that only the data for the required columns
-needs to be read.</p>
-
-<p><code>message StripeInformation {
- // the start of the stripe within the file
- optional uint64 offset = 1;
- // the length of the indexes in bytes
- optional uint64 indexLength = 2;
- // the length of the data in bytes
- optional uint64 dataLength = 3;
- // the length of the footer in bytes
- optional uint64 footerLength = 4;
- // the number of rows in the stripe
- optional uint64 numberOfRows = 5;
-}
-</code></p>
-
-<h2 id="type-information">Type Information</h2>
-
-<p>All of the rows in an ORC file must have the same schema. Logically
-the schema is expressed as a tree as in the figure below, where
-the compound types have subcolumns under them.</p>
-
-<p><img src="/img/TreeWriters.png" alt="ORC column structure" /></p>
-
-<p>The equivalent Hive DDL would be:</p>
-
-<p><code>create table Foobar (
- myInt int,
- myMap map&lt;string,
- struct&lt;myString : string,
- myDouble: double&gt;&gt;,
- myTime timestamp
-);
-</code></p>
-
-<p>The type tree is flattened into a list via a pre-order traversal
-where each type is assigned the next id. The root of the type
-tree is therefore always type id 0. Compound types have a field named subtypes
-that contains the list of their children’s type ids.</p>
-
-<p><code>message Type {
- enum Kind {
- BOOLEAN = 0;
- BYTE = 1;
- SHORT = 2;
- INT = 3;
- LONG = 4;
- FLOAT = 5;
- DOUBLE = 6;
- STRING = 7;
- BINARY = 8;
- TIMESTAMP = 9;
- LIST = 10;
- MAP = 11;
- STRUCT = 12;
- UNION = 13;
- DECIMAL = 14;
- DATE = 15;
- VARCHAR = 16;
- CHAR = 17;
- }
- // the kind of this type
- required Kind kind = 1;
- // the type ids of any subcolumns for list, map, struct, or union
- repeated uint32 subtypes = 2 [packed=true];
- // the list of field names for struct
- repeated string fieldNames = 3;
- // the maximum length of the type for varchar or char in UTF-8 characters
- optional uint32 maximumLength = 4;
- // the precision and scale for decimal
- optional uint32 precision = 5;
- optional uint32 scale = 6;
-}
-</code></p>
-
-<h2 id="column-statistics">Column Statistics</h2>
-
-<p>For each column, the column statistics record the number of values
-and, depending on the type, other useful fields. For most of the
-primitive types, the writer records the minimum and maximum values,
-and for numeric types it additionally stores the sum.
-From Hive 1.1.0 onwards, the column statistics will also record if
-there are any null values within the row group by setting the hasNull flag.
-The hasNull flag is used by ORC’s predicate pushdown to better answer
-‘IS NULL’ queries.</p>
-
-<p><code>message ColumnStatistics {
- // the number of values
- optional uint64 numberOfValues = 1;
- // At most one of these has a value for any column
- optional IntegerStatistics intStatistics = 2;
- optional DoubleStatistics doubleStatistics = 3;
- optional StringStatistics stringStatistics = 4;
- optional BucketStatistics bucketStatistics = 5;
- optional DecimalStatistics decimalStatistics = 6;
- optional DateStatistics dateStatistics = 7;
- optional BinaryStatistics binaryStatistics = 8;
- optional TimestampStatistics timestampStatistics = 9;
- optional bool hasNull = 10;
-}
-</code></p>
-
-<p>For integer types (tinyint, smallint, int, bigint), the column
-statistics include the minimum, maximum, and sum. If the sum
-overflows long at any point during the calculation, no sum is
-recorded.</p>
-
-<p><code>message IntegerStatistics {
- optional sint64 minimum = 1;
- optional sint64 maximum = 2;
- optional sint64 sum = 3;
-}
-</code></p>
-
-<p>For floating point types (float, double), the column statistics
-include the minimum, maximum, and sum. If the sum overflows a double,
-no sum is recorded.</p>
-
-<p><code>message DoubleStatistics {
- optional double minimum = 1;
- optional double maximum = 2;
- optional double sum = 3;
-}
-</code></p>
-
-<p>For strings, the minimum value, maximum value, and the sum of the
-lengths of the values are recorded.</p>
-
-<p><code>message StringStatistics {
- optional string minimum = 1;
- optional string maximum = 2;
- // sum will store the total length of all strings
- optional sint64 sum = 3;
-}
-</code></p>
-
-<p>For booleans, the statistics include the count of false and true values.</p>
-
-<p><code>message BucketStatistics {
- repeated uint64 count = 1 [packed=true];
-}
-</code></p>
-
-<p>For decimals, the minimum, maximum, and sum are stored.</p>
-
-<p><code>message DecimalStatistics {
- optional string minimum = 1;
- optional string maximum = 2;
- optional string sum = 3;
-}
-</code></p>
-
-<p>Date columns record the minimum and maximum values as the number of
-days since the UNIX epoch (1/1/1970).</p>
-
-<p><code>message DateStatistics {
- // min,max values saved as days since epoch
- optional sint32 minimum = 1;
- optional sint32 maximum = 2;
-}
-</code></p>
-
-<p>Timestamp columns record the minimum and maximum values as the number of
-milliseconds since the UNIX epoch (1/1/1970).</p>
-
-<p><code>message TimestampStatistics {
- // min,max values saved as milliseconds since epoch
- optional sint64 minimum = 1;
- optional sint64 maximum = 2;
-}
-</code></p>
-
-<p>Binary columns store the aggregate number of bytes across all of the values.</p>
-
-<p><code>message BinaryStatistics {
- // sum will store the total binary blob length
- optional sint64 sum = 1;
-}
-</code></p>
-
-<h2 id="user-metadata">User Metadata</h2>
-
-<p>The user can add arbitrary key/value pairs to an ORC file as it is
-written. The contents of the keys and values are completely
-application defined, but the key is a string and the value is
-binary. Applications should take care that their keys are unique;
-in general, keys should be prefixed with an organization
-code.</p>
-
-<p><code>message UserMetadataItem {
- // the user defined key
- required string name = 1;
- // the user defined binary value
- required bytes value = 2;
-}
-</code></p>
-
-<h2 id="file-metadata">File Metadata</h2>
-
-<p>The file Metadata section contains column statistics at
-stripe-level granularity. These statistics enable input split elimination
-based on the predicate push-down evaluated per stripe.</p>
-
-<p><code>message StripeStatistics {
- repeated ColumnStatistics colStats = 1;
-}
-</code></p>
-
-<p><code>message Metadata {
- repeated StripeStatistics stripeStats = 1;
-}
-</code></p>
-
-          
-
-
-
-
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-    <div class="section-nav">
-      <div class="left align-right">
-          
-            
-            
-            <a href="/docs/spec-intro.html" class="prev">Back</a>
-          
-      </div>
-      <div class="right align-left">
-          
-            
-            
-            <a href="/docs/compression.html" class="next">Next</a>
-          
-      </div>
-    </div>
-    <div class="clear"></div>
-    
-
-        </article>
-      </div>
-
-      <div class="unit one-fifth hide-on-mobiles">
-  <aside>
-    
-    <h4>Overview</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/index.html">Background</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/types.html">Types</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-      <li class=""><a href="/docs/acid.html">ACID support</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Installing</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/building.html">Building ORC</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/releases.html">Releases</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using in Hive</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using in MapReduce</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using ORC Core</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Tools</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Format Specification</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-intro.html">Introduction</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class="current"><a href="/docs/file-tail.html">File Tail</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/compression.html">Compression</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/run-length.html">Run Length Encoding</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/stripes.html">Stripes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/encodings.html">Column Encodings</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-index.html">Indexes</a></li>
-      
-
-
-</ul>
-
-    
-  </aside>
-</div>
-
-
-      <div class="clear"></div>
-
-    </div>
-  </section>
-
-
-  <footer role="contentinfo">
-  <p>The contents of this website are &copy;&nbsp;2018
-     <a href="https://www.apache.org/">Apache Software Foundation</a>
-     under the terms of the <a
-      href="https://www.apache.org/licenses/LICENSE-2.0.html">
-      Apache&nbsp;License&nbsp;v2</a>. Apache ORC and its logo are trademarks
-      of the Apache Software Foundation.</p>
-</footer>
-
-  <script>
-  var anchorForId = function (id) {
-    var anchor = document.createElement("a");
-    anchor.className = "header-link";
-    anchor.href      = "#" + id;
-    anchor.innerHTML = "<span class=\"sr-only\">Permalink</span><i class=\"fa fa-link\"></i>";
-    anchor.title = "Permalink";
-    return anchor;
-  };
-
-  var linkifyAnchors = function (level, containingElement) {
-    var headers = containingElement.getElementsByTagName("h" + level);
-    for (var h = 0; h < headers.length; h++) {
-      var header = headers[h];
-
-      if (typeof header.id !== "undefined" && header.id !== "") {
-        header.appendChild(anchorForId(header.id));
-      }
-    }
-  };
-
-  document.onreadystatechange = function () {
-    if (this.readyState === "complete") {
-      var contentBlock = document.getElementsByClassName("docs")[0] || document.getElementsByClassName("news")[0];
-      if (!contentBlock) {
-        return;
-      }
-      for (var level = 1; level <= 6; level++) {
-        linkifyAnchors(level, contentBlock);
-      }
-    }
-  };
-</script>
-
-
-</body>
-</html>

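Note on the removed File Tail page above: its description of reading a file
backwards (the final byte holds the Postscript length, and the Postscript
holds footerLength and metadataLength) can be illustrated with a little
offset arithmetic. The following is only a minimal sketch, not code from the
ORC project. The function name locate_tail and the returned dictionary are
hypothetical, and it assumes the tail is laid out as Metadata, then Footer,
then Postscript, then the one-byte Postscript length, with all lengths given
in (compressed) bytes.

```python
def locate_tail(file_length, postscript_length, footer_length, metadata_length):
    """Return [start, end) byte ranges for the Metadata, Footer, and Postscript.

    Hypothetical helper: assumes the layout Metadata, Footer, Postscript,
    followed by a single final byte that stores the Postscript length.
    """
    # The Postscript length is stored in one byte, so it must be below 256.
    if not 0 <= postscript_length < 256:
        raise ValueError("Postscript length must fit in one byte")

    # The Postscript ends one byte before the end of the file.
    postscript_end = file_length - 1
    postscript_start = postscript_end - postscript_length

    # The Footer sits directly before the Postscript, the Metadata before that.
    footer_start = postscript_start - footer_length
    metadata_start = footer_start - metadata_length
    if metadata_start < 0:
        raise ValueError("tail sections are longer than the file")

    return {
        "metadata": (metadata_start, footer_start),
        "footer": (footer_start, postscript_start),
        "postscript": (postscript_start, postscript_end),
    }
```

A reader would obtain postscript_length from the last byte of the file and
the other two lengths from the parsed Postscript before calling such a helper.
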
http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/docs/hive-config.html
----------------------------------------------------------------------
diff --git a/docs/hive-config.html b/docs/hive-config.html
index 6fe958c..bc2f68c 100644
--- a/docs/hive-config.html
+++ b/docs/hive-config.html
@@ -109,12 +109,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/index.html">Background</option>
     
   
@@ -130,14 +124,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -174,20 +160,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -221,20 +193,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
       <option value="/docs/types.html">Types</option>
     
   
@@ -261,12 +219,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/indexes.html">Indexes</option>
     
   
@@ -280,14 +232,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -324,20 +268,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -381,20 +311,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -426,25 +342,11 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/releases.html">Releases</option>
     
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -471,12 +373,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-ddl.html">Hive DDL</option>
     
   
@@ -494,14 +390,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -519,12 +407,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-config.html">Hive Configuration</option>
     
   
@@ -544,14 +426,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -586,12 +460,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapred.html">Using in MapRed</option>
     
   
@@ -601,14 +469,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -638,12 +498,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapreduce.html">Using in MapReduce</option>
     
   
@@ -651,14 +505,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -679,8 +525,6 @@
     
   
     
-  
-    
       <option value="/docs/core-java.html">Using Core Java</option>
     
   
@@ -704,18 +548,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -727,8 +559,6 @@
     
   
     
-  
-    
       <option value="/docs/core-cpp.html">Using Core C++</option>
     
   
@@ -754,18 +584,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -788,8 +606,6 @@
     
   
     
-  
-    
       <option value="/docs/cpp-tools.html">C++ Tools</option>
     
   
@@ -811,18 +627,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -848,12 +652,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/java-tools.html">Java Tools</option>
     
   
@@ -865,383 +663,18 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
     
-    <optgroup label="Format Specification">
-      
+  </select>
+</div>
 
 
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-intro.html">Introduction</option>
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/file-tail.html">File Tail</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/compression.html">Compression</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/run-length.html">Run Length Encoding</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/stripes.html">Stripes</option>
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/encodings.html">Column Encodings</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-index.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-  </select>
-</div>
-
-
-      <div class="unit four-fifths">
-        <article>
-          <h1>Hive Configuration</h1>
-          <h2 id="table-properties">Table properties</h2>
+      <div class="unit four-fifths">
+        <article>
+          <h1>Hive Configuration</h1>
+          <h2 id="table-properties">Table properties</h2>
 
 <p>Tables stored as ORC files use table properties to control their behavior. By
 using table properties, the table owner ensures that all clients store data
@@ -1460,286 +893,60 @@ with the same options.</p>
 
 
 
-
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-    <div class="section-nav">
-      <div class="left align-right">
-          
-            
-            
-            <a href="/docs/hive-ddl.html" class="prev">Back</a>
-          
-      </div>
-      <div class="right align-left">
-          
-            
-            
-            <a href="/docs/mapred.html" class="next">Next</a>
-          
-      </div>
-    </div>
-    <div class="clear"></div>
-    
-
-        </article>
-      </div>
-
-      <div class="unit one-fifth hide-on-mobiles">
-  <aside>
-    
-    <h4>Overview</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/index.html">Background</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/types.html">Types</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-      <li class=""><a href="/docs/acid.html">ACID support</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Installing</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/building.html">Building ORC</a></li>
-      
-
-
-  
+
 
   
-    
   
 
   
-    
-  
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
+    <div class="section-nav">
+      <div class="left align-right">
+          
+            
+            
+            <a href="/docs/hive-ddl.html" class="prev">Back</a>
+          
+      </div>
+      <div class="right align-left">
+          
+            
+            
+            <a href="/docs/mapred.html" class="next">Next</a>
+          
+      </div>
+    </div>
+    <div class="clear"></div>
     
-      <li class=""><a href="/docs/releases.html">Releases</a></li>
-      
 
+        </article>
+      </div>
 
-</ul>
-
+      <div class="unit one-fifth hide-on-mobiles">
+  <aside>
     
-    <h4>Using in Hive</h4>
+    <h4>Overview</h4>
     
 
 <ul>
@@ -1768,11 +975,7 @@ with the same options.</p>
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
+      <li class=""><a href="/docs/index.html">Background</a></li>
       
 
 
@@ -1786,34 +989,10 @@ with the same options.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class="current"><a href="/docs/hive-config.html">Hive Configuration</a></li>
+      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
       
 
 
-</ul>
-
-    
-    <h4>Using in MapReduce</h4>
-    
-
-<ul>
-
   
 
   
@@ -1850,7 +1029,7 @@ with the same options.</p>
     
   
     
-      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
+      <li class=""><a href="/docs/types.html">Types</a></li>
       
 
 
@@ -1880,49 +1059,7 @@ with the same options.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using ORC Core</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
+      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
       
 
 
@@ -1934,22 +1071,14 @@ with the same options.</p>
 
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      <li class=""><a href="/docs/acid.html">ACID support</a></li>
       
 
 
 </ul>
 
     
-    <h4>Tools</h4>
+    <h4>Installing</h4>
     
 
 <ul>
@@ -1966,15 +1095,7 @@ with the same options.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      <li class=""><a href="/docs/building.html">Building ORC</a></li>
       
 
 
@@ -2012,14 +1133,14 @@ with the same options.</p>
     
   
     
-      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
+      <li class=""><a href="/docs/releases.html">Releases</a></li>
       
 
 
 </ul>
 
     
-    <h4>Format Specification</h4>
+    <h4>Using in Hive</h4>
     
 
 <ul>
@@ -2046,31 +1167,7 @@ with the same options.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-intro.html">Introduction</a></li>
+      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
       
 
 
@@ -2094,31 +1191,17 @@ with the same options.</p>
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/file-tail.html">File Tail</a></li>
+      <li class="current"><a href="/docs/hive-config.html">Hive Configuration</a></li>
       
 
 
-  
-
-  
-    
-  
+</ul>
 
-  
-    
-  
     
-  
-    
-  
+    <h4>Using in MapReduce</h4>
     
-      <li class=""><a href="/docs/compression.html">Compression</a></li>
-      
 
+<ul>
 
   
 
@@ -2150,19 +1233,7 @@ with the same options.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/run-length.html">Run Length Encoding</a></li>
+      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
       
 
 
@@ -2198,13 +1269,25 @@ with the same options.</p>
     
   
     
-  
+      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Using ORC Core</h4>
     
+
+<ul>
+
+  
+
   
     
   
+
+  
     
   
     
@@ -2214,7 +1297,7 @@ with the same options.</p>
     
   
     
-      <li class=""><a href="/docs/stripes.html">Stripes</a></li>
+      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
       
 
 
@@ -2232,17 +1315,17 @@ with the same options.</p>
     
   
     
-  
-    
-  
-    
-  
+      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Tools</h4>
     
-      <li class=""><a href="/docs/encodings.html">Column Encodings</a></li>
-      
 
+<ul>
 
   
 
@@ -2262,11 +1345,17 @@ with the same options.</p>
     
   
     
+      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      
+
+
   
-    
+
   
     
   
+
+  
     
   
     
@@ -2288,7 +1377,7 @@ with the same options.</p>
     
   
     
-      <li class=""><a href="/docs/spec-index.html">Indexes</a></li>
+      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
       
 
 


[3/9] orc git commit: Pushing ORC-339 reorganize the ORC file format spec.

Posted by om...@apache.org.
http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/specification/ORCv0.html
----------------------------------------------------------------------
diff --git a/specification/ORCv0.html b/specification/ORCv0.html
new file mode 100644
index 0000000..ecf335a
--- /dev/null
+++ b/specification/ORCv0.html
@@ -0,0 +1,1260 @@
+<!DOCTYPE HTML>
+<html lang="en-US">
+<head>
+  <meta charset="UTF-8">
+  <title>ORC Specification v0</title>
+  <meta name="viewport" content="width=device-width,initial-scale=1">
+  <meta name="generator" content="Jekyll v2.4.0">
+  <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
+  <link rel="stylesheet" href="/css/screen.css">
+  <link rel="icon" type="image/x-icon" href="/favicon.ico">
+  <!--[if lt IE 9]>
+  <script src="/js/html5shiv.min.js"></script>
+  <script src="/js/respond.min.js"></script>
+  <![endif]-->
+</head>
+
+
+<body class="wrap">
+  <header role="banner">
+  <nav class="mobile-nav show-on-mobiles">
+    <ul>
+  <li class="">
+    <a href="/">Home</a>
+  </li>
+  <li class="">
+    <a href="/docs/"><span class="show-on-mobiles">Docs</span>
+                     <span class="hide-on-mobiles">Documentation</span></a>
+  </li>
+  <li class="">
+    <a href="/talks/">Talks</a>
+  </li>
+  <li class="">
+    <a href="/news/">News</a>
+  </li>
+  <li class="">
+    <a href="/help/">Help</a>
+  </li>
+  <li class="">
+    <a href="/develop/">Develop</a>
+  </li>
+</ul>
+
+  </nav>
+  <div class="grid">
+    <div class="unit one-third center-on-mobiles">
+      <h1>
+        <a href="/">
+          <span class="sr-only">Apache ORC</span>
+          <img src="/img/logo.png" width="249" height="101" alt="ORC Logo">
+        </a>
+      </h1>
+    </div>
+    <nav class="main-nav unit two-thirds hide-on-mobiles">
+      <ul>
+  <li class="">
+    <a href="/">Home</a>
+  </li>
+  <li class="">
+    <a href="/docs/"><span class="show-on-mobiles">Docs</span>
+                     <span class="hide-on-mobiles">Documentation</span></a>
+  </li>
+  <li class="">
+    <a href="/talks/">Talks</a>
+  </li>
+  <li class="">
+    <a href="/news/">News</a>
+  </li>
+  <li class="">
+    <a href="/help/">Help</a>
+  </li>
+  <li class="">
+    <a href="/develop/">Develop</a>
+  </li>
+</ul>
+
+    </nav>
+  </div>
+</header>
+
+
+  <section class="standalone">
+  <div class="grid">
+
+    <div class="unit whole">
+      <article>
+        <h1>ORC Specification v0</h1>
+        <p>This version of the file format was originally released as part of
+Hive 0.11.</p>
+
+<h1 id="motivation">Motivation</h1>
+
+<p>Hive’s RCFile was the standard format for storing tabular data in
+Hadoop for several years. However, RCFile has limitations because it
+treats each column as a binary blob without semantics. In Hive 0.11 we
+added a new file format named Optimized Row Columnar (ORC) file that
+uses and retains the type information from the table definition. ORC
+uses type specific readers and writers that provide light weight
+compression techniques such as dictionary encoding, bit packing, delta
+encoding, and run length encoding – resulting in dramatically smaller
+files. Additionally, ORC can apply generic compression using zlib, or
+Snappy on top of the lightweight compression for even smaller
+files. However, storage savings are only part of the gain. ORC
+supports projection, which selects subsets of the columns for reading,
+so that queries reading only one column read only the required
+bytes. Furthermore, ORC files include light weight indexes that
+include the minimum and maximum values for each column in each set of
+10,000 rows and the entire file. Using pushdown filters from Hive, the
+file reader can skip entire sets of rows that aren’t important for
+this query.</p>
+
+<p><img src="/img/OrcFileLayout.png" alt="ORC file structure" /></p>
+
+<h1 id="file-tail">File Tail</h1>
+
+<p>Since HDFS does not support changing the data in a file after it is
+written, ORC stores the top level index at the end of the file. The
+overall structure of the file is given in the figure above.  The
+file’s tail consists of three parts: the file metadata, the file footer, and
+the postscript.</p>
+
+<p>The metadata for ORC is stored using
+<a href="https://s.apache.org/protobuf_encoding">Protocol Buffers</a>, which provides
+the ability to add new fields without breaking readers. This document
+incorporates the Protobuf definition from the
+<a href="https://s.apache.org/orc_proto">ORC source code</a> and the
+reader is encouraged to review the Protobuf encoding if they need to
+understand the byte-level encoding.</p>
+
+<h2 id="postscript">Postscript</h2>
+
+<p>The Postscript section provides the necessary information to interpret
+the rest of the file including the length of the file’s Footer and
+Metadata sections, the version of the file, and the kind of general
+compression used (e.g., none, zlib, or snappy). The Postscript is never
+compressed and ends one byte before the end of the file. The version
+stored in the Postscript is the lowest version of Hive that is
+guaranteed to be able to read the file, and it is stored as a sequence of
+the major and minor version. This version is stored as [0, 11].</p>
+
+<p>The process of reading an ORC file works backwards through the
+file. Rather than making multiple short reads, the ORC reader reads
+the last 16k bytes of the file with the hope that it will contain both
+the Footer and Postscript sections. The final byte of the file
+contains the serialized length of the Postscript, which must be less
+than 256 bytes. Once the Postscript is parsed, the compressed
+serialized length of the Footer is known and it can be decompressed
+and parsed.</p>
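+
+<p>The first step of that backwards read can be sketched as follows. This is
+only an illustration, not the ORC reader's code: the function name
+<code>locate_postscript</code> and the synthetic buffer are assumptions, and
+parsing the Postscript bytes with Protocol Buffers is left out.</p>
+

```python
from typing import Tuple


def locate_postscript(tail: bytes) -> Tuple[int, int]:
    """Return (start_offset, length) of the Postscript inside `tail`.

    `tail` is assumed to hold the final bytes of an ORC file (for example
    the last 16k), ending at the very last byte of the file.
    """
    if not tail:
        raise ValueError("empty buffer")
    # The final byte of the file is the serialized Postscript length.
    ps_length = tail[-1]
    # The Postscript ends one byte before the end of the file.
    ps_start = len(tail) - 1 - ps_length
    if ps_start < 0:
        raise ValueError("buffer too short to contain the Postscript")
    return ps_start, ps_length


# Synthetic stand-in for a file tail: filler, a fake 4-byte Postscript,
# then the length byte. Real Postscript bytes would be a protobuf message.
fake_tail = b"\x00" * 10 + b"PSPS" + bytes([4])
start, length = locate_postscript(fake_tail)
postscript_bytes = fake_tail[start:start + length]
```

+<p>Once the Postscript bytes are isolated and parsed, the footer length they
+carry tells the reader how far back the Footer starts.</p>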
+
+<p><code>message PostScript {
+ // the length of the footer section in bytes
+ optional uint64 footerLength = 1;
+ // the kind of generic compression used
+ optional CompressionKind compression = 2;
+ // the maximum size of each compression chunk
+ optional uint64 compressionBlockSize = 3;
+ // the version of the writer
+ repeated uint32 version = 4 [packed = true];
+ // the length of the metadata section in bytes
+ optional uint64 metadataLength = 5;
+ // the fixed string "ORC"
+ optional string magic = 8000;
+}
+</code></p>
+
+<p><code>enum CompressionKind {
+ NONE = 0;
+ ZLIB = 1;
+ SNAPPY = 2;
+ LZO = 3;
+ LZ4 = 4;
+ ZSTD = 5;
+}
+</code></p>
+
+<h2 id="footer">Footer</h2>
+
+<p>The Footer section contains the layout of the body of the file, the
+type schema information, the number of rows, and the statistics about
+each of the columns.</p>
+
+<p>The file is broken into three parts: Header, Body, and Tail. The
+Header consists of the bytes “ORC” to support tools that want to
+scan the front of the file to determine the type of the file. The Body
+contains the rows and indexes, and the Tail gives the file level
+information as described in this section.</p>
+
+<p><code>message Footer {
+ // the length of the file header in bytes (always 3)
+ optional uint64 headerLength = 1;
+ // the length of the file header and body in bytes
+ optional uint64 contentLength = 2;
+ // the information about the stripes
+ repeated StripeInformation stripes = 3;
+ // the schema information
+ repeated Type types = 4;
+ // the user metadata that was added
+ repeated UserMetadataItem metadata = 5;
+ // the total number of rows in the file
+ optional uint64 numberOfRows = 6;
+ // the statistics of each column across the file
+ repeated ColumnStatistics statistics = 7;
+ // the maximum number of rows in each index entry
+ optional uint32 rowIndexStride = 8;
+}
+</code></p>
+
+<h3 id="stripe-information">Stripe Information</h3>
+
+<p>The body of the file is divided into stripes. Each stripe is self
+contained and may be read using only its own bytes combined with the
+file’s Footer and Postscript. Each stripe contains only entire rows so
+that rows never straddle stripe boundaries. Stripes have three
+sections: a set of indexes for the rows within the stripe, the data
+itself, and a stripe footer. Both the indexes and the data sections
+are divided by columns so that only the data for the required columns
+needs to be read.</p>
+
+<p><code>message StripeInformation {
+ // the start of the stripe within the file
+ optional uint64 offset = 1;
+ // the length of the indexes in bytes
+ optional uint64 indexLength = 2;
+ // the length of the data in bytes
+ optional uint64 dataLength = 3;
+ // the length of the footer in bytes
+ optional uint64 footerLength = 4;
+ // the number of rows in the stripe
+ optional uint64 numberOfRows = 5;
+}
+</code></p>
+
+<h3 id="type-information">Type Information</h3>
+
+<p>All of the rows in an ORC file must have the same schema. Logically
+the schema is expressed as a tree as in the figure below, where
+the compound types have subcolumns under them.</p>
+
+<p><img src="/img/TreeWriters.png" alt="ORC column structure" /></p>
+
+<p>The equivalent Hive DDL would be:</p>
+
+<p><code>create table Foobar (
+ myInt int,
+ myMap map&lt;string,
+ struct&lt;myString : string,
+ myDouble: double&gt;&gt;,
+ myTime timestamp
+);
+</code></p>
+
+<p>The type tree is flattened into a list via a pre-order traversal
+where each type is assigned the next id. Clearly the root of the type
+tree is always type id 0. Compound types have a field named subtypes
+that contains the list of their children’s type ids.</p>
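+
+<p>As a rough illustration of that pre-order numbering, the sketch below
+flattens the Foobar schema from the DDL above. The nested-tuple
+representation and the <code>flatten</code> helper are assumptions made for
+this example; they are not part of the ORC code base.</p>
+

```python
def flatten(schema):
    """Flatten a (kind, children) tree into a list indexed by type id.

    Ids are handed out in pre-order, so the root always gets id 0.
    """
    flat = []

    def visit(node):
        kind, children = node
        type_id = len(flat)          # next unused id
        entry = {"kind": kind, "subtypes": []}
        flat.append(entry)           # parent is recorded before its children
        for child in children:
            entry["subtypes"].append(visit(child))
        return type_id

    visit(schema)
    return flat


# The Foobar table: the root struct wraps the three top-level columns.
foobar = ("STRUCT", [
    ("INT", []),                                  # myInt
    ("MAP", [
        ("STRING", []),                           # map key
        ("STRUCT", [("STRING", []), ("DOUBLE", [])]),  # map value
    ]),
    ("TIMESTAMP", []),                            # myTime
])

types = flatten(foobar)
```

+<p>With this input the root struct lists subtypes 1, 2, and 7, the map is
+type 2 with subtypes 3 and 4, and the inner struct is type 4 with subtypes 5
+and 6.</p>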
+
+<p><code>message Type {
+ enum Kind {
+ BOOLEAN = 0;
+ BYTE = 1;
+ SHORT = 2;
+ INT = 3;
+ LONG = 4;
+ FLOAT = 5;
+ DOUBLE = 6;
+ STRING = 7;
+ BINARY = 8;
+ TIMESTAMP = 9;
+ LIST = 10;
+ MAP = 11;
+ STRUCT = 12;
+ UNION = 13;
+ DECIMAL = 14;
+ DATE = 15;
+ VARCHAR = 16;
+ CHAR = 17;
+ }
+ // the kind of this type
+ required Kind kind = 1;
+ // the type ids of any subcolumns for list, map, struct, or union
+ repeated uint32 subtypes = 2 [packed=true];
+ // the list of field names for struct
+ repeated string fieldNames = 3;
+ // the maximum length of the type for varchar or char in UTF-8 characters
+ optional uint32 maximumLength = 4;
+ // the precision and scale for decimal
+ optional uint32 precision = 5;
+ optional uint32 scale = 6;
+}
+</code></p>
+
+<h3 id="column-statistics">Column Statistics</h3>
+
+<p>The goal of the column statistics is that for each column, the writer
+records the count and, depending on the type, other useful fields. For
+most of the primitive types, it records the minimum and maximum
+values; and for numeric types it additionally stores the sum.
+From Hive 1.1.0 onwards, the column statistics will also record if
+there are any null values within the row group by setting the hasNull flag.
+The hasNull flag is used by ORC’s predicate pushdown to better answer
+‘IS NULL’ queries.</p>
+
+<p><code>message ColumnStatistics {
+ // the number of values
+ optional uint64 numberOfValues = 1;
+ // At most one of these has a value for any column
+ optional IntegerStatistics intStatistics = 2;
+ optional DoubleStatistics doubleStatistics = 3;
+ optional StringStatistics stringStatistics = 4;
+ optional BucketStatistics bucketStatistics = 5;
+ optional DecimalStatistics decimalStatistics = 6;
+ optional DateStatistics dateStatistics = 7;
+ optional BinaryStatistics binaryStatistics = 8;
+ optional TimestampStatistics timestampStatistics = 9;
+ optional bool hasNull = 10;
+}
+</code></p>
+
+<p>For integer types (tinyint, smallint, int, bigint), the column
+statistics includes the minimum, maximum, and sum. If the sum
+overflows long at any point during the calculation, no sum is
+recorded.</p>
+
+<p><code>message IntegerStatistics {
+ optional sint64 minimum = 1;
+ optional sint64 maximum = 2;
+ optional sint64 sum = 3;
+}
+</code></p>
+
+<p>For floating point types (float, double), the column statistics
+include the minimum, maximum, and sum. If the sum overflows a double,
+no sum is recorded.</p>
+
+<p><code>message DoubleStatistics {
+ optional double minimum = 1;
+ optional double maximum = 2;
+ optional double sum = 3;
+}
+</code></p>
+
+<p>For strings, the minimum value, maximum value, and the sum of the
+lengths of the values are recorded.</p>
+
+<p><code>message StringStatistics {
+ optional string minimum = 1;
+ optional string maximum = 2;
+ // sum will store the total length of all strings
+ optional sint64 sum = 3;
+}
+</code></p>
+
+<p>For booleans, the statistics include the count of false and true values.</p>
+
+<p><code>message BucketStatistics {
+ repeated uint64 count = 1 [packed=true];
+}
+</code></p>
+
+<p>For decimals, the minimum, maximum, and sum are stored.</p>
+
+<p><code>message DecimalStatistics {
+ optional string minimum = 1;
+ optional string maximum = 2;
+ optional string sum = 3;
+}
+</code></p>
+
+<p>Date columns record the minimum and maximum values as the number of
+days since the UNIX epoch (1/1/1970).</p>
+
+<p><code>message DateStatistics {
+ // min,max values saved as days since epoch
+ optional sint32 minimum = 1;
+ optional sint32 maximum = 2;
+}
+</code></p>
+
+<p>Timestamp columns record the minimum and maximum values as the number of
+milliseconds since the UNIX epoch (1/1/1970).</p>
+
+<p><code>message TimestampStatistics {
+ // min,max values saved as milliseconds since epoch
+ optional sint64 minimum = 1;
+ optional sint64 maximum = 2;
+}
+</code></p>
+
+<p>Binary columns store the aggregate number of bytes across all of the values.</p>
+
+<p><code>message BinaryStatistics {
+ // sum will store the total binary blob length
+ optional sint64 sum = 1;
+}
+</code></p>
+
+<h3 id="user-metadata">User Metadata</h3>
+
+<p>The user can add arbitrary key/value pairs to an ORC file as it is
+written. The contents of the keys and values are completely
+application defined, but the key is a string and the value is
+binary. Care should be taken by applications to make sure that their
+keys are unique and in general should be prefixed with an organization
+code.</p>
+
+<p><code>message UserMetadataItem {
+ // the user defined key
+ required string name = 1;
+ // the user defined binary value
+ required bytes value = 2;
+}
+</code></p>
+
+<h3 id="file-metadata">File Metadata</h3>
+
+<p>The file Metadata section contains column statistics at the stripe
+level granularity. These statistics enable input split elimination
+based on the predicate push-down evaluated per stripe.</p>
+
+<p><code>message StripeStatistics {
+ repeated ColumnStatistics colStats = 1;
+}
+</code></p>
+
+<p><code>message Metadata {
+ repeated StripeStatistics stripeStats = 1;
+}
+</code></p>
+
+<h1 id="compression">Compression</h1>
+
+<p>If the ORC file writer selects a generic compression codec (zlib or
+snappy), every part of the ORC file except for the Postscript is
+compressed with that codec. However, one of the requirements for ORC
+is that the reader be able to skip over compressed bytes without
+decompressing the entire stream. To manage this, ORC writes compressed
+streams in chunks with headers as in the figure below.
+To handle incompressible data, if the compressed data is larger than
+the original, the original is stored and the isOriginal flag is
+set. Each header is 3 bytes long with (compressedLength * 2 +
+isOriginal) stored as a little endian value. For example, the header
+for a chunk that compressed to 100,000 bytes would be [0x40, 0x0d,
+0x03]. The header for 5 bytes that did not compress would be [0x0b,
+0x00, 0x00]. Each compression chunk is compressed independently so
+that as long as a decompressor starts at the top of a header, it can
+start decompressing without the previous bytes.</p>
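+
+<p>The header arithmetic can be checked with a small sketch. The function
+names below are made up for illustration; only the 3-byte little endian
+layout of (compressedLength * 2 + isOriginal) comes from the text above.</p>
+

```python
from typing import Tuple


def encode_chunk_header(length: int, is_original: bool) -> bytes:
    """Pack (length * 2 + isOriginal) into 3 little endian bytes."""
    value = length * 2 + (1 if is_original else 0)
    if not 0 <= value < (1 << 24):
        raise ValueError("chunk length does not fit in a 3-byte header")
    return value.to_bytes(3, "little")


def decode_chunk_header(header: bytes) -> Tuple[int, bool]:
    """Return (chunk_length, is_original) from a 3-byte header."""
    value = int.from_bytes(header[:3], "little")
    return value >> 1, bool(value & 1)


compressed_header = encode_chunk_header(100_000, False)
original_header = encode_chunk_header(5, True)
```

+<p>A reader that only wants to skip a chunk can decode the header and advance
+by the returned length without touching the decompressor.</p>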
+
+<p><img src="/img/CompressionStream.png" alt="compression streams" /></p>
+
+<p>The default compression chunk size is 256K, but writers can choose
+their own value. Larger chunks lead to better compression, but require
+more memory. The chunk size is recorded in the Postscript so that
+readers can allocate appropriately sized buffers. Readers are
+guaranteed that no chunk will expand to more than the compression chunk
+size.</p>
+
+<p>ORC files without generic compression write each stream directly
+with no headers.</p>
+
+<h1 id="run-length-encoding">Run Length Encoding</h1>
+
+<h2 id="base-128-varint">Base 128 Varint</h2>
+
+<p>Variable width integer encodings take advantage of the fact that most
+numbers are small and that having smaller encodings for small numbers
+shrinks the overall size of the data. ORC uses the varint format from
+Protocol Buffers, which writes data in little endian format using the
+low 7 bits of each byte. The high bit in each byte is set if the
+number continues into the next byte.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Unsigned Original</th>
+      <th style="text-align: left">Serialized</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">0</td>
+      <td style="text-align: left">0x00</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">1</td>
+      <td style="text-align: left">0x01</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">127</td>
+      <td style="text-align: left">0x7f</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">128</td>
+      <td style="text-align: left">0x80, 0x01</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">129</td>
+      <td style="text-align: left">0x81, 0x01</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">16,383</td>
+      <td style="text-align: left">0xff, 0x7f</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">16,384</td>
+      <td style="text-align: left">0x80, 0x80, 0x01</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">16,385</td>
+      <td style="text-align: left">0x81, 0x80, 0x01</td>
+    </tr>
+  </tbody>
+</table>
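+
+<p>A minimal sketch of the unsigned encoding, with hypothetical helper names
+<code>encode_varint</code> and <code>decode_varint</code>, reproduces the rows
+of the table above:</p>
+

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base 128 varint."""
    if value < 0:
        raise ValueError("varints encode unsigned values only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)   # high bit set: more bytes follow
        else:
            out.append(low7)          # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at `pos`; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```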
+
+<p>For signed integer types, the number is converted into an unsigned
+number using a zigzag encoding. Zigzag encoding moves the sign bit to
+the least significant bit using the expression (val &lt;&lt; 1) ^ (val &gt;&gt;
+63) and derives its name from the fact that positive and negative
+numbers alternate once encoded. The unsigned number is then serialized
+as above.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Signed Original</th>
+      <th style="text-align: left">Unsigned</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">0</td>
+      <td style="text-align: left">0</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">-1</td>
+      <td style="text-align: left">1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">1</td>
+      <td style="text-align: left">2</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">-2</td>
+      <td style="text-align: left">3</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">2</td>
+      <td style="text-align: left">4</td>
+    </tr>
+  </tbody>
+</table>
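+
+<p>The mapping in the table can be reproduced with the sketch below. It
+assumes 64-bit signed inputs; Python integers are unbounded, so the range
+check stands in for the fixed width, and the function names are illustrative
+only.</p>
+

```python
def zigzag_encode(val: int) -> int:
    """Map a signed 64-bit integer to an unsigned one (sign bit moves to bit 0)."""
    if not -(1 << 63) <= val < (1 << 63):
        raise ValueError("value outside the signed 64-bit range")
    # The right shift is arithmetic, so it yields 0 or -1 (all ones).
    return (val << 1) ^ (val >> 63)


def zigzag_decode(encoded: int) -> int:
    """Inverse of zigzag_encode."""
    return (encoded >> 1) ^ -(encoded & 1)


table = [zigzag_encode(v) for v in (0, -1, 1, -2, 2)]
```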
+
+<h2 id="byte-run-length-encoding">Byte Run Length Encoding</h2>
+
+<p>For byte streams, ORC uses a very light weight encoding of identical
+values.</p>
+
+<ul>
+  <li>Run - a sequence of at least 3 identical values</li>
+  <li>Literals - a sequence of non-identical values</li>
+</ul>
+
+<p>The first byte of each group of values is a header that determines
+whether it is a run (value between 0 and 127) or literal list (value
+between -128 and -1). For runs, the control byte is the length of the
+run minus the length of the minimal run (3) and the control byte for
+literal lists is the negative length of the list. For example, a
+hundred 0’s is encoded as [0x61, 0x00] and the sequence 0x44, 0x45
+would be encoded as [0xfe, 0x44, 0x45]. The next group can choose
+either of the encodings.</p>
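+
+<p>A decoder for this scheme might look like the following sketch. The name
+<code>decode_byte_rle</code> is hypothetical, and the function assumes the
+input is a complete, well-formed stream.</p>
+

```python
def decode_byte_rle(data: bytes) -> bytes:
    """Expand a byte run length encoded stream into the original bytes."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        control = data[pos]
        pos += 1
        if control < 0x80:
            # Run: control byte is (run length - 3), followed by the value.
            run_length = control + 3
            out.extend(data[pos:pos + 1] * run_length)
            pos += 1
        else:
            # Literals: control byte is the negative count as a signed byte.
            count = 0x100 - control
            out.extend(data[pos:pos + count])
            pos += count
    return bytes(out)


hundred_zeros = decode_byte_rle(bytes([0x61, 0x00]))
two_literals = decode_byte_rle(bytes([0xFE, 0x44, 0x45]))
```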
+
+<h2 id="boolean-run-length-encoding">Boolean Run Length Encoding</h2>
+
+<p>For encoding boolean types, the bits are put in the bytes from most
+significant to least significant. The bytes are encoded using byte run
+length encoding as described in the previous section. For example,
+the byte sequence [0xff, 0x80] would be one true followed by
+seven false values.</p>
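+
+<p>To make the bit order concrete, here is a small sketch that decodes the
+byte run length layer and then unpacks bits from most significant to least
+significant. Both function names are assumptions for this example, and the
+caller is assumed to know how many values to expect.</p>
+

```python
from typing import List


def _decode_byte_rle(data: bytes) -> bytes:
    """Expand byte RLE: runs (control 0..127) and literal lists (128..255)."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        control = data[pos]
        pos += 1
        if control < 0x80:
            out.extend(data[pos:pos + 1] * (control + 3))
            pos += 1
        else:
            count = 0x100 - control
            out.extend(data[pos:pos + count])
            pos += count
    return bytes(out)


def decode_boolean_rle(data: bytes, count: int) -> List[bool]:
    """Return the first `count` booleans, reading each byte MSB first."""
    values = []
    for byte in _decode_byte_rle(data):
        for bit in range(7, -1, -1):
            values.append(bool((byte >> bit) & 1))
    return values[:count]


flags = decode_boolean_rle(bytes([0xFF, 0x80]), 8)
```

+<p>Here 0xff is a literal-list header of length one, and the single payload
+byte 0x80 carries the eight boolean values.</p>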
+
+<h2 id="integer-run-length-encoding-version-1">Integer Run Length Encoding, version 1</h2>
+
+<p>ORC v0 files use Run Length Encoding version 1 (RLEv1),
+which provides a lightweight compression of signed or unsigned integer
+sequences. RLEv1 has two sub-encodings:</p>
+
+<ul>
+  <li>Run - a sequence of values that differ by a small fixed delta</li>
+  <li>Literals - a sequence of varint encoded values</li>
+</ul>
+
+<p>Runs start with an initial byte of 0x00 to 0x7f, which encodes the
+length of the run - 3. A second byte provides the fixed delta in the
+range of -128 to 127. Finally, the first value of the run is encoded
+as a base 128 varint.</p>
+
+<p>For example, if the sequence is 100 instances of 7 the encoding would
+start with 100 - 3, followed by a delta of 0, and a varint of 7 for
+an encoding of [0x61, 0x00, 0x07]. To encode the sequence of numbers
+running from 100 to 1, the first byte is 100 - 3, the delta is -1,
+and the varint is 100 for an encoding of [0x61, 0xff, 0x64].</p>
+
+<p>Literals start with an initial byte of 0x80 to 0xff, which corresponds
+to the negative of number of literals in the sequence. Following the
+header byte, the list of N varints is encoded. Thus, if there are
+no runs, the overhead is 1 byte for each 128 integers. The first 5
+prime numbers [2, 3, 5, 7, 11] would be encoded as [0xfb, 0x02, 0x03,
+0x05, 0x07, 0x0b].</p>
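+
+<p>The two sub-encodings can be exercised with a short decoder sketch. It
+handles the unsigned case only (signed streams would additionally apply
+zigzag decoding to each varint), and the names
+<code>decode_int_rle_v1</code> and <code>_read_varint</code> are
+illustrative assumptions rather than names from the ORC sources.</p>
+

```python
from typing import List, Tuple


def _read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Read one base 128 varint; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_int_rle_v1(data: bytes) -> List[int]:
    """Decode an unsigned RLEv1 stream into a list of integers."""
    values = []
    pos = 0
    while pos < len(data):
        control = data[pos]
        pos += 1
        if control < 0x80:
            # Run: length - 3, then a signed one-byte delta, then the base.
            run_length = control + 3
            delta = data[pos]
            pos += 1
            if delta >= 0x80:
                delta -= 0x100
            base, pos = _read_varint(data, pos)
            values.extend(base + i * delta for i in range(run_length))
        else:
            # Literals: negative count as a signed byte, then that many varints.
            for _ in range(0x100 - control):
                value, pos = _read_varint(data, pos)
                values.append(value)
    return values


sevens = decode_int_rle_v1(bytes([0x61, 0x00, 0x07]))
countdown = decode_int_rle_v1(bytes([0x61, 0xFF, 0x64]))
primes = decode_int_rle_v1(bytes([0xFB, 0x02, 0x03, 0x05, 0x07, 0x0B]))
```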
+
+<h1 id="stripes">Stripes</h1>
+
+<p>The body of ORC files consists of a series of stripes. Stripes are
+large (typically ~200MB) and independent of each other and are often
+processed by different tasks. The defining characteristic for columnar
+storage formats is that the data for each column is stored separately
+and that reading data out of the file should be proportional to the
+number of columns read.</p>
+
+<p>In ORC files, each column is stored in several streams that are stored
+next to each other in the file. For example, an integer column is
+represented as two streams: PRESENT, which uses one bit per
+value to record whether the value is non-null, and DATA, which records the
+non-null values. If all of a column’s values in a stripe are non-null,
+the PRESENT stream is omitted from the stripe. For binary data, ORC
+uses three streams: PRESENT, DATA, and LENGTH, which stores the length
+of each value. The details of each type will be presented in the
+following subsections.</p>
+
+<h2 id="stripe-footer">Stripe Footer</h2>
+
+<p>The stripe footer contains the encoding of each column and the
+directory of the streams including their location.</p>
+
+<p><code>message StripeFooter {
+ // the location of each stream
+ repeated Stream streams = 1;
+ // the encoding of each column
+ repeated ColumnEncoding columns = 2;
+}
+</code></p>
+
+<p>To describe each stream, ORC stores the kind of stream, the column id,
+and the stream’s size in bytes. The details of what is stored in each stream
+depend on the type and encoding of the column.</p>
+
+<p><code>message Stream {
+ enum Kind {
+ // boolean stream of whether the next value is non-null
+ PRESENT = 0;
+ // the primary data stream
+ DATA = 1;
+ // the length of each value for variable length data
+ LENGTH = 2;
+ // the dictionary blob
+ DICTIONARY_DATA = 3;
+ // deprecated prior to Hive 0.11
+ // It was used to store the number of instances of each value in the
+ // dictionary
+ DICTIONARY_COUNT = 4;
+ // a secondary data stream
+ SECONDARY = 5;
+ // the index for seeking to particular row groups
+ ROW_INDEX = 6;
+ }
+ required Kind kind = 1;
+ // the column id
+ optional uint32 column = 2;
+ // the number of bytes in the file
+ optional uint64 length = 3;
+}
+</code></p>
+
+<p>Depending on the column’s type, several options for encoding are possible. The
+encodings are divided into direct or dictionary-based categories and
+further refined as to whether they use RLE v1 or v2.</p>
+
+<p><code>message ColumnEncoding {
+ enum Kind {
+ // the encoding is mapped directly to the stream using RLE v1
+ DIRECT = 0;
+ // the encoding uses a dictionary of unique values using RLE v1
+ DICTIONARY = 1;
+ // the encoding is direct using RLE v2
+ }
+ required Kind kind = 1;
+ // for dictionary encodings, record the size of the dictionary
+ optional uint32 dictionarySize = 2;
+}
+</code></p>
+
+<h1 id="column-encodings">Column Encodings</h1>
+
+<h2 id="smallint-int-and-bigint-columns">SmallInt, Int, and BigInt Columns</h2>
+
+<p>All of the 16, 32, and 64 bit integer column types use the same set of
+potential encodings, which is basically whether they use RLE v1 or
+v2. If the PRESENT stream is not included, all of the values are
+present. For values that have false bits in the present stream, no
+values are included in the data stream.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Signed Integer RLE v1</td>
+    </tr>
+  </tbody>
+</table>
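+
+<p>As a rough illustration of the PRESENT/DATA relationship described
+above, the following sketch splits a column into presence bits and the
+non-null values and then rebuilds it. It deliberately ignores run length
+encoding, and the function names <code>split_nulls</code> and
+<code>merge_nulls</code> are hypothetical helpers, not part of any ORC
+API.</p>

```python
from typing import List, Optional, Tuple


def split_nulls(values: List[Optional[int]]) -> Tuple[List[bool], List[int]]:
    """Split a column into PRESENT bits and the non-null DATA values."""
    present = [v is not None for v in values]
    data = [v for v in values if v is not None]
    return present, data


def merge_nulls(present: List[bool], data: List[int]) -> List[Optional[int]]:
    """Rebuild the column, consuming one DATA value per true PRESENT bit."""
    remaining = iter(data)
    return [next(remaining) if bit else None for bit in present]
```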
+
+<h2 id="float-and-double-columns">Float and Double Columns</h2>
+
+<p>Floating point types are stored using IEEE 754 floating point bit
+layout. Float columns use 4 bytes per value and double columns use 8
+bytes.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">IEEE 754 floating point representation</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="string-char-and-varchar-columns">String, Char, and VarChar Columns</h2>
+
+<p>String, char, and varchar columns may be encoded either using a
+dictionary encoding or a direct encoding. A direct encoding should be
+preferred when there are many distinct values. In all of the
+encodings, the PRESENT stream encodes whether the value is null. The
+Java ORC writer automatically picks the encoding after the first row
+group (10,000 rows).</p>
+
+<p>For direct encoding the UTF-8 bytes are saved in the DATA stream and
+the length of each value is written into the LENGTH stream. In direct
+encoding, if the values were [“Nevada”, “California”], the DATA
+would be “NevadaCalifornia” and the LENGTH would be [6, 10].</p>
+
+<p>For dictionary encodings the dictionary is sorted and UTF-8 bytes of
+each unique value are placed into DICTIONARY_DATA. The length of each
+item in the dictionary is put into the LENGTH stream. The DATA stream
+consists of the sequence of references to the dictionary elements.</p>
+
+<p>In dictionary encoding, if the values were [“Nevada”,
+“California”, “Nevada”, “California”, “Florida”], the
+DICTIONARY_DATA would be “CaliforniaFloridaNevada” and LENGTH would
+be [10, 7, 6]. The DATA would be [2, 0, 2, 0, 1].</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">String contents</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left">DICTIONARY</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DICTIONARY_DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">String contents</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v1</td>
+    </tr>
+  </tbody>
+</table>
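+
+<p>The two string layouts can be sketched as follows. This is a minimal
+model that produces the logical stream contents only (no RLE and no
+compression), and it assumes the dictionary is sorted by UTF-8 bytes. The
+function names are hypothetical.</p>

```python
from typing import List, Tuple


def encode_direct(values: List[str]) -> Tuple[bytes, List[int]]:
    """Return the logical (DATA, LENGTH) contents for direct encoding."""
    encoded = [v.encode("utf-8") for v in values]
    return b"".join(encoded), [len(e) for e in encoded]


def encode_dictionary(values: List[str]) -> Tuple[bytes, List[int], List[int]]:
    """Return the logical (DICTIONARY_DATA, LENGTH, DATA) contents."""
    encoded = [v.encode("utf-8") for v in values]
    dictionary = sorted(set(encoded))  # assumption: sorted by raw bytes
    index = {entry: i for i, entry in enumerate(dictionary)}
    dictionary_data = b"".join(dictionary)
    lengths = [len(entry) for entry in dictionary]
    data = [index[e] for e in encoded]  # one dictionary reference per row
    return dictionary_data, lengths, data
```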
+
+<h2 id="boolean-columns">Boolean Columns</h2>
+
+<p>Boolean columns are rare, but have a simple encoding.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="tinyint-columns">TinyInt Columns</h2>
+
+<p>TinyInt (byte) columns use byte run length encoding.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Byte RLE</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="binary-columns">Binary Columns</h2>
+
+<p>Binary data is encoded with a PRESENT stream, a DATA stream that records
+the contents, and a LENGTH stream that records the number of bytes per
+value.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">String contents</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v1</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="decimal-columns">Decimal Columns</h2>
+
+<p>Decimal was introduced in Hive 0.11 with infinite precision (the total
+number of digits). In Hive 0.13, the definition was changed to limit
+the precision to a maximum of 38 digits, which conveniently uses 127
+bits plus a sign bit. The current encoding of decimal columns stores
+the integer representation of the value as an unbounded length zigzag
+encoded base 128 varint. The scale is stored in the SECONDARY stream
+as a signed integer.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unbounded base 128 varints</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">SECONDARY</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Signed Integer RLE v1</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="date-columns">Date Columns</h2>
+
+<p>Date data is encoded with a PRESENT stream and a DATA stream that records
+the number of days after January 1, 1970 in UTC.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Signed Integer RLE v1</td>
+    </tr>
+  </tbody>
+</table>
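+
+<p>The day count stored in the DATA stream can be computed with ordinary
+calendar arithmetic. The sketch below assumes a proleptic Gregorian
+calendar, as provided by Python's <code>datetime</code> module; the
+helper names are hypothetical. Dates before the epoch yield negative
+values, which is consistent with the signed integer encoding.</p>

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def date_to_days(d: date) -> int:
    """Number of days after January 1, 1970 (negative before the epoch)."""
    return (d - EPOCH).days


def days_to_date(days: int) -> date:
    """Invert date_to_days()."""
    return EPOCH + timedelta(days=days)
```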
+
+<h2 id="timestamp-columns">Timestamp Columns</h2>
+
+<p>Timestamp records times down to nanoseconds as a PRESENT stream that
+records non-null values, a DATA stream that records the number of
+seconds after 1 January 2015, and a SECONDARY stream that records the
+number of nanoseconds.</p>
+
+<p>Because the number of nanoseconds often has a large number of trailing
+zeros, the number has trailing decimal zero digits removed and the
+last three bits are used to record how many zeros were removed, stored
+as one less than that count and only when at least two zeros are
+removed. Thus 1000 nanoseconds would be serialized as 0x0a and 100000
+would be serialized as 0x0c.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Signed Integer RLE v1</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">SECONDARY</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v1</td>
+    </tr>
+  </tbody>
+</table>
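+
+<p>The nanosecond packing can be sketched as below. This follows the
+behavior of the reference Java writer as far as I understand it: the low
+three bits hold one less than the number of removed zeros, zeros are
+only stripped when there are at least two of them, and a value of 0 in
+those bits means nothing was removed. The function names are
+hypothetical.</p>

```python
def format_nanos(nanos: int) -> int:
    """Pack a nanosecond count (0..999,999,999) into its serialized form."""
    if nanos == 0:
        return 0
    if nanos % 100 != 0:
        # Fewer than two trailing zeros: store the value unchanged.
        return nanos << 3
    nanos //= 100
    code = 1  # two zeros removed so far, stored as 2 - 1
    while nanos % 10 == 0 and code < 7:
        nanos //= 10
        code += 1
    return (nanos << 3) | code


def parse_nanos(serialized: int) -> int:
    """Invert format_nanos()."""
    code = serialized & 7
    value = serialized >> 3
    if code:
        value *= 10 ** (code + 1)
    return value
```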
+
+<h2 id="struct-columns">Struct Columns</h2>
+
+<p>Structs have no data themselves and delegate everything to their child
+columns except for their PRESENT stream. They have a child column
+for each of the fields.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="list-columns">List Columns</h2>
+
+<p>Lists are encoded as the PRESENT stream and a length stream with
+number of items in each list. They have a single child column for the
+element values.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v1</td>
+    </tr>
+  </tbody>
+</table>
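+
+<p>To show how the LENGTH stream relates to the child column, the sketch
+below rebuilds list values from presence bits, per-list lengths, and the
+flattened child values. It assumes that null lists contribute no entry
+to the LENGTH stream, in line with the PRESENT rule used for other
+column types, and the function name is hypothetical.</p>

```python
from typing import List, Optional


def rebuild_lists(present: List[bool], lengths: List[int],
                  elements: List[int]) -> List[Optional[List[int]]]:
    """Slice the flattened child values into one list per non-null row."""
    rows: List[Optional[List[int]]] = []
    remaining = iter(lengths)
    position = 0
    for bit in present:
        if not bit:
            rows.append(None)  # null list: no length, no child values
            continue
        count = next(remaining)
        rows.append(elements[position:position + count])
        position += count
    return rows
```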
+
+<h2 id="map-columns">Map Columns</h2>
+
+<p>Maps are encoded as the PRESENT stream and a LENGTH stream with the number
+of items in each map. They have a child column for the key and
+another child column for the value.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">LENGTH</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Unsigned Integer RLE v1</td>
+    </tr>
+  </tbody>
+</table>
+
+<h2 id="union-columns">Union Columns</h2>
+
+<p>Unions are encoded as the PRESENT stream and a tag stream that controls which
+potential variant is used. They have a child column for each variant of the
+union. Currently ORC union types are limited to 256 variants, which matches
+the Hive type model.</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: left">Encoding</th>
+      <th style="text-align: left">Stream Kind</th>
+      <th style="text-align: left">Optional</th>
+      <th style="text-align: left">Contents</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left">DIRECT</td>
+      <td style="text-align: left">PRESENT</td>
+      <td style="text-align: left">Yes</td>
+      <td style="text-align: left">Boolean RLE</td>
+    </tr>
+    <tr>
+      <td style="text-align: left"> </td>
+      <td style="text-align: left">DATA</td>
+      <td style="text-align: left">No</td>
+      <td style="text-align: left">Byte RLE</td>
+    </tr>
+  </tbody>
+</table>
+
+<h1 id="indexes">Indexes</h1>
+
+<h2 id="row-group-index">Row Group Index</h2>
+
+<p>The row group indexes consist of a ROW_INDEX stream for each primitive
+column that has an entry for each row group. Row groups are controlled
+by the writer and default to 10,000 rows. Each RowIndexEntry gives the
+position of each stream for the column and the statistics for that row
+group.</p>
+
+<p>The index streams are placed at the front of the stripe, because in
+the default case of streaming they do not need to be read. They are
+only loaded when either predicate push down is being used or the
+reader seeks to a particular row.</p>
+
+<p><code>message RowIndexEntry {
+ repeated uint64 positions = 1 [packed=true];
+ optional ColumnStatistics statistics = 2;
+}
+</code></p>
+
+<p><code>message RowIndex {
+ repeated RowIndexEntry entry = 1;
+}
+</code></p>
+
+<p>To record positions, each stream needs a sequence of numbers. For
+uncompressed streams, the position is the byte offset of the RLE run’s
+start location followed by the number of values that need to be
+consumed from the run. In compressed streams, the first number is the
+start of the compression chunk in the stream, followed by the number
+of decompressed bytes that need to be consumed, and finally the number
+of values consumed in the RLE.</p>
+
+<p>For columns with multiple streams, the sequences of positions in each
+stream are concatenated. That was an unfortunate decision on my part
+that we should fix at some point, because it makes code that uses the
+indexes error-prone.</p>
+
+<p>Because dictionaries are accessed randomly, there is not a position to
+record for the dictionary and the entire dictionary must be read even
+if only part of a stripe is being read.</p>
+
+
+      </article>
+    </div>
+
+    <div class="clear"></div>
+
+  </div>
+</section>
+
+
+  <footer role="contentinfo">
+  <p>The contents of this website are &copy;&nbsp;2018
+     <a href="https://www.apache.org/">Apache Software Foundation</a>
+     under the terms of the <a
+      href="https://www.apache.org/licenses/LICENSE-2.0.html">
+      Apache&nbsp;License&nbsp;v2</a>. Apache ORC and its logo are trademarks
+      of the Apache Software Foundation.</p>
+</footer>
+
+  <script>
+  var anchorForId = function (id) {
+    var anchor = document.createElement("a");
+    anchor.className = "header-link";
+    anchor.href      = "#" + id;
+    anchor.innerHTML = "<span class=\"sr-only\">Permalink</span><i class=\"fa fa-link\"></i>";
+    anchor.title = "Permalink";
+    return anchor;
+  };
+
+  var linkifyAnchors = function (level, containingElement) {
+    var headers = containingElement.getElementsByTagName("h" + level);
+    for (var h = 0; h < headers.length; h++) {
+      var header = headers[h];
+
+      if (typeof header.id !== "undefined" && header.id !== "") {
+        header.appendChild(anchorForId(header.id));
+      }
+    }
+  };
+
+  document.onreadystatechange = function () {
+    if (this.readyState === "complete") {
+      var contentBlock = document.getElementsByClassName("docs")[0] || document.getElementsByClassName("news")[0];
+      if (!contentBlock) {
+        return;
+      }
+      for (var level = 1; level <= 6; level++) {
+        linkifyAnchors(level, contentBlock);
+      }
+    }
+  };
+</script>
+
+
+</body>
+</html>


[8/9] orc git commit: Pushing ORC-339 reorganize the ORC file format spec.

Posted by om...@apache.org.
http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/docs/core-cpp.html
----------------------------------------------------------------------
diff --git a/docs/core-cpp.html b/docs/core-cpp.html
index 130d019..ec31d6f 100644
--- a/docs/core-cpp.html
+++ b/docs/core-cpp.html
@@ -109,12 +109,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/index.html">Background</option>
     
   
@@ -130,14 +124,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -174,20 +160,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -221,20 +193,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
       <option value="/docs/types.html">Types</option>
     
   
@@ -261,12 +219,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/indexes.html">Indexes</option>
     
   
@@ -280,14 +232,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -324,20 +268,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -381,20 +311,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -426,25 +342,11 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/releases.html">Releases</option>
     
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -471,12 +373,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-ddl.html">Hive DDL</option>
     
   
@@ -494,14 +390,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -519,12 +407,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-config.html">Hive Configuration</option>
     
   
@@ -544,14 +426,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -586,12 +460,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapred.html">Using in MapRed</option>
     
   
@@ -601,14 +469,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -638,12 +498,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapreduce.html">Using in MapReduce</option>
     
   
@@ -651,14 +505,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -679,8 +525,6 @@
     
   
     
-  
-    
       <option value="/docs/core-java.html">Using Core Java</option>
     
   
@@ -704,18 +548,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -727,8 +559,6 @@
     
   
     
-  
-    
       <option value="/docs/core-cpp.html">Using Core C++</option>
     
   
@@ -754,18 +584,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -788,8 +606,6 @@
     
   
     
-  
-    
       <option value="/docs/cpp-tools.html">C++ Tools</option>
     
   
@@ -811,18 +627,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -848,12 +652,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/java-tools.html">Java Tools</option>
     
   
@@ -865,384 +663,19 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
     
-    <optgroup label="Format Specification">
-      
+  </select>
+</div>
 
 
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-intro.html">Introduction</option>
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/file-tail.html">File Tail</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/compression.html">Compression</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/run-length.html">Run Length Encoding</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/stripes.html">Stripes</option>
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/encodings.html">Column Encodings</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-index.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-  </select>
-</div>
-
-
-      <div class="unit four-fifths">
-        <article>
-          <h1>Using Core C++</h1>
-          <p>The C++ Core ORC API reads and writes ORC files into its own
-orc::ColumnVectorBatch vectorized classes.</p>
+      <div class="unit four-fifths">
+        <article>
+          <h1>Using Core C++</h1>
+          <p>The C++ Core ORC API reads and writes ORC files into its own
+orc::ColumnVectorBatch vectorized classes.</p>
 
 <h2 id="vectorized-row-batch">Vectorized Row Batch</h2>
 
@@ -1345,576 +778,280 @@ value is null.</p>
       <td>UnionVectorBatch</td>
     </tr>
     <tr>
-      <td>varchar</td>
-      <td>StringVectorBatch</td>
-    </tr>
-  </tbody>
-</table>
-
-<p>LongVectorBatch handles all of the integer types (boolean, bigint,
-date, int, smallint, and tinyint). The data is represented as a
-buffer of int64_t where each value is sign-extended as necessary.</p>
-
-<pre><code class="language-cpp">  struct LongVectorBatch: public ColumnVectorBatch {
-    DataBuffer&lt;int64_t&gt; data;
-    ...
-  };
-</code></pre>
-
-<p>TimestampVectorBatch handles timestamp values. The data is
-represented as two buffers of int64_t for seconds and nanoseconds
-respectively. Note that we always assume data is in GMT timezone;
-therefore it is user’s responsibility to convert wall clock time
-from local timezone to GMT.</p>
-
-<pre><code class="language-cpp">  struct TimestampVectorBatch: public ColumnVectorBatch {
-    DataBuffer&lt;int64_t&gt; data;
-    DataBuffer&lt;int64_t&gt; nanoseconds;
-    ...
-  };
-</code></pre>
-
-<p>DoubleVectorBatch handles all of the floating point types
-(double, and float). The data is represented as a buffer of doubles.</p>
-
-<pre><code class="language-cpp">  struct DoubleVectorBatch: public ColumnVectorBatch {
-    DataBuffer&lt;double&gt; data;
-    ...
-  };
-</code></pre>
-
-<p>Decimal64VectorBatch handles decimal columns with precision no
-greater than 18. Decimal128VectorBatch handles the others. The data
-is represented as a buffer of int64_t and orc::Int128 respectively.</p>
-
-<pre><code class="language-cpp">  struct Decimal64VectorBatch: public ColumnVectorBatch {
-    DataBuffer&lt;int64_t&gt; values;
-    ...
-  };
-
-  struct Decimal128VectorBatch: public ColumnVectorBatch {
-    DataBuffer&lt;Int128&gt; values;
-    ...
-  };
-</code></pre>
-
-<p>StringVectorBatch handles all of the binary types (binary,
-char, string, and varchar). The data is represented as a char* buffer,
-and a length buffer.</p>
-
-<pre><code class="language-cpp">  struct StringVectorBatch: public ColumnVectorBatch {
-    DataBuffer&lt;char*&gt; data;
-    DataBuffer&lt;int64_t&gt; length;
-    ...
-  };
-</code></pre>
-
-<p>StructVectorBatch handles the struct columns and represents
-the data as a buffer of <code>ColumnVectorBatch</code>.</p>
-
-<pre><code class="language-cpp">  struct StructVectorBatch: public ColumnVectorBatch {
-    std::vector&lt;ColumnVectorBatch*&gt; fields;
-    ...
-  };
-</code></pre>
+      <td>varchar</td>
+      <td>StringVectorBatch</td>
+    </tr>
+  </tbody>
+</table>
 
-<p>UnionVectorBatch handles the union columns. It uses <code>tags</code>
-to indicate which subtype has the value and <code>offsets</code> indicates
-the offset in child batch of that subtype. A individual
-<code>ColumnVectorBatch</code> is used for each subtype.</p>
+<p>LongVectorBatch handles all of the integer types (boolean, bigint,
+date, int, smallint, and tinyint). The data is represented as a
+buffer of int64_t where each value is sign-extended as necessary.</p>
 
-<pre><code class="language-cpp">  struct UnionVectorBatch: public ColumnVectorBatch {
-    DataBuffer&lt;unsigned char&gt; tags;
-    DataBuffer&lt;uint64_t&gt; offsets;
-    std::vector&lt;ColumnVectorBatch*&gt; children;
+<pre><code class="language-cpp">  struct LongVectorBatch: public ColumnVectorBatch {
+    DataBuffer&lt;int64_t&gt; data;
     ...
   };
 </code></pre>
 
-<p>ListVectorBatch handles the array columns and represents
-the data as a buffer of integers for the offsets and a
-<code>ColumnVectorBatch</code> for the children values.</p>
+<p>TimestampVectorBatch handles timestamp values. The data is
+represented as two buffers of int64_t for seconds and nanoseconds
+respectively. Note that we always assume data is in GMT timezone;
+therefore it is user’s responsibility to convert wall clock time
+from local timezone to GMT.</p>
 
-<pre><code class="language-cpp">  struct ListVectorBatch: public ColumnVectorBatch {
-    DataBuffer&lt;int64_t&gt; offsets;
-    ORC_UNIQUE_PTR&lt;ColumnVectorBatch&gt; elements;
+<pre><code class="language-cpp">  struct TimestampVectorBatch: public ColumnVectorBatch {
+    DataBuffer&lt;int64_t&gt; data;
+    DataBuffer&lt;int64_t&gt; nanoseconds;
     ...
   };
 </code></pre>
 
-<p>MapVectorBatch handles the map columns and represents the data
-as two arrays of integers for the offsets and two <code>ColumnVectorBatch</code>s
-for the keys and values.</p>
+<p>DoubleVectorBatch handles all of the floating point types
+(double, and float). The data is represented as a buffer of doubles.</p>
 
-<pre><code class="language-cpp">  struct MapVectorBatch: public ColumnVectorBatch {
-    DataBuffer&lt;int64_t&gt; offsets;
-    ORC_UNIQUE_PTR&lt;ColumnVectorBatch&gt; keys;
-    ORC_UNIQUE_PTR&lt;ColumnVectorBatch&gt; elements;
+<pre><code class="language-cpp">  struct DoubleVectorBatch: public ColumnVectorBatch {
+    DataBuffer&lt;double&gt; data;
     ...
   };
 </code></pre>
 
-<h2 id="writing-orc-files">Writing ORC Files</h2>
-
-<p>To write an ORC file, you need to include <code>OrcFile.hh</code> and define
-the schema; then use <code>orc::OutputStream</code> and <code>orc::WriterOptions</code>
-to create a <code>orc::Writer</code> with the desired filename. This example
-sets the required schema parameter, but there are many other
-options to control the ORC writer.</p>
-
-<pre><code class="language-cpp">ORC_UNIQUE_PTR&lt;OutputStream&gt; outStream =
-  writeLocalFile("my-file.orc");
-ORC_UNIQUE_PTR&lt;Type&gt; schema(
-  Type::buildTypeFromString("struct&lt;x:int,y:int&gt;"));
-WriterOptions options;
-ORC_UNIQUE_PTR&lt;Writer&gt; writer =
-  createWriter(*schema, outStream.get(), options);
-</code></pre>
-
-<p>Now you need to create a row batch, set the data, and write it to the file
-as the batch fills up. When the file is done, close the <code>Writer</code>.</p>
-
-<pre><code class="language-cpp">uint64_t batchSize = 1024, rowCount = 10000;
-ORC_UNIQUE_PTR&lt;ColumnVectorBatch&gt; batch =
-  writer-&gt;createRowBatch(batchSize);
-StructVectorBatch *root =
-  dynamic_cast&lt;StructVectorBatch *&gt;(batch.get());
-LongVectorBatch *x =
-  dynamic_cast&lt;LongVectorBatch *&gt;(root-&gt;fields[0]);
-LongVectorBatch *y =
-  dynamic_cast&lt;LongVectorBatch *&gt;(root-&gt;fields[1]);
-
-uint64_t rows = 0;
-for (uint64_t i = 0; i &lt; rowCount; ++i) {
-  x-&gt;data[rows] = i;
-  y-&gt;data[rows] = i * 3;
-  rows++;
-
-  if (rows == batchSize) {
-    root-&gt;numElements = rows;
-    x-&gt;numElements = rows;
-    y-&gt;numElements = rows;
-
-    writer-&gt;add(*batch);
-    rows = 0;
-  }
-}
-
-if (rows != 0) {
-  root-&gt;numElements = rows;
-  x-&gt;numElements = rows;
-  y-&gt;numElements = rows;
-
-  writer-&gt;add(*batch);
-  rows = 0;
-}
-
-writer-&gt;close();
-</code></pre>
-
-<h2 id="reading-orc-files">Reading ORC Files</h2>
-
-<p>To read ORC files, include <code>OrcFile.hh</code> file to create a <code>orc::Reader</code>
-that contains the metadata about the file. There are a few options to
-the <code>orc::Reader</code>, but far fewer than the writer and none of them are
-required. The reader has methods for getting the number of rows,
-schema, compression, etc. from the file.</p>
-
-<pre><code class="language-cpp">ORC_UNIQUE_PTR&lt;InputStream&gt; inStream =
-  readLocalFile("my-file.orc");
-ReaderOptions options;
-ORC_UNIQUE_PTR&lt;Reader&gt; reader =
-  createReader(inStream, options);
-</code></pre>
-
-<p>To get the data, create a <code>orc::RowReader</code> object. By default,
-the RowReader reads all rows and all columns, but there are
-options to control the data that is read.</p>
-
-<pre><code class="language-cpp">RowReaderOptions rowReaderOptions;
-ORC_UNIQUE_PTR&lt;RowReader&gt; rowReader =
-  reader-&gt;createRowReader(rowReaderOptions);
-ORC_UNIQUE_PTR&lt;ColumnVectorBatch&gt; batch =
-  rowReader-&gt;createRowBatch(1024);
-</code></pre>
-
-<p>With a <code>orc::RowReader</code> the user can ask for the next batch until there
-are no more left. The reader will stop the batch at certain boundaries,
-so the returned batch may not be full, but it will always contain some rows.</p>
-
-<pre><code class="language-cpp">while (rowReader-&gt;next(*batch)) {
-  for (uint64_t r = 0; r &lt; batch-&gt;numElements; ++r) {
-    ... process row r from batch
-  }
-}
-</code></pre>
-
-          
-
-
-
-
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-    <div class="section-nav">
-      <div class="left align-right">
-          
-            
-            
-            <a href="/docs/core-java.html" class="prev">Back</a>
-          
-      </div>
-      <div class="right align-left">
-          
-            
-            
-            <a href="/docs/cpp-tools.html" class="next">Next</a>
-          
-      </div>
-    </div>
-    <div class="clear"></div>
-    
-
-        </article>
-      </div>
-
-      <div class="unit one-fifth hide-on-mobiles">
-  <aside>
-    
-    <h4>Overview</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/index.html">Background</a></li>
-      
-
-
-  
+<p>Decimal64VectorBatch handles decimal columns with precision no
+greater than 18. Decimal128VectorBatch handles the others. The data
+is represented as a buffer of int64_t and orc::Int128 respectively.</p>
 
-  
-    
-  
+<pre><code class="language-cpp">  struct Decimal64VectorBatch: public ColumnVectorBatch {
+    DataBuffer&lt;int64_t&gt; values;
+    ...
+  };
 
-  
-    
-  
-    
-      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
-      
+  struct Decimal128VectorBatch: public ColumnVectorBatch {
+    DataBuffer&lt;Int128&gt; values;
+    ...
+  };
+</code></pre>
 
+<p>StringVectorBatch handles all of the binary types (binary,
+char, string, and varchar). The data is represented as a <code>char*</code> buffer
+and a length buffer.</p>
 
-  
+<pre><code class="language-cpp">  struct StringVectorBatch: public ColumnVectorBatch {
+    DataBuffer&lt;char*&gt; data;
+    DataBuffer&lt;int64_t&gt; length;
+    ...
+  };
+</code></pre>
 
-  
-    
-  
+<p>StructVectorBatch handles the struct columns and represents
+the data as a vector of <code>ColumnVectorBatch</code> pointers, one per field.</p>
 
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/types.html">Types</a></li>
-      
+<pre><code class="language-cpp">  struct StructVectorBatch: public ColumnVectorBatch {
+    std::vector&lt;ColumnVectorBatch*&gt; fields;
+    ...
+  };
+</code></pre>
 
+<p>UnionVectorBatch handles the union columns. It uses <code>tags</code>
+to indicate which subtype holds each value, and <code>offsets</code> gives
+the value's offset in the child batch of that subtype. An individual
+<code>ColumnVectorBatch</code> is used for each subtype.</p>
 
-  
+<pre><code class="language-cpp">  struct UnionVectorBatch: public ColumnVectorBatch {
+    DataBuffer&lt;unsigned char&gt; tags;
+    DataBuffer&lt;uint64_t&gt; offsets;
+    std::vector&lt;ColumnVectorBatch*&gt; children;
+    ...
+  };
+</code></pre>
 
-  
-    
-  
+<p>ListVectorBatch handles the array columns and represents
+the data as a buffer of integers for the offsets and a
+<code>ColumnVectorBatch</code> for the child values.</p>
 
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
-      
+<pre><code class="language-cpp">  struct ListVectorBatch: public ColumnVectorBatch {
+    DataBuffer&lt;int64_t&gt; offsets;
+    ORC_UNIQUE_PTR&lt;ColumnVectorBatch&gt; elements;
+    ...
+  };
+</code></pre>
 
+<p>MapVectorBatch handles the map columns and represents the data
+as a buffer of integers for the offsets and two <code>ColumnVectorBatch</code>
+objects, one for the keys and one for the values.</p>
 
-  
+<pre><code class="language-cpp">  struct MapVectorBatch: public ColumnVectorBatch {
+    DataBuffer&lt;int64_t&gt; offsets;
+    ORC_UNIQUE_PTR&lt;ColumnVectorBatch&gt; keys;
+    ORC_UNIQUE_PTR&lt;ColumnVectorBatch&gt; elements;
+    ...
+  };
+</code></pre>
 
-  
-    
-  
+<h2 id="writing-orc-files">Writing ORC Files</h2>
 
-  
-    
-      <li class=""><a href="/docs/acid.html">ACID support</a></li>
-      
+<p>To write an ORC file, you need to include <code>OrcFile.hh</code> and define
+the schema; then use <code>orc::OutputStream</code> and <code>orc::WriterOptions</code>
+to create an <code>orc::Writer</code> with the desired filename. This example
+sets the required schema parameter, but there are many other
+options to control the ORC writer.</p>
 
+<pre><code class="language-cpp">ORC_UNIQUE_PTR&lt;OutputStream&gt; outStream =
+  writeLocalFile("my-file.orc");
+ORC_UNIQUE_PTR&lt;Type&gt; schema(
+  Type::buildTypeFromString("struct&lt;x:int,y:int&gt;"));
+WriterOptions options;
+ORC_UNIQUE_PTR&lt;Writer&gt; writer =
+  createWriter(*schema, outStream.get(), options);
+</code></pre>
 
-</ul>
+<p>Now you need to create a row batch, set the data, and write it to the file
+as the batch fills up. When the file is done, close the <code>Writer</code>.</p>
 
-    
-    <h4>Installing</h4>
-    
+<pre><code class="language-cpp">uint64_t batchSize = 1024, rowCount = 10000;
+ORC_UNIQUE_PTR&lt;ColumnVectorBatch&gt; batch =
+  writer-&gt;createRowBatch(batchSize);
+StructVectorBatch *root =
+  dynamic_cast&lt;StructVectorBatch *&gt;(batch.get());
+LongVectorBatch *x =
+  dynamic_cast&lt;LongVectorBatch *&gt;(root-&gt;fields[0]);
+LongVectorBatch *y =
+  dynamic_cast&lt;LongVectorBatch *&gt;(root-&gt;fields[1]);
 
-<ul>
+uint64_t rows = 0;
+for (uint64_t i = 0; i &lt; rowCount; ++i) {
+  x-&gt;data[rows] = i;
+  y-&gt;data[rows] = i * 3;
+  rows++;
 
-  
+  if (rows == batchSize) {
+    root-&gt;numElements = rows;
+    x-&gt;numElements = rows;
+    y-&gt;numElements = rows;
 
-  
-    
-  
+    writer-&gt;add(*batch);
+    rows = 0;
+  }
+}
 
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/building.html">Building ORC</a></li>
-      
+if (rows != 0) {
+  root-&gt;numElements = rows;
+  x-&gt;numElements = rows;
+  y-&gt;numElements = rows;
 
+  writer-&gt;add(*batch);
+  rows = 0;
+}
 
-  
+writer-&gt;close();
+</code></pre>
 
-  
-    
-  
+<h2 id="reading-orc-files">Reading ORC Files</h2>
 
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/releases.html">Releases</a></li>
-      
+<p>To read ORC files, include <code>OrcFile.hh</code> and create an <code>orc::Reader</code>
+that contains the metadata about the file. There are a few options to
+the <code>orc::Reader</code>, but far fewer than the writer and none of them are
+required. The reader has methods for getting the number of rows,
+schema, compression, etc. from the file.</p>
 
+<pre><code class="language-cpp">ORC_UNIQUE_PTR&lt;InputStream&gt; inStream =
+  readLocalFile("my-file.orc");
+ReaderOptions options;
+ORC_UNIQUE_PTR&lt;Reader&gt; reader =
+  createReader(inStream, options);
+</code></pre>
+
+<p>To get the data, create an <code>orc::RowReader</code> object. By default,
+the RowReader reads all rows and all columns, but there are
+options to control the data that is read.</p>
+
+<pre><code class="language-cpp">RowReaderOptions rowReaderOptions;
+ORC_UNIQUE_PTR&lt;RowReader&gt; rowReader =
+  reader-&gt;createRowReader(rowReaderOptions);
+ORC_UNIQUE_PTR&lt;ColumnVectorBatch&gt; batch =
+  rowReader-&gt;createRowBatch(1024);
+</code></pre>
+
+<p>With an <code>orc::RowReader</code> the user can ask for the next batch until there
+are no more left. The reader will stop the batch at certain boundaries,
+so the returned batch may not be full, but it will always contain some rows.</p>
+
+<pre><code class="language-cpp">while (rowReader-&gt;next(*batch)) {
+  for (uint64_t r = 0; r &lt; batch-&gt;numElements; ++r) {
+    // ... process row r from batch
+  }
+}
+</code></pre>
+
+          
 
-</ul>
 
-    
-    <h4>Using in Hive</h4>
-    
 
-<ul>
 
-  
 
   
-    
   
 
   
-    
-  
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
-      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
-      
-
 
   
+  
 
   
-    
   
 
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
+    <div class="section-nav">
+      <div class="left align-right">
+          
+            
+            
+            <a href="/docs/core-java.html" class="prev">Back</a>
+          
+      </div>
+      <div class="right align-left">
+          
+            
+            
+            <a href="/docs/cpp-tools.html" class="next">Next</a>
+          
+      </div>
+    </div>
+    <div class="clear"></div>
     
-      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
-      
 
+        </article>
+      </div>
 
-</ul>
-
+      <div class="unit one-fifth hide-on-mobiles">
+  <aside>
     
-    <h4>Using in MapReduce</h4>
+    <h4>Overview</h4>
     
 
 <ul>
@@ -1943,19 +1080,21 @@ so the returned batch may not be full, but it will always contain some rows.</p>
     
   
     
+      <li class=""><a href="/docs/index.html">Background</a></li>
+      
+
+
   
-    
-  
-    
+
   
     
   
-    
+
   
     
   
     
-      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
+      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
       
 
 
@@ -1995,20 +1134,10 @@ so the returned batch may not be full, but it will always contain some rows.</p>
     
   
     
-  
-    
-      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
+      <li class=""><a href="/docs/types.html">Types</a></li>
       
 
 
-</ul>
-
-    
-    <h4>Using ORC Core</h4>
-    
-
-<ul>
-
   
 
   
@@ -2027,34 +1156,34 @@ so the returned batch may not be full, but it will always contain some rows.</p>
     
   
     
-      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
-      
-
-
-  
-
   
     
   
-
-  
     
   
     
   
     
+      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
+      
+
+
+  
+
   
     
   
+
+  
     
-      <li class="current"><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      <li class=""><a href="/docs/acid.html">ACID support</a></li>
       
 
 
 </ul>
 
     
-    <h4>Tools</h4>
+    <h4>Installing</h4>
     
 
 <ul>
@@ -2071,15 +1200,7 @@ so the returned batch may not be full, but it will always contain some rows.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      <li class=""><a href="/docs/building.html">Building ORC</a></li>
       
 
 
@@ -2117,14 +1238,14 @@ so the returned batch may not be full, but it will always contain some rows.</p>
     
   
     
-      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
+      <li class=""><a href="/docs/releases.html">Releases</a></li>
       
 
 
 </ul>
 
     
-    <h4>Format Specification</h4>
+    <h4>Using in Hive</h4>
     
 
 <ul>
@@ -2151,31 +1272,7 @@ so the returned batch may not be full, but it will always contain some rows.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-intro.html">Introduction</a></li>
+      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
       
 
 
@@ -2199,31 +1296,17 @@ so the returned batch may not be full, but it will always contain some rows.</p>
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/file-tail.html">File Tail</a></li>
+      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
       
 
 
-  
-
-  
-    
-  
+</ul>
 
-  
-    
-  
     
-  
-    
-  
+    <h4>Using in MapReduce</h4>
     
-      <li class=""><a href="/docs/compression.html">Compression</a></li>
-      
 
+<ul>
 
   
 
@@ -2255,19 +1338,7 @@ so the returned batch may not be full, but it will always contain some rows.</p>
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/run-length.html">Run Length Encoding</a></li>
+      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
       
 
 
@@ -2303,13 +1374,25 @@ so the returned batch may not be full, but it will always contain some rows.</p>
     
   
     
-  
+      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Using ORC Core</h4>
     
+
+<ul>
+
+  
+
   
     
   
+
+  
     
   
     
@@ -2319,7 +1402,7 @@ so the returned batch may not be full, but it will always contain some rows.</p>
     
   
     
-      <li class=""><a href="/docs/stripes.html">Stripes</a></li>
+      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
       
 
 
@@ -2337,17 +1420,17 @@ so the returned batch may not be full, but it will always contain some rows.</p>
     
   
     
-  
-    
-  
-    
-  
+      <li class="current"><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Tools</h4>
     
-      <li class=""><a href="/docs/encodings.html">Column Encodings</a></li>
-      
 
+<ul>
 
   
 
@@ -2367,11 +1450,17 @@ so the returned batch may not be full, but it will always contain some rows.</p>
     
   
     
+      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      
+
+
   
-    
+
   
     
   
+
+  
     
   
     
@@ -2393,7 +1482,7 @@ so the returned batch may not be full, but it will always contain some rows.</p>
     
   
     
-      <li class=""><a href="/docs/spec-index.html">Indexes</a></li>
+      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
       
 
 

http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/docs/core-java.html
----------------------------------------------------------------------
diff --git a/docs/core-java.html b/docs/core-java.html
index 196bf0d..ca4e99b 100644
--- a/docs/core-java.html
+++ b/docs/core-java.html
@@ -109,12 +109,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/index.html">Background</option>
     
   
@@ -130,14 +124,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -174,20 +160,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -221,20 +193,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
       <option value="/docs/types.html">Types</option>
     
   
@@ -261,12 +219,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/indexes.html">Indexes</option>
     
   
@@ -280,14 +232,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -324,20 +268,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -381,20 +311,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -426,25 +342,11 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/releases.html">Releases</option>
     
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -471,12 +373,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-ddl.html">Hive DDL</option>
     
   
@@ -494,14 +390,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -519,12 +407,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-config.html">Hive Configuration</option>
     
   
@@ -544,14 +426,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -586,12 +460,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapred.html">Using in MapRed</option>
     
   
@@ -601,14 +469,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -638,12 +498,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapreduce.html">Using in MapReduce</option>
     
   
@@ -651,14 +505,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -679,8 +525,6 @@
     
   
     
-  
-    
       <option value="/docs/core-java.html">Using Core Java</option>
     
   
@@ -704,18 +548,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -727,8 +559,6 @@
     
   
     
-  
-    
       <option value="/docs/core-cpp.html">Using Core C++</option>
     
   
@@ -754,18 +584,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -788,8 +606,6 @@
     
   
     
-  
-    
       <option value="/docs/cpp-tools.html">C++ Tools</option>
     
   
@@ -811,18 +627,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -848,12 +652,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/java-tools.html">Java Tools</option>
     
   
@@ -865,385 +663,20 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
     
-    <optgroup label="Format Specification">
-      
+  </select>
+</div>
 
 
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-intro.html">Introduction</option>
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/file-tail.html">File Tail</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/compression.html">Compression</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/run-length.html">Run Length Encoding</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/stripes.html">Stripes</option>
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/encodings.html">Column Encodings</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-index.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-  </select>
-</div>
-
-
-      <div class="unit four-fifths">
-        <article>
-          <h1>Using Core Java</h1>
-          <p>The Core ORC API reads and writes ORC files into Hive’s storage-api
-vectorized classes. Both Hive and MapReduce use the Core API to actually
-read and write the data.</p>
+      <div class="unit four-fifths">
+        <article>
+          <h1>Using Core Java</h1>
+          <p>The Core ORC API reads and writes ORC files into Hive’s storage-api
+vectorized classes. Both Hive and MapReduce use the Core API to actually
+read and write the data.</p>
 
 <h2 id="vectorized-row-batch">Vectorized Row Batch</h2>
 
@@ -1646,289 +1079,63 @@ rows.close();
   
 
   
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-    <div class="section-nav">
-      <div class="left align-right">
-          
-            
-            
-            <a href="/docs/mapreduce.html" class="prev">Back</a>
-          
-      </div>
-      <div class="right align-left">
-          
-            
-            
-            <a href="/docs/core-cpp.html" class="next">Next</a>
-          
-      </div>
-    </div>
-    <div class="clear"></div>
-    
-
-        </article>
-      </div>
-
-      <div class="unit one-fifth hide-on-mobiles">
-  <aside>
-    
-    <h4>Overview</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/index.html">Background</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/types.html">Types</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
-      
-
-
-  
-
-  
-    
-  
-
-  
-    
-      <li class=""><a href="/docs/acid.html">ACID support</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Installing</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/building.html">Building ORC</a></li>
-      
-
-
-  
+  
 
   
-    
   
 
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
+  
+    <div class="section-nav">
+      <div class="left align-right">
+          
+            
+            
+            <a href="/docs/mapreduce.html" class="prev">Back</a>
+          
+      </div>
+      <div class="right align-left">
+          
+            
+            
+            <a href="/docs/core-cpp.html" class="next">Next</a>
+          
+      </div>
+    </div>
+    <div class="clear"></div>
     
-      <li class=""><a href="/docs/releases.html">Releases</a></li>
-      
 
+        </article>
+      </div>
 
-</ul>
-
+      <div class="unit one-fifth hide-on-mobiles">
+  <aside>
     
-    <h4>Using in Hive</h4>
+    <h4>Overview</h4>
     
 
 <ul>
@@ -1957,11 +1164,7 @@ rows.close();
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
+      <li class=""><a href="/docs/index.html">Background</a></li>
       
 
 
@@ -1975,34 +1178,10 @@ rows.close();
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
+      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
       
 
 
-</ul>
-
-    
-    <h4>Using in MapReduce</h4>
-    
-
-<ul>
-
   
 
   
@@ -2039,7 +1218,7 @@ rows.close();
     
   
     
-      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
+      <li class=""><a href="/docs/types.html">Types</a></li>
       
 
 
@@ -2069,49 +1248,7 @@ rows.close();
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
-      
-
-
-</ul>
-
-    
-    <h4>Using ORC Core</h4>
-    
-
-<ul>
-
-  
-
-  
-    
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class="current"><a href="/docs/core-java.html">Using Core Java</a></li>
+      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
       
 
 
@@ -2123,22 +1260,14 @@ rows.close();
 
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      <li class=""><a href="/docs/acid.html">ACID support</a></li>
       
 
 
 </ul>
 
     
-    <h4>Tools</h4>
+    <h4>Installing</h4>
     
 
 <ul>
@@ -2155,15 +1284,7 @@ rows.close();
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      <li class=""><a href="/docs/building.html">Building ORC</a></li>
       
 
 
@@ -2201,14 +1322,14 @@ rows.close();
     
   
     
-      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
+      <li class=""><a href="/docs/releases.html">Releases</a></li>
       
 
 
 </ul>
 
     
-    <h4>Format Specification</h4>
+    <h4>Using in Hive</h4>
     
 
 <ul>
@@ -2235,31 +1356,7 @@ rows.close();
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-intro.html">Introduction</a></li>
+      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
       
 
 
@@ -2283,31 +1380,17 @@ rows.close();
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/file-tail.html">File Tail</a></li>
+      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
       
 
 
-  
-
-  
-    
-  
+</ul>
 
-  
-    
-  
     
-  
-    
-  
+    <h4>Using in MapReduce</h4>
     
-      <li class=""><a href="/docs/compression.html">Compression</a></li>
-      
 
+<ul>
 
   
 
@@ -2339,19 +1422,7 @@ rows.close();
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/run-length.html">Run Length Encoding</a></li>
+      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
       
 
 
@@ -2387,13 +1458,25 @@ rows.close();
     
   
     
-  
+      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Using ORC Core</h4>
     
+
+<ul>
+
+  
+
   
     
   
+
+  
     
   
     
@@ -2403,7 +1486,7 @@ rows.close();
     
   
     
-      <li class=""><a href="/docs/stripes.html">Stripes</a></li>
+      <li class="current"><a href="/docs/core-java.html">Using Core Java</a></li>
       
 
 
@@ -2421,17 +1504,17 @@ rows.close();
     
   
     
-  
-    
-  
-    
-  
+      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Tools</h4>
     
-      <li class=""><a href="/docs/encodings.html">Column Encodings</a></li>
-      
 
+<ul>
 
   
 
@@ -2451,11 +1534,17 @@ rows.close();
     
   
     
+      <li class=""><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      
+
+
   
-    
+
   
     
   
+
+  
     
   
     
@@ -2477,7 +1566,7 @@ rows.close();
     
   
     
-      <li class=""><a href="/docs/spec-index.html">Indexes</a></li>
+      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
       
 
 

http://git-wip-us.apache.org/repos/asf/orc/blob/c6e29090/docs/cpp-tools.html
----------------------------------------------------------------------
diff --git a/docs/cpp-tools.html b/docs/cpp-tools.html
index 171dc0d..abe6e2e 100644
--- a/docs/cpp-tools.html
+++ b/docs/cpp-tools.html
@@ -109,12 +109,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/index.html">Background</option>
     
   
@@ -130,14 +124,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -174,20 +160,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -221,20 +193,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
       <option value="/docs/types.html">Types</option>
     
   
@@ -261,12 +219,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/indexes.html">Indexes</option>
     
   
@@ -280,14 +232,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -324,20 +268,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -381,20 +311,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -426,25 +342,11 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/releases.html">Releases</option>
     
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -471,12 +373,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-ddl.html">Hive DDL</option>
     
   
@@ -494,14 +390,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -519,12 +407,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/hive-config.html">Hive Configuration</option>
     
   
@@ -544,14 +426,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -586,12 +460,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapred.html">Using in MapRed</option>
     
   
@@ -601,14 +469,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -638,12 +498,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/mapreduce.html">Using in MapReduce</option>
     
   
@@ -651,14 +505,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -679,8 +525,6 @@
     
   
     
-  
-    
       <option value="/docs/core-java.html">Using Core Java</option>
     
   
@@ -704,18 +548,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -727,8 +559,6 @@
     
   
     
-  
-    
       <option value="/docs/core-cpp.html">Using Core C++</option>
     
   
@@ -754,18 +584,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
@@ -788,8 +606,6 @@
     
   
     
-  
-    
       <option value="/docs/cpp-tools.html">C++ Tools</option>
     
   
@@ -811,18 +627,6 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
 
   
 
@@ -848,12 +652,6 @@
     
   
     
-  
-    
-  
-    
-  
-    
       <option value="/docs/java-tools.html">Java Tools</option>
     
   
@@ -865,1004 +663,343 @@
   
     
   
-    
-  
-    
-  
-    
-  
-    
-  
 
 
     </optgroup>
     
-    <optgroup label="Format Specification">
-      
+  </select>
+</div>
 
 
-  
+      <div class="unit four-fifths">
+        <article>
+          <h1>C++ Tools</h1>
+          <h2 id="orc-contents">orc-contents</h2>
 
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-intro.html">Introduction</option>
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/file-tail.html">File Tail</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/compression.html">Compression</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/run-length.html">Run Length Encoding</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/stripes.html">Stripes</option>
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/encodings.html">Column Encodings</option>
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-
-  
-
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <option value="/docs/spec-index.html">Indexes</option>
-    
-  
-    
-  
-    
-  
-    
-  
-
-
-    </optgroup>
-    
-  </select>
-</div>
-
-
-      <div class="unit four-fifths">
-        <article>
-          <h1>C++ Tools</h1>
-          <h2 id="orc-contents">orc-contents</h2>
-
-<p>Displays the contents of the ORC file as a JSON document. With the
-<code>columns</code> argument only the selected columns are printed.</p>
-
-<pre><code class="language-shell">% orc-contents  [--columns=1,2,...] &lt;filename&gt;
-</code></pre>
-
-<p>If you run it on the example file TestOrcFile.test1.orc, you’ll see (without
-the line breaks within each record):</p>
-
-<pre><code class="language-shell">% orc-contents examples/TestOrcFile.test1.orc
-{"boolean1": false, "byte1": 1, "short1": 1024, "int1": 65536, \\
- "long1": 9223372036854775807, "float1": 1, "double1": -15, \\
- "bytes1": [0, 1, 2, 3, 4], "string1": "hi", "middle": \\
-    {"list": [{"int1": 1, "string1": "bye"}, \\
-              {"int1": 2, "string1": "sigh"}]}, \\
- "list": [{"int1": 3, "string1": "good"}, \\
-          {"int1": 4, "string1": "bad"}], \\
- "map": []}
-{"boolean1": true, "byte1": 100, "short1": 2048, "int1": 65536, \\
- "long1": 9223372036854775807, "float1": 2, "double1": -5, \\
- "bytes1": [], "string1": "bye", \\
- "middle": {"list": [{"int1": 1, "string1": "bye"}, \\
-                     {"int1": 2, "string1": "sigh"}]}, \\
- "list": [{"int1": 100000000, "string1": "cat"}, \\
-          {"int1": -100000, "string1": "in"}, \\
-          {"int1": 1234, "string1": "hat"}], \\
- "map": [{"key": "chani", "value": {"int1": 5, "string1": "chani"}}, \\
-         {"key": "mauddib", \\
-          "value": {"int1": 1, "string1": "mauddib"}}]}
-</code></pre>
-
-<h2 id="orc-metadata">orc-metadata</h2>
-
-<p>Displays the metadata of the ORC file as a JSON document. With the
-<code>verbose</code> option additional information about the layout of the file
-is also printed.</p>
-
-<p>For diagnosing problems, it is useful to use the <code>--raw</code> option that
-prints the protocol buffers from the ORC file directly rather than
-interpreting them.</p>
-
-<pre><code class="language-shell">% orc-metadata [-v] [--raw] &lt;filename&gt;
-</code></pre>
-
-<p>If you run it on the example file TestOrcFile.test1.orc, you’ll see:</p>
-
-<pre><code class="language-shell">% orc-metadata examples/TestOrcFile.test1.orc
-{ "name": "../examples/TestOrcFile.test1.orc",
-  "type": "struct&lt;boolean1:boolean,byte1:tinyint,short1:smallint,
-int1:int,long1:bigint,float1:float,double1:double,bytes1:binary,
-string1:string,middle:struct&lt;list:array&lt;struct&lt;int1:int,string1:
-string&gt;&gt;&gt;,list:array&lt;struct&lt;int1:int,string1:string&gt;&gt;,map:map&lt;
-string,struct&lt;int1:int,string1:string&gt;&gt;&gt;",
-  "rows": 2,
-  "stripe count": 1,
-  "format": "0.12", "writer version": "HIVE-8732",
-  "compression": "zlib", "compression block": 10000,
-  "file length": 1711,
-  "content": 1015, "stripe stats": 250, "footer": 421, "postscript": 24,
-  "row index stride": 10000,
-  "user metadata": {
-  },
-  "stripes": [
-    { "stripe": 0, "rows": 2,
-      "offset": 3, "length": 1012,
-      "index": 570, "data": 243, "footer": 199
-    }
-  ]
-}
-</code></pre>
-
-<h2 id="csv-import">csv-import</h2>
-
-<p>Imports a CSV file into an ORC file using the specified schema.
-Compound types are not yet supported. The <code>delimiter</code> option sets
-the delimiter of the input CSV file and defaults to <code>,</code>. The <code>stripe</code>
-option sets the stripe size and defaults to 128MB. The <code>block</code>
-option sets the compression block size and defaults to 64KB. The <code>batch</code>
-option sets the number of rows per batch and defaults to 1024.</p>
-
-<pre><code class="language-shell">% csv-import [--delimiter=&lt;character&gt;] [--stripe=&lt;size&gt;]
-             [--block=&lt;size&gt;] [--batch=&lt;size&gt;]
-             &lt;schema&gt; &lt;inputCSVFile&gt; &lt;outputORCFile&gt;
-</code></pre>
-
-<p>If you run it on the example file TestCSVFileImport.test10rows.csv,
-you’ll see:</p>
-
-<pre><code class="language-shell">% csv-import "struct&lt;a:bigint,b:string,c:double&gt;"
-             examples/TestCSVFileImport.test10rows.csv /tmp/test.orc
-[2018-04-11 11:12:16] Start importing Orc file...
-[2018-04-11 11:12:16] Finish importing Orc file.
-[2018-04-11 11:12:16] Total writer elasped time: 0.001352s.
-[2018-04-11 11:12:16] Total writer CPU time: 0.001339s.
-</code></pre>
-
-<h2 id="orc-scan">orc-scan</h2>
-
-<p>Scans the ORC file and displays its row count. The <code>batch</code> option
-sets the batch size, which is 1024 rows by default. The tool is useful for checking
-whether the ORC file is damaged.</p>
-
-<pre><code class="language-shell">% orc-scan [--batch=&lt;size&gt;] &lt;filename&gt;
-</code></pre>
-
-<p>If you run it on the example file TestOrcFile.test1.orc, you’ll see:</p>
-
-<pre><code class="language-shell">% orc-scan examples/TestOrcFile.test1.orc
-Rows: 2
-Batches: 1
-</code></pre>
-
-<h2 id="orc-statistics">orc-statistics</h2>
-
-<p>Displays the file-level and stripe-level column statistics of the ORC file.
-The <code>withIndex</code> option also includes the column statistics of each row group.</p>
-
-<pre><code class="language-shell">% orc-statistics [--withIndex] &lt;filename&gt;
-</code></pre>
-
-<p>If you run it on the example file TestOrcFile.columnProjection.orc,
-you’ll see:</p>
-
-<pre><code class="language-shell">% orc-statistics examples/TestOrcFile.columnProjection.orc
-File examples/TestOrcFile.columnProjection.orc has 3 columns
-*** Column 0 ***
-Column has 21000 values and has null value: no
-
-*** Column 1 ***
-Data type: Integer
-Values: 21000
-Has null: no
-Minimum: -2147439072
-Maximum: 2147257982
-Sum: 268482658568
-
-*** Column 2 ***
-Data type: String
-Values: 21000
-Has null: no
-Minimum: 100119c272d7db89
-Maximum: fffe9f6f23b287f3
-Total length: 334559
-
-File examples/TestOrcFile.columnProjection.orc has 5 stripes
-*** Stripe 0 ***
-
---- Column 0 ---
-Column has 5000 values and has null value: no
-
---- Column 1 ---
-Data type: Integer
-Values: 5000
-Has null: no
-Minimum: -2145365268
-Maximum: 2147025027
-Sum: -29841423854
-
---- Column 2 ---
-Data type: String
-Values: 5000
-Has null: no
-Minimum: 1005350489418be2
-Maximum: fffbb8718c92b09f
-Total length: 79644
-
-*** Stripe 1 ***
-
---- Column 0 ---
-Column has 5000 values and has null value: no
-
---- Column 1 ---
-Data type: Integer
-Values: 5000
-Has null: no
-Minimum: -2147115959
-Maximum: 2147257982
-Sum: 108604887785
-
---- Column 2 ---
-Data type: String
-Values: 5000
-Has null: no
-Minimum: 100119c272d7db89
-Maximum: fff0ae41d41e6afc
-Total length: 79640
-
-*** Stripe 2 ***
-
---- Column 0 ---
-Column has 5000 values and has null value: no
-
---- Column 1 ---
-Data type: Integer
-Values: 5000
-Has null: no
-Minimum: -2145932387
-Maximum: 2145877119
-Sum: 70064190848
-
---- Column 2 ---
-Data type: String
-Values: 5000
-Has null: no
-Minimum: 10130af874ae036c
-Maximum: fffe9f6f23b287f3
-Total length: 79645
-
-*** Stripe 3 ***
-
---- Column 0 ---
-Column has 5000 values and has null value: no
-
---- Column 1 ---
-Data type: Integer
-Values: 5000
-Has null: no
-Minimum: -2147439072
-Maximum: 2147074354
-Sum: 104681356482
-
---- Column 2 ---
-Data type: String
-Values: 5000
-Has null: no
-Minimum: 102547d48ed06518
-Maximum: fffa47c57dc7b69a
-Total length: 79689
-
-*** Stripe 4 ***
-
---- Column 0 ---
-Column has 1000 values and has null value: no
-
---- Column 1 ---
-Data type: Integer
-Values: 1000
-Has null: no
-Minimum: -2141222223
-Maximum: 2145816096
-Sum: 14973647307
+<p>Displays the contents of the ORC file as a JSON document. With the
+<code>columns</code> argument only the selected columns are printed.</p>
 
---- Column 2 ---
-Data type: String
-Values: 1000
-Has null: no
-Minimum: 1059d81c9025a217
-Maximum: ffc17f0e35e1a6c0
-Total length: 15941
+<pre><code class="language-shell">% orc-contents  [--columns=1,2,...] &lt;filename&gt;
 </code></pre>
 
-          
-
-
-
-
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
-
-  
-  
+<p>If you run it on the example file TestOrcFile.test1.orc, you’ll see (without
+the line breaks within each record):</p>
 
-  
-  
+<pre><code class="language-shell">% orc-contents examples/TestOrcFile.test1.orc
+{"boolean1": false, "byte1": 1, "short1": 1024, "int1": 65536, \\
+ "long1": 9223372036854775807, "float1": 1, "double1": -15, \\
+ "bytes1": [0, 1, 2, 3, 4], "string1": "hi", "middle": \\
+    {"list": [{"int1": 1, "string1": "bye"}, \\
+              {"int1": 2, "string1": "sigh"}]}, \\
+ "list": [{"int1": 3, "string1": "good"}, \\
+          {"int1": 4, "string1": "bad"}], \\
+ "map": []}
+{"boolean1": true, "byte1": 100, "short1": 2048, "int1": 65536, \\
+ "long1": 9223372036854775807, "float1": 2, "double1": -5, \\
+ "bytes1": [], "string1": "bye", \\
+ "middle": {"list": [{"int1": 1, "string1": "bye"}, \\
+                     {"int1": 2, "string1": "sigh"}]}, \\
+ "list": [{"int1": 100000000, "string1": "cat"}, \\
+          {"int1": -100000, "string1": "in"}, \\
+          {"int1": 1234, "string1": "hat"}], \\
+ "map": [{"key": "chani", "value": {"int1": 5, "string1": "chani"}}, \\
+         {"key": "mauddib", \\
+          "value": {"int1": 1, "string1": "mauddib"}}]}
+</code></pre>
 
-  
-  
+<h2 id="orc-metadata">orc-metadata</h2>
 
-  
-  
+<p>Displays the metadata of the ORC file as a JSON document. With the
+<code>verbose</code> option additional information about the layout of the file
+is also printed.</p>
 
-  
-  
+<p>For diagnosing problems, it is useful to use the <code>--raw</code> option that
+prints the protocol buffers from the ORC file directly rather than
+interpreting them.</p>
 
-  
-  
+<pre><code class="language-shell">% orc-metadata [-v] [--raw] &lt;filename&gt;
+</code></pre>
 
-  
-  
-    <div class="section-nav">
-      <div class="left align-right">
-          
-            
-            
-            <a href="/docs/core-cpp.html" class="prev">Back</a>
-          
-      </div>
-      <div class="right align-left">
-          
-            
-            
-            <a href="/docs/java-tools.html" class="next">Next</a>
-          
-      </div>
-    </div>
-    <div class="clear"></div>
-    
+<p>If you run it on the example file TestOrcFile.test1.orc, you’ll see:</p>
 
-        </article>
-      </div>
+<pre><code class="language-shell">% orc-metadata examples/TestOrcFile.test1.orc
+{ "name": "../examples/TestOrcFile.test1.orc",
+  "type": "struct&lt;boolean1:boolean,byte1:tinyint,short1:smallint,
+int1:int,long1:bigint,float1:float,double1:double,bytes1:binary,
+string1:string,middle:struct&lt;list:array&lt;struct&lt;int1:int,string1:
+string&gt;&gt;&gt;,list:array&lt;struct&lt;int1:int,string1:string&gt;&gt;,map:map&lt;
+string,struct&lt;int1:int,string1:string&gt;&gt;&gt;",
+  "rows": 2,
+  "stripe count": 1,
+  "format": "0.12", "writer version": "HIVE-8732",
+  "compression": "zlib", "compression block": 10000,
+  "file length": 1711,
+  "content": 1015, "stripe stats": 250, "footer": 421, "postscript": 24,
+  "row index stride": 10000,
+  "user metadata": {
+  },
+  "stripes": [
+    { "stripe": 0, "rows": 2,
+      "offset": 3, "length": 1012,
+      "index": 570, "data": 243, "footer": 199
+    }
+  ]
+}
+</code></pre>
 
-      <div class="unit one-fifth hide-on-mobiles">
-  <aside>
-    
-    <h4>Overview</h4>
-    
+<h2 id="csv-import">csv-import</h2>
 
-<ul>
+<p>Imports a CSV file into an ORC file using the specified schema.
+Compound types are not yet supported. The <code>delimiter</code> option sets
+the delimiter of the input CSV file and defaults to <code>,</code>. The <code>stripe</code>
+option sets the stripe size and defaults to 128MB. The <code>block</code>
+option sets the compression block size and defaults to 64KB. The <code>batch</code>
+option sets the number of rows per batch and defaults to 1024.</p>
 
-  
+<pre><code class="language-shell">% csv-import [--delimiter=&lt;character&gt;] [--stripe=&lt;size&gt;]
+             [--block=&lt;size&gt;] [--batch=&lt;size&gt;]
+             &lt;schema&gt; &lt;inputCSVFile&gt; &lt;outputORCFile&gt;
+</code></pre>
 
-  
-    
-  
+<p>If you run it on the example file TestCSVFileImport.test10rows.csv,
+you’ll see:</p>
 
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/index.html">Background</a></li>
-      
+<pre><code class="language-shell">% csv-import "struct&lt;a:bigint,b:string,c:double&gt;"
+             examples/TestCSVFileImport.test10rows.csv /tmp/test.orc
+[2018-04-11 11:12:16] Start importing Orc file...
+[2018-04-11 11:12:16] Finish importing Orc file.
+[2018-04-11 11:12:16] Total writer elasped time: 0.001352s.
+[2018-04-11 11:12:16] Total writer CPU time: 0.001339s.
+</code></pre>
 
+<h2 id="orc-scan">orc-scan</h2>
 
-  
+<p>Scans the ORC file and displays its row count. The <code>batch</code> option
+sets the batch size, which is 1024 rows by default. The tool is useful for checking
+whether the ORC file is damaged.</p>
 
-  
-    
-  
+<pre><code class="language-shell">% orc-scan [--batch=&lt;size&gt;] &lt;filename&gt;
+</code></pre>
 
-  
-    
-  
-    
-      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
-      
+<p>If you run it on the example file TestOrcFile.test1.orc, you’ll see:</p>
 
+<pre><code class="language-shell">% orc-scan examples/TestOrcFile.test1.orc
+Rows: 2
+Batches: 1
+</code></pre>
 
-  
+<h2 id="orc-statistics">orc-statistics</h2>
 
-  
-    
-  
+<p>Displays the file-level and stripe-level column statistics of the ORC file.
+The <code>withIndex</code> option also includes the column statistics of each row group.</p>
 
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/types.html">Types</a></li>
-      
+<pre><code class="language-shell">% orc-statistics [--withIndex] &lt;filename&gt;
+</code></pre>
 
+<p>If you run it on the example file TestOrcFile.columnProjection.orc,
+you’ll see:</p>
 
-  
+<pre><code class="language-shell">% orc-statistics examples/TestOrcFile.columnProjection.orc
+File examples/TestOrcFile.columnProjection.orc has 3 columns
+*** Column 0 ***
+Column has 21000 values and has null value: no
 
-  
-    
-  
+*** Column 1 ***
+Data type: Integer
+Values: 21000
+Has null: no
+Minimum: -2147439072
+Maximum: 2147257982
+Sum: 268482658568
 
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
-      
+*** Column 2 ***
+Data type: String
+Values: 21000
+Has null: no
+Minimum: 100119c272d7db89
+Maximum: fffe9f6f23b287f3
+Total length: 334559
 
+File examples/TestOrcFile.columnProjection.orc has 5 stripes
+*** Stripe 0 ***
 
-  
+--- Column 0 ---
+Column has 5000 values and has null value: no
 
-  
-    
-  
+--- Column 1 ---
+Data type: Integer
+Values: 5000
+Has null: no
+Minimum: -2145365268
+Maximum: 2147025027
+Sum: -29841423854
 
-  
-    
-      <li class=""><a href="/docs/acid.html">ACID support</a></li>
-      
+--- Column 2 ---
+Data type: String
+Values: 5000
+Has null: no
+Minimum: 1005350489418be2
+Maximum: fffbb8718c92b09f
+Total length: 79644
 
+*** Stripe 1 ***
 
-</ul>
+--- Column 0 ---
+Column has 5000 values and has null value: no
 
-    
-    <h4>Installing</h4>
-    
+--- Column 1 ---
+Data type: Integer
+Values: 5000
+Has null: no
+Minimum: -2147115959
+Maximum: 2147257982
+Sum: 108604887785
 
-<ul>
+--- Column 2 ---
+Data type: String
+Values: 5000
+Has null: no
+Minimum: 100119c272d7db89
+Maximum: fff0ae41d41e6afc
+Total length: 79640
 
-  
+*** Stripe 2 ***
 
-  
-    
-  
+--- Column 0 ---
+Column has 5000 values and has null value: no
 
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/building.html">Building ORC</a></li>
-      
+--- Column 1 ---
+Data type: Integer
+Values: 5000
+Has null: no
+Minimum: -2145932387
+Maximum: 2145877119
+Sum: 70064190848
 
+--- Column 2 ---
+Data type: String
+Values: 5000
+Has null: no
+Minimum: 10130af874ae036c
+Maximum: fffe9f6f23b287f3
+Total length: 79645
 
-  
+*** Stripe 3 ***
 
-  
-    
-  
+--- Column 0 ---
+Column has 5000 values and has null value: no
 
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/releases.html">Releases</a></li>
-      
+--- Column 1 ---
+Data type: Integer
+Values: 5000
+Has null: no
+Minimum: -2147439072
+Maximum: 2147074354
+Sum: 104681356482
 
+--- Column 2 ---
+Data type: String
+Values: 5000
+Has null: no
+Minimum: 102547d48ed06518
+Maximum: fffa47c57dc7b69a
+Total length: 79689
+
+*** Stripe 4 ***
+
+--- Column 0 ---
+Column has 1000 values and has null value: no
+
+--- Column 1 ---
+Data type: Integer
+Values: 1000
+Has null: no
+Minimum: -2141222223
+Maximum: 2145816096
+Sum: 14973647307
+
+--- Column 2 ---
+Data type: String
+Values: 1000
+Has null: no
+Minimum: 1059d81c9025a217
+Maximum: ffc17f0e35e1a6c0
+Total length: 15941
+</code></pre>
+
+          
 
-</ul>
 
-    
-    <h4>Using in Hive</h4>
-    
 
-<ul>
 
-  
 
   
-    
   
 
   
-    
-  
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
-      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
-      
 
+  
+  
 
   
+  
 
   
-    
   
 
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
-    
+
   
-    
   
+    <div class="section-nav">
+      <div class="left align-right">
+          
+            
+            
+            <a href="/docs/core-cpp.html" class="prev">Back</a>
+          
+      </div>
+      <div class="right align-left">
+          
+            
+            
+            <a href="/docs/java-tools.html" class="next">Next</a>
+          
+      </div>
+    </div>
+    <div class="clear"></div>
     
-      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
-      
 
+        </article>
+      </div>
 
-</ul>
-
+      <div class="unit one-fifth hide-on-mobiles">
+  <aside>
     
-    <h4>Using in MapReduce</h4>
+    <h4>Overview</h4>
     
 
 <ul>
@@ -1891,19 +1028,21 @@ Total length: 15941
     
   
     
+      <li class=""><a href="/docs/index.html">Background</a></li>
+      
+
+
   
-    
-  
-    
+
   
     
   
-    
+
   
     
   
     
-      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
+      <li class=""><a href="/docs/adopters.html">ORC Adopters</a></li>
       
 
 
@@ -1943,20 +1082,10 @@ Total length: 15941
     
   
     
-  
-    
-      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
+      <li class=""><a href="/docs/types.html">Types</a></li>
       
 
 
-</ul>
-
-    
-    <h4>Using ORC Core</h4>
-    
-
-<ul>
-
   
 
   
@@ -1975,34 +1104,34 @@ Total length: 15941
     
   
     
-      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
-      
-
-
-  
-
   
     
   
-
-  
     
   
     
   
     
+      <li class=""><a href="/docs/indexes.html">Indexes</a></li>
+      
+
+
+  
+
   
     
   
+
+  
     
-      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      <li class=""><a href="/docs/acid.html">ACID support</a></li>
       
 
 
 </ul>
 
     
-    <h4>Tools</h4>
+    <h4>Installing</h4>
     
 
 <ul>
@@ -2019,15 +1148,7 @@ Total length: 15941
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class="current"><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      <li class=""><a href="/docs/building.html">Building ORC</a></li>
       
 
 
@@ -2065,14 +1186,14 @@ Total length: 15941
     
   
     
-      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>
+      <li class=""><a href="/docs/releases.html">Releases</a></li>
       
 
 
 </ul>
 
     
-    <h4>Format Specification</h4>
+    <h4>Using in Hive</h4>
     
 
 <ul>
@@ -2099,31 +1220,7 @@ Total length: 15941
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/spec-intro.html">Introduction</a></li>
+      <li class=""><a href="/docs/hive-ddl.html">Hive DDL</a></li>
       
 
 
@@ -2147,31 +1244,17 @@ Total length: 15941
     
   
     
-  
-    
-  
-    
-      <li class=""><a href="/docs/file-tail.html">File Tail</a></li>
+      <li class=""><a href="/docs/hive-config.html">Hive Configuration</a></li>
       
 
 
-  
-
-  
-    
-  
+</ul>
 
-  
-    
-  
     
-  
-    
-  
+    <h4>Using in MapReduce</h4>
     
-      <li class=""><a href="/docs/compression.html">Compression</a></li>
-      
 
+<ul>
 
   
 
@@ -2203,19 +1286,7 @@ Total length: 15941
     
   
     
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-  
-    
-      <li class=""><a href="/docs/run-length.html">Run Length Encoding</a></li>
+      <li class=""><a href="/docs/mapred.html">Using in MapRed</a></li>
       
 
 
@@ -2251,13 +1322,25 @@ Total length: 15941
     
   
     
-  
+      <li class=""><a href="/docs/mapreduce.html">Using in MapReduce</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Using ORC Core</h4>
     
+
+<ul>
+
+  
+
   
     
   
+
+  
     
   
     
@@ -2267,7 +1350,7 @@ Total length: 15941
     
   
     
-      <li class=""><a href="/docs/stripes.html">Stripes</a></li>
+      <li class=""><a href="/docs/core-java.html">Using Core Java</a></li>
       
 
 
@@ -2285,17 +1368,17 @@ Total length: 15941
     
   
     
-  
-    
-  
-    
-  
+      <li class=""><a href="/docs/core-cpp.html">Using Core C++</a></li>
+      
+
+
+</ul>
+
     
-  
+    <h4>Tools</h4>
     
-      <li class=""><a href="/docs/encodings.html">Column Encodings</a></li>
-      
 
+<ul>
 
   
 
@@ -2315,11 +1398,17 @@ Total length: 15941
     
   
     
+      <li class="current"><a href="/docs/cpp-tools.html">C++ Tools</a></li>
+      
+
+
   
-    
+
   
     
   
+
+  
     
   
     
@@ -2341,7 +1430,7 @@ Total length: 15941
     
   
     
-      <li class=""><a href="/docs/spec-index.html">Indexes</a></li>
+      <li class=""><a href="/docs/java-tools.html">Java Tools</a></li>