Posted to commits@orc.apache.org by om...@apache.org on 2017/06/19 22:33:33 UTC
orc git commit: Deploying site with Dain's documentation updates.
Repository: orc
Updated Branches:
refs/heads/asf-site 28d825c2c -> cce469c77
Deploying site with Dain's documentation updates.
Signed-off-by: Owen O'Malley <om...@apache.org>
Project: http://git-wip-us.apache.org/repos/asf/orc/repo
Commit: http://git-wip-us.apache.org/repos/asf/orc/commit/cce469c7
Tree: http://git-wip-us.apache.org/repos/asf/orc/tree/cce469c7
Diff: http://git-wip-us.apache.org/repos/asf/orc/diff/cce469c7
Branch: refs/heads/asf-site
Commit: cce469c77b64510795617ea31c8ce753b402e3a8
Parents: 28d825c
Author: Owen O'Malley <om...@apache.org>
Authored: Mon Jun 19 15:32:48 2017 -0700
Committer: Owen O'Malley <om...@apache.org>
Committed: Mon Jun 19 15:32:48 2017 -0700
----------------------------------------------------------------------
docs/compression.html | 9 +++++----
docs/encodings.html | 9 ++++++---
docs/file-tail.html | 2 +-
3 files changed, 12 insertions(+), 8 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/orc/blob/cce469c7/docs/compression.html
----------------------------------------------------------------------
diff --git a/docs/compression.html b/docs/compression.html
index 94aeefe..e96d5b1 100644
--- a/docs/compression.html
+++ b/docs/compression.html
@@ -1080,10 +1080,11 @@ start decompressing without the previous bytes.</p>
<p><img src="/img/CompressionStream.png" alt="compression streams" /></p>
<p>The default compression chunk size is 256K, but writers can choose
-their own value less than 223. Larger chunks lead to better
-compression, but require more memory. The chunk size is recorded in
-the Postscript so that readers can allocate appropriately sized
-buffers.</p>
+their own value. Larger chunks lead to better compression, but require
+more memory. The chunk size is recorded in the Postscript so that
+readers can allocate appropriately sized buffers. Readers are
+guaranteed that no chunk will expand to more than the compression chunk
+size.</p>
<p>ORC files without generic compression write each stream directly
with no headers.</p>
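The compression.html change guarantees that no chunk expands beyond the compression chunk size. That guarantee comes from how ORC frames each chunk. A 3-byte little-endian header stores the chunk length shifted left by one bit. The low bit is set when the bytes are stored original (uncompressed) because compression did not make them smaller. The sketch below is illustrative only. It uses Python's zlib in raw-deflate mode as an assumed stand-in codec, and the helper names (encode_header, decode_header, write_chunk, read_chunk) are hypothetical, not part of any ORC library.

```python
import zlib


def encode_header(length: int, is_original: bool) -> bytes:
    # 3-byte little-endian value: chunk length * 2, plus 1 if stored original.
    if length >= 1 << 23:
        raise ValueError("chunk length must fit in 23 bits")
    value = (length << 1) | int(is_original)
    return value.to_bytes(3, "little")


def decode_header(header: bytes) -> tuple[int, bool]:
    # Inverse of encode_header: returns (chunk length, is_original flag).
    value = int.from_bytes(header[:3], "little")
    return value >> 1, bool(value & 1)


def write_chunk(raw: bytes, chunk_size: int = 256 * 1024) -> bytes:
    # Hypothetical writer step: compress one chunk, but fall back to the
    # original bytes whenever compression would not shrink them, so the
    # stored body never exceeds the input length (and thus the chunk size).
    if len(raw) > chunk_size:
        raise ValueError("input exceeds the compression chunk size")
    comp = zlib.compressobj(level=6, wbits=-15)  # raw deflate, assumed codec
    compressed = comp.compress(raw) + comp.flush()
    if len(compressed) < len(raw):
        return encode_header(len(compressed), False) + compressed
    return encode_header(len(raw), True) + raw


def read_chunk(data: bytes) -> bytes:
    # Hypothetical reader step: parse the header, then copy or decompress.
    length, is_original = decode_header(data)
    body = data[3:3 + length]
    if is_original:
        return body
    return zlib.decompress(body, wbits=-15)
```

For example, encode_header(100000, False) yields the bytes 0x40 0x0d 0x03. A tiny input such as b"xy" does not shrink under deflate, so write_chunk stores it original with the low header bit set.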
http://git-wip-us.apache.org/repos/asf/orc/blob/cce469c7/docs/encodings.html
----------------------------------------------------------------------
diff --git a/docs/encodings.html b/docs/encodings.html
index b9fa2b0..bcc663a 100644
--- a/docs/encodings.html
+++ b/docs/encodings.html
@@ -1139,9 +1139,12 @@ bytes.</p>
<h2 id="string-char-and-varchar-columns">String, Char, and VarChar Columns</h2>
-<p>String columns are adaptively encoded based on whether the first
-10,000 values are sufficiently distinct. In all of the encodings, the
-PRESENT stream encodes whether the value is null.</p>
+<p>String, char, and varchar columns may be encoded either using a
+dictionary encoding or a direct encoding. A direct encoding should be
+preferred when there are many distinct values. In all of the
+encodings, the PRESENT stream encodes whether the value is null. The
+Java ORC writer automatically picks the encoding after the first row
+group (10,000 rows).</p>
<p>For direct encoding the UTF-8 bytes are saved in the DATA stream and
the length of each value is written into the LENGTH stream. In direct
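The encodings.html change says the Java writer picks between dictionary and direct encoding after the first row group of 10,000 rows, and that direct encoding should be preferred when there are many distinct values. The sketch below is one way to express that decision. It assumes the choice compares the ratio of distinct to non-null values in the first row group against a threshold. The 0.8 threshold, the skipping of nulls (which the PRESENT stream tracks separately), and the function name choose_string_encoding are all assumptions for illustration, not taken from the ORC source.

```python
from typing import Optional, Sequence


def choose_string_encoding(values: Sequence[Optional[str]],
                           row_group_size: int = 10_000,
                           threshold: float = 0.8) -> str:
    # Look only at the first row group; later rows do not affect the choice.
    sample = [v for v in values[:row_group_size] if v is not None]
    if not sample:
        # Assumption: with no non-null values, default to dictionary encoding.
        return "DICTIONARY"
    # Few distinct values relative to the sample favour a dictionary;
    # many distinct values favour writing the bytes directly.
    ratio = len(set(sample)) / len(sample)
    return "DICTIONARY" if ratio <= threshold else "DIRECT"
```

A low-cardinality column such as repeated country codes would come out as DICTIONARY, while a column of unique identifiers would come out as DIRECT.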
http://git-wip-us.apache.org/repos/asf/orc/blob/cce469c7/docs/file-tail.html
----------------------------------------------------------------------
diff --git a/docs/file-tail.html b/docs/file-tail.html
index b4cf021..2fc4461 100644
--- a/docs/file-tail.html
+++ b/docs/file-tail.html
@@ -1230,7 +1230,7 @@ that contains the list of their children’s type ids.</p>
repeated uint32 subtypes = 2 [packed=true];
// the list of field names for struct
repeated string fieldNames = 3;
- // the maximum length of the type for varchar or char
+ // the maximum length of the type for varchar or char in UTF-8 characters
optional uint32 maximumLength = 4;
// the precision and scale for decimal
optional uint32 precision = 5;
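The file-tail.html change clarifies that maximumLength for char and varchar counts UTF-8 characters, not bytes. A conforming value can therefore occupy more than maximumLength bytes once encoded. The sketch below assumes "characters" means Unicode code points, which is what Python's str length and slicing count. enforce_max_length and utf8_byte_length are hypothetical helpers, not ORC APIs.

```python
def enforce_max_length(value: str, maximum_length: int) -> str:
    # Python str slicing works on code points, so this keeps at most
    # maximum_length characters regardless of their UTF-8 byte width.
    return value[:maximum_length]


def utf8_byte_length(value: str) -> int:
    # Number of bytes the value occupies once encoded as UTF-8.
    return len(value.encode("utf-8"))


if __name__ == "__main__":
    # varchar(2): two characters fit, even though they take 6 UTF-8 bytes.
    kept = enforce_max_length("日本語", 2)
    print(kept, len(kept), utf8_byte_length(kept))
```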