Posted to commits@orc.apache.org by om...@apache.org on 2017/06/19 22:31:41 UTC

orc git commit: Fix the documentation issues that Dain brought up.

Repository: orc
Updated Branches:
  refs/heads/master 54c54775a -> cdfc1ea47


Fix the documentation issues that Dain brought up.

Fixes #133

Signed-off-by: Owen O'Malley <om...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/orc/repo
Commit: http://git-wip-us.apache.org/repos/asf/orc/commit/cdfc1ea4
Tree: http://git-wip-us.apache.org/repos/asf/orc/tree/cdfc1ea4
Diff: http://git-wip-us.apache.org/repos/asf/orc/diff/cdfc1ea4

Branch: refs/heads/master
Commit: cdfc1ea47584d5aee2e2dc3dcca597d53ba5527a
Parents: 54c5477
Author: Owen O'Malley <om...@apache.org>
Authored: Mon Jun 19 13:18:43 2017 -0700
Committer: Owen O'Malley <om...@apache.org>
Committed: Mon Jun 19 15:30:40 2017 -0700

----------------------------------------------------------------------
 site/_docs/compression.md | 9 +++++----
 site/_docs/encodings.md   | 9 ++++++---
 site/_docs/file-tail.md   | 2 +-
 3 files changed, 12 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/orc/blob/cdfc1ea4/site/_docs/compression.md
----------------------------------------------------------------------
diff --git a/site/_docs/compression.md b/site/_docs/compression.md
index 62cc199..aee2640 100644
--- a/site/_docs/compression.md
+++ b/site/_docs/compression.md
@@ -23,10 +23,11 @@ start decompressing without the previous bytes.
 ![compression streams]({{ site.url }}/img/CompressionStream.png)
 
 The default compression chunk size is 256K, but writers can choose
-their own value less than 2^23. Larger chunks lead to better
-compression, but require more memory. The chunk size is recorded in
-the Postscript so that readers can allocate appropriately sized
-buffers.
+their own value. Larger chunks lead to better compression, but require
+more memory. The chunk size is recorded in the Postscript so that
+readers can allocate appropriately sized buffers. Readers are
+guaranteed that no chunk will expand to more than the compression chunk
+size.
 
 ORC files without generic compression write each stream directly
 with no headers.
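
Illustration (not part of the commit): the old "less than 2^23" limit comes from the 3-byte header that precedes each compressed chunk. Per the ORC specification, that header stores (chunk_length * 2 + is_original) as a little-endian value. A reader that knows the chunk size from the Postscript's compressionBlockSize can therefore allocate one buffer of that size for every chunk. The function and constant names below are hypothetical and come from no ORC library. This is only a sketch of the header arithmetic.

```python
from typing import Tuple

# 3 bytes = 24 bits; the low bit is the "original" flag, leaving 23 bits of length.
MAX_CHUNK_LENGTH = (1 << 23) - 1


def encode_chunk_header(chunk_length: int, is_original: bool) -> bytes:
    """Build the 3-byte header that precedes a chunk in a compressed stream."""
    if not 0 <= chunk_length <= MAX_CHUNK_LENGTH:
        raise ValueError(f"chunk length {chunk_length} does not fit in 23 bits")
    value = chunk_length * 2 + (1 if is_original else 0)
    return value.to_bytes(3, "little")


def decode_chunk_header(header: bytes) -> Tuple[int, bool]:
    """Return (chunk_length, is_original) parsed from a 3-byte header."""
    if len(header) != 3:
        raise ValueError("ORC chunk headers are exactly 3 bytes")
    value = int.from_bytes(header, "little")
    return value >> 1, bool(value & 1)
```

For example, a chunk that compresses to 100,000 bytes gets the header bytes 0x40 0x0d 0x03. With the default 256K chunk size, every stored length is at most 262,144, well below the 23-bit ceiling.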

http://git-wip-us.apache.org/repos/asf/orc/blob/cdfc1ea4/site/_docs/encodings.md
----------------------------------------------------------------------
diff --git a/site/_docs/encodings.md b/site/_docs/encodings.md
index 285ca71..9c565dc 100644
--- a/site/_docs/encodings.md
+++ b/site/_docs/encodings.md
@@ -32,9 +32,12 @@ DIRECT    | PRESENT     | Yes      | Boolean RLE
 
 ## String, Char, and VarChar Columns
 
-String columns are adaptively encoded based on whether the first
-10,000 values are sufficiently distinct. In all of the encodings, the
-PRESENT stream encodes whether the value is null.
+String, char, and varchar columns may be encoded either using a
+dictionary encoding or a direct encoding. A direct encoding should be
+preferred when there are many distinct values. In all of the
+encodings, the PRESENT stream encodes whether the value is null. The
+Java ORC writer automatically picks the encoding after the first row
+group (10,000 rows).
 
 For direct encoding the UTF-8 bytes are saved in the DATA stream and
 the length of each value is written into the LENGTH stream. In direct

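Illustration (not part of the commit): the sketch below shows one way a writer could pick between the two string encodings after sampling the first row group. It also shows how a direct encoding splits values into the DATA and LENGTH streams. The distinct-ratio rule is an assumption, modeled loosely on the Java writer's orc.dictionary.key.threshold setting (default 0.8). It is not the exact Java implementation. The function names are hypothetical, and the RLE and compression that ORC applies to these streams are left out.

```python
from typing import List, Optional, Sequence, Tuple


def choose_string_encoding(first_row_group: Sequence[Optional[str]],
                           key_threshold: float = 0.8) -> str:
    """Pick DICTIONARY_V2 or DIRECT_V2 from the distinct-value ratio of a sample."""
    # Nulls are recorded in the PRESENT stream, so only non-null values count.
    non_null = [v for v in first_row_group if v is not None]
    if not non_null:
        return "DICTIONARY_V2"  # arbitrary choice for an all-null sample in this sketch
    distinct_ratio = len(set(non_null)) / len(non_null)
    # With many distinct values a dictionary saves little, so fall back to direct.
    return "DICTIONARY_V2" if distinct_ratio <= key_threshold else "DIRECT_V2"


def direct_encode(values: Sequence[Optional[str]]) -> Tuple[bytes, List[int]]:
    """Split non-null strings into DATA (concatenated UTF-8) and LENGTH (byte counts)."""
    encoded = [v.encode("utf-8") for v in values if v is not None]
    return b"".join(encoded), [len(b) for b in encoded]
```

A sample such as ["a", "b", "a", None] repeated many times stays on the dictionary encoding. In contrast, 10,000 unique strings switch to the direct encoding. Setting key_threshold to 1.0 keeps the dictionary unconditionally in this sketch.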
http://git-wip-us.apache.org/repos/asf/orc/blob/cdfc1ea4/site/_docs/file-tail.md
----------------------------------------------------------------------
diff --git a/site/_docs/file-tail.md b/site/_docs/file-tail.md
index d2700bb..316c001 100644
--- a/site/_docs/file-tail.md
+++ b/site/_docs/file-tail.md
@@ -173,7 +173,7 @@ that contains the list of their children's type ids.
  repeated uint32 subtypes = 2 [packed=true];
  // the list of field names for struct
  repeated string fieldNames = 3;
- // the maximum length of the type for varchar or char
+ // the maximum length of the type for varchar or char in UTF-8 characters
  optional uint32 maximumLength = 4;
  // the precision and scale for decimal
  optional uint32 precision = 5;
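
Illustration (not part of the commit): the clarified comment says maximumLength counts characters, not bytes. A value can therefore satisfy varchar(5) while its UTF-8 form is longer than 5 bytes. The sketch reads "UTF-8 characters" as Unicode code points, which is what Python's len() counts on a str. The helper names are hypothetical. Truncation is shown only as one possible policy, because the commit does not say whether writers truncate or reject overlong values.

```python
def fits_maximum_length(value: str, maximum_length: int) -> bool:
    """True if value has at most maximum_length characters, whatever its byte size."""
    return len(value) <= maximum_length


def truncate_to_maximum_length(value: str, maximum_length: int) -> str:
    """Hypothetical policy: keep the first maximum_length characters (code points)."""
    return value[:maximum_length]
```

For example, "h\u00e9llo" has 5 characters but 6 UTF-8 bytes, so it fits a maximumLength of 5. Counting in characters also means truncation never cuts a multi-byte UTF-8 sequence in half, which a byte-based limit could do.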