Posted to commits@orc.apache.org by do...@apache.org on 2022/11/18 03:25:50 UTC

[orc] branch asf-site updated: Update website with ORC-1283 and ORC-1295

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/orc.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new c7d6b42b5 Update website with ORC-1283 and ORC-1295
c7d6b42b5 is described below

commit c7d6b42b5d7219a6fd98a33a9178c0e1d3794ccb
Author: Dongjoon Hyun <do...@apache.org>
AuthorDate: Thu Nov 17 19:25:41 2022 -0800

    Update website with ORC-1283 and ORC-1295
---
 docs/hive-config.html          |   2 +-
 docs/java-tools.html           |  17 +++++++++++++++++
 docs/spark-config.html         |   2 +-
 img/Direct.png                 | Bin 0 -> 64400 bytes
 specification/ORCv1/index.html |   7 +++++++
 specification/ORCv2/index.html |   7 +++++++
 6 files changed, 33 insertions(+), 2 deletions(-)
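
For readers checking the note added to the ORCv1 and ORCv2 specification pages in the diff below, here is a minimal sketch of how the direct-encoding header in the quoted byte sequence decodes. It is not the ORC reader's implementation. The function name `decode_direct` and the table name `WIDTHS` are hypothetical. The sketch assumes unsigned values (no zigzag step) and takes the width table from the ORC specification's 5-bit width encoding.

```python
# Bit widths for the 5-bit width code (codes 0..31), per the ORC
# specification's table for non-delta encodings.
WIDTHS = list(range(1, 25)) + [26, 28, 30, 32, 40, 48, 56, 64]


def decode_direct(data):
    """Decode one RLEv2 direct-encoded run of unsigned integers (sketch)."""
    header = data[0]
    # The top two bits select the sub-encoding; 1 means direct.
    if header >> 6 != 1:
        raise ValueError("not a direct-encoded run")
    # The next five bits hold the width code.
    width = WIDTHS[(header >> 1) & 0x1F]
    # Nine-bit stored length: low bit of byte 0, then all of byte 1.
    stored_length = ((header & 0x01) << 8) | data[1]
    # The stored length is off by one: the actual run length is stored + 1.
    length = stored_length + 1
    total_bits = (len(data) - 2) * 8
    if total_bits < length * width:
        raise ValueError("payload too short for the declared run")
    # Values are packed big-endian, most significant bit first.
    bits = int.from_bytes(data[2:], "big")
    mask = (1 << width) - 1
    values = []
    for i in range(length):
        shift = total_bits - (i + 1) * width
        values.append((bits >> shift) & mask)
    return values


example = bytes([0x5E, 0x03, 0x5C, 0xA1, 0xAB, 0x1E, 0xDE, 0xAD, 0xBE, 0xEF])
print(decode_direct(example))  # [23713, 43806, 57005, 48879]
```

The header byte 0x5e is 0b01011110: encoding 1, width code 15 (16 bits), and a high length bit of 0. The second byte 0x03 is the stored length, so the run holds 3 + 1 = 4 values.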

diff --git a/docs/hive-config.html b/docs/hive-config.html
index f184aaf63..d6e7f1a77 100644
--- a/docs/hive-config.html
+++ b/docs/hive-config.html
@@ -954,7 +954,7 @@ with the same options.</p>
     <tr>
       <td style="text-align: left">orc.create.index</td>
       <td style="text-align: left">true</td>
-      <td style="text-align: left">create indexes?</td>
+      <td style="text-align: left">whether the ORC writer creates indexes as part of the file</td>
     </tr>
     <tr>
       <td style="text-align: left">orc.bloom.filter.columns</td>
diff --git a/docs/java-tools.html b/docs/java-tools.html
index aef32a626..8ca6ad5ea 100644
--- a/docs/java-tools.html
+++ b/docs/java-tools.html
@@ -931,6 +931,7 @@ supports both the local file system and HDFS.</p>
   <li>key (since ORC 1.5) - print information about the encryption keys</li>
   <li>meta - print the metadata of an ORC file</li>
   <li>scan (since ORC 1.3) - scan the data for benchmarking</li>
+  <li>sizes (since ORC 1.7.2) - list size on disk of each column</li>
   <li>version (since ORC 1.6) - print the version of this ORC tool</li>
 </ul>
 
@@ -1211,6 +1212,22 @@ cost of printing the data out.</p>
   <dd>Print exceptions</dd>
 </dl>
 
+<h2 id="java-sizes">Java Sizes</h2>
+
+<p>The sizes command lists the size on disk of each column. The output covers not
+only the column data of the table, but also the size of other file sections such as <code class="highlighter-rouge">padding</code>,
+<code class="highlighter-rouge">stripeFooter</code>, <code class="highlighter-rouge">fileFooter</code>, <code class="highlighter-rouge">stripeIndex</code> and <code class="highlighter-rouge">stripeData</code>.</p>
+
+<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% java <span class="nt">-jar</span> orc-tools-X.Y.Z-uber.jar sizes examples/my-file.orc
+Percent  Bytes/Row  Name
+  98.45  2.62       y
+  0.81   0.02       _file_footer
+  0.30   0.01       _index
+  0.25   0.01       x
+  0.19   0.01       _stripe_footer
+______________________________________________________________________
+</code></pre></div></div>
+
 <h2 id="java-version">Java Version</h2>
 
 <p>The version command prints the version of this ORC tool.</p>
diff --git a/docs/spark-config.html b/docs/spark-config.html
index bc3e1f0ed..e7298576f 100644
--- a/docs/spark-config.html
+++ b/docs/spark-config.html
@@ -954,7 +954,7 @@ with the same options.</p>
     <tr>
       <td style="text-align: left">orc.create.index</td>
       <td style="text-align: left">true</td>
-      <td style="text-align: left">create indexes?</td>
+      <td style="text-align: left">whether the ORC writer creates indexes as part of the file</td>
     </tr>
     <tr>
       <td style="text-align: left">orc.bloom.filter.columns</td>
diff --git a/img/Direct.png b/img/Direct.png
new file mode 100644
index 000000000..eadf5ff87
Binary files /dev/null and b/img/Direct.png differ
diff --git a/specification/ORCv1/index.html b/specification/ORCv1/index.html
index b08f73988..a7b29ee4c 100644
--- a/specification/ORCv1/index.html
+++ b/specification/ORCv1/index.html
@@ -1026,6 +1026,13 @@ serialized with direct encoding (1), a width of 16 bits (15), and
 length of 4 (3) as [0x5e, 0x03, 0x5c, 0xa1, 0xab, 0x1e, 0xde, 0xad,
 0xbe, 0xef].</p>
 
+<blockquote>
+  <p>Note: the stored run length is off by one. The header stores 3, and adding 1 gives the actual run length of 4
+(see <a href="https://github.com/apache/hive/commit/69deabeaac020ba60b0f2156579f53e9fe46157a#diff-c00fea1863eaf0d6f047535e874274199020ffed3eb00deb897f513aa86f6b59R232-R236">HIVE-4123</a>).</p>
+</blockquote>
+
+<p><img src="/img/Direct.png" alt="Direct" /></p>
+
 <h3 id="patched-base">Patched Base</h3>
 
 <p>The patched base encoding is used for integer sequences whose bit
diff --git a/specification/ORCv2/index.html b/specification/ORCv2/index.html
index e95b14949..d886115ae 100644
--- a/specification/ORCv2/index.html
+++ b/specification/ORCv2/index.html
@@ -1050,6 +1050,13 @@ serialized with direct encoding (1), a width of 16 bits (15), and
 length of 4 (3) as [0x5e, 0x03, 0x5c, 0xa1, 0xab, 0x1e, 0xde, 0xad,
 0xbe, 0xef].</p>
 
+<blockquote>
+  <p>Note: the stored run length is off by one. The header stores 3, and adding 1 gives the actual run length of 4
+(see <a href="https://github.com/apache/hive/commit/69deabeaac020ba60b0f2156579f53e9fe46157a#diff-c00fea1863eaf0d6f047535e874274199020ffed3eb00deb897f513aa86f6b59R232-R236">HIVE-4123</a>).</p>
+</blockquote>
+
+<p><img src="/img/Direct.png" alt="Direct" /></p>
+
 <h3 id="patched-base">Patched Base</h3>
 
 <p>The patched base encoding is used for integer sequences whose bit