Posted to commits@orc.apache.org by do...@apache.org on 2022/11/18 03:25:50 UTC
[orc] branch asf-site updated: Update website with ORC-1283 and ORC-1295
This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/orc.git
The following commit(s) were added to refs/heads/asf-site by this push:
new c7d6b42b5 Update website with ORC-1283 and ORC-1295
c7d6b42b5 is described below
commit c7d6b42b5d7219a6fd98a33a9178c0e1d3794ccb
Author: Dongjoon Hyun <do...@apache.org>
AuthorDate: Thu Nov 17 19:25:41 2022 -0800
Update website with ORC-1283 and ORC-1295
---
docs/hive-config.html | 2 +-
docs/java-tools.html | 17 +++++++++++++++++
docs/spark-config.html | 2 +-
img/Direct.png | Bin 0 -> 64400 bytes
specification/ORCv1/index.html | 7 +++++++
specification/ORCv2/index.html | 7 +++++++
6 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/docs/hive-config.html b/docs/hive-config.html
index f184aaf63..d6e7f1a77 100644
--- a/docs/hive-config.html
+++ b/docs/hive-config.html
@@ -954,7 +954,7 @@ with the same options.</p>
<tr>
<td style="text-align: left">orc.create.index</td>
<td style="text-align: left">true</td>
- <td style="text-align: left">create indexes?</td>
+ <td style="text-align: left">whether the ORC writer creates indexes as part of the file</td>
</tr>
<tr>
<td style="text-align: left">orc.bloom.filter.columns</td>
diff --git a/docs/java-tools.html b/docs/java-tools.html
index aef32a626..8ca6ad5ea 100644
--- a/docs/java-tools.html
+++ b/docs/java-tools.html
@@ -931,6 +931,7 @@ supports both the local file system and HDFS.</p>
<li>key (since ORC 1.5) - print information about the encryption keys</li>
<li>meta - print the metadata of an ORC file</li>
<li>scan (since ORC 1.3) - scan the data for benchmarking</li>
+ <li>sizes (since ORC 1.7.2) - list the size on disk of each column</li>
<li>version (since ORC 1.6) - print the version of this ORC tool</li>
</ul>
@@ -1211,6 +1212,22 @@ cost of printing the data out.</p>
<dd>Print exceptions</dd>
</dl>
+<h2 id="java-sizes">Java Sizes</h2>
+
+<p>The sizes command lists the size on disk of each column. The output reports not
+only the size of the table's raw data, but also the size of file components such as <code class="highlighter-rouge">padding</code>,
+<code class="highlighter-rouge">stripeFooter</code>, <code class="highlighter-rouge">fileFooter</code>, <code class="highlighter-rouge">stripeIndex</code> and <code class="highlighter-rouge">stripeData</code>.</p>
+
+<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% java <span class="nt">-jar</span> orc-tools-X.Y.Z-uber.jar sizes examples/my-file.orc
+Percent Bytes/Row Name
+ 98.45 2.62 y
+ 0.81 0.02 _file_footer
+ 0.30 0.01 _index
+ 0.25 0.01 x
+ 0.19 0.01 _stripe_footer
+______________________________________________________________________
+</code></pre></div></div>
+
<h2 id="java-version">Java Version</h2>
<p>The version command prints the version of this ORC tool.</p>
diff --git a/docs/spark-config.html b/docs/spark-config.html
index bc3e1f0ed..e7298576f 100644
--- a/docs/spark-config.html
+++ b/docs/spark-config.html
@@ -954,7 +954,7 @@ with the same options.</p>
<tr>
<td style="text-align: left">orc.create.index</td>
<td style="text-align: left">true</td>
- <td style="text-align: left">create indexes?</td>
+ <td style="text-align: left">whether the ORC writer creates indexes as part of the file</td>
</tr>
<tr>
<td style="text-align: left">orc.bloom.filter.columns</td>
diff --git a/img/Direct.png b/img/Direct.png
new file mode 100644
index 000000000..eadf5ff87
Binary files /dev/null and b/img/Direct.png differ
diff --git a/specification/ORCv1/index.html b/specification/ORCv1/index.html
index b08f73988..a7b29ee4c 100644
--- a/specification/ORCv1/index.html
+++ b/specification/ORCv1/index.html
@@ -1026,6 +1026,13 @@ serialized with direct encoding (1), a width of 16 bits (15), and
length of 4 (3) as [0x5e, 0x03, 0x5c, 0xa1, 0xab, 0x1e, 0xde, 0xad,
0xbe, 0xef].</p>
+<blockquote>
+ <p>Note: the run length (4) is stored off by one. The header holds 3, and adding 1 gives 4
+(see <a href="https://github.com/apache/hive/commit/69deabeaac020ba60b0f2156579f53e9fe46157a#diff-c00fea1863eaf0d6f047535e874274199020ffed3eb00deb897f513aa86f6b59R232-R236">HIVE-4123</a>).</p>
+</blockquote>
+
+<p><img src="/img/Direct.png" alt="Direct" /></p>
+
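The header arithmetic in the added note can be checked with a short sketch. It assumes the RLE v2 direct header layout of 2 bits for the encoding, 5 bits for the encoded width, and 9 bits for the stored length, and it assumes unsigned big-endian 16-bit values with no zigzag step. The function name `decode_direct_header` is hypothetical and is not part of any ORC library.

```python
def decode_direct_header(data: bytes):
    """Split the two header bytes of a direct-encoded run into its fields.

    Assumed layout: 2 bits encoding, 5 bits encoded width, 9 bits stored length.
    """
    first, second = data[0], data[1]
    encoding = first >> 6                           # top two bits
    width_code = (first >> 1) & 0x1F                # next five bits
    stored_length = ((first & 0x01) << 8) | second  # remaining nine bits
    # The stored length is one less than the real run length.
    return encoding, width_code, stored_length + 1


# Byte sequence quoted in the specification text.
payload = bytes([0x5e, 0x03, 0x5c, 0xa1, 0xab, 0x1e, 0xde, 0xad, 0xbe, 0xef])
encoding, width_code, run_length = decode_direct_header(payload)

# Per the specification text, width code 15 means 16 bits (2 bytes) per value.
bytes_per_value = 2
values = [
    int.from_bytes(payload[2 + i * bytes_per_value: 2 + (i + 1) * bytes_per_value], "big")
    for i in range(run_length)
]

print(encoding, width_code, run_length)
print([hex(v) for v in values])
```

The stored length field reads 3 from the byte 0x03, and the decoder adds 1 to recover the run length of 4 that the note describes.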
<h3 id="patched-base">Patched Base</h3>
<p>The patched base encoding is used for integer sequences whose bit
diff --git a/specification/ORCv2/index.html b/specification/ORCv2/index.html
index e95b14949..d886115ae 100644
--- a/specification/ORCv2/index.html
+++ b/specification/ORCv2/index.html
@@ -1050,6 +1050,13 @@ serialized with direct encoding (1), a width of 16 bits (15), and
length of 4 (3) as [0x5e, 0x03, 0x5c, 0xa1, 0xab, 0x1e, 0xde, 0xad,
0xbe, 0xef].</p>
+<blockquote>
+ <p>Note: the run length (4) is stored off by one. The header holds 3, and adding 1 gives 4
+(see <a href="https://github.com/apache/hive/commit/69deabeaac020ba60b0f2156579f53e9fe46157a#diff-c00fea1863eaf0d6f047535e874274199020ffed3eb00deb897f513aa86f6b59R232-R236">HIVE-4123</a>).</p>
+</blockquote>
+
+<p><img src="/img/Direct.png" alt="Direct" /></p>
+
<h3 id="patched-base">Patched Base</h3>
<p>The patched base encoding is used for integer sequences whose bit