Posted to commits@orc.apache.org by om...@apache.org on 2020/03/27 22:30:16 UTC
[orc] branch master updated: ORC-519: Correct encoding for Decimal columns in specification
This is an automated email from the ASF dual-hosted git repository.
omalley pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/orc.git
The following commit(s) were added to refs/heads/master by this push:
new a9ec6a2 ORC-519: Correct encoding for Decimal columns in specification
a9ec6a2 is described below
commit a9ec6a2e39ed71ef8a2d874df14700956aa847be
Author: Norbert Luksa <no...@cloudera.com>
AuthorDate: Fri Feb 21 15:53:16 2020 +0100
ORC-519: Correct encoding for Decimal columns in specification
Currently the ORC specifications state that the secondary stream of
decimal columns is Unsigned RLE, but according to the code (both
Java and C++), signed RLE is used.
Fixes #484
Signed-off-by: Owen O'Malley <om...@apache.org>
---
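Note on why the signed/unsigned wording matters: in ORC integer streams,
signed values are zigzag encoded before being written as base 128 varints,
so a reader that follows the old "Unsigned" wording would misread every
scale. The sketch below illustrates only the per-value encoding, not the
RLE run/literal framing, and it is not taken from the Java or C++ code;
the function names are hypothetical.

```python
# Illustrative sketch only: per-value zigzag + base 128 varint encoding.
# Function names are hypothetical and do not come from the ORC code base.

def zigzag_encode(n: int) -> int:
    """Map a signed integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return n * 2 if n >= 0 else -n * 2 - 1

def zigzag_decode(u: int) -> int:
    """Inverse of zigzag_encode."""
    return (u >> 1) ^ -(u & 1)

def varint_encode(u: int) -> bytes:
    """Encode a non-negative integer as a base 128 varint (low 7 bits first)."""
    out = bytearray()
    while True:
        low7 = u & 0x7F
        u >>= 7
        if u:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)

def varint_decode(data: bytes) -> int:
    """Decode a single base 128 varint from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return result

# A writer using the signed encoding stores a scale of 2 as zigzag(2) = 4.
written = varint_encode(zigzag_encode(2))

# A reader following the old "Unsigned" wording would take the raw value.
misread_scale = varint_decode(written)

# A reader following the corrected "Signed" wording undoes the zigzag step.
correct_scale = zigzag_decode(varint_decode(written))
```

Here `misread_scale` is 4 while `correct_scale` is 2, which is the kind of
mismatch the specification change avoids.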
site/specification/ORCv0.md | 4 ++--
site/specification/ORCv1.md | 6 +++---
site/specification/ORCv2.md | 2 +-
3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/site/specification/ORCv0.md b/site/specification/ORCv0.md
index 2ca58b6..861b819 100644
--- a/site/specification/ORCv0.md
+++ b/site/specification/ORCv0.md
@@ -664,13 +664,13 @@ the precision to a maximum of 38 digits, which conveniently uses 127
bits plus a sign bit. The current encoding of decimal columns stores
the integer representation of the value as an unbounded length zigzag
encoded base 128 varint. The scale is stored in the SECONDARY stream
-as an signed integer.
+as a signed integer.
Encoding | Stream Kind | Optional | Contents
:------------ | :-------------- | :------- | :-------
DIRECT | PRESENT | Yes | Boolean RLE
| DATA | No | Unbounded base 128 varints
- | SECONDARY | No | Unsigned Integer RLE v1
+ | SECONDARY | No | Signed Integer RLE v1
## Date Columns
diff --git a/site/specification/ORCv1.md b/site/specification/ORCv1.md
index db63356..812c962 100644
--- a/site/specification/ORCv1.md
+++ b/site/specification/ORCv1.md
@@ -1097,16 +1097,16 @@ the precision to a maximum of 38 digits, which conveniently uses 127
bits plus a sign bit. The current encoding of decimal columns stores
the integer representation of the value as an unbounded length zigzag
encoded base 128 varint. The scale is stored in the SECONDARY stream
-as an signed integer.
+as a signed integer.
Encoding | Stream Kind | Optional | Contents
:------------ | :-------------- | :------- | :-------
DIRECT | PRESENT | Yes | Boolean RLE
| DATA | No | Unbounded base 128 varints
- | SECONDARY | No | Unsigned Integer RLE v1
+ | SECONDARY | No | Signed Integer RLE v1
DIRECT_V2 | PRESENT | Yes | Boolean RLE
| DATA | No | Unbounded base 128 varints
- | SECONDARY | No | Unsigned Integer RLE v2
+ | SECONDARY | No | Signed Integer RLE v2
## Date Columns
diff --git a/site/specification/ORCv2.md b/site/specification/ORCv2.md
index b84dac9..74eb567 100644
--- a/site/specification/ORCv2.md
+++ b/site/specification/ORCv2.md
@@ -1121,7 +1121,7 @@ DIRECT | PRESENT | Yes | Boolean RLE
| DATA | No | Signed Integer RLE v2
DIRECT_V2 | PRESENT | Yes | Boolean RLE
| DATA | No | Unbounded base 128 varints
- | SECONDARY | No | Unsigned Integer RLE v2
+ | SECONDARY | No | Signed Integer RLE v2
## Date Columns