You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@orc.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/06/03 15:36:00 UTC
[jira] [Commented] (ORC-343) Enable C++ writer to support RleV2
[ https://issues.apache.org/jira/browse/ORC-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499454#comment-16499454 ]
ASF GitHub Bot commented on ORC-343:
------------------------------------
Github user majetideepak commented on a diff in the pull request:
https://github.com/apache/orc/pull/273#discussion_r192593861
--- Diff: c++/src/RLEv2.hh ---
@@ -25,13 +25,89 @@
#include <vector>
+#define MIN_REPEAT 3
+#define HIST_LEN 32
namespace orc {
-class RleDecoderV2 : public RleDecoder {
+struct FixedBitSizes {
+ enum FBS {
+ ONE = 0, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, TEN, ELEVEN, TWELVE,
+ THIRTEEN, FOURTEEN, FIFTEEN, SIXTEEN, SEVENTEEN, EIGHTEEN, NINETEEN,
+ TWENTY, TWENTYONE, TWENTYTWO, TWENTYTHREE, TWENTYFOUR, TWENTYSIX,
+ TWENTYEIGHT, THIRTY, THIRTYTWO, FORTY, FORTYEIGHT, FIFTYSIX, SIXTYFOUR, SIZE
+ };
+};
+
+enum EncodingType { SHORT_REPEAT=0, DIRECT=1, PATCHED_BASE=2, DELTA=3 };
+
+struct EncodingOption {
+ EncodingType encoding;
+ int64_t fixedDelta;
+ int64_t gapVsPatchListCount;
+ int64_t zigzagLiteralsCount;
+ int64_t baseRedLiteralsCount;
+ int64_t adjDeltasCount;
+ uint32_t zzBits90p;
+ uint32_t zzBits100p;
+ uint32_t brBits95p;
+ uint32_t brBits100p;
+ uint32_t bitsDeltaMax;
+ uint32_t patchWidth;
+ uint32_t patchGapWidth;
+ uint32_t patchLength;
+ int64_t min;
+ bool isFixedDelta;
+};
+
+class RleEncoderV2 : public RleEncoder {
public:
+ RleEncoderV2(std::unique_ptr<BufferedOutputStream> outStream, bool hasSigned, bool alignBitPacking = true);
--- End diff --
`alignedBitPacking` is always true. Should we add a WriterOption to enable/disable it?
Java uses the Encoding Strategy to choose this. C++ currently does not have this.
```
java/core/src/java/org/apache/orc/impl/writer/TreeWriterBase.java:144
if (writer.getEncodingStrategy().equals(OrcFile.EncodingStrategy.SPEED)) {
alignedBitpacking = true;
}
```
> Enable C++ writer to support RleV2
> ----------------------------------
>
> Key: ORC-343
> URL: https://issues.apache.org/jira/browse/ORC-343
> Project: ORC
> Issue Type: New Feature
> Components: C++
> Reporter: Yurui Zhou
> Priority: Major
>
> Currently only the Java implementation support RleV2 encoder, the C++ implementation only support RleV2 decoding.
> The issue aims to enable the c++ writer to support RleV2 encoding.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)