You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/03/31 11:26:21 UTC

[GitHub] [orc] dirtysalt opened a new pull request #676: Update ORCv1.md: Fix inaccurate encoded data in RLEv1 encoding

dirtysalt opened a new pull request #676:
URL: https://github.com/apache/orc/pull/676


   The first 5 prime numbers are [2,3,5,7,11] and the encoded data have three parts
   - 2 [0xff, 0x02]
   - 3,5,7 (delta=2,length=3,base=3) [0x00, 0x02, 0x03]
   - 11 [0xff, 0xb]
   
   BTW: if numbers are [2,3,4,7,11], the encoded data is also wrong. It should be
   - 2,3,4 (delta=1, length=3, base=2) [0x00, 0x01, 0x02]
   - 7,11  [0xfe, 0x7, 0xb]
   
   And I have verified the encoded data with the following C++ code. It used RLEEncoderv1 to encode 5 prime numbers and dump data to a local file called `rle.data`. 
   
   ```c++
   
   int main() {
       auto pool = getDefaultPool();
       auto outputStream = writeLocalFile("rle.data");
       auto buf = new BufferedOutputStream(*pool, outputStream.get(), 1024, 32);
       std::unique_ptr<BufferedOutputStream> pbuf(buf);
       auto encoder = new RleEncoderV1(std::move(pbuf), false);
       encoder->write(2);
       encoder->write(3);
       encoder->write(5);
       encoder->write(7);
       encoder->write(11);
       encoder->flush();
       return 0;
   }
   ```
   
   And then I use `od -A n -t x1 rle.data` to see data in hex format. The result is following
   
   ```
    ff 02 00 02 03 ff 0b
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun merged pull request #676: MINOR: Fix RLEv1 encoding examples in ORCv0/v1/v2 docs

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun merged pull request #676:
URL: https://github.com/apache/orc/pull/676


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #676: Update ORCv1.md: Fix inaccurate encoded data in RLEv1 encoding

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #676:
URL: https://github.com/apache/orc/pull/676#issuecomment-812980442


   Thank you for updating, @dirtysalt . Could you revise the PR description accordingly together? For example, `first 5 prime numbers`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dirtysalt commented on a change in pull request #676: Update ORCv1.md: Fix inaccurate encoded data in RLEv1 encoding

Posted by GitBox <gi...@apache.org>.
dirtysalt commented on a change in pull request #676:
URL: https://github.com/apache/orc/pull/676#discussion_r606724082



##########
File path: temp/.gitignore
##########
@@ -0,0 +1,2 @@
+*.data
+*.exe

Review comment:
       Same issue, sorry. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] wgtmac commented on pull request #676: Update ORCv1.md: Fix inaccurate encoded data in RLEv1 encoding

Posted by GitBox <gi...@apache.org>.
wgtmac commented on pull request #676:
URL: https://github.com/apache/orc/pull/676#issuecomment-811619937


   Thanks @dirtysalt for the good catch! The illustration in the documentation is wrong as you have noticed. But I'd like to remove the word "prime" from that section as it aims to illustrate rule of LITERAL (not RUN) in the RLEv1. The correction may also consider ORCv0.md and ORCv2.md.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #676: MINOR: Fix RLEv1 encoding examples in ORCv0/v1/v2 docs

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #676:
URL: https://github.com/apache/orc/pull/676#issuecomment-812980957


   For PR title, I updated it according to the AS-IS PR content.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dirtysalt commented on pull request #676: Update ORCv1.md: Fix inaccurate encoded data in RLEv1 encoding

Posted by GitBox <gi...@apache.org>.
dirtysalt commented on pull request #676:
URL: https://github.com/apache/orc/pull/676#issuecomment-811647264


   @wgtmac But `[2,3,4,7,11]` also has a run which is `[2,3,4]`. Better to update to `[2,3,6,7,11]`, what do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dirtysalt commented on a change in pull request #676: Update ORCv1.md: Fix inaccurate encoded data in RLEv1 encoding

Posted by GitBox <gi...@apache.org>.
dirtysalt commented on a change in pull request #676:
URL: https://github.com/apache/orc/pull/676#discussion_r606724067



##########
File path: .gitignore
##########
@@ -9,3 +9,4 @@ target
 *.iws
 .idea
 .DS_Store
+.vscode/

Review comment:
       Yes. Sorry, I don't expect this PR  will trace my master branch commit automatically. Reverted already.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #676: Update ORCv1.md: Fix inaccurate encoded data in RLEv1 encoding

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #676:
URL: https://github.com/apache/orc/pull/676#discussion_r606696559



##########
File path: .gitignore
##########
@@ -9,3 +9,4 @@ target
 *.iws
 .idea
 .DS_Store
+.vscode/

Review comment:
       Hi, @dirtysalt . This is orthogonal change from the proposed document fix. Apache ORC recommends that each commit should have one theme. Could you revert this change from this PR?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dirtysalt commented on pull request #676: MINOR: Fix RLEv1 encoding examples in ORCv0/v1/v2 docs

Posted by GitBox <gi...@apache.org>.
dirtysalt commented on pull request #676:
URL: https://github.com/apache/orc/pull/676#issuecomment-812995913


   Yes. Updated. I add a short description of PR at the head. I want to keep the old description because it provides some detailed explanation why not to use those examples. Not sure it's ok or not. Thanks for your review and suggestion. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #676: Update ORCv1.md: Fix inaccurate encoded data in RLEv1 encoding

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #676:
URL: https://github.com/apache/orc/pull/676#discussion_r606696617



##########
File path: temp/.gitignore
##########
@@ -0,0 +1,2 @@
+*.data
+*.exe

Review comment:
       Why do you add `temp` directory, @dirtysalt ? Could you revert this `temp` directory?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org