Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/03/27 21:52:13 UTC

[GitHub] [orc] williamhyun opened a new pull request #672: ORC-775: Fix a regression on column names with dot.

williamhyun opened a new pull request #672:
URL: https://github.com/apache/orc/pull/672


   ### What changes were proposed in this pull request?
   
   This PR aims to fix a regression on column names with a dot character.
   
   ### Why are the changes needed?
   
   Since ORC-696, we cannot read ORC files whose column names include a dot. For example, the following test file was read incorrectly.
   ```
   % orc-tools meta core/src/test/resources/col.dot.orc
   Processing data file core/src/test/resources/col.dot.orc [length: 235]
   Structure for core/src/test/resources/col.dot.orc
   File Version: 0.12 with ORC_517
   Rows: 1
   Compression: SNAPPY
   Compression size: 262144
   Calendar: Julian/Gregorian
   Type: struct<`col.dot`:bigint>
   
   Stripe Statistics:
     Stripe 1:
       Column 0: count: 1 hasNull: false
       Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 0 max: 0 sum: 0
   
   File Statistics:
     Column 0: count: 1 hasNull: false
     Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 0 max: 0 sum: 0
   
   Stripes:
     Stripe: offset: 3 data: 6 rows: 1 tail: 35 index: 35
       Stream: column 0 section ROW_INDEX start: 3 length 11
       Stream: column 1 section ROW_INDEX start: 14 length 24
       Stream: column 1 section DATA start: 38 length 6
       Encoding column 0: DIRECT
       Encoding column 1: DIRECT_V2
   
   File length: 235 bytes
   Padding length: 0 bytes
   Padding ratio: 0%
   
   User Metadata:
     org.apache.spark.version=3.1.1
   ________________________________________________________________________________________________________________________
   ```
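
The `Type:` line in the output above prints the column name wrapped in backticks. The sketch below illustrates, under stated assumptions, why that quoting matters once a name is interpreted as a dotted field path. `DottedColumnNames`, `splitNaive`, and `splitQuoted` are hypothetical names invented for this illustration; this is not ORC's implementation and not the code changed by this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper, not part of the ORC API: shows how a dotted
 * column name can be mistaken for a nested field path.
 */
public class DottedColumnNames {

  /** Naive approach: every dot is treated as a nesting separator. */
  static List<String> splitNaive(String path) {
    return Arrays.asList(path.split("\\."));
  }

  /** Quote-aware approach: dots inside backticks belong to the name. */
  static List<String> splitQuoted(String path) {
    List<String> parts = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean inQuotes = false;
    for (int i = 0; i < path.length(); i++) {
      char c = path.charAt(i);
      if (c == '`') {
        inQuotes = !inQuotes;          // enter or leave a quoted name
      } else if (c == '.' && !inQuotes) {
        parts.add(current.toString()); // unquoted dot ends a path segment
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    parts.add(current.toString());
    return parts;
  }

  public static void main(String[] args) {
    System.out.println(splitNaive("col.dot"));      // [col, dot]
    System.out.println(splitQuoted("`col.dot`"));   // [col.dot]
    System.out.println(splitQuoted("outer.`a.b`")); // [outer, a.b]
  }
}
```

With the naive split, the single top-level column `col.dot` would be looked up as a field `dot` nested inside a struct `col`, which does not exist in this file. The quote-aware split keeps it as one field name.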
   
   
   ### How was this patch tested?
   Pass the CIs with the newly added test case. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun merged pull request #672: ORC-775: Fix a regression on column names with dot.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun merged pull request #672:
URL: https://github.com/apache/orc/pull/672


   





[GitHub] [orc] pgaref commented on a change in pull request #672: ORC-775: Fix a regression on column names with dot.

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #672:
URL: https://github.com/apache/orc/pull/672#discussion_r602796997



##########
File path: java/core/src/test/org/apache/orc/TestReader.java
##########
@@ -78,4 +78,11 @@ public void testReadFileInvalidHeader() throws Exception {
     OrcFile.createReader(testFilePath,
       OrcFile.readerOptions(conf).filesystem(fs));
   }
+
+  @Test
+  public void testReadDocColumn() throws Exception {
+    Path path = new Path(getClass().getClassLoader().getSystemResource("col.dot.orc").getPath());

Review comment:
       Minor comment: write and read the file within the test itself, as that would be more self-explanatory (column names, rows, etc.).







[GitHub] [orc] williamhyun commented on pull request #672: ORC-775: Fix a regression on column names with dot.

Posted by GitBox <gi...@apache.org>.
williamhyun commented on pull request #672:
URL: https://github.com/apache/orc/pull/672#issuecomment-808950151


   Thank you for the review and merge! @pgaref @dongjoon-hyun 
   I will try to make a PR for that. 





[GitHub] [orc] williamhyun commented on pull request #672: ORC-775: Fix a regression on column names with dot.

Posted by GitBox <gi...@apache.org>.
williamhyun commented on pull request #672:
URL: https://github.com/apache/orc/pull/672#issuecomment-808807657


   Could you review this please, @pgaref ?

