Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/03/27 21:52:13 UTC
[GitHub] [orc] williamhyun opened a new pull request #672: ORC-775: Fix a regression on column names with dot.
williamhyun opened a new pull request #672:
URL: https://github.com/apache/orc/pull/672
### What changes were proposed in this pull request?
This PR aims to fix a regression on column names that contain a dot character.
### Why are the changes needed?
Since ORC-696, we cannot read ORC files whose column names include a dot. For example, the following test file was read incorrectly.
```
% orc-tools meta core/src/test/resources/col.dot.orc
Processing data file core/src/test/resources/col.dot.orc [length: 235]
Structure for core/src/test/resources/col.dot.orc
File Version: 0.12 with ORC_517
Rows: 1
Compression: SNAPPY
Compression size: 262144
Calendar: Julian/Gregorian
Type: struct<`col.dot`:bigint>
Stripe Statistics:
Stripe 1:
Column 0: count: 1 hasNull: false
Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 0 max: 0 sum: 0
File Statistics:
Column 0: count: 1 hasNull: false
Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 0 max: 0 sum: 0
Stripes:
Stripe: offset: 3 data: 6 rows: 1 tail: 35 index: 35
Stream: column 0 section ROW_INDEX start: 3 length 11
Stream: column 1 section ROW_INDEX start: 14 length 24
Stream: column 1 section DATA start: 38 length 6
Encoding column 0: DIRECT
Encoding column 1: DIRECT_V2
File length: 235 bytes
Padding length: 0 bytes
Padding ratio: 0%
User Metadata:
org.apache.spark.version=3.1.1
________________________________________________________________________________________________________________________
```
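The `Type:` line in the output above shows the column name quoted with backticks. As a rough illustration of why dotted names are easy to mishandle, the sketch below contrasts splitting a column path on every dot with a splitter that honors backtick quoting. It is a hypothetical, standard-library-only example: it is not ORC's parser and not the change made in this PR, and the names `DottedNames`, `naiveSplit`, and `quotedSplit` are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in ORC.
public class DottedNames {
  // Naive approach: treat every dot as a path separator.
  static List<String> naiveSplit(String path) {
    return Arrays.asList(path.split("\\."));
  }

  // Quote-aware approach (assumed convention): dots inside `...` belong
  // to the name, and a doubled backtick inside quotes is a literal backtick.
  static List<String> quotedSplit(String path) {
    List<String> parts = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean inQuotes = false;
    for (int i = 0; i < path.length(); i++) {
      char c = path.charAt(i);
      if (c == '`') {
        if (inQuotes && i + 1 < path.length() && path.charAt(i + 1) == '`') {
          current.append('`'); // escaped backtick inside a quoted name
          i++;
        } else {
          inQuotes = !inQuotes; // opening or closing quote
        }
      } else if (c == '.' && !inQuotes) {
        parts.add(current.toString()); // unquoted dot ends the current name
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    parts.add(current.toString());
    return parts;
  }

  public static void main(String[] args) {
    System.out.println(naiveSplit("col.dot"));          // [col, dot]
    System.out.println(quotedSplit("`col.dot`"));       // [col.dot]
    System.out.println(quotedSplit("outer.`in.ner`"));  // [outer, in.ner]
  }
}
```

With the naive splitter, the single column `col.dot` turns into two path elements, which is the kind of mismatch that can make a lookup by name fail. The quote-aware version keeps the name intact.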
### How was this patch tested?
Pass the CIs with the newly added test case.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [orc] dongjoon-hyun merged pull request #672: ORC-775: Fix a regression on column names with dot.
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun merged pull request #672:
URL: https://github.com/apache/orc/pull/672
[GitHub] [orc] pgaref commented on a change in pull request #672: ORC-775: Fix a regression on column names with dot.
Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #672:
URL: https://github.com/apache/orc/pull/672#discussion_r602796997
##########
File path: java/core/src/test/org/apache/orc/TestReader.java
##########
@@ -78,4 +78,11 @@ public void testReadFileInvalidHeader() throws Exception {
OrcFile.createReader(testFilePath,
OrcFile.readerOptions(conf).filesystem(fs));
}
+
+ @Test
+ public void testReadDocColumn() throws Exception {
+ Path path = new Path(getClass().getClassLoader().getSystemResource("col.dot.orc").getPath());
Review comment:
Minor comment: write and read the file within the test itself, as that would be more self-explanatory (column names, rows, etc.).
[GitHub] [orc] williamhyun commented on pull request #672: ORC-775: Fix a regression on column names with dot.
Posted by GitBox <gi...@apache.org>.
williamhyun commented on pull request #672:
URL: https://github.com/apache/orc/pull/672#issuecomment-808950151
Thank you for the review and merge! @pgaref @dongjoon-hyun
I will try to make a PR for that.
[GitHub] [orc] williamhyun commented on pull request #672: ORC-775: Fix a regression on column names with dot.
Posted by GitBox <gi...@apache.org>.
williamhyun commented on pull request #672:
URL: https://github.com/apache/orc/pull/672#issuecomment-808807657
Could you review this, please, @pgaref?