Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/10/01 17:19:27 UTC

[GitHub] [orc] omalley opened a new pull request #925: ORC-1017: Add sizes tool to determine and display the sizes of each column in a set of files.

omalley opened a new pull request #925:
URL: https://github.com/apache/orc/pull/925


   ### What changes were proposed in this pull request?
   
   This patch adds a new tool that accounts for the total size of a set of ORC files. For files written by ORC 1.5 or later, it also breaks the size down by column. The breakdown includes some virtual columns:
   - _index: the indexes used for skipping inside the stripe
   - _data: the data in files written prior to ORC 1.5
   - _stripe_footer: the stripe metadata
   - _file_footer: the file metadata
   - _padding: padding added to align stripes to HDFS block boundaries
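A minimal sketch of how the non-column buckets could be tallied from stripe boundaries alone. The `Stripe` and `Breakdown` records, the `account` method, and the sample numbers are hypothetical; this is not the tool's actual code. It lumps all stream data into `_data`, as the description says happens for pre-1.5 files, and it assumes the only bytes before the first stripe are the 3-byte "ORC" magic and that any gap between stripes is padding.

```java
import java.util.List;

/** Hypothetical sketch: splits a file's bytes into the virtual buckets listed above. */
public class VirtualColumnSizes {
  /** Hypothetical stand-in for per-stripe metadata: offset plus section lengths. */
  record Stripe(long offset, long indexLength, long dataLength, long footerLength) {
    long end() { return offset + indexLength + dataLength + footerLength; }
  }

  /** Totals for each virtual column, in bytes. */
  record Breakdown(long index, long data, long stripeFooter, long fileFooter, long padding) {}

  // Assumption: the file starts with the 3-byte "ORC" magic, which no bucket counts.
  static final long HEADER_LENGTH = 3;

  static Breakdown account(List<Stripe> stripes, long fileLength) {
    long index = 0, data = 0, stripeFooter = 0, padding = 0;
    long previousEnd = HEADER_LENGTH;
    for (Stripe s : stripes) {
      // Any gap between the end of the previous stripe and this one is padding.
      padding += s.offset() - previousEnd;
      index += s.indexLength();
      data += s.dataLength();
      stripeFooter += s.footerLength();
      previousEnd = s.end();
    }
    // Everything after the last stripe is counted as file footer (file metadata).
    long fileFooter = fileLength - previousEnd;
    return new Breakdown(index, data, stripeFooter, fileFooter, padding);
  }

  public static void main(String[] args) {
    List<Stripe> stripes = List.of(
        new Stripe(3, 100, 900, 50),      // ends at byte 1053
        new Stripe(1100, 80, 700, 40));   // 47 bytes of padding before it
    System.out.println(account(stripes, 2000));
  }
}
```

Because every byte after the header lands in exactly one bucket, the buckets plus the header always sum to the file length, which is a cheap consistency check when summing over many files.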
   
   I also added a new method, TypeDescription.getFullFieldName, which returns a column's full field name; it is the inverse of findSubtype.
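To illustrate what "inverse" means here, the following is a simplified, hypothetical model of a type tree; it is not ORC's TypeDescription. The `Node` class and its methods are made up for this sketch, which handles only nested structs with dot-joined names and ignores lists, maps, unions, and any quoting of unusual field names. The round-trip property is the point: looking up a node's full field name from the root returns that same node.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, struct-only model of a type tree (not ORC's TypeDescription). */
public class FieldNames {
  static final class Node {
    final String name;   // field name within the parent; null for the root
    final Node parent;
    final Map<String, Node> children = new LinkedHashMap<>();

    Node(String name, Node parent) {
      this.name = name;
      this.parent = parent;
    }

    Node addField(String fieldName) {
      Node child = new Node(fieldName, this);
      children.put(fieldName, child);
      return child;
    }

    /** Walks up to the root, joining field names with dots. */
    String fullFieldName() {
      List<String> parts = new ArrayList<>();
      for (Node n = this; n.parent != null; n = n.parent) {
        parts.add(0, n.name);
      }
      return String.join(".", parts);
    }

    /** Walks down from this node, following a dotted path. */
    Node findSubtype(String path) {
      Node current = this;
      for (String part : path.split("\\.")) {
        current = current.children.get(part);
        if (current == null) {
          throw new IllegalArgumentException("No field " + part + " in " + path);
        }
      }
      return current;
    }
  }

  public static void main(String[] args) {
    Node root = new Node(null, null);
    Node zip = root.addField("address").addField("zip");
    String name = zip.fullFieldName();
    System.out.println(name);
    System.out.println(root.findSubtype(name) == zip);
  }
}
```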
   
   ### Why are the changes needed?
   
   The tool helps diagnose the compression of a set of files.
   
   ### How was this patch tested?
   
   I added a test for the new TypeDescription.getFullFieldName method. I also ran the tool over some of the example files and over several multi-terabyte directories of production ORC files.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #925: ORC-1017: Add sizes tool to determine and display the sizes of each column in a set of files.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #925:
URL: https://github.com/apache/orc/pull/925#issuecomment-995310203


   Since this is a mostly `tools`-only change, I backported it to branch-1.7.


[GitHub] [orc] dongjoon-hyun commented on pull request #925: ORC-1017: Add sizes tool to determine and display the sizes of each column in a set of files.

dongjoon-hyun commented on pull request #925:
URL: https://github.com/apache/orc/pull/925#issuecomment-934082609


   I re-triggered the GitHub Action.


[GitHub] [orc] dongjoon-hyun commented on pull request #925: ORC-1017: Add sizes tool to determine and display the sizes of each column in a set of files.

dongjoon-hyun commented on pull request #925:
URL: https://github.com/apache/orc/pull/925#issuecomment-934014164


   Let me take a look at the CI failure.


[GitHub] [orc] dongjoon-hyun edited a comment on pull request #925: ORC-1017: Add sizes tool to determine and display the sizes of each column in a set of files.

dongjoon-hyun edited a comment on pull request #925:
URL: https://github.com/apache/orc/pull/925#issuecomment-995310203


   Since this is a mostly `tools`-only change, I backported it to branch-1.7.
   The `TypeDescription.java` changes are addition-only.


[GitHub] [orc] dongjoon-hyun merged pull request #925: ORC-1017: Add sizes tool to determine and display the sizes of each column in a set of files.

dongjoon-hyun merged pull request #925:
URL: https://github.com/apache/orc/pull/925


   