Posted to issues@orc.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/01/05 23:58:39 UTC

[jira] [Commented] (ORC-29) ColumnPrinter should be able to print only specified columns

    [ https://issues.apache.org/jira/browse/ORC-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15084003#comment-15084003 ] 

ASF GitHub Bot commented on ORC-29:
-----------------------------------

Github user omalley commented on the pull request:

    https://github.com/apache/orc/pull/9#issuecomment-169161130
  
    Sorry for the long pause on this one.
    
    I wanted to make some changes to Type that made sense to roll together with this capability. In particular, rather than making ColumnPrinter understand the selected vector, I wanted to add a getSelectedType method to Reader that returns the schema filtered by the selected vector. Note that the column ids in the selected type are the same as in the original file schema, so getting statistics still works.
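
To illustrate the filtering idea, here is a minimal, self-contained sketch. It does not use the real orc::Type or Reader API: SchemaNode, makeNode, and filterSchema are hypothetical names invented for this example, and column ids are set by hand. The point it shows is that a filtered copy of the schema can drop unselected columns while keeping the original ids on the columns that remain.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a schema node; not the real orc::Type.
struct SchemaNode {
  uint64_t columnId;
  std::string name;
  std::vector<std::unique_ptr<SchemaNode>> children;
};

// Build a node with a fixed id (ids are assigned by hand in this sketch).
std::unique_ptr<SchemaNode> makeNode(uint64_t id, std::string name) {
  std::unique_ptr<SchemaNode> node(new SchemaNode());
  node->columnId = id;
  node->name = std::move(name);
  return node;
}

// Copy the subtree rooted at `node`, keeping only columns whose entry in
// `selected` is true. Column ids are copied unchanged, so they still refer
// to the same columns as in the original file schema.
std::unique_ptr<SchemaNode> filterSchema(const SchemaNode& node,
                                         const std::vector<bool>& selected) {
  assert(node.columnId < selected.size());
  if (!selected[node.columnId]) {
    return nullptr;  // this column and everything under it is dropped
  }
  std::unique_ptr<SchemaNode> copy = makeNode(node.columnId, node.name);
  for (const auto& child : node.children) {
    std::unique_ptr<SchemaNode> kept = filterSchema(*child, selected);
    if (kept) {
      copy->children.push_back(std::move(kept));
    }
  }
  return copy;
}
```

Because the surviving nodes keep their original ids, anything indexed by column id (such as per-column statistics) can be looked up from the filtered schema without a translation table.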
    
    You can see my take in https://github.com/omalley/orc/tree/orc-29
    
    Other changes:
    * I removed the Type.assignIds(0), which was always annoying. Now column ids are assigned automatically when the first one is requested.
    * I removed the uncalled kind2String.
    * I added Type.getMaximumColumnId, which returns the largest column id under the given type. That makes it easy to sweep through the selected vector to hit the right ids.
    * I added a ReaderOptions.include(std::vector<std::string>) that selects columns by name. Nothing uses or tests it yet, but it will let us extend the tools to use name-based column selection.
    * I moved the createRowBatch from the Reader to the Type class. 
    * I moved the Type class to its own Type.hh out of Vector.hh.
    * I simplified the internal implementation of TypeImpl.
    * I tried to be more consistent about using uint64_t for column ids.
    * I added TestType for testing TypeImpl.
    * I fixed the Reader.seekToRow implementation to use ColumnReader->skip rather than next.
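
The lazy id assignment and the getMaximumColumnId sweep described in the list above can be sketched roughly as follows. This is not the actual TypeImpl code: the Node class, addChild, and selectSubtree are hypothetical names for this example, and the sketch assumes ids are numbered in pre-order starting from 0 at the root.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical schema node; a sketch, not the actual TypeImpl.
class Node {
 public:
  Node() : parent_(nullptr), columnId_(-1), maxColumnId_(-1) {}

  // Append a child. In this sketch the tree must be fully built before
  // any id is requested; later additions would not be renumbered.
  Node* addChild() {
    children_.emplace_back(new Node());
    children_.back()->parent_ = this;
    return children_.back().get();
  }

  uint64_t getColumnId() const {
    ensureIds();
    return static_cast<uint64_t>(columnId_);
  }

  // Largest column id found in the subtree rooted at this node.
  uint64_t getMaximumColumnId() const {
    ensureIds();
    return static_cast<uint64_t>(maxColumnId_);
  }

 private:
  // Assign ids for the whole tree the first time any id is requested.
  void ensureIds() const {
    if (columnId_ == -1) {
      const Node* root = this;
      while (root->parent_ != nullptr) {
        root = root->parent_;
      }
      root->assignIds(0);
    }
  }

  // Pre-order numbering; returns the next unused id.
  uint64_t assignIds(uint64_t next) const {
    columnId_ = static_cast<int64_t>(next++);
    for (const auto& child : children_) {
      next = child->assignIds(next);
    }
    maxColumnId_ = static_cast<int64_t>(next - 1);
    return next;
  }

  Node* parent_;
  std::vector<std::unique_ptr<Node>> children_;
  mutable int64_t columnId_;
  mutable int64_t maxColumnId_;
};

// Mark a column and all of its descendants as selected. With pre-order
// ids, a subtree occupies the contiguous range [columnId, maximumColumnId].
void selectSubtree(const Node& node, std::vector<bool>& selected) {
  assert(node.getMaximumColumnId() < selected.size());
  for (uint64_t id = node.getColumnId(); id <= node.getMaximumColumnId(); ++id) {
    selected[id] = true;
  }
}
```

Under the pre-order assumption, selecting a nested column reduces to a single loop over a contiguous id range, with no need to walk the tree again.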


> ColumnPrinter should be able to print only specified columns
> ------------------------------------------------------------
>
>                 Key: ORC-29
>                 URL: https://issues.apache.org/jira/browse/ORC-29
>             Project: Orc
>          Issue Type: Improvement
>            Reporter: Aliaksei Sandryhaila
>            Assignee: Aliaksei Sandryhaila
>
> file-contents prints out the entire ORC file. It will be very handy to specify which columns to print (e.g. for reading/debugging complex-type columns)


