Posted to issues@orc.apache.org by "Aliaksei Sandryhaila (JIRA)" <ji...@apache.org> on 2015/10/12 15:47:05 UTC

[jira] [Resolved] (ORC-28) Reading a subset of complex-type columns does not select the right columns

     [ https://issues.apache.org/jira/browse/ORC-28?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aliaksei Sandryhaila resolved ORC-28.
-------------------------------------
    Resolution: Fixed

> Reading a subset of complex-type columns does not select the right columns
> --------------------------------------------------------------------------
>
>                 Key: ORC-28
>                 URL: https://issues.apache.org/jira/browse/ORC-28
>             Project: Orc
>          Issue Type: Bug
>            Reporter: Aliaksei Sandryhaila
>            Assignee: Aliaksei Sandryhaila
>
> Selected columns are set through ReaderOptions.include() and refer to the top-level columns of an ORC file. The ReaderImpl constructor uses this selection to determine which physical columns to read from the file. The current implementation does this incorrectly: it interprets each selected index as a physical column index rather than as a top-level column index.
> Reproducer:
> examples/TestOrcFile.testSeek.orc contains 12 top-level columns:
> 1: boolean
> 2-4: int
> 5-6: double
> 8: binary
> 9: string
> 10: struct<array<struct<int,string>>>
> 11: array<struct<int,string>>
> 12: map<string,struct<int,string>>
> The physical layout in the file is:
> 1: boolean
> 2-4: int
> 5-6: double
> 8: binary
> 9: string
> 10: struct
> 11: array
> 12: struct
> 13: int
> 14: string
> 15: array
> 16: struct
> 17: int
> 18: string
> 19: map
> 20: string
> 21: struct
> 22: int
> 23: string
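> When asked to read top-level column 11, which is array<struct<int,string>>, ReaderImpl actually reads top-level column 10, because it treats 11 as the index of a physical column, and physical column 11 is a subcolumn of top-level column 10. In the physical layout above, top-level column 11 occupies physical columns 15-18.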
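
The mapping described in the reproducer can be sketched as follows. This is a minimal illustration in Python, not the ORC reader's actual code: the TypeNode class and the assign_ids and physical_columns helpers are hypothetical names introduced here. It assumes physical column ids are assigned by a pre-order walk of the type tree with the root struct as id 0, which matches the layout listed in the issue, and that selected indices are 1-based as in the issue's numbering. The nine primitive columns are modelled generically, since their exact types do not affect the numbering.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class TypeNode:
    """Hypothetical type-tree node; not part of the ORC API."""
    kind: str
    children: List["TypeNode"] = field(default_factory=list)
    column_id: int = -1      # physical id, filled in by assign_ids
    max_column_id: int = -1  # largest physical id inside this subtree


def assign_ids(node: TypeNode, next_id: int = 0) -> int:
    """Number the tree in pre-order and return the next unused id."""
    node.column_id = next_id
    next_id += 1
    for child in node.children:
        next_id = assign_ids(child, next_id)
    node.max_column_id = next_id - 1
    return next_id


def physical_columns(root: TypeNode, included: Iterable[int]) -> Set[int]:
    """Expand 1-based top-level indices into physical column ids."""
    selected = {root.column_id}  # the root struct is always needed
    for index in included:
        child = root.children[index - 1]
        # A top-level column owns the contiguous id range of its subtree.
        selected.update(range(child.column_id, child.max_column_id + 1))
    return selected


def int_string_struct() -> TypeNode:
    return TypeNode("struct", [TypeNode("int"), TypeNode("string")])


# Schema shaped like the one in the reproducer (assumed from the issue text).
root = TypeNode("struct", [
    *[TypeNode("primitive") for _ in range(9)],                   # columns 1-9
    TypeNode("struct", [TypeNode("array", [int_string_struct()])]),  # column 10
    TypeNode("array", [int_string_struct()]),                     # column 11
    TypeNode("map", [TypeNode("string"), int_string_struct()]),   # column 12
])
assign_ids(root)

print(sorted(physical_columns(root, [11])))  # [0, 15, 16, 17, 18]
```

Under these assumptions, selecting top-level column 11 yields physical columns 15-18 plus the root, whereas using 11 directly as a physical id lands inside the subtree of top-level column 10 (physical ids 10-14), which is the behaviour reported above.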



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)