Posted to dev@orc.apache.org by "Aliaksei Sandryhaila (JIRA)" <ji...@apache.org> on 2015/09/23 17:16:04 UTC

[jira] [Created] (ORC-28) Reading a subset of complex-type columns does not select the right columns

Aliaksei Sandryhaila created ORC-28:
---------------------------------------

             Summary: Reading a subset of complex-type columns does not select the right columns
                 Key: ORC-28
                 URL: https://issues.apache.org/jira/browse/ORC-28
             Project: Orc
          Issue Type: Bug
            Reporter: Aliaksei Sandryhaila


Selected columns are set through ReaderOptions.include() and correspond to the top-level columns in an ORC file. The ReaderImpl constructor uses this list to determine which physical columns to read from the file, but the current implementation does not map top-level column indices to physical column indices correctly.

Reproducer:
examples/TestOrcFile.testSeek.orc contains 12 top-level columns:
1: boolean
2-4: int
5-6: double
8: binary
9: string
10: struct<array<struct<int,string>>>
11: array<struct<int,string>>
12: map<string,struct<int,string>>

The physical layout in the file is:
1: boolean
2-4: int
5-6: double
8: binary
9: string
10: struct
11: array
12: struct
13: int
14: string
15: array
16: struct
17: int
18: string
19: map
20: string
21: struct
22: int
23: string
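
For illustration, the physical numbering above can be reproduced with a pre-order walk of the type tree. This is a sketch in Python, not the reader's own code: the (kind, children) tuples and the helper names are hypothetical, the root struct is assumed to take ID 0, and column 7, which the listing omits, is assumed to be a single primitive column because column 8 keeps ID 8.

```python
# Hypothetical type model: (kind, children). This is not the ORC API.
def prim(kind):
    return (kind, [])


def int_string_struct():
    return ("struct", [prim("int"), prim("string")])


# Top-level columns 1..12 as listed in the report.
# Column 7 is not listed; it is assumed to be one primitive column.
TOP_LEVEL = (
    [prim("boolean")]
    + [prim("int")] * 3
    + [prim("double")] * 2
    + [prim("unlisted")]
    + [prim("binary"), prim("string")]
    + [
        ("struct", [("array", [int_string_struct()])]),
        ("array", [int_string_struct()]),
        ("map", [prim("string"), int_string_struct()]),
    ]
)


def flatten(node, out):
    """Append node and then all of its descendants to out (pre-order)."""
    kind, children = node
    out.append(kind)
    for child in children:
        flatten(child, out)


def physical_layout(top_level):
    """Return {physical_id: kind}; the root struct is assumed to be ID 0."""
    kinds = ["struct"]  # root struct
    for column in top_level:
        flatten(column, kinds)
    return dict(enumerate(kinds))


layout = physical_layout(TOP_LEVEL)
for physical_id in range(10, 24):
    print(physical_id, layout[physical_id])
```

Every nested type consumes one ID for itself and one for each descendant, so each top-level complex column shifts the physical IDs of all columns after it.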

When asked to read top-level column 11, which is array<struct<int,string>>, ReaderImpl actually reads column 10: it treats 11 as a physical column index, and physical column 11 is a subcolumn of top-level column 10. Top-level column 11 actually occupies physical columns 15-18.
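
A minimal sketch of the expected mapping, written in Python for brevity rather than as a patch to ReaderImpl. The function names are hypothetical, and the subtree sizes are read off the physical layout listed in this report (top-level columns 1-9 take one physical column each; columns 10, 11 and 12 take 5, 4 and 5).

```python
# Number of physical columns occupied by each top-level column 1..12
# (the column itself plus all of its nested subcolumns).
SUBTREE_SIZES = [1] * 9 + [5, 4, 5]


def physical_range(top_level_index, sizes=SUBTREE_SIZES):
    """Return (first, last) physical IDs for a 1-based top-level column."""
    # Physical ID 0 is assumed to be the root struct, so numbering starts at 1.
    first = 1 + sum(sizes[: top_level_index - 1])
    last = first + sizes[top_level_index - 1] - 1
    return first, last


def owning_top_level(physical_id, sizes=SUBTREE_SIZES):
    """Return the 1-based top-level column that contains a physical ID."""
    for index in range(1, len(sizes) + 1):
        first, last = physical_range(index, sizes)
        if first <= physical_id <= last:
            return index
    raise ValueError(f"physical column {physical_id} is out of range")


# Expected behaviour: include(11) should select physical columns 15..18.
print(physical_range(11))    # (15, 18)

# Behaviour described in this report: 11 is used as a physical ID,
# which falls inside top-level column 10.
print(owning_top_level(11))  # 10
```

The two indices agree only for columns that precede the first complex type, which is why selecting any of columns 1-10 by index still lands on the intended column.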


