Posted to jira@arrow.apache.org by "Ying Zhou (Jira)" <ji...@apache.org> on 2021/02/07 22:39:00 UTC

[jira] [Commented] (ARROW-11117) [C++] ORC Reader uses wrong types

    [ https://issues.apache.org/jira/browse/ARROW-11117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280690#comment-17280690 ] 

Ying Zhou commented on ARROW-11117:
-----------------------------------

Development is complete for item 3. After testing, I will close this issue and include the fix in the 8648 PR.

> [C++] ORC Reader uses wrong types
> ---------------------------------
>
>                 Key: ARROW-11117
>                 URL: https://issues.apache.org/jira/browse/ARROW-11117
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Ying Zhou
>            Assignee: Ying Zhou
>            Priority: Major
>              Labels: orc
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The Arrow C++ ORC reader does not map ORC types to the correct Arrow types. In particular:
> 1. It converts the ORC STRING type to the Arrow STRING type, even though all ORC STRINGs are large.
> 2. It converts the ORC LIST type to the Arrow LIST type, even though all ORC LISTs are large.
> 3. It converts the ORC MAP type to LISTs of STRUCTs with hardcoded field names, even though Arrow has an actual MAP type (note that ORC MAPs are large, so we need to filter out the large ones when converting).
> These issues need to be fixed.
>  
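
The mapping requested in the three items above can be sketched roughly as follows. This is only an illustration, not the reader's actual code: OrcKind, TargetArrowType and FitsInArrowMap are hypothetical names, the returned strings are plain labels rather than Arrow API calls, and the size check assumes that the Arrow MAP layout uses 32-bit offsets (which is why large ORC MAPs would have to be filtered out).

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the three ORC type kinds named in the issue.
enum class OrcKind { String, List, Map };

// Returns a label for the Arrow type the issue asks the reader to produce.
// The labels are descriptive only; they are not calls into the Arrow API.
std::string TargetArrowType(OrcKind kind) {
  switch (kind) {
    case OrcKind::String:
      return "LARGE_STRING";  // item 1: not STRING
    case OrcKind::List:
      return "LARGE_LIST";    // item 2: not LIST
    case OrcKind::Map:
      return "MAP";           // item 3: not LIST of STRUCT
  }
  throw std::invalid_argument("unknown ORC kind");
}

// Assumption: Arrow MAP offsets are 32-bit, so a map column whose total
// number of key/value entries exceeds INT32_MAX cannot be represented.
bool FitsInArrowMap(std::int64_t total_entries) {
  return total_entries >= 0 &&
         total_entries <= std::numeric_limits<std::int32_t>::max();
}
```

A converter along these lines would call FitsInArrowMap before building a MAP column and report an error (or fall back to another representation) when it returns false.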



--
This message was sent by Atlassian Jira
(v8.3.4#803005)