Posted to issues@orc.apache.org by "Lei Sun (Jira)" <ji...@apache.org> on 2020/03/17 23:41:00 UTC

[jira] [Commented] (ORC-613) OrcMapredRecordReader mis-reuse struct object when actual children schema differs

    [ https://issues.apache.org/jira/browse/ORC-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061258#comment-17061258 ] 

Lei Sun commented on ORC-613:
-----------------------------

 

[~omalley] Can you help take a look? Thanks.

> OrcMapredRecordReader mis-reuse struct object when actual children schema differs
> ---------------------------------------------------------------------------------
>
>                 Key: ORC-613
>                 URL: https://issues.apache.org/jira/browse/ORC-613
>             Project: ORC
>          Issue Type: Bug
>          Components: Java
>            Reporter: Lei Sun
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When reading data whose schema looks like the following:
> {{uniontype <struct<field0, field1, ..., fieldN>, struct<>> }}
> `org.apache.orc.mapreduce.OrcMapreduceRecordReader#nextStruct` decides whether the previous object can be reused. This check is flawed: it only verifies the top-level type (that the previous object is an OrcStruct), not the schema of that OrcStruct. Therefore, with a schema like the one above, when a struct at tag_0 is followed by a struct at tag_1, the reader reuses the tag_0 struct, along with its schema, for the tag_1 value, which produces an incorrect result.
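A minimal, self-contained sketch of the failure mode described above, under a simplified model: the hypothetical `Struct` class stands in for `OrcStruct`, a list of field names stands in for the struct's schema, and `nextStructBuggy` / `nextStructFixed` are illustrative helpers, not ORC methods. The buggy helper reuses any previous struct; the fixed helper additionally requires the previous struct's schema to equal the requested one. That is one plausible fix, not necessarily the patch that was merged for this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the reuse check described in ORC-613.
// Struct, nextStructBuggy and nextStructFixed are illustrative only; they are
// NOT part of the ORC API.
public class StructReuseSketch {

  /** Stand-in for OrcStruct: a schema (field names) plus one value per field. */
  static final class Struct {
    final List<String> schema;
    final List<Object> values = new ArrayList<>();

    Struct(List<String> schema) {
      this.schema = schema;
      for (int i = 0; i < schema.size(); i++) {
        values.add(null);
      }
    }
  }

  /** Copies row values into the fields named by the requested schema only. */
  private static void fill(Struct result, List<String> schema, List<?> row) {
    for (int i = 0; i < schema.size(); i++) {
      result.values.set(i, row.get(i));
    }
  }

  /** Flawed check: reuses 'previous' whenever it is a Struct, ignoring its schema. */
  static Struct nextStructBuggy(Object previous, List<String> schema, List<?> row) {
    Struct result = (previous instanceof Struct) ? (Struct) previous : new Struct(schema);
    fill(result, schema, row);
    return result;
  }

  /** Hedged fix: reuse only when the previous struct has the same schema. */
  static Struct nextStructFixed(Object previous, List<String> schema, List<?> row) {
    Struct result;
    if (previous instanceof Struct && ((Struct) previous).schema.equals(schema)) {
      result = (Struct) previous; // same shape: safe to reuse
    } else {
      result = new Struct(schema); // different shape: allocate a fresh object
    }
    fill(result, schema, row);
    return result;
  }

  public static void main(String[] args) {
    List<String> tag0 = Arrays.asList("field0", "field1"); // struct<field0, field1>
    List<String> tag1 = new ArrayList<>();                  // struct<>

    // A union value at tag_0, then one at tag_1 that is handed the tag_0 object.
    Struct first = nextStructBuggy(null, tag0, Arrays.asList("a", "b"));
    Struct second = nextStructBuggy(first, tag1, new ArrayList<>());
    System.out.println("buggy: " + second.schema + " " + second.values); // buggy: [field0, field1] [a, b]

    Struct firstFixed = nextStructFixed(null, tag0, Arrays.asList("a", "b"));
    Struct secondFixed = nextStructFixed(firstFixed, tag1, new ArrayList<>());
    System.out.println("fixed: " + secondFixed.schema + " " + secondFixed.values); // fixed: [] []
  }
}
```

With the buggy check, the tag_1 value still carries the tag_0 schema and field values. The fixed check costs one schema comparison per call, but it still reuses the object whenever consecutive values share a schema, which is the case the reuse optimization is meant to speed up.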



--
This message was sent by Atlassian Jira
(v8.3.4#803005)