Posted to dev@parquet.apache.org by "Rob Russo (JIRA)" <ji...@apache.org> on 2017/06/27 23:14:00 UTC

[jira] [Commented] (PARQUET-1046) Impossible to read thrift object from parquet file if it has List<Enum> field that was removed from thrift schema.

    [ https://issues.apache.org/jira/browse/PARQUET-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16065646#comment-16065646 ] 

Rob Russo commented on PARQUET-1046:
------------------------------------

For anyone else hitting this issue: for now we have worked around it by adding an ENUM case to the switch in TProtocolUtil.skip that uses the same logic as the I32 case. This should be safe because enum values are stored as I32, and nothing else should be using the ENUM type internally, so the change should only affect the parquet-thrift implementation. This seemed an easier workaround than modifying the parquet-thrift library.
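A rough, self-contained sketch of the workaround described above. It does not use libthrift. The type ids mirror org.apache.thrift.protocol.TType (I32 = 8, LIST = 15, ENUM = 16), but EventStream and skip() are hypothetical stand-ins for the protocol and TProtocolUtil.skip, assumed only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the workaround; not libthrift code.
// Type ids mirror TType, everything else is illustrative.
public class EnumSkipWorkaround {
    static final byte I32 = 8;
    static final byte LIST = 15;
    static final byte ENUM = 16;

    /** Hypothetical event-based protocol: a queue of already-decoded values. */
    static final class EventStream {
        private final Deque<Object> events = new ArrayDeque<>();

        EventStream add(Object e) { events.addLast(e); return this; }

        int readI32() { return (Integer) events.removeFirst(); }

        /** Returns {elementType, size}. */
        int[] readListBegin() { return (int[]) events.removeFirst(); }

        void readListEnd() {
            Object e = events.removeFirst();
            if (!"LIST_END".equals(e)) {
                throw new IllegalStateException("expected LIST_END but found " + e);
            }
        }

        boolean isEmpty() { return events.isEmpty(); }
    }

    /** Skips one value of the given type, shaped like the switch in TProtocolUtil.skip. */
    static void skip(EventStream in, byte type) {
        switch (type) {
            case I32:
            case ENUM: // workaround: enums are stored as i32, so skip them the same way
                in.readI32();
                break;
            case LIST: {
                int[] header = in.readListBegin();
                for (int i = 0; i < header[1]; i++) {
                    skip(in, (byte) header[0]);
                }
                in.readListEnd();
                break;
            }
            default:
                // unrecognized types consume nothing, as the issue describes
                break;
        }
    }

    public static void main(String[] args) {
        // A removed List<Enum> field with two elements, element type recorded as ENUM (16).
        EventStream in = new EventStream()
                .add(new int[] {ENUM, 2}).add(1).add(3).add("LIST_END");
        skip(in, LIST);
        System.out.println(in.isEmpty()); // prints "true"
    }
}
```

With the extra case, every element is consumed before the list end is read, so the whole removed field is skipped cleanly.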

> Impossible to read thrift object from parquet file if it has List<Enum> field that was removed from thrift schema.
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: PARQUET-1046
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1046
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Andrei Stankevich
>
> If a thrift class has a field of type List<some_enum>, ParquetReader sets the list's element type to enum (type id = 16) when it should set it to Int32.
> What happens is that every field declared as an enum in the thrift schema file has field type Int32 in the generated Java class. The same is true for List fields whose elements are enums.
> However, when ParquetReader creates an object, it uses type enum for the list's elements instead of Int32.
> Because of this, a List field with enum elements cannot be removed from the schema. If such a field is removed from the schema file but is still present in the parquet file, ParquetReader tries to skip it when reading, because the field is no longer in the schema. It calls TProtocolUtil.skip with type = 15 (list), which then calls the same method for each list element with type = 16 (enum). TProtocolUtil.skip has no case for this type in its switch, so the list elements are not skipped, and an exception is thrown later when it tries to skip the list end.
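The failure path from the report can be modeled in the same spirit. This is a hypothetical sketch, not libthrift or parquet-thrift code: a Deque of events stands in for what the reader sees for the removed field, skip() mimics a switch with no ENUM case, and the IllegalStateException is an assumed stand-in for whatever exception the real reader throws at the list end.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical model of the reported failure; not libthrift code.
public class ListEnumSkipFailure {
    static final byte I32 = 8, LIST = 15, ENUM = 16;
    static final String LIST_END = "LIST_END";

    // Skip logic with no ENUM case, as described in the report.
    static void skip(Deque<Object> events, byte type) {
        switch (type) {
            case I32:
                events.removeFirst(); // consume one i32 value
                break;
            case LIST: {
                int[] header = (int[]) events.removeFirst(); // {elementType, size}
                for (int i = 0; i < header[1]; i++) {
                    skip(events, (byte) header[0]);
                }
                Object end = events.removeFirst();
                if (!LIST_END.equals(end)) {
                    throw new IllegalStateException("expected " + LIST_END + " but found " + end);
                }
                break;
            }
            default:
                // type 16 (enum) lands here: nothing is consumed
                break;
        }
    }

    /** Skips one list and returns "ok" or the error message. */
    static String trySkip(Deque<Object> events) {
        try {
            skip(events, LIST);
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Same two elements, element type recorded as ENUM (16) versus I32 (8).
        Deque<Object> asEnum = new ArrayDeque<>(Arrays.<Object>asList(new int[] {ENUM, 2}, 1, 3, LIST_END));
        Deque<Object> asI32 = new ArrayDeque<>(Arrays.<Object>asList(new int[] {I32, 2}, 1, 3, LIST_END));
        System.out.println(trySkip(asEnum)); // prints "expected LIST_END but found 1"
        System.out.println(trySkip(asI32));  // prints "ok"
    }
}
```

In this model the element values are never consumed when the element type is 16, so the list-end check sees the first element instead; recording the elements as I32 (8) lets the same skip succeed.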



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)