Posted to dev@parquet.apache.org by "Andrei Stankevich (JIRA)" <ji...@apache.org> on 2017/06/27 22:17:00 UTC

[jira] [Updated] (PARQUET-1046) Impossible to read thrift object from parquet file if it has List field that was removed from thrift schema.

     [ https://issues.apache.org/jira/browse/PARQUET-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrei Stankevich updated PARQUET-1046:
---------------------------------------
    Description: 
If a Thrift class has a field of type List<some_enum>, ParquetReader reports the list's element type as enum (type id = 16), but it should report Int32.

In the generated Java class, every field declared as an enum in the Thrift schema file has the field type Int32. The same is true for List fields whose elements are enums.

However, when ParquetReader creates an object, it uses the enum type for the list's elements instead of Int32.
This causes a problem: a list field with enum elements cannot be removed from the schema. If such a field is removed from the schema file but is still present in the Parquet file, ParquetReader tries to skip it because the field is not in the schema. It calls TProtocolUtil.skip with type = 15 (list), which then calls the same method for each list element with type = 16 (enum). TProtocolUtil.skip has no case for this type in its switch statement, so the list elements are not skipped, and an exception is thrown later when it tries to skip the list end.
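
The failure mode can be illustrated with a small, self-contained sketch. It does not use the real Thrift or Parquet classes: EventProtocol and skip below are hypothetical stand-ins, written on the assumption that the reader replays recorded events and that the skip routine silently ignores any type id it has no case for, as described in this report. Only the type ids (8, 15, 16) come from Thrift itself; other Thrift versions may handle unknown types differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SkipSketch {
    // Type ids as used by Thrift (I32 = 8, LIST = 15, ENUM = 16).
    static final byte I32 = 8;
    static final byte LIST = 15;
    static final byte ENUM = 16;

    /** Hypothetical replay-style protocol: every read call pops one recorded event. */
    static class EventProtocol {
        private final Deque<String> events = new ArrayDeque<>();
        private final byte elemType;
        private final int size;

        EventProtocol(byte elemType, int... values) {
            this.elemType = elemType;
            this.size = values.length;
            events.add("LIST_BEGIN");
            for (int v : values) {
                events.add("I32:" + v); // enum values are stored as plain integers
            }
            events.add("LIST_END");
        }

        private void expect(String prefix) {
            String next = events.poll();
            if (next == null || !next.startsWith(prefix)) {
                throw new IllegalStateException("expected " + prefix + " but found " + next);
            }
        }

        byte readListBegin() { expect("LIST_BEGIN"); return elemType; }
        int listSize() { return size; }
        void readI32() { expect("I32"); }
        void readListEnd() { expect("LIST_END"); }
        int remaining() { return events.size(); }
    }

    /** Simplified skip routine (an assumption, not the library code). */
    static void skip(EventProtocol prot, byte type) {
        switch (type) {
            case I32: {
                prot.readI32();
                break;
            }
            case LIST: {
                byte elemType = prot.readListBegin();
                for (int i = 0; i < prot.listSize(); i++) {
                    skip(prot, elemType);
                }
                prot.readListEnd();
                break;
            }
            default:
                // No case for ENUM (16): nothing is consumed for this element.
                break;
        }
    }

    /** Returns true if a three-element list with the given element type is skipped completely. */
    static boolean skipsCleanly(byte elemType) {
        EventProtocol prot = new EventProtocol(elemType, 1, 2, 3);
        try {
            skip(prot, LIST);
            return prot.remaining() == 0;
        } catch (IllegalStateException e) {
            return false; // list end was requested while elements were still pending
        }
    }

    public static void main(String[] args) {
        System.out.println("elements declared as I32:  " + skipsCleanly(I32));
        System.out.println("elements declared as ENUM: " + skipsCleanly(ENUM));
    }
}
```

In this sketch the list whose elements are declared as I32 is skipped completely, while the list whose elements are declared as ENUM fails at the list end because its elements were never consumed. That is the same sequence of events this report describes for a removed List<some_enum> field.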


> Impossible to read thrift object from parquet file if it has List<Enum> field that was removed from thrift schema.
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: PARQUET-1046
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1046
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Andrei Stankevich
>
> If a Thrift class has a field of type List<some_enum>, ParquetReader reports the list's element type as enum (type id = 16), but it should report Int32.
> In the generated Java class, every field declared as an enum in the Thrift schema file has the field type Int32. The same is true for List fields whose elements are enums.
> However, when ParquetReader creates an object, it uses the enum type for the list's elements instead of Int32.
> This causes a problem: a list field with enum elements cannot be removed from the schema. If such a field is removed from the schema file but is still present in the Parquet file, ParquetReader tries to skip it because the field is not in the schema. It calls TProtocolUtil.skip with type = 15 (list), which then calls the same method for each list element with type = 16 (enum). TProtocolUtil.skip has no case for this type in its switch statement, so the list elements are not skipped, and an exception is thrown later when it tries to skip the list end.


