Posted to dev@parquet.apache.org by "Ryan Blue (JIRA)" <ji...@apache.org> on 2015/08/12 18:50:45 UTC

[jira] [Commented] (PARQUET-350) ThriftRecordConverter throws NPE for unrecognized enum values

    [ https://issues.apache.org/jira/browse/PARQUET-350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693815#comment-14693815 ] 

Ryan Blue commented on PARQUET-350:
-----------------------------------

I think we store enums as strings, rather than adding logic for explicit ordinals, so that the dictionary encoding can do the conversion for us. That way we don't have to keep a known list of symbols.
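
For the NPE itself, here is a minimal sketch of the kind of null-checked lookup the description below asks for. It is not the actual patch: the class name EnumLookupSketch, the idFor method, the String-keyed map (the real enumLookup is keyed by Binary), and the choice of IllegalStateException are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the enum-handling part of the converter.
class EnumLookupSketch {
  // Assumption: keyed by String here; the real enumLookup is keyed by Binary.
  private final Map<String, Integer> enumLookup = new HashMap<>();

  EnumLookupSketch(Map<String, Integer> knownValues) {
    enumLookup.putAll(knownValues);
  }

  // Returns the enum id for a stored name, or fails with a message naming the value.
  int idFor(String value) {
    // Keep the boxed Integer so a missing entry can be detected before unboxing.
    Integer id = enumLookup.get(value);
    if (id == null) {
      // Assumption: the exception type is a placeholder, not what parquet-mr uses.
      throw new IllegalStateException(
          "Unrecognized enum value: " + value + ", known values: " + enumLookup.keySet());
    }
    return id;
  }
}
```

With a check like this, a renamed enum value still fails to read, but the error names the offending value instead of surfacing as a bare NPE from auto-unboxing.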

> ThriftRecordConverter throws NPE for unrecognized enum values
> -------------------------------------------------------------
>
>                 Key: PARQUET-350
>                 URL: https://issues.apache.org/jira/browse/PARQUET-350
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>            Reporter: Alex Levenson
>            Assignee: Alex Levenson
>
> currently:
> {noformat}
>     @Override
>     public void addBinary(final Binary value) {
>       final int id = enumLookup.get(value);
>       events.add(new ParquetProtocol("readI32() enum") {
>         @Override
>         public int readI32() throws TException {
>           return id;
>         }
>       });
>     }
> {noformat}
> the auto-unboxing from Integer to int throws an NPE when enumLookup.get(value) == null -- we should throw a better exception here that includes the value in question.
> This was actually triggered by someone *renaming* an enum, and because parquet stores enums by *name* instead of ID it is not compatible. I'm not sure why we store enums as strings, but we might want to reconsider that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)