Posted to dev@parquet.apache.org by "Ben Watson (Jira)" <ji...@apache.org> on 2020/06/05 21:52:00 UTC

[jira] [Commented] (PARQUET-1870) Handle INT96 more gracefully in parquet-avro

    [ https://issues.apache.org/jira/browse/PARQUET-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127110#comment-17127110 ] 

Ben Watson commented on PARQUET-1870:
-------------------------------------

For backstory, I maintain an [Avro and Parquet Viewer IntelliJ plugin|https://github.com/benwatson528/intellij-avro-parquet-plugin] that allows Avro and Parquet files to be displayed visually, and a repeated complaint is that it's not possible to view files containing INT96 columns.

I have been able to solve this by replacing [AvroSchemaConverter#308|https://github.com/apache/parquet-mr/blob/master/parquet-avro/src/main/java/org/apache/parquet/avro/AvroSchemaConverter.java#L307-L309]:
{code:java}
public Schema convertINT96(PrimitiveTypeName primitiveTypeName) {
  throw new IllegalArgumentException("INT96 not implemented and is deprecated");
}
{code}
with
{code:java}
public Schema convertINT96(PrimitiveTypeName primitiveTypeName) {
  return Schema.create(Schema.Type.BYTES);
}
{code}
This results in gibberish being printed for the INT96 values, since they are exposed as raw bytes, but at least the files are displayed.
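
If an INT96 column holds a timestamp in the layout commonly written by Impala, Hive and Spark (8 little-endian bytes of nanoseconds within the day, followed by a 4-byte little-endian Julian day number), a viewer could decode the bytes instead of printing them raw. The following is only a minimal sketch under that assumption. The class and method names are hypothetical and are not part of parquet-avro, and nothing in the file guarantees that a given INT96 column really is a timestamp:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

// Hypothetical helper, not part of parquet-avro.
final class Int96Timestamps {
    // Julian day number of 1970-01-01, the Unix epoch.
    private static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    private static final long SECONDS_PER_DAY = 86_400L;

    private Int96Timestamps() {
    }

    // Assumes the 12 bytes follow the Impala/Hive timestamp convention:
    // bytes 0-7 = nanoseconds of day, bytes 8-11 = Julian day (both little-endian).
    static Instant toInstant(byte[] int96) {
        if (int96 == null || int96.length != 12) {
            throw new IllegalArgumentException("INT96 value must be exactly 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        long julianDay = buf.getInt();
        long epochSeconds = (julianDay - JULIAN_DAY_OF_EPOCH) * SECONDS_PER_DAY;
        // ofEpochSecond folds the nanosecond adjustment into the seconds value.
        return Instant.ofEpochSecond(epochSeconds, nanosOfDay);
    }
}
```

The bytes would come from the Avro BYTES field produced by the modified convertINT96 above. A caller should be prepared to fall back to the raw bytes if the decoded value looks implausible.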

I'm happy to raise a PR for this, but first want to check that this is an acceptable solution and that no one else has any better ideas.

> Handle INT96 more gracefully in parquet-avro
> --------------------------------------------
>
>                 Key: PARQUET-1870
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1870
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-avro
>    Affects Versions: 1.11.0
>            Reporter: Ben Watson
>            Priority: Minor
>
> The parquet-avro library does not support INT96 columns (PARQUET-323), and any attempt to process a file containing such a column results in:
> {code:java}
> throw new IllegalArgumentException("INT96 not implemented and is deprecated");{code}
> INT96 is still used in many legacy datasets, and so it would be useful to be able to process Parquet files containing these records, even if the INT96 values themselves aren't rendered.
> The same functionality has already been re-added into parquet-pig (PARQUET-1133).


