Posted to dev@parquet.apache.org by "Kristoffer Sjögren (JIRA)" <ji...@apache.org> on 2016/08/30 13:18:20 UTC

[jira] [Created] (PARQUET-697) ProtoMessageConverter fails for unknown proto fields

Kristoffer Sjögren created PARQUET-697:
------------------------------------------

             Summary: ProtoMessageConverter fails for unknown proto fields
                 Key: PARQUET-697
                 URL: https://issues.apache.org/jira/browse/PARQUET-697
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-mr
    Affects Versions: 1.8.1
            Reporter: Kristoffer Sjögren


Hi

We have a Spark application that reads parquet files and turns them into a Protobuf RDD, like the code below [1]. However, if the parquet schema contains fields that don't exist in the protobuf class, an IncompatibleSchemaModificationException [2] is thrown.

For compatibility reasons it would be nice to make it possible to ignore such fields instead of throwing an exception. Maybe as a configuration option? The fix for ignoring fields is quite easy: just instantiate an empty PrimitiveConverter instead [3].
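One possible shape for such a configuration option, sketched with a hypothetical key name (parquet.proto.ignoreUnknownFields is made up for illustration; no such option exists in parquet-mr 1.8.1):

```java
// Hypothetical opt-in flag -- the key name is an assumption, not an
// existing parquet-mr option.
JobConf conf = new JobConf(ctx.hadoopConfiguration());
conf.setBoolean("parquet.proto.ignoreUnknownFields", true);
ProtoReadSupport.setProtobufClass(conf, Msg.class.getName());
```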

Cheers,
-Kristoffer


[1]
// Read proto-parquet files into an RDD of protobuf builders.
JobConf conf = new JobConf(ctx.hadoopConfiguration());
FileInputFormat.setInputPaths(conf, rawPath);
ProtoReadSupport.setProtobufClass(conf, Msg.class.getName());
NewHadoopRDD<Void, Msg.Builder> rdd =
      new NewHadoopRDD<>(ctx.sc(), ProtoParquetInputFormat.class, Void.class, Msg.Builder.class, conf);
rdd.toJavaRDD().foreach(log -> System.out.println(log._2));

[2] https://github.com/apache/parquet-mr/blob/master/parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java#L84

[3] converters[parquetFieldIndex - 1] = new PrimitiveConverter() {};
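To make the behavior in [3] concrete, here is a minimal, self-contained sketch of the skip-unknown-field dispatch. Converter, PrimitiveConverter, and MessageConverterSketch below are simplified stand-ins for the parquet-mr types, not the real API:

```java
// Stand-ins for org.apache.parquet.io.api.Converter / PrimitiveConverter,
// simplified here for illustration only.
abstract class Converter {}

class PrimitiveConverter extends Converter {}

class MessageConverterSketch {
  // Instead of throwing IncompatibleSchemaModificationException when a
  // parquet field has no matching protobuf field, hand back a no-op
  // PrimitiveConverter that silently drops the value.
  static Converter converterFor(boolean fieldExistsInProto) {
    if (!fieldExistsInProto) {
      return new PrimitiveConverter() {}; // empty converter: value is ignored
    }
    // In the real code this branch would build the proper field converter.
    return new PrimitiveConverter();
  }
}
```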



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)