You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@parquet.apache.org by "tddfan (via GitHub)" <gi...@apache.org> on 2023/06/07 16:03:42 UTC

[GitHub] [parquet-mr] tddfan commented on a diff in pull request #1102: PARQUET-2305 Allow Parquet to Proto conversion even though Target Schema has less fields

tddfan commented on code in PR #1102:
URL: https://github.com/apache/parquet-mr/pull/1102#discussion_r1221851162


##########
parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java:
##########
@@ -86,32 +89,71 @@ class ProtoMessageConverter extends GroupConverter {
     this.conf = conf;
     this.parent = pvc;
     this.extraMetadata = extraMetadata;
-    int parquetFieldIndex = 1;
+    boolean ignoreUnknownFields = conf.getBoolean("IGNORE_UNKNOWN_FIELDS", false);
+
+    myBuilder = builder;
 
     if (pvc == null) {
       throw new IllegalStateException("Missing parent value container");
     }
 
-    myBuilder = builder;
+    if(builder == null && ignoreUnknownFields) {
+      IntStream.range(0, parquetSchema.getFieldCount())
+        .forEach(i-> converters[i] = dummyScalarConverter(DUMMY_PVC, parquetSchema.getType(i), conf, extraMetadata));
 
-    Descriptors.Descriptor protoDescriptor = builder.getDescriptorForType();
+    } else {
 
-    for (Type parquetField : parquetSchema.getFields()) {
-      Descriptors.FieldDescriptor protoField = protoDescriptor.findFieldByName(parquetField.getName());
+      int parquetFieldIndex = 0;
+      Descriptors.Descriptor protoDescriptor =  builder.getDescriptorForType();
 
-      if (protoField == null) {
-        String description = "Scheme mismatch \n\"" + parquetField + "\"" +
-                "\n proto descriptor:\n" + protoDescriptor.toProto();
-        throw new IncompatibleSchemaModificationException("Cant find \"" + parquetField.getName() + "\" " + description);
-      }
+      for (Type parquetField : parquetSchema.getFields()) {

Review Comment:
   Fixed the indentation. Yes it was bit off.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org