You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "justaparth (via GitHub)" <gi...@apache.org> on 2023/04/29 23:43:44 UTC

[GitHub] [spark] justaparth commented on a diff in pull request #40686: [SPARK-43051][CONNECTOR] Add option to materialize zero values for fields without presence information (proto3 scalars)

justaparth commented on code in PR #40686:
URL: https://github.com/apache/spark/pull/40686#discussion_r1181141585


##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:
##########
@@ -299,6 +298,29 @@ private[sql] class ProtobufDeserializer(
     }
   }
 
+  private def getFieldValue(record: DynamicMessage, field: FieldDescriptor): AnyRef = {
+    // We return a value if one of:
+    // - the field is repeated
+    // - the field is explicitly present in the serialized proto
+    // - the field is proto2 with a default
+    // - field presence is not available and materializeZeroValues is set
+    //
+    // Repeated fields have to be treated separately as they cannot have `hasField`
+    // called on them. And we only materialize zero values for fields without presence
+    // information because that flag controls the behavior for those fields in the ambiguous
+    // case of "unset" or "set to zero value". See the docs in [[ProtobufOptions]] for more
+    // details.
+    if (
+      field.isRepeated
+        || record.hasField(field)
+        || field.hasDefaultValue
+        || (!field.hasPresence && this.materializeZeroValues)) {

Review Comment:
   > Message type is pretty important.
   > You have a test covering it, that is good.
   
   aren't all the types important? by this logic we should have a callout for every single type. i agree message is important and "not a scalar" so i thought a comment + test suite that explicitly asserts the behavior would help us prevent any problems
   
   > We don't want to serialize messages even if some future java library changes meaning of 'hasPresence'
   
   I don't think this would be possible, right? Because that would mean protobuf changes the sematics of their protocol, which doesn't seem possible (it would probably break proto parsing).
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org