You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/06/10 00:12:19 UTC

[GitHub] [beam] TheNeuralBit commented on a change in pull request #11955: [BEAM-10220] add support for implicit nulls for converting between beam rows and json

TheNeuralBit commented on a change in pull request #11955:
URL: https://github.com/apache/beam/pull/11955#discussion_r437784412



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java
##########
@@ -362,6 +382,11 @@ private RowJsonSerializer(Schema schema) {
       super(Row.class);
       this.schema = schema;
     }
+  
+    public RowJsonSerializer ignoreNullsOnWrite(Boolean ignoreNullsOnWrite) {

Review comment:
       Similar comment here, `withIgnoreNullsOnWrite` and add a docstring. (I think checkstyle will complain without the docstring anyway).

##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java
##########
@@ -185,18 +186,25 @@ private RowJsonDeserializer(Schema schema) {
       this.schema = schema;
     }
 
+    public RowJsonDeserializer allowMissingFields(Boolean allowMissing){
+      this.allowMissingFields = allowMissing;
+      return this;
+  }
+
     @Override
     public Row deserialize(JsonParser jsonParser, DeserializationContext deserializationContext)
         throws IOException {
 
       // Parse and convert the root object to Row as if it's a nested field with name 'root'
       return (Row)
           extractJsonNodeValue(
-              FieldValue.of("root", FieldType.row(schema), jsonParser.readValueAsTree()));
+              FieldValue.of("root", FieldType.row(schema), jsonParser.readValueAsTree(), allowMissingFields));
     }
 
+  
+
     private static Object extractJsonNodeValue(FieldValue fieldValue) {
-      if (!fieldValue.isJsonValuePresent()) {

Review comment:
       I think you could just check `this.allowMissingFields` here rather than passing it into all the `FieldValue` instances, no?

##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java
##########
@@ -185,18 +186,25 @@ private RowJsonDeserializer(Schema schema) {
       this.schema = schema;
     }
 
+    public RowJsonDeserializer allowMissingFields(Boolean allowMissing){

Review comment:
       nit: could you change this to `withAllowMissingFields`? 
   
   Also please add a docstring

##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java
##########
@@ -362,6 +382,11 @@ private RowJsonSerializer(Schema schema) {
       super(Row.class);
       this.schema = schema;
     }
+  
+    public RowJsonSerializer ignoreNullsOnWrite(Boolean ignoreNullsOnWrite) {

Review comment:
       I might call it "DropNullsOnWrite" instead of ignore, but I don't feel strongly about it 

##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java
##########
@@ -375,6 +400,9 @@ private void writeRow(Row row, Schema schema, JsonGenerator gen) throws IOExcept
       for (int i = 0; i < schema.getFieldCount(); ++i) {
         Field field = schema.getField(i);
         Object value = row.getValue(i);
+        if (ignoreNullsOnWrite && value == null){
+          continue;
+        }

Review comment:
       This should also check `field.getType().getNullable()` like the other conditional. If we get a null for a non-nullable field we should fail loudly rather than silently dropping it.

##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java
##########
@@ -185,18 +186,25 @@ private RowJsonDeserializer(Schema schema) {
       this.schema = schema;
     }
 
+    public RowJsonDeserializer allowMissingFields(Boolean allowMissing){

Review comment:
       We might want to make this an enum so in the future there could be a third mode where nulls _must_ be encoded with a missing field, and having a null field value would be considered an error. The mode you've added here is a permissive middle ground where we allow either one.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org