You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/12/31 08:52:13 UTC

[GitHub] [flink] fsk119 commented on a change in pull request #14508: [FLINK-20773][format] Support allow-unescaped-control-chars option for JSON format.

fsk119 commented on a change in pull request #14508:
URL: https://github.com/apache/flink/pull/14508#discussion_r550414819



##########
File path: flink-formats/flink-json/src/main/java/org/apache/flink/formats/json/JsonRowDataDeserializationSchema.java
##########
@@ -90,6 +92,10 @@ public JsonRowDataDeserializationSchema(
         if (hasDecimalType) {
             objectMapper.enable(DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS);
         }
+        if (allowUnescapedControlChars) {

Review comment:
       Please use a field to store the value of `allowUnescapedControlChars` because it determines the behaviour of this class. If one instances set this field false and another instances set this field true, the equal method should return false.

##########
File path: flink-formats/flink-json/src/main/java/org/apache/flink/formats/json/JsonFormatFactory.java
##########
@@ -65,6 +66,7 @@
         final boolean failOnMissingField = formatOptions.get(FAIL_ON_MISSING_FIELD);
         final boolean ignoreParseErrors = formatOptions.get(IGNORE_PARSE_ERRORS);
         TimestampFormat timestampOption = JsonOptions.getTimestampFormat(formatOptions);
+        final boolean allowUnescapedControlChars = formatOptions.get(ALLOW_UNESCAPED_CONTROL_CHARS);

Review comment:
       Please add this option into the method `optionalOptions `

##########
File path: flink-formats/flink-json/src/test/java/org/apache/flink/formats/json/JsonFormatFactoryTest.java
##########
@@ -154,7 +168,8 @@ private void testSchemaDeserializationSchema(Map<String, String> options) {
                         InternalTypeInfo.of(ROW_TYPE),
                         false,
                         true,
-                        TimestampFormat.ISO_8601);
+                        TimestampFormat.ISO_8601,
+                        false);

Review comment:
       Please use `true` as input . Because the default value of the option is false.

##########
File path: flink-formats/flink-json/src/main/java/org/apache/flink/table/descriptors/JsonValidator.java
##########
@@ -56,6 +58,7 @@ public void validate(DescriptorProperties properties) {
 
         properties.validateBoolean(FORMAT_FAIL_ON_MISSING_FIELD, true);
         properties.validateBoolean(FORMAT_IGNORE_PARSE_ERRORS, true);
+        properties.validateBoolean(FORMAT_ALLOW_UNESCAPED_CONTROL_CHARS, true);

Review comment:
       Please don't add new feature for the old descriptor api.

##########
File path: flink-formats/flink-json/src/main/java/org/apache/flink/formats/json/debezium/DebeziumJsonDeserializationSchema.java
##########
@@ -94,7 +94,8 @@ public DebeziumJsonDeserializationSchema(
             TypeInformation<RowData> producedTypeInfo,
             boolean schemaInclude,
             boolean ignoreParseErrors,
-            TimestampFormat timestampFormat) {
+            TimestampFormat timestampFormat,

Review comment:
       ditto

##########
File path: flink-formats/flink-json/src/main/java/org/apache/flink/formats/json/maxwell/MaxwellJsonDeserializationSchema.java
##########
@@ -70,7 +70,8 @@ public MaxwellJsonDeserializationSchema(
             RowType rowType,
             TypeInformation<RowData> resultTypeInfo,
             boolean ignoreParseErrors,
-            TimestampFormat timestampFormatOption) {
+            TimestampFormat timestampFormatOption,
+            boolean allowUnescapedControlChars) {

Review comment:
       ditto

##########
File path: flink-formats/flink-json/src/main/java/org/apache/flink/formats/json/JsonRowDataDeserializationSchema.java
##########
@@ -90,6 +92,10 @@ public JsonRowDataDeserializationSchema(
         if (hasDecimalType) {
             objectMapper.enable(DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS);
         }
+        if (allowUnescapedControlChars) {
+            objectMapper.configure(

Review comment:
       Maybe we can simplify to 
   ```
   objectMapper.configure(
                       JsonReadFeature.ALLOW_UNESCAPED_CONTROL_CHARS.mappedFeature(), allowUnescapedControlChars);
   ```

##########
File path: flink-formats/flink-json/src/main/java/org/apache/flink/formats/json/JsonRowDeserializationSchema.java
##########
@@ -114,18 +118,22 @@ private JsonRowDeserializationSchema(
         if (hasDecimalType) {
             objectMapper.enable(DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS);
         }
+        if (allowUnescapedControlChars) {

Review comment:
       Currently the community is moving to the new descriptor api. It not recommended to add new feature for the depricated api.

##########
File path: flink-formats/flink-json/src/test/java/org/apache/flink/formats/json/JsonRowDataSerDeSchemaTest.java
##########
@@ -700,7 +727,10 @@ private void testParseErrors(TestSpec spec) throws Exception {
                                     "Failed to deserialize JSON '{\"id\":\"2019-11-12T18:00:12+0800\"}'."),
                     TestSpec.json("{\"id\":1,\"factor\":799.929496989092949698}")
                             .rowType(ROW(FIELD("id", INT()), FIELD("factor", DECIMAL(38, 18))))
-                            .expect(Row.of(1, new BigDecimal("799.929496989092949698"))));
+                            .expect(Row.of(1, new BigDecimal("799.929496989092949698"))),
+                    TestSpec.json("{\"id\":\"\tstring field\"}")
+                            .rowType(ROW(FIELD("id", STRING())))

Review comment:
       I think we can add `.expectErrorMessage(..)` to also check the failed situation.

##########
File path: flink-formats/flink-json/src/main/java/org/apache/flink/formats/json/JsonRowFormatFactory.java
##########
@@ -79,6 +79,14 @@ public JsonRowFormatFactory() {
                                 schema.ignoreParseErrors();
                             }
                         });
+        descriptorProperties

Review comment:
       Currently the community is moving to the new descriptor api. It not recommended to add new feature for the depricated api. 
   

##########
File path: flink-formats/flink-json/src/main/java/org/apache/flink/formats/json/canal/CanalJsonDeserializationSchema.java
##########
@@ -105,7 +106,8 @@ private CanalJsonDeserializationSchema(
                         false, // ignoreParseErrors already contains the functionality of
                         // failOnMissingField
                         ignoreParseErrors,
-                        timestampFormat);
+                        timestampFormat,
+                        allowUnescapedControlChars);

Review comment:
       Please use fields to store the value of the `allowUnescapedControlChars` and `timestampFormat `.
   
   

##########
File path: flink-formats/flink-json/src/main/java/org/apache/flink/formats/json/maxwell/MaxwellJsonFormatFactory.java
##########
@@ -43,6 +43,7 @@
 import java.util.HashSet;
 import java.util.Set;
 
+import static org.apache.flink.formats.json.maxwell.MaxwellJsonOptions.ALLOW_UNESCAPED_CONTROL_CHARS;

Review comment:
       Please add this option into the `optionalOptions.

##########
File path: flink-formats/flink-json/src/test/java/org/apache/flink/formats/json/debezium/DebeziumJsonFormatFactoryTest.java
##########
@@ -134,7 +135,8 @@ public void testSchemaIncludeOption() {
                         InternalTypeInfo.of(PHYSICAL_DATA_TYPE.getLogicalType()),
                         true,
                         true,
-                        TimestampFormat.ISO_8601);
+                        TimestampFormat.ISO_8601,
+                        false);

Review comment:
       use `true` to test.

##########
File path: flink-formats/flink-json/src/test/java/org/apache/flink/formats/json/maxwell/MaxwellJsonFormatFactoryTest.java
##########
@@ -67,7 +67,11 @@
     public void testSeDeSchema() {
         final MaxwellJsonDeserializationSchema expectedDeser =
                 new MaxwellJsonDeserializationSchema(
-                        ROW_TYPE, InternalTypeInfo.of(ROW_TYPE), true, TimestampFormat.ISO_8601);
+                        ROW_TYPE,
+                        InternalTypeInfo.of(ROW_TYPE),
+                        true,
+                        TimestampFormat.ISO_8601,
+                        false);

Review comment:
       use `true` .




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org