You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by GitBox <gi...@apache.org> on 2023/01/20 15:38:43 UTC

[GitHub] [avro] davidgm0 opened a new pull request, #2062: [ruby] Keep all union record errors and tag them accordingly

davidgm0 opened a new pull request, #2062:
URL: https://github.com/apache/avro/pull/2062

   I wasn't able to create an issue in JIRA, since I don't seem to be able to request a user. I am a first time contributor, apologies  in case something is not precise enough
   
   ## What is the purpose of the change
   
   Improving validation errors for unions of records
   When no valid union is found, this line selects the **first** error that it finds. If there would be 3 different possible schemas, there are 3 `failures`, however only the first one will be considered.
   
   https://github.com/apache/avro/blob/fc2a4e0e5d88755c776b97fa1872e70ffbe0d4bb/lang/ruby/lib/avro/schema_validator.rb#L199
   
   The returned error also does not give any context of the error, only about the invalid fields.
   
   As an example, consider we have 3 types of adresses `PersonalAddress`, `WorkAddress`, `SecondAddress` and we made a typo: We used the field `compan` instead of `company`, which belongs to `WorkAddress`. No type will work for the union.
   `{ "compan" => "acme inc.", "some_other_field" => "something else" }`
   
   Since `PersonalAddress` is the first type that fails, the errors of `PersonalAddress` are the ones that will be returned. Since `PersonalAddress` has different structure, all fields that aren't the same as in `WorkAddress` will be shown as errors. The error will be something like
   
   ```
   at .address extra field 'compan' - not in schema
   at .address extra field 'some_other_field' - not in schema
   ```
   
   This gets very confusing because 1) The issue is that no union matched 2) It is not clear because only one of the error is displayed.
   
   I'd be happy if this is fix in some other fashion, this is just an example. With we replace `detect` with `select` and we add the schema name to the error. Then the error will be like:
   
   ```
   PersonalAddress at .address extra field 'compan' - not in schema
   PersonalAddress at .address extra field 'some_other_field' - not in schema
   WorkAddress at .address extra field 'compan' - not in schema
   SecondAddress at .address extra field 'compan' - not in schema
   SecondAddress at .address extra field 'some_other_field' - not in schema
   ```
   
   ## Verifying this change
   
   This change added tests and can be verified as follows:
   
   - Modified existing test, existing tests pass
   
   
   ## Documentation
   
   - Does this pull request introduce a new feature?  no
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] martin-g commented on a diff in pull request #2062: [ruby] Keep all union record errors and tag them accordingly

Posted by "martin-g (via GitHub)" <gi...@apache.org>.
martin-g commented on code in PR #2062:
URL: https://github.com/apache/avro/pull/2062#discussion_r1084117029


##########
lang/ruby/lib/avro/schema_validator.rb:
##########
@@ -208,11 +213,16 @@ def validate_union(expected_schema, datum, path, result, options)
       def first_compatible_type(datum, expected_schema, path, failures, options = {})
         expected_schema.schemas.find do |schema|
           # Avoid expensive validation if we're just validating a nil
-          next datum.nil? if schema.type_sym == :null
+          if schema.type_sym == :null
+            next datum.nil?
+          end
 
           result = Result.new
           validate_recursive(schema, datum, path, result, options)
-          failures << { type: schema.type_sym, result: result } if result.failure?
+
+          failure = { type: schema.type_sym, result: result }
+          failure[:schema] = schema.name if schema.is_a?(Avro::Schema::RecordSchema)

Review Comment:
   What about the other named schemas : Enum and Fixed ?



##########
lang/ruby/lib/avro/schema_validator.rb:
##########
@@ -208,11 +213,16 @@ def validate_union(expected_schema, datum, path, result, options)
       def first_compatible_type(datum, expected_schema, path, failures, options = {})
         expected_schema.schemas.find do |schema|
           # Avoid expensive validation if we're just validating a nil
-          next datum.nil? if schema.type_sym == :null
+          if schema.type_sym == :null
+            next datum.nil?
+          end
 
           result = Result.new
           validate_recursive(schema, datum, path, result, options)
-          failures << { type: schema.type_sym, result: result } if result.failure?
+
+          failure = { type: schema.type_sym, result: result }
+          failure[:schema] = schema.name if schema.is_a?(Avro::Schema::RecordSchema)
+          failures << failure if result.failure?

Review Comment:
   IMO it would be cleaner if the whole block is wrapped in `if`, as the change you made above (L216).
   There is no need to create `failure` object if `result.failure?` is `false`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] martin-g commented on pull request #2062: [ruby] Keep all union record errors and tag them accordingly

Posted by "martin-g (via GitHub)" <gi...@apache.org>.
martin-g commented on PR #2062:
URL: https://github.com/apache/avro/pull/2062#issuecomment-1400435300

   > I wasn't able to create an issue in JIRA
   
   JIRA issue is needed because this is what we use for the release changelog later.
   You can request a JIRA account at private@avro.apache.org. Send us your preferred username, display name and email address.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org