Posted to dev@avro.apache.org by "Dustin Spicuzza (JIRA)" <ji...@apache.org> on 2014/09/23 19:05:34 UTC

[jira] [Commented] (AVRO-1343) Python: validate too permissive on records with extra fields

    [ https://issues.apache.org/jira/browse/AVRO-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145034#comment-14145034 ] 

Dustin Spicuzza commented on AVRO-1343:
---------------------------------------

It would be good to fix this. We didn't realize how permissive it was until we lost 3 hours to a typo in a field name. The validation code as it stands is also of little help when you're trying to figure out which field is wrong in a complex structure. Because of this, we subclass DatumWriter and replace the validate function with a custom one. I could contribute a patch, but there doesn't seem to have been any activity on the Python version of Avro in a while.
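
Below is a rough sketch of the kind of replacement validator described above. It is written for Python 3 and is not tied to the avro package. `find_errors` is a hypothetical helper, not our actual code or avro's API. It takes the schema as parsed JSON (plain dicts, lists and strings) and returns the path of every mismatch instead of a bare True/False. It covers only primitives, records, arrays, maps and unions. Enums, fixed, named-type references and field defaults are left out, and a missing field is checked as if its value were null.

```python
# Hypothetical helper (not part of the avro package): a strict validator that
# works on a schema given as parsed JSON and reports the path of each mismatch.

INT_RANGE = (-(1 << 31), (1 << 31) - 1)
LONG_RANGE = (-(1 << 63), (1 << 63) - 1)


def _in_range(value, bounds):
    # bool is a subclass of int in Python, so rule it out explicitly
    return (isinstance(value, int) and not isinstance(value, bool)
            and bounds[0] <= value <= bounds[1])


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


PRIMITIVES = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": lambda v: _in_range(v, INT_RANGE),
    "long": lambda v: _in_range(v, LONG_RANGE),
    "float": _is_number,
    "double": _is_number,
    "string": lambda v: isinstance(v, str),
    "bytes": lambda v: isinstance(v, bytes),
}


def find_errors(schema, datum, path="$"):
    """Return a list of 'path: problem' strings; an empty list means valid."""
    if isinstance(schema, list):
        # Union: valid if at least one branch accepts the datum
        if any(not find_errors(branch, datum, path) for branch in schema):
            return []
        return [f"{path}: matches no branch of union"]
    if isinstance(schema, str):
        schema = {"type": schema}  # "long" is shorthand for {"type": "long"}
    kind = schema["type"]
    got = type(datum).__name__
    if kind in PRIMITIVES:
        return [] if PRIMITIVES[kind](datum) else [f"{path}: expected {kind}, got {got}"]
    if kind == "record":
        if not isinstance(datum, dict):
            return [f"{path}: expected record, got {got}"]
        declared = {field["name"] for field in schema["fields"]}
        # Undeclared keys are errors: this is the check AVRO-1343 asks for
        errors = [f"{path}.{key}: field not in schema"
                  for key in sorted(set(datum) - declared)]
        for field in schema["fields"]:
            # A missing field is checked as None (defaults are ignored here)
            errors += find_errors(field["type"], datum.get(field["name"]),
                                  f"{path}.{field['name']}")
        return errors
    if kind == "array":
        if not isinstance(datum, list):
            return [f"{path}: expected array, got {got}"]
        errors = []
        for i, item in enumerate(datum):
            errors += find_errors(schema["items"], item, f"{path}[{i}]")
        return errors
    if kind == "map":
        if not isinstance(datum, dict) or not all(isinstance(k, str) for k in datum):
            return [f"{path}: expected map with string keys"]
        errors = []
        for key, value in datum.items():
            errors += find_errors(schema["values"], value, f"{path}[{key!r}]")
        return errors
    # Enums, fixed and references to named types are left out of this sketch
    return [f"{path}: unsupported schema type {kind!r}"]
```

Take a record whose only field `items` is an array of records with a single `int` field `id`, and the datum `{"items": [{"id": 1}, {"idd": 2}]}`. The sketch reports `$.items[1].idd: field not in schema` followed by `$.items[1].id: expected int, got NoneType`, which points straight at the misspelled key. For a union it only says that no branch matched. Reporting per-branch detail would be a further design choice.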

> Python: validate too permissive on records with extra fields
> ------------------------------------------------------------
>
>                 Key: AVRO-1343
>                 URL: https://issues.apache.org/jira/browse/AVRO-1343
>             Project: Avro
>          Issue Type: Bug
>          Components: python
>            Reporter: Jeremy Kahn
>            Assignee: Jeremy Kahn
>             Fix For: 1.8.0
>
>         Attachments: AVRO-1343-tests.patch, AVRO-1343-validate.patch
>
>
> Python's validator silently accepts (generic) records with extra fields and considers them valid.
> For example, {{io.validate}} silently considers that the schema:
> {noformat}{"type": "record",
>  "name": "Test",
>  "fields": [{"name": "f", "type": "long"}]}
> {noformat}
> should accept records like:
> {noformat}{'f': 5, 'extra_field': "abc"}{noformat}
> but this is problematic.
> This is *especially* problematic for encoding unions, because internally the Python serializer uses {{validate}} to find the appropriate schema with which to encode a given object.
> In the current implementation, union schema selection is the *last* schema that {{validate(schema, obj)}} returns {{True}} for.  If {{validate}} isn't picky, this encoding will frequently guess wrong.
> I will attach two patches: one to the tests and one to the {{validate}} function.
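
The following small Python 3 model makes the union problem concrete. `permissive_ok`, `strict_ok` and `pick_branch` are hypothetical stand-ins, not avro's code. To keep the model short, they check field names only, not field types. As the report describes, `pick_branch` keeps the *last* branch that validates. With the union `[B, A]` and a datum that fits `B` exactly, the permissive check also accepts `A`, so `A` wins.

```python
# Hypothetical model of the union problem in AVRO-1343; these are not the
# avro library's functions. Only field names are checked, not field types.


def permissive_ok(record_schema, datum):
    # Every declared field must be present; extra keys are ignored
    return isinstance(datum, dict) and all(
        field["name"] in datum for field in record_schema["fields"])


def strict_ok(record_schema, datum):
    # Same as above, but undeclared keys also make the datum invalid
    declared = {field["name"] for field in record_schema["fields"]}
    return permissive_ok(record_schema, datum) and set(datum) == declared


def pick_branch(union, datum, validate):
    # Keep the LAST branch that validates, as the report describes
    chosen = None
    for branch in union:
        if validate(branch, datum):
            chosen = branch
    return chosen


A = {"type": "record", "name": "A", "fields": [{"name": "f", "type": "long"}]}
B = {"type": "record", "name": "B",
     "fields": [{"name": "f", "type": "long"}, {"name": "g", "type": "string"}]}
datum = {"f": 5, "g": "abc"}

print(pick_branch([B, A], datum, permissive_ok)["name"])  # A: "g" would be lost
print(pick_branch([B, A], datum, strict_ok)["name"])      # B
```

Encoding with `A` writes only `f`, so the value of `g` would be dropped without any error. Rejecting undeclared keys leaves `B` as the only match.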



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)