Posted to dev@avro.apache.org by "Michael A. Smith" <mi...@smith-li.com> on 2019/11/06 13:21:31 UTC

Testing parsing canonical form string and integer rules

Another two things I'm struggling to understand in PCF are the STRINGS and
INTEGERS rules:

   - [STRINGS] For all JSON string literals in the schema text, replace any
   escaped characters (e.g., \uXXXX escapes) with their UTF-8 equivalents.
   - [INTEGERS] Eliminate quotes around and any leading zeros in front of
   JSON integer literals (which appear in the size attributes of fixed
   schemas).

These are clear enough on their faces, but I can't come up with a valid
test case for either one.

For strings, once you apply the strip rule, there don't seem to be any
parts left that could contain a Unicode escape. Names, for example, have a
very limited set of characters they can contain.

For integers, the allowed field types are literal numbers anyway, so I
don't see how they could have quotes around them, and I'd expect every
language implementation of JSON to remove leading zeroes before Avro gets
close.

Can someone help me figure out how to test the Python implementation of PCF
with valid schema test cases for these rules?

Re: Testing parsing canonical form string and integer rules

Posted by Raymie Stata <rs...@yahoo.com.INVALID>.

These rules are written as if you're directly parsing the underlying
UTF text making up an Avro schema.  It sounds like you're using a JSON
parser, which most likely implements both of those rules for you,
leaving nothing in your code to test.
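
As a rough illustration of that point, here is a minimal sketch using
Python's standard json module. The schema text and the render() helper are
made up for this example and are not the Avro library's PCF code. The name
is written with a \u escape in the raw text, and the escape is already gone
by the time your own code sees the parsed value.

```python
import json

# Raw schema text as it might appear in a file. The name "foo" is written
# with a \u escape for its first letter (illustrative schema, not from Avro).
raw_schema = r'{"name": "\u0066oo", "type": "fixed", "size": 16}'

# The JSON parser decodes the escape before any Avro-level code runs.
parsed = json.loads(raw_schema)


def render(schema):
    """Hypothetical helper: compact re-serialization, not a full PCF transform."""
    # ensure_ascii=False keeps non-ASCII characters literal instead of
    # re-escaping them; separators removes the optional whitespace.
    return json.dumps(schema, ensure_ascii=False, separators=(",", ":"))


rendered = render(parsed)
print(parsed["name"])  # foo
print(rendered)        # {"name":"foo","type":"fixed","size":16}
```

A name spelled with an escape like this may be one way to get a valid input
that exercises STRINGS, since the decoded name still satisfies the naming
rules.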

(If you're paranoid, you could add tests of the underlying parser
against these rules, which could guard against bugs introduced by a
new version of the parser, or a switch to a different library.  This
would be more motivated by STRINGS -- it's harder to imagine a JSON
parser being buggy for INTEGER -- but it doesn't seem like a high
priority in either case.)
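
A sketch of what such guard tests could look like with Python's standard
json and unittest modules follows. The class and test names are invented
for this example. Two details are specific to Python's parser: it rejects
leading zeros outright rather than stripping them, and it leaves a quoted
integer as a string, so removing quotes would still fall to the Avro code
if quoted sizes are accepted at all.

```python
import json
import unittest


class JsonParserGuardTest(unittest.TestCase):
    """Guards the parser behavior that the STRINGS and INTEGERS rules rely on."""

    def test_unicode_escape_is_decoded(self):
        # STRINGS: a \uXXXX escape comes back as the character itself.
        self.assertEqual(json.loads(r'"caf\u00e9"'), "caf\u00e9")

    def test_leading_zero_is_rejected(self):
        # INTEGERS: leading zeros are not valid JSON, so the standard
        # parser raises instead of silently dropping them.
        with self.assertRaises(json.JSONDecodeError):
            json.loads('{"size": 016}')

    def test_quoted_integer_stays_a_string(self):
        # INTEGERS: the parser does not remove quotes around a number.
        self.assertEqual(json.loads('{"size": "16"}')["size"], "16")
```

Saved in a test module, these can be run with `python -m unittest`.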
