Posted to dev@avro.apache.org by "Jeno I. Hajdu" <je...@gmail.com> on 2014/09/05 00:54:31 UTC

bytes and fixed handling in Python implementation

Hi,

I have run into an issue with bytes and fixed handling in the Python
implementation, but it might be due to my misunderstanding.

The spec has this bit: 'Default values for bytes and fixed fields are JSON
strings, where Unicode code points 0-255 are mapped to unsigned 8-bit byte
values 0-255.' and gives the sample JSON value '\u00ff', which translates to
u'\xff' in Python. Extrapolating from this, I expected to be able to feed in
unicode values (within the appropriate range) as well, not just plain ASCII
strings.
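
For illustration, the mapping quoted from the spec (code point N to byte N,
for N in 0-255) coincides with the ISO-8859-1 (Latin-1) codec in Python. The
sketch below only demonstrates that mapping; the helper names are made up for
this example and are not part of the Avro library.

```python
def json_default_to_bytes(text):
    """Map each code point in 0-255 to the byte with the same value.

    Hypothetical helper, not an Avro API. Raises UnicodeEncodeError if
    the text contains a code point above 255.
    """
    return text.encode('iso-8859-1')


def bytes_to_json_default(data):
    """Inverse mapping: each byte becomes the code point with the same value."""
    return data.decode('iso-8859-1')


two_bytes = json_default_to_bytes(u'\xff\xff')
round_trip = bytes_to_json_default(two_bytes)
```

Here two_bytes holds two bytes of value 255 each, and round_trip is equal to
the original u'\xff\xff'. A code point above 255, such as u'\u0100', is
rejected by the codec.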

However, that fails in io.py. It first fails in validate(), where bytes and
fixed values are expected to be str instances. If that check is extended to
accept unicode, it then fails in BinaryEncoder's write_bytes(), which calls
struct.pack and expects a str as well.
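
The struct.pack failure can be reproduced without Avro. This is a minimal
sketch, assuming write_bytes() packs the datum with an 's' format of the
datum's length (a reading of the description above, not a copy of io.py).
Both helper names are hypothetical, and the lenient variant is only one
possible workaround, not a proposed patch.

```python
import struct


def pack_fixed(datum):
    """Hypothetical stand-in for the struct.pack step described above."""
    return struct.pack('%ds' % len(datum), datum)


def pack_fixed_lenient(datum):
    """Same packing, but text is first mapped to bytes via ISO-8859-1.

    Assumption: every code point in the text is in the range 0-255.
    """
    if not isinstance(datum, bytes):
        datum = datum.encode('iso-8859-1')
    return pack_fixed(datum)


try:
    pack_fixed(u'\xff\xff')
    strict_error = None
except struct.error as exc:
    # The 's' format refuses text and wants a byte string.
    strict_error = exc

packed = pack_fixed_lenient(u'\xff\xff')
```

With the strict helper, struct.pack refuses the unicode value; with the
lenient one, the same value is packed as two bytes.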

A simple test I tried in test_io.py:

  def test_fixed(self):
    print_test_name('TEST FIXED')

    datum_to_write = u'\xff\xff'
    datum_to_read = u'\xff\xff'

    test_schema = schema.parse('{"name": "test", "type": "fixed", "size": 2}')
    # test_schema = schema.parse('"bytes"')

    writer, encoder, datum_writer = write_datum(datum_to_write, test_schema)
    datum_read = read_datum(writer, test_schema, test_schema)
    self.assertEquals(datum_to_read, datum_read)

Alternatively, in SCHEMAS_TO_VALIDATE I tried replacing the test data for
bytes (originally the string '12345abcd') and fixed (originally the string
'B') with u'\xff', which reproduced the problem too.

Have I misunderstood something about how these types should be used, or
should this work, making this a bug / issue with the current design?

Thanks and BR,
Jeno