Posted to user@avro.apache.org by Scott Carey <sc...@apache.org> on 2013/01/08 10:08:08 UTC

Re: Support for char[] and short[] - Java

You can cast both short and char safely to int and back, and use Avro's int
type.  These will be zig-zag, variable-length encoded and take 1 to 3 bytes
in binary form per short/char.
However, wrapping char[] or short[] into List<Integer> or int[] will be
clunky for the user.  Another option would be to extend the reader to look
for special metadata in the schema that indicates that an array of int is to
be interpreted as shorts or chars.
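
As a rough sketch of the first option (plain JDK code, not an Avro API; the
class and method names are made up for illustration), the widening and
narrowing casts and the resulting encoded size look roughly like this.  The
size calculation assumes Avro's documented zig-zag varint encoding for int.

```java
public class ShortCharAsInt {
    // Widen each short to an int; the sign and value are preserved.
    static int[] shortsToInts(short[] in) {
        int[] out = new int[in.length];
        for (int i = 0; i < in.length; i++) out[i] = in[i];
        return out;
    }

    // Narrow back; lossless as long as each int came from a short.
    static short[] intsToShorts(int[] in) {
        short[] out = new short[in.length];
        for (int i = 0; i < in.length; i++) out[i] = (short) in[i];
        return out;
    }

    // Widen each char to an int in the range 0..65535.
    static int[] charsToInts(char[] in) {
        int[] out = new int[in.length];
        for (int i = 0; i < in.length; i++) out[i] = in[i];
        return out;
    }

    // Narrow back; lossless as long as each int came from a char.
    static char[] intsToChars(int[] in) {
        char[] out = new char[in.length];
        for (int i = 0; i < in.length; i++) out[i] = (char) in[i];
        return out;
    }

    // Bytes needed for one int under zig-zag + 7-bits-per-byte varint coding.
    static int encodedSize(int n) {
        int z = (n << 1) ^ (n >> 31); // zig-zag: small magnitudes stay small
        int size = 1;
        while ((z & ~0x7F) != 0) {
            z >>>= 7;
            size++;
        }
        return size;
    }

    public static void main(String[] args) {
        short[] shorts = {0, -1, 63, 64, Short.MIN_VALUE, Short.MAX_VALUE};
        for (int v : shortsToInts(shorts)) {
            System.out.println(v + " -> " + encodedSize(v) + " byte(s)");
        }
        System.out.println("char max -> "
            + encodedSize(Character.MAX_VALUE) + " byte(s)");
    }
}
```

Values near zero cost one byte, while the extremes of either type cost three,
so whether this beats a fixed two bytes per element depends on the data.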

Can you give an example where a char[] converted to UTF-8 bytes and back
results in a loss of data?  I was under the impression that UTF-16 surrogate
pairs are converted to proper UTF-8 sequences and back to surrogate pairs.
Or are you using char to represent something else, such as a two-byte
unsigned quantity, where interpreting it as UTF-16 causes the problem?
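
A quick way to check is a round trip like the sketch below.  It uses only the
plain JDK String and charset classes, not Avro's own string handling, which
may behave differently; the class and method names are made up for
illustration.  Well-formed surrogate pairs should survive, while a char[]
holding arbitrary 16-bit values can contain unpaired surrogates, which the
JDK encoder replaces rather than encodes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTrip {
    // Treat the chars as UTF-16, encode to UTF-8 bytes, and decode back.
    static char[] roundTrip(char[] in) {
        byte[] utf8 = new String(in).getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8).toCharArray();
    }

    public static void main(String[] args) {
        // A valid surrogate pair (one supplementary code point).
        char[] paired = {'a', (char) 0xD83D, (char) 0xDE00, 'b'};
        // A lone high surrogate, as might occur in raw 16-bit data.
        char[] unpaired = {'a', (char) 0xD800, 'b'};

        System.out.println("paired survives:   "
            + Arrays.equals(paired, roundTrip(paired)));
        System.out.println("unpaired survives: "
            + Arrays.equals(unpaired, roundTrip(unpaired)));
    }
}
```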

On 12/23/12 10:30 PM, "Tarun Gupta" <ta...@technogica.com> wrote:

> Hi, 
> 
> I am new to Avro, but I did some basic research on how to support data
> types like char arrays and short arrays when defining an Avro schema. Issue
> AVRO-249 sounded somewhat relevant, but it's about supporting short using
> the reflection API.
> 
> We are planning to use Avro for a Java-based client-server data exchange use
> case. Note that our data model is expected to have "large arrays" of short
> and char, and performance is our 'key concern'. We can't use a string to
> store char[], because what we get back is different from what we put in,
> because of "UTF-16 normalization".
> 
> Thanks in Advance.
> Tarun Gupta