Posted to user@avro.apache.org by Tarun Gupta <ta...@technogica.com> on 2012/12/24 07:30:37 UTC

Support for char[] and short[] - Java

Hi,

I am new to Avro, but I did some basic research on how to support data
types like char arrays and short arrays when defining an Avro schema.
Issue AVRO-249 sounded somewhat relevant, but it's about supporting
short using the reflection API.

We are planning to use Avro for a Java-based client-server data exchange
use case. Note that our data model is expected to have "large arrays" of
short and char, and performance is our 'key concern'. We can't use a string
to store char[], because what we get back is different from what we put
in, because of "UTF-16 normalization".

Thanks in Advance.
Tarun Gupta

Re: Support for char[] and short[] - Java

Posted by Scott Carey <sc...@apache.org>.
You can cast both short and char safely to int and back, and use Avro's int
type.  These will be variable-length integer encoded and take 1 to 3 bytes
in binary form per short/char.
Wrapping char[] or short[] into List<Integer> or int[] will be clunky for
the user, however.  Another option would be to extend the reader to look for
special metadata in the schema indicating that an array of int is to be
interpreted as shorts or chars.

Can you give an example where a char[] converted to UTF-8 bytes and back
results in a loss of data?  I was under the impression that UTF-16 surrogate
pairs are converted to proper UTF-8 sequences and back to surrogate pairs.
Or are you using char to represent something else, such as a two-byte
unsigned quantity, where interpreting it as UTF-16 causes the problem?
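
One way to check this question is a small round-trip test. The sketch below
uses the JDK's own String and UTF-8 charset, not Avro's code path, and the
class and method names (Utf8RoundTrip, roundTrip, survives) are hypothetical.
It illustrates that a well-formed surrogate pair survives the trip, while a
char[] holding an unpaired surrogate (as can happen when char is used as a
raw 16-bit value) does not, because the JDK encoder substitutes a replacement
for it. Whether this is the case the original poster ran into is not stated
in the thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8RoundTrip {
    // Encode the chars as UTF-8 via String, then decode back to chars.
    static char[] roundTrip(char[] in) {
        byte[] utf8 = new String(in).getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8).toCharArray();
    }

    // True if the array comes back unchanged after the UTF-8 round trip.
    static boolean survives(char[] in) {
        return Arrays.equals(in, roundTrip(in));
    }

    public static void main(String[] args) {
        char[] validPair = {'\uD83D', '\uDE00'};     // one well-formed surrogate pair
        char[] loneSurrogate = {'a', '\uD800', 'b'}; // high surrogate with no low half
        System.out.println("valid pair survives:     " + survives(validPair));
        System.out.println("lone surrogate survives: " + survives(loneSurrogate));
    }
}
```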

On 12/23/12 10:30 PM, "Tarun Gupta" <ta...@technogica.com> wrote:

> Hi, 
> 
> I am new to Avro, but I did some basic research on how to support data
> types like char arrays and short arrays when defining an Avro schema. Issue
> AVRO-249 sounded somewhat relevant, but it's about supporting short using the
> reflection API.
> 
> We are planning to use Avro for a Java-based client-server data exchange use
> case. Note that our data model is expected to have "large arrays" of short and
> char, and performance is our 'key concern'. We can't use a string to store
> char[], because what we get back is different from what we put in, because of
> "UTF-16 normalization".
> 
> Thanks in Advance.
> Tarun Gupta