Posted to user@avro.apache.org by Sid Shetye <si...@outlook.com> on 2014/02/10 02:34:00 UTC

unsigned 32bit (uint) in Avro - C# ?

How do I serialize an unsigned integer (uint or UInt32 in C#) in Avro?

It's very bizarre that unsigned types aren't discussed at
http://avro.apache.org/docs/1.7.6/spec.html#schema_primitive

Re: unsigned 32bit (uint) in Avro - C# ?

Posted by "Pritchard, Charles X. -ND" <Ch...@disney.com>.
On Feb 12, 2014, at 4:04 PM, Sid Shetye <si...@outlook.com> wrote:

> 2. Unsigned 32/64-bit values have been extensively used as primitive types for over 3 decades (i.e. the type has held its ground). Heck, even core Java devs hate that unsigned doesn't exist, e.g. http://stackoverflow.com/questions/430346/why-doesnt-java-support-unsigned-ints
> 3. All other workarounds simply add more friction to development when in reality, working with a primitive data type that's been around "forever" should be very transparent and very fluid.

It does add some friction, but aren’t we in a space where the lowest common denominator has to be supported?
As you’re pointing out, #2 and #3 are about Java.

We’re not hitting an issue of serialization, afaik; if you’re looking for signed 32 bit, we’ve got that.
If you want unsigned, it seems to me that fixed is just fine for storage.

Do we agree that serialization is not the issue?
The issue I’m seeing here is with schema expressivity, as well as with the APIs in other languages.

That’s an area where I’m simply taking this section to heart:
“Attributes not defined in this document are permitted as metadata, but must not affect the format of serialized data.”

And that section to me screams out for a registry.
With a registry of attributes we could work around issues like this and still keep in sync with each other.

Heck, that “MD5” example in the manual is a great one:

{"type": "fixed", "size": 16, "name": "md5"}

We all know that means md5, but it’s just untyped in Avro. A registry for things like “md5”, “uint32”, etc., would be a nice-to-have.
Then our silly selves can go ahead and implement more complex API/deserializers.
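
For readers who want to try the fixed route mentioned above, here is a minimal sketch of the conversion an application would need. It assumes a hypothetical schema such as {"type": "fixed", "size": 4, "name": "uint32"}. The class name UInt32Fixed, its two methods, and the big-endian byte order are all assumptions made for illustration; none of them come from the Avro spec or the Avro C# library.

```csharp
using System;

// Hypothetical helper, not an Avro library API: converts between a C# uint
// and the 4 raw bytes that a 4-byte "fixed" field would carry.
// Big-endian order is an arbitrary choice made for this sketch; readers and
// writers would have to agree on it out of band.
public static class UInt32Fixed
{
    public static byte[] ToBytes(uint value)
    {
        return new byte[]
        {
            (byte)((value >> 24) & 0xFF),
            (byte)((value >> 16) & 0xFF),
            (byte)((value >> 8) & 0xFF),
            (byte)(value & 0xFF),
        };
    }

    public static uint FromBytes(byte[] bytes)
    {
        if (bytes == null || bytes.Length != 4)
            throw new ArgumentException("expected exactly 4 bytes", nameof(bytes));

        return ((uint)bytes[0] << 24)
             | ((uint)bytes[1] << 16)
             | ((uint)bytes[2] << 8)
             | (uint)bytes[3];
    }
}
```

The trade-off is size: a fixed of size 4 always costs 4 bytes, whereas the varint-encoded int is shorter for small values.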


-Charles

RE: unsigned 32bit (uint) in Avro - C# ?

Posted by Sid Shetye <si...@outlook.com>.
Went through that thread. None of the arguments are convincing from a design standpoint because:

1. Avro is used in non-Java environments. The Avro IDL is language agnostic while the code-gen is language-specific, so the C# code-gen could spit out unsigned. Every language has limitations, but I'm not sure why Java's limitations should drive Avro's design, despite the heritage. (It's going to grow into other languages, right?)
2. Unsigned 32/64-bit values have been extensively used as primitive types for over 3 decades (i.e. the type has held its ground). Heck, even core Java devs hate that unsigned doesn't exist, e.g. http://stackoverflow.com/questions/430346/why-doesnt-java-support-unsigned-ints
3. All other workarounds simply add more friction to development when in reality, working with a primitive data type that's been around "forever" should be very transparent and very fluid.

Stepping off the soapbox, I also have a workaround for future readers. We cast uint <-> int with arithmetic overflow checking temporarily disabled, and then let Avro handle them as signed varints (aka zigzag varints). As example code:

int avroInt32; // this is code-gen'd off the IDL
uint csharpUint32; // this is an app domain var 

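// to Avro DTO: reinterpret the 32 bits as signed, without an overflow check
avroInt32 = unchecked((int)csharpUint32);

// from Avro DTO: reinterpret the same 32 bits back as unsigned
csharpUint32 = unchecked((uint)avroInt32);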

Pros:
a) Use the encoding compression inherent in varints (e.g. values up to 134,217,727 fit in 4 bytes or fewer)
b) Keep the application domain logic as unsigned (as it needs to be)
c) Minimize the glue logic / impedance when converting from app domain => DTO domain

Cons:
1) Specific glue code needed because Avro inherits Java's limitations
2) We're still wasting half of the addressable range, because every other varint encoding is reserved for negative numbers and we only ever see positive ones. Instead of needing a 5th varint byte only above 268,435,455, I now need it at half that, above 134,217,727. It's not *too* bad, but it seems wasteful to always transport a bit that's never used (bit 0, a zigzag varint's 'sign bit', will always be 0 and carries no information).
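
The byte thresholds in the pros and cons above can be checked with a small sketch. It assumes the usual zigzag mapping followed by a base-128 varint with 7 payload bits per byte, which is how the Avro spec describes int encoding. The class ZigZagVarint and its method names are hypothetical helpers written for this illustration, not Avro library calls.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the Avro C# library.
public static class ZigZagVarint
{
    // Zigzag mapping: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    // so that small magnitudes of either sign become small unsigned numbers.
    public static uint ZigZag(int n)
    {
        return unchecked((uint)((n << 1) ^ (n >> 31)));
    }

    // Number of bytes a base-128 varint needs for v (7 payload bits per byte).
    public static int VarintLength(uint v)
    {
        int length = 1;
        while (v >= 0x80)
        {
            v >>= 7;
            length++;
        }
        return length;
    }

    // Bytes on the wire when a uint is cast to int and written as a zigzag varint.
    public static int EncodedLength(uint value)
    {
        return VarintLength(ZigZag(unchecked((int)value)));
    }
}
```

Under these assumptions, EncodedLength(134217727) is 4 and EncodedLength(134217728) is 5, while a plain unsigned varint (VarintLength) only reaches 5 bytes above 268,435,455. Values above int.MaxValue come out of the cast as negative ints, so they still round-trip; they simply land on the odd zigzag codes.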

Cheers
Sid


Re: unsigned 32bit (uint) in Avro - C# ?

Posted by Harsh J <ha...@cloudera.com>.
See also this past thread on the topic perhaps:
http://mail-archives.apache.org/mod_mbox/avro-user/201212.mbox/%3c50D38260.8060402@methodstudios.com%3e

-- 
Harsh J

Re: unsigned 32bit (uint) in Avro - C# ?

Posted by Mika Ristimaki <mi...@gmail.com>.
Hi,

Java doesn't have unsigned primitives, so most likely Avro doesn't support them directly either. 

-Mika
