Posted to user@avro.apache.org by Robert Minsk <ro...@methodstudios.com> on 2012/12/20 22:25:52 UTC

unsigned integer types

I am currently testing Avro for our network serialization across a mix of
C++ and Python programs.  I have noticed that Avro does not offer
unsigned 32-bit or 64-bit integer types.  How are people
currently handling unsigned integers?  Are there any plans to add
unsigned integer types?

-- 
Robert Minsk
Systems and Software Engineer

WWW.METHODSTUDIOS.COM <http://www.methodstudios.com>
730 Arizona Ave, Santa Monica, CA 90401
O:+1 310 434 6500 <tel:+13104346500> // F:+1 310 434 6501 
<tel:+13104346501>

Los Angeles <http://www.methodstudios.com/signature/url/los-angeles>



This e-mail and any attachments are intended only for use by the addressee(s) named herein and may contain confidential information. If you are not the intended recipient of this e-mail, you are hereby notified any dissemination, distribution or copying of this email and any attachments is strictly prohibited. If you receive this email in error, please immediately notify the sender by return email and permanently delete the original, any copy and any printout thereof. The integrity and security of e-mail cannot be guaranteed.


Re: unsigned integer types

Posted by Doug Cutting <cu...@apache.org>.
Fixed, like record and enum, is a named type. In Java, a separate class is
defined for each fixed type. So that is the name of the fixed type, as opposed
to the name of the fixed field within the record type. Does that make sense?
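To make the two names concrete, here is a small Python sketch that writes Doug's schema as a dict. The second field, `otherValue`, is a hypothetical addition that refers to the fixed type by its name alone, which only works because `myFixed` is a named type. The `collect_named_types` helper is likewise illustrative only and ignores namespaces.

```python
from typing import Any, Dict, Optional

# Doug's example schema as a Python dict, plus a hypothetical second field.
SCHEMA: Dict[str, Any] = {
    "type": "record",
    "name": "recordWithFixed",
    "fields": [
        # "fixedValue" names the field; "myFixed" names the fixed type itself.
        {"name": "fixedValue",
         "type": {"name": "myFixed", "type": "fixed", "size": 8}},
        # A named type, once defined, can be referenced by its name alone.
        {"name": "otherValue", "type": "myFixed"},
    ],
}

NAMED_KINDS = ("record", "enum", "fixed")


def collect_named_types(schema: Any,
                        found: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Map each named type defined in `schema` to its definition.

    Simplified sketch: namespaces are ignored, and a plain string such as
    "myFixed" is treated as a reference, not a definition.
    """
    if found is None:
        found = {}
    if isinstance(schema, list):  # a union: walk every branch
        for branch in schema:
            collect_named_types(branch, found)
    elif isinstance(schema, dict):
        if schema.get("type") in NAMED_KINDS:
            found[schema["name"]] = schema
        for field in schema.get("fields", []):
            collect_named_types(field["type"], found)
        for key in ("items", "values"):  # array and map element schemas
            if key in schema:
                collect_named_types(schema[key], found)
    return found


field_names = [f["name"] for f in SCHEMA["fields"]]
type_names = sorted(collect_named_types(SCHEMA))
print(field_names)  # ['fixedValue', 'otherValue']
print(type_names)   # ['myFixed', 'recordWithFixed']
```

Field names and type names live in different spaces: `fixedValue` only labels a slot in the record, while `myFixed` is the type that a generator can turn into its own class.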

Doug
On Dec 21, 2012 11:50 AM, "Robert Minsk" <ro...@methodstudios.com>
wrote:

>  So what is the second required name field for?  In your example "myFixed".

Re: unsigned integer types

Posted by Robert Minsk <ro...@methodstudios.com>.
I see what the second name is for, though it is a bit confusing.

If I just have a union, the second name is used for the set method. This
is so you can have fixed values of multiple sizes in a union.
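The Avro specification does not allow a union to contain two schemas of the same type, except for the named types (record, fixed and enum), which are allowed as long as their names differ. That is what lets one union carry fixed values of different sizes. A minimal Python sketch, assuming hypothetical type names `uint32` and `uint64` and big-endian byte order. The byte order is an application choice, because Avro treats the contents of a fixed as opaque bytes.

```python
from typing import Any, List, Tuple

# Hypothetical union: an optional unsigned value stored in 4 or 8 bytes.
UNSIGNED_UNION: List[Any] = [
    "null",
    {"type": "fixed", "name": "uint32", "size": 4},
    {"type": "fixed", "name": "uint64", "size": 8},
]


def branch_for(value: int) -> Tuple[str, bytes]:
    """Pick the smallest fixed branch (by type name) and its big-endian bytes."""
    if value < 0:
        raise ValueError("unsigned values only")
    for branch in UNSIGNED_UNION:
        if isinstance(branch, dict) and value < (1 << (8 * branch["size"])):
            return branch["name"], value.to_bytes(branch["size"], "big")
    raise OverflowError("value does not fit in 64 bits")


def branch_index(name: str) -> int:
    """Zero-based position of a named branch; binary encoding writes this index."""
    for i, branch in enumerate(UNSIGNED_UNION):
        if isinstance(branch, dict) and branch["name"] == name:
            return i
    raise KeyError(name)


name, data = branch_for(7)
print(name, branch_index(name), data.hex())  # uint32 1 00000007
```

In Avro's binary encoding, a union value is written as the zero-based index of its branch followed by the value itself. The type names are what make the branches distinguishable in the schema and in generated code.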

On 12/21/2012 11:50 AM, Robert Minsk wrote:
> So what is the second required name field for?  In your example "myFixed".


Re: unsigned integer types

Posted by Robert Minsk <ro...@methodstudios.com>.
So what is the second required name field for?  In your example "myFixed".

On 12/21/2012 11:42 AM, Doug Cutting wrote:
> That schema's not quite right.


Re: unsigned integer types

Posted by Doug Cutting <cu...@apache.org>.
That schema's not quite right.

A fixed schema looks like:

 {"name": "foo", "type": "fixed", "size": 8}

A record field looks like:

 {"name": "foo", "type": <schema>}

So a record with a fixed as a field would look like:

{
    "type": "record",
    "name": "recordWithFixed",
    "fields" : [
      {"name": "fixedValue", "type": {"name": "myFixed", "type": "fixed", "size": 8}}
    ]
}

Alternatively, you can perhaps just skip the record and use the fixed schema
directly.
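A fixed value is just a run of bytes, so the C++ and Python sides need to agree on how an unsigned integer is laid out inside it. The following is a minimal Python sketch that assumes big-endian order, which is our own convention and not something Avro prescribes. It also serializes the schema (Doug's record, which wraps the bare fixed schema) to text that could be saved as a schema file such as fixed.json. The helper names are hypothetical.

```python
import json

# The bare fixed schema, usable on its own (the "skip the record" option).
MY_FIXED = {"name": "myFixed", "type": "fixed", "size": 8}

# Doug's record wrapping the same fixed type.
RECORD_WITH_FIXED = {
    "type": "record",
    "name": "recordWithFixed",
    "fields": [{"name": "fixedValue", "type": MY_FIXED}],
}


def uint_to_fixed(value: int, size: int) -> bytes:
    """Unsigned int -> `size` big-endian bytes; OverflowError if it won't fit."""
    return value.to_bytes(size, "big", signed=False)


def fixed_to_uint(data: bytes) -> int:
    """Inverse of uint_to_fixed."""
    return int.from_bytes(data, "big", signed=False)


# JSON text for a schema file.
schema_text = json.dumps(RECORD_WITH_FIXED, indent=4)

size = MY_FIXED["size"]
datum = {"fixedValue": uint_to_fixed(2**64 - 1, size)}
assert fixed_to_uint(datum["fixedValue"]) == 2**64 - 1
```

Negative numbers and values of 2^64 or more are rejected by `to_bytes` with an `OverflowError`, so the full unsigned 64-bit range round-trips and nothing outside it does.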

Doug


On Fri, Dec 21, 2012 at 11:18 AM, Robert Minsk <robert.minsk@methodstudios.com> wrote:

>  Fixed does not seem to work.

Re: unsigned integer types

Posted by Robert Minsk <ro...@methodstudios.com>.
Fixed does not seem to work.

avro-1.7.3

fixed.json:
{
     "type": "record",
     "name": "fixed",
     "fields" : [
         {"name": "foo", "type": "fixed", "size": 8}
     ]
}

./avrogencpp -i fixed.json -o fixed.hh
Segmentation fault (core dumped)

If I change the fixed.json to:
{
     "type": "record",
     "name": "test_fixed",
     "fields" : [
         {"name": "foo", "type": "fixed", "size": 8}
     ]
}

./avrogencpp -i fixed.json -o fixed.hh
Failed to parse or compile schema: Unknown type: fixed

On 12/21/2012 03:43 AM, Martin Kleppmann wrote:
> If your numbers are typically small, you can just use a signed type —
> the sign bit's overhead is insignificant.


Re: unsigned integer types

Posted by Martin Kleppmann <ma...@rapportive.com>.
If your numbers are typically small, you can just use a signed type —
the sign bit's overhead is insignificant.
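Here is what "the sign bit's overhead" amounts to in practice. Avro writes int and long values with zig-zag encoding followed by a variable-length base-128 encoding, so a non-negative value costs one extra bit compared with a purely unsigned varint. The following minimal Python sketch reimplements the long encoding for illustration. It is not code taken from an Avro library.

```python
def zigzag(n: int) -> int:
    """Map a signed 64-bit long to unsigned: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    if not -(1 << 63) <= n < (1 << 63):
        raise ValueError("out of range for an Avro long")
    return ((n << 1) ^ (n >> 63)) & ((1 << 64) - 1)


def varint(z: int) -> bytes:
    """Base-128 varint: 7 bits per byte, low-order group first, high bit = more."""
    out = bytearray()
    while z >= 0x80:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_long(n: int) -> bytes:
    return varint(zigzag(n))


# 63 fits in one byte and 64 needs two; an unsigned varint would need two only from 128.
print(encode_long(63).hex())        # 7e
print(encode_long(64).hex())        # 8001
print(len(encode_long(2**32 - 1)))  # 5: every uint32 value fits in a long
```

The extra bit only matters at the byte boundaries. A uint32 value does not fit in an Avro int, but it always fits in a long without any reinterpretation.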

If your numbers typically use the full range of 0 to 2^64-1 (or 0 to
2^32-1), e.g. because they are hashes or random numbers from that
range, you're best off using the 'fixed' type and specifying the
number of bytes you want. In this case 'fixed' is more efficient than
the variable-length encoding of int/long, because there is no overhead
for indicating the length; it's simply stored as that number of bytes,
and nothing else.
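For hash-like values spread across the whole 64-bit range, the variable-length encoding loses. The sketch below assumes a uint64 is stored bit-for-bit in an Avro long, reinterpreted as two's complement. It computes the encoded size of such values. They usually land at 9 or 10 bytes, while an 8-byte fixed is always 8. The sample values are arbitrary.

```python
FIXED_SIZE = 8  # a {"type": "fixed", "size": 8} value is always 8 bytes


def long_size_for_u64(u: int) -> int:
    """Encoded size if uint64 `u` is written bit-for-bit as an Avro long."""
    if not 0 <= u < (1 << 64):
        raise ValueError("not a uint64")
    signed = u - (1 << 64) if u >= (1 << 63) else u          # two's complement view
    z = ((signed << 1) ^ (signed >> 63)) & ((1 << 64) - 1)   # zig-zag
    size = 1
    while z >= 0x80:  # one byte per 7-bit group
        z >>= 7
        size += 1
    return size


for u in (42, 0x0123456789ABCDEF, 0x9E3779B97F4A7C15):
    print(hex(u), long_size_for_u64(u), "vs", FIXED_SIZE)
# 0x2a 1 vs 8
# 0x123456789abcdef 9 vs 8
# 0x9e3779b97f4a7c15 10 vs 8
```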

Because those two options cover most use cases, I don't think there
are any plans to add unsigned int support to Avro.

Martin

