Posted to dev@avro.apache.org by capo hatsoft <ja...@gmail.com> on 2014/07/30 04:34:38 UTC

Schema default values in C implementation

I recently extended a tool used at my work by adding an Avro output module.
The module works fine except that it appears to ignore default values in
the schema.

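My program does something like the following, presuming schemaBuffer
contains:

{"type":"record",
  "name":"test",
  "fields":[
    {"name":"foo", "type":"int"},
    {"name":"hat", "type":"int", "default":12},
    {"name":"bar", "type":"string"}
  ]
}

The code is roughly as follows (have_more_input() is a placeholder for my
own iteration over the source data):

#include <avro.h>

avro_schema_t schema;
avro_value_iface_t *iface;
avro_value_t writer_value, field;
avro_file_writer_t avro_writer;

avro_schema_from_json_length(schemaBuffer, schemaLen, &schema);
iface = avro_generic_class_from_schema(schema);
avro_generic_value_new(iface, &writer_value);
avro_file_writer_create_with_codec(outFilePath, schema, &avro_writer,
                                   "deflate", blockSize);

/* iterate over the data structure containing the source data */
while (have_more_input()) {  /* placeholder loop condition */
    /* for int values: fetch the field by name, then set it */
    avro_value_get_by_name(&writer_value, "foo", &field, NULL);
    avro_value_set_int(&field, someIntValue);
    /* similar code for the other types... */

    /* append one record per input record */
    avro_file_writer_append_value(avro_writer, &writer_value);
}

/* clean up on program exit */
avro_file_writer_flush(avro_writer);
avro_file_writer_close(avro_writer);
avro_value_decref(&writer_value);
avro_value_iface_decref(iface);
avro_schema_decref(schema);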

The program correctly creates an Avro-encoded file containing one record
for each of my input records, with all the correct values.

Except! The schema written at the top of the output file is different from
the input schema. It now looks like:

{"type":"record",
  "name":"test",
  "fields":[
    {"name":"foo", "type":"int"},
    {"name":"hat", "type":"int"},
    {"name":"bar", "type":"string"}
  ]
}

The default property just seems to be completely ignored by the schema
parser or otherwise not reproduced by the schema writer.

Looking at the source code, I came across this concerning struct in
schema.h:

struct avro_record_field_t {
	int index;
	char *name;
	avro_schema_t type;
	/*
	 * TODO: default values
	 */
};

So it appears that default values are not supported by Avro C?

I'm pretty confused, however, as the documentation at
http://avro.apache.org/docs/1.7.7/api/c/index.html states:

The C implementation supports:

   - binary encoding/decoding of all primitive and complex data types
   - storage to an Avro Object Container File
   - schema resolution, promotion and projection
   - validating and non-validating mode for writing Avro data

The C implementation is lacking:

   - RPC

Is the documentation wrong or am I just missing something?

I couldn't find any evidence that default values are supported after
reading over the source. Is this feature still planned to be implemented?
Should the documentation be updated to reflect that the C implementation
does not support default values?

This is a blocker for me, so I was considering extending Avro C to support
default values myself, but I thought I should check with the mailing list
first.

Thanks in advance,
Chris.

RE: Schema default values in C implementation

Posted by "Jeno I. Hajdu" <je...@gmail.com>.
Hi,

It's a blocker in my project as well, and as no one was working on it as
far as I know, I started on it myself. I am halfway through it. The first
half is turning the default values from the schema JSON into Avro values
and adding them to the record field schema struct; the other half is using
this to fill the gaps already marked with TODOs in the code. I am currently
on vacation, but I plan to finish this in August.

Regards,
Jeno
On 2014.07.30 at 6:38, "Steve Roehrs" <St...@rlmgroup.com.au> wrote:

> Hi Chris
>
> The C API doesn't support default values or schema evolution.  It does
> support schema projection, where the reader schema has fewer fields than
> the writer schema.
>
> For this reason we switched to using the C++ API for our particular
> project. C++ Schema evolution has only recently been baselined in
> version 1.7.7 of Avro.
>
> I did some work on trying to implement it in the C version, but gave up,
> as I found the C code quite difficult to work with.
>
> I agree the documentation should be clarified.  Schema projection and
> type promotion are a subset of schema resolution - but schema evolution
> is definitely missing!

RE: Schema default values in C implementation

Posted by Steve Roehrs <St...@rlmgroup.com.au>.
Hi Chris

The C API doesn't support default values or schema evolution.  It does
support schema projection, where the reader schema has fewer fields than
the writer schema.

For this reason we switched to using the C++ API for our particular
project. C++ Schema evolution has only recently been baselined in
version 1.7.7 of Avro.

I did some work on trying to implement it in the C version, but gave up,
as I found the C code quite difficult to work with.

I agree the documentation should be clarified.  Schema projection and
type promotion are subsets of schema resolution, but schema evolution
is definitely missing!


Steve Roehrs

Senior Software Engineer | Lockheed Martin

 

| p: +61 8 7389 4525    | m: +61 4 3891 5622     | f: +61 8 7389 4551

| w: www.rlmgroup.com.au | e: Steve.Roehrs@rlmgroup.com.au

| Company address: 82-86 Woomera Ave, Edinburgh, SA 5111

This email and any attachment to it remains the property of Lockheed
Martin and is intended only to be read or used by the named addressee.
It may contain information that is confidential, commercially valuable
or subject to legal privilege.  If you receive this email in error,
please immediately delete it and notify the sender.  Opinions,
conclusions and other information in this message that do not relate to
the official business of Lockheed Martin or any companies within
Lockheed Martin shall be understood as neither given nor endorsed by
them.

-----Original Message-----
From: capo hatsoft [mailto:jangoolie@gmail.com] 
Sent: Wednesday, July 30, 2014 12:05 PM
To: dev@avro.apache.org
Subject: Schema default values in C implementation