Posted to user@avro.apache.org by Titouan Christophe <ti...@railnova.eu> on 2020/01/16 17:33:16 UTC

avro-c error: "Datum too large for file block size"

Hello everyone,

I am new to the avro-c library and am trying to get some experience with 
it. So far, I have written a very simple program that is supposed to 
write a single record of a simple schema to a file.

The program source is here: 
https://gist.github.com/titouanc/0df61b807d06ca7611cc6708f12fc938.
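
In short, the program follows roughly this shape (a simplified sketch 
with an invented schema and field names, not the exact gist contents):

    #include <avro.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Invented schema for illustration: one plain field and one
     * ["null", "string"] union field. */
    static const char SCHEMA_JSON[] =
        "{\"type\": \"record\", \"name\": \"Event\", \"fields\": ["
        "  {\"name\": \"id\",   \"type\": \"long\"},"
        "  {\"name\": \"note\", \"type\": [\"null\", \"string\"], \"default\": null}"
        "]}";

    int main(void)
    {
        avro_schema_t schema;
        avro_file_writer_t writer;

        if (avro_schema_from_json_length(SCHEMA_JSON, strlen(SCHEMA_JSON), &schema)) {
            fprintf(stderr, "Bad schema: %s\n", avro_strerror());
            return EXIT_FAILURE;
        }
        if (avro_file_writer_create("events.avro", schema, &writer)) {
            fprintf(stderr, "Cannot create file: %s\n", avro_strerror());
            return EXIT_FAILURE;
        }

        /* Build one record with the legacy datum API, as in quickstop.c. */
        avro_datum_t record = avro_record(schema);
        avro_datum_t id = avro_int64(42);
        avro_datum_t note = avro_string("hello");
        avro_record_set(record, "id", id);
        /* The "note" field is a union, but a bare string datum is stored
         * here (no explicit union wrapping); this is the part I am least
         * sure about. */
        avro_record_set(record, "note", note);

        if (avro_file_writer_append(writer, record)) {
            fprintf(stderr, "Unable to write Avro record to file: %s\n",
                    avro_strerror());
        }

        avro_datum_decref(id);
        avro_datum_decref(note);
        avro_datum_decref(record);
        avro_file_writer_close(writer);
        avro_schema_decref(schema);
        return EXIT_SUCCESS;
    }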

When I run my program, I obtain the following output:

Unable to write Avro record to file: Datum too large for file block size

which is produced by 
https://gist.github.com/titouanc/0df61b807d06ca7611cc6708f12fc938#file-test-avro-c-L34. 
The only reference to a similar error message I could find is 
http://apache-avro.679487.n3.nabble.com/Value-too-large-for-file-block-size-td4028424.html, 
but I really doubt I have the same issue, as the size of my record 
should be much lower than 16 KiB.
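
(For what it's worth: the workaround in that thread was to raise the 
block size when creating the writer. Assuming the 
avro_file_writer_create_with_codec() call is available, as in recent 
avro-c releases, that would look roughly like the sketch below; but 
since my record is tiny, I doubt this is the real fix.)

    #include <avro.h>

    /* Hypothetical workaround sketch: create the writer with a 1 MiB
     * block size and the "null" codec instead of the 16 KiB default.
     * Probably not the actual fix here, since the record is tiny. */
    static int open_writer(avro_schema_t schema, avro_file_writer_t *writer)
    {
        return avro_file_writer_create_with_codec(
            "events.avro", schema, writer, "null", 1024 * 1024);
    }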


To write this small piece of code, I drew inspiration from the quickstop 
example 
(https://github.com/apache/avro/blob/release-1.9.1/lang/c/examples/quickstop.c). 
If I compile and run this example, it runs just fine.


Finally, I made a small modification to my program to use the same 
schema as the quickstop example. Here is the modified version: 
https://gist.github.com/titouanc/b18c0c54657db4e1f0361e0be9f710f3.
This one actually works perfectly!


Could anyone help me understand this issue?

Best regards,

Titouan Christophe

Re: avro-c error: "Datum too large for file block size"

Posted by Titouan Christophe <ti...@railnova.eu>.
Hello Dan and all,

On 1/16/20 7:00 PM, Dan Schmitt wrote:
> Only difference I can see is the null default/union possibly not being
> handled well by avro_record_set or avro_record(schema).
> 
> Without reading the source, I'd expect avro_record(schema) to default
> the union values to null,

Yes, I was assuming that a nullable record field would be NULL if no 
value was provided.

> leading to some sort of "let's keep reading this memory" issue because
> we don't know where the end is.
> 
> You can test whether that is what's happening by setting hours in your
> original program, and/or by explicitly setting the fields to null types.

Thank you for this insight! I wrote a few more variations [1] of my 
program:
- with or without nullable fields
- filling them explicitly with avro_null() or a value
- also with another union type: long/double

In the end, I did not manage to get a working program that constructs a 
record with a union type and writes it to a file, but it works with 
non-union types (see [2]).

Maybe there is an additional step to perform when constructing or 
encoding union types in avro-c?
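
One thing I am wondering: with the legacy datum API, is the missing 
step to wrap the branch value in an avro_union() datum before calling 
avro_record_set()? Something like this sketch (assuming the 
avro_union() and avro_schema_record_field_get() helpers, and an 
invented ["null", "string"] field called "note"):

    #include <avro.h>

    /* Sketch: set a ["null", "string"] union field on a record datum by
     * wrapping the branch value in a union datum first. */
    static int set_note(avro_schema_t record_schema, avro_datum_t record,
                        const char *text)
    {
        /* Schema of the "note" field, i.e. the union ["null", "string"]. */
        avro_schema_t union_schema =
            avro_schema_record_field_get(record_schema, "note");

        /* Branch 0 is "null", branch 1 is "string". */
        avro_datum_t branch = text ? avro_string(text) : avro_null();
        avro_datum_t wrapped = avro_union(union_schema, text ? 1 : 0, branch);

        int rval = avro_record_set(record, "note", wrapped);

        avro_datum_decref(wrapped);
        avro_datum_decref(branch);
        return rval;
    }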

> 
> Probably room for improvement on the C side (it should validate/throw
> if the avro_datum_t isn't valid or doesn't match the writer schema,
> instead of doing whatever it's doing, in addition to having the
> null/union default work for avro_record(schema)).
>
I created a repository with all the variations of my test program:
[1] 
https://github.com/titouanc/test-avro-record/tree/bd4f63824489d0b5802cb05bcbd6f9e1b3251a7c

The test results are visible here:
[2] 
https://github.com/titouanc/test-avro-record/commit/bd4f63824489d0b5802cb05bcbd6f9e1b3251a7c/checks?check_suite_id=405071388#step:4:1


Best regards,

Titouan

Re: avro-c error: "Datum too large for file block size"

Posted by Dan Schmitt <da...@gmail.com>.
Only difference I can see is the null default/union possibly not being
handled well by avro_record_set or avro_record(schema).

Without reading the source, I'd expect avro_record(schema) to default
the union values to null, leading to some sort of "let's keep reading
this memory" issue because we don't know where the end is.

You can test whether that is what's happening by setting hours in your
original program, and/or by explicitly setting the fields to null types.
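
Something like the following sketch (untested, with a hypothetical
["null", "string"] field called "note") would make the branch explicit
with the generic value API instead of the legacy datum API:

    #include <avro.h>

    /* Sketch: build a record with the generic value API, where the union
     * branch is selected explicitly via avro_value_set_branch(). */
    static int append_with_value_api(avro_file_writer_t writer,
                                     avro_schema_t schema, const char *text)
    {
        avro_value_iface_t *iface = avro_generic_class_from_schema(schema);
        avro_value_t record, field, branch;
        size_t index;
        int rval;

        avro_generic_value_new(iface, &record);
        avro_value_get_by_name(&record, "note", &field, &index);

        if (text == NULL) {
            /* Branch 0 of ["null", "string"] is "null". */
            avro_value_set_branch(&field, 0, &branch);
            avro_value_set_null(&branch);
        } else {
            /* Branch 1 is "string". */
            avro_value_set_branch(&field, 1, &branch);
            avro_value_set_string(&branch, text);
        }

        rval = avro_file_writer_append_value(writer, &record);

        avro_value_decref(&record);
        avro_value_iface_decref(iface);
        return rval;
    }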

Probably room for improvement on the C side (it should validate/throw
if the avro_datum_t isn't valid or doesn't match the writer schema,
instead of doing whatever it's doing, in addition to having the
null/union default work for avro_record(schema)).
