Posted to user@avro.apache.org by Mika Ristimaki <mi...@gmail.com> on 2013/10/16 17:49:39 UTC

Reading an avro data file with avro c

Hi all,

I encountered a possible bug in the Avro C API. If a file is written as follows, the Avro data file reader cannot read it correctly:

while (has values to write) {
	Open file for writing
	Write a value to the file
	Close the writer.
}

However, a file written as follows can be read just fine:

Open file for writing
while (has values to write) {
	Write a value to the file
}
Close the file

This assumes that both reading and writing are done with the C API. The Java API can read data files written in C either way.

Is this expected behaviour, a bug or am I just missing something? See the attached C program that reproduces this problem.

Thanks
-Mika 



Re: Reading an avro data file with avro c

Posted by Doug Cutting <cu...@apache.org>.
On Fri, Oct 18, 2013 at 8:28 AM, Mika Ristimaki
<mi...@gmail.com> wrote:
> Should I reopen that, or create a new one?

Please create a new issue.

> If I do the change, is it enough to add the patch to the Jira ticket?

Yes.  See:

https://cwiki.apache.org/confluence/display/AVRO/How+To+Contribute

Thanks,

Doug

Re: Reading an avro data file with avro c

Posted by Mika Ristimaki <mi...@gmail.com>.
Hi again,

I found the reason for this. When the file is written with the first method described below (in my earlier mail), the Avro data file contains multiple data blocks, each with a block count of 1. When the file is read, EOF is checked after each block's sync marker with:

int avro_reader_is_eof(avro_reader_t reader)
{
	if (is_file_io(reader)) {
		return feof(avro_reader_to_file(reader)->fp);
	}
	return 0;
}


However, at this point the whole file has already been read into memory (although not all of the bytes have been consumed yet), so feof returns a nonzero value and the reader stops after the first block.
There is a really easy fix for this, though: change the EOF check function to

int avro_reader_is_eof(avro_reader_t reader)
{
	if (is_file_io(reader)) {
		struct _avro_reader_file_t *file_reader = avro_reader_to_file(reader);
		if (feof(file_reader->fp)) {
			return file_reader->end == file_reader->cur;
		}
	}
	return 0;
}
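
The effect can be reproduced without Avro. What follows is only a minimal standalone sketch, not code from the Avro sources: struct buffered_reader, refill, naive_is_eof, buffer_aware_is_eof and eof_demo are hypothetical names that merely model a reader with an internal buffer, and the sketch assumes the whole file fits into a single buffer fill.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a file reader that keeps its own buffer. */
struct buffered_reader {
	FILE *fp;
	char buf[4096];
	char *cur;	/* next unconsumed byte in buf */
	char *end;	/* one past the last buffered byte */
};

/* Fill the buffer from the file and return the number of bytes buffered.
 * If the file is shorter than the buffer, fread hits end of file and the
 * stream's EOF indicator is set even though nothing has been consumed. */
static size_t refill(struct buffered_reader *r)
{
	size_t n;

	assert(r != NULL && r->fp != NULL);
	n = fread(r->buf, 1, sizeof(r->buf), r->fp);
	r->cur = r->buf;
	r->end = r->buf + n;
	return n;
}

/* Models the original check: looks only at the stream. */
static int naive_is_eof(struct buffered_reader *r)
{
	return feof(r->fp);
}

/* Models the proposed check: EOF only once the buffer is drained too. */
static int buffer_aware_is_eof(struct buffered_reader *r)
{
	if (feof(r->fp)) {
		return r->end == r->cur;
	}
	return 0;
}

/* Writes two 6-byte "blocks" to a temporary file, reads them back through
 * the buffer and prints both checks. Returns 0 on success, 1 on I/O error. */
static int eof_demo(void)
{
	struct buffered_reader r;

	r.fp = tmpfile();
	if (r.fp == NULL) {
		return 1;
	}
	fputs("block1block2", r.fp);
	rewind(r.fp);
	refill(&r);

	r.cur += 6;	/* consume the first block only */
	printf("after block 1: naive=%d buffer_aware=%d\n",
	       naive_is_eof(&r) != 0, buffer_aware_is_eof(&r));

	r.cur += 6;	/* consume the second block */
	printf("after block 2: naive=%d buffer_aware=%d\n",
	       naive_is_eof(&r) != 0, buffer_aware_is_eof(&r));

	fclose(r.fp);
	return 0;
}
```

Calling eof_demo() from a main function shows the naive check already reporting EOF after the first block, while the buffer-aware check reports it only once the second block has been consumed as well.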

How should I go about getting this change committed to the Avro repo? I found an old issue (https://issues.apache.org/jira/browse/AVRO-1238) where some improvements to EOF handling were made. Should I reopen that, or create a new one? If I do the change, is it enough to add the patch to the Jira ticket?

-Mika

On Oct 16, 2013, at 6:49 PM, Mika Ristimaki <mi...@gmail.com> wrote:

> Hi all,
> 
> I encountered a possible bug in the Avro C API. If a file is written as follows, the Avro data file reader cannot read it correctly:
> 
> while (has values to write) {
> 	Open file for writing
> 	Write a value to the file
> 	Close the writer.
> }
> 
> However, a file written as follows can be read just fine:
> 
> Open file for writing
> while (has values to write) {
> 	Write a value to the file
> }
> Close the file
> 
> This assumes that both reading and writing are done with the C API. The Java API can read data files written in C either way.
> 
> Is this expected behaviour, a bug or am I just missing something? See the attached C program that reproduces this problem.
> 
> Thanks
> -Mika 
> 
> 
> <main.c>