
Posted to user@avro.apache.org by marius <m....@googlemail.com> on 2015/08/12 16:15:30 UTC

avro RAM usage

Hey,

I am currently doing some performance tests for my BSc thesis, and I
wondered how exactly Avro parses files when reading them.
From my understanding, the data is read from the file block by block
(rather than datum by datum), and then the datums are deserialized. Is
this correct (which would mean that Avro's memory usage depends on the
block size rather than on the size of each datum), or does it depend on
the implementation used?
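Here is a minimal sketch of the block-by-block model described above. It follows the data-block layout of the Avro object container file specification: each block holds an object count (long), the byte size of the serialized objects (long), the serialized objects themselves, and the file's 16-byte sync marker. The sketch assumes the null codec, a schema of plain `long` values, and a made-up sync marker, and it skips the file header. The helper names are hypothetical. Whether a particular implementation buffers a whole block this way is an assumption to check against its source.

```python
import io

# Hypothetical sync marker; a real file stores 16 random bytes in its header.
SYNC = bytes(range(16))


def encode_long(n):
    """Avro long encoding: zigzag, then little-endian base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(stream):
    """Read one varint-encoded, zigzagged long from a binary stream."""
    shift = z = 0
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("end of stream")
        z |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return (z >> 1) ^ -(z & 1)
        shift += 7


def write_blocks(blocks):
    """Serialize lists of longs as container-style data blocks (null codec)."""
    buf = bytearray()
    for datums in blocks:
        payload = b"".join(encode_long(d) for d in datums)
        buf += encode_long(len(datums)) + encode_long(len(payload)) + payload + SYNC
    return bytes(buf)


def read_blocks(stream):
    """Yield (block_size, datums), reading each block's bytes in one go."""
    while True:
        try:
            count = decode_long(stream)
        except EOFError:
            return
        size = decode_long(stream)
        payload = stream.read(size)  # the whole block is held in memory here
        if stream.read(len(SYNC)) != SYNC:
            raise ValueError("sync marker mismatch")
        block = io.BytesIO(payload)
        yield size, [decode_long(block) for _ in range(count)]


data = write_blocks([[1, -2, 300], [7, 8]])
for size, datums in read_blocks(io.BytesIO(data)):
    print(size, datums)  # 4 [1, -2, 300], then 2 [7, 8]
```

In this model the read buffer scales with the block size, not with any single datum. With a compressing codec, a reader would presumably also hold the decompressed block.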

My second question is whether there is a way to read the file datum by
datum. I want to create an index that stores byte offsets into the Avro
file, so that I can, e.g., use seek() to go to one of those positions and
deserialize the datum that follows. Is this even possible, or can I only
start at positions marked by a sync marker?
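The next sketch shows a datum-level byte-offset index over the same simplified layout. It again assumes the null codec, `long` values, a made-up sync marker, and no header, and the helper names are hypothetical. The blocks are decoded once to record where each datum starts, and a lookup then seeks straight to that offset. This only works because the blocks are uncompressed. With a codec such as deflate, a block's serialized objects are compressed together, so an offset could only usefully point at a block start. For non-trivial schemas, finding where the next datum begins would also need schema-aware decoding or skipping.

```python
import io

# Hypothetical sync marker; a real file stores 16 random bytes in its header.
SYNC = bytes(range(16))


def encode_long(n):
    """Avro long encoding: zigzag, then little-endian base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(stream):
    """Read one varint-encoded, zigzagged long from a binary stream."""
    shift = z = 0
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("end of stream")
        z |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return (z >> 1) ^ -(z & 1)
        shift += 7


def write_blocks(blocks):
    """Serialize lists of longs as container-style data blocks (null codec)."""
    buf = bytearray()
    for datums in blocks:
        payload = b"".join(encode_long(d) for d in datums)
        buf += encode_long(len(datums)) + encode_long(len(payload)) + payload + SYNC
    return bytes(buf)


def build_offset_index(stream):
    """Absolute byte offset of every datum; valid only for uncompressed blocks."""
    offsets = []
    while True:
        try:
            count = decode_long(stream)
        except EOFError:
            return offsets
        decode_long(stream)  # block byte size, not needed here
        for _ in range(count):
            offsets.append(stream.tell())
            decode_long(stream)  # decode to find where the next datum starts
        if stream.read(len(SYNC)) != SYNC:
            raise ValueError("sync marker mismatch")


def read_datum_at(stream, offset):
    """Seek to a recorded offset and deserialize the single datum there."""
    stream.seek(offset)
    return decode_long(stream)


f = io.BytesIO(write_blocks([[1, -2, 300], [7, 8]]))
offsets = build_offset_index(f)
print(offsets)  # [2, 3, 4, 24, 25]
print(read_datum_at(f, offsets[2]))  # 300
```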

Greetings and thanks

Marius