Posted to dev@avro.apache.org by Ravindra <ra...@gmail.com> on 2011/09/25 16:10:15 UTC

Avro C issue with multi-threading

Hi,

I'm using the Avro C library to serialize and de-serialize data. I'm seeing
issues (seg-faults) when I use it with multi-threading; the same code works
fine when I run it in single-threaded mode. To give some details about my
code: on init I read the schema from a file and create a global
avro_schema_t object. Multiple threads then use this global variable (schema)
to serialize and de-serialize data. The schema itself is never modified
at run time in my code. From what I understood by going through the Avro
code, I don't think the schema is modified by the Avro code either during
serializing/de-serializing. If this is in fact the case, the schema is
essentially a read-only global and should be fine with multiple threads
accessing it. I haven't found any documentation that claims
that Avro C is thread-safe. It would be really helpful if someone who has
used Avro C in a multi-threaded environment could share their experience.
Also, let me know if what I am trying is in fact possible.

Stack trace at seg fault
#0  0x00007f8029609175 in raise () from /lib/libc.so.6
#1  0x00007f802960a590 in abort () from /lib/libc.so.6
#2  0x00007f802964456b in ?? () from /lib/libc.so.6
#3  0x00007f8029649b36 in ?? () from /lib/libc.so.6
#4  0x00007f802964e8ec in free () from /lib/libc.so.6
#5  0x00007f8029d61cb5 in avro_default_allocator (ud=<value optimized out>,
ptr=0x4746, osize=6, nsize=18446744073709551615) at allocation.c:31
#6  0x00007f8029d5f0d9 in avro_resolver_free_cycles (consumer=0x79bf80,
freeing=0x771800) at resolver.c:102
#7  0x00007f8029d5f0d9 in avro_resolver_free_cycles (consumer=0x7479c0,
freeing=0x771800) at resolver.c:102
#8  0x00007f8029d5f174 in avro_resolver_free (consumer=0x7479c0) at
resolver.c:120
#9  0x00007f8029d5f8e7 in try_record (resolvers=0x773320, resolver=<value
optimized out>, wschema=<value optimized out>, rschema=0x6aa920,
    root_rschema=<value optimized out>) at resolver.c:1133
#10 0x00007f8029d5fd18 in avro_resolver_new_memoized (resolvers=0x773320,
wschema=0x6aa920, rschema=0x6aa920) at resolver.c:1359
#11 0x00007f8029d60966 in avro_resolver_new (wschema=0x6aa920,
rschema=0x6aa920) at resolver.c:1379
#12 0x00007f8029d5d1aa in avro_read_data (reader=0xd205a0,
writers_schema=0x6aa920, readers_schema=0x6, datum=0x7f800bfee218) at
datum_read.c:362
#13 0x000000000040e426 in deserialize_fcap_blob (fcapBlob=0x7b6810
"\002\206F\350\360\307\347\t\004", fcapBlob_len=43, result_obj=0x797a90)



Awaiting replies, thanks in advance.


Re: Avro C issue with multi-threading

Posted by Douglas Creager <dc...@dcreager.net>.
> I'm using the Avro C library to serialize and de-serialize data. I'm seeing
> issues (seg-faults) when I use it with multi-threading; the same code works
> fine when I run it in single-threaded mode. To give some details about my
> code: on init I read the schema from a file and create a global
> avro_schema_t object. Multiple threads then use this global variable (schema)
> to serialize and de-serialize data. The schema itself is never modified
> at run time in my code. From what I understood by going through the Avro
> code, I don't think the schema is modified by the Avro code either during
> serializing/de-serializing. If this is in fact the case, the schema is
> essentially a read-only global and should be fine with multiple threads
> accessing it. I haven't found any documentation that claims
> that Avro C is thread-safe. It would be really helpful if someone who has
> used Avro C in a multi-threaded environment could share their experience.
> Also, let me know if what I am trying is in fact possible.

Which library version are you using?  Anything in the 1.5 branch or earlier doesn't make any guarantees about thread safety.  A while back I checked in a patch for AVRO-746 [1] that made the various incref and decref functions thread-safe, but this was only applied to the Subversion HEAD, and not back-ported to 1.5.  You're right that the contents of the schema objects aren't modified during serialization or deserialization, but some of the helper objects that are created do update the reference counts of any schema pointers that they hold.  Without the AVRO-746 patch, you could easily have race conditions that would cause the schema objects to be freed while there were still references to them.
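
To make the reference-count race concrete, here is a minimal sketch in C11 that does not use Avro at all. It is not the AVRO-746 patch, only an illustration of the general idea: shared_obj, obj_incref, obj_decref, refcount_worker and run_refcount_demo are hypothetical names, and the sketch assumes a compiler with <stdatomic.h> and POSIX threads. Several threads repeatedly take and drop a reference to one shared object, the way helper objects take and drop references to a shared schema.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define REFCOUNT_THREADS 4
#define REFCOUNT_ITERATIONS 100000

/* Hypothetical stand-in for a shared, reference-counted object. */
typedef struct {
    atomic_int refcount;
} shared_obj;

/* Take a reference: one indivisible read-modify-write. */
static void obj_incref(shared_obj *o)
{
    atomic_fetch_add(&o->refcount, 1);
}

/* Drop a reference; returns 1 if this call dropped the last one. */
static int obj_decref(shared_obj *o)
{
    return atomic_fetch_sub(&o->refcount, 1) == 1;
}

/* Each worker takes and drops a reference many times. */
static void *refcount_worker(void *arg)
{
    shared_obj *o = arg;
    for (int i = 0; i < REFCOUNT_ITERATIONS; i++) {
        obj_incref(o);
        obj_decref(o);
    }
    return NULL;
}

/* Runs the workers and returns the final count. The caller's own
 * reference is never dropped, so the result is 1. */
int run_refcount_demo(void)
{
    shared_obj o;
    pthread_t threads[REFCOUNT_THREADS];

    atomic_init(&o.refcount, 1);
    for (int i = 0; i < REFCOUNT_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, refcount_worker, &o);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < REFCOUNT_THREADS; i++)
        pthread_join(threads[i], NULL);
    return atomic_load(&o.refcount);
}
```

If refcount were a plain int updated with ++ and --, two threads could read the same old value and one update would be lost. The count could then drift, and it could reach zero while other threads still hold references, which is the kind of premature free described above.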

Can you try the latest Subversion HEAD and see if that fixes the segfaults?  Also note that even with HEAD, it's only the incref and decref functions that are thread-safe.  If you're doing any updates or modifications to an Avro object, it should only be used within a single thread.  And if any object is used in multiple threads, you can only read from it.

An alternative, if you can't use HEAD, is to create a separate copy of the schema for each thread, using avro_schema_copy.
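
The per-thread copy pattern might look roughly like the sketch below. So that it compiles without the Avro library, it uses a hypothetical toy_schema type with a plain, non-atomic reference count as a stand-in for avro_schema_t; toy_schema_new, toy_schema_copy, toy_schema_incref, toy_schema_decref, copy_worker and run_per_thread_copies are all illustrative names, not Avro functions. In real code the copy would be made with avro_schema_copy. The sketch makes every copy in the main thread before any worker starts, which is a cautious assumption: that way the shared original is only ever touched by one thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define COPY_THREADS 4
#define COPY_ITERATIONS 100000

/* Hypothetical stand-in for avro_schema_t: the refcount is a plain int,
 * so it is only safe while a single thread uses the object. */
typedef struct {
    int refcount;
    char *json;
} toy_schema;

static toy_schema *toy_schema_new(const char *json)
{
    toy_schema *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->json = malloc(strlen(json) + 1);
    if (s->json == NULL) {
        free(s);
        return NULL;
    }
    strcpy(s->json, json);
    s->refcount = 1;
    return s;
}

/* Deep copy; with Avro C this step would be avro_schema_copy(shared). */
static toy_schema *toy_schema_copy(const toy_schema *src)
{
    return toy_schema_new(src->json);
}

static void toy_schema_incref(toy_schema *s)
{
    s->refcount++;
}

static void toy_schema_decref(toy_schema *s)
{
    if (--s->refcount == 0) {
        free(s->json);
        free(s);
    }
}

typedef struct {
    toy_schema *own; /* private copy, used by exactly one thread */
    int ok;          /* set to 1 if the count is intact at the end */
} copy_worker_arg;

/* Takes and drops references on the private copy only. */
static void *copy_worker(void *p)
{
    copy_worker_arg *a = p;
    for (int i = 0; i < COPY_ITERATIONS; i++) {
        toy_schema_incref(a->own);
        toy_schema_decref(a->own);
    }
    a->ok = (a->own->refcount == 1);
    return NULL;
}

/* Returns how many threads finished with an intact private copy,
 * or -1 if the shared schema could not be allocated. */
int run_per_thread_copies(const char *json)
{
    toy_schema *shared = toy_schema_new(json);
    pthread_t threads[COPY_THREADS];
    copy_worker_arg args[COPY_THREADS];
    int ok_count = 0;

    if (shared == NULL)
        return -1;
    /* Make all copies before starting any thread. */
    for (int i = 0; i < COPY_THREADS; i++) {
        args[i].own = toy_schema_copy(shared);
        args[i].ok = 0;
        assert(args[i].own != NULL);
    }
    for (int i = 0; i < COPY_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, copy_worker, &args[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < COPY_THREADS; i++) {
        pthread_join(threads[i], NULL);
        ok_count += args[i].ok;
        toy_schema_decref(args[i].own); /* release the private copy */
    }
    toy_schema_decref(shared);
    return ok_count;
}
```

Because no reference count is ever touched by more than one thread, the non-atomic updates are harmless here. The cost is one extra schema copy per thread.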

[1] https://issues.apache.org/jira/browse/AVRO-746

cheers
–doug