You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by "Vivek Nadkarni (JIRA)" <ji...@apache.org> on 2012/09/15 00:38:09 UTC

[jira] [Updated] (AVRO-1167) Avro-C: avro_schema_copy() does not copy AVRO_LINKs properly.

     [ https://issues.apache.org/jira/browse/AVRO-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Nadkarni updated AVRO-1167:
---------------------------------

    Attachment: AVRO-1167-TEST.patch

The test in AVRO-1167-TEST.patch shows the memory leak caused by avro_schema_copy() when an AVRO_LINK is copied. I have provided a non-recursive schema containing an AVRO_LINK in this test. Once AVRO-766 is fixed, this issue should be revisited for recursive schemas as well.

The valgrind output is below:

==15602== 1,410 (24 direct, 1,386 indirect) bytes in 1 blocks are definitely lost in loss record 23 of 23
==15602==    at 0x4C28F9F: malloc (vg_replace_malloc.c:236)
==15602==    by 0x4C29019: realloc (vg_replace_malloc.c:525)
==15602==    by 0x40D364: avro_default_allocator (allocation.c:36)
==15602==    by 0x405428: avro_schema_link (schema.c:663)
==15602==    by 0x406946: avro_schema_copy (schema.c:1260)
==15602==    by 0x406722: avro_schema_copy (schema.c:1165)
==15602==    by 0x403EEA: main (test_avro_1167.c:74)
==15602== 
==15602== LEAK SUMMARY:
==15602==    definitely lost: 96 bytes in 3 blocks
==15602==    indirectly lost: 2,748 bytes in 28 blocks
==15602==      possibly lost: 0 bytes in 0 blocks
==15602==    still reachable: 0 bytes in 0 blocks
==15602==         suppressed: 0 bytes in 0 blocks
==15602== 
==15602== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 4 from 4)


                
> Avro-C: avro_schema_copy() does not copy AVRO_LINKs properly.
> -------------------------------------------------------------
>
>                 Key: AVRO-1167
>                 URL: https://issues.apache.org/jira/browse/AVRO-1167
>             Project: Avro
>          Issue Type: Bug
>          Components: c
>    Affects Versions: 1.7.1
>         Environment: Ubuntu Linux 11.10
>            Reporter: Vivek Nadkarni
>         Attachments: AVRO-1167-TEST.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> When avro_schema_copy() encounters an AVRO_LINK from an old_schema to a new_schema, it sets the target of the new_link to the target of the old_link in the old_schema. Thus, the AVRO_LINK in the new_schema points to an element in the old_schema. 
> While this is currently safe, since the reference count of the target in the old_schema is incremented, we are not really making a "copy" of the schema.
> There is a "TODO" in the code, which says that we should make a avro_schema_copy() of the target in old_schema instead of linking directly to it. However, this solution of making a copy would result in a few problems:
> 1. Avro schemas are intended to be self-contained. That implies that AVRO_LINKs are intended to be internal links inside a self-contained schema. The code introduces unnecessary (and potentially disallowed) external dependencies in an Avro schema. 
> 2. The purpose of copying a schema that we want to decouple the old_schema from the new_schema. The two copies may have different owners, we may want to deallocate old schema etc.
> 3. If the schema is recursive, then the code would enter an infinite recursion loop.
> It appears to me that the "correct" solution would be to replicate the entire structure of the current schema, including the internal links. This means that if old_link_A points to old_target_B, then new_link_A should point to new_target_B in the new schema. Note that there should only be one copy of new_target_B in the new schema, even if there are multiple links pointing to new_target_B - i.e. we should not make a new copy for each link.
> In order to implement this proper copying of links, we would need to keep a lookup table of pairs of old and new schemas as they are being created, as well as a list of all the AVRO_LINKs that are copied. Then as a post-copy step, we would go and fix up all the AVRO_LINKs to point to the appropriate targets. This is the way the schema is constructed in the first place in avro_schema_from_json().
> An inefficient way to obtain the correct result from avro_schema_copy() would be to perform an avro_schema_to_json() followed by an avro_schema_from_json().
> Note: I have not implemented a fix for this issue, but I am documenting this issue in AVRO-JIRA because this issue needs to be resolved before AVRO-766 can be fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira