You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by "Vivek Nadkarni (JIRA)" <ji...@apache.org> on 2012/09/15 00:27:08 UTC

[jira] [Created] (AVRO-1167) Avro-C: avro_schema_copy() does not copy AVRO_LINKs properly.

Vivek Nadkarni created AVRO-1167:
------------------------------------

             Summary: Avro-C: avro_schema_copy() does not copy AVRO_LINKs properly.
                 Key: AVRO-1167
                 URL: https://issues.apache.org/jira/browse/AVRO-1167
             Project: Avro
          Issue Type: Bug
          Components: c
    Affects Versions: 1.7.1
         Environment: Ubuntu Linux 11.10
            Reporter: Vivek Nadkarni


When avro_schema_copy() encounters an AVRO_LINK from an old_schema to a new_schema, it sets the target of the new_link to the target of the old_link in the old_schema. Thus, the AVRO_LINK in the new_schema points to an element in the old_schema. 

While this is currently safe, since the reference count of the target in the old_schema is incremented, we are not really making a "copy" of the schema.

There is a "TODO" in the code, which says that we should make a avro_schema_copy() of the target in old_schema instead of linking directly to it. However, this solution of making a copy would result in a few problems:

1. Avro schemas are intended to be self-contained. That implies that AVRO_LINKs are intended to be internal links inside a self-contained schema. The code introduces unnecessary (and potentially disallowed) external dependencies in an Avro schema. 

2. The purpose of copying a schema that we want to decouple the old_schema from the new_schema. The two copies may have different owners, we may want to deallocate old schema etc.

3. If the schema is recursive, then the code would enter an infinite recursion loop.

It appears to me that the "correct" solution would be to replicate the entire structure of the current schema, including the internal links. This means that if old_link_A points to old_target_B, then new_link_A should point to new_target_B in the new schema. Note that there should only be one copy of new_target_B in the new schema, even if there are multiple links pointing to new_target_B - i.e. we should not make a new copy for each link.

In order to implement this proper copying of links, we would need to keep a lookup table of pairs of old and new schemas as they are being created, as well as a list of all the AVRO_LINKs that are copied. Then as a post-copy step, we would go and fix up all the AVRO_LINKs to point to the appropriate targets. This is the way the schema is constructed in the first place in avro_schema_from_json().

An inefficient way to obtain the correct result from avro_schema_copy() would be to perform an avro_schema_to_json() followed by an avro_schema_from_json().

Note: I have not implemented a fix for this issue, but I am documenting this issue in AVRO-JIRA because this issue needs to be resolved before AVRO-766 can be fixed.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (AVRO-1167) Avro-C: avro_schema_copy() does not copy AVRO_LINKs properly.

Posted by "Vivek Nadkarni (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Nadkarni updated AVRO-1167:
---------------------------------

    Attachment: AVRO-1167-TEST.patch

The test in AVRO-1167-TEST.patch shows the memory leak caused by avro_schema_copy() when an AVRO_LINK is copied. I have provided a non-recursive schema containing an AVRO_LINK in this test. Once AVRO-766 is fixed, this issue should be revisited for recursive schemas as well.

The valgrind output is below:

==15602== 1,410 (24 direct, 1,386 indirect) bytes in 1 blocks are definitely lost in loss record 23 of 23
==15602==    at 0x4C28F9F: malloc (vg_replace_malloc.c:236)
==15602==    by 0x4C29019: realloc (vg_replace_malloc.c:525)
==15602==    by 0x40D364: avro_default_allocator (allocation.c:36)
==15602==    by 0x405428: avro_schema_link (schema.c:663)
==15602==    by 0x406946: avro_schema_copy (schema.c:1260)
==15602==    by 0x406722: avro_schema_copy (schema.c:1165)
==15602==    by 0x403EEA: main (test_avro_1167.c:74)
==15602== 
==15602== LEAK SUMMARY:
==15602==    definitely lost: 96 bytes in 3 blocks
==15602==    indirectly lost: 2,748 bytes in 28 blocks
==15602==      possibly lost: 0 bytes in 0 blocks
==15602==    still reachable: 0 bytes in 0 blocks
==15602==         suppressed: 0 bytes in 0 blocks
==15602== 
==15602== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 4 from 4)


                
> Avro-C: avro_schema_copy() does not copy AVRO_LINKs properly.
> -------------------------------------------------------------
>
>                 Key: AVRO-1167
>                 URL: https://issues.apache.org/jira/browse/AVRO-1167
>             Project: Avro
>          Issue Type: Bug
>          Components: c
>    Affects Versions: 1.7.1
>         Environment: Ubuntu Linux 11.10
>            Reporter: Vivek Nadkarni
>         Attachments: AVRO-1167-TEST.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> When avro_schema_copy() encounters an AVRO_LINK from an old_schema to a new_schema, it sets the target of the new_link to the target of the old_link in the old_schema. Thus, the AVRO_LINK in the new_schema points to an element in the old_schema. 
> While this is currently safe, since the reference count of the target in the old_schema is incremented, we are not really making a "copy" of the schema.
> There is a "TODO" in the code, which says that we should make a avro_schema_copy() of the target in old_schema instead of linking directly to it. However, this solution of making a copy would result in a few problems:
> 1. Avro schemas are intended to be self-contained. That implies that AVRO_LINKs are intended to be internal links inside a self-contained schema. The code introduces unnecessary (and potentially disallowed) external dependencies in an Avro schema. 
> 2. The purpose of copying a schema that we want to decouple the old_schema from the new_schema. The two copies may have different owners, we may want to deallocate old schema etc.
> 3. If the schema is recursive, then the code would enter an infinite recursion loop.
> It appears to me that the "correct" solution would be to replicate the entire structure of the current schema, including the internal links. This means that if old_link_A points to old_target_B, then new_link_A should point to new_target_B in the new schema. Note that there should only be one copy of new_target_B in the new schema, even if there are multiple links pointing to new_target_B - i.e. we should not make a new copy for each link.
> In order to implement this proper copying of links, we would need to keep a lookup table of pairs of old and new schemas as they are being created, as well as a list of all the AVRO_LINKs that are copied. Then as a post-copy step, we would go and fix up all the AVRO_LINKs to point to the appropriate targets. This is the way the schema is constructed in the first place in avro_schema_from_json().
> An inefficient way to obtain the correct result from avro_schema_copy() would be to perform an avro_schema_to_json() followed by an avro_schema_from_json().
> Note: I have not implemented a fix for this issue, but I am documenting this issue in AVRO-JIRA because this issue needs to be resolved before AVRO-766 can be fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira