Posted to dev@avro.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2009/12/24 00:01:29 UTC

[jira] Created: (AVRO-266) Union as java.lang.Object prevents object reuse

Union as java.lang.Object prevents object reuse
-----------------------------------------------

                 Key: AVRO-266
                 URL: https://issues.apache.org/jira/browse/AVRO-266
             Project: Avro
          Issue Type: Improvement
    Affects Versions: 1.2.0
            Reporter: Todd Lipcon


Because unions deserialize as java.lang.Object, with the object's runtime type used to distinguish the union's branches, object reuse is hard to achieve. I don't have a specific benchmark, but I expect this to hurt performance for logging applications where every record in a large file is a union and the type tends to change from one record to the next.
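
To make the concern concrete, here is a minimal plain-Java sketch. It does not use Avro's classes; LoginEvent, ErrorEvent and ObjectUnionReader are hypothetical stand-ins, and the read(reuse, ...) shape is only assumed to resemble a reader that takes an optional object to overwrite. It counts how often a fresh object has to be allocated when the union branch alternates.

```java
// Hypothetical stand-ins for two record types in a union; not Avro classes.
final class LoginEvent { long userId; }
final class ErrorEvent { int code; }

final class ObjectUnionReader {
    int allocations = 0;

    // Mimics a reader whose union result is typed as Object. The caller's
    // `reuse` instance can only be recycled when its runtime class happens
    // to match the branch being decoded.
    Object read(Object reuse, int branch, long value) {
        if (branch == 0) {
            LoginEvent e;
            if (reuse instanceof LoginEvent) {
                e = (LoginEvent) reuse;
            } else {
                e = new LoginEvent();
                allocations++;
            }
            e.userId = value;
            return e;
        } else {
            ErrorEvent e;
            if (reuse instanceof ErrorEvent) {
                e = (ErrorEvent) reuse;
            } else {
                e = new ErrorEvent();
                allocations++;
            }
            e.code = (int) value;
            return e;
        }
    }

    // Decodes `count` records whose branch alternates 0, 1, 0, 1, ...
    // passing the previous result back in as the reuse candidate.
    static int allocationsForAlternating(int count) {
        ObjectUnionReader r = new ObjectUnionReader();
        Object reuse = null;
        for (int i = 0; i < count; i++) {
            reuse = r.read(reuse, i % 2, i);
        }
        return r.allocations;
    }

    // Same loop, but every record uses branch 0.
    static int allocationsForSameBranch(int count) {
        ObjectUnionReader r = new ObjectUnionReader();
        Object reuse = null;
        for (int i = 0; i < count; i++) {
            reuse = r.read(reuse, 0, i);
        }
        return r.allocations;
    }
}
```

In this sketch the alternating case allocates once per record, while the single-branch case allocates only for the first record. That is the pattern the description worries about, though how much it costs in a real workload is not measured here.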

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-266) Union as java.lang.Object prevents object reuse

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795394#action_12795394 ] 

Todd Lipcon commented on AVRO-266:
----------------------------------

> I'm not sure that you'd actually want object-reuse, since you might be carrying around that 4MB payload in one of those records for way longer than you need to

Good point about leakage here - reminds me of the recent Hadoop JIRA with the IPC buffers never shrinking.

I think the solution here is to document the reuse behavior and suggest that users avoid reusing objects when payloads are large and memory is tight. Currently, passing null into the deserialization calls allocates new objects, and we should keep that option.
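
A rough illustration of that trade-off, in plain Java rather than Avro's own API: LogRecord and ReusingReader are hypothetical names, and the grow-but-never-shrink buffer is an assumption made to show the retention effect, not a claim about how Avro manages memory.

```java
// Hypothetical record with a variable-size payload; not an Avro class.
final class LogRecord {
    byte[] buffer = new byte[0]; // backing storage, may be larger than `length`
    int length;                  // number of valid bytes in `buffer`
}

final class ReusingReader {
    // Mimics a read(reuse, input) call: null means "allocate a fresh record",
    // non-null means "overwrite this one in place".
    static LogRecord read(LogRecord reuse, byte[] encoded) {
        LogRecord r = (reuse != null) ? reuse : new LogRecord();
        if (r.buffer.length < encoded.length) {
            r.buffer = new byte[encoded.length]; // grow when needed, never shrink
        }
        System.arraycopy(encoded, 0, r.buffer, 0, encoded.length);
        r.length = encoded.length;
        return r;
    }
}
```

With this sketch, reading a large payload and then a small one into the same LogRecord leaves the large buffer attached to the record, whereas passing null for the second read yields a record sized to the small payload and lets the large one be collected once nothing else references it.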



[jira] Commented: (AVRO-266) Union as java.lang.Object prevents object reuse

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795391#action_12795391 ] 

Philip Zeyliger commented on AVRO-266:
--------------------------------------

This is related to AVRO-248 (naming unions).

I agree with Todd that for the specific API it probably makes sense to generate container objects for the union. It probably also makes sense to special-case null.

Does that actually help re-use?  Say you have a log with 17 types of records, some of which are big.  I'm not sure that you'd actually want object-reuse, since you might be carrying around that 4MB payload in one of those records for way longer than you need to.  Who's responsible for clearing the other 16 branches?  Is it the caller's responsibility?

-- Philip



[jira] Commented: (AVRO-266) Union as java.lang.Object prevents object reuse

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794258#action_12794258 ] 

Todd Lipcon commented on AVRO-266:
----------------------------------

I think the best solution to this would be to deserialize unions as a tagged class, with one member field per type plus either an enum, int, or Schema reference "selectedType" field. We can still provide a .getObject() accessor which returns java.lang.Object. This would allow really easy reuse of the constituent records.
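
One possible shape for such a tagged class, sketched in plain Java. Every name here (Heartbeat, Failure, EventUnion, the select methods) is hypothetical, the tag is an enum although an int or Schema reference would serve equally, and the lazy allocation of branches is an assumption of this sketch rather than part of the proposal.

```java
// Hypothetical record types; in practice these would be generated classes.
final class Heartbeat { long timestamp; }
final class Failure { String message; }

// Sketch of a tagged container for the union [Heartbeat, Failure].
final class EventUnion {
    enum Selected { NONE, HEARTBEAT, FAILURE }

    private Selected selectedType = Selected.NONE;
    private Heartbeat heartbeat; // one member field per union branch
    private Failure failure;

    // Marks the Heartbeat branch as current and returns the instance to
    // fill in, allocating it only the first time the branch is used.
    Heartbeat selectHeartbeat() {
        if (heartbeat == null) {
            heartbeat = new Heartbeat();
        }
        selectedType = Selected.HEARTBEAT;
        return heartbeat;
    }

    // Same as above for the Failure branch.
    Failure selectFailure() {
        if (failure == null) {
            failure = new Failure();
        }
        selectedType = Selected.FAILURE;
        return failure;
    }

    Selected getSelectedType() {
        return selectedType;
    }

    // Untyped accessor, for callers that want the java.lang.Object view.
    Object getObject() {
        switch (selectedType) {
            case HEARTBEAT: return heartbeat;
            case FAILURE:   return failure;
            default:        return null;
        }
    }
}
```

A reader built along these lines could call selectHeartbeat() or selectFailure() according to the decoded branch and overwrite the returned instance in place, so switching branches between records would not force a new allocation. Note that in this sketch an unselected branch keeps whatever it last held until it is selected and overwritten again.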
