Posted to dev@avro.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2010/08/06 00:47:17 UTC

[jira] Commented: (AVRO-557) Speed up one-time data decoding

    [ https://issues.apache.org/jira/browse/AVRO-557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895855#action_12895855 ] 

Doug Cutting commented on AVRO-557:
-----------------------------------

> Hashes on schemas are expensive. (I did separately push for schemas to be immutable so that their hashcodes could be memoized, but that hasn't been done yet.)

Schemas are mostly immutable.  You can add properties, and there's setFields(), which is called once while a record schema is being constructed.  In the latter case, it's an error to use the schema in any way before setFields() has been called.  In the former, we could reset the memoized hashCode whenever a property is added.  So I think we could reasonably memoize Schema#hashCode(), and it could be a good performance win.
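A minimal sketch of the memoization idea, assuming a hypothetical `MemoizedSchema` class (not Avro's `Schema`) whose type and name are fixed at construction and whose only mutation is adding properties. The hash is computed at most once between mutations and is discarded whenever `addProp` is called, mirroring the reset-on-property-change proposal. The `computeCount` field exists only to make the caching observable.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical, mostly immutable schema; NOT org.apache.avro.Schema. */
final class MemoizedSchema {
  private final String type;   // fixed at construction
  private final String name;   // fixed at construction
  private final Map<String, String> props = new LinkedHashMap<>();

  private int cachedHash;      // meaningful only while hashValid is true
  private boolean hashValid;   // cleared whenever a property is added
  int computeCount;            // demo only: how often the hash was computed

  MemoizedSchema(String type, String name) {
    this.type = Objects.requireNonNull(type);
    this.name = Objects.requireNonNull(name);
  }

  /** Adding a property changes equals(), so the memoized hash is reset. */
  void addProp(String key, String value) {
    props.put(key, value);
    hashValid = false;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof MemoizedSchema)) return false;
    MemoizedSchema that = (MemoizedSchema) o;
    return type.equals(that.type) && name.equals(that.name) && props.equals(that.props);
  }

  @Override
  public int hashCode() {
    if (!hashValid) {          // compute at most once between mutations
      cachedHash = computeHash();
      hashValid = true;
    }
    return cachedHash;
  }

  /** Stand-in for an expensive traversal of a large schema. */
  private int computeHash() {
    computeCount++;
    return Objects.hash(type, name, props);
  }
}
```

This sketch is not thread-safe: under concurrent access the two cache fields can be observed out of sync. A real implementation would have to address that, for example by keeping the cached value and its validity in a single volatile field.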

But perhaps we shouldn't call hashCode() at all, and should instead use an identity map.  If we used Google Collections, we could have a weak identity cache of the form Map<Schema,Map<Schema,Resolver>>.  If someone added a property to a schema after it had been used, in a way that changed what the resolver does, then they'd have trouble; but (a) they shouldn't do that, and (b) the resolver doesn't currently use any user-defined schema properties, and I don't expect it ever will.
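The identity-cache alternative can be sketched with only the JDK. Google Collections' `MapMaker` offered weak, identity-compared keys; to keep the example self-contained, this sketch swaps in a small hand-written `WeakIdentityCache` built on `WeakReference` and `ReferenceQueue`. `DemoSchema`, `Resolver`, and `ResolverCache` are hypothetical stand-ins, not Avro classes. Nesting two caches gives the `Map<Schema,Map<Schema,Resolver>>` shape; the sketch assumes the outer key is the writer schema and the inner key the reader schema.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Cache keyed by object identity (==) that does not keep its keys alive. */
final class WeakIdentityCache<K, V> {
  /** Weak reference whose equality is the identity of its referent. */
  private static final class IdentityRef<T> extends WeakReference<T> {
    private final int hash;

    IdentityRef(T referent, ReferenceQueue<? super T> queue) {
      super(referent, queue);
      this.hash = System.identityHashCode(referent); // never calls hashCode()
    }

    @Override public int hashCode() { return hash; }

    @Override public boolean equals(Object o) {
      if (this == o) return true;  // lets a cleared ref still remove itself
      if (!(o instanceof IdentityRef)) return false;
      Object mine = get();
      return mine != null && mine == ((IdentityRef<?>) o).get();
    }
  }

  private final Map<IdentityRef<K>, V> map = new HashMap<>();
  private final ReferenceQueue<K> queue = new ReferenceQueue<>();

  /** Returns the value cached for this exact key object, computing it once. */
  synchronized V get(K key, Function<? super K, ? extends V> compute) {
    expungeStale();
    V value = map.get(new IdentityRef<K>(key, null)); // probe only, not queued
    if (value == null) {
      value = compute.apply(key);
      map.put(new IdentityRef<>(key, queue), value);
    }
    return value;
  }

  synchronized int size() {
    expungeStale();
    return map.size();
  }

  /** Drops entries whose keys have been garbage collected. */
  private void expungeStale() {
    Reference<? extends K> ref;
    while ((ref = queue.poll()) != null) {
      map.remove(ref);
    }
  }
}

/** Hypothetical schema with content-based equals, like a freshly parsed one. */
final class DemoSchema {
  final String json;
  DemoSchema(String json) { this.json = json; }
  @Override public boolean equals(Object o) {
    return o instanceof DemoSchema && ((DemoSchema) o).json.equals(json);
  }
  @Override public int hashCode() { return json.hashCode(); }
}

/** Hypothetical resolver; it deliberately holds no reference to the schemas. */
final class Resolver {
  final String description;
  Resolver(String description) { this.description = description; }
}

/** Hypothetical two-level cache: writer schema -> reader schema -> resolver. */
final class ResolverCache {
  private final WeakIdentityCache<DemoSchema, WeakIdentityCache<DemoSchema, Resolver>> cache =
      new WeakIdentityCache<>();
  int built = 0; // demo only: how many resolvers were actually constructed

  Resolver resolverFor(DemoSchema writer, DemoSchema reader) {
    return cache.get(writer, w -> new WeakIdentityCache<DemoSchema, Resolver>())
        .get(reader, r -> { built++; return new Resolver(writer.json + " -> " + r.json); });
  }
}
```

Because lookups compare object identity, a schema parsed anew for every record never hits the cache, even when it equals a cached one; this is why it matters whether callers keep their schemas in a table and reuse them. Also, as with `java.util.WeakHashMap`, a cached value that strongly references its own key keeps the entry alive indefinitely, which is why `Resolver` here holds only a string.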

Would identity work for your use cases, Kevin?  Do you have a table of schemas somewhere, or are you parsing them anew each time?

> Speed up one-time data decoding
> -------------------------------
>
>                 Key: AVRO-557
>                 URL: https://issues.apache.org/jira/browse/AVRO-557
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.3.2
>            Reporter: Kevin Oliver
>            Assignee: Kevin Oliver
>             Fix For: 1.4.0
>
>         Attachments: AVRO-557.patch
>
>
> There are big performance gains to be had when a BinaryDecoder and a GenericDatumReader are used just once. This is due to the relatively expensive parsing and initialization introduced in 1.3. Patch with example code and a Perf harness to follow.

-- 
This message is automatically generated by JIRA.