
Posted to dev@avro.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2013/09/10 18:45:32 UTC

[jira] [Commented] (AVRO-1332) Improve C# DatumReader performance

    [ https://issues.apache.org/jira/browse/AVRO-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763191#comment-13763191 ] 

Doug Cutting commented on AVRO-1332:
------------------------------------

The batch size of 1000 seems like the case we care most about.  At that size the patch is faster in all cases except when serializing a complex record.  Any idea why that case is now slower?  Also, could some of the resolution be cached, so that the smaller batch sizes stay more competitive too?
                
> Improve C# DatumReader performance
> ----------------------------------
>
>                 Key: AVRO-1332
>                 URL: https://issues.apache.org/jira/browse/AVRO-1332
>             Project: Avro
>          Issue Type: Improvement
>          Components: csharp
>    Affects Versions: 1.7.5
>            Reporter: David McIntosh
>            Priority: Minor
>              Labels: performance
>         Attachments: AVRO-1332-2.patch, AVRO-1332.patch
>
>
> The current implementations of the C# datum readers resolve the reader and writer schemas on every call to Read. In my tests this made them perform poorly when reading a large number of records (slower than parsing the same data from delimited text files). It would be more efficient if the reader only needed to resolve the schemas once.
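
The difference between resolving on every read and resolving once can be sketched as follows. This is a toy illustration in Python, not the Avro C# code or the attached patch: the "schema" is just a list of field names, and `resolve`, `PerCallReader`, and `CachedReader` are hypothetical names invented for this example.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy "schema": an ordered list of field names.
Schema = List[str]
Plan = List[Tuple[int, str]]


def resolve(writer: Schema, reader: Schema) -> Plan:
    """Toy stand-in for schema resolution: keep only the writer fields
    that the reader also declares, remembering their positions."""
    resolve.calls += 1  # count invocations, for illustration only
    wanted = set(reader)
    return [(i, name) for i, name in enumerate(writer) if name in wanted]


resolve.calls = 0


class PerCallReader:
    """Resolves the two schemas again on every read (the slow pattern)."""

    def __init__(self, writer: Schema, reader: Schema) -> None:
        self.writer, self.reader = writer, reader

    def read(self, row: Sequence) -> Dict[str, object]:
        plan = resolve(self.writer, self.reader)
        return {name: row[i] for i, name in plan}


class CachedReader:
    """Resolves once at construction and reuses the plan for every read."""

    def __init__(self, writer: Schema, reader: Schema) -> None:
        self.plan = resolve(writer, reader)

    def read(self, row: Sequence) -> Dict[str, object]:
        return {name: row[i] for i, name in self.plan}
```

With N records, the first reader runs resolution N times and the second runs it once. If a reader is constructed per batch, that one-time cost is spread over the batch, which may be one reason larger batch sizes benefit more; caching the resolved plan across reader instances would be one way to help smaller batches as well.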
