Posted to dev@avro.apache.org by "Raymie Stata (Commented) (JIRA)" <ji...@apache.org> on 2012/01/25 08:19:40 UTC

[jira] [Commented] (AVRO-1006) Fingerprints for Avro Schemas

    [ https://issues.apache.org/jira/browse/AVRO-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192909#comment-13192909 ] 

Raymie Stata commented on AVRO-1006:
------------------------------------

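An _Avro schema fingerprint_ is a 64-bit hash of an Avro schema.  Even within a collection of a million schemas, the probability that any two fingerprints collide is tiny: for an ideal 64-bit hash, the birthday approximation puts it at roughly 3 in 100 million (about 0.000003%).  Thus, a fingerprint can stand in for a schema wherever a compact identifier is needed.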
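
For anyone who wants to estimate the collision probability for their own collection size, here is a minimal sketch of the birthday approximation.  It assumes an ideal, uniformly distributed hash, which a CRC only approximates, and the class and method names are made up for illustration.

```java
// Sketch only: assumes an ideal, uniformly distributed hash (a CRC only approximates this).
final class CollisionEstimate {
    /** Birthday approximation: P(collision) ~= 1 - exp(-n(n-1) / (2 * 2^bits)). */
    static double collisionProbability(long n, int bits) {
        double pairs = (double) n * (n - 1) / 2.0;   // number of distinct schema pairs
        double space = Math.pow(2.0, bits);          // number of possible fingerprints
        return -Math.expm1(-pairs / space);          // 1 - e^(-x), accurate for tiny x
    }

    public static void main(String[] args) {
        // One million schemas, 64-bit fingerprints.
        System.out.println(collisionProbability(1_000_000L, 64));
    }
}
```

For one million schemas and 64 bits this evaluates to roughly 2.7e-8.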

One motivating use case for fingerprints is a pub/sub message bus.  On a pub/sub bus, since multiple writers can publish to the same topic using different schemas, each message must be associated with its schema.  Rather than include the actual schema with every message, one can instead include the fingerprint of the schema, which is much smaller (8 bytes for a 64-bit fingerprint).  When a reader encounters a fingerprint it hasn't seen before, it can look up the corresponding schema and cache it.  (The attached document describes possible lookup mechanisms.)
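
The reader-side behavior described above could be sketched roughly as follows.  This is only an illustration: `SchemaCache` and its `resolver` are hypothetical names, not Avro APIs, and the resolver stands in for whichever lookup mechanism is actually used.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

// Sketch only: SchemaCache and the resolver are hypothetical names, not Avro APIs.
final class SchemaCache {
    private final Map<Long, String> byFingerprint = new ConcurrentHashMap<>();
    private final LongFunction<String> resolver; // e.g. a call to some schema lookup service

    SchemaCache(LongFunction<String> resolver) {
        this.resolver = resolver;
    }

    /** Returns the schema text for a fingerprint, invoking the resolver only on a cache miss. */
    String schemaFor(long fingerprint) {
        return byFingerprint.computeIfAbsent(fingerprint, fp -> resolver.apply(fp));
    }

    public static void main(String[] args) {
        SchemaCache cache = new SchemaCache(fp -> "{\"type\":\"string\"}"); // stub lookup
        System.out.println(cache.schemaFor(42L)); // first call goes to the resolver
        System.out.println(cache.schemaFor(42L)); // second call is served from the cache
    }
}
```

Keying the map on the fingerprint means a reader pays the lookup cost once per distinct schema rather than once per message.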

The proposed approach to fingerprinting is straightforward.  First, we convert Avro schemas into a _canonical form_.  Two equivalent schemas always have the same canonical form.  Once we have the canonical form, we simply take a 64-bit "Rabin fingerprint" (a CRC) of that text.
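
To make the second step concrete, here is a minimal sketch of a table-driven 64-bit Rabin fingerprint over the UTF-8 bytes of a canonical form.  The `EMPTY` constant is the one later published in the Avro specification as CRC-64-AVRO; that it matches the attached proposal is an assumption.  The conversion to canonical form is not shown, and the string `"int"` below merely stands in for one.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: the constant is the one later published in the Avro specification
// (CRC-64-AVRO); that it matches the attached proposal is an assumption.
final class RabinFingerprint64 {
    static final long EMPTY = 0xc15d213aa4d7a795L; // fingerprint of the empty input
    private static final long[] TABLE = new long[256];

    static {
        // Precompute the effect of each possible byte value on the fingerprint.
        for (int i = 0; i < 256; i++) {
            long fp = i;
            for (int j = 0; j < 8; j++) {
                fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
            }
            TABLE[i] = fp;
        }
    }

    /** Fingerprints the UTF-8 bytes of a schema that is already in canonical form. */
    static long fingerprint64(String canonicalForm) {
        long fp = EMPTY;
        for (byte b : canonicalForm.getBytes(StandardCharsets.UTF_8)) {
            fp = (fp >>> 8) ^ TABLE[(int) (fp ^ b) & 0xff];
        }
        return fp;
    }

    public static void main(String[] args) {
        // Print the fingerprint as 16 hex digits.
        System.out.printf("%016x%n", fingerprint64("\"int\""));
    }
}
```

Because the fingerprint depends only on the canonical text, two equivalent schemas that differ in whitespace or field order in their original form would still map to the same 64-bit value once canonicalized.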

                
> Fingerprints for Avro Schemas
> -----------------------------
>
>                 Key: AVRO-1006
>                 URL: https://issues.apache.org/jira/browse/AVRO-1006
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Raymie Stata
>            Assignee: Raymie Stata
>              Labels: features
>         Attachments: schema-fingerprinting.html
>
>
> Add a function that returns a standardized, 64-bit fingerprint that can be used as a key in various contexts.
