Posted to dev@avro.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/07/16 11:32:04 UTC
[jira] [Commented] (AVRO-1704) Standardized format for encoding messages with Avro
[ https://issues.apache.org/jira/browse/AVRO-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629488#comment-14629488 ]
ASF GitHub Bot commented on AVRO-1704:
--------------------------------------
GitHub user dasch opened a pull request:
https://github.com/apache/avro/pull/43
AVRO-1704: Standardized format for encoding messages with Avro
This is a proof of concept implementation of [AVRO-1704](https://issues.apache.org/jira/browse/AVRO-1704).
- The fingerprint implementation is mocked out.
- Only 64-bit fingerprints are supported.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dasch/avro dasch/message-format
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/avro/pull/43.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #43
----
commit 5765e59879e2c70ec2095dd666105d26e0d592fc
Author: Daniel Schierbeck <da...@zendesk.com>
Date: 2015-07-16T09:05:38Z
Add the Avro::Message format
commit f1286548ebf0e2b8ef50d604251fcfbd70137b8b
Author: Daniel Schierbeck <da...@zendesk.com>
Date: 2015-07-16T09:28:03Z
Add SchemaStore
Currently it's using a mock fingerprint implementation and only stores
64-bit fingerprints.
----
> Standardized format for encoding messages with Avro
> ---------------------------------------------------
>
> Key: AVRO-1704
> URL: https://issues.apache.org/jira/browse/AVRO-1704
> Project: Avro
> Issue Type: Improvement
> Reporter: Daniel Schierbeck
>
> I'm currently using the Datafile format for encoding messages that are written to Kafka and Cassandra. This seems rather wasteful:
> 1. I only encode a single record at a time, so there's no need for sync markers and other metadata related to multi-record files.
> 2. The entire schema is inlined every time.
> However, the Datafile format is the only one that has been standardized, meaning that I can read and write data with minimal effort across the various languages in use in my organization. If there were a standardized format for encoding single values that was optimized for out-of-band schema transfer, I would much rather use that.
> I think the necessary pieces of the format would be:
> 1. A format version number.
> 2. A schema fingerprint type identifier, e.g. Rabin, MD5, SHA256, etc.
> 3. The actual schema fingerprint (according to the type).
> 4. Optional metadata map.
> 5. The encoded datum.
> The language libraries would implement a MessageWriter that would encode datums in this format, as well as a MessageReader that, given a SchemaStore, would be able to decode datums. The reader would decode the fingerprint and ask its SchemaStore to return the corresponding writer's schema.
> The idea is that SchemaStore would be an abstract interface that allowed library users to inject custom backends. A simple, file system based one could be provided out of the box.
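
The proposal above can be sketched in Ruby. This is only an illustration of the idea, not the format from the pull request: the byte layout (a 1-byte version, a 1-byte fingerprint type identifier, an 8-byte little-endian fingerprint, then the encoded datum) is an assumption, the fingerprint is mocked (truncated SHA-256 rather than a real Rabin implementation, mirroring the PR's mock), the optional metadata map is omitted, and `encoded_datum` stands in for the Avro binary encoding of the record.

```ruby
require "digest"

# Minimal in-memory SchemaStore: maps mocked 64-bit fingerprints to schemas.
# A real implementation could be backed by the file system, as the issue suggests.
class SchemaStore
  def initialize
    @schemas = {}
  end

  # Register a schema and return its (mocked) 64-bit fingerprint.
  def register(schema)
    fingerprint = Digest::SHA256.digest(schema)[0, 8].unpack1("Q<")
    @schemas[fingerprint] = schema
    fingerprint
  end

  # Look up the writer's schema for a fingerprint read off the wire.
  def fetch(fingerprint)
    @schemas.fetch(fingerprint)
  end
end

class MessageWriter
  VERSION = 1
  FINGERPRINT_TYPE_64 = 0x01 # only 64-bit fingerprints in this sketch

  def initialize(store)
    @store = store
  end

  # Prefix the already-encoded datum with version, fingerprint type,
  # and the 8-byte fingerprint.
  def write(schema, encoded_datum)
    fingerprint = @store.register(schema)
    [VERSION, FINGERPRINT_TYPE_64, fingerprint].pack("CCQ<") + encoded_datum
  end
end

class MessageReader
  HEADER_SIZE = 10 # 1 (version) + 1 (type id) + 8 (fingerprint)

  def initialize(store)
    @store = store
  end

  # Decode the header, resolve the writer's schema via the SchemaStore,
  # and return it alongside the raw encoded datum.
  def read(message)
    version, type_id, fingerprint = message[0, HEADER_SIZE].unpack("CCQ<")
    raise "unsupported format version #{version}" unless version == MessageWriter::VERSION
    raise "unsupported fingerprint type #{type_id}" unless type_id == MessageWriter::FINGERPRINT_TYPE_64
    writers_schema = @store.fetch(fingerprint)
    [writers_schema, message[HEADER_SIZE..]]
  end
end
```

A round trip under these assumptions: the writer registers the schema and frames the datum; the reader recovers the fingerprint from the header and asks the store for the corresponding writer's schema, which is exactly the injection point the issue describes.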
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)