Posted to commits@avro.apache.org by rs...@apache.org on 2023/03/01 19:03:05 UTC

[avro] branch master updated: docs: Change index.md to add a schema for data blocks (#2042)

This is an automated email from the ASF dual-hosted git repository.

rskraba pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/avro.git


The following commit(s) were added to refs/heads/master by this push:
     new 5b2c27956 docs: Change index.md to add a schema for data blocks (#2042)
5b2c27956 is described below

commit 5b2c27956b601c580af277614397b2e43e0ba9f9
Author: dpcollins-google <40...@users.noreply.github.com>
AuthorDate: Wed Mar 1 14:02:55 2023 -0500

    docs: Change index.md to add a schema for data blocks (#2042)
    
    Also make data file schemas valid json (no trailing commas)
---
 doc/content/en/docs/++version++/Specification/_index.md | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/doc/content/en/docs/++version++/Specification/_index.md b/doc/content/en/docs/++version++/Specification/_index.md
index c6716466d..df641e2db 100755
--- a/doc/content/en/docs/++version++/Specification/_index.md
+++ b/doc/content/en/docs/++version++/Specification/_index.md
@@ -472,7 +472,18 @@ A file data block consists of:
 * The serialized objects. If a codec is specified, this is compressed by that codec.
 * The file's 16-byte sync marker.
 
-Thus, each block's binary data can be efficiently extracted or skipped without deserializing the contents. The combination of block size, object counts, and sync markers enable detection of corrupt blocks and help ensure data integrity.
+A file data block is thus described by the following schema:
+```json
+{"type": "record", "name": "org.apache.avro.file.DataBlock",
+ "fields" : [
+   {"name": "count", "type": "long"},
+   {"name": "data", "type": "bytes"},
+   {"name": "sync", "type": {"type": "fixed", "name": "Sync", "size": 16}}
+  ]
+}
+```
+
+Each block's binary data can be efficiently extracted or skipped without deserializing the contents. The combination of block size, object counts, and sync markers enable detection of corrupt blocks and help ensure data integrity.
 
 ### Required Codecs
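
As an illustration of the DataBlock schema added in this diff, here is a minimal Python sketch of pulling one block off a stream without deserializing the objects inside it. It assumes Avro's standard binary encoding: a `long` is a zig-zag variable-length integer, and `bytes` is a `long` length followed by that many bytes. The names `read_long`, `read_block` and the `DataBlock` tuple are hypothetical helpers written for this note, not part of any Avro library, and the sketch does not handle codecs or the file header.

```python
import io
from typing import BinaryIO, NamedTuple

SYNC_SIZE = 16  # size of the "Sync" fixed type in the schema


class DataBlock(NamedTuple):
    """Hypothetical in-memory mirror of org.apache.avro.file.DataBlock."""
    count: int   # number of objects in the block
    data: bytes  # serialized (possibly compressed) objects, left undecoded
    sync: bytes  # 16-byte sync marker


def read_long(stream: BinaryIO) -> int:
    """Decode one Avro long (variable-length, zig-zag encoded)."""
    accum = 0
    shift = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("stream ended inside a long")
        byte = raw[0]
        accum |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)  # undo zig-zag


def read_block(stream: BinaryIO) -> DataBlock:
    """Read count, length-prefixed data, and sync marker for one block."""
    count = read_long(stream)
    size = read_long(stream)  # length prefix of the "bytes" field
    if size < 0:
        raise ValueError("negative block size")
    data = stream.read(size)
    sync = stream.read(SYNC_SIZE)
    if len(data) != size or len(sync) != SYNC_SIZE:
        raise EOFError("stream ended inside a data block")
    return DataBlock(count, data, sync)


if __name__ == "__main__":
    # Two objects (the longs 1 and 2) in a 2-byte payload, then a sync marker.
    marker = bytes(range(SYNC_SIZE))
    raw = b"\x04\x04\x02\x04" + marker
    block = read_block(io.BytesIO(raw))
    print(block.count, len(block.data), block.sync == marker)
```

Because the payload length is read before the payload itself, a reader that only wants to skip a block could seek forward by `size + SYNC_SIZE` bytes instead of calling `stream.read`. Comparing the returned `sync` against the marker from the file header is one way a reader might detect a corrupt block.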