Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2019/02/27 12:06:45 UTC

[GitHub] Fokko commented on a change in pull request #7508: [hotfix][docs] Add example for the BulkFormat of the StreamingFileSink

URL: https://github.com/apache/flink/pull/7508#discussion_r260719292
 
 

 ##########
 File path: docs/dev/connectors/streamfile_sink.md
 ##########
 @@ -110,12 +110,45 @@ interactions of bucket assigners and rolling policies.
 
 In the above example we used an `Encoder` that can encode or serialize each
 record individually. The streaming file sink also supports bulk-encoded output
-formats such as [Apache Parquet](http://parquet.apache.org). To use these,
-instead of `StreamingFileSink.forRowFormat()` you would use
+formats such as [Apache Parquet](http://parquet.apache.org) or [Apache Avro](https://avro.apache.org/).
 
 Review comment:
   Avro is row-encoded, but when it is written to a file the records are still stored in blocks. From the spec (https://avro.apache.org/docs/1.8.2/spec.html#Object+Container+Files):
   
   Avro includes a simple object container file format. A file has a schema, and all objects stored in the file must be written according to that schema, using binary encoding. Objects are stored in blocks that may be compressed. 
   A file data block consists of:
   
   - A long indicating the count of objects in this block.
   - A long indicating the size in bytes of the serialized objects in the current block, after any codec is applied.
   - The serialized objects. If a codec is specified, this is compressed by that codec.
   - The file's 16-byte sync marker.
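
The block layout listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Avro library's implementation: the helper names (`encode_long`, `decode_long`, `write_block`, `read_block`) are hypothetical, no codec is applied, and the file header (magic bytes, metadata, schema) is left out. Longs use Avro's zigzag variable-length encoding.

```python
def encode_long(n: int) -> bytes:
    """Encode an int as an Avro long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps small negatives to small positives
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(buf: bytes, pos: int = 0):
    """Decode an Avro long starting at pos; return (value, next position)."""
    z = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos  # undo zigzag


def write_block(serialized_objects, sync_marker: bytes) -> bytes:
    """Build one file data block from already-serialized objects (no codec)."""
    if len(sync_marker) != 16:
        raise ValueError("sync marker must be 16 bytes")
    payload = b"".join(serialized_objects)
    return (
        encode_long(len(serialized_objects))  # count of objects in this block
        + encode_long(len(payload))           # size in bytes of the payload
        + payload                             # the serialized objects
        + sync_marker                         # the file's 16-byte sync marker
    )


def read_block(buf: bytes, pos: int = 0):
    """Parse one block; return (count, payload, sync marker, next position)."""
    count, pos = decode_long(buf, pos)
    size, pos = decode_long(buf, pos)
    payload = buf[pos:pos + size]
    pos += size
    sync = buf[pos:pos + 16]
    return count, payload, sync, pos + 16


# Usage: two pre-serialized records written as one block, then read back.
SYNC = bytes(range(16))  # placeholder marker; a real writer picks it randomly
block = write_block([b"\x06foo", b"\x06bar"], SYNC)
count, payload, sync, end = read_block(block)
```

Even though each record is encoded on its own, a reader can only locate records by walking block by block, which is why a writer has to buffer records and emit them in batches.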
