Posted to dev@avro.apache.org by "Hari Shreedharan (JIRA)" <ji...@apache.org> on 2013/11/05 02:02:17 UTC
[jira] [Created] (AVRO-1393) SyncInterval logic always causes blocks to be larger than the sync interval
Hari Shreedharan created AVRO-1393:
--------------------------------------
Summary: SyncInterval logic always causes blocks to be larger than the sync interval
Key: AVRO-1393
URL: https://issues.apache.org/jira/browse/AVRO-1393
Project: Avro
Issue Type: Bug
Reporter: Hari Shreedharan
If the sync interval of a container file is set to exactly the block size, the data written between two sync markers will be slightly larger than the block, because we check the size only after writing a datum to the stream. This means that the sync interval is effectively the minimum distance between sync markers.
Because we cannot predict the serialized size of a datum, we cannot know in advance how far the data will overflow the block. Whatever the amount, the overflow may be more expensive than expected, especially on systems like HDFS.
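
The behaviour described above can be illustrated with a small simulation. This is only a simplified model of a "write first, check afterwards" policy, not the actual Avro writer code. The function name simulate_blocks, the datum sizes, and the use of >= in the size check are all assumptions made for illustration.

```python
def simulate_blocks(datum_sizes, sync_interval):
    """Return the size in bytes of each block under a check-after-write policy.

    Hypothetical model: each datum is appended to the buffer first, and only
    then is the buffered size compared with the sync interval.
    """
    blocks = []
    buffered = 0
    for size in datum_sizes:
        buffered += size                # the datum is written before any check
        if buffered >= sync_interval:   # assumed comparison; runs after the write
            blocks.append(buffered)     # flush the block, then emit a sync marker
            buffered = 0
    if buffered:
        blocks.append(buffered)         # trailing partial block, flushed on close
    return blocks


# Seven 40-byte datums with a 100-byte sync interval.
blocks = simulate_blocks([40] * 7, sync_interval=100)
print(blocks)  # [120, 120, 40]
```

In this model every block except the last is at least as large as the sync interval, and the overflow (20 bytes here) depends entirely on the size of the datum that happens to cross the threshold.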
Fixing this is difficult without breaking a number of interfaces, so I am opening this JIRA for discussion with people who know the code better.