Posted to issues@kudu.apache.org by "Mike Percy (JIRA)" <ji...@apache.org> on 2018/06/18 23:23:00 UTC

[jira] [Commented] (KUDU-2260) Log block manager should handle null bytes in metadata on crash

    [ https://issues.apache.org/jira/browse/KUDU-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516456#comment-16516456 ] 

Mike Percy commented on KUDU-2260:
----------------------------------

[~wdberkeley] looked into this a bit today after this appeared again in the wild and found [this thread|https://plus.google.com/+KentonVarda/posts/JDwHfAiLGNQ] where Ted Ts'o discusses this situation and notes that ext4 may persist the new file size before the data itself makes it to disk. The one guarantee you get is that, when that happens, you will read NULL bytes at the end of the file instead of garbage data. So it seems we should look for trailing NULL bytes after the last complete record in these files and ignore them when opening log block containers.
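A minimal C++ sketch of that idea follows. It assumes a made-up record framing (a 4-byte little-endian payload length followed by the payload) rather than Kudu's real checksummed format, and `ScanRecords`, `ScanOutcome`, and `ReadLE32` are hypothetical names, not Kudu APIs. When a record fails to parse, the scanner ignores the rest of the file only if every remaining byte is NULL; otherwise it reports corruption. Checking only the region after the last complete record, rather than stripping every trailing zero byte, avoids discarding a valid record whose final bytes happen to be zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical scan outcome; not a Kudu type.
enum class ScanResult { kClean, kTruncateNullTail, kCorruption };

struct ScanOutcome {
  ScanResult result;
  std::size_t valid_length;  // bytes covered by complete records
};

// Reads a 4-byte little-endian length at `pos`; the caller ensures 4 bytes exist.
std::uint32_t ReadLE32(const std::string& data, std::size_t pos) {
  assert(pos + 4 <= data.size());
  std::uint32_t value = 0;
  for (int i = 0; i < 4; ++i) {
    value |= static_cast<std::uint32_t>(
                 static_cast<unsigned char>(data[pos + i])) << (8 * i);
  }
  return value;
}

// Assumed framing (illustration only): [uint32 LE payload length][payload].
// A zero length is treated as invalid, so a NULL-filled tail never parses.
ScanOutcome ScanRecords(const std::string& data) {
  std::size_t pos = 0;
  while (pos < data.size()) {
    std::size_t remaining = data.size() - pos;
    if (remaining >= 4) {
      std::uint32_t len = ReadLE32(data, pos);
      if (len > 0 && remaining - 4 >= len) {
        pos += 4 + static_cast<std::size_t>(len);  // complete record
        continue;
      }
    }
    // Parsing failed at `pos`: ignore the tail only if it is all NULL bytes.
    for (std::size_t i = pos; i < data.size(); ++i) {
      if (data[i] != '\0') return {ScanResult::kCorruption, pos};
    }
    return {ScanResult::kTruncateNullTail, pos};
  }
  return {ScanResult::kClean, pos};
}
```

Treating a zero length as invalid keeps a NULL-filled tail from parsing as an empty record. In a checksummed format, a checksum field read from NULL bytes decodes to 0 and fails to match, which is consistent with the `Expected: 0` in the error quoted in the issue.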

One thing that wasn't clear from my reading of that thread is whether writes need to be sector-aligned to avoid torn writes, or whether the filesystem always avoids crossing a sector boundary for a single write shorter than one sector.
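The geometry half of that question is plain arithmetic: a write shorter than one sector can still span two sectors unless it is aligned. For example, with 512-byte sectors, a 20-byte write at offset 500 covers bytes 500 through 519 and so touches sectors 0 and 1. The sketch below checks that condition; `CrossesSectorBoundary` is a hypothetical name, and the sector size is a parameter because it varies by device (commonly 512 or 4096 bytes). It says nothing about what atomicity ext4 or the disk actually guarantees.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if a write of `len` bytes starting at byte `offset` touches
// more than one sector of `sector_size` bytes (sector size is an assumption
// supplied by the caller).
bool CrossesSectorBoundary(std::uint64_t offset, std::uint64_t len,
                           std::uint64_t sector_size) {
  assert(sector_size > 0);
  if (len == 0) return false;  // an empty write touches no sector
  // Compare the sector holding the first byte with the one holding the last.
  return offset / sector_size != (offset + len - 1) / sector_size;
}
```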

> Log block manager should handle null bytes in metadata on crash
> ---------------------------------------------------------------
>
>                 Key: KUDU-2260
>                 URL: https://issues.apache.org/jira/browse/KUDU-2260
>             Project: Kudu
>          Issue Type: Bug
>          Components: fs
>            Reporter: Mike Percy
>            Priority: Major
>
> The log block manager may currently leave null bytes at the end of the metadata log file if the system crashes in the middle of a write. On startup, the log block manager should detect null bytes at the end of a metadata entry and then either truncate the entry or close the container.
> Currently, it prints an error along the following lines:
> {code}
> F0111 09:30:27.327011 28843 tablet_server_main.cc:64] Check failed: _s.ok() Bad status: Corruption: Failed to load FS layout: Could not read records from container /data/3/kudu/data/f70391c7c6084e08bbae7448518e0b5e: Data length checksum does not match: Incorrect checksum in file /data/3/kudu/data/f70391c7c6084e08bbae7448518e0b5e.metadata at offset 372533: Checksum does not match. Expected: 0. Actual: 1323915147
> {code}
> At the time of writing, the workaround for this issue is to truncate the affected file at the start of the incomplete entry. This may leave orphaned blocks, but it should be safe: if the metadata entry was never successfully written, it should not have been considered durable either.
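The truncation workaround can be sketched with POSIX `stat(2)` and `truncate(2)`, assuming the offset at which the incomplete entry starts is already known. `TruncateAtIncompleteEntry` is a hypothetical helper, not a Kudu tool. It refuses offsets past end-of-file because `truncate(2)` extends a shorter file with zero bytes instead of failing, which would append exactly the kind of NULL tail this issue is about.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not a Kudu tool): cuts the metadata file at
// `entry_offset`, the start of the incomplete entry, dropping the partial
// entry and everything after it.
bool TruncateAtIncompleteEntry(const std::string& path, off_t entry_offset) {
  assert(entry_offset >= 0);
  struct stat st;
  if (stat(path.c_str(), &st) != 0) {
    std::perror("stat");
    return false;
  }
  // truncate(2) would grow a shorter file with zero bytes, so refuse offsets
  // past the current end of file.
  if (entry_offset > st.st_size) {
    std::fprintf(stderr, "offset %lld is past EOF (%lld bytes)\n",
                 static_cast<long long>(entry_offset),
                 static_cast<long long>(st.st_size));
    return false;
  }
  if (truncate(path.c_str(), entry_offset) != 0) {
    std::perror("truncate");
    return false;
  }
  return true;
}
```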



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)