Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2016/06/30 18:24:10 UTC
[jira] [Commented] (KUDU-1508) Log block manager triggers ext4 hole punching bug in el6

    [ https://issues.apache.org/jira/browse/KUDU-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15357612#comment-15357612 ] 

Todd Lipcon commented on KUDU-1508:
-----------------------------------

To summarize the bug:
- an ext4 file is made up of a set of extents
- the extents are stored in a b-tree with 4KB "pages". Apparently after accounting for headers, etc, the root page can hold 340 extent pointers.
- If you have more than 340 extents in a file, then the root page ends up holding 340 pointers to other interior nodes, each of which has 340 extent pointers (just like you'd expect with a b-tree). A good reference is https://digital-forensics.sans.org/blog/2011/03/28/digital-forensics-understanding-ext4-part-3-extent-trees
- In our case of the log block manager, we can end up with a lot of extents in a file due to hole punching. Imagine a 1GB container file with 1000x1MB blocks. If every odd block is deleted, we'd need 500 extents after we've hole-punched the deleted blocks.
- This would normally be fine, except that the referenced bug means that ext4 forgot to update the interior node pointers, which causes an inconsistency
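
As a rough check of the numbers above, here is a minimal sketch of the extent arithmetic. It assumes a 4KB extent-tree node with a 12-byte header and 12-byte entries (which is where a figure of 340 would come from), and it assumes the best case in which every contiguous run of live blocks maps to exactly one extent. The function names are made up for illustration and are not part of Kudu or ext4.

```python
# Sketch only: models the container as a list of live/deleted flags.
# Assumptions: 4 KiB extent-tree nodes, 12-byte node header, 12-byte entries,
# and one extent per contiguous run of live blocks (no extra fragmentation).

NODE_SIZE = 4096
HEADER_SIZE = 12
ENTRY_SIZE = 12


def entries_per_node(node_size=NODE_SIZE):
    """Number of extent (or index) entries that fit in one tree node."""
    return (node_size - HEADER_SIZE) // ENTRY_SIZE


def count_extents(live):
    """Count runs of consecutive live blocks; each run needs one extent."""
    runs = 0
    prev = False
    for is_live in live:
        if is_live and not prev:
            runs += 1
        prev = is_live
    return runs


# 1000 x 1MB blocks, with every odd-numbered block deleted and hole-punched.
container = [i % 2 == 0 for i in range(1000)]
extents = count_extents(container)
capacity = entries_per_node()

# True when the extents no longer fit in a single 340-entry node, which is
# the situation in which the inconsistency can show up.
needs_second_level = extents > capacity
```

Under these assumptions the alternating-deletion pattern needs 500 extents, which is more than one 340-entry node can hold.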

It seems that 'fsck' is fine at fixing the inconsistency, and we haven't seen any runtime issues due to this bug. It may be entirely harmless. That said, it's problematic because when systems reboot they sometimes run fsck and may need manual intervention to tell fsck to fix the issue.

I looked through the kernel changelog and unfortunately this isn't fixed in any version of el6. It is, however, fixed in el7 and probably any Ubuntu from the last several years (it was fixed upstream in Dec 2012).

So, it seems we have a few choices here regarding this issue:

a) *Do nothing* - if indeed the problem is a 'harmless' ext4 corruption fixable by fsck, then we can just document this as an el6 issue, ask Red Hat to backport this patch into the next maintenance kernel, and let users know that they may have to look out for this particular error if fsck runs.
b) *Try to avoid multi-level extent trees* - if we limit the number of blocks per container to a smaller number (say 300) then it's quite unlikely that we'd hit this issue. It's not a sure thing (the system could have arbitrary amounts of fragmentation) but it is easy to implement and probably would make the issue rare enough to not be a problem.
c) *Recommend xfs on el6* - XFS has performed better in most of the tests I've run, and also does not exhibit this bug. However, it's a lot to ask of new users who are installing Kudu on existing clusters that are running ext4.
d) *Avoid hole punching* - we could spend the time to build a block manager implementation that doesn't rely on hole punching. This is likely a lot of work.
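
To put a number on option (b), here is a hypothetical sizing check. It takes the 340-entry figure from the summary as given and assumes the worst deletion pattern is the alternating one, with each live block otherwise stored as a single extent. The helper names are invented for this sketch; real fragmentation can add more extents per block, which is why the limit is not a guarantee.

```python
# Sketch only: how much room a per-container block limit leaves before the
# extent count exceeds a single 340-entry node. The alternating live/deleted
# pattern is assumed to be the worst case for a fully written container.

def worst_case_extents(blocks_per_container):
    """Most live runs possible: every other block deleted."""
    return (blocks_per_container + 1) // 2


def fragmentation_headroom(blocks_per_container, capacity=340):
    """Average extents each live block may use before exceeding capacity."""
    return capacity / worst_case_extents(blocks_per_container)


limit_300 = worst_case_extents(300)           # 150 extents
headroom_300 = fragmentation_headroom(300)    # a little over 2 per live block
headroom_1000 = fragmentation_headroom(1000)  # below 1: already over capacity
```

With a limit of 300 blocks, each surviving block could be split into about two extents on average before the file would spill past 340 entries. With 1000 blocks the alternating pattern alone already exceeds it.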

> Log block manager triggers ext4 hole punching bug in el6
> --------------------------------------------------------
>
>                 Key: KUDU-1508
>                 URL: https://issues.apache.org/jira/browse/KUDU-1508
>             Project: Kudu
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.9.0
>            Reporter: Todd Lipcon
>            Priority: Blocker
>
> I've experienced many times that when I reboot an el6 node that was running Kudu tservers, fsck reports issues like:
> data6 contains a file system with errors, check forced.
> data6: Interior extent node level 0 of inode 5259348:
> Logical start 154699 does not match logical start 2623046 at next level.  
> After some investigation, I've determined that this is due to an ext4 kernel bug: https://patchwork.ozlabs.org/patch/206123/
> Details in a comment to follow.


