Posted to issues@kudu.apache.org by "Adar Dembo (JIRA)" <ji...@apache.org> on 2017/03/22 02:08:41 UTC

[jira] [Updated] (KUDU-1508) Log block manager triggers ext4 hole punching bug in el6

     [ https://issues.apache.org/jira/browse/KUDU-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adar Dembo updated KUDU-1508:
-----------------------------
    Attachment: replay_container.py
                pbc_dump.txt
                filefrag.txt
                debugfs.txt

First some stats:
* Six-node cluster running el6.6.
* Each node has twelve 2 TB drives formatted as ext4 with a 2048-byte block size.
* The first node runs the Kudu master, while the rest run tservers.
* All 12 drives in the Kudu master's node were clean. I'll skip them for the remainder of the analysis.
* Each of the remaining five nodes has ~120,000 containers, the vast majority of which are full.
* In total, three machines have two corrupt inodes each, one machine has three corrupt inodes, and one has 12 corrupt inodes.

I focused on one of the inodes on the machine with 12 corrupt inodes. It is indeed a container data file. Per the investigation behind commit 4923a74, the container had been capped at 1353 blocks (full 'kudu pbc dump' output attached). Of those, 1078 were deleted, leaving 275 live blocks. filefrag shows that the file has 214 extents (full output attached), and debugfs (full output attached) shows one level 0 interior node and four level 1 interior nodes supporting that extent tree.

I wrote a script (attached) to replay the container as the LBM would have written it. The script parses an LBM container metadata file via dump_all_blocks.py (see KUDU-1856) and adheres to LBM semantics in many ways during replay, including preallocation, hole punching (with proper alignment), and fdatasync. It doesn't have access to the original data, so it just writes out strings of zeroes instead.
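
For reference, here is a minimal sketch of those replay semantics. This is an illustration, not the attached replay_container.py: the inward rounding of punched holes and the per-write preallocation are simplifying assumptions (the real LBM preallocates in larger chunks).

    import ctypes, ctypes.util, os

    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_longlong, ctypes.c_longlong]

    FALLOC_FL_KEEP_SIZE  = 0x01  # from linux/falloc.h
    FALLOC_FL_PUNCH_HOLE = 0x02
    FS_BLOCK_SIZE = 2048  # ext4 block size on the affected drives

    def punch_hole(fd, offset, length):
        # Round the hole inward to filesystem block boundaries so only
        # whole blocks are deallocated and no neighboring live data is
        # zeroed ("proper alignment" above).
        start = -(-offset // FS_BLOCK_SIZE) * FS_BLOCK_SIZE
        end = (offset + length) // FS_BLOCK_SIZE * FS_BLOCK_SIZE
        if end <= start:
            return
        if libc.fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                          start, end - start) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))

    def append_block(fd, offset, size):
        # Preallocate ahead of the write, write placeholder zeroes in
        # place of the original (unavailable) block data, then fdatasync.
        os.posix_fallocate(fd, offset, size)
        os.pwrite(fd, b"\0" * size, offset)
        os.fdatasync(fd)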

I created a 16G loopback-mounted ext4 filesystem with a 2048-byte block size and replayed this container in it. After unmounting and fsck'ing it, I couldn't reproduce the corruption.
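
Roughly, the reproduction setup looked like the following sketch (driven from Python here for illustration; the image and mount paths are hypothetical, and the commands need root):

    import os, subprocess

    IMG, MNT = "/tmp/ext4-2k.img", "/mnt/repro"  # hypothetical paths

    subprocess.run(["truncate", "-s", "16G", IMG], check=True)
    # -b 2048 matches the block size on the affected drives; -F is
    # needed because the target is a regular file, not a device.
    subprocess.run(["mke2fs", "-t", "ext4", "-b", "2048", "-F", IMG],
                   check=True)
    os.makedirs(MNT, exist_ok=True)
    subprocess.run(["mount", "-o", "loop", IMG, MNT], check=True)
    # ... replay_container.py recreates the container under MNT ...
    subprocess.run(["umount", MNT], check=True)
    # -f forces a full check even though the filesystem is marked clean.
    subprocess.run(["e2fsck", "-f", IMG], check=True)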

In terms of next steps, I could investigate some of the other corrupt containers to see whether they differ from the one I arbitrarily chose. We could also decide that the occurrences are so few relative to the total number of blocks written that the issue doesn't warrant our attention. Or we could take more drastic action. Please chime in if you have thoughts.


> Log block manager triggers ext4 hole punching bug in el6
> --------------------------------------------------------
>
>                 Key: KUDU-1508
>                 URL: https://issues.apache.org/jira/browse/KUDU-1508
>             Project: Kudu
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.9.0
>            Reporter: Todd Lipcon
>            Assignee: Adar Dembo
>            Priority: Blocker
>             Fix For: 1.2.0
>
>         Attachments: debugfs.txt, e9f83e4acef3405f99d01914317351ce.metadata, filefrag.txt, pbc_dump.txt, replay_container.py
>
>
> I've experienced many times that when I reboot an el6 node that was running Kudu tservers, fsck reports issues like:
> data6 contains a file system with errors, check forced.
> data6: Interior extent node level 0 of inode 5259348:
> Logical start 154699 does not match logical start 2623046 at next level.  
> After some investigation, I've determined that this is due to an ext4 kernel bug: https://patchwork.ozlabs.org/patch/206123/
> Details in a comment to follow.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)