Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2018/03/20 06:02:00 UTC

[jira] [Commented] (KUDU-1989) kudu-tserver met checksum mismatch after node crash and restart.

    [ https://issues.apache.org/jira/browse/KUDU-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405847#comment-16405847 ] 

Todd Lipcon commented on KUDU-1989:
-----------------------------------

Saw this issue again on a cluster. Here is the log from reading the metadata file, followed by a hex dump of the file's tail:

I0319 22:56:08.389683 71257 pb_util.cc:264] Reading PB with version 2 starting at offset 1955
33	block_id { id: 14957875 } op_type: DELETE timestamp_us: 1514256262030601
I0319 22:56:08.389689 71257 pb_util.cc:264] Reading PB with version 2 starting at offset 1989
Corruption: Data length checksum does not match: Incorrect checksum in file /data/4/kudu/data/a9264be259c44604a82726cdb04b9e09.metadata at offset 1993: Checksum does not match. Expected: 0. Actual: 1214729159

00007a0: fc42 a216 0000 0088 e846 650a 0909 333d  .B.......Fe...3=
00007b0: e400 0000 0000 1002 1889 cae3 94d4 a6d8  ................
00007c0: 02b4 cdc4 bf00 0000 0000 0000 0000 0000  ................
00007d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00007e0: 0000 0000 0000 00                        .......
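The "Expected: 0. Actual: 1214729159" pair can be reproduced from the dumped bytes. The sketch below is a minimal illustration and makes two assumptions. First, a version 2 record starts with a 4-byte little-endian data length. Second, that length is followed by a 4-byte CRC32C of those length bytes. Both assumptions are inferred from the "Data length checksum" message and the offsets 1989 and 1993 in the log, not taken from Kudu's source. The helper name `check_length_header` is hypothetical.

At offset 1989 the file contains only zeros. Under these assumptions, the stored length and the stored checksum both read as 0. The CRC32C of four zero bytes, however, is 1214729159, which is the "Actual" value in the log.

```python
import struct

# Reflected CRC-32C (Castagnoli) polynomial.
_CRC32C_POLY = 0x82F63B78


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C: initial value 0xFFFFFFFF, reflected, final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ _CRC32C_POLY
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def check_length_header(buf: bytes, offset: int):
    """Hypothetical helper: read an (assumed) record header at `offset`.

    Assumed layout: <data length: uint32 LE> <CRC32C of those 4 bytes: uint32 LE>.
    Returns (length, expected, actual); expected != actual means a checksum mismatch.
    """
    length_bytes = buf[offset:offset + 4]
    (length,) = struct.unpack("<I", length_bytes)
    (expected,) = struct.unpack("<I", buf[offset + 4:offset + 8])
    actual = crc32c(length_bytes)
    return length, expected, actual


# The 16 bytes at file offset 0x7c0 from the dump above. The record being read
# starts at offset 1989 (0x7c5), which is index 5 within this row.
row = bytes.fromhex("02b4cdc4bf0000000000000000000000")
print(check_length_header(row, 1989 - 0x7c0))  # -> (0, 0, 1214729159)
```

A table-driven CRC would be faster. The bitwise form is used here only because it is short and easy to check against the standard test vector.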

The zero bytes ('0000') start exactly at the protobuf record boundary (offset 1989 = 0x7c5) and continue to the end of the file. The mtime on this file is 2017-12-25 18:44:35.786898579, and 'last' shows the machine rebooted around that time:
reboot   system boot  2.6.32-573.26.1. Mon Dec 25 18:49 - 12:54 (2+18:05)   
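The pattern described above has two parts: valid records up to a boundary, then only zeros until EOF. A reader could report that pattern separately from a checksum mismatch in the middle of the file. The sketch below shows that distinction. The helper `classify_failure` and its labels are hypothetical and are not part of Kudu.

```python
def classify_failure(buf: bytes, offset: int) -> str:
    """Hypothetical helper: classify a record that failed its checksum at `offset`.

    Returns "zero-filled tail" if every byte from `offset` to the end of the
    buffer is zero (the pattern in the dump above), otherwise "corruption".
    """
    tail = buf[offset:]
    if tail and tail.count(0) == len(tail):
        return "zero-filled tail"
    return "corruption"


# From the dump: 5 data bytes at 0x7c0..0x7c4, then zeros up to EOF at 0x7e7.
fragment = bytes.fromhex("02b4cdc4bf") + bytes(0x7e7 - 0x7c5)
print(classify_failure(fragment, 0x7c5 - 0x7c0))  # -> zero-filled tail
```

This check only says where the bad data begins and what it looks like. It does not establish why the zeros were written.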


> kudu-tserver met checksum mismatch after node crash and restart.
> ----------------------------------------------------------------
>
>                 Key: KUDU-1989
>                 URL: https://issues.apache.org/jira/browse/KUDU-1989
>             Project: Kudu
>          Issue Type: Bug
>          Components: fs
>            Reporter: zhangsong
>            Priority: Major
>
> kudu-tserver version: 1.0.0
> 1. First, the node crashed.
> 2. When trying to restart kudu-tserver, found that it could not be restarted successfully.
> 3. Log content in kudu-tserver.FATAL:
> "
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> F0421 16:01:09.283123 20127 tablet_server_main.cc:55] Check failed: _s.ok() Bad status: Corruption: Failed to load FS layout: Could not read records from container /export/servers/kudu/1.0-sp/tserver_data/data/a22af504ca16421aad511b14c51130a9: Data length checksum does not match: Incorrect checksum in file /export/servers/kudu/1.0-sp/tserver_data/data/a22af504ca16421aad511b14c51130a9.metadata at offset 753661: Checksum does not match. Expected: 843507848. Actual: 1699145864
> "
> Not sure whether this has already been reported, so creating it here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)