Posted to notifications@accumulo.apache.org by "John Vines (JIRA)" <ji...@apache.org> on 2013/05/04 01:18:22 UTC

[jira] [Resolved] (ACCUMULO-1374) Sudden Death of master, gc, and tservers

     [ https://issues.apache.org/jira/browse/ACCUMULO-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Vines resolved ACCUMULO-1374.
----------------------------------

    Resolution: Invalid
      Assignee:     (was: Eric Newton)

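PEBCAK. The processes were killed by the kernel OOM killer, which is why nothing showed up in the Accumulo logs or the out/err files: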

{code}
grep -i kill /var/log/syslog | tail
May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.570901] Out of memory: Kill process 2318 (java) score 480 or sacrifice child
May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.570931] Killed process 2318 (java) total-vm:5369512kB, anon-rss:3655040kB, file-rss:0kB
May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.676155] java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.676196]  [<ffffffff81119745>] oom_kill_process+0x85/0xb0
May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.698754] Out of memory: Kill process 1342 (java) score 169 or sacrifice child
May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.698776] Killed process 1342 (java) total-vm:3176364kB, anon-rss:1287772kB, file-rss:0kB
May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.735364] java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.735403]  [<ffffffff81119745>] oom_kill_process+0x85/0xb0
May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.758067] Out of memory: Kill process 1512 (java) score 60 or sacrifice child
May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.758093] Killed process 1512 (java) total-vm:2531416kB, anon-rss:461072kB, file-rss:0kB
{code}
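
For anyone who hits the same symptom, here is a minimal sketch of pulling the killed PIDs and their resident memory out of syslog output like the above. The regex, the function name parse_oom_kills, and the dict keys are assumptions made up for illustration; none of this is part of Accumulo, and the line format is only what this particular kernel printed.

```python
import re

# Matches kernel lines of the form seen in the syslog excerpt above, e.g.
#   "Killed process 2318 (java) total-vm:5369512kB, anon-rss:3655040kB, file-rss:0kB"
# Other kernel versions may word this line differently (assumption).
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kills(lines):
    """Return one dict per 'Killed process' line found in lines."""
    kills = []
    for line in lines:
        m = KILLED_RE.search(line)
        if m:
            kills.append({
                "pid": int(m.group("pid")),
                "name": m.group("name"),
                "total_vm_kb": int(m.group("total_vm")),
                "anon_rss_kb": int(m.group("anon_rss")),
            })
    return kills


# The three "Killed process" lines from the excerpt above, plus one
# non-matching line to show that other kernel messages are skipped.
SAMPLE = [
    "May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.570901] Out of memory: Kill process 2318 (java) score 480 or sacrifice child",
    "May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.570931] Killed process 2318 (java) total-vm:5369512kB, anon-rss:3655040kB, file-rss:0kB",
    "May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.698776] Killed process 1342 (java) total-vm:3176364kB, anon-rss:1287772kB, file-rss:0kB",
    "May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.758093] Killed process 1512 (java) total-vm:2531416kB, anon-rss:461072kB, file-rss:0kB",
]

if __name__ == "__main__":
    for kill in parse_oom_kills(SAMPLE):
        rss_mib = kill["anon_rss_kb"] / 1024
        print(f"pid {kill['pid']} ({kill['name']}): anon-rss {rss_mib:.0f} MiB")
```

To run it against a real log, pass an open file handle (for example the syslog path used in the grep above) instead of SAMPLE. Matching the reported PIDs against the PIDs the gc, master, and tserver were started with is left to the reader.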
                
> Sudden Death of master, gc, and tservers
> ----------------------------------------
>
>                 Key: ACCUMULO-1374
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1374
>             Project: Accumulo
>          Issue Type: Bug
>          Components: gc, master, tserver
>         Environment: 1.5, svn#1470047 & 1477382 - both on a standalone instance on EC2 running Ubuntu and on a small cluster on bare-metal CentOS
>            Reporter: John Vines
>            Priority: Blocker
>             Fix For: 1.5.0
>
>
> I wish I could provide more information. This has happened once on a bare-metal CentOS cluster while running vanilla continuous ingest of svn#1470047. Nothing was reported in the logs when one of the tservers just died after the system had been up for ~1 day. The out and err files were sparse, and the master only reported that it had lost its connection to the tserver at the point when the tserver stopped logging (it was overnight, so this was not noticed until morning).
> It recently happened again on a standalone instance on EC2 running Ubuntu and svn#1477382. The instance had been running for ~7 hours. This time the gc, master, and tserver all died. The gc died first, the master died 2m48s later, and the tserver died 200ms after that. Again, there was no output in any of the out or err files for the processes. The logs also contain no errors or warnings, just abrupt stops. The processes came up fine once restarted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira