Posted to dev@accumulo.apache.org by "Keith Turner (Created) (JIRA)" <ji...@apache.org> on 2012/03/06 23:15:57 UTC
[jira] [Created] (ACCUMULO-449) Failed log copy is not restarted
Failed log copy is not restarted
--------------------------------
Key: ACCUMULO-449
URL: https://issues.apache.org/jira/browse/ACCUMULO-449
Project: Accumulo
Issue Type: Bug
Components: logger, master
Reporter: Keith Turner
Assignee: Eric Newton
Fix For: 1.4.0
I shut down a single-node instance uncleanly. When I restarted it, the logger did not have enough memory to perform the log sort; it got an OOME and died. I edited accumulo-env.sh to give the logger process more memory and restarted the logger process. However, the log recovery never restarted.
The master was continually printing messages like the following:
{noformat}
06 17:07:16,609 [master.CoordinateRecoveryTask] DEBUG: Copying 65c48045-88c1-48e4-93d3-4865a9a86050 from xxx.xxx.xxx.xxx:11224 (for 1210.306000 seconds) 0.0
{noformat}
After 20 minutes I restarted the master, and log recovery then proceeded.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ACCUMULO-449) Failed log copy is not restarted
Posted by "Eric Newton (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/ACCUMULO-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223861#comment-13223861 ]
Eric Newton commented on ACCUMULO-449:
--------------------------------------
It does restart, but it takes a long time to time out (an hour?!?). We need to use an API to get the status from the logger; using HDFS to communicate is too much of a kludge.
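A minimal sketch, in Java, of one way the master could avoid waiting on a long overall timeout: restart a copy when its reported progress has not advanced within a short stall window. The class `CopyStallDetector`, its methods, and the 60-second window are hypothetical, not Accumulo's code. It assumes the logger can report a progress fraction through some status API, and that the trailing `0.0` in the master's log line is such a value.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stall detector: flags a log copy whose progress has stopped advancing. */
public class CopyStallDetector {
    private final long stallWindowMillis;
    // Last observed progress, and the time it last increased, keyed by log id.
    private final Map<String, Double> lastProgress = new HashMap<>();
    private final Map<String, Long> lastAdvanceMillis = new HashMap<>();

    public CopyStallDetector(long stallWindowMillis) {
        this.stallWindowMillis = stallWindowMillis;
    }

    /**
     * Records a progress report and returns true if the copy should be restarted,
     * i.e. progress has not increased for longer than the stall window.
     */
    public boolean shouldRestart(String logId, double progress, long nowMillis) {
        Double previous = lastProgress.get(logId);
        if (previous == null || progress > previous) {
            // First report, or real progress: remember it and reset the stall clock.
            lastProgress.put(logId, progress);
            lastAdvanceMillis.put(logId, nowMillis);
            return false;
        }
        return nowMillis - lastAdvanceMillis.get(logId) > stallWindowMillis;
    }

    /** Forgets a copy once it completes or has been restarted. */
    public void reset(String logId) {
        lastProgress.remove(logId);
        lastAdvanceMillis.remove(logId);
    }

    public static void main(String[] args) {
        CopyStallDetector detector = new CopyStallDetector(60_000);
        String id = "65c48045-88c1-48e4-93d3-4865a9a86050";
        System.out.println(detector.shouldRestart(id, 0.0, 0));         // false: first report
        System.out.println(detector.shouldRestart(id, 0.0, 30_000));    // false: within window
        System.out.println(detector.shouldRestart(id, 0.0, 1_210_306)); // true: stuck at 0.0
    }
}
```

Keying the decision on "time since last progress" rather than "total time copying" means a large but healthy copy is never killed, while a copy whose logger died is retried within one stall window.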
> Failed log copy is not restarted
> --------------------------------
>
> Key: ACCUMULO-449
> URL: https://issues.apache.org/jira/browse/ACCUMULO-449
> Project: Accumulo
> Issue Type: Bug
> Components: logger, master
> Reporter: Keith Turner
> Assignee: Eric Newton
> Labels: 14_qa_bug
> Fix For: 1.4.0
[jira] [Updated] (ACCUMULO-449) Failed log copy is not restarted
Posted by "Keith Turner (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/ACCUMULO-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Keith Turner updated ACCUMULO-449:
----------------------------------
Affects Version/s: 1.4.0, 1.3.5
Fix Version/s: 1.4.1 (was: 1.4.0)
The master should probably notice that the logger lost its ZooKeeper lock.
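A minimal sketch of that idea: the master tracks which logger each copy was assigned to and, as soon as that logger's ZooKeeper lock disappears, puts its unfinished copies back on the pending queue instead of waiting for a timeout. `RecoveryAssignments` and its methods are hypothetical names, not Accumulo's code. Wiring `onLockLost` to a real ZooKeeper watch (for example, one that fires on a `NodeDeleted` event for the logger's lock node) is assumed and not shown.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical master-side bookkeeping for log copies assigned to loggers. */
public class RecoveryAssignments {
    // Log ids waiting for a logger to copy and sort them.
    private final Deque<String> pending = new ArrayDeque<>();
    // Logger address -> log ids that logger is currently copying.
    private final Map<String, Set<String>> inProgress = new HashMap<>();

    public void submit(String logId) {
        pending.addLast(logId);
    }

    /** Assigns the next pending log to a logger; returns null if nothing is pending. */
    public String assignNext(String logger) {
        String logId = pending.pollFirst();
        if (logId != null) {
            inProgress.computeIfAbsent(logger, k -> new HashSet<>()).add(logId);
        }
        return logId;
    }

    /** Marks a copy as finished so it is no longer tracked. */
    public void complete(String logger, String logId) {
        Set<String> logs = inProgress.get(logger);
        if (logs != null) {
            logs.remove(logId);
        }
    }

    /**
     * Called when the logger's ZooKeeper lock is lost (assumed to be driven by a watch).
     * Its unfinished copies go back to the front of the queue so a new attempt starts now.
     */
    public void onLockLost(String logger) {
        Set<String> orphaned = inProgress.remove(logger);
        if (orphaned != null) {
            for (String logId : orphaned) {
                pending.addFirst(logId);
            }
        }
    }

    public int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        RecoveryAssignments r = new RecoveryAssignments();
        r.submit("65c48045-88c1-48e4-93d3-4865a9a86050");
        r.assignNext("xxx.xxx.xxx.xxx:11224");
        System.out.println(r.pendingCount()); // 0: copy is in progress
        r.onLockLost("xxx.xxx.xxx.xxx:11224"); // e.g. the logger died with an OOME
        System.out.println(r.pendingCount()); // 1: copy is queued again
    }
}
```

With this approach, restarting the logger (or any other logger picking up the work) would resume recovery without a master restart.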
[jira] [Commented] (ACCUMULO-449) Failed log copy is not restarted
Posted by "Keith Turner (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/ACCUMULO-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223749#comment-13223749 ]
Keith Turner commented on ACCUMULO-449:
---------------------------------------
Originally the logger had a maximum heap of 128m. I think I had copied accumulo-env.sh.512MBBstandalone-native-example. I raised the logger's maximum heap to 512m.