Posted to dev@vcl.apache.org by "Nathaniel Sherry (JIRA)" <ji...@apache.org> on 2013/05/29 18:40:20 UTC
[jira] [Commented] (VCL-693) VCL Cluster Reinstall Fails

    [ https://issues.apache.org/jira/browse/VCL-693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669413#comment-13669413 ] 

Nathaniel Sherry commented on VCL-693:
--------------------------------------

I applied the following patch:

"add the following line to the beginning of the reload_image subroutine in new.pm:
delete_computerloadlog_reservation($reservation_id, '!begin');
Just above the 'my $node_status;' line.  Then restart the vcld service."

As requested, I checked vcld.log while reinstalling a cluster. After the request state flipped to "reserved", I did see the warning:

2013-05-29 12:21:02|24682|3281:3648|reserved|utils.pm:reservation_being_processed(8638)|computerloadlog 'begin' entry does NOT exist for reservation 3648
2013-05-29 12:21:02|29123|3281:3647|reserved|DataStructure.pm:_initialize(674)|retrieved data for imagerevision ID: 1
2013-05-29 12:21:02|29123|3281:3647|reserved|DataStructure.pm:_initialize(690)|retrieved data for image ID: 1
2013-05-29 12:21:02|24682|3281:3648|reserved|utils.pm:run_command(8710)|executed command: pgrep -fl 'vcld [0-9]+:3648 ', pid: 29124, exit status: 0, output:
|24682|3281:3648|reserved| 4892 vcld 3281:3648 reserved ecco-vm-x21>ecco-vmhost-x6 linux-TestImage2-16-v2 nsasherr (cluster=child)
2013-05-29 12:21:02|24682|3281:3648|reserved|utils.pm:is_management_node_process_running(8933)|found matching process: 4892 vcld 3281:3648 reserved ecco-vm-x21>ecco-vmhost-x6 linux-TestImage2-16-v2 nsasherr (cluster=child)
2013-05-29 12:21:02|24682|3281:3648|reserved|utils.pm:is_management_node_process_running(8939)|process is running, identifier: 'vcld [0-9]+:3648 ', returning array containing PIDs: 4892

|24682|3281:3648|reserved| ---- WARNING ---- 
|24682|3281:3648|reserved| 2013-05-29 12:21:02|24682|3281:3648|reserved|utils.pm:reservation_being_processed(8651)|computerloadlog 'begin' entry does NOT exist but running process was found: 4892, assuming reservation is currently being processed
|24682|3281:3648|reserved| ( 0) utils.pm, reservation_being_processed (line: 8651)
|24682|3281:3648|reserved| (-1) vcld, main (line: 273)


|24682|3281:3648|reserved| ---- WARNING ---- 
|24682|3281:3648|reserved| 2013-05-29 12:21:02|24682|3281:3648|reserved|vcld:main(277)|reservation 3648 is already being processed
|24682|3281:3648|reserved| ( 0) vcld, main (line: 277)


Reservation 3648 is the child reservation.
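
To make the sequence in the log easier to follow, here is a rough Python model of the check it appears to describe. This is not the VCL code (vcld is Perl, and I have not copied anything from utils.pm); the function names and the "begin entry exists" branch are assumptions for illustration. Only the process pattern and the "no 'begin' entry but a running process was found" outcome are taken from the log above.

```python
import re


def matching_pids(reservation_id, process_lines):
    """Return PIDs of process lines that match 'vcld [0-9]+:<reservation id> '.

    process_lines is a list of strings shaped like the pgrep -fl output in
    the log: "<pid> <command line>".
    """
    pattern = re.compile(r"vcld [0-9]+:%d " % reservation_id)
    return [line.split()[0] for line in process_lines if pattern.search(line)]


def being_processed(has_begin_entry, pids):
    """Hypothetical decision: treat the reservation as being processed when
    a running process is found, even without a 'begin' entry (the case the
    log shows). Treating an existing 'begin' entry as 'being processed' is
    an assumption, not something the log confirms.
    """
    return bool(has_begin_entry or pids)


# Process line as reported in the log for the child reservation.
line = ("4892 vcld 3281:3648 reserved ecco-vm-x21>ecco-vmhost-x6 "
        "linux-TestImage2-16-v2 nsasherr (cluster=child)")

pids = matching_pids(3648, [line])
print(pids, being_processed(False, pids))
```

If this model is close, then as long as process 4892 is still running under the name "vcld 3281:3648 ...", a new check for reservation 3648 reports "already being processed" whether or not the 'begin' entry was deleted.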

I also dumped the contents of the computerloadlog table:
http://pastebin.com/13jR0hCS

                
> VCL Cluster Reinstall Fails
> ---------------------------
>
>                 Key: VCL-693
>                 URL: https://issues.apache.org/jira/browse/VCL-693
>             Project: VCL
>          Issue Type: Bug
>          Components: vcld (backend)
>    Affects Versions: 2.3.1
>         Environment: CentOS, libvirt/kvm
>            Reporter: Nathaniel Sherry
>              Labels: cluster, vcld
>
> It seems that when I reinstall a cluster from the "Current Reservations" page, the post_reserve scripts don't get run on the child nodes. I think the issue is that vcld isn't spawning a child process for the child nodes when invoking reserved.pm after finishing new.pm during the reinstallation. The computer state of the child is also left at "Reloading". 
> This results in the "Connect" button never reappearing.
> DB - http://pastebin.com/waHeP2wd
> Log - http://pastebin.com/EsqZhbTi

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira