Posted to dev@cloudstack.apache.org by Chip Childers <ch...@sungard.com> on 2013/05/20 22:15:14 UTC

[VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

All,

As discussed on another thread [1], we identified a bug
(CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
are not configured to sync their time with either the host HV or an NTP
service.  That bug affects the system VMs for all three primary HVs (KVM, 
Xen and vSphere).  Patches have been committed addressing vSphere and
KVM.  It appears that a correction for Xen would require the re-build of
a system VM image and a full round of regression testing that image.

Given that the discussion thread has not resulted in a consensus on this
issue, I unfortunately believe that the only path forward is to call for 
a formal VOTE.

Please respond with one of the following:

+1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being resolved
+0: don't care one way or the other
-1: do *not* proceed with any further 4.1 release candidates until CLOUDSTACK-2492 has been fully resolved

-chip

[1] http://markmail.org/message/rw7vciq3r33biasb

Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Chip Childers <ch...@sungard.com>.
I believe we are arriving at something resembling a consensus on this
issue, but I'll keep the thread alive for the full 72 hours and then
summarize.

Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Joe Brockmeier <jz...@zonker.net>.
On Mon, May 20, 2013, at 03:15 PM, Chip Childers wrote:
> Please respond with one of the following:
> 
> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
> resolved
> +0: don't care one way or the other
> -1: do *not* proceed with any further 4.1 release candidates until
> CLOUDSTACK-2492 has been fully resolved

So, it's not an option, but I'm -0 on this. I do care, and I'm not
really in favor of releasing without this being addressed but I'm not
quite at -1. 

Best,

jzb
-- 
Joe Brockmeier
jzb@zonker.net
Twitter: @jzb
http://www.dissociatedpress.net/

Re: [VOTE][RESULTS] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Chip Childers <ch...@sungard.com>.
On Thu, May 23, 2013 at 08:51:56PM +0000, Musayev, Ilya wrote:
> +1, we would need to add this bug to "known issues"

Done

RE: [VOTE][RESULTS] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by "Musayev, Ilya" <im...@webmd.net>.
+1, we would need to add this bug to "known issues"

> -----Original Message-----
> From: Chip Childers [mailto:chip.childers@sungard.com]
> Sent: Thursday, May 23, 2013 1:01 PM
> To: dev@cloudstack.apache.org
> Subject: [VOTE][RESULTS] Move forward with 4.1 without a Xen-specific fix
> for CLOUDSTACK-2492?
> 
> On Mon, May 20, 2013 at 04:15:14PM -0400, Chip Childers wrote:
> > All,
> >
> > > As discussed on another thread [1], we identified a bug
> > > (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
> > > are not configured to sync their time with either the host HV or an
> > > NTP service.  That bug affects the system VMs for all three primary
> > > HVs (KVM, Xen and vSphere).  Patches have been committed addressing
> > > vSphere and KVM.  It appears that a correction for Xen would require
> > > the re-build of a system VM image and a full round of regression testing
> > > that image.
> >
> > Given that the discussion thread has not resulted in a consensus on
> > this issue, I unfortunately believe that the only path forward is to
> > call for a formal VOTE.
> >
> > Please respond with one of the following:
> >
> > > +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
> > > resolved
> > +0: don't care one way or the other
> > -1: do *not* proceed with any further 4.1 release candidates until
> > CLOUDSTACK-2492 has been fully resolved
> >
> > -chip
> >
> > [1] http://markmail.org/message/rw7vciq3r33biasb
> 
> Great discussion on this thread.  I've summarized the votes and the
> comments below:
> 
> +1: 7 votes
> Chiradeep, Ahmad, David, Marcus, Sebastien, Chip, Mathias
> 
> +0: 0 votes
> 
> -0: 1 vote
> Joe
> 
> -1: 3 votes
> John, Francois, Outback
> 
> Given the discussion that ensued and the results of the voting, I will *no
> longer block the release for this issue*.  I understand that this is not the
> favorite resolution for everyone, but do understand that we are resolving
> this problem officially for our 4.2.0 release.  Using the experimental system
> VM images will be an option for users that want the S3 functionality within
> 4.1.
> 
> -chip



[VOTE][RESULTS] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Chip Childers <ch...@sungard.com>.
Great discussion on this thread.  I've summarized the votes and the
comments below:

+1: 7 votes
Chiradeep, Ahmad, David, Marcus, Sebastien, Chip, Mathias

+0: 0 votes

-0: 1 vote
Joe

-1: 3 votes
John, Francois, Outback

Given the discussion that ensued and the results of the voting, I 
will *no longer block the release for this issue*.  I understand 
that this is not the favorite resolution for everyone, but do
understand that we are resolving this problem officially for our 4.2.0
release.  Using the experimental system VM images will be an option for
users that want the S3 functionality within 4.1.

-chip
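
For anyone who wants to check the arithmetic, the tally above can be
reproduced with a short Python sketch. The names and votes are copied
from the summary; the helper names (tally, more_in_favor) and the
"more +1 than -1" comparison are illustrative assumptions only, not the
project's formal voting rule.

```python
from collections import Counter

# Votes exactly as listed in the results summary.
votes = {
    "Chiradeep": "+1", "Ahmad": "+1", "David": "+1", "Marcus": "+1",
    "Sebastien": "+1", "Chip": "+1", "Mathias": "+1",
    "Joe": "-0",
    "John": "-1", "Francois": "-1", "Outback": "-1",
}

OPTIONS = ("+1", "+0", "-0", "-1")


def tally(votes):
    """Count how many voters chose each option, including options with zero votes."""
    counts = Counter(votes.values())
    return {option: counts.get(option, 0) for option in OPTIONS}


def more_in_favor(counts):
    """Hypothetical comparison for illustration: strictly more +1 than -1 votes."""
    return counts["+1"] > counts["-1"]


result = tally(votes)
for option in OPTIONS:
    print(f"{option}: {result[option]} vote(s)")
print("more +1 than -1:", more_in_favor(result))
```

The +0 and -0 votes are counted but deliberately left out of the
comparison, since they express no blocking preference either way.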

Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Francois Gaudreault <fg...@cloudops.com>.
I am not sure if I am allowed to vote here but...

I guess the SSVM has been built using HVM instead of PV?  If the SSVM is 
PV, it should sync domU -> dom0. It might also require some hotfixes on 
the XenServer side if you are using XS 6.0.2 (hotfix 18 addresses 
some clock drift issues). Because of the potential impact on other APIs 
(Swift?), I would say: -1.
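
To make the API concern concrete, here is a minimal Python sketch of
the kind of timestamp check an object-storage API might apply. The
15-minute tolerance is an assumption modelled on S3-style request
validation, and the function names are hypothetical; nothing here is
taken from the CloudStack code base.

```python
from datetime import datetime, timedelta, timezone

# Assumed tolerance, modelled on S3-style APIs that reject requests
# whose timestamp is too far from the server's clock.
MAX_SKEW = timedelta(minutes=15)


def clock_skew(local_time, reference_time):
    """Absolute difference between a VM's clock and a reference clock."""
    return abs(local_time - reference_time)


def within_tolerance(local_time, reference_time, max_skew=MAX_SKEW):
    """True if a request stamped with local_time would pass the skew check."""
    return clock_skew(local_time, reference_time) <= max_skew


# Example: a system VM whose clock has drifted 22 minutes ahead.
reference = datetime(2013, 5, 21, 20, 0, 0, tzinfo=timezone.utc)
drifted = reference + timedelta(minutes=22)

print("skew:", clock_skew(drifted, reference))
print("accepted:", within_tolerance(drifted, reference))
```

A system VM with no time sync can drift past such a tolerance without
any other visible symptom, which is why the impact would first show up
in features that sign or timestamp their requests.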

On 2013-05-21 4:20 PM, Chip Childers wrote:
> On Mon, May 20, 2013 at 04:15:14PM -0400, Chip Childers wrote:
>> All,
>>
>> As discussed on another thread [1], we identified a bug
>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
>> are not configured to sync their time with either the host HV or an NTP
>> service.  That bug affects the system VMs for all three primary HVs (KVM,
>> Xen and vSphere).  Patches have been committed addressing vSphere and
>> KVM.  It appears that a correction for Xen would require the re-build of
>> a system VM image and a full round of regression testing that image.
>>
>> Given that the discussion thread has not resulted in a consensus on this
>> issue, I unfortunately believe that the only path forward is to call for
>> a formal VOTE.
>>
>> Please respond with one of the following:
>>
>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being resolved
>> +0: don't care one way or the other
>> -1: do *not* proceed with any further 4.1 release candidates until CLOUDSTACK-2492 has been fully resolved
>>
>> -chip
>>
>> [1] http://markmail.org/message/rw7vciq3r33biasb
> We need more people to voice their opinions here please.
>
>


-- 
Francois Gaudreault
Architecte de Solution Cloud | Cloud Solutions Architect
fgaudreault@cloudops.com
514-629-6775
- - -
CloudOps
420 rue Guy
Montréal QC  H3J 1S6
www.cloudops.com
@CloudOps_


Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Chip Childers <ch...@sungard.com>.
We need more people to voice their opinions here please.

RE: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Sangeetha Hariharan <Sa...@citrix.com>.
The templates for the IPv6 feature in 4.1 are different from the 4.2 templates. They are hosted at http://cloudstack.apt-get.eu/systemvm/ as 4.1.0-experimental-ipv6-*.
Also, support was extended only to KVM and XenServer.

-Thanks
Sangeetha

-----Original Message-----
From: Marcus Sorensen [mailto:shadowsor@gmail.com] 
Sent: Tuesday, May 21, 2013 4:27 PM
To: dev@cloudstack.apache.org
Subject: Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

I'm not sure how well tested they are, but they're already more or less compatible. The idea was floated to provide ipv6 preview with instructions to use the 4.2 template.
On May 21, 2013 5:09 PM, "John Burwell" <jb...@basho.com> wrote:

> Chiradeep,
>
> Is it possible to "back port" the 4.2 system VMs to 4.1?  What would 
> be involved in such an effort?
>
> Thanks,
> -John
>
> On May 21, 2013, at 7:07 PM, Chiradeep Vittal 
> <Ch...@citrix.com>
> wrote:
>
> > The latest 4.2 systemvms do have ntp built in. The earlier comment 
> > about HVM is incorrect. It is PV (PVOPS, to be exact). With PVOPS 
> > Linux vms, there is no sync between domU and dom0.
> >
> > On 5/21/13 2:45 PM, "Marcus Sorensen" <sh...@gmail.com> wrote:
> >
> >> +1, it seems that it is no worse off than it ever has been, aside
> >> from
> >> the caveat that newer features are beginning to rely on it. I do 
> >> agree though that it could perhaps be rolled into the newer system 
> >> vm, as an option for people to use at their own risk.
> >>
> >> Of course, if someone wants to patch it up and get testing going, 
> >> I'm all for that as well. I just don't see holding things up.
> >>
> >> On Tue, May 21, 2013 at 3:33 PM, John Burwell <jb...@basho.com>
> wrote:
> >>> David,
> >>>
> >>> I am willing to do the work.  However, as I understand the 
> >>> circumstances, a complete build process for the system VMs has not 
> >>> been released.  If I am incorrect in my understanding, I will do 
> >>> the work necessary to fix the problem.
> >>>
> >>> Thanks,
> >>> -John
> >>>
> >>> On May 21, 2013, at 5:29 PM, David Nalley <da...@gnsa.us> wrote:
> >>>
> >>>> On Mon, May 20, 2013 at 4:15 PM, Chip Childers 
> >>>> <ch...@sungard.com> wrote:
> >>>>> All,
> >>>>>
> >>>>> As discussed on another thread [1], we identified a bug
> >>>>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the 
> >>>>> System VMs are not configured to sync their time with either the 
> >>>>> host HV or an NTP service.  That bug affects the system VMs for 
> >>>>> all three primary HVs (KVM, Xen and vSphere).  Patches have been 
> >>>>> committed addressing vSphere and KVM.  It appears that a 
> >>>>> correction for Xen would require the re-build of a system VM 
> >>>>> image and a full round of regression testing that image.
> >>>>>
> >>>>> Given that the discussion thread has not resulted in a consensus 
> >>>>> on this issue, I unfortunately believe that the only path 
> >>>>> forward is to call for a formal VOTE.
> >>>>>
> >>>>> Please respond with one of the following:
> >>>>>
> >>>>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492
> >>>>> being resolved
> >>>>> +0: don't care one way or the other
> >>>>> -1: do *not* proceed with any further 4.1 release candidates 
> >>>>> until
> >>>>> CLOUDSTACK-2492 has been fully resolved
> >>>>>
> >>>>> -chip
> >>>>>
> >>>>> [1] http://markmail.org/message/rw7vciq3r33biasb
> >>>>
> >>>>
> >>>> So it appalls me that this problem exists. If I understand 
> >>>> correctly, from folks who commercially support derivatives of 
> >>>> ACS. Lack of time synchronization has been a factor in major 
> >>>> outages, but that's typically been between the hypervisors and management servers.
> >>>> Regardless we realize (or should) that time is important for so 
> >>>> many reasons (encryption, logs, and scores of other reasons)
> >>>>
> >>>> But when the rubber meets the road - here are the two points that 
> >>>> decide it for me.
> >>>>
> >>>> 1. This is not a new problem. It's bad, it shouldn't exist, but 
> >>>> it does, and it has for some time it would seem. That suggests 
> >>>> it's not catastrophic, and hasn't yet blocked folks from getting 
> >>>> things done with CloudStack.
> >>>>
> >>>> 2. I see no one stepping up to do the work. I am not personally a 
> >>>> fan of issuing what is the effective equivalent of an 'unfunded mandate'.
> >>>> The problem isn't just one of building a new SSVM - it's one of 
> >>>> testing it, and repeating all of the validation that has already 
> >>>> been done with the existing sysvm.
> >>>>
> >>>> Perhaps there is a middle ground (we have a default sysvm, but 
> >>>> perhaps like we are doing with the IPv6-enabled sysvm we have a 
> >>>> time-enabled sysvm available for folks.
> >>>>
> >>>> Regardless - you called a vote, so I'll reluctantly cast a +1 - I 
> >>>> hate that we are seeing this problem, but with no one stepping up 
> >>>> to do all of the work, I'm not quite ready to hold a release 
> >>>> hostage waiting to find such a person.
> >>>>
> >>>> --David
> >>>
> >
>
>

Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by John Burwell <jb...@basho.com>.
Chiradeep,

It seems that we have a solution to both clock drift and IPv6 for
system VMs.  As such, it sounds like we have a compelling reason to pull
this work back into 4.1.

What QA is required?  How long will it take?  What can we do as a
community to help?

Thanks,
-John

On May 21, 2013, at 7:54 PM, Chiradeep Vittal
<Ch...@citrix.com> wrote:

> They are compatible, but face the same problem - lack of QA.
>
> On 5/21/13 4:26 PM, "Marcus Sorensen" <sh...@gmail.com> wrote:
>
>> I'm not sure how well tested they are, but they're already more or less
>> compatible. The idea was floated to provide ipv6 preview with instructions
>> to use the 4.2 template.
>> On May 21, 2013 5:09 PM, "John Burwell" <jb...@basho.com> wrote:
>>
>>> Chiradeep,
>>>
>>> Is it possible to "back port" the 4.2 system VMs to 4.1?  What would be
>>> involved in such an effort?
>>>
>>> Thanks,
>>> -John
>>>
>>> On May 21, 2013, at 7:07 PM, Chiradeep Vittal
>>> <Ch...@citrix.com>
>>> wrote:
>>>
>>>> The latest 4.2 systemvms do have ntp built in. The earlier comment
>>> about
>>>> HVM is incorrect. It is PV (PVOPS, to be exact). With PVOPS Linux vms,
>>>> there is no sync between domU and dom0.
>>>>
>>>> On 5/21/13 2:45 PM, "Marcus Sorensen" <sh...@gmail.com> wrote:
>>>>
>>>>> +1, it seems that it is no worse off than it ever has been, aside
>>> from
>>>>> the caveat that newer features are beginning to rely on it. I do
>>> agree
>>>>> though that it could perhaps be rolled into the newer system vm, as
>>> an
>>>>> option for people to use at their own risk.
>>>>>
>>>>> Of course, if someone wants to patch it up and get testing going, I'm
>>>>> all for that as well. I just don't see holding things up.
>>>>>
>>>>> On Tue, May 21, 2013 at 3:33 PM, John Burwell <jb...@basho.com>
>>> wrote:
>>>>>> David,
>>>>>>
>>>>>> I am willing to do the work.  However, as I understand the
>>>>>> circumstances, a complete build process for the system VMs has not
>>> been
>>>>>> released.  If I am incorrect in my understanding, I will do the work
>>>>>> necessary to fix the problem.
>>>>>>
>>>>>> Thanks,
>>>>>> -John
>>>>>>
>>>>>> On May 21, 2013, at 5:29 PM, David Nalley <da...@gnsa.us> wrote:
>>>>>>
>>>>>>> On Mon, May 20, 2013 at 4:15 PM, Chip Childers
>>>>>>> <ch...@sungard.com> wrote:
>>>>>>>> All,
>>>>>>>>
>>>>>>>> As discussed on another thread [1], we identified a bug
>>>>>>>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System
>>> VMs
>>>>>>>> are not configured to sync their time with either the host HV or
>>> an
>>>>>>>> NTP
>>>>>>>> service.  That bug affects the system VMs for all three primary
>>> HVs
>>>>>>>> (KVM,
>>>>>>>> Xen and vSphere).  Patches have been committed addressing vSphere
>>> and
>>>>>>>> KVM.  It appears that a correction for Xen would require the
>>> re-build
>>>>>>>> of
>>>>>>>> a system VM image and a full round of regression testing that
>>> image.
>>>>>>>>
>>>>>>>> Given that the discussion thread has not resulted in a consensus
>>> on
>>>>>>>> this
>>>>>>>> issue, I unfortunately believe that the only path forward is to
>>> call
>>>>>>>> for
>>>>>>>> a formal VOTE.
>>>>>>>>
>>>>>>>> Please respond with one of the following:
>>>>>>>>
>>>>>>>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492
>>> being
>>>>>>>> resolved
>>>>>>>> +0: don't care one way or the other
>>>>>>>> -1: do *not* proceed with any further 4.1 release candidates until
>>>>>>>> CLOUDSTACK-2492 has been fully resolved
>>>>>>>>
>>>>>>>> -chip
>>>>>>>>
>>>>>>>> [1] http://markmail.org/message/rw7vciq3r33biasb
>>>>>>>
>>>>>>>
>>>>>>> So it appalls me that this problem exists. If I understand
>>> correctly,
>>>>>>> from folks who commercially support derivatives of ACS. Lack of
>>> time
>>>>>>> synchronization has been a factor in major outages, but that's
>>>>>>> typically been between the hypervisors and management servers.
>>>>>>> Regardless we realize (or should) that time is important for so
>>> many
>>>>>>> reasons (encryption, logs, and scores of other reasons)
>>>>>>>
>>>>>>> But when the rubber meets the road - here are the two points that
>>>>>>> decide it for me.
>>>>>>>
>>>>>>> 1. This is not a new problem. It's bad, it shouldn't exist, but it
>>>>>>> does, and it has for some time it would seem. That suggests it's
>>> not
>>>>>>> catastrophic, and hasn't yet blocked folks from getting things done
>>>>>>> with CloudStack.
>>>>>>>
>>>>>>> 2. I see no one stepping up to do the work. I am not personally a
>>> fan
>>>>>>> of issuing what is the effective equivalent of an 'unfunded
>>> mandate'.
>>>>>>> The problem isn't just one of building a new SSVM - it's one of
>>>>>>> testing it, and repeating all of the validation that has already
>>> been
>>>>>>> done with the existing sysvm.
>>>>>>>
>>>>>>> Perhaps there is a middle ground (we have a default sysvm, but
>>> perhaps
>>>>>>> like we are doing with the IPv6-enabled sysvm we have a
>>> time-enabled
>>>>>>> sysvm available for folks.
>>>>>>>
>>>>>>> Regardless - you called a vote, so I'll reluctantly cast a +1 - I
>>> hate
>>>>>>> that we are seeing this problem, but with no one stepping up to do
>>> all
>>>>>>> of the work, I'm not quite ready to hold a release hostage waiting
>>> to
>>>>>>> find such a person.
>>>>>>>
>>>>>>> --David
>

Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Chiradeep Vittal <Ch...@citrix.com>.
They are compatible, but face the same problem - lack of QA.

On 5/21/13 4:26 PM, "Marcus Sorensen" <sh...@gmail.com> wrote:

>I'm not sure how well tested they are, but they're already more or less
>compatible. The idea was floated to provide ipv6 preview with instructions
>to use the 4.2 template.
>On May 21, 2013 5:09 PM, "John Burwell" <jb...@basho.com> wrote:
>
>> Chiradeep,
>>
>> Is it possible to "back port" the 4.2 system VMs to 4.1?  What would be
>> involved in such an effort?
>>
>> Thanks,
>> -John
>>
>> On May 21, 2013, at 7:07 PM, Chiradeep Vittal
>><Ch...@citrix.com>
>> wrote:
>>
>> > The latest 4.2 systemvms do have ntp built in. The earlier comment
>>about
>> > HVM is incorrect. It is PV (PVOPS, to be exact). With PVOPS Linux vms,
>> > there is no sync between domU and dom0.
>> >
>> > On 5/21/13 2:45 PM, "Marcus Sorensen" <sh...@gmail.com> wrote:
>> >
>> >> +1, it seems that it is no worse off than it ever has been, aside
>>from
>> >> the caveat that newer features are beginning to rely on it. I do
>>agree
>> >> though that it could perhaps be rolled into the newer system vm, as
>>an
>> >> option for people to use at their own risk.
>> >>
>> >> Of course, if someone wants to patch it up and get testing going, I'm
>> >> all for that as well. I just don't see holding things up.
>> >>
>> >> On Tue, May 21, 2013 at 3:33 PM, John Burwell <jb...@basho.com>
>> wrote:
>> >>> David,
>> >>>
>> >>> I am willing to do the work.  However, as I understand the
>> >>> circumstances, a complete build process for the system VMs has not
>>been
>> >>> released.  If I am incorrect in my understanding, I will do the work
>> >>> necessary to fix the problem.
>> >>>
>> >>> Thanks,
>> >>> -John
>> >>>
>> >>> On May 21, 2013, at 5:29 PM, David Nalley <da...@gnsa.us> wrote:
>> >>>
>> >>>> On Mon, May 20, 2013 at 4:15 PM, Chip Childers
>> >>>> <ch...@sungard.com> wrote:
>> >>>>> All,
>> >>>>>
>> >>>>> As discussed on another thread [1], we identified a bug
>> >>>>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System
>>VMs
>> >>>>> are not configured to sync their time with either the host HV or
>>an
>> >>>>> NTP
>> >>>>> service.  That bug affects the system VMs for all three primary
>>HVs
>> >>>>> (KVM,
>> >>>>> Xen and vSphere).  Patches have been committed addressing vSphere
>>and
>> >>>>> KVM.  It appears that a correction for Xen would require the
>>re-build
>> >>>>> of
>> >>>>> a system VM image and a full round of regression testing that
>>image.
>> >>>>>
>> >>>>> Given that the discussion thread has not resulted in a consensus
>>on
>> >>>>> this
>> >>>>> issue, I unfortunately believe that the only path forward is to
>>call
>> >>>>> for
>> >>>>> a formal VOTE.
>> >>>>>
>> >>>>> Please respond with one of the following:
>> >>>>>
>> >>>>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492
>>being
>> >>>>> resolved
>> >>>>> +0: don't care one way or the other
>> >>>>> -1: do *not* proceed with any further 4.1 release candidates until
>> >>>>> CLOUDSTACK-2492 has been fully resolved
>> >>>>>
>> >>>>> -chip
>> >>>>>
>> >>>>> [1] http://markmail.org/message/rw7vciq3r33biasb
>> >>>>
>> >>>>
>> >>>> So it appalls me that this problem exists. If I understand
>>correctly,
>> >>>> from folks who commercially support derivatives of ACS. Lack of
>>time
>> >>>> synchronization has been a factor in major outages, but that's
>> >>>> typically been between the hypervisors and management servers.
>> >>>> Regardless we realize (or should) that time is important for so
>>many
>> >>>> reasons (encryption, logs, and scores of other reasons)
>> >>>>
>> >>>> But when the rubber meets the road - here are the two points that
>> >>>> decide it for me.
>> >>>>
>> >>>> 1. This is not a new problem. It's bad, it shouldn't exist, but it
>> >>>> does, and it has for some time it would seem. That suggests it's
>>not
>> >>>> catastrophic, and hasn't yet blocked folks from getting things done
>> >>>> with CloudStack.
>> >>>>
>> >>>> 2. I see no one stepping up to do the work. I am not personally a
>>fan
>> >>>> of issuing what is the effective equivalent of an 'unfunded
>>mandate'.
>> >>>> The problem isn't just one of building a new SSVM - it's one of
>> >>>> testing it, and repeating all of the validation that has already
>>been
>> >>>> done with the existing sysvm.
>> >>>>
>> >>>> Perhaps there is a middle ground (we have a default sysvm, but
>>perhaps
>> >>>> like we are doing with the IPv6-enabled sysvm we have a
>>time-enabled
>> >>>> sysvm available for folks.
>> >>>>
>> >>>> Regardless - you called a vote, so I'll reluctantly cast a +1 - I
>>hate
>> >>>> that we are seeing this problem, but with no one stepping up to do
>>all
>> >>>> of the work, I'm not quite ready to hold a release hostage waiting
>>to
>> >>>> find such a person.
>> >>>>
>> >>>> --David
>> >>>
>> >
>>
>>


Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Marcus Sorensen <sh...@gmail.com>.
I'm not sure how well tested they are, but they're already more or less
compatible. The idea was floated to provide ipv6 preview with instructions
to use the 4.2 template.
On May 21, 2013 5:09 PM, "John Burwell" <jb...@basho.com> wrote:

> Chiradeep,
>
> Is it possible to "back port" the 4.2 system VMs to 4.1?  What would be
> involved in such an effort?
>
> Thanks,
> -John
>
> On May 21, 2013, at 7:07 PM, Chiradeep Vittal <Ch...@citrix.com>
> wrote:
>
> > The latest 4.2 systemvms do have ntp built in. The earlier comment about
> > HVM is incorrect. It is PV (PVOPS, to be exact). With PVOPS Linux vms,
> > there is no sync between domU and dom0.
> >
> > On 5/21/13 2:45 PM, "Marcus Sorensen" <sh...@gmail.com> wrote:
> >
> >> +1, it seems that it is no worse off than it ever has been, aside from
> >> the caveat that newer features are beginning to rely on it. I do agree
> >> though that it could perhaps be rolled into the newer system vm, as an
> >> option for people to use at their own risk.
> >>
> >> Of course, if someone wants to patch it up and get testing going, I'm
> >> all for that as well. I just don't see holding things up.
> >>
> >> On Tue, May 21, 2013 at 3:33 PM, John Burwell <jb...@basho.com>
> wrote:
> >>> David,
> >>>
> >>> I am willing to do the work.  However, as I understand the
> >>> circumstances, a complete build process for the system VMs has not been
> >>> released.  If I am incorrect in my understanding, I will do the work
> >>> necessary to fix the problem.
> >>>
> >>> Thanks,
> >>> -John
> >>>
> >>> On May 21, 2013, at 5:29 PM, David Nalley <da...@gnsa.us> wrote:
> >>>
> >>>> On Mon, May 20, 2013 at 4:15 PM, Chip Childers
> >>>> <ch...@sungard.com> wrote:
> >>>>> All,
> >>>>>
> >>>>> As discussed on another thread [1], we identified a bug
> >>>>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
> >>>>> are not configured to sync their time with either the host HV or an
> >>>>> NTP
> >>>>> service.  That bug affects the system VMs for all three primary HVs
> >>>>> (KVM,
> >>>>> Xen and vSphere).  Patches have been committed addressing vSphere and
> >>>>> KVM.  It appears that a correction for Xen would require the re-build
> >>>>> of
> >>>>> a system VM image and a full round of regression testing that image.
> >>>>>
> >>>>> Given that the discussion thread has not resulted in a consensus on
> >>>>> this
> >>>>> issue, I unfortunately believe that the only path forward is to call
> >>>>> for
> >>>>> a formal VOTE.
> >>>>>
> >>>>> Please respond with one of the following:
> >>>>>
> >>>>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
> >>>>> resolved
> >>>>> +0: don't care one way or the other
> >>>>> -1: do *not* proceed with any further 4.1 release candidates until
> >>>>> CLOUDSTACK-2492 has been fully resolved
> >>>>>
> >>>>> -chip
> >>>>>
> >>>>> [1] http://markmail.org/message/rw7vciq3r33biasb
> >>>>
> >>>>
> >>>> So it appalls me that this problem exists. If I understand correctly,
> >>>> from folks who commercially support derivatives of ACS. Lack of time
> >>>> synchronization has been a factor in major outages, but that's
> >>>> typically been between the hypervisors and management servers.
> >>>> Regardless we realize (or should) that time is important for so many
> >>>> reasons (encryption, logs, and scores of other reasons)
> >>>>
> >>>> But when the rubber meets the road - here are the two points that
> >>>> decide it for me.
> >>>>
> >>>> 1. This is not a new problem. It's bad, it shouldn't exist, but it
> >>>> does, and it has for some time it would seem. That suggests it's not
> >>>> catastrophic, and hasn't yet blocked folks from getting things done
> >>>> with CloudStack.
> >>>>
> >>>> 2. I see no one stepping up to do the work. I am not personally a fan
> >>>> of issuing what is the effective equivalent of an 'unfunded mandate'.
> >>>> The problem isn't just one of building a new SSVM - it's one of
> >>>> testing it, and repeating all of the validation that has already been
> >>>> done with the existing sysvm.
> >>>>
> >>>> Perhaps there is a middle ground (we have a default sysvm, but perhaps
> >>>> like we are doing with the IPv6-enabled sysvm we have a time-enabled
> >>>> sysvm available for folks.
> >>>>
> >>>> Regardless - you called a vote, so I'll reluctantly cast a +1 - I hate
> >>>> that we are seeing this problem, but with no one stepping up to do all
> >>>> of the work, I'm not quite ready to hold a release hostage waiting to
> >>>> find such a person.
> >>>>
> >>>> --David
> >>>
> >
>
>

Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by John Burwell <jb...@basho.com>.
Chiradeep,

Is it possible to "back port" the 4.2 system VMs to 4.1?  What would be involved in such an effort?

Thanks,
-John

On May 21, 2013, at 7:07 PM, Chiradeep Vittal <Ch...@citrix.com> wrote:

> The latest 4.2 systemvms do have ntp built in. The earlier comment about
> HVM is incorrect. It is PV (PVOPS, to be exact). With PVOPS Linux vms,
> there is no sync between domU and dom0.
> 
> On 5/21/13 2:45 PM, "Marcus Sorensen" <sh...@gmail.com> wrote:
> 
>> +1, it seems that it is no worse off than it ever has been, aside from
>> the caveat that newer features are beginning to rely on it. I do agree
>> though that it could perhaps be rolled into the newer system vm, as an
>> option for people to use at their own risk.
>> 
>> Of course, if someone wants to patch it up and get testing going, I'm
>> all for that as well. I just don't see holding things up.
>> 
>> On Tue, May 21, 2013 at 3:33 PM, John Burwell <jb...@basho.com> wrote:
>>> David,
>>> 
>>> I am willing to do the work.  However, as I understand the
>>> circumstances, a complete build process for the system VMs has not been
>>> released.  If I am incorrect in my understanding, I will do the work
>>> necessary to fix the problem.
>>> 
>>> Thanks,
>>> -John
>>> 
>>> On May 21, 2013, at 5:29 PM, David Nalley <da...@gnsa.us> wrote:
>>> 
>>>> On Mon, May 20, 2013 at 4:15 PM, Chip Childers
>>>> <ch...@sungard.com> wrote:
>>>>> All,
>>>>> 
>>>>> As discussed on another thread [1], we identified a bug
>>>>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
>>>>> are not configured to sync their time with either the host HV or an
>>>>> NTP
>>>>> service.  That bug affects the system VMs for all three primary HVs
>>>>> (KVM,
>>>>> Xen and vSphere).  Patches have been committed addressing vSphere and
>>>>> KVM.  It appears that a correction for Xen would require the re-build
>>>>> of
>>>>> a system VM image and a full round of regression testing that image.
>>>>> 
>>>>> Given that the discussion thread has not resulted in a consensus on
>>>>> this
>>>>> issue, I unfortunately believe that the only path forward is to call
>>>>> for
>>>>> a formal VOTE.
>>>>> 
>>>>> Please respond with one of the following:
>>>>> 
>>>>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
>>>>> resolved
>>>>> +0: don't care one way or the other
>>>>> -1: do *not* proceed with any further 4.1 release candidates until
>>>>> CLOUDSTACK-2492 has been fully resolved
>>>>> 
>>>>> -chip
>>>>> 
>>>>> [1] http://markmail.org/message/rw7vciq3r33biasb
>>>> 
>>>> 
>>>> So it appalls me that this problem exists. If I understand correctly,
>>>> from folks who commercially support derivatives of ACS. Lack of time
>>>> synchronization has been a factor in major outages, but that's
>>>> typically been between the hypervisors and management servers.
>>>> Regardless we realize (or should) that time is important for so many
>>>> reasons (encryption, logs, and scores of other reasons)
>>>> 
>>>> But when the rubber meets the road - here are the two points that
>>>> decide it for me.
>>>> 
>>>> 1. This is not a new problem. It's bad, it shouldn't exist, but it
>>>> does, and it has for some time it would seem. That suggests it's not
>>>> catastrophic, and hasn't yet blocked folks from getting things done
>>>> with CloudStack.
>>>> 
>>>> 2. I see no one stepping up to do the work. I am not personally a fan
>>>> of issuing what is the effective equivalent of an 'unfunded mandate'.
>>>> The problem isn't just one of building a new SSVM - it's one of
>>>> testing it, and repeating all of the validation that has already been
>>>> done with the existing sysvm.
>>>> 
>>>> Perhaps there is a middle ground (we have a default sysvm, but perhaps
>>>> like we are doing with the IPv6-enabled sysvm we have a time-enabled
>>>> sysvm available for folks.
>>>> 
>>>> Regardless - you called a vote, so I'll reluctantly cast a +1 - I hate
>>>> that we are seeing this problem, but with no one stepping up to do all
>>>> of the work, I'm not quite ready to hold a release hostage waiting to
>>>> find such a person.
>>>> 
>>>> --David
>>> 
> 


Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Chiradeep Vittal <Ch...@citrix.com>.
The latest 4.2 system VMs do have NTP built in. The earlier comment about
HVM is incorrect. It is PV (PVOPS, to be exact). With PVOPS Linux VMs,
there is no sync between domU and dom0.

On 5/21/13 2:45 PM, "Marcus Sorensen" <sh...@gmail.com> wrote:

>+1, it seems that it is no worse off then it ever has been, aside from
>the caveat that newer features are beginning to rely on it. I do agree
>though that it could perhaps be rolled into the newer system vm, as an
>option for people to use at their own risk.
>
>Of course, if someone wants to patch it up and get testing going, I'm
>all for that as well. I just don't see holding things up.
>
>On Tue, May 21, 2013 at 3:33 PM, John Burwell <jb...@basho.com> wrote:
>> David,
>>
>> I am willing to do the work.  However, as I understand the
>>circumstances, a complete build process for the system VMs has not been
>>released.  If I am incorrect in my understanding, I will do the work
>>necessary to fix the problem.
>>
>> Thanks,
>> -John
>>
>> On May 21, 2013, at 5:29 PM, David Nalley <da...@gnsa.us> wrote:
>>
>>> On Mon, May 20, 2013 at 4:15 PM, Chip Childers
>>> <ch...@sungard.com> wrote:
>>>> All,
>>>>
>>>> As discussed on another thread [1], we identified a bug
>>>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
>>>> are not configured to sync their time with either the host HV or an
>>>>NTP
>>>> service.  That bug affects the system VMs for all three primary HVs
>>>>(KVM,
>>>> Xen and vSphere).  Patches have been committed addressing vSphere and
>>>> KVM.  It appears that a correction for Xen would require the re-build
>>>>of
>>>> a system VM image and a full round of regression testing that image.
>>>>
>>>> Given that the discussion thread has not resulted in a consensus on
>>>>this
>>>> issue, I unfortunately believe that the only path forward is to call
>>>>for
>>>> a formal VOTE.
>>>>
>>>> Please respond with one of the following:
>>>>
>>>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
>>>>resolved
>>>> +0: don't care one way or the other
>>>> -1: do *not* proceed with any further 4.1 release candidates until
>>>>CLOUDSTACK-2492 has been fully resolved
>>>>
>>>> -chip
>>>>
>>>> [1] http://markmail.org/message/rw7vciq3r33biasb
>>>
>>>
>>> So it appalls me that this problem exists. If I understand correctly,
>>> from folks who commercially support derivatives of ACS. Lack of time
>>> synchronization has been a factor in major outages, but that's
>>> typically been between the hypervisors and management servers.
>>> Regardless we realize (or should) that time is important for so many
>>> reasons (encryption, logs, and scores of other reasons)
>>>
>>> But when the rubber meets the road - here are the two points that
>>> decide it for me.
>>>
>>> 1. This is not a new problem. It's bad, it shouldn't exist, but it
>>> does, and it has for some time it would seem. That suggests it's not
>>> catastrophic, and hasn't yet blocked folks from getting things done
>>> with CloudStack.
>>>
>>> 2. I see no one stepping up to do the work. I am not personally a fan
>>> of issuing what is the effective equivalent of an 'unfunded mandate'.
>>> The problem isn't just one of building a new SSVM - it's one of
>>> testing it, and repeating all of the validation that has already been
>>> done with the existing sysvm.
>>>
>>> Perhaps there is a middle ground (we have a default sysvm, but perhaps
>>> like we are doing with the IPv6-enabled sysvm we have a time-enabled
>>> sysvm available for folks.
>>>
>>> Regardless - you called a vote, so I'll reluctantly cast a +1 - I hate
>>> that we are seeing this problem, but with no one stepping up to do all
>>> of the work, I'm not quite ready to hold a release hostage waiting to
>>> find such a person.
>>>
>>> --David
>>


Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Marcus Sorensen <sh...@gmail.com>.
+1, it seems that it is no worse off than it ever has been, aside from
the caveat that newer features are beginning to rely on it. I do agree
though that it could perhaps be rolled into the newer system vm, as an
option for people to use at their own risk.

Of course, if someone wants to patch it up and get testing going, I'm
all for that as well. I just don't see holding things up.

On Tue, May 21, 2013 at 3:33 PM, John Burwell <jb...@basho.com> wrote:
> David,
>
> I am willing to do the work.  However, as I understand the circumstances, a complete build process for the system VMs has not been released.  If I am incorrect in my understanding, I will do the work necessary to fix the problem.
>
> Thanks,
> -John
>
> On May 21, 2013, at 5:29 PM, David Nalley <da...@gnsa.us> wrote:
>
>> On Mon, May 20, 2013 at 4:15 PM, Chip Childers
>> <ch...@sungard.com> wrote:
>>> All,
>>>
>>> As discussed on another thread [1], we identified a bug
>>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
>>> are not configured to sync their time with either the host HV or an NTP
>>> service.  That bug affects the system VMs for all three primary HVs (KVM,
>>> Xen and vSphere).  Patches have been committed addressing vSphere and
>>> KVM.  It appears that a correction for Xen would require the re-build of
>>> a system VM image and a full round of regression testing that image.
>>>
>>> Given that the discussion thread has not resulted in a consensus on this
>>> issue, I unfortunately believe that the only path forward is to call for
>>> a formal VOTE.
>>>
>>> Please respond with one of the following:
>>>
>>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being resolved
>>> +0: don't care one way or the other
>>> -1: do *not* proceed with any further 4.1 release candidates until CLOUDSTACK-2492 has been fully resolved
>>>
>>> -chip
>>>
>>> [1] http://markmail.org/message/rw7vciq3r33biasb
>>
>>
>> So it appalls me that this problem exists. If I understand correctly,
>> from folks who commercially support derivatives of ACS. Lack of time
>> synchronization has been a factor in major outages, but that's
>> typically been between the hypervisors and management servers.
>> Regardless we realize (or should) that time is important for so many
>> reasons (encryption, logs, and scores of other reasons)
>>
>> But when the rubber meets the road - here are the two points that
>> decide it for me.
>>
>> 1. This is not a new problem. It's bad, it shouldn't exist, but it
>> does, and it has for some time it would seem. That suggests it's not
>> catastrophic, and hasn't yet blocked folks from getting things done
>> with CloudStack.
>>
>> 2. I see no one stepping up to do the work. I am not personally a fan
>> of issuing what is the effective equivalent of an 'unfunded mandate'.
>> The problem isn't just one of building a new SSVM - it's one of
>> testing it, and repeating all of the validation that has already been
>> done with the existing sysvm.
>>
>> Perhaps there is a middle ground (we have a default sysvm, but perhaps
>> like we are doing with the IPv6-enabled sysvm we have a time-enabled
>> sysvm available for folks.
>>
>> Regardless - you called a vote, so I'll reluctantly cast a +1 - I hate
>> that we are seeing this problem, but with no one stepping up to do all
>> of the work, I'm not quite ready to hold a release hostage waiting to
>> find such a person.
>>
>> --David
>

Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by John Burwell <jb...@basho.com>.
David,

I am willing to do the work.  However, as I understand the circumstances, a complete build process for the system VMs has not been released.  If I am incorrect in my understanding, I will do the work necessary to fix the problem.

Thanks,
-John

On May 21, 2013, at 5:29 PM, David Nalley <da...@gnsa.us> wrote:

> On Mon, May 20, 2013 at 4:15 PM, Chip Childers
> <ch...@sungard.com> wrote:
>> All,
>> 
>> As discussed on another thread [1], we identified a bug
>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
>> are not configured to sync their time with either the host HV or an NTP
>> service.  That bug affects the system VMs for all three primary HVs (KVM,
>> Xen and vSphere).  Patches have been committed addressing vSphere and
>> KVM.  It appears that a correction for Xen would require the re-build of
>> a system VM image and a full round of regression testing that image.
>> 
>> Given that the discussion thread has not resulted in a consensus on this
>> issue, I unfortunately believe that the only path forward is to call for
>> a formal VOTE.
>> 
>> Please respond with one of the following:
>> 
>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being resolved
>> +0: don't care one way or the other
>> -1: do *not* proceed with any further 4.1 release candidates until CLOUDSTACK-2492 has been fully resolved
>> 
>> -chip
>> 
>> [1] http://markmail.org/message/rw7vciq3r33biasb
> 
> 
> So it appalls me that this problem exists. If I understand correctly,
> from folks who commercially support derivatives of ACS. Lack of time
> synchronization has been a factor in major outages, but that's
> typically been between the hypervisors and management servers.
> Regardless we realize (or should) that time is important for so many
> reasons (encryption, logs, and scores of other reasons)
> 
> But when the rubber meets the road - here are the two points that
> decide it for me.
> 
> 1. This is not a new problem. It's bad, it shouldn't exist, but it
> does, and it has for some time it would seem. That suggests it's not
> catastrophic, and hasn't yet blocked folks from getting things done
> with CloudStack.
> 
> 2. I see no one stepping up to do the work. I am not personally a fan
> of issuing what is the effective equivalent of an 'unfunded mandate'.
> The problem isn't just one of building a new SSVM - it's one of
> testing it, and repeating all of the validation that has already been
> done with the existing sysvm.
> 
> Perhaps there is a middle ground (we have a default sysvm, but perhaps
> like we are doing with the IPv6-enabled sysvm we have a time-enabled
> sysvm available for folks.
> 
> Regardless - you called a vote, so I'll reluctantly cast a +1 - I hate
> that we are seeing this problem, but with no one stepping up to do all
> of the work, I'm not quite ready to hold a release hostage waiting to
> find such a person.
> 
> --David


Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by David Nalley <da...@gnsa.us>.
On Mon, May 20, 2013 at 4:15 PM, Chip Childers
<ch...@sungard.com> wrote:
> All,
>
> As discussed on another thread [1], we identified a bug
> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
> are not configured to sync their time with either the host HV or an NTP
> service.  That bug affects the system VMs for all three primary HVs (KVM,
> Xen and vSphere).  Patches have been committed addressing vSphere and
> KVM.  It appears that a correction for Xen would require the re-build of
> a system VM image and a full round of regression testing that image.
>
> Given that the discussion thread has not resulted in a consensus on this
> issue, I unfortunately believe that the only path forward is to call for
> a formal VOTE.
>
> Please respond with one of the following:
>
> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being resolved
> +0: don't care one way or the other
> -1: do *not* proceed with any further 4.1 release candidates until CLOUDSTACK-2492 has been fully resolved
>
> -chip
>
> [1] http://markmail.org/message/rw7vciq3r33biasb


So it appalls me that this problem exists. If I understand correctly
from folks who commercially support derivatives of ACS, lack of time
synchronization has been a factor in major outages, but that's
typically been between the hypervisors and management servers.
Regardless, we realize (or should) that time is important for so many
reasons (encryption, logs, and scores of others).

But when the rubber meets the road - here are the two points that
decide it for me.

1. This is not a new problem. It's bad, it shouldn't exist, but it
does, and it has for some time, it would seem. That suggests it's not
catastrophic, and hasn't yet blocked folks from getting things done
with CloudStack.

2. I see no one stepping up to do the work. I am not personally a fan
of issuing what is the effective equivalent of an 'unfunded mandate'.
The problem isn't just one of building a new SSVM - it's one of
testing it, and repeating all of the validation that has already been
done with the existing sysvm.

Perhaps there is a middle ground (we have a default sysvm, but perhaps,
like we are doing with the IPv6-enabled sysvm, we have a time-enabled
sysvm available for folks).

Regardless - you called a vote, so I'll reluctantly cast a +1 - I hate
that we are seeing this problem, but with no one stepping up to do all
of the work, I'm not quite ready to hold a release hostage waiting to
find such a person.

--David

RE: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Animesh Chaturvedi <an...@citrix.com>.

> -----Original Message-----
> From: Chiradeep Vittal [mailto:Chiradeep.Vittal@citrix.com]
> Sent: Wednesday, May 22, 2013 9:57 AM
> To: dev@cloudstack.apache.org
> Subject: Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for
> CLOUDSTACK-2492?
> 
> THere's literally hundreds of other features that work just fine on XCP/Xen.
> What you are complaining against is the nature of time-based releases. The
> drag from 4.1 is jeopardizing 4.2, 4.3, etc, making a mockery of our stated goal
> of 3 releases a year.
> 
> There's developers who had to pull out their features from 4.1 because it
> wasn't ready by 1/31. Nobody whined asking for a couple more weeks. And
> that was the right thing to do. Now we have developers racing to meet the
> 5/31 deadline for 4.2 and they are being dragged into the quagmire of 4.1,
> which is a perfectly fine release for 99% of the users out there.
> 
[Animesh>] I agree we have to draw the line at some point. Otherwise we can keep tinkering to make it perfect, and then some new issue will come up. We have maintenance releases to iron out issues anyway, and we should leverage them.
> 
> On 5/21/13 7:39 PM, "Outback Dingo" <ou...@gmail.com> wrote:
> 
> >On Tue, May 21, 2013 at 10:17 PM, Chiradeep Vittal <
> >Chiradeep.Vittal@citrix.com> wrote:
> >
> >> Outback, it would be helpful to understand the harm you are facing without
> >> this fix.
> >> Are you operating a CloudStack cloud already? Have you lost Vms/ lost data
> >> / faced unexplained crashes, or found your cloud unavailable due to this?
> >> Note that this bug has been there since 2.2
> >>
> >>
> >It would break a current migration path to s3 storage capabilities
> >currently being rolled out for XEN based hypervisors as it was
> >mentioned in the thread. This negates our and others capabilities to be
> >inline with other Hypervisors, and having to wait until a fix/patch can
> >be applied. It also negates current infrastructure design for
> >commercial and private clouds based on XEN/XCP for a more robust
> >storage infrastructure then is currently capable.
> >
> >IMHO, aside from the technical details, your basically telling all XEN
> >infrastructure, too bad. no new s3 infrastructure for you, from my
> >perspective this is both bad practice, and again, leaves XEN/XCP users
> >wanting, and waiting again.....
> >
> >
> >> On 5/21/13 5:59 PM, "Outback Dingo" <ou...@gmail.com> wrote:
> >>
> >> >On Mon, May 20, 2013 at 4:15 PM, Chip Childers
> >> ><ch...@sungard.com>wrote:
> >> >
> >> >> All,
> >> >>
> >> >> As discussed on another thread [1], we identified a bug
> >> >> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
> >> >> are not configured to sync their time with either the host HV or an NTP
> >> >> service.  That bug affects the system VMs for all three primary HVs (KVM,
> >> >> Xen and vSphere).  Patches have been committed addressing vSphere and
> >> >> KVM.  It appears that a correction for Xen would require the re-build of
> >> >> a system VM image and a full round of regression testing that image.
> >> >>
> >> >> Given that the discussion thread has not resulted in a consensus on this
> >> >> issue, I unfortunately believe that the only path forward is to call for
> >> >> a formal VOTE.
> >> >>
> >> >> Please respond with one of the following:
> >> >>
> >> >> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being resolved
> >> >> +0: don't care one way or the other
> >> >> -1: do *not* proceed with any further 4.1 release candidates until CLOUDSTACK-2492 has been fully resolved
> >> >>
> >> >>
> >> >-1  do *not* proceed
> >> >
> >> >
> >> >> -chip
> >> >>
> >> >> [1] http://markmail.org/message/rw7vciq3r33biasb
> >> >>
> >>
> >>


Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Marcus Sorensen <sh...@gmail.com>.
I understand the sentiment, but as far as new features are concerned, there
is some precedent for new features being "Xen only" or "KVM only" for a
release. That said, I've got about four 4.1 environments running, and have
been using April and May builds of the 4.2 template on all of them, and
haven't run into any bugs that haven't been patched and picked into 4.1. I
think there's a good chance that the barrier to getting it to pass QA is
relatively low, and we've already got a volunteer.

Does anyone have a running environment older than 4.1, other than a Dev
environment that gets rebuilt often, where we can verify that clock skew
has been the norm for the last few years' worth of releases on Xen?

On May 21, 2013 8:39 PM, "Outback Dingo" <ou...@gmail.com> wrote:

> On Tue, May 21, 2013 at 10:17 PM, Chiradeep Vittal <
> Chiradeep.Vittal@citrix.com> wrote:
>
> > Outback, it would be helpful to understand the harm you are facing
> without
> > this fix.
> > Are you operating a CloudStack cloud already? Have you lost Vms/ lost
> data
> > / faced unexplained crashes, or found your cloud unavailable due to this?
> > Note that this bug has been there since 2.2
> >
> >
> It would break a current migration path to s3 storage capabilities
> currently being rolled out for XEN based hypervisors
> as it was mentioned in the thread. This negates our and others capabilities
> to be inline with other Hypervisors, and
> having to wait until a fix/patch can be applied. It also negates current
> infrastructure design for commercial
> and private clouds based on XEN/XCP for a more robust storage
> infrastructure then is currently capable.
>
> IMHO, aside from the technical details, your basically telling all XEN
> infrastructure, too bad. no new s3 infrastructure for you, from my
> perspective this is both bad practice, and again, leaves XEN/XCP users
> wanting, and waiting again.....
>
>
> > On 5/21/13 5:59 PM, "Outback Dingo" <ou...@gmail.com> wrote:
> >
> > >On Mon, May 20, 2013 at 4:15 PM, Chip Childers
> > ><ch...@sungard.com>wrote:
> > >
> > >> All,
> > >>
> > >> As discussed on another thread [1], we identified a bug
> > >> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
> > >> are not configured to sync their time with either the host HV or an
> NTP
> > >> service.  That bug affects the system VMs for all three primary HVs
> > >>(KVM,
> > >> Xen and vSphere).  Patches have been committed addressing vSphere and
> > >> KVM.  It appears that a correction for Xen would require the re-build
> of
> > >> a system VM image and a full round of regression testing that image.
> > >>
> > >> Given that the discussion thread has not resulted in a consensus on
> this
> > >> issue, I unfortunately believe that the only path forward is to call
> for
> > >> a formal VOTE.
> > >>
> > >> Please respond with one of the following:
> > >>
> > >> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
> > >> resolved
> > >> +0: don't care one way or the other
> > >> -1: do *not* proceed with any further 4.1 release candidates until
> > >> CLOUDSTACK-2492 has been fully resolved
> > >>
> > >>
> > >-1  do *not* proceed
> > >
> > >
> > >> -chip
> > >>
> > >> [1] http://markmail.org/message/rw7vciq3r33biasb
> > >>
> >
> >
>

Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Marcus Sorensen <sh...@gmail.com>.
And just for my own clarification, the only S3 functionality it jeopardizes
for 4.1 is the cross-zone template sync, correct? Not any S3-based primary
or secondary storage, right?

On May 21, 2013 8:39 PM, "Outback Dingo" <ou...@gmail.com> wrote:

> On Tue, May 21, 2013 at 10:17 PM, Chiradeep Vittal <
> Chiradeep.Vittal@citrix.com> wrote:
>
> > Outback, it would be helpful to understand the harm you are facing
> without
> > this fix.
> > Are you operating a CloudStack cloud already? Have you lost Vms/ lost
> data
> > / faced unexplained crashes, or found your cloud unavailable due to this?
> > Note that this bug has been there since 2.2
> >
> >
> It would break a current migration path to s3 storage capabilities
> currently being rolled out for XEN based hypervisors
> as it was mentioned in the thread. This negates our and others capabilities
> to be inline with other Hypervisors, and
> having to wait until a fix/patch can be applied. It also negates current
> infrastructure design for commercial
> and private clouds based on XEN/XCP for a more robust storage
> infrastructure then is currently capable.
>
> IMHO, aside from the technical details, your basically telling all XEN
> infrastructure, too bad. no new s3 infrastructure for you, from my
> perspective this is both bad practice, and again, leaves XEN/XCP users
> wanting, and waiting again.....
>
>
> > On 5/21/13 5:59 PM, "Outback Dingo" <ou...@gmail.com> wrote:
> >
> > >On Mon, May 20, 2013 at 4:15 PM, Chip Childers
> > ><ch...@sungard.com>wrote:
> > >
> > >> All,
> > >>
> > >> As discussed on another thread [1], we identified a bug
> > >> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
> > >> are not configured to sync their time with either the host HV or an
> NTP
> > >> service.  That bug affects the system VMs for all three primary HVs
> > >>(KVM,
> > >> Xen and vSphere).  Patches have been committed addressing vSphere and
> > >> KVM.  It appears that a correction for Xen would require the re-build
> of
> > >> a system VM image and a full round of regression testing that image.
> > >>
> > >> Given that the discussion thread has not resulted in a consensus on
> this
> > >> issue, I unfortunately believe that the only path forward is to call
> for
> > >> a formal VOTE.
> > >>
> > >> Please respond with one of the following:
> > >>
> > >> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
> > >> resolved
> > >> +0: don't care one way or the other
> > >> -1: do *not* proceed with any further 4.1 release candidates until
> > >> CLOUDSTACK-2492 has been fully resolved
> > >>
> > >>
> > >-1  do *not* proceed
> > >
> > >
> > >> -chip
> > >>
> > >> [1] http://markmail.org/message/rw7vciq3r33biasb
> > >>
> >
> >
>

Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Chiradeep Vittal <Ch...@citrix.com>.
There are literally hundreds of other features that work just fine on
XCP/Xen.
What you are complaining about is the nature of time-based releases. The
drag from 4.1 is jeopardizing 4.2, 4.3, etc., making a mockery of our
stated goal of 3 releases a year.

There are developers who had to pull their features out of 4.1 because they
weren't ready by 1/31. Nobody whined asking for a couple more weeks. And
that was the right thing to do. Now we have developers racing to meet the
5/31 deadline for 4.2 and they are being dragged into the quagmire of 4.1,
which is a perfectly fine release for 99% of the users out there.


On 5/21/13 7:39 PM, "Outback Dingo" <ou...@gmail.com> wrote:

>On Tue, May 21, 2013 at 10:17 PM, Chiradeep Vittal <
>Chiradeep.Vittal@citrix.com> wrote:
>
>> Outback, it would be helpful to understand the harm you are facing
>>without
>> this fix.
>> Are you operating a CloudStack cloud already? Have you lost Vms/ lost
>>data
>> / faced unexplained crashes, or found your cloud unavailable due to
>>this?
>> Note that this bug has been there since 2.2
>>
>>
>It would break a current migration path to s3 storage capabilities
>currently being rolled out for XEN based hypervisors
>as it was mentioned in the thread. This negates our and others
>capabilities
>to be inline with other Hypervisors, and
>having to wait until a fix/patch can be applied. It also negates current
>infrastructure design for commercial
>and private clouds based on XEN/XCP for a more robust storage
>infrastructure then is currently capable.
>
>IMHO, aside from the technical details, your basically telling all XEN
>infrastructure, too bad. no new s3 infrastructure for you, from my
>perspective this is both bad practice, and again, leaves XEN/XCP users
>wanting, and waiting again.....
>
>
>> On 5/21/13 5:59 PM, "Outback Dingo" <ou...@gmail.com> wrote:
>>
>> >On Mon, May 20, 2013 at 4:15 PM, Chip Childers
>> ><ch...@sungard.com>wrote:
>> >
>> >> All,
>> >>
>> >> As discussed on another thread [1], we identified a bug
>> >> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
>> >> are not configured to sync their time with either the host HV or an
>>NTP
>> >> service.  That bug affects the system VMs for all three primary HVs
>> >>(KVM,
>> >> Xen and vSphere).  Patches have been committed addressing vSphere and
>> >> KVM.  It appears that a correction for Xen would require the
>>re-build of
>> >> a system VM image and a full round of regression testing that image.
>> >>
>> >> Given that the discussion thread has not resulted in a consensus on
>>this
>> >> issue, I unfortunately believe that the only path forward is to call
>>for
>> >> a formal VOTE.
>> >>
>> >> Please respond with one of the following:
>> >>
>> >> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
>> >> resolved
>> >> +0: don't care one way or the other
>> >> -1: do *not* proceed with any further 4.1 release candidates until
>> >> CLOUDSTACK-2492 has been fully resolved
>> >>
>> >>
>> >-1  do *not* proceed
>> >
>> >
>> >> -chip
>> >>
>> >> [1] http://markmail.org/message/rw7vciq3r33biasb
>> >>
>>
>>


Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Outback Dingo <ou...@gmail.com>.
On Tue, May 21, 2013 at 10:17 PM, Chiradeep Vittal <
Chiradeep.Vittal@citrix.com> wrote:

> Outback, it would be helpful to understand the harm you are facing without
> this fix.
> Are you operating a CloudStack cloud already? Have you lost Vms/ lost data
> / faced unexplained crashes, or found your cloud unavailable due to this?
> Note that this bug has been there since 2.2
>
>
It would break a current migration path to the S3 storage capabilities
currently being rolled out for XEN-based hypervisors,
as was mentioned in the thread. This negates our and others' ability
to be in line with other hypervisors, and
leaves us having to wait until a fix/patch can be applied. It also negates
current infrastructure designs for commercial
and private clouds based on XEN/XCP that call for a more robust storage
infrastructure than is currently possible.

IMHO, aside from the technical details, you're basically telling all XEN
infrastructure: too bad, no new S3 infrastructure for you. From my
perspective this is both bad practice and, again, leaves XEN/XCP users
wanting, and waiting again.....


> On 5/21/13 5:59 PM, "Outback Dingo" <ou...@gmail.com> wrote:
>
> >On Mon, May 20, 2013 at 4:15 PM, Chip Childers
> ><ch...@sungard.com>wrote:
> >
> >> All,
> >>
> >> As discussed on another thread [1], we identified a bug
> >> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
> >> are not configured to sync their time with either the host HV or an NTP
> >> service.  That bug affects the system VMs for all three primary HVs
> >>(KVM,
> >> Xen and vSphere).  Patches have been committed addressing vSphere and
> >> KVM.  It appears that a correction for Xen would require the re-build of
> >> a system VM image and a full round of regression testing that image.
> >>
> >> Given that the discussion thread has not resulted in a consensus on this
> >> issue, I unfortunately believe that the only path forward is to call for
> >> a formal VOTE.
> >>
> >> Please respond with one of the following:
> >>
> >> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
> >> resolved
> >> +0: don't care one way or the other
> >> -1: do *not* proceed with any further 4.1 release candidates until
> >> CLOUDSTACK-2492 has been fully resolved
> >>
> >>
> >-1  do *not* proceed
> >
> >
> >> -chip
> >>
> >> [1] http://markmail.org/message/rw7vciq3r33biasb
> >>
>
>

Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by John Burwell <jb...@basho.com>.
Chiradeep,

Have clusters running previous versions actually been checked for this
issue, or are we stating that based on code review?  I can say that in
testing done earlier this year the SSVM was syncing with the host on
devcloud: whenever I hit S3 clock sync issues, syncing the host clock
remedied the problem every time.  Now, syncing the host clock has no
effect on the SSVM.  It is entirely possible that I got lucky or that
some other coincidence made it appear to be functional, but I think we
need to verify the assumption that this issue is present in older
releases.

Thanks,
-John




On May 21, 2013, at 10:18 PM, Chiradeep Vittal
<Ch...@citrix.com> wrote:

> Outback, it would be helpful to understand the harm you are facing without
> this fix.
> Are you operating a CloudStack cloud already? Have you lost Vms/ lost data
> / faced unexplained crashes, or found your cloud unavailable due to this?
> Note that this bug has been there since 2.2
>
>
> On 5/21/13 5:59 PM, "Outback Dingo" <ou...@gmail.com> wrote:
>
>> On Mon, May 20, 2013 at 4:15 PM, Chip Childers
>> <ch...@sungard.com>wrote:
>>
>>> All,
>>>
>>> As discussed on another thread [1], we identified a bug
>>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
>>> are not configured to sync their time with either the host HV or an NTP
>>> service.  That bug affects the system VMs for all three primary HVs
>>> (KVM,
>>> Xen and vSphere).  Patches have been committed addressing vSphere and
>>> KVM.  It appears that a correction for Xen would require the re-build of
>>> a system VM image and a full round of regression testing that image.
>>>
>>> Given that the discussion thread has not resulted in a consensus on this
>>> issue, I unfortunately believe that the only path forward is to call for
>>> a formal VOTE.
>>>
>>> Please respond with one of the following:
>>>
>>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
>>> resolved
>>> +0: don't care one way or the other
>>> -1: do *not* proceed with any further 4.1 release candidates until
>>> CLOUDSTACK-2492 has been fully resolved
>> -1  do *not* proceed
>>
>>
>>> -chip
>>>
>>> [1] http://markmail.org/message/rw7vciq3r33biasb
>

Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Chiradeep Vittal <Ch...@citrix.com>.
Outback, it would be helpful to understand the harm you are facing without
this fix.
Are you operating a CloudStack cloud already? Have you lost VMs, lost data,
faced unexplained crashes, or found your cloud unavailable due to this?
Note that this bug has been there since 2.2.


On 5/21/13 5:59 PM, "Outback Dingo" <ou...@gmail.com> wrote:

>On Mon, May 20, 2013 at 4:15 PM, Chip Childers
><ch...@sungard.com>wrote:
>
>> All,
>>
>> As discussed on another thread [1], we identified a bug
>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
>> are not configured to sync their time with either the host HV or an NTP
>> service.  That bug affects the system VMs for all three primary HVs
>>(KVM,
>> Xen and vSphere).  Patches have been committed addressing vSphere and
>> KVM.  It appears that a correction for Xen would require the re-build of
>> a system VM image and a full round of regression testing that image.
>>
>> Given that the discussion thread has not resulted in a consensus on this
>> issue, I unfortunately believe that the only path forward is to call for
>> a formal VOTE.
>>
>> Please respond with one of the following:
>>
>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
>> resolved
>> +0: don't care one way or the other
>> -1: do *not* proceed with any further 4.1 release candidates until
>> CLOUDSTACK-2492 has been fully resolved
>>
>>
>-1  do *not* proceed
>
>
>> -chip
>>
>> [1] http://markmail.org/message/rw7vciq3r33biasb
>>


Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Outback Dingo <ou...@gmail.com>.
On Mon, May 20, 2013 at 4:15 PM, Chip Childers <ch...@sungard.com>wrote:

> All,
>
> As discussed on another thread [1], we identified a bug
> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
> are not configured to sync their time with either the host HV or an NTP
> service.  That bug affects the system VMs for all three primary HVs (KVM,
> Xen and vSphere).  Patches have been committed addressing vSphere and
> KVM.  It appears that a correction for Xen would require the re-build of
> a system VM image and a full round of regression testing that image.
>
> Given that the discussion thread has not resulted in a consensus on this
> issue, I unfortunately believe that the only path forward is to call for
> a formal VOTE.
>
> Please respond with one of the following:
>
> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
> resolved
> +0: don't care one way or the other
> -1: do *not* proceed with any further 4.1 release candidates until
> CLOUDSTACK-2492 has been fully resolved
>
>
-1  do *not* proceed


> -chip
>
> [1] http://markmail.org/message/rw7vciq3r33biasb
>

Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Chiradeep Vittal <Ch...@citrix.com>.
No, they couldn't have set that, since this flag is not available on the
Debian 2.6.32 kernel.

On 5/20/13 5:26 PM, "John Burwell" <jb...@basho.com> wrote:

>Chip,
>
>Previous releases of CloudStack may have set
>/proc/sys/xen/independent_wallclock in the cloud-early-config script
>which will properly sync clock for paravirtualized VMs.  However, NTP is
>only solution that correct clock drift for both para and full virtualized
>VMs.  Admittedly, I haven't looked in the history to see if this strategy
>was previously employed.
>
>Thanks,
>-John
>
>
>On May 20, 2013, at 8:18 PM, Chip Childers <ch...@sungard.com>
>wrote:
>
>> On May 20, 2013, at 7:14 PM, John Burwell <jb...@basho.com> wrote:
>> 
>>> All,
>>> 
>>> While it is tough to do, I must cast a -1 for the following reasons:
>>> 
>>> Given that system VMs write files, this defect makes every file
>>>created/modified timestamp unreliable.
>>> Operational log correlation/debugging is nearly impossible since the
>>>clock is out of sync.
>>> It renders S3-backed Secondary Storage unreliable/useless
>>> 
>>> As Ahmad pointed out, there are likely other instabilities/defects
>>>lurking due to this issue that we haven't discovered.
>>> 
>>> I think we also need to determine whether or not this issue was
>>>introduced in 4.1.  If not, we should consider back porting these fixes.
>> 
>> It can't be this, because the system VM's for 4.1 are the exact same
>> images since 3.x releases.
>> 
>>> 
>>> Thanks,
>>> -John
>>> 
>>> On May 20, 2013, at 5:29 PM, Ahmad Emneina <ae...@gmail.com> wrote:
>>> 
>>>> I'm +0 on this, dont want to hold up a release with a neg 1 vote. My
>>>> opinion is that time sync is critical piece for system vm's. Having
>>>>the
>>>> wrong time can lead to system vm's booting and waiting for manual
>>>> intervention via consistency checks (potential blocker bug IMO).
>>>> 
>>>> 
>>>> On Mon, May 20, 2013 at 2:03 PM, Chiradeep Vittal <
>>>> Chiradeep.Vittal@citrix.com> wrote:
>>>> 
>>>>> +1
>>>>> 
>>>>> On 5/20/13 1:15 PM, "Chip Childers" <ch...@sungard.com>
>>>>>wrote:
>>>>> 
>>>>>> All,
>>>>>> 
>>>>>> As discussed on another thread [1], we identified a bug
>>>>>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System
>>>>>>VMs
>>>>>> are not configured to sync their time with either the host HV or an
>>>>>>NTP
>>>>>> service.  That bug affects the system VMs for all three primary HVs
>>>>>>(KVM,
>>>>>> Xen and vSphere).  Patches have been committed addressing vSphere
>>>>>>and
>>>>>> KVM.  It appears that a correction for Xen would require the
>>>>>>re-build of
>>>>>> a system VM image and a full round of regression testing that image.
>>>>>> 
>>>>>> Given that the discussion thread has not resulted in a consensus on
>>>>>>this
>>>>>> issue, I unfortunately believe that the only path forward is to
>>>>>>call for
>>>>>> a formal VOTE.
>>>>>> 
>>>>>> Please respond with one of the following:
>>>>>> 
>>>>>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492
>>>>>>being
>>>>>> resolved
>>>>>> +0: don't care one way or the other
>>>>>> -1: do *not* proceed with any further 4.1 release candidates until
>>>>>> CLOUDSTACK-2492 has been fully resolved
>>>>>> 
>>>>>> -chip
>>>>>> 
>>>>>> [1] http://markmail.org/message/rw7vciq3r33biasb
>>> 
>


Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by John Burwell <jb...@basho.com>.
Chip,

Previous releases of CloudStack may have set /proc/sys/xen/independent_wallclock in the cloud-early-config script, which will properly sync the clock for paravirtualized VMs.  However, NTP is the only solution that corrects clock drift for both paravirtualized and fully virtualized VMs.  Admittedly, I haven't looked in the history to see if this strategy was previously employed.
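
For anyone who wants to check a running system VM, here is a minimal sketch of reading that flag. It assumes the usual meaning on classic Xen paravirtualized kernels (0 means the guest clock follows the host, 1 means it runs independently), and the function name and return strings are only illustrative. On kernels that do not expose the flag at all, it simply reports that.

```python
from pathlib import Path

# Path named in this thread; only some Xen paravirtualized kernels expose it.
WALLCLOCK = Path("/proc/sys/xen/independent_wallclock")


def wallclock_mode(path: Path = WALLCLOCK) -> str:
    """Describe how a guest's clock relates to the Xen host clock.

    Assumed semantics: "0" means the guest follows the host clock,
    "1" means the guest keeps its own clock (so NTP would be needed).
    """
    try:
        value = path.read_text().strip()
    except OSError:
        # The flag does not exist (or is unreadable) on this kernel.
        return "unavailable"
    if value == "0":
        return "follows-host"
    if value == "1":
        return "independent"
    return "unknown"


print(wallclock_mode())
```

If this reports "unavailable", the flag-based approach cannot apply to that kernel, and something like NTP inside the guest would be the remaining option.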

Thanks,
-John


On May 20, 2013, at 8:18 PM, Chip Childers <ch...@sungard.com> wrote:

> On May 20, 2013, at 7:14 PM, John Burwell <jb...@basho.com> wrote:
> 
>> All,
>> 
>> While it is tough to do, I must cast a -1 for the following reasons:
>> 
>> Given that system VMs write files, this defect makes every file created/modified timestamp unreliable.
>> Operational log correlation/debugging is nearly impossible since the clock is out of sync.
>> It renders S3-backed Secondary Storage unreliable/useless
>> 
>> As Ahmad pointed out, there are likely other instabilities/defects lurking due to this issue that we haven't discovered.
>> 
>> I think we also need to determine whether or not this issue was introduced in 4.1.  If not, we should consider back porting these fixes.
> 
> It can't be this, because the system VM's for 4.1 are the exact same
> images since 3.x releases.
> 
>> 
>> Thanks,
>> -John
>> 
>> On May 20, 2013, at 5:29 PM, Ahmad Emneina <ae...@gmail.com> wrote:
>> 
>>> I'm +0 on this, dont want to hold up a release with a neg 1 vote. My
>>> opinion is that time sync is critical piece for system vm's. Having the
>>> wrong time can lead to system vm's booting and waiting for manual
>>> intervention via consistency checks (potential blocker bug IMO).
>>> 
>>> 
>>> On Mon, May 20, 2013 at 2:03 PM, Chiradeep Vittal <
>>> Chiradeep.Vittal@citrix.com> wrote:
>>> 
>>>> +1
>>>> 
>>>> On 5/20/13 1:15 PM, "Chip Childers" <ch...@sungard.com> wrote:
>>>> 
>>>>> All,
>>>>> 
>>>>> As discussed on another thread [1], we identified a bug
>>>>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
>>>>> are not configured to sync their time with either the host HV or an NTP
>>>>> service.  That bug affects the system VMs for all three primary HVs (KVM,
>>>>> Xen and vSphere).  Patches have been committed addressing vSphere and
>>>>> KVM.  It appears that a correction for Xen would require the re-build of
>>>>> a system VM image and a full round of regression testing that image.
>>>>> 
>>>>> Given that the discussion thread has not resulted in a consensus on this
>>>>> issue, I unfortunately believe that the only path forward is to call for
>>>>> a formal VOTE.
>>>>> 
>>>>> Please respond with one of the following:
>>>>> 
>>>>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
>>>>> resolved
>>>>> +0: don't care one way or the other
>>>>> -1: do *not* proceed with any further 4.1 release candidates until
>>>>> CLOUDSTACK-2492 has been fully resolved
>>>>> 
>>>>> -chip
>>>>> 
>>>>> [1] http://markmail.org/message/rw7vciq3r33biasb
>> 


Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Chip Childers <ch...@sungard.com>.
On May 20, 2013, at 7:14 PM, John Burwell <jb...@basho.com> wrote:

> All,
>
> While it is tough to do, I must cast a -1 for the following reasons:
>
> Given that system VMs write files, this defect makes every file created/modified timestamp unreliable.
> Operational log correlation/debugging is nearly impossible since the clock is out of sync.
> It renders S3-backed Secondary Storage unreliable/useless
>
> As Ahmad pointed out, there are likely other instabilities/defects lurking due to this issue that we haven't discovered.
>
> I think we also need to determine whether or not this issue was introduced in 4.1.  If not, we should consider back porting these fixes.

It can't have been introduced in 4.1, because the system VMs for 4.1 are
the exact same images used since the 3.x releases.

>
> Thanks,
> -John
>
> On May 20, 2013, at 5:29 PM, Ahmad Emneina <ae...@gmail.com> wrote:
>
>> I'm +0 on this, dont want to hold up a release with a neg 1 vote. My
>> opinion is that time sync is critical piece for system vm's. Having the
>> wrong time can lead to system vm's booting and waiting for manual
>> intervention via consistency checks (potential blocker bug IMO).
>>
>>
>> On Mon, May 20, 2013 at 2:03 PM, Chiradeep Vittal <
>> Chiradeep.Vittal@citrix.com> wrote:
>>
>>> +1
>>>
>>> On 5/20/13 1:15 PM, "Chip Childers" <ch...@sungard.com> wrote:
>>>
>>>> All,
>>>>
>>>> As discussed on another thread [1], we identified a bug
>>>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
>>>> are not configured to sync their time with either the host HV or an NTP
>>>> service.  That bug affects the system VMs for all three primary HVs (KVM,
>>>> Xen and vSphere).  Patches have been committed addressing vSphere and
>>>> KVM.  It appears that a correction for Xen would require the re-build of
>>>> a system VM image and a full round of regression testing that image.
>>>>
>>>> Given that the discussion thread has not resulted in a consensus on this
>>>> issue, I unfortunately believe that the only path forward is to call for
>>>> a formal VOTE.
>>>>
>>>> Please respond with one of the following:
>>>>
>>>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
>>>> resolved
>>>> +0: don't care one way or the other
>>>> -1: do *not* proceed with any further 4.1 release candidates until
>>>> CLOUDSTACK-2492 has been fully resolved
>>>>
>>>> -chip
>>>>
>>>> [1] http://markmail.org/message/rw7vciq3r33biasb
>

Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by John Burwell <jb...@basho.com>.
All,

While it is tough to do, I must cast a -1 for the following reasons:

Given that system VMs write files, this defect makes every file created/modified timestamp unreliable.
Operational log correlation/debugging is nearly impossible since the clock is out of sync.
It renders S3-backed Secondary Storage unreliable/useless
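
To illustrate the S3 point, here is a rough sketch of a skew check. It assumes the 15-minute request-time tolerance that Amazon S3 documents (the RequestTimeTooSkewed error); other S3-compatible stores may use a different limit. The function names and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumed tolerance: the limit Amazon S3 documents for RequestTimeTooSkewed.
MAX_SKEW = timedelta(minutes=15)


def clock_skew(server_date_header: str, local_now: datetime) -> timedelta:
    """Return local time minus the time given in an HTTP Date header."""
    return local_now - parsedate_to_datetime(server_date_header)


def within_tolerance(skew: timedelta, limit: timedelta = MAX_SKEW) -> bool:
    """True if the clock is close enough (ahead or behind) to be accepted."""
    return abs(skew) <= limit


# Hypothetical example: a system VM whose clock has drifted 20 minutes ahead.
header = "Mon, 20 May 2013 20:15:14 GMT"
drifted_now = datetime(2013, 5, 20, 20, 35, 14, tzinfo=timezone.utc)
skew = clock_skew(header, drifted_now)
print(skew, within_tolerance(skew))
```

With a drift like that, every signed request from the SSVM would be rejected until the clock is corrected.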

As Ahmad pointed out, there are likely other instabilities/defects lurking due to this issue that we haven't discovered.

I think we also need to determine whether or not this issue was introduced in 4.1.  If not, we should consider back porting these fixes.

Thanks,
-John

On May 20, 2013, at 5:29 PM, Ahmad Emneina <ae...@gmail.com> wrote:

> I'm +0 on this, dont want to hold up a release with a neg 1 vote. My
> opinion is that time sync is critical piece for system vm's. Having the
> wrong time can lead to system vm's booting and waiting for manual
> intervention via consistency checks (potential blocker bug IMO).
> 
> 
> On Mon, May 20, 2013 at 2:03 PM, Chiradeep Vittal <
> Chiradeep.Vittal@citrix.com> wrote:
> 
>> +1
>> 
>> On 5/20/13 1:15 PM, "Chip Childers" <ch...@sungard.com> wrote:
>> 
>>> All,
>>> 
>>> As discussed on another thread [1], we identified a bug
>>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
>>> are not configured to sync their time with either the host HV or an NTP
>>> service.  That bug affects the system VMs for all three primary HVs (KVM,
>>> Xen and vSphere).  Patches have been committed addressing vSphere and
>>> KVM.  It appears that a correction for Xen would require the re-build of
>>> a system VM image and a full round of regression testing that image.
>>> 
>>> Given that the discussion thread has not resulted in a consensus on this
>>> issue, I unfortunately believe that the only path forward is to call for
>>> a formal VOTE.
>>> 
>>> Please respond with one of the following:
>>> 
>>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
>>> resolved
>>> +0: don't care one way or the other
>>> -1: do *not* proceed with any further 4.1 release candidates until
>>> CLOUDSTACK-2492 has been fully resolved
>>> 
>>> -chip
>>> 
>>> [1] http://markmail.org/message/rw7vciq3r33biasb
>> 
>> 


Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Ahmad Emneina <ae...@gmail.com>.
I'm +0 on this; I don't want to hold up a release with a -1 vote. My
opinion is that time sync is a critical piece for system VMs. Having the
wrong time can lead to system VMs booting and waiting for manual
intervention via consistency checks (potential blocker bug IMO).


On Mon, May 20, 2013 at 2:03 PM, Chiradeep Vittal <
Chiradeep.Vittal@citrix.com> wrote:

> +1
>
> On 5/20/13 1:15 PM, "Chip Childers" <ch...@sungard.com> wrote:
>
> >All,
> >
> >As discussed on another thread [1], we identified a bug
> >(CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
> >are not configured to sync their time with either the host HV or an NTP
> >service.  That bug affects the system VMs for all three primary HVs (KVM,
> >Xen and vSphere).  Patches have been committed addressing vSphere and
> >KVM.  It appears that a correction for Xen would require the re-build of
> >a system VM image and a full round of regression testing that image.
> >
> >Given that the discussion thread has not resulted in a consensus on this
> >issue, I unfortunately believe that the only path forward is to call for
> >a formal VOTE.
> >
> >Please respond with one of the following:
> >
> >+1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
> >resolved
> >+0: don't care one way or the other
> >-1: do *not* proceed with any further 4.1 release candidates until
> >CLOUDSTACK-2492 has been fully resolved
> >
> >-chip
> >
> >[1] http://markmail.org/message/rw7vciq3r33biasb
>
>

Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Chiradeep Vittal <Ch...@citrix.com>.
+1

On 5/20/13 1:15 PM, "Chip Childers" <ch...@sungard.com> wrote:

>All,
>
>As discussed on another thread [1], we identified a bug
>(CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
>are not configured to sync their time with either the host HV or an NTP
>service.  That bug affects the system VMs for all three primary HVs (KVM,
>Xen and vSphere).  Patches have been committed addressing vSphere and
>KVM.  It appears that a correction for Xen would require the re-build of
>a system VM image and a full round of regression testing that image.
>
>Given that the discussion thread has not resulted in a consensus on this
>issue, I unfortunately believe that the only path forward is to call for
>a formal VOTE.
>
>Please respond with one of the following:
>
>+1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
>resolved
>+0: don't care one way or the other
>-1: do *not* proceed with any further 4.1 release candidates until
>CLOUDSTACK-2492 has been fully resolved
>
>-chip
>
>[1] http://markmail.org/message/rw7vciq3r33biasb


Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Francois Gaudreault <fg...@cloudops.com>.
Maybe I am wrong, but are we debating a problem that can be fixed by
adding a simple package to the systemvm?


On Wed, May 22, 2013 at 1:01 PM, Chiradeep Vittal <
Chiradeep.Vittal@citrix.com> wrote:

> As the author of the original systemvm (and current contributor to the
> systemvm), I can confidently state that this issue has been there since
> 2.2.0.
> The issue is that the Debian 2.6.32 kernel is a PVOPS kernel. All PVOPs
> kernels require ntp to keep time sync.
> http://www.gossamer-threads.com/lists/xen/users/234750
>
> On 5/22/13 9:56 AM, "Marcus Sorensen" <sh...@gmail.com> wrote:
>
> >If this were creating a new bug, for example "oh, your VPCs won't work
> >anymore for this release", or "here's a new UI, but it's really buggy
> >and barely functional" then I'd agree with this train of thought.
> >Instead, we are saying "we recently found out that since 2.2.x
> >cloudstack has had this behavior, and it will be fixed in 4.2"*.
> >That's a totally different thing. If 4.1 ends up being a poor quality
> >release that everyone remembers compared to others, it's not going to
> >be because we didn't address something that has been around for
> >several releases, that nobody has noticed.
> >
> >* Assuming we verify that it's not a regression, which I'm still very
> >interested in knowing
> >
> >On Wed, May 22, 2013 at 9:51 AM, John Burwell <jb...@basho.com> wrote:
> >> Marcus,
> >>
> >> I would say that the only thing for an open source project worse than
> >>not releasing is releasing a poor quality release.  A late release with
> >>high quality is soon forgotten.  An on-time or late release with poor
> >>quality lingers in folks memory. The KDE project made the near fatal
> >>mistake of following the same logic when they release 4.0, and the
> >>reputation of KDE 4.x continues to suffer from it to this day.
> >>CloudStack is trusted to run at the core our user's operations.  In my
> >>view, if we err, we should err on the side of quality to avoid of
> >>erosion of that trust.  If we ever lost that trust, our new features
> >>would never be evaluated.
> >
> >>
> >> Thanks,
> >> -John
> >>
> >> On May 22, 2013, at 11:18 AM, Marcus Sorensen <sh...@gmail.com>
> >>wrote:
> >>
> >>> Thanks for the response. Time sync is certainly an issue, I think one
> >>> of the things we are trying to gauge is whether the system vm
> >>> functionality has been impacted by time sync such that anyone has
> >>> noticed or cared.  That's not to detract from the point that having
> >>> time sync is optimal, and affects a lot of things, but functionally,
> >>> back to my item #1, can we confirm that earlier versions have gotten
> >>> out of sync, and if so, do we have bug reports showing that it has
> >>> mattered?
> >>>
> >>>  To counter the argument, there are plenty of people looking for the
> >>> features in 4.1, that wouldn't choose cloudstack because it's not
> >>> released yet. Then there's the delay impact to 4.2, and keeping all of
> >>> those features out of the hands of people as well.
> >>>
> >>> For me, the fear is that we end up pushing 4.1 back to or near where
> >>> 4.2 would have been otherwise released, at which point we haven't
> >>> really accomplished anything but delayed the release of the working
> >>> features in 4.1.
> >>>
> >>>
> >>> On Wed, May 22, 2013 at 9:09 AM, John Burwell <jb...@basho.com>
> >>>wrote:
> >>>> Marcus,
> >>>>
> >>>> For me, S3 integration and Xen feature parity are not the primary
> >>>>reasons that this defect should remain a blocker.  Time
> >>>>synchronization is a basic and essential assumption for systems such
> >>>>as CloudStack.  This defect yields file and log timestamps from
> >>>>secondary storage that are unreliable -- impacting customers in an
> >>>>accredited environment (e.g. SOX) or that rely on those timestamps for
> >>>>any downstream operations.  It also stands as a significant impediment
> >>>>to operational debugging.  Additionally, as others have pointed out,
> >>>>time drifts also impact encryption, and possibly handshake operations
> >>>>between the systems VMs and management server.  While I appreciate and
> >>>>fully support a time-based release cycle, there has to be a quality
> >>>>threshold for any release.  Looking at it from an operations
> >>>>perspective, failure to maintain time sync across components is
> >>>>unacceptable.   Assuming I used Xen, I ask myself, "Would I deploy a
> >>>>4.1.0 if the known issues list stated that the system VMs could not
> >>>>maintain time sync?", and, without hesitation, I would answer, "No.",
> >>>>and follow it up quickly, "Oh no, I hope the release I have in
> >>>>production doesn't have this problem."
> >>>>
> >>>> Thanks,
> >>>> -John
> >>>>
> >>>> On May 22, 2013, at 10:35 AM, Marcus Sorensen <sh...@gmail.com>
> >>>>wrote:
> >>>>
> >>>>> I feel like we need to clarify what's at risk here. Not to disrespect
> >>>>> anyone's opinion, but I'm just not getting where this is being
> >>>>> considered a major feature.  I think the very idea of Xen not having
> >>>>> feature parity (regardless of the feature) is distasteful to a lot of
> >>>>> us, and it should be. But consider that we are already two months
> >>>>> behind on a four month release cycle, and it sounds like fixing this
> >>>>> could take a month (if no issues are found, two weeks to qual the new
> >>>>> template). We run a time-based release, not a feature-based release.
> >>>>> Not all features are expected to be fully functional to get out the
> >>>>> door. Isn't the correct option to just mark the feature experimental,
> >>>>> tell them to run the newer template at their risk if they want it?
> >>>>>
> >>>>> 1) We need to verify whether this bug has been around for a long
> >>>>>time,
> >>>>> because it will tell us how much it really matters and thus whether
> >>>>>or
> >>>>> not it's a blocker. This addresses the 'timestamp of logs" and other
> >>>>> issues not related to new features.
> >>>>>
> >>>>> 2) We need to reiterate exactly what features are being affected. The
> >>>>> original e-mail lists 'S3 integration' as the only feature affected.
> >>>>> As far as I understand it, the actual feature impacted is a
> >>>>>'secondary
> >>>>> storage sync', if you have multiple zones, multiple secondary
> >>>>> storages, this backs up and handles the copying of templates, etc so
> >>>>> you don't have to manually register them everywhere.
> >>>>>
> >>>>> I appreciate John's work for getting that secondary storage sync
> >>>>> feature in place. I really wish we would have noticed the issue
> >>>>> earlier on, then we may not be having this discussion. That said, no
> >>>>> disrespect intended toward John, I'm having a hard time understanding
> >>>>> how this is a feature worth holding up the release. It's not a new
> >>>>> primary or secondary storage type integration, and it's not a feature
> >>>>> where the admin is helpless to do it themselves. If VPC doesn't work,
> >>>>> the admin can't do anything about it. If this sync doesn't work, the
> >>>>> admin writes a script that copies their stuff everywhere.
> >>>>>
> >>>>> Please, if anyone considers this a major feature worth blocking on,
> >>>>> explain to us why. Are you willing to push back release of all of the
> >>>>> other new features, and push back the 4.2 features, to have this one
> >>>>> feature in June, or whenever 4.1 gets out?
> >>>>>
> >>>>>
> >>>>> On Wed, May 22, 2013 at 2:14 AM, Sebastien Goasguen
> >>>>><ru...@gmail.com> wrote:
> >>>>>> +1 on moving forward.
> >>>>>>
> >>>>>> On this issue and on the upgrade issue I have realized that we
> >>>>>>forgot about our time based release philosophy.
> >>>>>>
> >>>>>> There will always be bugs in the software. If we know them we can
> >>>>>>acknowledge them in release notes and get started quickly on the
> >>>>>>next releases.
> >>>>>>
> >>>>>> To keep it short, I am now of the opinion (and I know I am kind of
> >>>>>>switching mind here), that we should release 4.1 asap and start
> >>>>>>working on the bug fix versions right away.
> >>>>>>
> >>>>>> If we do release often, then folks stuck on a particular bug can
> >>>>>>expect a quick turn around and fix of their problems.
> >>>>>>
> >>>>>> -sebastien
> >>>>>>
> >>>>>> On May 22, 2013, at 2:59 AM, Mathias Mullins
> >>>>>><ma...@citrix.com> wrote:
> >>>>>>
> >>>>>>> -1 on this.
> >>>>>>>
> >>>>>>> New features really should be across the board for the
> >>>>>>>Hypervisors. Part
> >>>>>>> of the thing that distinguishes ACS is it's support across Xen /
> >>>>>>>VMware /
> >>>>>>> KVM. Do we really want to start getting in the habit of pushing
> >>>>>>>forward
> >>>>>>> new features that are not across the fully functional hypervisors?
> >>>>>>>
> >>>>>>> I agree with Outback this also will start to affect the Xen/XCP
> >>>>>>>community
> >>>>>>> by basically setting them apart and out on what a lot of people
> >>>>>>>see as a
> >>>>>>> major feature.
> >>>>>>>
> >>>>>>> I think it sets a really bad precedent. If it was Hyper-V which is
> >>>>>>>not
> >>>>>>> fully functional and not a major feature-set right now, I would be
> >>>>>>>+1 on
> >>>>>>> this.
> >>>>>>>
> >>>>>>> MHO
> >>>>>>> Matt
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 5/20/13 4:15 PM, "Chip Childers" <ch...@sungard.com>
> >>>>>>>wrote:
> >>>>>>>
> >>>>>>>> All,
> >>>>>>>>
> >>>>>>>> As discussed on another thread [1], we identified a bug
> >>>>>>>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System
> >>>>>>>>VMs
> >>>>>>>> are not configured to sync their time with either the host HV or
> >>>>>>>>an NTP
> >>>>>>>> service.  That bug affects the system VMs for all three primary
> >>>>>>>>HVs (KVM,
> >>>>>>>> Xen and vSphere).  Patches have been committed addressing vSphere
> >>>>>>>>and
> >>>>>>>> KVM.  It appears that a correction for Xen would require the
> >>>>>>>>re-build of
> >>>>>>>> a system VM image and a full round of regression testing that
> >>>>>>>>image.
> >>>>>>>>
> >>>>>>>> Given that the discussion thread has not resulted in a consensus
> >>>>>>>>on this
> >>>>>>>> issue, I unfortunately believe that the only path forward is to
> >>>>>>>>call for
> >>>>>>>> a formal VOTE.
> >>>>>>>>
> >>>>>>>> Please respond with one of the following:
> >>>>>>>>
> >>>>>>>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492
> >>>>>>>>being
> >>>>>>>> resolved
> >>>>>>>> +0: don't care one way or the other
> >>>>>>>> -1: do *not* proceed with any further 4.1 release candidates until
> >>>>>>>> CLOUDSTACK-2492 has been fully resolved
> >>>>>>>>
> >>>>>>>> -chip
> >>>>>>>>
> >>>>>>>> [1] http://markmail.org/message/rw7vciq3r33biasb
> >>>>>>>
> >>>>>>
> >>>>
> >>
>
>

Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Ahmad Emneina <ae...@gmail.com>.
I'm also changing my +0 vote to a +1 to release.


On Wed, May 22, 2013 at 11:44 AM, Mathias Mullins <
mathias.mullins@citrix.com> wrote:

> Reading through this and seeing Chiradeep's passionate comments here and
> reasoning. I'Ll go from -1 to +1 on this. Especially since we will see
> this in 4.2.
>
> I still think it's a bruise, but hopefully will not be a black eye.
>
> Matt
>
>
> On 5/22/13 10:24 AM, "Chip Childers" <ch...@sungard.com> wrote:
>
> >On Wed, May 22, 2013 at 1:16 PM, Chiradeep Vittal
> ><Ch...@citrix.com> wrote:
> >> For those wanting to use the S3-sync for XCP/Xen, they could use the 4.2
> >> template. Just like the IPV6 feature, that could be deemed experimental.
> >
> >I'm now in agreement with this.  The "root issue" is now solved for
> >KVM and VMware users in 4.1.  We have also resolved the root cause in
> >the new system VM templates.  We can provide instructions for using
> >the new system VM templates within the documentation prior to
> >releasing 4.1, and highlight that required step if using S3.
> >
> >I'm +1 on moving forward (although I started the vote, I figured I
> >should also cast a vote on this one).
>
>

Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Mathias Mullins <ma...@citrix.com>.
Reading through this and seeing Chiradeep's passionate comments and
reasoning here, I'll go from -1 to +1 on this, especially since we will see
this in 4.2.

I still think it's a bruise, but hopefully will not be a black eye.

Matt 


On 5/22/13 10:24 AM, "Chip Childers" <ch...@sungard.com> wrote:

>On Wed, May 22, 2013 at 1:16 PM, Chiradeep Vittal
><Ch...@citrix.com> wrote:
>> For those wanting to use the S3-sync for XCP/Xen, they could use the 4.2
>> template. Just like the IPV6 feature, that could be deemed experimental.
>
>I'm now in agreement with this.  The "root issue" is now solved for
>KVM and VMware users in 4.1.  We have also resolved the root cause in
>the new system VM templates.  We can provide instructions for using
>the new system VM templates within the documentation prior to
>releasing 4.1, and highlight that required step if using S3.
>
>I'm +1 on moving forward (although I started the vote, I figured I
>should also cast a vote on this one).


Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Chip Childers <ch...@sungard.com>.
On Wed, May 22, 2013 at 1:16 PM, Chiradeep Vittal
<Ch...@citrix.com> wrote:
> For those wanting to use the S3-sync for XCP/Xen, they could use the 4.2
> template. Just like the IPV6 feature, that could be deemed experimental.

I'm now in agreement with this.  The "root issue" is now solved for
KVM and VMware users in 4.1.  We have also resolved the root cause in
the new system VM templates.  We can provide instructions for using
the new system VM templates within the documentation prior to
releasing 4.1, and highlight that required step if using S3.

I'm +1 on moving forward (although I started the vote, I figured I
should also cast a vote on this one).

Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Chiradeep Vittal <Ch...@citrix.com>.
For those wanting to use the S3-sync for XCP/Xen, they could use the 4.2
template. Just like the IPV6 feature, that could be deemed experimental.

On 5/22/13 10:01 AM, "Chiradeep Vittal" <Ch...@citrix.com>
wrote:

>As the author of the original systemvm (and current contributor to the
>systemvm), I can confidently state that this issue has been there since
>2.2.0. 
>The issue is that the Debian 2.6.32 kernel is a PVOPS kernel. All PVOPS
>kernels require NTP to keep time in sync.
>http://www.gossamer-threads.com/lists/xen/users/234750
>
>On 5/22/13 9:56 AM, "Marcus Sorensen" <sh...@gmail.com> wrote:
>
>>If this were creating a new bug, for example "oh, your VPCs won't work
>>anymore for this release", or "here's a new UI, but it's really buggy
>>and barely functional" then I'd agree with this train of thought.
>>Instead, we are saying "we recently found out that since 2.2.x
>>cloudstack has had this behavior, and it will be fixed in 4.2"*.
>>That's a totally different thing. If 4.1 ends up being a poor quality
>>release that everyone remembers compared to others, it's not going to
>>be because we didn't address something that has been around for
>>several releases, that nobody has noticed.
>>
>>* Assuming we verify that it's not a regression, which I'm still very
>>interested in knowing
>>
>>On Wed, May 22, 2013 at 9:51 AM, John Burwell <jb...@basho.com> wrote:
>>> Marcus,
>>>
>>> I would say that the only thing for an open source project worse than
>>>not releasing is releasing a poor quality release.  A late release with
>>>high quality is soon forgotten.  An on-time or late release with poor
>>>quality lingers in folks' memory. The KDE project made the near-fatal
>>>mistake of following the same logic when they released 4.0, and the
>>>reputation of KDE 4.x continues to suffer from it to this day.
>>>CloudStack is trusted to run at the core of our users' operations.  In my
>>>view, if we err, we should err on the side of quality to avoid
>>>erosion of that trust.  If we ever lost that trust, our new features
>>>would never be evaluated.
>>
>>>
>>> Thanks,
>>> -John
>>>
>>> On May 22, 2013, at 11:18 AM, Marcus Sorensen <sh...@gmail.com>
>>>wrote:
>>>
>>>> Thanks for the response. Time sync is certainly an issue, I think one
>>>> of the things we are trying to gauge is whether the system vm
>>>> functionality has been impacted by time sync such that anyone has
>>>> noticed or cared.  That's not to detract from the point that having
>>>> time sync is optimal, and affects a lot of things, but functionally,
>>>> back to my item #1, can we confirm that earlier versions have gotten
>>>> out of sync, and if so, do we have bug reports showing that it has
>>>> mattered?
>>>>
>>>>  To counter the argument, there are plenty of people looking for the
>>>> features in 4.1, that wouldn't choose cloudstack because it's not
>>>> released yet. Then there's the delay impact to 4.2, and keeping all of
>>>> those features out of the hands of people as well.
>>>>
>>>> For me, the fear is that we end up pushing 4.1 back to or near where
>>>> 4.2 would have been otherwise released, at which point we haven't
>>>> really accomplished anything but delayed the release of the working
>>>> features in 4.1.
>>>>
>>>>
>>>> On Wed, May 22, 2013 at 9:09 AM, John Burwell <jb...@basho.com>
>>>>wrote:
>>>>> Marcus,
>>>>>
>>>>> For me, S3 integration and Xen feature parity are not the primary
>>>>>reasons that this defect should remain a blocker.  Time
>>>>>synchronization is a basic and essential assumption for systems such
>>>>>as CloudStack.  This defect yields file and log timestamps from
>>>>>secondary storage that are unreliable -- impacting customers in an
>>>>>accredited environment (e.g. SOX) or that rely on those timestamps for
>>>>>any downstream operations.  It also stands as a significant impediment
>>>>>to operational debugging.  Additionally, as others have pointed out,
>>>>>time drifts also impact encryption, and possibly handshake operations
>>>>>between the system VMs and management server.  While I appreciate and
>>>>>fully support a time-based release cycle, there has to be a quality
>>>>>threshold for any release.  Looking at it from an operations
>>>>>perspective, failure to maintain time sync across components is
>>>>>unacceptable.   Assuming I used Xen, I ask myself, "Would I deploy a
>>>>>4.1.0 if the known issues list stated that the system VMs could not
>>>>>maintain time sync?", and, without hesitation, I would answer, "No.",
>>>>>and follow it up quickly, "Oh no, I hope the release I have in
>>>>>production doesn't have this problem."
>>>>>
>>>>> Thanks,
>>>>> -John
>>>>>
>>>>> On May 22, 2013, at 10:35 AM, Marcus Sorensen <sh...@gmail.com>
>>>>>wrote:
>>>>>
>>>>>> I feel like we need to clarify what's at risk here. Not to
>>>>>>disrespect
>>>>>> anyone's opinion, but I'm just not getting where this is being
>>>>>> considered a major feature.  I think the very idea of Xen not having
>>>>>> feature parity (regardless of the feature) is distasteful to a lot
>>>>>>of
>>>>>> us, and it should be. But consider that we are already two months
>>>>>> behind on a four month release cycle, and it sounds like fixing this
>>>>>> could take a month (if no issues are found, two weeks to qual the
>>>>>>new
>>>>>> template). We run a time-based release, not a feature-based release.
>>>>>> Not all features are expected to be fully functional to get out the
>>>>>> door. Isn't the correct option to just mark the feature
>>>>>>experimental,
>>>>>> tell them to run the newer template at their risk if they want it?
>>>>>>
>>>>>> 1) We need to verify whether this bug has been around for a long
>>>>>>time,
>>>>>> because it will tell us how much it really matters and thus whether
>>>>>>or
>>>>>> not it's a blocker. This addresses the 'timestamp of logs' and other
>>>>>> issues not related to new features.
>>>>>>
>>>>>> 2) We need to reiterate exactly what features are being affected.
>>>>>>The
>>>>>> original e-mail lists 'S3 integration' as the only feature affected.
>>>>>> As far as I understand it, the actual feature impacted is a
>>>>>>'secondary
>>>>>> storage sync', if you have multiple zones, multiple secondary
>>>>>> storages, this backs up and handles the copying of templates, etc so
>>>>>> you don't have to manually register them everywhere.
>>>>>>
>>>>>> I appreciate John's work for getting that secondary storage sync
>>>>>> feature in place. I really wish we would have noticed the issue
>>>>>> earlier on, then we may not be having this discussion. That said, no
>>>>>> disrespect intended toward John, I'm having a hard time
>>>>>>understanding
>>>>>> how this is a feature worth holding up the release. It's not a new
>>>>>> primary or secondary storage type integration, and it's not a
>>>>>>feature
>>>>>> where the admin is helpless to do it themselves. If VPC doesn't
>>>>>>work,
>>>>>> the admin can't do anything about it. If this sync doesn't work, the
>>>>>> admin writes a script that copies their stuff everywhere.
>>>>>>
>>>>>> Please, if anyone considers this a major feature worth blocking on,
>>>>>> explain to us why. Are you willing to push back release of all of
>>>>>>the
>>>>>> other new features, and push back the 4.2 features, to have this one
>>>>>> feature in June, or whenever 4.1 gets out?
>>>>>>
>>>>>>
>>>>>> On Wed, May 22, 2013 at 2:14 AM, Sebastien Goasguen
>>>>>><ru...@gmail.com> wrote:
>>>>>>> +1 on moving forward.
>>>>>>>
>>>>>>> On this issue and on the upgrade issue I have realized that we
>>>>>>>forgot about our time based release philosophy.
>>>>>>>
>>>>>>> There will always be bugs in the software. If we know them we can
>>>>>>>acknowledge them in release notes and get started quickly on the
>>>>>>>next releases.
>>>>>>>
>>>>>>> To keep it short, I am now of the opinion (and I know I am kind of
>>>>>>>switching mind here), that we should release 4.1 asap and start
>>>>>>>working on the bug fix versions right away.
>>>>>>>
>>>>>>> If we do release often, then folks stuck on a particular bug can
>>>>>>>expect a quick turn around and fix of their problems.
>>>>>>>
>>>>>>> -sebastien
>>>>>>>
>>>>>>> On May 22, 2013, at 2:59 AM, Mathias Mullins
>>>>>>><ma...@citrix.com> wrote:
>>>>>>>
>>>>>>>> -1 on this.
>>>>>>>>
>>>>>>>> New features really should be across the board for the
>>>>>>>>Hypervisors. Part
>>>>>>>> of the thing that distinguishes ACS is its support across Xen /
>>>>>>>>VMware /
>>>>>>>> KVM. Do we really want to start getting in the habit of pushing
>>>>>>>>forward
>>>>>>>> new features that are not across the fully functional hypervisors?
>>>>>>>>
>>>>>>>> I agree with Outback this also will start to affect the Xen/XCP
>>>>>>>>community
>>>>>>>> by basically setting them apart and out on what a lot of people
>>>>>>>>see as a
>>>>>>>> major feature.
>>>>>>>>
>>>>>>>> I think it sets a really bad precedent. If it was Hyper-V which is
>>>>>>>>not
>>>>>>>> fully functional and not a major feature-set right now, I would be
>>>>>>>>+1 on
>>>>>>>> this.
>>>>>>>>
>>>>>>>> MHO
>>>>>>>> Matt
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 5/20/13 4:15 PM, "Chip Childers" <ch...@sungard.com>
>>>>>>>>wrote:
>>>>>>>>
>>>>>>>>> All,
>>>>>>>>>
>>>>>>>>> As discussed on another thread [1], we identified a bug
>>>>>>>>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System
>>>>>>>>>VMs
>>>>>>>>> are not configured to sync their time with either the host HV or
>>>>>>>>>an NTP
>>>>>>>>> service.  That bug affects the system VMs for all three primary
>>>>>>>>>HVs (KVM,
>>>>>>>>> Xen and vSphere).  Patches have been committed addressing vSphere
>>>>>>>>>and
>>>>>>>>> KVM.  It appears that a correction for Xen would require the
>>>>>>>>>re-build of
>>>>>>>>> a system VM image and a full round of regression testing that
>>>>>>>>>image.
>>>>>>>>>
>>>>>>>>> Given that the discussion thread has not resulted in a consensus
>>>>>>>>>on this
>>>>>>>>> issue, I unfortunately believe that the only path forward is to
>>>>>>>>>call for
>>>>>>>>> a formal VOTE.
>>>>>>>>>
>>>>>>>>> Please respond with one of the following:
>>>>>>>>>
>>>>>>>>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492
>>>>>>>>>being
>>>>>>>>> resolved
>>>>>>>>> +0: don't care one way or the other
>>>>>>>>> -1: do *not* proceed with any further 4.1 release candidates
>>>>>>>>>until
>>>>>>>>> CLOUDSTACK-2492 has been fully resolved
>>>>>>>>>
>>>>>>>>> -chip
>>>>>>>>>
>>>>>>>>> [1] http://markmail.org/message/rw7vciq3r33biasb
>>>>>>>>
>>>>>>>
>>>>>
>>>
>


Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Chiradeep Vittal <Ch...@citrix.com>.
As the author of the original systemvm (and current contributor to the
systemvm), I can confidently state that this issue has been there since
2.2.0. 
The issue is that the Debian 2.6.32 kernel is a PVOPS kernel. All PVOPS
kernels require NTP to keep time in sync.
http://www.gossamer-threads.com/lists/xen/users/234750

On 5/22/13 9:56 AM, "Marcus Sorensen" <sh...@gmail.com> wrote:

>If this were creating a new bug, for example "oh, your VPCs won't work
>anymore for this release", or "here's a new UI, but it's really buggy
>and barely functional" then I'd agree with this train of thought.
>Instead, we are saying "we recently found out that since 2.2.x
>cloudstack has had this behavior, and it will be fixed in 4.2"*.
>That's a totally different thing. If 4.1 ends up being a poor quality
>release that everyone remembers compared to others, it's not going to
>be because we didn't address something that has been around for
>several releases, that nobody has noticed.
>
>* Assuming we verify that it's not a regression, which I'm still very
>interested in knowing
>
>On Wed, May 22, 2013 at 9:51 AM, John Burwell <jb...@basho.com> wrote:
>> Marcus,
>>
>> I would say that the only thing for an open source project worse than
>>not releasing is releasing a poor quality release.  A late release with
>>high quality is soon forgotten.  An on-time or late release with poor
>>quality lingers in folks' memory. The KDE project made the near-fatal
>>mistake of following the same logic when they released 4.0, and the
>>reputation of KDE 4.x continues to suffer from it to this day.
>>CloudStack is trusted to run at the core of our users' operations.  In my
>>view, if we err, we should err on the side of quality to avoid
>>erosion of that trust.  If we ever lost that trust, our new features
>>would never be evaluated.
>
>>
>> Thanks,
>> -John
>>
>> On May 22, 2013, at 11:18 AM, Marcus Sorensen <sh...@gmail.com>
>>wrote:
>>
>>> Thanks for the response. Time sync is certainly an issue, I think one
>>> of the things we are trying to gauge is whether the system vm
>>> functionality has been impacted by time sync such that anyone has
>>> noticed or cared.  That's not to detract from the point that having
>>> time sync is optimal, and affects a lot of things, but functionally,
>>> back to my item #1, can we confirm that earlier versions have gotten
>>> out of sync, and if so, do we have bug reports showing that it has
>>> mattered?
>>>
>>>  To counter the argument, there are plenty of people looking for the
>>> features in 4.1, that wouldn't choose cloudstack because it's not
>>> released yet. Then there's the delay impact to 4.2, and keeping all of
>>> those features out of the hands of people as well.
>>>
>>> For me, the fear is that we end up pushing 4.1 back to or near where
>>> 4.2 would have been otherwise released, at which point we haven't
>>> really accomplished anything but delayed the release of the working
>>> features in 4.1.
>>>
>>>
>>> On Wed, May 22, 2013 at 9:09 AM, John Burwell <jb...@basho.com>
>>>wrote:
>>>> Marcus,
>>>>
>>>> For me, S3 integration and Xen feature parity are not the primary
>>>>reasons that this defect should remain a blocker.  Time
>>>>synchronization is a basic and essential assumption for systems such
>>>>as CloudStack.  This defect yields file and log timestamps from
>>>>secondary storage that are unreliable -- impacting customers in an
>>>>accredited environment (e.g. SOX) or that rely on those timestamps for
>>>>any downstream operations.  It also stands as a significant impediment
>>>>to operational debugging.  Additionally, as others have pointed out,
>>>>time drifts also impact encryption, and possibly handshake operations
>>>>between the system VMs and management server.  While I appreciate and
>>>>fully support a time-based release cycle, there has to be a quality
>>>>threshold for any release.  Looking at it from an operations
>>>>perspective, failure to maintain time sync across components is
>>>>unacceptable.   Assuming I used Xen, I ask myself, "Would I deploy a
>>>>4.1.0 if the known issues list stated that the system VMs could not
>>>>maintain time sync?", and, without hesitation, I would answer, "No.",
>>>>and follow it up quickly, "Oh no, I hope the release I have in
>>>>production doesn't have this problem."
>>>>
>>>> Thanks,
>>>> -John
>>>>
>>>> On May 22, 2013, at 10:35 AM, Marcus Sorensen <sh...@gmail.com>
>>>>wrote:
>>>>
>>>>> I feel like we need to clarify what's at risk here. Not to disrespect
>>>>> anyone's opinion, but I'm just not getting where this is being
>>>>> considered a major feature.  I think the very idea of Xen not having
>>>>> feature parity (regardless of the feature) is distasteful to a lot of
>>>>> us, and it should be. But consider that we are already two months
>>>>> behind on a four month release cycle, and it sounds like fixing this
>>>>> could take a month (if no issues are found, two weeks to qual the new
>>>>> template). We run a time-based release, not a feature-based release.
>>>>> Not all features are expected to be fully functional to get out the
>>>>> door. Isn't the correct option to just mark the feature experimental,
>>>>> tell them to run the newer template at their risk if they want it?
>>>>>
>>>>> 1) We need to verify whether this bug has been around for a long
>>>>>time,
>>>>> because it will tell us how much it really matters and thus whether
>>>>>or
>>>>> not it's a blocker. This addresses the 'timestamp of logs' and other
>>>>> issues not related to new features.
>>>>>
>>>>> 2) We need to reiterate exactly what features are being affected. The
>>>>> original e-mail lists 'S3 integration' as the only feature affected.
>>>>> As far as I understand it, the actual feature impacted is a
>>>>>'secondary
>>>>> storage sync', if you have multiple zones, multiple secondary
>>>>> storages, this backs up and handles the copying of templates, etc so
>>>>> you don't have to manually register them everywhere.
>>>>>
>>>>> I appreciate John's work for getting that secondary storage sync
>>>>> feature in place. I really wish we would have noticed the issue
>>>>> earlier on, then we may not be having this discussion. That said, no
>>>>> disrespect intended toward John, I'm having a hard time understanding
>>>>> how this is a feature worth holding up the release. It's not a new
>>>>> primary or secondary storage type integration, and it's not a feature
>>>>> where the admin is helpless to do it themselves. If VPC doesn't work,
>>>>> the admin can't do anything about it. If this sync doesn't work, the
>>>>> admin writes a script that copies their stuff everywhere.
>>>>>
>>>>> Please, if anyone considers this a major feature worth blocking on,
>>>>> explain to us why. Are you willing to push back release of all of the
>>>>> other new features, and push back the 4.2 features, to have this one
>>>>> feature in June, or whenever 4.1 gets out?
>>>>>
>>>>>
>>>>> On Wed, May 22, 2013 at 2:14 AM, Sebastien Goasguen
>>>>><ru...@gmail.com> wrote:
>>>>>> +1 on moving forward.
>>>>>>
>>>>>> On this issue and on the upgrade issue I have realized that we
>>>>>>forgot about our time based release philosophy.
>>>>>>
>>>>>> There will always be bugs in the software. If we know them we can
>>>>>>acknowledge them in release notes and get started quickly on the
>>>>>>next releases.
>>>>>>
>>>>>> To keep it short, I am now of the opinion (and I know I am kind of
>>>>>>switching mind here), that we should release 4.1 asap and start
>>>>>>working on the bug fix versions right away.
>>>>>>
>>>>>> If we do release often, then folks stuck on a particular bug can
>>>>>>expect a quick turn around and fix of their problems.
>>>>>>
>>>>>> -sebastien
>>>>>>
>>>>>> On May 22, 2013, at 2:59 AM, Mathias Mullins
>>>>>><ma...@citrix.com> wrote:
>>>>>>
>>>>>>> -1 on this.
>>>>>>>
>>>>>>> New features really should be across the board for the
>>>>>>>Hypervisors. Part
>>>>>>> of the thing that distinguishes ACS is its support across Xen /
>>>>>>>VMware /
>>>>>>> KVM. Do we really want to start getting in the habit of pushing
>>>>>>>forward
>>>>>>> new features that are not across the fully functional hypervisors?
>>>>>>>
>>>>>>> I agree with Outback this also will start to affect the Xen/XCP
>>>>>>>community
>>>>>>> by basically setting them apart and out on what a lot of people
>>>>>>>see as a
>>>>>>> major feature.
>>>>>>>
>>>>>>> I think it sets a really bad precedent. If it was Hyper-V which is
>>>>>>>not
>>>>>>> fully functional and not a major feature-set right now, I would be
>>>>>>>+1 on
>>>>>>> this.
>>>>>>>
>>>>>>> MHO
>>>>>>> Matt
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 5/20/13 4:15 PM, "Chip Childers" <ch...@sungard.com>
>>>>>>>wrote:
>>>>>>>
>>>>>>>> All,
>>>>>>>>
>>>>>>>> As discussed on another thread [1], we identified a bug
>>>>>>>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System
>>>>>>>>VMs
>>>>>>>> are not configured to sync their time with either the host HV or
>>>>>>>>an NTP
>>>>>>>> service.  That bug affects the system VMs for all three primary
>>>>>>>>HVs (KVM,
>>>>>>>> Xen and vSphere).  Patches have been committed addressing vSphere
>>>>>>>>and
>>>>>>>> KVM.  It appears that a correction for Xen would require the
>>>>>>>>re-build of
>>>>>>>> a system VM image and a full round of regression testing that
>>>>>>>>image.
>>>>>>>>
>>>>>>>> Given that the discussion thread has not resulted in a consensus
>>>>>>>>on this
>>>>>>>> issue, I unfortunately believe that the only path forward is to
>>>>>>>>call for
>>>>>>>> a formal VOTE.
>>>>>>>>
>>>>>>>> Please respond with one of the following:
>>>>>>>>
>>>>>>>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492
>>>>>>>>being
>>>>>>>> resolved
>>>>>>>> +0: don't care one way or the other
>>>>>>>> -1: do *not* proceed with any further 4.1 release candidates until
>>>>>>>> CLOUDSTACK-2492 has been fully resolved
>>>>>>>>
>>>>>>>> -chip
>>>>>>>>
>>>>>>>> [1] http://markmail.org/message/rw7vciq3r33biasb
>>>>>>>
>>>>>>
>>>>
>>


Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Marcus Sorensen <sh...@gmail.com>.
If this were creating a new bug, for example "oh, your VPCs won't work
anymore for this release", or "here's a new UI, but it's really buggy
and barely functional" then I'd agree with this train of thought.
Instead, we are saying "we recently found out that since 2.2.x
cloudstack has had this behavior, and it will be fixed in 4.2"*.
That's a totally different thing. If 4.1 ends up being a poor quality
release that everyone remembers compared to others, it's not going to
be because we didn't address something that has been around for
several releases, that nobody has noticed.

* Assuming we verify that it's not a regression, which I'm still very
interested in knowing

On Wed, May 22, 2013 at 9:51 AM, John Burwell <jb...@basho.com> wrote:
> Marcus,
>
> I would say that the only thing for an open source project worse than not releasing is releasing a poor quality release.  A late release with high quality is soon forgotten.  An on-time or late release with poor quality lingers in folks' memory. The KDE project made the near-fatal mistake of following the same logic when they released 4.0, and the reputation of KDE 4.x continues to suffer from it to this day.  CloudStack is trusted to run at the core of our users' operations.  In my view, if we err, we should err on the side of quality to avoid erosion of that trust.  If we ever lost that trust, our new features would never be evaluated.

>
> Thanks,
> -John
>
> On May 22, 2013, at 11:18 AM, Marcus Sorensen <sh...@gmail.com> wrote:
>
>> Thanks for the response. Time sync is certainly an issue, I think one
>> of the things we are trying to gauge is whether the system vm
>> functionality has been impacted by time sync such that anyone has
>> noticed or cared.  That's not to detract from the point that having
>> time sync is optimal, and affects a lot of things, but functionally,
>> back to my item #1, can we confirm that earlier versions have gotten
>> out of sync, and if so, do we have bug reports showing that it has
>> mattered?
>>
>>  To counter the argument, there are plenty of people looking for the
>> features in 4.1, that wouldn't choose cloudstack because it's not
>> released yet. Then there's the delay impact to 4.2, and keeping all of
>> those features out of the hands of people as well.
>>
>> For me, the fear is that we end up pushing 4.1 back to or near where
>> 4.2 would have been otherwise released, at which point we haven't
>> really accomplished anything but delayed the release of the working
>> features in 4.1.
>>
>>
>> On Wed, May 22, 2013 at 9:09 AM, John Burwell <jb...@basho.com> wrote:
>>> Marcus,
>>>
>>> For me, S3 integration and Xen feature parity are not the primary reasons that this defect should remain a blocker.  Time synchronization is a basic and essential assumption for systems such as CloudStack.  This defect yields file and log timestamps from secondary storage that are unreliable -- impacting customers in an accredited environment (e.g. SOX) or that rely on those timestamps for any downstream operations.  It also stands as a significant impediment to operational debugging.  Additionally, as others have pointed out, time drifts also impact encryption, and possibly handshake operations between the system VMs and management server.  While I appreciate and fully support a time-based release cycle, there has to be a quality threshold for any release.  Looking at it from an operations perspective, failure to maintain time sync across components is unacceptable.   Assuming I used Xen, I ask myself, "Would I deploy a 4.1.0 if the known issues list stated that the system VMs could not maintain time sync?", and, without hesitation, I would answer, "No.", and follow it up quickly, "Oh no, I hope the release I have in production doesn't have this problem."
>>>
>>> Thanks,
>>> -John
>>>
>>> On May 22, 2013, at 10:35 AM, Marcus Sorensen <sh...@gmail.com> wrote:
>>>
>>>> I feel like we need to clarify what's at risk here. Not to disrespect
>>>> anyone's opinion, but I'm just not getting where this is being
>>>> considered a major feature.  I think the very idea of Xen not having
>>>> feature parity (regardless of the feature) is distasteful to a lot of
>>>> us, and it should be. But consider that we are already two months
>>>> behind on a four month release cycle, and it sounds like fixing this
>>>> could take a month (if no issues are found, two weeks to qual the new
>>>> template). We run a time-based release, not a feature-based release.
>>>> Not all features are expected to be fully functional to get out the
>>>> door. Isn't the correct option to just mark the feature experimental,
>>>> tell them to run the newer template at their risk if they want it?
>>>>
>>>> 1) We need to verify whether this bug has been around for a long time,
>>>> because it will tell us how much it really matters and thus whether or
>>>> not it's a blocker. This addresses the 'timestamp of logs' and other
>>>> issues not related to new features.
>>>>
>>>> 2) We need to reiterate exactly what features are being affected. The
>>>> original e-mail lists 'S3 integration' as the only feature affected.
>>>> As far as I understand it, the actual feature impacted is a 'secondary
>>>> storage sync', if you have multiple zones, multiple secondary
>>>> storages, this backs up and handles the copying of templates, etc so
>>>> you don't have to manually register them everywhere.
>>>>
>>>> I appreciate John's work for getting that secondary storage sync
>>>> feature in place. I really wish we would have noticed the issue
>>>> earlier on, then we may not be having this discussion. That said, no
>>>> disrespect intended toward John, I'm having a hard time understanding
>>>> how this is a feature worth holding up the release. It's not a new
>>>> primary or secondary storage type integration, and it's not a feature
>>>> where the admin is helpless to do it themselves. If VPC doesn't work,
>>>> the admin can't do anything about it. If this sync doesn't work, the
>>>> admin writes a script that copies their stuff everywhere.
>>>>
>>>> Please, if anyone considers this a major feature worth blocking on,
>>>> explain to us why. Are you willing to push back release of all of the
>>>> other new features, and push back the 4.2 features, to have this one
>>>> feature in June, or whenever 4.1 gets out?
>>>>
>>>>
>>>> On Wed, May 22, 2013 at 2:14 AM, Sebastien Goasguen <ru...@gmail.com> wrote:
>>>>> +1 on moving forward.
>>>>>
>>>>> On this issue and on the upgrade issue I have realized that we forgot about our time based release philosophy.
>>>>>
>>>>> There will always be bugs in the software. If we know them we can acknowledge them in release notes and get started quickly on the next releases.
>>>>>
>>>>> To keep it short, I am now of the opinion (and I know I am kind of switching mind here), that we should release 4.1 asap and start working on the bug fix versions right away.
>>>>>
>>>>> If we do release often, then folks stuck on a particular bug can expect a quick turn around and fix of their problems.
>>>>>
>>>>> -sebastien
>>>>>
>>>>> On May 22, 2013, at 2:59 AM, Mathias Mullins <ma...@citrix.com> wrote:
>>>>>
>>>>>> -1 on this.
>>>>>>
>>>>>> New features really should be across the board for the Hypervisors. Part
>>>>>> of the thing that distinguishes ACS is its support across Xen / VMware /
>>>>>> KVM. Do we really want to start getting in the habit of pushing forward
>>>>>> new features that are not across the fully functional hypervisors?
>>>>>>
>>>>>> I agree with Outback this also will start to affect the Xen/XCP community
>>>>>> by basically setting them apart and out on what a lot of people see as a
>>>>>> major feature.
>>>>>>
>>>>>> I think it sets a really bad precedent. If it was Hyper-V which is not
>>>>>> fully functional and not a major feature-set right now, I would be +1 on
>>>>>> this.
>>>>>>
>>>>>> MHO
>>>>>> Matt
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 5/20/13 4:15 PM, "Chip Childers" <ch...@sungard.com> wrote:
>>>>>>
>>>>>>> All,
>>>>>>>
>>>>>>> As discussed on another thread [1], we identified a bug
>>>>>>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
>>>>>>> are not configured to sync their time with either the host HV or an NTP
>>>>>>> service.  That bug affects the system VMs for all three primary HVs (KVM,
>>>>>>> Xen and vSphere).  Patches have been committed addressing vSphere and
>>>>>>> KVM.  It appears that a correction for Xen would require the re-build of
>>>>>>> a system VM image and a full round of regression testing that image.
>>>>>>>
>>>>>>> Given that the discussion thread has not resulted in a consensus on this
>>>>>>> issue, I unfortunately believe that the only path forward is to call for
>>>>>>> a formal VOTE.
>>>>>>>
>>>>>>> Please respond with one of the following:
>>>>>>>
>>>>>>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
>>>>>>> resolved
>>>>>>> +0: don't care one way or the other
>>>>>>> -1: do *not* proceed with any further 4.1 release candidates until
>>>>>>> CLOUDSTACK-2492 has been fully resolved
>>>>>>>
>>>>>>> -chip
>>>>>>>
>>>>>>> [1] http://markmail.org/message/rw7vciq3r33biasb
>>>>>>
>>>>>
>>>
>

Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by John Burwell <jb...@basho.com>.
Joe,

I just realized I dropped a sentence by accident.  I meant to add that this issue does not approach KDE 4.0 levels (they had literally hundreds of issues), but they got caught up in the desire to get something out for the world to see and ended up harming their reputation.  I am merely warning against getting on that slippery slope.

I apologize for the inadvertent hyperbole,
-John

On May 22, 2013, at 12:11 PM, Joe Brockmeier <jz...@zonker.net> wrote:

> On Wed, May 22, 2013, at 10:51 AM, John Burwell wrote:
>> I would say that the only thing for an open source project worse than not
>> releasing is releasing a poor quality release.  A late release with high
>> quality is soon forgotten.  An on-time or late release with poor quality
>> lingers in folks memory. The KDE project made the near fatal mistake of
>> following the same logic when they release 4.0, and the reputation of KDE
>> 4.x continues to suffer from it to this day.  CloudStack is trusted to
>> run at the core our user's operations.  In my view, if we err, we should
>> err on the side of quality to avoid of erosion of that trust.  If we ever
>> lost that trust, our new features would never be evaluated. 
> 
> I'm not sure this issue approaches KDE 4.0 levels, but otherwise +1.
> (Note - the KDE folks are *very* touchy about 4.0 *still* being held up
> as a high-water mark of poor judgement in releases, which is in and of
> itself a cautionary tale for releasing something that's not ready...) 
> 
> Why are users waiting for us to officially release instead of grabbing
> artifacts from Jenkins? In large part, they're waiting for the project
> to "bless" the quality of the release by saying it's ready. Time-based
> releases are supposed to be a way of ensuring that we don't hold up
> releases indefinitely because of missing features - but I don't think
> that extends to knowingly releasing something that is a pretty serious
> bug. 
> 
> Best,
> 
> jzb
> -- 
> Joe Brockmeier
> jzb@zonker.net
> Twitter: @jzb
> http://www.dissociatedpress.net/


Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Marcus Sorensen <sh...@gmail.com>.
I agree with everything being said, assuming it's a critical bug.
Otherwise, the quality argument doesn't hold up, because it becomes
a blanket defense for anything.

I'm just not seeing evidence that anyone running a CloudStack cloud
for the last two years has cared or noticed that the time can get out
of sync in system VMs, or that it has impacted functionality at all. Do
we have bug reports, or is this one feature the sole thing that has
made anyone notice? A bug that doesn't cause harm isn't critical, and
from my perspective only impacts quality at an academic level.


On Wed, May 22, 2013 at 10:23 AM, Outback Dingo <ou...@gmail.com> wrote:
> On Wed, May 22, 2013 at 12:11 PM, Joe Brockmeier <jz...@zonker.net> wrote:
>
>> On Wed, May 22, 2013, at 10:51 AM, John Burwell wrote:
>> > I would say that the only thing for an open source project worse than not
>> > releasing is releasing a poor quality release.  A late release with high
>> > quality is soon forgotten.  An on-time or late release with poor quality
>> > lingers in folks memory. The KDE project made the near fatal mistake of
>> > following the same logic when they release 4.0, and the reputation of KDE
>> > 4.x continues to suffer from it to this day.  CloudStack is trusted to
>> > run at the core our user's operations.  In my view, if we err, we should
>> > err on the side of quality to avoid of erosion of that trust.  If we ever
>> > lost that trust, our new features would never be evaluated.
>>
>> I'm not sure this issue approaches KDE 4.0 levels, but otherwise +1.
>> (Note - the KDE folks are *very* touchy about 4.0 *still* being held up
>> as a high-water mark of poor judgement in releases, which is in and of
>> itself a cautionary tale for releasing something that's not ready...)
>>
>> Why are users waiting for us to officially release instead of grabbing
>> artifacts from Jenkins? In large part, they're waiting for the project
>> to "bless" the quality of the release by saying it's ready. Time-based
>> releases are supposed to be a way of ensuring that we don't hold up
>> releases indefinitely because of missing features - but I don't think
>> that extends to knowingly releasing something that is a pretty serious
>> bug.
>>
>>
> The quality of software and its new feature sets, if supported should all
> fall in parity with supported platforms
> The fact that 1) this is a critical bug, 2) it affects the entire XEN/XCP
> base, 3) has been known and not resolved
>
> While being at a senior level management position running an R&D team, I
> would always tell the CTO/CEO
> If its not fully baked and QA's its not ready to come out of the oven. Push
> the date. Id rather see CS as a whole
> remain in feature parity and crush this last critical bug then push out a
> release, and discourage any XEN/XCP
> environments from looking at moving forward with the software stack as a
> whole. I wouldnt do it to clients, I feel
> we shouldnt do it to our users. resolve the problem, and QA it, then move
> forward, dont bandaid it, dont neglect it
> if others are so gung ho to user 4.1 before its released they can build
> from source, there are options for moving
> forward, leaving this stone unturned i feel would be detrimental to the
> good reputation Cs had enjoyed. I usually
> say much, until I feel strongly about an issue. But I ask, have we even
> really assessed what it will take to fix,
> instead of just throwing it to the wolves to vote on? will it take a week,
> to resolve and test. If we cant answer
> this question, then we shouldnt even be having the voting discussion, let
> alone how longs it been a "known"
> issue, regardless of who noticed or who it affected, the fact is someone
> noticed it, otherwise there woulnt be a bug report on it. so we just
> answered logically who noticed, someone did, whos it affect, well obviously
> it did affect
> someone. fix it, qa it, release and moved forward before we get to far down
> the road and its harder to resolv.
>
>
>
>> Best,
>>
>> jzb
>> --
>> Joe Brockmeier
>> jzb@zonker.net
>> Twitter: @jzb
>> http://www.dissociatedpress.net/
>>

Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Outback Dingo <ou...@gmail.com>.
On Wed, May 22, 2013 at 12:11 PM, Joe Brockmeier <jz...@zonker.net> wrote:

> On Wed, May 22, 2013, at 10:51 AM, John Burwell wrote:
> > I would say that the only thing for an open source project worse than not
> > releasing is releasing a poor quality release.  A late release with high
> > quality is soon forgotten.  An on-time or late release with poor quality
> > lingers in folks memory. The KDE project made the near fatal mistake of
> > following the same logic when they release 4.0, and the reputation of KDE
> > 4.x continues to suffer from it to this day.  CloudStack is trusted to
> > run at the core our user's operations.  In my view, if we err, we should
> > err on the side of quality to avoid of erosion of that trust.  If we ever
> > lost that trust, our new features would never be evaluated.
>
> I'm not sure this issue approaches KDE 4.0 levels, but otherwise +1.
> (Note - the KDE folks are *very* touchy about 4.0 *still* being held up
> as a high-water mark of poor judgement in releases, which is in and of
> itself a cautionary tale for releasing something that's not ready...)
>
> Why are users waiting for us to officially release instead of grabbing
> artifacts from Jenkins? In large part, they're waiting for the project
> to "bless" the quality of the release by saying it's ready. Time-based
> releases are supposed to be a way of ensuring that we don't hold up
> releases indefinitely because of missing features - but I don't think
> that extends to knowingly releasing something that is a pretty serious
> bug.
>
>
The quality of software and its new feature sets, if supported, should all
be at parity across supported platforms.
The fact is that 1) this is a critical bug, 2) it affects the entire Xen/XCP
base, and 3) it has been known and not resolved.

While in a senior-level management position running an R&D team, I
would always tell the CTO/CEO: if it's not fully baked and QA'd, it's not
ready to come out of the oven. Push the date. I'd rather see CS as a whole
remain in feature parity and crush this last critical bug than push out a
release and discourage any Xen/XCP environments from moving forward with
the software stack as a whole. I wouldn't do it to clients, and I feel we
shouldn't do it to our users. Resolve the problem, QA it, then move
forward. Don't band-aid it, don't neglect it. If others are so gung-ho to
use 4.1 before it's released, they can build from source; there are options
for moving forward. Leaving this stone unturned, I feel, would be
detrimental to the good reputation CS has enjoyed.

I usually don't say much until I feel strongly about an issue. But I ask:
have we even really assessed what it will take to fix, instead of just
throwing it to the wolves to vote on? Will it take a week to resolve and
test? If we can't answer this question, then we shouldn't even be having
the voting discussion, let alone debating how long it's been a "known"
issue. Regardless of who noticed or who it affected, the fact is someone
noticed it; otherwise there wouldn't be a bug report on it. So we just
answered, logically, who noticed (someone did) and who it affects (well,
obviously it did affect someone). Fix it, QA it, release, and move forward
before we get too far down the road and it's harder to resolve.



> Best,
>
> jzb
> --
> Joe Brockmeier
> jzb@zonker.net
> Twitter: @jzb
> http://www.dissociatedpress.net/
>

Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Joe Brockmeier <jz...@zonker.net>.
On Wed, May 22, 2013, at 10:51 AM, John Burwell wrote:
> I would say that the only thing for an open source project worse than not
> releasing is releasing a poor quality release.  A late release with high
> quality is soon forgotten.  An on-time or late release with poor quality
> lingers in folks memory. The KDE project made the near fatal mistake of
> following the same logic when they release 4.0, and the reputation of KDE
> 4.x continues to suffer from it to this day.  CloudStack is trusted to
> run at the core our user's operations.  In my view, if we err, we should
> err on the side of quality to avoid of erosion of that trust.  If we ever
> lost that trust, our new features would never be evaluated. 

I'm not sure this issue approaches KDE 4.0 levels, but otherwise +1.
(Note - the KDE folks are *very* touchy about 4.0 *still* being held up
as a high-water mark of poor judgement in releases, which is in and of
itself a cautionary tale for releasing something that's not ready...) 

Why are users waiting for us to officially release instead of grabbing
artifacts from Jenkins? In large part, they're waiting for the project
to "bless" the quality of the release by saying it's ready. Time-based
releases are supposed to be a way of ensuring that we don't hold up
releases indefinitely because of missing features - but I don't think
that extends to knowingly releasing something that has a pretty serious
bug.

Best,

jzb
-- 
Joe Brockmeier
jzb@zonker.net
Twitter: @jzb
http://www.dissociatedpress.net/

Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by John Burwell <jb...@basho.com>.
Marcus,

I would say that the only thing worse for an open source project than not releasing is releasing a poor-quality release.  A late release with high quality is soon forgotten.  An on-time or late release with poor quality lingers in folks' memories.  The KDE project made the near-fatal mistake of following the same logic when they released 4.0, and the reputation of KDE 4.x continues to suffer from it to this day.  CloudStack is trusted to run at the core of our users' operations.  In my view, if we err, we should err on the side of quality to avoid erosion of that trust.  If we ever lost that trust, our new features would never be evaluated.

Thanks,
-John

On May 22, 2013, at 11:18 AM, Marcus Sorensen <sh...@gmail.com> wrote:

> Thanks for the response. Time sync is certainly an issue, I think one
> of the things we are trying to gauge is whether the system vm
> functionality has been impacted by time sync such that anyone has
> noticed or cared.  That's not to detract from the point that having
> time sync is optimal, and affects a lot of things, but functionally,
> back to my item #1, can we confirm that earlier versions have gotten
> out of sync, and if so, do we have bug reports showing that it has
> mattered?
> 
>  To counter the argument, there are plenty of people looking for the
> features in 4.1, that wouldn't choose cloudstack because it's not
> released yet. Then there's the delay impact to 4.2, and keeping all of
> those features out of the hands of people as well.
> 
> For me, the fear is that we end up pushing 4.1 back to or near where
> 4.2 would have been otherwise released, at which point we haven't
> really accomplished anything but delayed the release of the working
> features in 4.1.
> 
> 
> On Wed, May 22, 2013 at 9:09 AM, John Burwell <jb...@basho.com> wrote:
>> Marcus,
>> 
>> For me, S3 integration and Xen feature parity are not the primary reasons that this defect should remain a blocker.  Time synchronization is a basic and essential assumption for systems such as CloudStack.  This defect yields file and log timestamps from secondary storage that are unreliable -- impacting customers in an accredited environment (e.g. SOX) or that rely on those timestamps for any downstream operations.  It also stands as a significant impediment to operational debugging.  Additionally, as others have pointed out, time drifts also impact encryption, and possibly handshake operations between the systems VMs and management server.  While I appreciate and fully support a time-based release cycle, there has to be a quality threshold for any release.  Looking at it from an operations perspective, failure to maintain time sync across components is unacceptable.   Assuming I used Xen, I ask myself, "Would I deploy a 4.1.0 if the known issues list stated that the system VMs could not maintain time sync?", and, without hesitation, I would answer, "No.", and follow it up quickly, "Oh no, I hope the release I have in production doesn't have this problem."
>> 
>> Thanks,
>> -John
>> 
>> On May 22, 2013, at 10:35 AM, Marcus Sorensen <sh...@gmail.com> wrote:
>> 
>>> I feel like we need to clarify what's at risk here. Not to disrespect
>>> anyone's opinion, but I'm just not getting where this is being
>>> considered a major feature.  I think the very idea of Xen not having
>>> feature parity (regardless of the feature) is distasteful to a lot of
>>> us, and it should be. But consider that we are already two months
>>> behind on a four month release cycle, and it sounds like fixing this
>>> could take a month (if no issues are found, two weeks to qual the new
>>> template). We run a time-based release, not a feature-based release.
>>> Not all features are expected to be fully functional to get out the
>>> door. Isn't the correct option to just mark the feature experimental,
>>> tell them to run the newer template at their risk if they want it?
>>> 
>>> 1) We need to verify whether this bug has been around for a long time,
>>> because it will tell us how much it really matters and thus whether or
>>> not it's a blocker. This addresses the 'timestamp of logs" and other
>>> issues not related to new features.
>>> 
>>> 2) We need to reiterate exactly what features are being affected. The
>>> original e-mail lists 'S3 integration' as the only feature affected.
>>> As far as I understand it, the actual feature impacted is a 'secondary
>>> storage sync', if you have multiple zones, multiple secondary
>>> storages, this backs up and handles the copying of templates, etc so
>>> you don't have to manually register them everywhere.
>>> 
>>> I appreciate John's work for getting that secondary storage sync
>>> feature in place. I really wish we would have noticed the issue
>>> earlier on, then we may not be having this discussion. That said, no
>>> disrespect intended toward John, I'm having a hard time understanding
>>> how this is a feature worth holding up the release. It's not a new
>>> primary or secondary storage type integration, and it's not a feature
>>> where the admin is helpless to do it themselves. If VPC doesn't work,
>>> the admin can't do anything about it. If this sync doesn't work, the
>>> admin writes a script that copies their stuff everywhere.
>>> 
>>> Please, if anyone considers this a major feature worth blocking on,
>>> explain to us why. Are you willing to push back release of all of the
>>> other new features, and push back the 4.2 features, to have this one
>>> feature in June, or whenever 4.1 gets out?
>>> 
>>> 
>>> On Wed, May 22, 2013 at 2:14 AM, Sebastien Goasguen <ru...@gmail.com> wrote:
>>>> +1 on moving forward.
>>>> 
>>>> On this issue and on the upgrade issue I have realized that we forgot about our time based release philosophy.
>>>> 
>>>> There will always be bugs in the software. If we know them we can acknowledge them in release notes and get started quickly on the next releases.
>>>> 
>>>> To keep it short, I am now of the opinion (and I know I am kind of switching mind here), that we should release 4.1 asap and start working on the bug fix versions right away.
>>>> 
>>>> If we do release often, then folks stuck on a particular bug can expect a quick turn around and fix of their problems.
>>>> 
>>>> -sebastien
>>>> 
>>>> On May 22, 2013, at 2:59 AM, Mathias Mullins <ma...@citrix.com> wrote:
>>>> 
>>>>> -1 on this.
>>>>> 
>>>>> New features really should be across the board for the Hypervisors. Part
>>>>> of the thing that distinguishes ACS is it's support across Xen / VMware /
>>>>> KVM. Do we really want to start getting in the habit of pushing forward
>>>>> new features that are not across the fully functional hypervisors?
>>>>> 
>>>>> I agree with Outback this also will start to affect the Xen/XCP community
>>>>> by basically setting them apart and out on what a lot of people see as a
>>>>> major feature.
>>>>> 
>>>>> I think it sets a really bad precedent. If it was Hyper-V which is not
>>>>> fully functional and not a major feature-set right now, I would be +1 on
>>>>> this.
>>>>> 
>>>>> MHO
>>>>> Matt
>>>>> 
>>>>> 
>>>>> 
>>>>> On 5/20/13 4:15 PM, "Chip Childers" <ch...@sungard.com> wrote:
>>>>> 
>>>>>> All,
>>>>>> 
>>>>>> As discussed on another thread [1], we identified a bug
>>>>>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
>>>>>> are not configured to sync their time with either the host HV or an NTP
>>>>>> service.  That bug affects the system VMs for all three primary HVs (KVM,
>>>>>> Xen and vSphere).  Patches have been committed addressing vSphere and
>>>>>> KVM.  It appears that a correction for Xen would require the re-build of
>>>>>> a system VM image and a full round of regression testing that image.
>>>>>> 
>>>>>> Given that the discussion thread has not resulted in a consensus on this
>>>>>> issue, I unfortunately believe that the only path forward is to call for
>>>>>> a formal VOTE.
>>>>>> 
>>>>>> Please respond with one of the following:
>>>>>> 
>>>>>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
>>>>>> resolved
>>>>>> +0: don't care one way or the other
>>>>>> -1: do *not* proceed with any further 4.1 release candidates until
>>>>>> CLOUDSTACK-2492 has been fully resolved
>>>>>> 
>>>>>> -chip
>>>>>> 
>>>>>> [1] http://markmail.org/message/rw7vciq3r33biasb
>>>>> 
>>>> 
>> 


Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Marcus Sorensen <sh...@gmail.com>.
Thanks for the response. Time sync is certainly an issue. I think one
of the things we are trying to gauge is whether the system VM
functionality has been impacted by time sync such that anyone has
noticed or cared.  That's not to detract from the point that having
time sync is optimal, and affects a lot of things, but functionally,
back to my item #1, can we confirm that earlier versions have gotten
out of sync, and if so, do we have bug reports showing that it has
mattered?

To counter the argument, there are plenty of people looking for the
features in 4.1 who won't choose CloudStack because it's not
released yet. Then there's the delay impact to 4.2, and keeping all of
those features out of the hands of people as well.

For me, the fear is that we end up pushing 4.1 back to or near where
4.2 would have been otherwise released, at which point we haven't
really accomplished anything but delay the release of the working
features in 4.1.


On Wed, May 22, 2013 at 9:09 AM, John Burwell <jb...@basho.com> wrote:
> Marcus,
>
> For me, S3 integration and Xen feature parity are not the primary reasons that this defect should remain a blocker.  Time synchronization is a basic and essential assumption for systems such as CloudStack.  This defect yields file and log timestamps from secondary storage that are unreliable -- impacting customers in an accredited environment (e.g. SOX) or that rely on those timestamps for any downstream operations.  It also stands as a significant impediment to operational debugging.  Additionally, as others have pointed out, time drifts also impact encryption, and possibly handshake operations between the systems VMs and management server.  While I appreciate and fully support a time-based release cycle, there has to be a quality threshold for any release.  Looking at it from an operations perspective, failure to maintain time sync across components is unacceptable.   Assuming I used Xen, I ask myself, "Would I deploy a 4.1.0 if the known issues list stated that the system VMs could not maintain time sync?", and, without hesitation, I would answer, "No.", and follow it up quickly, "Oh no, I hope the release I have in production doesn't have this problem."
>
> Thanks,
> -John
>
> On May 22, 2013, at 10:35 AM, Marcus Sorensen <sh...@gmail.com> wrote:
>
>> I feel like we need to clarify what's at risk here. Not to disrespect
>> anyone's opinion, but I'm just not getting where this is being
>> considered a major feature.  I think the very idea of Xen not having
>> feature parity (regardless of the feature) is distasteful to a lot of
>> us, and it should be. But consider that we are already two months
>> behind on a four month release cycle, and it sounds like fixing this
>> could take a month (if no issues are found, two weeks to qual the new
>> template). We run a time-based release, not a feature-based release.
>> Not all features are expected to be fully functional to get out the
>> door. Isn't the correct option to just mark the feature experimental,
>> tell them to run the newer template at their risk if they want it?
>>
>> 1) We need to verify whether this bug has been around for a long time,
>> because it will tell us how much it really matters and thus whether or
>> not it's a blocker. This addresses the 'timestamp of logs" and other
>> issues not related to new features.
>>
>> 2) We need to reiterate exactly what features are being affected. The
>> original e-mail lists 'S3 integration' as the only feature affected.
>> As far as I understand it, the actual feature impacted is a 'secondary
>> storage sync', if you have multiple zones, multiple secondary
>> storages, this backs up and handles the copying of templates, etc so
>> you don't have to manually register them everywhere.
>>
>> I appreciate John's work for getting that secondary storage sync
>> feature in place. I really wish we would have noticed the issue
>> earlier on, then we may not be having this discussion. That said, no
>> disrespect intended toward John, I'm having a hard time understanding
>> how this is a feature worth holding up the release. It's not a new
>> primary or secondary storage type integration, and it's not a feature
>> where the admin is helpless to do it themselves. If VPC doesn't work,
>> the admin can't do anything about it. If this sync doesn't work, the
>> admin writes a script that copies their stuff everywhere.
>>
>> Please, if anyone considers this a major feature worth blocking on,
>> explain to us why. Are you willing to push back release of all of the
>> other new features, and push back the 4.2 features, to have this one
>> feature in June, or whenever 4.1 gets out?
>>
>>
>> On Wed, May 22, 2013 at 2:14 AM, Sebastien Goasguen <ru...@gmail.com> wrote:
>>> +1 on moving forward.
>>>
>>> On this issue and on the upgrade issue I have realized that we forgot about our time based release philosophy.
>>>
>>> There will always be bugs in the software. If we know them we can acknowledge them in release notes and get started quickly on the next releases.
>>>
>>> To keep it short, I am now of the opinion (and I know I am kind of switching mind here), that we should release 4.1 asap and start working on the bug fix versions right away.
>>>
>>> If we do release often, then folks stuck on a particular bug can expect a quick turn around and fix of their problems.
>>>
>>> -sebastien
>>>
>>> On May 22, 2013, at 2:59 AM, Mathias Mullins <ma...@citrix.com> wrote:
>>>
>>>> -1 on this.
>>>>
>>>> New features really should be across the board for the Hypervisors. Part
>>>> of the thing that distinguishes ACS is it's support across Xen / VMware /
>>>> KVM. Do we really want to start getting in the habit of pushing forward
>>>> new features that are not across the fully functional hypervisors?
>>>>
>>>> I agree with Outback this also will start to affect the Xen/XCP community
>>>> by basically setting them apart and out on what a lot of people see as a
>>>> major feature.
>>>>
>>>> I think it sets a really bad precedent. If it was Hyper-V which is not
>>>> fully functional and not a major feature-set right now, I would be +1 on
>>>> this.
>>>>
>>>> MHO
>>>> Matt
>>>>
>>>>
>>>>
>>>> On 5/20/13 4:15 PM, "Chip Childers" <ch...@sungard.com> wrote:
>>>>
>>>>> All,
>>>>>
>>>>> As discussed on another thread [1], we identified a bug
>>>>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
>>>>> are not configured to sync their time with either the host HV or an NTP
>>>>> service.  That bug affects the system VMs for all three primary HVs (KVM,
>>>>> Xen and vSphere).  Patches have been committed addressing vSphere and
>>>>> KVM.  It appears that a correction for Xen would require the re-build of
>>>>> a system VM image and a full round of regression testing that image.
>>>>>
>>>>> Given that the discussion thread has not resulted in a consensus on this
>>>>> issue, I unfortunately believe that the only path forward is to call for
>>>>> a formal VOTE.
>>>>>
>>>>> Please respond with one of the following:
>>>>>
>>>>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
>>>>> resolved
>>>>> +0: don't care one way or the other
>>>>> -1: do *not* proceed with any further 4.1 release candidates until
>>>>> CLOUDSTACK-2492 has been fully resolved
>>>>>
>>>>> -chip
>>>>>
>>>>> [1] http://markmail.org/message/rw7vciq3r33biasb
>>>>
>>>
>

Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by John Burwell <jb...@basho.com>.
Marcus,

For me, S3 integration and Xen feature parity are not the primary reasons that this defect should remain a blocker.  Time synchronization is a basic and essential assumption for systems such as CloudStack.  This defect yields file and log timestamps from secondary storage that are unreliable -- impacting customers who operate in an accredited environment (e.g. SOX) or who rely on those timestamps for any downstream operations.  It also stands as a significant impediment to operational debugging.  Additionally, as others have pointed out, time drift also impacts encryption, and possibly handshake operations between the system VMs and the management server.  While I appreciate and fully support a time-based release cycle, there has to be a quality threshold for any release.  Looking at it from an operations perspective, failure to maintain time sync across components is unacceptable.  Assuming I used Xen, I ask myself, "Would I deploy a 4.1.0 if the known issues list stated that the system VMs could not maintain time sync?", and, without hesitation, I would answer, "No.", and quickly follow it up with, "Oh no, I hope the release I have in production doesn't have this problem."

Thanks,
-John

On May 22, 2013, at 10:35 AM, Marcus Sorensen <sh...@gmail.com> wrote:

> I feel like we need to clarify what's at risk here. Not to disrespect
> anyone's opinion, but I'm just not getting where this is being
> considered a major feature.  I think the very idea of Xen not having
> feature parity (regardless of the feature) is distasteful to a lot of
> us, and it should be. But consider that we are already two months
> behind on a four month release cycle, and it sounds like fixing this
> could take a month (if no issues are found, two weeks to qual the new
> template). We run a time-based release, not a feature-based release.
> Not all features are expected to be fully functional to get out the
> door. Isn't the correct option to just mark the feature experimental,
> tell them to run the newer template at their risk if they want it?
> 
> 1) We need to verify whether this bug has been around for a long time,
> because it will tell us how much it really matters and thus whether or
> not it's a blocker. This addresses the 'timestamp of logs" and other
> issues not related to new features.
> 
> 2) We need to reiterate exactly what features are being affected. The
> original e-mail lists 'S3 integration' as the only feature affected.
> As far as I understand it, the actual feature impacted is a 'secondary
> storage sync', if you have multiple zones, multiple secondary
> storages, this backs up and handles the copying of templates, etc so
> you don't have to manually register them everywhere.
> 
> I appreciate John's work for getting that secondary storage sync
> feature in place. I really wish we would have noticed the issue
> earlier on, then we may not be having this discussion. That said, no
> disrespect intended toward John, I'm having a hard time understanding
> how this is a feature worth holding up the release. It's not a new
> primary or secondary storage type integration, and it's not a feature
> where the admin is helpless to do it themselves. If VPC doesn't work,
> the admin can't do anything about it. If this sync doesn't work, the
> admin writes a script that copies their stuff everywhere.
> 
> Please, if anyone considers this a major feature worth blocking on,
> explain to us why. Are you willing to push back release of all of the
> other new features, and push back the 4.2 features, to have this one
> feature in June, or whenever 4.1 gets out?
> 
> 
> On Wed, May 22, 2013 at 2:14 AM, Sebastien Goasguen <ru...@gmail.com> wrote:
>> +1 on moving forward.
>> 
>> On this issue and on the upgrade issue I have realized that we forgot about our time based release philosophy.
>> 
>> There will always be bugs in the software. If we know them we can acknowledge them in release notes and get started quickly on the next releases.
>> 
>> To keep it short, I am now of the opinion (and I know I am kind of switching mind here), that we should release 4.1 asap and start working on the bug fix versions right away.
>> 
>> If we do release often, then folks stuck on a particular bug can expect a quick turn around and fix of their problems.
>> 
>> -sebastien
>> 
>> On May 22, 2013, at 2:59 AM, Mathias Mullins <ma...@citrix.com> wrote:
>> 
>>> -1 on this.
>>> 
>>> New features really should be across the board for the Hypervisors. Part
>>> of the thing that distinguishes ACS is it's support across Xen / VMware /
>>> KVM. Do we really want to start getting in the habit of pushing forward
>>> new features that are not across the fully functional hypervisors?
>>> 
>>> I agree with Outback this also will start to affect the Xen/XCP community
>>> by basically setting them apart and out on what a lot of people see as a
>>> major feature.
>>> 
>>> I think it sets a really bad precedent. If it was Hyper-V which is not
>>> fully functional and not a major feature-set right now, I would be +1 on
>>> this.
>>> 
>>> MHO
>>> Matt
>>> 
>>> 
>>> 
>>> On 5/20/13 4:15 PM, "Chip Childers" <ch...@sungard.com> wrote:
>>> 
>>>> All,
>>>> 
>>>> As discussed on another thread [1], we identified a bug
>>>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
>>>> are not configured to sync their time with either the host HV or an NTP
>>>> service.  That bug affects the system VMs for all three primary HVs (KVM,
>>>> Xen and vSphere).  Patches have been committed addressing vSphere and
>>>> KVM.  It appears that a correction for Xen would require the re-build of
>>>> a system VM image and a full round of regression testing that image.
>>>> 
>>>> Given that the discussion thread has not resulted in a consensus on this
>>>> issue, I unfortunately believe that the only path forward is to call for
>>>> a formal VOTE.
>>>> 
>>>> Please respond with one of the following:
>>>> 
>>>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
>>>> resolved
>>>> +0: don't care one way or the other
>>>> -1: do *not* proceed with any further 4.1 release candidates until
>>>> CLOUDSTACK-2492 has been fully resolved
>>>> 
>>>> -chip
>>>> 
>>>> [1] http://markmail.org/message/rw7vciq3r33biasb
>>> 
>> 


Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Marcus Sorensen <sh...@gmail.com>.
I feel like we need to clarify what's at risk here. Not to disrespect
anyone's opinion, but I'm just not getting where this is being
considered a major feature.  I think the very idea of Xen not having
feature parity (regardless of the feature) is distasteful to a lot of
us, and it should be. But consider that we are already two months
behind on a four-month release cycle, and it sounds like fixing this
could take a month (if no issues are found, two weeks to qualify the new
template). We run a time-based release, not a feature-based release.
Not all features are expected to be fully functional to get out the
door. Isn't the correct option to just mark the feature experimental
and tell users to run the newer template at their own risk if they want it?

1) We need to verify whether this bug has been around for a long time,
because it will tell us how much it really matters and thus whether or
not it's a blocker. This addresses the 'timestamp of logs' and other
issues not related to new features.

2) We need to reiterate exactly what features are being affected. The
original e-mail lists 'S3 integration' as the only feature affected.
As far as I understand it, the actual feature impacted is a 'secondary
storage sync': if you have multiple zones and multiple secondary
storages, it backs up and handles the copying of templates, etc., so
you don't have to manually register them everywhere.

I appreciate John's work for getting that secondary storage sync
feature in place. I really wish we had noticed the issue
earlier on; then we might not be having this discussion. That said, no
disrespect intended toward John, I'm having a hard time understanding
how this is a feature worth holding up the release. It's not a new
primary or secondary storage type integration, and it's not a feature
where the admin is helpless to do it themselves. If VPC doesn't work,
the admin can't do anything about it. If this sync doesn't work, the
admin writes a script that copies their stuff everywhere.

Please, if anyone considers this a major feature worth blocking on,
explain to us why. Are you willing to push back release of all of the
other new features, and push back the 4.2 features, to have this one
feature in June, or whenever 4.1 gets out?


On Wed, May 22, 2013 at 2:14 AM, Sebastien Goasguen <ru...@gmail.com> wrote:
> +1 on moving forward.
>
> On this issue and on the upgrade issue I have realized that we forgot about our time based release philosophy.
>
> There will always be bugs in the software. If we know them we can acknowledge them in release notes and get started quickly on the next releases.
>
> To keep it short, I am now of the opinion (and I know I am kind of switching mind here), that we should release 4.1 asap and start working on the bug fix versions right away.
>
> If we do release often, then folks stuck on a particular bug can expect a quick turn around and fix of their problems.
>
> -sebastien
>
> On May 22, 2013, at 2:59 AM, Mathias Mullins <ma...@citrix.com> wrote:
>
>> -1 on this.
>>
>> New features really should be across the board for the Hypervisors. Part
>> of the thing that distinguishes ACS is it's support across Xen / VMware /
>> KVM. Do we really want to start getting in the habit of pushing forward
>> new features that are not across the fully functional hypervisors?
>>
>> I agree with Outback this also will start to affect the Xen/XCP community
>> by basically setting them apart and out on what a lot of people see as a
>> major feature.
>>
>> I think it sets a really bad precedent. If it was Hyper-V which is not
>> fully functional and not a major feature-set right now, I would be +1 on
>> this.
>>
>> MHO
>> Matt
>>
>>
>>
>> On 5/20/13 4:15 PM, "Chip Childers" <ch...@sungard.com> wrote:
>>
>>> All,
>>>
>>> As discussed on another thread [1], we identified a bug
>>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
>>> are not configured to sync their time with either the host HV or an NTP
>>> service.  That bug affects the system VMs for all three primary HVs (KVM,
>>> Xen and vSphere).  Patches have been committed addressing vSphere and
>>> KVM.  It appears that a correction for Xen would require the re-build of
>>> a system VM image and a full round of regression testing that image.
>>>
>>> Given that the discussion thread has not resulted in a consensus on this
>>> issue, I unfortunately believe that the only path forward is to call for
>>> a formal VOTE.
>>>
>>> Please respond with one of the following:
>>>
>>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
>>> resolved
>>> +0: don't care one way or the other
>>> -1: do *not* proceed with any further 4.1 release candidates until
>>> CLOUDSTACK-2492 has been fully resolved
>>>
>>> -chip
>>>
>>> [1] http://markmail.org/message/rw7vciq3r33biasb
>>
>

Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Sebastien Goasguen <ru...@gmail.com>.
+1 on moving forward.

On this issue and on the upgrade issue I have realized that we forgot about our time-based release philosophy.

There will always be bugs in the software. If we know them we can acknowledge them in release notes and get started quickly on the next releases.

To keep it short, I am now of the opinion (and I know I am kind of changing my mind here) that we should release 4.1 ASAP and start working on the bug-fix versions right away.

If we do release often, then folks stuck on a particular bug can expect a quick turnaround and a fix for their problems.

-sebastien

On May 22, 2013, at 2:59 AM, Mathias Mullins <ma...@citrix.com> wrote:

> -1 on this. 
> 
> New features really should be across the board for the Hypervisors. Part
> of the thing that distinguishes ACS is it's support across Xen / VMware /
> KVM. Do we really want to start getting in the habit of pushing forward
> new features that are not across the fully functional hypervisors?
> 
> I agree with Outback this also will start to affect the Xen/XCP community
> by basically setting them apart and out on what a lot of people see as a
> major feature. 
> 
> I think it sets a really bad precedent. If it was Hyper-V which is not
> fully functional and not a major feature-set right now, I would be +1 on
> this.
> 
> MHO
> Matt 
> 
> 
> 
> On 5/20/13 4:15 PM, "Chip Childers" <ch...@sungard.com> wrote:
> 
>> All,
>> 
>> As discussed on another thread [1], we identified a bug
>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
>> are not configured to sync their time with either the host HV or an NTP
>> service.  That bug affects the system VMs for all three primary HVs (KVM,
>> Xen and vSphere).  Patches have been committed addressing vSphere and
>> KVM.  It appears that a correction for Xen would require the re-build of
>> a system VM image and a full round of regression testing that image.
>> 
>> Given that the discussion thread has not resulted in a consensus on this
>> issue, I unfortunately believe that the only path forward is to call for
>> a formal VOTE.
>> 
>> Please respond with one of the following:
>> 
>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
>> resolved
>> +0: don't care one way or the other
>> -1: do *not* proceed with any further 4.1 release candidates until
>> CLOUDSTACK-2492 has been fully resolved
>> 
>> -chip
>> 
>> [1] http://markmail.org/message/rw7vciq3r33biasb
> 


Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?

Posted by Mathias Mullins <ma...@citrix.com>.
-1 on this. 

New features really should be across the board for the Hypervisors. Part
of the thing that distinguishes ACS is its support across Xen / VMware /
KVM. Do we really want to start getting in the habit of pushing forward
new features that are not available across the fully functional hypervisors?

I agree with Outback this also will start to affect the Xen/XCP community
by basically setting them apart and out on what a lot of people see as a
major feature. 

I think it sets a really bad precedent. If it were Hyper-V, which is not
fully functional and not a major feature-set right now, I would be +1 on
this.

MHO
Matt 



On 5/20/13 4:15 PM, "Chip Childers" <ch...@sungard.com> wrote:

>All,
>
>As discussed on another thread [1], we identified a bug
>(CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
>are not configured to sync their time with either the host HV or an NTP
>service.  That bug affects the system VMs for all three primary HVs (KVM,
>Xen and vSphere).  Patches have been committed addressing vSphere and
>KVM.  It appears that a correction for Xen would require the re-build of
>a system VM image and a full round of regression testing that image.
>
>Given that the discussion thread has not resulted in a consensus on this
>issue, I unfortunately believe that the only path forward is to call for
>a formal VOTE.
>
>Please respond with one of the following:
>
>+1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
>resolved
>+0: don't care one way or the other
>-1: do *not* proceed with any further 4.1 release candidates until
>CLOUDSTACK-2492 has been fully resolved
>
>-chip
>
>[1] http://markmail.org/message/rw7vciq3r33biasb