Posted to dev@reef.apache.org by Byung-Gon Chun <bg...@gmail.com> on 2017/07/21 04:42:01 UTC

Re: 0.16 release plan

Sounds good!

- Gon

On Jul 21, 2017, at 1:34 PM, Tae-Geon Um <ta...@gmail.com> wrote:

> Hi all, 
> 
> Again, I think it’s time to release 0.16; I would like to share the release plan :)
> 
> For the release 0.16, it looks like we have a consensus on the following items: 
> 1) releasing 0.16 with known transient failures (.NET side) 
> 2) upgrading to hadoop 2.7.0 for the new release
> 
> If you have any objection to these things, please share your opinion. 
> 
> Here is the release plan. 
> 1) I will wait *one week* for the remaining transient failures to be resolved
> 2) I will upgrade to hadoop 2.7.0 during the week
> 3) Even if transient failures still remain after the deadline, I will call a release vote with the remaining transient issues. 
> 
> What do you think about this plan? 
> 
> Thanks,
> Taegeon
> 
>> On May 11, 2017, at 12:56 PM, Sergiy Matusevych <se...@gmail.com> wrote:
>> 
>> Hi Taegeon,
>> 
>> I am afraid I won't be able to look at either of these issues this week. I
>> am very busy working on my slides on Distributed Factorization Machines,
>> and I have a presentation on Friday.
>> 
>> We can release 0.16-preview (or beta, or rc1, or whatever you call it) - I
>> think we are good in terms of features, and there is a finite number of
>> bugs that should not stop early adopters from using 0.16-preview
>> 
>> What do you guys think?
>> 
>> Cheers,
>> Sergiy.
>> 
>> 
>> 
>> 
>>> On Wed, May 10, 2017 at 7:23 PM, Tae-Geon Um <ta...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> Thanks Sergiy for taking a look at them!
>>> As far as I know, ApacheCon opens next week (May 16-18), so I think we
>>> need to resolve the issues by the end of this week.
>>> 
>>> I think I can help you in investigating REEF-1770. However, I’m not sure I
>>> can fix it by the end of this week.
>>> REEF-1770 is a transient failure, but there has been no transient failure during
>>> the past 30 days in our Java-side Travis CI (0 transient failures out of 69
>>> builds).
>>> So it may be hard to reproduce.
>>> 
>>> Sergiy, do you think you can resolve REEF-1796 by the end of this week?
>>> If not, we have two options.
>>> 
>>> 1) release 0.16 during the week of ApacheCon without fixing them
>>> (Actually, the .NET side CI is unstable, but just release 0.16)
>>> 2) do not release 0.16 until the bugs (in addition to the .NET side CI
>>> failures) are resolved
>>> 
>>> What do you guys think?
>>> 
>>> Thanks,
>>> Taegeon
>>> 
>>>> On May 11, 2017, at 6:55 AM, Sergiy Matusevych <sergiy.matusevych@gmail.com> wrote:
>>>> 
>>>> Hi guys,
>>>> 
>>>> It surely would be great to announce 0.16 at the conference, and we have
>>>> some awesome features to brag about - most notably, REEF-on-Spark. Still,
>>>> there are a few bugs that we need to fix before the release. I am mostly
>>>> concerned with https://issues.apache.org/jira/browse/REEF-1770 and
>>>> especially https://issues.apache.org/jira/browse/REEF-1796. I am looking
>>>> at them now, but any help would be greatly appreciated!
>>>> 
>>>> Cheers,
>>>> Sergiy.
>>>> 
>>>> On Tue, May 9, 2017 at 11:55 PM, Byung-Gon Chun <bg...@gmail.com> wrote:
>>>> 
>>>>> Any update?
>>>>> It'd be great if we can release 0.16 during the week of ApacheCon.
>>>>> 
>>>>> -Gon
>>>>> 
>>>>> On Tue, Apr 11, 2017 at 10:27 AM, Tae-Geon Um <ta...@gmail.com> wrote:
>>>>> 
>>>>>> Unfortunately, we also have a recent build failure on the Java side [1],
>>>>>> which was not reported previously.
>>>>>> I’ve created an issue [2] to track this failure, and am going to
>>>>>> investigate it.
>>>>>> 
>>>>>> Thanks,
>>>>>> Taegeon
>>>>>> 
>>>>>> [1]: https://travis-ci.org/apache/reef/builds/220731026
>>>>>> [2]: https://issues.apache.org/jira/browse/REEF-1770
>>>>>>> On Apr 6, 2017, at 3:13 AM, Mariia Mykhailova <mamykhai@microsoft.com.INVALID> wrote:
>>>>>>> 
>>>>>>> At least 3 of the issues previously reported under REEF-1462 have
>>>>>>> re-occurred in the past two days (I've reopened the corresponding JIRAs and
>>>>>>> attached links to failures). Unfortunately, with transient failures
>>>>>>> like these, one good build is insufficient.
>>>>>>> 
>>>>>>> It is a known issue: since we're using free access to AppVeyor, our
>>>>>>> builds are sequential and low-priority, so sometimes when a lot of pull
>>>>>>> requests have to be built the build queue takes a while to drain.
>>>>>>> 
>>>>>>> -Mariia
>>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Byung-Gon Chun [mailto:bgchun@gmail.com]
>>>>>>> Sent: Wednesday, April 5, 2017 12:11 AM
>>>>>>> To: dev@reef.apache.org
>>>>>>> Subject: Re: 0.16 release plan
>>>>>>> 
>>>>>>> Awesome!
>>>>>>> 
>>>>>>> There is no build failure in the .Net side with the latest build [1].
>>>>>>> 
>>>>>>> It looks like Appveyor's quite slow. Regarding PR1284 [2], Travis CI's
>>>>>>> already done. We're still waiting for Appveyor. :(
>>>>>>> 
>>>>>>> [1] https://ci.appveyor.com/project/ApacheSoftwareFoundation/reef/build/1455-master
>>>>>>> [2] https://github.com/apache/reef/pull/1284
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Apr 4, 2017 at 10:29 AM, Tae-Geon Um <ta...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Thanks Julia for the work!
>>>>>>>> 
>>>>>>>> It looks like Java and .NET builds are almost stable, except for the
>>>>>>>> recent build failure on the .NET side [1].
>>>>>>>> As Julia said in REEF-1406 [2], we need to wait some time to see
>>>>>>>> whether this failure is reproduced.
>>>>>>>> 
>>>>>>>> I will wait for a week and call a release vote if there are no build
>>>>>>>> failures during that time.
>>>>>>>> Thanks!
>>>>>>>> 
>>>>>>>> Taegeon
>>>>>>>> 
>>>>>>>> [1]: https://ci.appveyor.com/project/ApacheSoftwareFoundation/reef/build/1453-master
>>>>>>>> [2]: https://issues.apache.org/jira/browse/REEF-1406
>>>>>>>> 
>>>>>>>>> On Mar 30, 2017, at 10:28 AM, Julia Wang (QIUHE) <Qiuhe.Wang@microsoft.com.INVALID> wrote:
>>>>>>>>> 
>>>>>>>>> I have resolved all the .Net test issues for now. The fixes cover
>>>>>>>>> what I have identified so far based on the failures.
>>>>>>>>> 
>>>>>>>>> I agree with Mariia: as they are transient failures, and they sometimes
>>>>>>>>> fail for multiple reasons, we need to continue to observe whether the
>>>>>>>>> issues come back.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Julia
>>>>>>>>> 
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Tae-Geon Um [mailto:taegeonum@gmail.com]
>>>>>>>>> Sent: Thursday, March 23, 2017 5:20 PM
>>>>>>>>> To: dev@reef.apache.org
>>>>>>>>> Subject: Re: 0.16 release plan
>>>>>>>>> 
>>>>>>>>> Thanks Mariia for pointing it out to me.
>>>>>>>>> Yes. I agree that we need more time to fix all of the transient failures.
>>>>>>>>> After they are resolved, I will wait for some time to ensure that
>>>>>>>>> they do not reoccur.
>>>>>>>>> 
>>>>>>>>> Thanks!
>>>>>>>>> Taegeon
>>>>>>>>> 
>>>>>>>>>> On Mar 24, 2017, at 2:54 AM, Mariia Mykhailova <ma...@microsoft.com.INVALID> wrote:
>>>>>>>>>> 
>>>>>>>>>> Please note that due to the transient nature of .NET failures, it
>>>>>>>>>> makes sense to wait for some time and to observe whether they are actually
>>>>>>>>>> fixed or just lying low until the next reoccurrence. We had to reopen
>>>>>>>>>> some bugs which looked resolved in the past but then reoccurred.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> -Mariia
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> ________________________________
>>>>>>>>>> From: Tae-Geon Um <ta...@gmail.com>
>>>>>>>>>> Sent: Thursday, March 23, 2017 6:51:24 AM
>>>>>>>>>> To: dev@reef.apache.org
>>>>>>>>>> Subject: Re: 0.16 release plan
>>>>>>>>>> 
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> Julia has been doing great work to resolve the .NET side issues.
>>>>>>>>>> It looks like she has resolved 3 issues recently (and now 3 issues
>>>>>>>>>> remain on the .NET side with 1 pending PR).
>>>>>>>>>> 
>>>>>>>>>> Sergiy and I have also worked on the Java side issues, and we've
>>>>>>>>>> resolved 1 issue (and 1 issue still remains with 1 pending PR).
>>>>>>>>>> 
>>>>>>>>>> Because of the unresolved issues (3 on the .NET side and 1 on the Java side), I
>>>>>>>>>> think it would be good to delay the release vote.
>>>>>>>>>> However, judging from the progress we made, I think all of the
>>>>>>>>>> issues could be resolved by the end of this week or the beginning of next
>>>>>>>>>> week.
>>>>>>>>>> 
>>>>>>>>>> I will call a vote as soon as possible after they are resolved.
>>>>>>>>>> Thanks!
>>>>>>>>>> 
>>>>>>>>>> Taegeon
>>>>>>>>>> 
>>>>>>>>>>> On Mar 23, 2017, at 5:47 PM, Byung-Gon Chun <bg...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Thank you for all the efforts to make release 0.16 happen!
>>>>>>>>>>> 
>>>>>>>>>>> Taegeon, could you give us a status update? Thanks.
>>>>>>>>>>> 
>>>>>>>>>>> On Sat, Mar 18, 2017 at 10:01 AM, Byung-Gon Chun <bg...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Thanks for looking at .Net CI failures, Julia!
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks for handling Java CI failures, Sergiy and Taegeon!
>>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Mar 17, 2017 at 11:20 AM, Julia Wang (QIUHE) <
>>>>>>>>>>>> Qiuhe.Wang@microsoft.com.invalid> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> I am working on some of the .Net AppVeyor test failures now.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: Sergiy Matusevych [mailto:sergiy.matusevych@gmail.com]
>>>>>>>>>>>>> Sent: Wednesday, March 15, 2017 5:12 PM
>>>>>>>>>>>>> To: dev@reef.apache.org
>>>>>>>>>>>>> Subject: Re: 0.16 release plan
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Mar 15, 2017 at 4:21 PM, Tae-Geon Um <ta...@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Oh, never mind.
>>>>>>>>>>>>>> I thought that we still need some time to make sure that
>>>>>>>>>>>>>> Unmanaged AM works properly on Hadoop 2.7.3 :)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Great, then we have only two items left! :-) I can confirm that
>>>>>>>>>>>>> Unmanaged AM works on a proper version of YARN; still, we need to
>>>>>>>>>>>>> address the Java issues (that is, item #1) as they are related
>>>>>>>>>>>>> to the Unmanaged AM mode. For example, we must make sure to close
>>>>>>>>>>>>> all threads before exiting the REEF Driver - otherwise, the
>>>>>>>>>>>>> HelloREEFYarnUnmanagedAM example can hang, as it does not
>>>>>>>>>>>>> currently have a System.exit() call at the end.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks for help!
>>>>>>>>>>>>> Sergiy.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Mar 16, 2017, at 6:07 AM, Sergiy Matusevych <
>>>>>>>>>>>>>> sergiy.matusevych@gmail.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi Taegeon,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> What exactly do you mean by #3? We have a HelloREEF example
>>>>>>>>>>>>>>> running in Unmanaged AM mode (see HelloREEFYarnUnmanagedAM
>>>>>>>>>>>>>>> class), and it works fine on YARN 2.7.3. We also have several
>>>>>>>>>>>>>>> examples and unit tests that check the
>>>>>>>>>>>>>>> Unmanaged AM and REEF-as-a-library functionality, e.g.
>>>>>>>>>>>>>>> HelloREEFEnvironment, ReefOnReefDriver,
>>>>>>>>>>>>>>> REEFEnvironmentDriverTest, and such. What else do you think we
>>>>>>>>>>>>>>> should unit test? (I am not saying that our unit tests are
>>>>>>>>>>>>>>> comprehensive (they are not!), but I would love to know what
>>>>>>>>>>>>>>> area you think we should focus on for the 0.16 release.)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> Sergiy.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Wed, Mar 15, 2017 at 7:10 AM, Tae-Geon Um <ta...@gmail.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> It's been about 10 months since we released the latest
>>>>>>>>>>>>>>>> version (0.15).
>>>>>>>>>>>>>>>> In order not to delay the release any longer, I want to call
>>>>>>>>>>>>>>>> a release vote as soon as possible.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Do you think it is ok for me to call a 0.16 release vote
>>>>>>>>>>>>>>>> next Thursday (23rd)?
>>>>>>>>>>>>>>>> I know there still remain several blocking issues:
>>>>>>>>>>>>>>>> 1) Java side CI failures
>>>>>>>>>>>>>>>> 2) .NET side CI failures
>>>>>>>>>>>>>>>> 3) Unmanaged AM test
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I want to know if it is possible that they can be resolved
>>>>>>>>>>>>>>>> by next Thursday.
>>>>>>>>>>>>>>>> I'm currently taking a look at 1) (with Sergiy's help), and
>>>>>>>>>>>>>>>> the due date is ok with me.
>>>>>>>>>>>>>>>> How about 2) and 3)? As far as I know, 2) is on Julia and
>>>>>>>>>>>>>>>> Sergiy is working on 3).
>>>>>>>>>>>>>>>> If the plan does not seem ok, could you please share the ETA
>>>>>>>>>>>>>>>> for them?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Taegeon
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> --
>>>>>>>>>>>> Byung-Gon Chun
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> Byung-Gon Chun
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Byung-Gon Chun
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Byung-Gon Chun
>>>>> 
>>> 
>>> 
> 

RE: 0.16 release plan

Posted by Taegeon Um <ta...@gmail.com>.
Thanks Markus and Mariia for the comments!

I will document the transient failures in the release notes and update
REEF-1462 :)

Thanks,
Taegeon

On Jul 22, 2017, at 3:22 AM, "Mariia Mykhailova" <ma...@microsoft.com.invalid> wrote:

If we're releasing with the transient failures, the release manager needs
to go through the recent (~1 month) AppVeyor builds and update our list of
transient failures in REEF-1462. It has been a while since I updated
that list, and if we're making this part of release documentation it needs
to be up-to-date.

-Mariia

-----Original Message-----
From: Markus Weimer [mailto:markus@weimo.de]
Sent: Friday, July 21, 2017 10:03 AM
To: REEF Developers Mailinglist <de...@reef.apache.org>
Subject: Re: 0.16 release plan

Looks good overall. I'd like to add a goal for the release: Document the
transient failures in the release notes. That way, they are not a surprise
to future users. --Markus

RE: 0.16 release plan

Posted by Mariia Mykhailova <ma...@microsoft.com.INVALID>.
If we're releasing with the transient failures, the release manager needs to go through the recent (~1 month) AppVeyor builds and update our list of transient failures in REEF-1462. It has been a while since I updated that list, and if we're making this part of release documentation it needs to be up-to-date.

-Mariia

-----Original Message-----
From: Markus Weimer [mailto:markus@weimo.de] 
Sent: Friday, July 21, 2017 10:03 AM
To: REEF Developers Mailinglist <de...@reef.apache.org>
Subject: Re: 0.16 release plan

Looks good overall. I'd like to add a goal for the release: Document the transient failures in the release notes. That way, they are not a surprise to future users. --Markus

Re: 0.16 release plan

Posted by Markus Weimer <ma...@weimo.de>.
Looks good overall. I'd like to add a goal for the release: Document
the transient failures in the release notes. That way, they are not a
surprise to future users. --Markus