You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@reef.apache.org by Anupam <an...@gmail.com> on 2016/01/15 02:17:51 UTC

Improving dev experience for REEF .NET

I have noticed a few pain points while working on REEF .NET in terms of
maintaining API, packaging, builds and testability .
Here is a non-exhaustive list of things I have noticed and a few of my
suggestions. Please respond with your take and comments.
  1. Stable APIs
    a. Tagging new API which could change frequently
    b. API spread too thin across assemblies
    c. Accidentally public APIs and their deprecation
  2. REEF Consumption
    a. Too many NuGet packages
    b. Documentation for new features. What should be the process?
    c. Clear documentation of REEF APIs
    d. Example code should be exemplary
  3. Build flakiness
    a. Build is flaky when executed in parallel; we have been trying hard
to get VS2015 to work
  4. Test coverage
    a. Unit test coverage
    b. One-box tests on real environment like YARN
    c. Related to (b), add E2E tests
    d. Lack of stress tests
  5. Pull request process
    a. Prioritization of issues to point out (design and API issues vs
existence of unit tests vs clarity in unit tests vs E2E tests vs clarity of
comments vs formatting and other cosmetic issues)
    b. (Subjective) Lack of process for unstable APIs along with PR
turn-around time (sometimes) makes me wary of splitting work into small PRs.

Here are a few suggestions I have:

Make a separate project named something like
Org.Apache.REEF.SDK/Org.Apache.REEF.APIs which has all the supported stable
and public APIs. This project will contain only interfaces and data
contracts which we intend to support. REEF internal assemblies will then
have a proper, but fungible contract without making “internals visible” to
each other. Further we know exactly what is our public API surface area, in
a single place.

Consumption: Split NuGets into SDK and Runtime (absolute minimum to run a
REEF app) packages. IMRU and such applications can have their own NuGets.

Build flakiness: Use IDE features as much as possible. Minimize manually
changing build files as much as possible. Clean up build.props to not have
Targets but just have aliases. Remove hidden common operations done via
build.props across multiple project builds, this makes us susceptible to
build concurrency issues.

Test coverage: Mandating addition of unittests for all new code added. Add
E2E tests, run them on localruntime as well as one box setup of YARN
(Mesos?). We need to add stress tests.

Pull request: We need to come up with PR guidelines which contributors can
follow to submit high quality code and which can be enforced preferably by
tools or by reviewers and committers. We would be able to probably copy
such guidelines from some project.

Let me know what you all think? Please feel free to add more issues that
you have noticed or refute what I have said.

-- 
Anupam

Re: Improving dev experience for REEF .NET

Posted by Anupam <an...@gmail.com>.
@Markus:
> Also, there is still a use case for `[Unstable]`: We sometimes introduce new *APIs* that are unstable and not covered by the same "we won't break it" promise as the others.

Would it makes sense to put those APIs in one level deeper namespace?
Ex: For a new client API, instead of having
Org.Apache.REEF.Client.FancyNewClient we could start off with
Org.Apache.REEF.Client.Unstable.FancyNewClient?

> 5a was the lack of guidelines for the reviewers, right? Do you have a suggestion as to how to improve the situation?

Coding guidelines would also be code reviewers' guidelines. It is just
that we need a more compact an relevant one as compared to CoreFx.


@Julia: Thanks for your remarks. A few clarifications:

> "Moving things to three API assemblies (REEF, Tang, Wake)" - How about utilities? It is shared by all the three.

I am proposing to *not* expose utilities in any way in the API assemblies.

> For example, build architecture and platform

What you are pointing out is the "valid" use of build.props. I agree
that build.props would continue in this capacity. What I am suggesting
to remove are things like: <Target
Name="RemoveIntermediateDirectories"  AfterTargets="clean"> or Jar
related targets.

Thanks.

On 19 January 2016 at 12:20, Julia Wang (QIUHE)
<Qi...@microsoft.com> wrote:
>
> Thanks Anupam for proposing the improvements and comments from Makes,  Gon and Mariia! A few more comments from me:
>
> "Moving things to three API assemblies (REEF, Tang, Wake)" - How about utilities? It is shared by all the three.
>
> "Clean up build.props to not have Targets but just have aliases." - Agree we shall clean up if anything doesn't belong there. But some of the settings are for common and put there on purpose. For example, build architecture and platform, If we use VS auto generated project file, those Any CPU and x86 things would make the solution and projects mess up and very easy to get run time issue for unmatched architecture issue. Many teams suffered this issues in my experience. That is the reason you can see CBT for ecample uses the same approach, define those at base level. Another thing we defined in build.prop is for NuGet generation. It is shared cross all the projects. Defining at base level is easy to update and less out of sync issue.
>
> "PR turn-around time" - agree splitting a work into multiple PRs lower down the efficiency for developers, but easy for review. We should have a balance in between. Developers should make a reasonable cut off when splitting.
>
> "Pull request process" - Understand committers turn-around time is one concern. To ensure all related people are aware of the change, I would think the contributor who issue the PR should add those names that are related to the code (like worked or modified on the same code before) in the PR to call for review. It can be one or more people, similar to what we use CodeFlow and specify required reviewers. Otherwise, a code review could be missed by those who are really familiar to the code and able to give comments. By specifying names, it would also reminder the reviewer to review the PR.
>
> "Split NuGets into SDK and Runtime (absolute minimum to run a REEF app) packages. IMRU and such applications can have their own NuGets." - In fact, only Examples and IMRU which is a customized driver/tasks are not needed for general clients and we already have separated NuGet for them.  NuGet is per cs project. To reduce the number of the NeGut that clients are facing, one way is to use the approach as in Org.Apache.REEF.ALL to package NuGets so that clients would only deal with those that are packaged.
>
> "Documentation for new features" - API level documentation can be at least enforced at code level as comments. Feature level should be reflected in Jira. When open a Jira, we should choose a proper type so that it would be easy to query when needed. I guess Markus has been monitoring/updating this.
>
> Thanks,
> Julia
>
> -----Original Message-----
> From: Markus Weimer [mailto:markus@weimo.de]
> Sent: Tuesday, January 19, 2016 10:39 AM
> To: dev@reef.apache.org
> Subject: Re: Improving dev experience for REEF .NET
>
> On 2016-01-19 01:19, Anupam wrote:
> > "What should we do beyond annotating that code with `[Unstable]`?"
> > Mix of [Unstable] and "stable" APIs in the same assembly and namespace
> > are hard to work with for both REEF developers and users.
> > As a user, Visual Studio is not giving me intellisense on what I am
> > not supposed to use.
>
> Good point. Moving things to three API assemblies (REEF, Tang, Wake) makes sense to me. Does this mean we should also rename things such that namespace and assembly names still align?
>
> Also, there is still a use case for `[Unstable]`: We sometimes introduce new *APIs* that are unstable and not covered by the same "we won't break it" promise as the others. Maybe we can find a way to highlight those in Visual Studio?
>
> > RTC vs CTR: I had collected these items over months and now I think
> > 5.a is much more of and issue than 5.b.
>
> 5a was the lack of guidelines for the reviewers, right? Do you have a suggestion as to how to improve the situation?
>
> > "Please have a look at our contributors guide and improve as you see
> > fit." Will do.
>
> Thanks! Maybe file a JIRA to track that work and ask for reviews of drafts as they come along.
>
> Markus




-- 
Anupam
Bellevue, WA
Ph: +1 (425)-777-5570

RE: Improving dev experience for REEF .NET

Posted by "Julia Wang (QIUHE)" <Qi...@microsoft.com>.
Thanks Anupam for proposing the improvements and comments from Makes,  Gon and Mariia! A few more comments from me:

"Moving things to three API assemblies (REEF, Tang, Wake)" - How about utilities? It is shared by all the three. 

"Clean up build.props to not have Targets but just have aliases." - Agree we shall clean up if anything doesn't belong there. But some of the settings are for common and put there on purpose. For example, build architecture and platform, If we use VS auto generated project file, those Any CPU and x86 things would make the solution and projects mess up and very easy to get run time issue for unmatched architecture issue. Many teams suffered this issues in my experience. That is the reason you can see CBT for ecample uses the same approach, define those at base level. Another thing we defined in build.prop is for NuGet generation. It is shared cross all the projects. Defining at base level is easy to update and less out of sync issue. 

"PR turn-around time" - agree splitting a work into multiple PRs lower down the efficiency for developers, but easy for review. We should have a balance in between. Developers should make a reasonable cut off when splitting. 

"Pull request process" - Understand committers turn-around time is one concern. To ensure all related people are aware of the change, I would think the contributor who issue the PR should add those names that are related to the code (like worked or modified on the same code before) in the PR to call for review. It can be one or more people, similar to what we use CodeFlow and specify required reviewers. Otherwise, a code review could be missed by those who are really familiar to the code and able to give comments. By specifying names, it would also reminder the reviewer to review the PR. 

"Split NuGets into SDK and Runtime (absolute minimum to run a REEF app) packages. IMRU and such applications can have their own NuGets." - In fact, only Examples and IMRU which is a customized driver/tasks are not needed for general clients and we already have separated NuGet for them.  NuGet is per cs project. To reduce the number of the NeGut that clients are facing, one way is to use the approach as in Org.Apache.REEF.ALL to package NuGets so that clients would only deal with those that are packaged. 

"Documentation for new features" - API level documentation can be at least enforced at code level as comments. Feature level should be reflected in Jira. When open a Jira, we should choose a proper type so that it would be easy to query when needed. I guess Markus has been monitoring/updating this. 

Thanks,
Julia

-----Original Message-----
From: Markus Weimer [mailto:markus@weimo.de] 
Sent: Tuesday, January 19, 2016 10:39 AM
To: dev@reef.apache.org
Subject: Re: Improving dev experience for REEF .NET

On 2016-01-19 01:19, Anupam wrote:
> "What should we do beyond annotating that code with `[Unstable]`?" 
> Mix of [Unstable] and "stable" APIs in the same assembly and namespace 
> are hard to work with for both REEF developers and users.
> As a user, Visual Studio is not giving me intellisense on what I am 
> not supposed to use.

Good point. Moving things to three API assemblies (REEF, Tang, Wake) makes sense to me. Does this mean we should also rename things such that namespace and assembly names still align?

Also, there is still a use case for `[Unstable]`: We sometimes introduce new *APIs* that are unstable and not covered by the same "we won't break it" promise as the others. Maybe we can find a way to highlight those in Visual Studio?

> RTC vs CTR: I had collected these items over months and now I think 
> 5.a is much more of and issue than 5.b.

5a was the lack of guidelines for the reviewers, right? Do you have a suggestion as to how to improve the situation?

> "Please have a look at our contributors guide and improve as you see 
> fit." Will do.

Thanks! Maybe file a JIRA to track that work and ask for reviews of drafts as they come along.

Markus

Re: Improving dev experience for REEF .NET

Posted by Markus Weimer <ma...@weimo.de>.
On 2016-01-19 01:19, Anupam wrote:
> "What should we do beyond annotating that code with `[Unstable]`?" 
> Mix of [Unstable] and "stable" APIs in the same assembly and 
> namespace are hard to work with for both REEF developers and users. 
> As a user, Visual Studio is not giving me intellisense on what I am 
> not supposed to use.

Good point. Moving things to three API assemblies (REEF, Tang, Wake)
makes sense to me. Does this mean we should also rename things such that
namespace and assembly names still align?

Also, there is still a use case for `[Unstable]`: We sometimes introduce
new *APIs* that are unstable and not covered by the same "we won't break
it" promise as the others. Maybe we can find a way to highlight those in
Visual Studio?

> RTC vs CTR: I had collected these items over months and now I think 
> 5.a is much more of and issue than 5.b.

5a was the lack of guidelines for the reviewers, right? Do you have a
suggestion as to how to improve the situation?

> "Please have a look at our contributors guide and improve as you see 
> fit." Will do.

Thanks! Maybe file a JIRA to track that work and ask for reviews of
drafts as they come along.

Markus

Re: Improving dev experience for REEF .NET

Posted by Anupam <an...@gmail.com>.
Thanks everyone for replying.

Few things I would like to respond to and add.

"What should we do beyond annotating that code with `[Unstable]`?"
Mix of [Unstable] and "stable" APIs in the same assembly and namespace are
hard to work with for both REEF developers and users. As a user, Visual
Studio is not giving me intellisense on what I am not supposed to use. I
feel that developing APIs in internal REEF assembly and then graduating the
APIs over time to the public SDK assembly would be a better way of handling
this. As long as the user compiles their code against only SDK assembly,
they will be fine. (making this change cleanly without making a breaking
change would be a difficult thing to accomplish as you pointed out)

RTC vs CTR
I had collected these items over months and now I think 5.a is much more of
and issue than 5.b. I have usually had good experience with PRs wrt 5.b
since I have been working on REEF-70, REEF-227 and related JIRAs. If
anything, on a spectrum of RTC to CTR I would like to see more rigorous RTC
than we do now.

"This will be tricky to enforce, as we have plenty of "trivial" code
changes flying by."
Except for typos/cosmetic changes, there should either be a unittest added
or changed at the very least. We are not an old enough codebase to justify
non-addition of tests as the rigidity existing design. If we follow good
practices while writing the code in the first place, adding/changing
unittests should be extremely cheap (in some places that is not the case,
which clearly shows in the interfaces which are strongly coupled). We would
leave it to the collective wisdom of the contributor and code reviewer to
justify and accept no addition of unittest. In any case it should exception
and not the norm. We could also keep track of code coverage, with a grain
of salt.

"we somehow need to agree among the committers on review standards"
Please take a look at
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-CodeReviewCriteria
or https://wiki.apache.org/hadoop/CodeReviewChecklist. We could begin from
something simple like these.

"So far, the coding guidelines are a straight copy from CoreFx, with the
added emphasis on immutability in REEF."
Personally I haven't seen much being followed except the immutability part
:). I doubt that all contributors and committers have gone through the
CoreFx guidelines, as they are too verbose and much of it does not apply to
us. Which brings me to the next point.

"Please have a look at our contributors guide and improve as you see fit"
Will do. Will basically borrow heavily from publicly available PR
guidelines (Hadoop, CoreFx, Spark etc) and send a draft. Will need some
help on the Java specific details.

+1 on generating nice website documentation. It also makes it easier to
point out problems with documentation.

I will be filing JIRA on few of the actionable items here.

@bgchun: as engineers we all just see (never ending?) room for
improvement... It is fun to contribute on REEF .NET :)

Please don't consider this as a wrap up of this thread. Feel free to add
more comments and more issues that you have noticed.

Thanks,

On 15 January 2016 at 12:27, Mariia Mykhailova <ma...@microsoft.com>
wrote:

> A couple of comments:
>
> >> b. Documentation for new features. What should be the process?
> > However, that documentation is currently largely in vain, as we haven't
> found a way to generate a nice website from it like we do on the Java side.
>
> REEF-113, easy to do if we're willing to use external tools for site
> generation. I've just tried to configure Doxygen (
> http://www.stack.nl/~dimitri/doxygen/, free open-source), and it produces
> a very similar site with very little configuration effort. I have several
> questions about appearance of the website, will ask in JIRA.
>
> >> 3. Build flakiness 4. Test coverage
>
> Yes, we need to put more emphasis on test quality.
> Right now we have two ongoing issues with existing tests, REEF-1079 which
> I'm looking at, and REEF-1040 which is pretty much abandoned.
>
> -Mariia
>
>
> -----Original Message-----
> From: Markus Weimer [mailto:markus@weimo.de]
> Sent: Thursday, January 14, 2016 6:13 PM
> To: dev@reef.apache.org
> Subject: Re: Improving dev experience for REEF .NET
>
> thanks for putting these together!
>
>
> On 2016-01-14 17:17, Anupam wrote:
> > 1. Stable APIs a. Tagging new API which could change frequently b.
> > API spread too thin across assemblies c. Accidentally public APIs and
> > their deprecation
>
> We are acting on those, right? Once REEF-1087 is completed, we should be
> in much better shape in terms of making only the real APIs `public`.
> That in turn should enable us to act on the assembly structure.
>
> > b. Documentation for new features. What should be the process?
>
> There are two levels of documentation: API docs and prose on that what and
> why of a new feature. I think we should absolutely have API docs for every
> `public` interface, class and method. However, that documentation is
> currently largely in vain, as we haven't found a way to generate a nice
> website from it like we do on the Java side.
>
> Regarding tutorials and such, we are doing badly. Some work has been done,
> but this clearly can be improved. Not sure how to make the time for that,
> though. Everybody agrees that we *should* do something, but feature work
> seems to get in the way of that pretty reliably.
>
> > d. Example code should be exemplary 3. Build flakiness 4. Test
> > coverage
>
> +1 on each of them.
>
> All of these are "easy" fixes. We know how this should be done, we just
> haven't.
>
> > 5. Pull request process
>
> This is the thorny one. Like most Big Data projects in Apache, we've
> decided on a review-then-commit (RTC) workflow, which means every
> contribution needs to somehow motivate a committer to review it.
> Further, we somehow need to agree amongst the committers on review
> standards. Both of these `somehow` statements are extremely tricky in
> practice.
>
> > b. (Subjective) Lack of process for unstable APIs along with PR
> > turn-around time (sometimes) makes me wary of splitting work into
> > small PRs.
>
> What should we do beyond annotating that code with `[Unstable]`?
>
> PR turn-around time will always be dependent on the load on the
> committers, unless we decide to ditch RTC for CTR. If we do that, the work
> of the release manager becomes more substantial: They'd have to orchestrate
> several rounds of cleanups before the release can be cut.
> Other projects do that, so it isn't out of the question.
>
> > Make a separate project named something like
> > Org.Apache.REEF.SDK/Org.Apache.REEF.APIs which has all the supported
> > stable and public APIs. [...] Consumption: Split NuGets into SDK and
> > Runtime (absolute minimum to run a REEF app) packages. IMRU and such
> > applications can have their own NuGets.
>
> +1, with one change: We need one each for REEF, Wake, Tang.
>
> One issue I see with this is that we either have to rename all types again
> or have to have projects where the DLL, NuGet and default namespace all
> have different names.
>
> > Build flakiness: Use IDE features as much as possible. Minimize
> > manually changing build files as much as possible. Clean up
> > build.props to not have Targets but just have aliases. Remove hidden
> > common operations done via build.props across multiple project builds,
> > this makes us susceptible to build concurrency issues.
>
> +1, this is a bug and really shouldn't need a list discussion to drive
> consensus. Feel free to file JIRAs.
>
> > Test coverage: Mandating addition of unittests for all new code added.
> > Add E2E tests, run them on localruntime as well as one box setup of
> > YARN (Mesos?). We need to add stress tests.
>
> [REEF-111] captures the infrastructure aspects of this. I believe Mariia
> is looking into it. Regarding the mandate for new tests: This will be
> tricky to enforce, as we have plenty of "trivial" code changes flying by.
> Asking for tests in addition to those would likely slow us down. For new
> features, I am all with you. Which brings us back to the question of how to
> set common expectations for PR reviews.
>
> > Pull request: We need to come up with PR guidelines
>
> Please have a look at our contributors guide and improve as you see fit:
>
>
> https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fcwiki.apache.org%2fconfluence%2fdisplay%2fREEF%2fContributing&data=01%7c01%7cmamykhai%40microsoft.com%7c511e60c6254e4edf2b9908d31d51680f%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=e4UKAdvf4%2fWoRe4Xagl2DXeAkuf24L2YRKf2K%2fKyqkE%3d
>
> So far, the coding guidelines are a straight copy from CoreFx, with the
> added emphasis on immutability in REEF.
>
> Markus
>



-- 
Anupam
Bellevue, WA
Ph: +1 (425)-777-5570

RE: Improving dev experience for REEF .NET

Posted by Mariia Mykhailova <ma...@microsoft.com>.
A couple of comments:

>> b. Documentation for new features. What should be the process?
> However, that documentation is currently largely in vain, as we haven't found a way to generate a nice website from it like we do on the Java side.

REEF-113, easy to do if we're willing to use external tools for site generation. I've just tried to configure Doxygen (http://www.stack.nl/~dimitri/doxygen/, free open-source), and it produces a very similar site with very little configuration effort. I have several questions about appearance of the website, will ask in JIRA. 

>> 3. Build flakiness 4. Test coverage

Yes, we need to put more emphasis on test quality.
Right now we have two ongoing issues with existing tests, REEF-1079 which I'm looking at, and REEF-1040 which is pretty much abandoned.

-Mariia


-----Original Message-----
From: Markus Weimer [mailto:markus@weimo.de] 
Sent: Thursday, January 14, 2016 6:13 PM
To: dev@reef.apache.org
Subject: Re: Improving dev experience for REEF .NET

thanks for putting these together!


On 2016-01-14 17:17, Anupam wrote:
> 1. Stable APIs a. Tagging new API which could change frequently b.
> API spread too thin across assemblies c. Accidentally public APIs and 
> their deprecation

We are acting on those, right? Once REEF-1087 is completed, we should be in much better shape in terms of making only the real APIs `public`.
That in turn should enable us to act on the assembly structure.

> b. Documentation for new features. What should be the process?

There are two levels of documentation: API docs and prose on that what and why of a new feature. I think we should absolutely have API docs for every `public` interface, class and method. However, that documentation is currently largely in vain, as we haven't found a way to generate a nice website from it like we do on the Java side.

Regarding tutorials and such, we are doing badly. Some work has been done, but this clearly can be improved. Not sure how to make the time for that, though. Everybody agrees that we *should* do something, but feature work seems to get in the way of that pretty reliably.

> d. Example code should be exemplary 3. Build flakiness 4. Test 
> coverage

+1 on each of them.

All of these are "easy" fixes. We know how this should be done, we just haven't.

> 5. Pull request process

This is the thorny one. Like most Big Data projects in Apache, we've decided on a review-then-commit (RTC) workflow, which means every contribution needs to somehow motivate a committer to review it. 
Further, we somehow need to agree amongst the committers on review standards. Both of these `somehow` statements are extremely tricky in practice.

> b. (Subjective) Lack of process for unstable APIs along with PR 
> turn-around time (sometimes) makes me wary of splitting work into 
> small PRs.

What should we do beyond annotating that code with `[Unstable]`?

PR turn-around time will always be dependent on the load on the committers, unless we decide to ditch RTC for CTR. If we do that, the work of the release manager becomes more substantial: They'd have to orchestrate several rounds of cleanups before the release can be cut. 
Other projects do that, so it isn't out of the question.

> Make a separate project named something like 
> Org.Apache.REEF.SDK/Org.Apache.REEF.APIs which has all the supported  
> stable and public APIs. [...] Consumption: Split NuGets into SDK and  
> Runtime (absolute minimum to run a REEF app) packages. IMRU and such  
> applications can have their own NuGets.

+1, with one change: We need one each for REEF, Wake, Tang.

One issue I see with this is that we either have to rename all types again or have to have projects where the DLL, NuGet and default namespace all have different names.

> Build flakiness: Use IDE features as much as possible. Minimize 
> manually changing build files as much as possible. Clean up 
> build.props to not have Targets but just have aliases. Remove hidden  
> common operations done via build.props across multiple project builds, 
> this makes us susceptible to build concurrency issues.

+1, this is a bug and really shouldn't need a list discussion to drive
consensus. Feel free to file JIRAs.

> Test coverage: Mandating addition of unittests for all new code added. 
> Add E2E tests, run them on localruntime as well as one box setup of 
> YARN (Mesos?). We need to add stress tests.

[REEF-111] captures the infrastructure aspects of this. I believe Mariia is looking into it. Regarding the mandate for new tests: This will be tricky to enforce, as we have plenty of "trivial" code changes flying by. Asking for tests in addition to those would likely slow us down. For new features, I am all with you. Which brings us back to the question of how to set common expectations for PR reviews.

> Pull request: We need to come up with PR guidelines

Please have a look at our contributors guide and improve as you see fit:

https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fcwiki.apache.org%2fconfluence%2fdisplay%2fREEF%2fContributing&data=01%7c01%7cmamykhai%40microsoft.com%7c511e60c6254e4edf2b9908d31d51680f%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=e4UKAdvf4%2fWoRe4Xagl2DXeAkuf24L2YRKf2K%2fKyqkE%3d

So far, the coding guidelines are a straight copy from CoreFx, with the added emphasis on immutability in REEF.

Markus

Re: Improving dev experience for REEF .NET

Posted by Byung-Gon Chun <bg...@gmail.com>.
Thanks for the note!

I've been happy with the Java setup, although the Java side needs to take
care of some of the issues raised.



On Fri, Jan 15, 2016 at 11:12 AM, Markus Weimer <ma...@weimo.de> wrote:

> thanks for putting these together!
>
>
> On 2016-01-14 17:17, Anupam wrote:
>
>> 1. Stable APIs a. Tagging new API which could change frequently b.
>> API spread too thin across assemblies c. Accidentally public APIs and
>> their deprecation
>>
>
> We are acting on those, right? Once REEF-1087 is completed, we should be
> in much better shape in terms of making only the real APIs `public`.
> That in turn should enable us to act on the assembly structure.
>
> b. Documentation for new features. What should be the process?
>>
>
> There are two levels of documentation: API docs and prose on that what
> and why of a new feature. I think we should absolutely have API docs for
> every `public` interface, class and method. However, that documentation
> is currently largely in vain, as we haven't found a way to generate a
> nice website from it like we do on the Java side.
>
> Regarding tutorials and such, we are doing badly. Some work has been
> done, but this clearly can be improved. Not sure how to make the time
> for that, though. Everybody agrees that we *should* do something, but
> feature work seems to get in the way of that pretty reliably.
>
> d. Example code should be exemplary 3. Build flakiness 4. Test
>> coverage
>>
>
> +1 on each of them.
>
> All of these are "easy" fixes. We know how this should be done,
> we just haven't.
>
> 5. Pull request process
>>
>
> This is the thorny one. Like most Big Data projects in Apache, we've
> decided on a review-then-commit (RTC) workflow, which means every
> contribution needs to somehow motivate a committer to review it. Further,
> we somehow need to agree amongst the committers on review standards. Both
> of these `somehow` statements are extremely tricky in practice.
>
> b. (Subjective) Lack of process for unstable APIs along with PR
>> turn-around time (sometimes) makes me wary of splitting work into
>> small PRs.
>>
>
> What should we do beyond annotating that code with `[Unstable]`?
>
> PR turn-around time will always be dependent on the load on the
> committers, unless we decide to ditch RTC for CTR. If we do that, the work
> of the release manager becomes more substantial: They'd have to orchestrate
> several rounds of cleanups before the release can be cut. Other projects do
> that, so it isn't out of the question.
>
> Make a separate project named something like
>> Org.Apache.REEF.SDK/Org.Apache.REEF.APIs which has all the supported
>>  stable and public APIs. [...] Consumption: Split NuGets into SDK and
>>  Runtime (absolute minimum to run a REEF app) packages. IMRU and such
>>  applications can have their own NuGets.
>>
>
> +1, with one change: We need one each for REEF, Wake, Tang.
>
> One issue I see with this is that we either have to rename all types again
> or have to have projects where the DLL, NuGet and default namespace all
> have different names.
>
> Build flakiness: Use IDE features as much as possible. Minimize
>> manually changing build files as much as possible. Clean up
>> build.props to not have Targets but just have aliases. Remove hidden
>>  common operations done via build.props across multiple project
>> builds, this makes us susceptible to build concurrency issues.
>>
>
> +1, this is a bug and really shouldn't need a list discussion to drive
> consensus. Feel free to file JIRAs.
>
> Test coverage: Mandating addition of unittests for all new code
>> added. Add E2E tests, run them on localruntime as well as one box
>> setup of YARN (Mesos?). We need to add stress tests.
>>
>
> [REEF-111] captures the infrastructure aspects of this. I believe Mariia
> is looking into it. Regarding the mandate for new tests: This will be
> tricky to enforce, as we have plenty of "trivial" code changes flying
> by. Asking for tests in addition to those would likely slow us down. For
> new features, I am all with you. Which brings us back to the question of
> how to set common expectations for PR reviews.
>
> Pull request: We need to come up with PR guidelines
>>
>
> Please have a look at our contributors guide and improve as you see fit:
>
> https://cwiki.apache.org/confluence/display/REEF/Contributing
>
> So far, the coding guidelines are a straight copy from CoreFx, with the
> added emphasis on immutability in REEF.
>
> Markus
>



-- 
Byung-Gon Chun

Re: Improving dev experience for REEF .NET

Posted by Markus Weimer <ma...@weimo.de>.
thanks for putting these together!


On 2016-01-14 17:17, Anupam wrote:
> 1. Stable APIs a. Tagging new API which could change frequently b.
> API spread too thin across assemblies c. Accidentally public APIs and
> their deprecation

We are acting on those, right? Once REEF-1087 is completed, we should be
in much better shape in terms of making only the real APIs `public`.
That in turn should enable us to act on the assembly structure.

> b. Documentation for new features. What should be the process?

There are two levels of documentation: API docs and prose on that what
and why of a new feature. I think we should absolutely have API docs for
every `public` interface, class and method. However, that documentation
is currently largely in vain, as we haven't found a way to generate a
nice website from it like we do on the Java side.

Regarding tutorials and such, we are doing badly. Some work has been
done, but this clearly can be improved. Not sure how to make the time
for that, though. Everybody agrees that we *should* do something, but
feature work seems to get in the way of that pretty reliably.

> d. Example code should be exemplary 3. Build flakiness 4. Test
> coverage

+1 on each of them.

All of these are "easy" fixes. We know how this should be done,
we just haven't.

> 5. Pull request process

This is the thorny one. Like most Big Data projects in Apache, we've
decided on a review-then-commit (RTC) workflow, which means every
contribution needs to somehow motivate a committer to review it. 
Further, we somehow need to agree amongst the committers on review 
standards. Both of these `somehow` statements are extremely tricky in 
practice.

> b. (Subjective) Lack of process for unstable APIs along with PR
> turn-around time (sometimes) makes me wary of splitting work into
> small PRs.

What should we do beyond annotating that code with `[Unstable]`?

PR turn-around time will always be dependent on the load on the 
committers, unless we decide to ditch RTC for CTR. If we do that, the 
work of the release manager becomes more substantial: They'd have to 
orchestrate several rounds of cleanups before the release can be cut. 
Other projects do that, so it isn't out of the question.

> Make a separate project named something like
> Org.Apache.REEF.SDK/Org.Apache.REEF.APIs which has all the supported
>  stable and public APIs. [...] Consumption: Split NuGets into SDK and
>  Runtime (absolute minimum to run a REEF app) packages. IMRU and such
>  applications can have their own NuGets.

+1, with one change: We need one each for REEF, Wake, Tang.

One issue I see with this is that we either have to rename all types 
again or have to have projects where the DLL, NuGet and default 
namespace all have different names.

> Build flakiness: Use IDE features as much as possible. Minimize
> manually changing build files as much as possible. Clean up
> build.props to not have Targets but just have aliases. Remove hidden
>  common operations done via build.props across multiple project
> builds, this makes us susceptible to build concurrency issues.

+1, this is a bug and really shouldn't need a list discussion to drive
consensus. Feel free to file JIRAs.

> Test coverage: Mandating addition of unittests for all new code
> added. Add E2E tests, run them on localruntime as well as one box
> setup of YARN (Mesos?). We need to add stress tests.

[REEF-111] captures the infrastructure aspects of this. I believe Mariia
is looking into it. Regarding the mandate for new tests: This will be
tricky to enforce, as we have plenty of "trivial" code changes flying
by. Asking for tests in addition to those would likely slow us down. For
new features, I am all with you. Which brings us back to the question of
how to set common expectations for PR reviews.

> Pull request: We need to come up with PR guidelines

Please have a look at our contributors guide and improve as you see fit:

https://cwiki.apache.org/confluence/display/REEF/Contributing

So far, the coding guidelines are a straight copy from CoreFx, with the
added emphasis on immutability in REEF.

Markus