Posted to dev@mxnet.apache.org by István Fehérvári <go...@gmail.com> on 2019/01/03 18:21:45 UTC

Running CI for doc changes

Hello developers,

I recently opened a PR with a very small documentation change (
https://github.com/apache/incubator-mxnet/pull/13766) and I see that it
triggered the whole CI pipeline, which is, as far as I know, very expensive
and, for a doc-only change, unnecessary.

Are there any efforts to script this behavior in a way to avoid triggering
CI when there is no code change in the PR? I am happy to help if there is
interest.

Best,
Istvan

Re: Running CI for doc changes

Posted by Aaron Markham <aa...@gmail.com>.
Hi Istvan,
You can make a page here:
https://cwiki.apache.org/confluence/display/MXNET/Website+Update+Proposals
If you don't have an account yet you can sign up, and then we can check on
edit access.
Also if you're not on slack yet, please join and we can also chat there.

Regarding the .md files that are used for generating code: are you
referring to the tutorials?
I am probably overlooking something, but I think a first pass on this would
be to check for changes to /docs/* and then limit the CI process to the docs
pipeline. There's also a tutorial validation pipeline that could be another
check: if something in /docs/tutorials/* gets updated, then trigger that
pipeline.

I tried this API call to get a list of commits:
https://api.github.com/repos/apache/incubator-mxnet/pulls/13769/commits
But I was rate limited. So unless there's another way, the solution will
require an API key and management of the API calls. This might exist
somewhere already since we have the label bot.
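
A minimal sketch of what an authenticated call could look like, assuming
the token is supplied by the caller (for example from an environment
variable; the name GITHUB_TOKEN below is only an illustration). It uses the
GitHub REST endpoint that lists the files of a pull request directly,
rather than going through the commits. The function only builds the
request and does not send it.

```python
import os
import urllib.request

GITHUB_API = "https://api.github.com"


def build_pr_files_request(repo, pr_number, token=None, page=1):
    """Build (but do not send) a request for the files changed in a PR.

    `repo` is "owner/name". The token is optional; without it the call is
    subject to the much lower unauthenticated rate limit.
    """
    url = (
        f"{GITHUB_API}/repos/{repo}/pulls/{pr_number}/files"
        f"?per_page=100&page={page}"
    )
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"token {token}"
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    # GITHUB_TOKEN is an assumed variable name, not something from this thread.
    req = build_pr_files_request(
        "apache/incubator-mxnet", 13769, token=os.environ.get("GITHUB_TOKEN")
    )
    print(req.full_url)
```

Passing the request to urllib.request.urlopen would return a JSON list in
which each entry has a "filename" field; results are paginated, so a PR
with many files needs more than one page.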

Then this could be tried to get a list of files...
https://stackoverflow.com/questions/424071/how-to-list-all-the-files-in-a-commit
Then grep that list...
Then trigger pipelines accordingly...
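
The "grep that list, then trigger pipelines" steps could be sketched
roughly as follows. The pipeline names ("website", "tutorials", "full") are
placeholders made up for this example, and the rule that anything outside
docs/ counts as a code change is an assumption, not an agreed policy.

```python
def pipelines_for(changed_files):
    """Return the set of pipelines to trigger for repo-relative paths."""
    if not changed_files:
        # No information about the change: run everything to be safe.
        return {"full"}

    pipelines = set()
    for path in changed_files:
        if path.startswith("docs/"):
            pipelines.add("website")
            if path.startswith("docs/tutorials/"):
                pipelines.add("tutorials")
        else:
            # Anything outside docs/ is treated as a code change.
            pipelines.add("full")
    return pipelines


if __name__ == "__main__":
    print(sorted(pipelines_for(["docs/tutorials/basic/ndarray.md"])))
```

Falling back to the full run when the file list is empty keeps the check
conservative: a failure to detect changes never skips tests.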

Cheers,
Aaron



Re: Running CI for doc changes

Posted by István Fehérvári <go...@gmail.com>.
I sort of got a simple skipping logic working; however, there seems to be a
problem I cannot solve. In order to figure out whether a relevant change
has happened, I need a list of all commits in the PR and all the files in
them. Unfortunately, currentBuild.changeSets returns data only about the
head commit of the PR, since Jenkins assembles the build that way. It is
even noted in the log after the merge of the master branch ("First time
build. Skipping changelog.").

Does anyone have any idea how to retrieve all the commits in the PR?
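
One possible direction, offered as a sketch rather than something confirmed
in this thread: instead of listing commits, ask git for the files that
differ between the PR head and its merge base with the target branch (the
three-dot form of git diff). This assumes the workspace is a git checkout
in which the target branch ref (origin/master here, an assumption) has been
fetched.

```python
import subprocess


def diff_command(target_ref="origin/master", head_ref="HEAD"):
    """Command listing files changed on head_ref since its merge base."""
    return ["git", "diff", "--name-only", f"{target_ref}...{head_ref}"]


def parse_name_only(output):
    """Turn `git diff --name-only` output into a list of paths."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def changed_files_in_pr(target_ref="origin/master", head_ref="HEAD", cwd=None):
    """Run git in `cwd` and return the paths changed by the PR."""
    result = subprocess.run(
        diff_command(target_ref, head_ref),
        cwd=cwd,
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_name_only(result.stdout)
```

In a Jenkinsfile the same git command could be run through a shell step and
its output parsed in the same way.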

Also, I naively thought that .md files could be skipped for the non-website
pipelines; however, I was told that some of those files are in fact used
for generating code. So for an effective strategy we need to understand
what can be skipped in which cases. Marco mentioned language changes (R,
Scala), which I can add. Any other ideas about strategies? I would also
like to document the strategies in a wiki of some sort before coding them,
but I am not sure where I am supposed to do that (write access?).

Thanks for the help!

Best,
Istvan


Re: Running CI for doc changes

Posted by István Fehérvári <go...@gmail.com>.
Hello Marco,

Your idea is very much in line with how I was imagining it. I will try to
come up with a prototype around this idea; then we can continue the
discussion about specifics.

Best,
Istvan


Re: Running CI for doc changes

Posted by Marco de Abreu <ma...@gmail.com>.
Hello István,

Thanks for your interest and for offering your assistance! This is
definitely a great idea and I think it has been mentioned a few times, but
nobody has jumped on it yet. We would be very happy to assist you in these
efforts.

As far as I can tell, it would boil down to having some kind of custom
Groovy function that we could call within our Jenkinsfile pipelines. This
function would then determine whether that specific job/node should run or
not. This could have two granularity levels:
1. Run or skip a whole Jenkins job
2. Run or skip a Jenkins node

To clarify the terminology: Each entry (e.g. windows-cpu) at [1] (sourced
from [2]) is a Jenkins job. Each Jenkins job can contain multiple stages
(irrelevant here) and each stage contains one or more nodes. One green
circle here [3] (e.g. Python 3: CPU Win) represents a node.

#1 would be your example use case. In case of a doc change, we would skip
our unit test pipelines while the website pipeline would still be executed.
#2 would cover language-specific changes, for example. Imagine a change to
the Scala code. This would require all Scala and Clojure jobs to re-run, but
there would be no need to run R, Julia or Python, for example.

One thought on how to do this would be to define a mapping that would
contain the "watched" directories for a particular Jenkins job or node.
Before a job or node is triggered, it could then evaluate the previously
mentioned function to determine whether any file within that mapping has
changed.
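
To illustrate the idea, here is a rough sketch of such a mapping and check,
written in Python for brevity rather than in the Groovy that a Jenkinsfile
would use. The job names and directory prefixes are assumptions chosen for
the example and would need to be checked against the actual repository
layout and job definitions.

```python
# Hypothetical mapping: job name -> directory prefixes it "watches".
# Language bindings also watch the core sources they are built against.
WATCHED = {
    "website": ("docs/",),
    "python": ("python/", "src/", "include/"),
    "scala": ("scala-package/", "src/", "include/"),
    "clojure": ("contrib/clojure-package/", "scala-package/", "src/", "include/"),
    "r": ("R-package/", "src/", "include/"),
}


def should_run(job, changed_files, watched=WATCHED):
    """Return True if any changed file lies in a directory the job watches."""
    prefixes = watched.get(job)
    if prefixes is None:
        # Unknown job or node: run it rather than risk skipping tests.
        return True
    return any(path.startswith(prefixes) for path in changed_files)


if __name__ == "__main__":
    changed = ["scala-package/core/src/main/scala/Example.scala"]
    print(sorted(job for job in WATCHED if should_run(job, changed)))
```

The same function could serve both granularity levels: call it once per job
to skip a whole job, or once per node with a finer-grained mapping.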

If you have any further questions or would like to discuss a design
proposal, please don't hesitate to reach out to us again :)

Best regards,
Marco

[1]: http://jenkins.mxnet-ci.amazon-ml.com/job/mxnet-validation/
[2]: https://github.com/apache/incubator-mxnet/tree/master/ci/jenkins
[3]:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fwindows-cpu/detail/master/150/
