Posted to dev@hbase.apache.org by "张铎 (Duo Zhang)" <pa...@gmail.com> on 2020/08/11 15:08:25 UTC

Re: [DISCUSS] we need to take action if we want asf jenkins managed tests after Aug 15 2020.

Some updates here, we have migrated most of the jobs to ci-hadoop.a.o.

There is a known issue: our flaky dashboard is broken because of this new
Jenkins feature:

https://wiki.jenkins.io/display/JENKINS/Configuring+Content+Security+Policy

Josh is contacting the infra team to see if they can relax the policy but I
do not think it is easy as the policy is per site, not per job...

Anyway, there is a Chrome extension that temporarily disables CSP, so you can
view the flaky dashboard correctly.

https://chrome.google.com/webstore/detail/disable-content-security/ieelmcmcagommplceebfedjlakkhpden

Thanks.

Andor Molnar <an...@apache.org> wrote on Thu, Jul 30, 2020 at 3:12 PM:

> https://issues.apache.org/jira/browse/INFRA-20613
>
>
>
> > On 2020. Jul 30., at 1:47, 张铎(Duo Zhang) <pa...@gmail.com> wrote:
> >
> > This never worked in the past...
> >
> > But it would be great if you can kick the infra team to get this done :)
> >
> > File an infra issue?
> >
> > Andor Molnar <an...@apache.org> wrote on Wed, Jul 29, 2020 at 18:36:
> >
> >> You’re having the same issue with HBase Robot btw. At the end of the
> >> console output:
> >>
> >> "Could not update commit status, please check if your scan credentials
> >> belong to a member of the organization or a collaborator of the
> >> repository and repo:status scope is selected"
> >>
> >> ...and shortly after that:
> >>
> >> "GitHub has been notified of this commit’s build result"
> >>
> >> Whatever that means.
> >>
> >> Andor
> >>
> >>
> >>
> >>> On 2020. Jul 29., at 11:57, Andor Molnar <an...@apache.org> wrote:
> >>>
> >>> Yep, we’ve finally received it. It’s done.
> >>>
> >>> The current issue is that Jenkins is unable to set the GitHub build
> >>> status. I’ve added the repo:status permission, but it’s also asking
> >>> for the account to be a member of the project/organization, and I’m
> >>> not sure how to do that.
> >>>
> >>> Andor
> >>>
> >>>
> >>>
> >>>> On 2020. Jul 29., at 4:10, 张铎(Duo Zhang) <pa...@gmail.com> wrote:
> >>>>
> >>>> Seems you have already made it?
> >>>>
> >>>> Usually there are several moderators for the private list; you need to
> >>>> ask them to let the GitHub registration go through.
> >>>>
> >>>> Andor Molnar <an...@apache.org> wrote on Wed, Jul 29, 2020 at 1:03 AM:
> >>>>
> >>>>> Thanks Duo, that’s very helpful.
> >>>>> I cannot set private@zookeeper as a verified e-mail address, because
> >>>>> the verification e-mail cannot be sent to the list. Isn’t posting to
> >>>>> it restricted to members only (by default)?
> >>>>>
> >>>>> Andor
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On 2020. Jul 28., at 3:15, 张铎(Duo Zhang) <pa...@gmail.com> wrote:
> >>>>>>
> >>>>>> Hi Andor,
> >>>>>>
> >>>>>> The Apache-HBase account was registered by me, using the private@hbase
> >>>>>> mailing list, so all the PMC members can maintain the password.
> >>>>>>
> >>>>>> I generated an access token and added it to our Jenkins, so we can
> >>>>>> use it to post comments back to GitHub.
> >>>>>>
> >>>>>> I think you could do the same and register an Apache-ZooKeeper
> >>>>>> account? Or, if you want to use the hadoop-yetus account, you'd better
> >>>>>> ask the Hadoop PMC members or Gavin to add the token to Jenkins so you
> >>>>>> can use it.
> >>>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>> Andor Molnar <an...@apache.org> wrote on Tue, Jul 28, 2020 at 3:56 AM:
> >>>>>>
> >>>>>>> Hi Duo,
> >>>>>>>
> >>>>>>> I’m trying to create a similar job for Apache ZooKeeper, but
> >>>>>>> unfortunately I haven’t got much help on the Apache builds@ list so
> >>>>>>> far, so I’m asking you instead, if you don’t mind.
> >>>>>>>
> >>>>>>> First, how did you set up the HBase GitHub account that you use in
> >>>>>>> this job to access the repo?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Andor
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> On 2020. Jul 27., at 2:22, 张铎(Duo Zhang) <pa...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> The pre commit job has been migrated to ci-hadoop.a.o.
> >>>>>>>>
> >>>>>>>> I have disabled the periodic scan for the old job on builds.a.o. As
> >>>>>>>> we still need to view the pre commit results on it, I have not
> >>>>>>>> deleted it for now. Will delete it later, maybe after several weeks.
> >>>>>>>>
> >>>>>>>> The new job is here:
> >>>>>>>>
> >>>>>>>> https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/
> >>>>>>>>
> >>>>>>>> Thanks.
> >>>>>>>>
> >>>>>>>> 张铎(Duo Zhang) <pa...@gmail.com> wrote on Sat, Jul 25, 2020 at 9:44 PM:
> >>>>>>>>
> >>>>>>>>> https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/5/console
> >>>>>>>>>
> >>>>>>>>> We successfully finished a nightly build.
> >>>>>>>>>
> >>>>>>>>> But it seems the jiraComment did not work. I haven't seen the
> >>>>>>>>> comment on HBASE-24757...
> >>>>>>>>>
> >>>>>>>>> 张铎(Duo Zhang) <pa...@gmail.com> wrote on Sat, Jul 25, 2020 at 4:51 PM:
> >>>>>>>>>
> >>>>>>>>>> After installing two new Jenkins plugins, the pre commit job seems
> >>>>>>>>>> fine now.
> >>>>>>>>>>
> >>>>>>>>>> The last failure was caused by a timeout. I assume the problem is
> >>>>>>>>>> that we do not have enough executors, so all the jobs are executed
> >>>>>>>>>> sequentially.
> >>>>>>>>>>
> >>>>>>>>>> Maybe we could move the pre commit job to the new env first? The
> >>>>>>>>>> nightly job and flaky job require more resources, and we need the
> >>>>>>>>>> output of these Jenkins jobs (the flaky test list).
> >>>>>>>>>>
> >>>>>>>>>> Thanks.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> 张铎(Duo Zhang) <pa...@gmail.com> wrote on Fri, Jul 24, 2020 at 4:36 PM:
> >>>>>>>>>>
> >>>>>>>>>>> The problem seems to be caused by this:
> >>>>>>>>>>>
> >>>>>>>>>>> https://issues.jenkins-ci.org/browse/JENKINS-48556
> >>>>>>>>>>>
> >>>>>>>>>>> I triggered the job again and it passed the timestamps call. I will
> >>>>>>>>>>> keep an eye on it.
> >>>>>>>>>>>
> >>>>>>>>>>> 张铎(Duo Zhang) <pa...@gmail.com> wrote on Tue, Jul 21, 2020 at 11:18 AM:
> >>>>>>>>>>>
> >>>>>>>>>>>> On the sponsors, we could have a try.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The problem here is the process of the donation? IIRC there is a
> >>>>>>>>>>>> thread on the infra mailing list about how to donate machines to a
> >>>>>>>>>>>> specific project and the discussion did not go well...
> >>>>>>>>>>>>
> >>>>>>>>>>>> Sean Busbey <bu...@apache.org> wrote on Tue, Jul 21, 2020 at 11:13 AM:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> We could check with ASF infra for the current state of things wrt
> >>>>>>>>>>>>> GitHub Actions. I believe there is a queue set up across ASF
> >>>>>>>>>>>>> projects.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It has the same resource issue Travis had; things are fine until
> >>>>>>>>>>>>> some critical mass of projects seeking better perf realize some
> >>>>>>>>>>>>> new option is available and then quickly all available resources
> >>>>>>>>>>>>> are consumed.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> AFAICT the only option that gets us the same or better as the H*
> >>>>>>>>>>>>> nodes will be finding sponsors and running our own.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Mon, Jul 20, 2020, 21:55 张铎(Duo Zhang) <palomino219@gmail.com>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think our nightly, flaky, and pre commit jobs should be
> >>>>>>>>>>>>>> transferred as a whole? They depend on each other.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I offer my help on the transition.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> And on GitHub CI, does ASF have a special deal with GitHub? If
> >>>>>>>>>>>>>> not, I do not think the default resources can fit our
> >>>>>>>>>>>>>> requirements...
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Sean Busbey <bu...@apache.org> wrote on Tue, Jul 21, 2020 at 1:49 AM:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi folks!
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Back in April there was a brief discussion[1] about ASF Infra's
> >>>>>>>>>>>>>>> notification that builds.a.o is going away and we are currently
> >>>>>>>>>>>>>>> slated to migrate to a set of CI servers for "Hadoop and related
> >>>>>>>>>>>>>>> projects". This is the CI farm that will contain the bulk of the
> >>>>>>>>>>>>>>> H* worker nodes that are donated by Yahoo!, which are the nodes
> >>>>>>>>>>>>>>> we've been running on for ages[2].
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Migration discussion still happens on the hadoop-migrations@i.a.o
> >>>>>>>>>>>>>>> list[3] and recently ASF Infra set a target date of August 15th
> >>>>>>>>>>>>>>> for turning off the existing builds.a.o server[4].
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> That gives us a little under 4 weeks to have things up and
> >>>>>>>>>>>>>>> working on the new ci-hadoop.a.o Jenkins coordinator[5]. It’s not
> >>>>>>>>>>>>>>> clear to me that the level of effort we’ll need to spend is worth
> >>>>>>>>>>>>>>> what we get out of a continuation of the status quo on
> >>>>>>>>>>>>>>> builds.a.o. I did a quick test by updating the nightly job on
> >>>>>>>>>>>>>>> ci-hadoop.a.o to run just branch-2, since that has been stable on
> >>>>>>>>>>>>>>> builds.a.o. It failed with a Jenkins pipeline DSL syntax
> >>>>>>>>>>>>>>> error[6] so I'm assuming migrating will be a slog.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> As far as I can see our options are:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> * Do nothing. Have no testing or automated website publication
> >>>>>>>>>>>>>>> in mid August.
> >>>>>>>>>>>>>>> * Transition website publication and nothing else (probably can
> >>>>>>>>>>>>>>> be done in a day)
> >>>>>>>>>>>>>>> * Transition just precommit testing for various repos (probably
> >>>>>>>>>>>>>>> can be done in a few days)
> >>>>>>>>>>>>>>> * Transition everything (no idea how long it takes due to
> >>>>>>>>>>>>>>> nightly, flaky stuff, etc)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The alternatives if we do not transition any given job to
> >>>>>>>>>>>>>>> ci-hadoop:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> * Try to move to GitHub Actions
> >>>>>>>>>>>>>>> * Try to move to Travis CI
> >>>>>>>>>>>>>>> * Try to move to Jenkins infra we maintain ourselves (presumably
> >>>>>>>>>>>>>>> by soliciting project specific donations for worker nodes on
> >>>>>>>>>>>>>>> cloud vendors)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> It's important to remember that as a project we have a heavy
> >>>>>>>>>>>>>>> footprint wherever our nightly tests run. For context, a given
> >>>>>>>>>>>>>>> branch's nightly can keep 3-4 executors busy for 6+ hours on the
> >>>>>>>>>>>>>>> current builds.a.o setup. There's been a bunch of great work
> >>>>>>>>>>>>>>> lately on bringing down what it takes to run the full test
> >>>>>>>>>>>>>>> suite, but applying that work to nightly is itself a significant
> >>>>>>>>>>>>>>> undertaking.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> What are folks thinking? Most importantly who is ready to work
> >>>>>>>>>>>>>>> towards any given approach?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> [1] [DISCUSS] Migrating HBase to new CI Master
> >>>>>>>>>>>>>>> https://s.apache.org/fux1o
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> [2] https://builds.apache.org/view/H-L/view/HBase/
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> [3] https://lists.apache.org/list.html?hadoop-migrations@infra.apache.org
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> [4] [IMPORTANT] - 2 more HADOOP nodes migrated over to ci-hadoop
> >>>>>>>>>>>>>>> https://s.apache.org/7e1nq
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> [5] https://ci-hadoop.apache.org/job/HBase/
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> [6] https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/2/console

Re: [DISCUSS] we need to take action if we want asf jenkins managed tests after Aug 15 2020.

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
Thank you! Will have a try.

Andor Molnar <an...@apache.org> wrote on Sat, Aug 15, 2020 at 2:12 PM:

> Hi Duo,
>
> Infra finished setting up an official GitHub user for accessing and
> updating Pull Requests.
> Look for the 'ASF Cloudbees Jenkins ci-hadoop' credential.
>
> Regards,
> Andor
>
>
>
> > On 2020. Aug 12., at 3:49, 张铎(Duo Zhang) <pa...@gmail.com> wrote:
> >
> > And one more thing: due to the limited resources (infra has not
> > migrated all the build nodes), I filtered out all the feature branches
> > for nightly and flaky jobs.
> >
> > Will add them back after infra has done all the migration.
> >
> > And I wonder whether we really need to test all the feature branches.
> >
> > Thanks.
> >
> > Nick Dimiduk <nd...@apache.org> wrote on Wed, Aug 12, 2020 at 2:02 AM:
> >
> >> Thank you so much for taking on these migrations! I very much appreciate
> >> it!
> >>
> >> -n
> >>

Re: [DISCUSS] we need to take action if we want asf jenkins managed tests after Aug 15 2020.

Posted by Andor Molnar <an...@apache.org>.
Hi Duo,

Infra finished setting up an official GitHub user for accessing and updating Pull Requests.
Look for the 'ASF Cloudbees Jenkins ci-hadoop' credential.

Regards,
Andor
 


> On 2020. Aug 12., at 3:49, 张铎(Duo Zhang) <pa...@gmail.com> wrote:
> 
> And one more thing is that, due to the limited resources(infra has not
> migrated all the build nodes), I filtered out all the feature branches for
> nightly and flaky jobs.
> 
> Will add them back after infra has done all the migration.
> 
> And I wonder whether we really need to test all the feature branches.
> 
> Thanks.
> 
> Nick Dimiduk <nd...@apache.org> 于2020年8月12日周三 上午2:02写道:
> 
>> Thank you so much for taking on these migrations! I very much appreciate
>> it!
>> 
>> -n
>> 
>> On Tue, Aug 11, 2020 at 8:08 AM 张铎(Duo Zhang) <pa...@gmail.com>
>> wrote:
>> 
>>> Some updates here, we have migrated most of the jobs to ci-hadoop.a.o.
>>> 
>>> There is a known issue that our flaky dashboard is broken, due to this
>> new
>>> feature of jenkins
>>> 
>>> 
>> https://wiki.jenkins.io/display/JENKINS/Configuring+Content+Security+Policy
>>> 
>>> Josh is contacting the infra team to see if they can relax the policy
>> but I
>>> do not think it is easy as the policy is per site, not per job...
>>> 
>>> Anyway, there is a chrome plugin to temporarily disable CSP, so you can
>>> view the correct flaky dashboard.
>>> 
>>> 
>>> 
>> https://chrome.google.com/webstore/detail/disable-content-security/ieelmcmcagommplceebfedjlakkhpden
>>> 
>>> Thanks.
>>> 
>>> Andor Molnar <an...@apache.org> 于2020年7月30日周四 下午3:12写道:
>>> 
>>>> https://issues.apache.org/jira/browse/INFRA-20613
>>>> 
>>>> 
>>>> 
>>>>> On 2020. Jul 30., at 1:47, 张铎(Duo Zhang) <pa...@gmail.com>
>>> wrote:
>>>>> 
>>>>> This never worked in the past...
>>>>> 
>>>>> But it would be great if you can kick the infra team to get this done
>>> :)
>>>>> 
>>>>> File an infra issue?
>>>>> 
>>>>> Andor Molnar <an...@apache.org>于2020年7月29日 周三18:36写道:
>>>>> 
>>>>>> You’re having the same issue with HBase Robot btw. At the end of
>>> console
>>>>>> outputs:
>>>>>> 
>>>>>> "Could not update commit status, please check if your scan
>> credentials
>>>>>> belong to a member of the organization or a collaborator of the
>>>> repository
>>>>>> and repo:status scope is selected”
>>>>>> 
>>>>>> ...and shortly after that:
>>>>>> 
>>>>>> "GitHub has been notified of this commit’s build result”
>>>>>> 
>>>>>> Whatever does it mean.
>>>>>> 
>>>>>> Andor
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On 2020. Jul 29., at 11:57, Andor Molnar <an...@apache.org> wrote:
>>>>>>> 
>>>>>>> Yep, we’ve finally received it. It’s done.
>>>>>>> 
>>>>>>> Current issue is that Jenkins is unable to set Github build status.
>>>> I’ve
>>>>>> added repo:status permission, but it’s also asking to be member of
>> the
>>>>>> project/organization and not sure how to do that.
>>>>>>> 
>>>>>>> Andor
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On 2020. Jul 29., at 4:10, 张铎(Duo Zhang) <pa...@gmail.com>
>>>> wrote:
>>>>>>>> 
>>>>>>>> Seems you have already made it?
>>>>>>>> 
>>>>>>>> Usually there are several moderators for the private list, you
>> need
>>> to
>>>>>> ask
>>>>>>>> them to let the GitHub registration go through.
>>>>>>>> 
>>>>>>>> Andor Molnar <an...@apache.org> 于2020年7月29日周三 上午1:03写道:
>>>>>>>> 
>>>>>>>>> Thanks Duo, that’s very helpful.
>>>>>>>>> I cannot set private@zookeeper as a verified e-mail address,
>>> because
>>>>>> the
>>>>>>>>> verification e-mail cannot be sent to the list. Isn’t that
>>> restricted
>>>>>> for
>>>>>>>>> members only (by default)?
>>>>>>>>> 
>>>>>>>>> Andor
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On 2020. Jul 28., at 3:15, 张铎(Duo Zhang) <palomino219@gmail.com
>>> 
>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi Andor,
>>>>>>>>>> 
>>>>>>>>>> The Apache-HBase account is registered by me, using the
>>>> private@hbase
>>>>>>>>>> mailing list, so all the PMC members can maintain the password.
>>>>>>>>>> 
>>>>>>>>>> I generated an access token and added it to our jenkins, so we
>> can
>>>>>> use it
>>>>>>>>>> to post comments back to GitHub.
>>>>>>>>>> 
>>>>>>>>>> I think you could do the same to register an Apache-ZooKeeper
>>>>>> account? Or
>>>>>>>>>> if you want  to use the hadoop-yetus account, you'd better ask
>> the
>>>>>> hadoop
>>>>>>>>>> PMC members or Gavin to add the token to jenkins so you can use
>>> it.
>>>>>>>>>> 
>>>>>>>>>> Thanks.
>>>>>>>>>> 
>>>>>>>>>> On Tue, Jul 28, 2020 at 3:56 AM Andor Molnar <an...@apache.org> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Duo,
>>>>>>>>>>> 
>>>>>>>>>>> I’m trying to create a similar job for Apache ZooKeeper, but
>>>>>>>>> unfortunately
>>>>>>>>>>> haven’t got too much help on the Apache builds@ list so far,
>> so
>>>> I’m
>>>>>>>>>>> rather asking you if you don’t mind.
>>>>>>>>>>> 
>>>>>>>>>>> First, how have you set up the Hbase Github account that you
>> use
>>> in
>>>>>> this
>>>>>>>>>>> job to access the repo?
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Andor
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On 2020. Jul 27., at 2:22, 张铎(Duo Zhang) <palomino219@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> The pre commit job has been migrated to ci-hadoop.a.o.
>>>>>>>>>>>> 
>>>>>>>>>>>> I have disabled the periodic scan for the old job on builds.a.o. We
>>>>>>>>>>>> still need to view the pre commit results on it, so I have not
>>>>>>>>>>>> deleted it for now. I will delete it later, maybe after several weeks.
>>>>>>>>>>>> 
>>>>>>>>>>>> The new job is here
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>> 
>> https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>> 
>>>>>>>>>>>> On Sat, Jul 25, 2020 at 9:44 PM 张铎(Duo Zhang) <pa...@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>> 
>>>> 
>>> 
>> https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/5/console
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> We successfully finished a nightly build.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> But seems the jiraComment did not work. I haven't seen the
>>>> comment
>>>>>>>>>>>>> on HBASE-24757...
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Sat, Jul 25, 2020 at 4:51 PM 张铎(Duo Zhang) <pa...@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> After installing two new jenkins plugins, the pre commit job
>>>> seems
>>>>>>>>> fine
>>>>>>>>>>>>>> now.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The last failure is because of a timeout, I assume the
>> problem
>>>> is
>>>>>>>>> that
>>>>>>>>>>> we
>>>>>>>>>>>>>> do not have enough executors so all the jobs are executed
>>>>>>>>> sequentially.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Maybe we could move the pre commit job to the new env first?
>>> The
>>>>>>>>>>> nightly
>>>>>>>>>>>>>> job and flaky job require more resources, and we need the
>>> output
>>>>>> of
>>>>>>>>>>> these
>>>>>>>>>>>>>> jenkins jobs(the flaky test list).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, Jul 24, 2020 at 4:36 PM 张铎(Duo Zhang) <pa...@gmail.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The problem seems because of this:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> https://issues.jenkins-ci.org/browse/JENKINS-48556
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I triggered the job again, it passed the timestamps call,
>> and
>>>>>> will
>>>>>>>>>>> keep
>>>>>>>>>>>>>>> an eye on it.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, Jul 21, 2020 at 11:18 AM 张铎(Duo Zhang) <pa...@gmail.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On the sponsors, we could have a try.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The problem here is the process of the donation? IIRC
>> there
>>>> is a
>>>>>>>>>>> thread
>>>>>>>>>>>>>>>> on the infra mailing list about how to donate machines to
>> a
>>>>>>>>> specific
>>>>>>>>>>>>>>>> project and the discussion did not go well...
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, Jul 21, 2020 at 11:13 AM Sean Busbey <bu...@apache.org> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> We could check with ASF infra for the current state of
>>> things
>>>>>> wrt
>>>>>>>>>>>>>>>>> GitHub
>>>>>>>>>>>>>>>>> actions. I believe there is a queue set up across ASF
>>>> projects.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> It has the same resource issue Travis had; things are
>> fine
>>>>>> until
>>>>>>>>>>> some
>>>>>>>>>>>>>>>>> critical mass of projects seeking better perf realize
>> some
>>>> new
>>>>>>>>>>> option
>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>> available and then quickly all available resources are
>>>>>> consumed.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> AFAICT the only option that gets us the same or better as
>>> the
>>>>>> H*
>>>>>>>>>>> nodes
>>>>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>>> be finding sponsors and running our own.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020, 21:55 张铎(Duo Zhang) <
>>>>>> palomino219@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I think our nightly, flakey, and pre commit jobs should
>> be
>>>>>>>>>>>>>>>>> transferred as a
>>>>>>>>>>>>>>>>>> whole? They depend on each other.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I offer my help on the transition.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> And on github CI, does ASF have a special deal with
>>> github?
>>>> If
>>>>>>>>> not,
>>>>>>>>>>>>>>>>> I do
>>>>>>>>>>>>>>>>>> not think the default resource can fit our
>> requirements...
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Tue, Jul 21, 2020 at 1:49 AM Sean Busbey <bu...@apache.org> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Hi folks!
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Back in April there was a brief discussion[1] about ASF
>>>>>> Infra's
>>>>>>>>>>>>>>>>>>> notification that builds.a.o is going away and we are
>>>>>> currently
>>>>>>>>>>>>>>>>> slated
>>>>>>>>>>>>>>>>>>> to migrate to a set of CI servers for "Hadoop and
>> related
>>>>>>>>>>>>>>>>> projects".
>>>>>>>>>>>>>>>>>>> This is the ci farm that will contain the bulk of the
>> H*
>>>>>> worker
>>>>>>>>>>>>>>>>> nodes
>>>>>>>>>>>>>>>>>>> that are donated by Yahoo!, which are the nodes we've
>>> been
>>>>>>>>> running
>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>> for ages[2].
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Migration discussion still happens on the
>>>>>>>>> hadoop-migrations@i.a.o
>>>>>>>>>>>>>>>>>>> list[3] and recently ASF Infra set a target date of
>>> August
>>>>>> 15th
>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>> turning off the existing builds.a.o server[4].
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> That gives us a little under 4 weeks to have things up
>>> and
>>>>>>>>> working
>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>> the new ci-hadoop.a.o jenkins coordinator[5]. it’s not
>>>> clear
>>>>>> to
>>>>>>>>> me
>>>>>>>>>>>>>>>>>>> that the level of effort we’ll need to spend is worth
>>> what
>>>> we
>>>>>>>>> get
>>>>>>>>>>>>>>>>> out
>>>>>>>>>>>>>>>>>>> of a continuation of the status quo on builds.a.o. I
>> did
>>> a
>>>>>> quick
>>>>>>>>>>>>>>>>> test
>>>>>>>>>>>>>>>>>>> by updating the nightly job on ci-hadoop.a.o to run
>> just
>>>>>>>>> branch-2,
>>>>>>>>>>>>>>>>>>> since that has been stable on builds.a.o. It failed
>> with
>>> a
>>>>>>>>> Jenkins
>>>>>>>>>>>>>>>>>>> pipeline DSL syntax error[6] so I'm assuming migrating
>>> will
>>>>>> be a
>>>>>>>>>>>>>>>>> slog.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> As far as I can see our options are:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> * Do nothing. Have no testing or automated website
>>>>>> publication
>>>>>>>>> in
>>>>>>>>>>>>>>>>> mid
>>>>>>>>>>>>>>>>>>> August.
>>>>>>>>>>>>>>>>>>> * Transition website publication and nothing else
>>> (probably
>>>>>> can
>>>>>>>>> be
>>>>>>>>>>>>>>>>>>> done in a day)
>>>>>>>>>>>>>>>>>>> * Transition just precommit testing for various repos
>>>>>> (probably
>>>>>>>>>>>>>>>>> can be
>>>>>>>>>>>>>>>>>>> done in a few days)
>>>>>>>>>>>>>>>>>>> * Transition everything (no idea how long it takes due
>> to
>>>>>>>>> nightly,
>>>>>>>>>>>>>>>>>>> flaky stuff, etc)
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> The alternatives if we do not transition any given job
>> to
>>>>>>>>>>>>>>>>> ci-hadoop:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> * Try to move to GitHub Actions
>>>>>>>>>>>>>>>>>>> * Try to move to Travis CI
>>>>>>>>>>>>>>>>>>> * Try to move to Jenkins infra we maintain ourselves
>>>>>> (presumably
>>>>>>>>>>> by
>>>>>>>>>>>>>>>>>>> soliciting project specific donations for worker nodes
>> on
>>>>>> cloud
>>>>>>>>>>>>>>>>>>> vendors)
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> It's important to remember that as a project we have a
>>>> heavy
>>>>>>>>>>>>>>>>> footprint
>>>>>>>>>>>>>>>>>>> wherever our nightly tests run. For context, a given
>>>> branch's
>>>>>>>>>>>>>>>>> nightly
>>>>>>>>>>>>>>>>>>> can keep 3-4 executors busy for 6+ hours on the current
>>>>>>>>> builds.a.o
>>>>>>>>>>>>>>>>>>> setup. There's been a bunch of great work lately on
>>>> bringing
>>>>>>>>> down
>>>>>>>>>>>>>>>>> what
>>>>>>>>>>>>>>>>>>> it takes to run the full test suite, but applying that
>>> work
>>>>>> to
>>>>>>>>>>>>>>>>> nightly
>>>>>>>>>>>>>>>>>>> is itself a significant undertaking.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> What are folks thinking? Most importantly who is ready
>> to
>>>>>> work
>>>>>>>>>>>>>>>>> towards
>>>>>>>>>>>>>>>>>>> any given approach?
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> [1] [DISCUSS] Migrating HBase to new CI Master
>>>>>>>>>>>>>>>>>>> https://s.apache.org/fux1o
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> [2] https://builds.apache.org/view/H-L/view/HBase/
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> [3]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>> 
>> https://lists.apache.org/list.html?hadoop-migrations@infra.apache.org
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> [4] [IMPORTANT] - 2 more HADOOP nodes migrated over to
>>>>>> ci-hadoop
>>>>>>>>>>>>>>>>>>> https://s.apache.org/7e1nq
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> [5] https://ci-hadoop.apache.org/job/HBase/
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> [6]
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>> 
>>>> 
>>> 
>> https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/2/console
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>> 
>> 


Re: [DISCUSS] we need to take action if we want asf jenkins managed tests after Aug 15 2020.

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
One more thing: because resources are limited (infra has not yet migrated
all the build nodes), I have filtered out all the feature branches from the
nightly and flaky jobs.

I will add them back once infra has finished the migration.

And I wonder whether we really need to test all the feature branches.
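
For illustration only, the kind of branch filter described above could be
sketched as follows. This is a minimal sketch, not the actual Jenkins job
configuration: the naming convention (release lines are "master" or
"branch-<version>", everything else is a feature branch) is an assumption,
and nightly_branches and the sample branch names are hypothetical.

```python
import re

# Assumed convention (not confirmed in this thread): release lines are
# "master" or "branch-<version>", e.g. "branch-2" or "branch-2.3".
# Every other name is treated as a feature branch.
RELEASE_BRANCH = re.compile(r"(master|branch-\d+(\.\d+)*)")


def nightly_branches(all_branches):
    """Return only the branches that should get nightly and flaky runs."""
    return [b for b in all_branches if RELEASE_BRANCH.fullmatch(b)]


if __name__ == "__main__":
    # "HBASE-00000" is a made-up feature branch name.
    sample = ["master", "branch-2", "branch-2.3", "HBASE-00000"]
    print(nightly_branches(sample))
```

Adding the feature branches back later would then only mean widening the
pattern, without touching the jobs themselves.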

Thanks.

On Wed, Aug 12, 2020 at 2:02 AM Nick Dimiduk <nd...@apache.org> wrote:

> Thank you so much for taking on these migrations! I very much appreciate
> it!
>
> -n
>
> On Tue, Aug 11, 2020 at 8:08 AM 张铎(Duo Zhang) <pa...@gmail.com>
> wrote:
>
> > Some updates here, we have migrated most of the jobs to ci-hadoop.a.o.
> >
> > There is a known issue that our flaky dashboard is broken, due to this
> new
> > feature of jenkins
> >
> >
> https://wiki.jenkins.io/display/JENKINS/Configuring+Content+Security+Policy
> >
> > Josh is contacting the infra team to see if they can relax the policy
> but I
> > do not think it is easy as the policy is per site, not per job...
> >
> > Anyway, there is a chrome plugin to temporarily disable CSP, so you can
> > view the correct flaky dashboard.
> >
> >
> >
> https://chrome.google.com/webstore/detail/disable-content-security/ieelmcmcagommplceebfedjlakkhpden
> >
> > Thanks.
> >
> > On Thu, Jul 30, 2020 at 3:12 PM Andor Molnar <an...@apache.org> wrote:
> >
> > > https://issues.apache.org/jira/browse/INFRA-20613
> > >
> > >
> > >
> > > > On 2020. Jul 30., at 1:47, 张铎(Duo Zhang) <pa...@gmail.com>
> > wrote:
> > > >
> > > > This never worked in the past...
> > > >
> > > > But it would be great if you can kick the infra team to get this done
> > :)
> > > >
> > > > File an infra issue?
> > > >
> > > > On Wed, Jul 29, 2020, 18:36 Andor Molnar <an...@apache.org> wrote:
> > > >
> > > >> You’re having the same issue with HBase Robot btw. At the end of
> > console
> > > >> outputs:
> > > >>
> > > >> "Could not update commit status, please check if your scan
> credentials
> > > >> belong to a member of the organization or a collaborator of the
> > > repository
> > > >> and repo:status scope is selected”
> > > >>
> > > >> ...and shortly after that:
> > > >>
> > > >> "GitHub has been notified of this commit’s build result”
> > > >>
> > > >> Whatever does it mean.
> > > >>
> > > >> Andor
> > > >>
> > > >>
> > > >>
> > > >>> On 2020. Jul 29., at 11:57, Andor Molnar <an...@apache.org> wrote:
> > > >>>
> > > >>> Yep, we’ve finally received it. It’s done.
> > > >>>
> > > >>> Current issue is that Jenkins is unable to set Github build status.
> > > I’ve
> > > >> added repo:status permission, but it’s also asking to be member of
> the
> > > >> project/organization and not sure how to do that.
> > > >>>
> > > >>> Andor
> > > >>>
> > > >>>
> > > >>>
> > > >>>> On 2020. Jul 29., at 4:10, 张铎(Duo Zhang) <pa...@gmail.com>
> > > wrote:
> > > >>>>
> > > >>>> Seems you have already made it?
> > > >>>>
> > > >>>> Usually there are several moderators for the private list, you
> need
> > to
> > > >> ask
> > > >>>> them to let the GitHub registration go through.
> > > >>>>
> > > >>>> On Wed, Jul 29, 2020 at 1:03 AM Andor Molnar <an...@apache.org> wrote:
> > > >>>>
> > > >>>>> Thanks Duo, that’s very helpful.
> > > >>>>> I cannot set private@zookeeper as a verified e-mail address,
> > because
> > > >> the
> > > >>>>> verification e-mail cannot be sent to the list. Isn’t that
> > restricted
> > > >> for
> > > >>>>> members only (by default)?
> > > >>>>>
> > > >>>>> Andor
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>> On 2020. Jul 28., at 3:15, 张铎(Duo Zhang) <palomino219@gmail.com> wrote:
> > > >>>>>>
> > > >>>>>> Hi Andor,
> > > >>>>>>
> > > >>>>>> The Apache-HBase account is registered by me, using the
> > > private@hbase
> > > >>>>>> mailing list, so all the PMC members can maintain the password.
> > > >>>>>>
> > > >>>>>> I generated an access token and added it to our jenkins, so we
> can
> > > >> use it
> > > >>>>>> to post comments back to GitHub.
> > > >>>>>>
> > > >>>>>> I think you could do the same to register an Apache-ZooKeeper
> > > >> account? Or
> > > >>>>>> if you want  to use the hadoop-yetus account, you'd better ask
> the
> > > >> hadoop
> > > >>>>>> PMC members or Gavin to add the token to jenkins so you can use
> > it.
> > > >>>>>>
> > > >>>>>> Thanks.
> > > >>>>>>
> > > >>>>>> On Tue, Jul 28, 2020 at 3:56 AM Andor Molnar <an...@apache.org> wrote:
> > > >>>>>>
> > > >>>>>>> Hi Duo,
> > > >>>>>>>
> > > >>>>>>> I’m trying to create a similar job for Apache ZooKeeper, but
> > > >>>>> unfortunately
> > > >>>>>>> haven’t got too much help on the Apache builds@ list so far,
> so
> > > I’m
> > > >>>>>>> rather asking you if you don’t mind.
> > > >>>>>>>
> > > >>>>>>> First, how have you set up the Hbase Github account that you
> use
> > in
> > > >> this
> > > >>>>>>> job to access the repo?
> > > >>>>>>>
> > > >>>>>>> Thanks,
> > > >>>>>>> Andor
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> On 2020. Jul 27., at 2:22, 张铎(Duo Zhang) <palomino219@gmail.com> wrote:
> > > >>>>>>>>
> > > >>>>>>>> The pre commit job has been migrated to ci-hadoop.a.o.
> > > >>>>>>>>
> > > >>>>>>>> I have disabled the periodic scan for the old job on builds.a.o. We
> > > >>>>>>>> still need to view the pre commit results on it, so I have not
> > > >>>>>>>> deleted it for now. I will delete it later, maybe after several weeks.
> > > >>>>>>>>
> > > >>>>>>>> The new job is here
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>
> https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/
> > > >>>>>>>>
> > > >>>>>>>> Thanks.
> > > >>>>>>>>
> > > >>>>>>>> On Sat, Jul 25, 2020 at 9:44 PM 张铎(Duo Zhang) <pa...@gmail.com> wrote:
> > > >>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>
> > > >>>>>
> > > >>
> > >
> >
> https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/5/console
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> We successfully finished a nightly build.
> > > >>>>>>>>>
> > > >>>>>>>>> But seems the jiraComment did not work. I haven't seen the
> > > comment
> > > >>>>>>>>> on HBASE-24757...
> > > >>>>>>>>>
> > > >>>>>>>>> On Sat, Jul 25, 2020 at 4:51 PM 张铎(Duo Zhang) <pa...@gmail.com> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> After installing two new jenkins plugins, the pre commit job
> > > seems
> > > >>>>> fine
> > > >>>>>>>>>> now.
> > > >>>>>>>>>>
> > > >>>>>>>>>> The last failure is because of a timeout, I assume the
> problem
> > > is
> > > >>>>> that
> > > >>>>>>> we
> > > >>>>>>>>>> do not have enough executors so all the jobs are executed
> > > >>>>> sequentially.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Maybe we could move the pre commit job to the new env first?
> > The
> > > >>>>>>> nightly
> > > >>>>>>>>>> job and flaky job require more resources, and we need the
> > output
> > > >> of
> > > >>>>>>> these
> > > >>>>>>>>>> jenkins jobs(the flaky test list).
> > > >>>>>>>>>>
> > > >>>>>>>>>> Thanks.
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Fri, Jul 24, 2020 at 4:36 PM 张铎(Duo Zhang) <pa...@gmail.com> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> The problem seems because of this:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> https://issues.jenkins-ci.org/browse/JENKINS-48556
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I triggered the job again, it passed the timestamps call,
> and
> > > >> will
> > > >>>>>>> keep
> > > >>>>>>>>>>> an eye on it.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On Tue, Jul 21, 2020 at 11:18 AM 张铎(Duo Zhang) <pa...@gmail.com> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> On the sponsors, we could have a try.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> The problem here is the process of the donation? IIRC
> there
> > > is a
> > > >>>>>>> thread
> > > >>>>>>>>>>>> on the infra mailing list about how to donate machines to
> a
> > > >>>>> specific
> > > >>>>>>>>>>>> project and the discussion did not go well...
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On Tue, Jul 21, 2020 at 11:13 AM Sean Busbey <bu...@apache.org> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> We could check with ASF infra for the current state of
> > things
> > > >> wrt
> > > >>>>>>>>>>>>> GitHub
> > > >>>>>>>>>>>>> actions. I believe there is a queue set up across ASF
> > > projects.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> It has the same resource issue Travis had; things are
> fine
> > > >> until
> > > >>>>>>> some
> > > >>>>>>>>>>>>> critical mass of projects seeking better perf realize
> some
> > > new
> > > >>>>>>> option
> > > >>>>>>>>>>>>> is
> > > >>>>>>>>>>>>> available and then quickly all available resources are
> > > >> consumed.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> AFAICT the only option that gets us the same or better as
> > the
> > > >> H*
> > > >>>>>>> nodes
> > > >>>>>>>>>>>>> will
> > > >>>>>>>>>>>>> be finding sponsors and running our own.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> On Mon, Jul 20, 2020, 21:55 张铎(Duo Zhang) <
> > > >> palomino219@gmail.com>
> > > >>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I think our nightly, flakey, and pre commit jobs should
> be
> > > >>>>>>>>>>>>> transferred as a
> > > >>>>>>>>>>>>>> whole? They depend on each other.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I offer my help on the transition.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> And on github CI, does ASF have a special deal with
> > github?
> > > If
> > > >>>>> not,
> > > >>>>>>>>>>>>> I do
> > > >>>>>>>>>>>>>> not think the default resource can fit our
> requirements...
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On Tue, Jul 21, 2020 at 1:49 AM Sean Busbey <bu...@apache.org> wrote:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Hi folks!
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Back in April there was a brief discussion[1] about ASF
> > > >> Infra's
> > > >>>>>>>>>>>>>>> notification that builds.a.o is going away and we are
> > > >> currently
> > > >>>>>>>>>>>>> slated
> > > >>>>>>>>>>>>>>> to migrate to a set of CI servers for "Hadoop and
> related
> > > >>>>>>>>>>>>> projects".
> > > >>>>>>>>>>>>>>> This is the ci farm that will contain the bulk of the
> H*
> > > >> worker
> > > >>>>>>>>>>>>> nodes
> > > >>>>>>>>>>>>>>> that are donated by Yahoo!, which are the nodes we've
> > been
> > > >>>>> running
> > > >>>>>>>>>>>>> on
> > > >>>>>>>>>>>>>>> for ages[2].
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Migration discussion still happens on the
> > > >>>>> hadoop-migrations@i.a.o
> > > >>>>>>>>>>>>>>> list[3] and recently ASF Infra set a target date of
> > August
> > > >> 15th
> > > >>>>>>> for
> > > >>>>>>>>>>>>>>> turning off the existing builds.a.o server[4].
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> That gives us a little under 4 weeks to have things up
> > and
> > > >>>>> working
> > > >>>>>>>>>>>>> on
> > > >>>>>>>>>>>>>>> the new ci-hadoop.a.o jenkins coordinator[5]. it’s not
> > > clear
> > > >> to
> > > >>>>> me
> > > >>>>>>>>>>>>>>> that the level of effort we’ll need to spend is worth
> > what
> > > we
> > > >>>>> get
> > > >>>>>>>>>>>>> out
> > > >>>>>>>>>>>>>>> of a continuation of the status quo on builds.a.o. I
> did
> > a
> > > >> quick
> > > >>>>>>>>>>>>> test
> > > >>>>>>>>>>>>>>> by updating the nightly job on ci-hadoop.a.o to run
> just
> > > >>>>> branch-2,
> > > >>>>>>>>>>>>>>> since that has been stable on builds.a.o. It failed
> with
> > a
> > > >>>>> Jenkins
> > > >>>>>>>>>>>>>>> pipeline DSL syntax error[6] so I'm assuming migrating
> > will
> > > >> be a
> > > >>>>>>>>>>>>> slog.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> As far as I can see our options are:
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> * Do nothing. Have no testing or automated website
> > > >> publication
> > > >>>>> in
> > > >>>>>>>>>>>>> mid
> > > >>>>>>>>>>>>>>> August.
> > > >>>>>>>>>>>>>>> * Transition website publication and nothing else
> > (probably
> > > >> can
> > > >>>>> be
> > > >>>>>>>>>>>>>>> done in a day)
> > > >>>>>>>>>>>>>>> * Transition just precommit testing for various repos
> > > >> (probably
> > > >>>>>>>>>>>>> can be
> > > >>>>>>>>>>>>>>> done in a few days)
> > > >>>>>>>>>>>>>>> * Transition everything (no idea how long it takes due
> to
> > > >>>>> nightly,
> > > >>>>>>>>>>>>>>> flaky stuff, etc)
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> The alternatives if we do not transition any given job
> to
> > > >>>>>>>>>>>>> ci-hadoop:
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> * Try to move to GitHub Actions
> > > >>>>>>>>>>>>>>> * Try to move to Travis CI
> > > >>>>>>>>>>>>>>> * Try to move to Jenkins infra we maintain ourselves
> > > >> (presumably
> > > >>>>>>> by
> > > >>>>>>>>>>>>>>> soliciting project specific donations for worker nodes
> on
> > > >> cloud
> > > >>>>>>>>>>>>>>> vendors)
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> It's important to remember that as a project we have a
> > > heavy
> > > >>>>>>>>>>>>> footprint
> > > >>>>>>>>>>>>>>> wherever our nightly tests run. For context, a given
> > > branch's
> > > >>>>>>>>>>>>> nightly
> > > >>>>>>>>>>>>>>> can keep 3-4 executors busy for 6+ hours on the current
> > > >>>>> builds.a.o
> > > >>>>>>>>>>>>>>> setup. There's been a bunch of great work lately on
> > > bringing
> > > >>>>> down
> > > >>>>>>>>>>>>> what
> > > >>>>>>>>>>>>>>> it takes to run the full test suite, but applying that
> > work
> > > >> to
> > > >>>>>>>>>>>>> nightly
> > > >>>>>>>>>>>>>>> is itself a significant undertaking.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> What are folks thinking? Most importantly who is ready
> to
> > > >> work
> > > >>>>>>>>>>>>> towards
> > > >>>>>>>>>>>>>>> any given approach?
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> [1] [DISCUSS] Migrating HBase to new CI Master
> > > >>>>>>>>>>>>>>> https://s.apache.org/fux1o
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> [2] https://builds.apache.org/view/H-L/view/HBase/
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> [3]
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>
> > > >>
> https://lists.apache.org/list.html?hadoop-migrations@infra.apache.org
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> [4] [IMPORTANT] - 2 more HADOOP nodes migrated over to
> > > >> ci-hadoop
> > > >>>>>>>>>>>>>>> https://s.apache.org/7e1nq
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> [5] https://ci-hadoop.apache.org/job/HBase/
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> [6]
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>
> > > >>>>>
> > > >>
> > >
> >
> https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/2/console
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>
> > > >>>>>
> > > >>>
> > > >>
> > > >>
> > >
> > >
> >
>

Re: [DISCUSS] we need to take action if we want asf jenkins managed tests after Aug 15 2020.

Posted by Nick Dimiduk <nd...@apache.org>.
Thank you so much for taking on these migrations! I very much appreciate it!

-n

On Tue, Aug 11, 2020 at 8:08 AM 张铎(Duo Zhang) <pa...@gmail.com> wrote:

> Some updates here, we have migrated most of the jobs to ci-hadoop.a.o.
>
> There is a known issue that our flaky dashboard is broken, due to this new
> feature of jenkins
>
> https://wiki.jenkins.io/display/JENKINS/Configuring+Content+Security+Policy
>
> Josh is contacting the infra team to see if they can relax the policy but I
> do not think it is easy as the policy is per site, not per job...
>
> Anyway, there is a chrome plugin to temporarily disable CSP, so you can
> view the correct flaky dashboard.
>
>
> https://chrome.google.com/webstore/detail/disable-content-security/ieelmcmcagommplceebfedjlakkhpden
>
> Thanks.
>
> On Thu, Jul 30, 2020 at 3:12 PM Andor Molnar <an...@apache.org> wrote:
>
> > https://issues.apache.org/jira/browse/INFRA-20613
> >
> >
> >
> > > On 2020. Jul 30., at 1:47, 张铎(Duo Zhang) <pa...@gmail.com>
> wrote:
> > >
> > > This never worked in the past...
> > >
> > > But it would be great if you can kick the infra team to get this done
> :)
> > >
> > > File an infra issue?
> > >
> > > On Wed, Jul 29, 2020, 18:36 Andor Molnar <an...@apache.org> wrote:
> > >
> > >> You’re having the same issue with HBase Robot btw. At the end of
> console
> > >> outputs:
> > >>
> > >> "Could not update commit status, please check if your scan credentials
> > >> belong to a member of the organization or a collaborator of the
> > repository
> > >> and repo:status scope is selected”
> > >>
> > >> ...and shortly after that:
> > >>
> > >> "GitHub has been notified of this commit’s build result”
> > >>
> > >> Whatever does it mean.
> > >>
> > >> Andor
> > >>
> > >>
> > >>
> > >>> On 2020. Jul 29., at 11:57, Andor Molnar <an...@apache.org> wrote:
> > >>>
> > >>> Yep, we’ve finally received it. It’s done.
> > >>>
> > >>> Current issue is that Jenkins is unable to set Github build status.
> > I’ve
> > >> added repo:status permission, but it’s also asking to be member of the
> > >> project/organization and not sure how to do that.
> > >>>
> > >>> Andor
> > >>>
> > >>>
> > >>>
> > >>>> On 2020. Jul 29., at 4:10, 张铎(Duo Zhang) <pa...@gmail.com>
> > wrote:
> > >>>>
> > >>>> Seems you have already made it?
> > >>>>
> > >>>> Usually there are several moderators for the private list, you need
> to
> > >> ask
> > >>>> them to let the GitHub registration go through.
> > >>>>
> > >>>> On Wed, Jul 29, 2020 at 1:03 AM Andor Molnar <an...@apache.org> wrote:
> > >>>>
> > >>>>> Thanks Duo, that’s very helpful.
> > >>>>> I cannot set private@zookeeper as a verified e-mail address,
> because
> > >> the
> > >>>>> verification e-mail cannot be sent to the list. Isn’t that
> restricted
> > >> for
> > >>>>> members only (by default)?
> > >>>>>
> > >>>>> Andor
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>> On 2020. Jul 28., at 3:15, 张铎(Duo Zhang) <pa...@gmail.com>
> > >> wrote:
> > >>>>>>
> > >>>>>> Hi Andor,
> > >>>>>>
> > >>>>>> The Apache-HBase account is registered by me, using the
> > private@hbase
> > >>>>>> mailing list, so all the PMC members can maintain the password.
> > >>>>>>
> > >>>>>> I generated an access token and added it to our jenkins, so we can
> > >> use it
> > >>>>>> to post comments back to GitHub.
> > >>>>>>
> > >>>>>> I think you could do the same to register an Apache-ZooKeeper
> > >> account? Or
> > >>>>>> if you want  to use the hadoop-yetus account, you'd better ask the
> > >> hadoop
> > >>>>>> PMC members or Gavin to add the token to jenkins so you can use
> it.
> > >>>>>>
> > >>>>>> Thanks.
> > >>>>>>
> > >>>>>> On Tue, Jul 28, 2020 at 3:56 AM Andor Molnar <an...@apache.org> wrote:
> > >>>>>>
> > >>>>>>> Hi Duo,
> > >>>>>>>
> > >>>>>>> I’m trying to create a similar job for Apache ZooKeeper, but
> > >>>>> unfortunately
> > >>>>>>> haven’t got too much help on the Apache builds@ list so far, so
> > I’m
> > >>>>>>> rather asking you if you don’t mind.
> > >>>>>>>
> > >>>>>>> First, how have you set up the Hbase Github account that you use
> in
> > >> this
> > >>>>>>> job to access the repo?
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>> Andor
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> On 2020. Jul 27., at 2:22, 张铎(Duo Zhang) <palomino219@gmail.com> wrote:
> > >>>>>>>>
> > >>>>>>>> The pre commit job has been migrated to ci-hadoop.a.o.
> > >>>>>>>>
> > >>>>>>>> I have disabled the periodic scan for the old job on builds.a.o. We
> > >>>>>>>> still need to view the pre commit results on it, so I have not
> > >>>>>>>> deleted it for now. I will delete it later, maybe after several weeks.
> > >>>>>>>>
> > >>>>>>>> The new job is here
> > >>>>>>>>
> > >>>>>>>>
> > >> https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/
> > >>>>>>>>
> > >>>>>>>> Thanks.
> > >>>>>>>>
> > >>>>>>>> On Sat, Jul 25, 2020 at 9:44 PM 张铎(Duo Zhang) <pa...@gmail.com> wrote:
> > >>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>
> > >>
> >
> https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/5/console
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> We successfully finished a nightly build.
> > >>>>>>>>>
> > >>>>>>>>> But seems the jiraComment did not work. I haven't seen the
> > comment
> > >>>>>>>>> on HBASE-24757...
> > >>>>>>>>>
> > >>>>>>>>> On Sat, Jul 25, 2020 at 4:51 PM 张铎(Duo Zhang) <pa...@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> After installing two new jenkins plugins, the pre commit job
> > seems
> > >>>>> fine
> > >>>>>>>>>> now.
> > >>>>>>>>>>
> > >>>>>>>>>> The last failure is because of a timeout, I assume the problem
> > is
> > >>>>> that
> > >>>>>>> we
> > >>>>>>>>>> do not have enough executors so all the jobs are executed
> > >>>>> sequentially.
> > >>>>>>>>>>
> > >>>>>>>>>> Maybe we could move the pre commit job to the new env first?
> The
> > >>>>>>> nightly
> > >>>>>>>>>> job and flaky job require more resources, and we need the
> output
> > >> of
> > >>>>>>> these
> > >>>>>>>>>> jenkins jobs(the flaky test list).
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> On Fri, Jul 24, 2020 at 4:36 PM 张铎(Duo Zhang) <pa...@gmail.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> The problem seems because of this:
> > >>>>>>>>>>>
> > >>>>>>>>>>> https://issues.jenkins-ci.org/browse/JENKINS-48556
> > >>>>>>>>>>>
> > >>>>>>>>>>> I triggered the job again, it passed the timestamps call, and
> > >> will
> > >>>>>>> keep
> > >>>>>>>>>>> an eye on it.
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Tue, Jul 21, 2020 at 11:18 AM 张铎(Duo Zhang) <pa...@gmail.com> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> On the sponsors, we could have a try.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> The problem here is the process of the donation? IIRC there
> > is a
> > >>>>>>> thread
> > >>>>>>>>>>>> on the infra mailing list about how to donate machines to a
> > >>>>> specific
> > >>>>>>>>>>>> project and the discussion did not go well...
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Tue, Jul 21, 2020 at 11:13 AM Sean Busbey <bu...@apache.org> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> We could check with ASF infra for the current state of things wrt
> > >>>>>>>>>>>>> GitHub Actions. I believe there is a queue set up across ASF projects.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> It has the same resource issue Travis had; things are fine until
> > >>>>>>>>>>>>> some critical mass of projects seeking better perf realize some new
> > >>>>>>>>>>>>> option is available and then quickly all available resources are
> > >>>>>>>>>>>>> consumed.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> AFAICT the only option that gets us the same as or better than the
> > >>>>>>>>>>>>> H* nodes will be finding sponsors and running our own.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Mon, Jul 20, 2020, 21:55 张铎(Duo Zhang) <palomino219@gmail.com> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I think our nightly, flaky, and pre-commit jobs should be
> > >>>>>>>>>>>>>> transferred as a whole? They depend on each other.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I offer my help on the transition.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> And on GitHub CI, does ASF have a special deal with GitHub? If
> > >>>>>>>>>>>>>> not, I do not think the default resources can fit our requirements...
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Tue, Jul 21, 2020 at 1:49 AM, Sean Busbey <bu...@apache.org> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Hi folks!
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Back in April there was a brief discussion[1] about ASF Infra's
> > >>>>>>>>>>>>>>> notification that builds.a.o is going away and we are currently
> > >>>>>>>>>>>>>>> slated to migrate to a set of CI servers for "Hadoop and related
> > >>>>>>>>>>>>>>> projects". This is the CI farm that will contain the bulk of the H*
> > >>>>>>>>>>>>>>> worker nodes that are donated by Yahoo!, which are the nodes we've
> > >>>>>>>>>>>>>>> been running on for ages[2].
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Migration discussion still happens on the hadoop-migrations@i.a.o
> > >>>>>>>>>>>>>>> list[3] and recently ASF Infra set a target date of August 15th for
> > >>>>>>>>>>>>>>> turning off the existing builds.a.o server[4].
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> That gives us a little under 4 weeks to have things up and working
> > >>>>>>>>>>>>>>> on the new ci-hadoop.a.o Jenkins coordinator[5]. It's not clear to
> > >>>>>>>>>>>>>>> me that the level of effort we'll need to spend is worth what we get
> > >>>>>>>>>>>>>>> out of a continuation of the status quo on builds.a.o. I did a quick
> > >>>>>>>>>>>>>>> test by updating the nightly job on ci-hadoop.a.o to run just
> > >>>>>>>>>>>>>>> branch-2, since that has been stable on builds.a.o. It failed with a
> > >>>>>>>>>>>>>>> Jenkins pipeline DSL syntax error[6], so I'm assuming migrating will
> > >>>>>>>>>>>>>>> be a slog.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> As far as I can see our options are:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> * Do nothing. Have no testing or automated website publication in
> > >>>>>>>>>>>>>>> mid August.
> > >>>>>>>>>>>>>>> * Transition website publication and nothing else (probably can be
> > >>>>>>>>>>>>>>> done in a day)
> > >>>>>>>>>>>>>>> * Transition just precommit testing for various repos (probably can
> > >>>>>>>>>>>>>>> be done in a few days)
> > >>>>>>>>>>>>>>> * Transition everything (no idea how long it takes due to nightly,
> > >>>>>>>>>>>>>>> flaky stuff, etc)
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> The alternatives if we do not transition any given job to ci-hadoop:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> * Try to move to GitHub Actions
> > >>>>>>>>>>>>>>> * Try to move to Travis CI
> > >>>>>>>>>>>>>>> * Try to move to Jenkins infra we maintain ourselves (presumably by
> > >>>>>>>>>>>>>>> soliciting project specific donations for worker nodes on cloud
> > >>>>>>>>>>>>>>> vendors)
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> It's important to remember that as a project we have a heavy
> > >>>>>>>>>>>>>>> footprint wherever our nightly tests run. For context, a given
> > >>>>>>>>>>>>>>> branch's nightly can keep 3-4 executors busy for 6+ hours on the
> > >>>>>>>>>>>>>>> current builds.a.o setup. There's been a bunch of great work lately
> > >>>>>>>>>>>>>>> on bringing down what it takes to run the full test suite, but
> > >>>>>>>>>>>>>>> applying that work to nightly is itself a significant undertaking.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> What are folks thinking? Most importantly, who is ready to work
> > >>>>>>>>>>>>>>> towards any given approach?
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> [1] [DISCUSS] Migrating HBase to new CI Master
> > >>>>>>>>>>>>>>> https://s.apache.org/fux1o
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> [2] https://builds.apache.org/view/H-L/view/HBase/
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> [3] https://lists.apache.org/list.html?hadoop-migrations@infra.apache.org
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> [4] [IMPORTANT] - 2 more HADOOP nodes migrated over to ci-hadoop
> > >>>>>>>>>>>>>>> https://s.apache.org/7e1nq
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> [5] https://ci-hadoop.apache.org/job/HBase/
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> [6] https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/2/console
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>>
> > >>>
> > >>
> > >>
> >
> >
>