Posted to hadoop-migrations@infra.apache.org by Zoltan Haindrich <ki...@rxd.hu> on 2020/07/13 11:37:02 UTC

Re: Migration of Hadoop labelled nodes to new dedicated Master

Hey Dmitry!

For Hive we took a path from which some parts might be useful for you; I'll write about it a bit:

For one thing - to have the PR/JIRA links - you might just need to add an .asf.yaml file as described in [1].
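
As an illustration, something along these lines in the repository root should be enough for the JIRA links - this is only a sketch I did not test, so please double-check the exact option names against [1] and adjust the mailing list addresses for your project:

  # .asf.yaml
  notifications:
    commits:      commits@ambari.apache.org
    pullrequests: dev@ambari.apache.org
    # as far as I remember the bot looks for the JIRA key (e.g. AMBARI-12345)
    # in the PR title; "link" adds a link to the PR on the matching issue and
    # "label" tags the issue
    jira_options: link label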

To test your project you have a few options:
* you could try to dump/load the jobs on the new Jenkins CI instance.
* you could create a new, more modern CI by rethinking it:
   * option A (use Jenkins):
     * to enable building GitHub PRs I think the most painless approach is to use a Jenkinsfile with a multibranch pipeline job (a rough sketch is below this list)
     * the Jenkinsfile should contain the instructions to build the project - this way modifications to the CI will come in as PRs as well [2]
     * iirc for this to work you will need to set a user who has write access to the repo - without that it won't be able to request merge commits from GitHub
     * for Hive we took this approach because we still struggle with all kinds of test flakiness, etc.
     * this is just one way to do it... Jenkins can be utilized in various ways
   * option B (GitHub Actions):
     * GitHub Actions has a lot of resources in the background (iirc you may use 20 instances of 2-core Xeons at a time to test a project)
     * configuring GitHub Actions is a bit different - you may need to get used to it - but so far I find it useful :) (a bare-bones workflow sketch is also below this list)
     * some ASF projects, such as Ozone and Calcite, are already utilizing GitHub Actions
     * there are some limitations of GA - for example publishing test reports is not really possible... if red/green is enough then that might not even be important
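
To make option A a bit more concrete, a minimal declarative Jenkinsfile for a multibranch pipeline could look something like the sketch below - this is not what Hive actually runs, and the node label and maven commands are placeholders you would have to adapt to the Ambari build:

  // Jenkinsfile - rough, untested sketch for a multibranch pipeline
  pipeline {
      agent { label 'Hadoop' }                 // whatever label the new agents end up with
      options {
          timeout(time: 4, unit: 'HOURS')
          buildDiscarder(logRotator(numToKeepStr: '30'))
      }
      stages {
          stage('Build') {
              steps {
                  sh 'mvn -B clean install -DskipTests'   // placeholder build command
              }
          }
          stage('Test') {
              steps {
                  sh 'mvn -B test'                        // placeholder test command
              }
          }
      }
      post {
          always {
              junit '**/target/surefire-reports/*.xml'    // collect test results
          }
      }
  }

The multibranch pipeline job on the Jenkins side then just points at the GitHub repository and builds every branch/PR that contains this file.

And for option B, a bare-bones workflow under .github/workflows/ might be something like the following - again only a sketch, the JDK version and build command are guesses to be replaced:

  # .github/workflows/build.yml
  name: build
  on: [push, pull_request]
  jobs:
    build:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v2
        - uses: actions/setup-java@v1
          with:
            java-version: 8
        - name: Build and test
          run: mvn -B clean install             # placeholder build command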

[1] https://cwiki.apache.org/confluence/display/INFRA/git+-+.asf.yaml+features
[2] https://www.jenkins.io/doc/pipeline/examples/

cheers,
Zoltan

On 7/13/20 1:13 PM, Dmitry Grinenko wrote:
> Hi Gavin,
> 
> Thanks for the answer. Sadly, it is not me who previously configured the project on CI, so I would need to do it from scratch and may be a bit slow because of that.
> 
> The general idea is to:
> - use something like https://plugins.jenkins.io/ghprb/ as a trigger for a Jenkins job that verifies pull requests on GitHub
> - add a link to the pull request in https://issues.apache.org/jira automatically
> 
> 
> What would be needed overall (let's call it a plan):
> - Create a pipeline in Jenkins for Ambari
> - Add the GitHub Pull Request Builder plugin, so we would be able to integrate Jenkins CI with GitHub for automatic checking of new pull requests
> - Configure the plugin to work with the pipeline and the GitHub repository
> - Add the possibility to add links in ASF Jira to the GitHub pull request
> 
> 
> I'll start my work on the first item in the list.
> 
> 
> ------ Original Message ------
> From: "Gavin McDonald" <gm...@apache.org>
> To: hadoop-migrations@infra.apache.org
> Sent: 7/13/2020 12:15:47 PM
> Subject: Re: Re[2]: Migration of Hadoop labelled nodes to new dedicated Master
> 
>> Hi Dmitry,
>>
>> Welcome! You are not too late, no - we need to start ramping up testing so
>> we can perform the migration of all Hadoop related projects to
>> ci-hadoop.apache.org.
>>
>> Please let me know what you need to get going - you already have login and build/create job ability - please test that.
>>
>> Later today I'll grab two more H nodes from builds.apache.org and place them in the ci-hadoop.a.o pool.
>>
>> Thanks
>>
>>
>> On Thu, Jul 9, 2020 at 3:19 PM Dmitry Grinenko
>> <dg...@cloudera.com.invalid> wrote:
>>
>>>  Hello All,
>>>
>>>  It seems I'm a bit late to the event due to some circumstances, but is there still an opportunity to participate in the migration?
>>>
>>>  I'd like to step up as a volunteer from the Ambari team and migrate the GitHub pull request builds to the new infra.
>>>
>>>
>>>  ------ Original Message ------
>>>  From: "Gavin McDonald" <gm...@apache.org>
>>>  To: hadoop-migrations@infra.apache.org
>>>  Sent: 4/29/2020 4:05:49 PM
>>>  Subject: Re: Migration of Hadoop labelled nodes to new dedicated Master
>>>
>>>  >Hi All,
>>>  >
>>>  >Following on from the below email I sent *11 DAYS ago now*, so far we have had *one reply* from mahout cc'd to me (thank you Trevor), and had *ONE PERSON* sign up to the new hadoop-migrations@infra.apache.org - that is out of a total of *OVER 7000* people signed up across the 13 mailing lists emailed.
>>>  >
>>>  >To recap what I asked for:
>>>  >
>>>  >"...What I would like from each community is to decide who is going to help with their project in performing these migrations - ideally 2 or 3 folks who use the current builds.a.o regularly. Those folks should then subscribe to the new dedicated hadoop-migrations@infra.apache.org mailing list as soon as possible so we can get started..."
>>>  >
>>>  >This will be the last email I send to your dev list directly. I am now building a new Jenkins Master, and as soon as it is ready I will start to migrate the Jenkins Nodes/Agents over to the new system. And when I am done, the existing builds.apache.org *WILL BE TURNED OFF*.
>>>  >
>>>  >I am now going to continue all conversations on the hadoop-migrations@infra.apache.org list *only*.
>>>  >
>>>  >Thanks
>>>  >
>>>  >Gavin McDonald (ASF Infra)
>>>  >
>>>  >
>>>  >On Sat, Apr 18, 2020 at 4:21 PM Gavin McDonald <gm...@apache.org> wrote:
>>>  >
>>>  >>  Hi All,
>>>  >>
>>>  >>  A couple of months ago, I wrote to a few project private lists mentioning the need to migrate Hadoop labelled nodes (H0-H21) over to a new dedicated Jenkins Master [1] (a Cloudbees Client Master).
>>>  >>
>>>  >>  I'd like to revisit this now that I have more time to dedicate to getting this done. However, keeping track of separate conversations that spring up in various places, across multiple mailing lists, is cumbersome and not realistic. To that end, I have created a new mailing list dedicated specifically to the migration of these nodes, and of the projects that use them, over to the new system.
>>>  >>
>>>  >>  The mailing list 'hadoop-migrations@infra.apache.org' is up and running now (and this will be the first post to it). Previous discussions were on the private PMC lists (there was some debate about that, but I wanted the PMCs initially to be aware of the change); this new list is public and archived.
>>>  >>
>>>  >>  This email is BCC'd to 13 projects' dev lists [2], determined by the https://hadoop.apache.org list of Related projects, minus Cassandra, which already has its own dedicated client master [3]; and I added Yetus as I think they cross-collaborate with many Hadoop-based projects. If anyone thinks a project is missing, or should not be on the list, let me know.
>>>  >>
>>>  >>  What I would like from each community is to decide who is going to help with their project in performing these migrations - ideally 2 or 3 folks who use the current builds.a.o regularly. Those folks should then subscribe to the new dedicated hadoop-migrations@infra.apache.org mailing list as soon as possible so we can get started.
>>>  >>
>>>  >>  About the current setup - and I hope this answers previously asked questions on private lists - the new dedicated master is a Cloudbees Client Master, 2.204.3.7-rolling. It is not the same setup as the current Jenkins master on builds.a.o - it is not intended to be. It is more or less a 'clean install', in that I have not installed over 500 plugins as is the case on builds.a.o; I would rather we install plugins as we find we need them. So yes, there may be some features missing - the point of having people sign up to the new list is to find out what those are, get them installed, and get your builds to at least the same state they are in currently.
>>>  >>
>>>  >>  We have 2 nodes on there currently for testing; as things progress we can transfer over a couple more, and projects can start to migrate their jobs over at any time they are happy, until done. We also need to test auth - the master and its nodes will be restricted to just Hadoop + Related projects (which is why it is important that this list of related projects is correct). No longer will other projects be able to hop on to Hadoop nodes, and no longer will Hadoop-related projects be able to hop onto other folks' nodes. This is a good thing, and may encourage some providers to donate a few more VMs for dedicated use.
>>>  >>
>>>  >>  For now then, decide who will help with this process, sign up to the new mailing list, and let's get started!
>>>  >>
>>>  >>  Note I am NOT subscribed to any of your dev lists, so replies please cc the new list, and I will await your presence there to get started.
>>>  >>
>>>  >>  Thanks all.
>>>  >>
>>>  >>  Gavin McDonald (ASF Infra)
>>>  >>
>>>  >>  [1] - https://ci-hadoop.apache.org
>>>  >>  [2] - hadoop,chukwa,avro,ambari,hbase,hive,mahout,pig,spark,submarine,tez,zookeeper,yetus
>>>  >>  [3] - https://ci-cassandra.apache.org
>>>  >>
>>>
>>>
>>
>> -- 
>>
>> *Gavin McDonald*
>> Systems Administrator
>> ASF Infrastructure Team
> 

Re[2]: Migration of Hadoop labelled nodes to new dedicated Master

Posted by Dmitry Grinenko <dg...@cloudera.com.INVALID>.
Hi Zoltan,

Your information is really useful.

While I would definitely need Jenkins for things like automatic pull request linking to the Jira, I would like to give GitHub Actions a shot, try them out, and decide what is better.
Green/red is basically enough for us, but with the possibility to check the test report (no need to publish it anywhere).

One question: how do I access the configuration section for the ASF GitHub project? Is there any doc, or do I need some special permissions?



Thanks in advance

------ Original Message ------
From: "Zoltan Haindrich" <ki...@rxd.hu>
To: hadoop-migrations@infra.apache.org; "Dmitry Grinenko" 
<dg...@cloudera.com.invalid>
Sent: 7/13/2020 2:37:02 PM
Subject: Re: Migration of Hadoop labelled nodes to new dedicated Master
