Posted to dev@beam.apache.org by Etienne Chauchot <ec...@apache.org> on 2018/05/30 08:14:14 UTC

[PROPOSAL] CI improvement: be able to run the IT of the IOs from github pull request

Hi guys
As part of the CI improvement work, I suggest making it possible to run the integration tests (IT) of the IOs from the
GitHub PR.

When doing a review, either the reviewer or the author needs to run the IT. The problem is that the results are
private. It would be good to be able to run the IT using a phrase in GitHub (like the validates runner tests) so that
the results are public like any other test in the PR.
But it would require the backend IT infrastructure (Kubernetes/Docker ...) to be always up, and its
credentials/location to be set in the related Jenkins Groovy script.

I opened:
https://issues.apache.org/jira/browse/BEAM-4427

Thoughts?

Best
Etienne

Re: [PROPOSAL] CI improvement: be able to run the IT of the IOs from github pull request

Posted by Etienne Chauchot <ec...@apache.org>.
@Łukasz: true, ElasticsearchIOIT and CassandraIOIT were the first IO integration tests.
@Kenn: I think the JIRAs do not exist. I will create a main JIRA and one subtask per missing job.

On Wednesday, May 30, 2018 at 19:45 -0700, Kenneth Knowles wrote:
> This all seems extremely useful. Is there some action to be taken other than advertising these related JIRAs?
> Kenn

Re: [PROPOSAL] CI improvement: be able to run the IT of the IOs from github pull request

Posted by Kenneth Knowles <kl...@google.com>.
This all seems extremely useful. Is there some action to be taken other
than advertising these related JIRAs?

Kenn

On Wed, May 30, 2018 at 5:45 AM Łukasz Gajowy <lu...@gmail.com>
wrote:

> +1 to generalizing IT. I think the tests you mentioned were developed
> before the general idea of what an IOIT should look like emerged.
> AFAIK the same goes for the tests in the io/google-cloud-platform module. I
> recently created some issues that address that [1], [2], [3]. If anyone is
> willing to take those, feel free (I can help with this).
>
> [1] https://issues.apache.org/jira/browse/BEAM-4416
> [2] https://issues.apache.org/jira/browse/BEAM-4399
> [3] https://issues.apache.org/jira/browse/BEAM-4398

Re: [PROPOSAL] CI improvement: be able to run the IT of the IOs from github pull request

Posted by Łukasz Gajowy <lu...@gmail.com>.
+1 to generalizing IT. I think the tests you mentioned were developed
before the general idea of what an IOIT should look like emerged.
AFAIK the same goes for the tests in the io/google-cloud-platform module. I
recently created some issues that address that [1], [2], [3]. If anyone is
willing to take those, feel free (I can help with this).

[1] https://issues.apache.org/jira/browse/BEAM-4416
[2] https://issues.apache.org/jira/browse/BEAM-4399
[3] https://issues.apache.org/jira/browse/BEAM-4398

2018-05-30 14:00 GMT+02:00 Etienne Chauchot <ec...@apache.org>:

> Hi Łukasz
>
> Thanks for the details.
>
> I was thinking more about generalizing IT integration. For example,
> some IOs like Cassandra and Elasticsearch have ITs but no Groovy scripts.
> Also, I agree with your list.
> And thanks for the details about the automatic provisioning of backend
> services, I did not know that.
>
> Etienne

Re: [PROPOSAL] CI improvement: be able to run the IT of the IOs from github pull request

Posted by Etienne Chauchot <ec...@apache.org>.
Hi Łukasz,

Thanks for the details.

I was thinking more about generalizing IT integration. For example, some IOs like Cassandra and Elasticsearch have
ITs but no Groovy scripts.
Also, I agree with your list.
And thanks for the details about the automatic provisioning of backend services, I did not know that.

Etienne

On Wednesday, May 30, 2018 at 11:21 +0200, Łukasz Gajowy wrote:
> Hi Etienne, 
> 
> This is already possible, provided that an appropriate Jenkins job is defined (see examples here: [1], [2]). Either
> the reviewer or the author can run the seed job to load job definitions (by typing "Run seed job" in a comment) and
> then run the test they are interested in (by specifying the correct phrase in a GitHub comment, e.g. "Run Java JdbcIO
> Performance Test"). The results are then available on Jenkins, so they are public too.
> 
> Regarding the infrastructure: currently, if a test requires any Kubernetes infrastructure, it is set up by the
> PerfKitBenchmarker tool before the test is actually run. After the test execution, all the infrastructure is torn
> down. This is also done automatically, provided that all necessary Kubernetes scripts are there.
> 
> Although it is possible, I must say that the whole "Performance Testing Framework" needs improvement in the
> following areas (so it should be considered ongoing work in progress):
>  - documentation and instructions for the community (this is getting more urgent!)
>  - support for other runners (currently only direct and Dataflow are supported, as there were some issues when we
> tried to integrate it with Spark and Flink)
>  - support for other filesystems (currently only local and HDFS are supported)
>  - rename and reorganize IT jobs in Jenkins (see: [3])
>
> Also, I think it is worth looking at improvements to the job definitions (seed jobs overwrite all jobs, so this can
> collide with other developers' work). See the thread I started a while ago in [4] for further info.
> 
> Best regards, 
> Łukasz Gajowy
> 
> 
> [1] https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PerformanceTests_JDBC.groovy
> [2] https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PerformanceTests_FileBasedIO_IT.groovy
> [3] https://issues.apache.org/jira/browse/BEAM-4298
> [4] https://lists.apache.org/thread.html/b1aaea2c7eadc7ca1d1326b94a8c4c3a67befc0753897fd7fa4a3a4e@%3Cdev.beam.apache.org%3E

Re: [PROPOSAL] CI improvement: be able to run the IT of the IOs from github pull request

Posted by Łukasz Gajowy <lu...@gmail.com>.
Hi Etienne,

This is already possible, provided that an appropriate Jenkins job is
defined (see examples here: [1], [2]). Either the reviewer or the author can
run the seed job to load job definitions (by typing "Run seed job" in a
comment) and then run the test they are interested in (by specifying
the correct phrase in a GitHub comment, e.g. "Run Java JdbcIO Performance
Test"). The results are then available on Jenkins, so they are public too.
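
To illustrate the phrase-based triggering described above, here is a minimal
Python sketch of how a PR comment could be matched to a job. It is only an
illustration: the two phrases come from this thread, but the job names, the
TRIGGER_PHRASES table and the job_for_comment function are hypothetical, and
this is not how the Jenkins plugin itself is implemented.

```python
import re
from typing import Optional

# Hypothetical mapping from trigger phrase to job name. The phrases are the
# ones quoted in this thread; the job names are made up for illustration.
TRIGGER_PHRASES = {
    "Run seed job": "example_seed_job",
    "Run Java JdbcIO Performance Test": "example_jdbc_performance_test",
}


def job_for_comment(comment: str) -> Optional[str]:
    """Return the job to trigger for a PR comment, or None if nothing matches."""
    text = comment.strip()
    for phrase, job in TRIGGER_PHRASES.items():
        # The whole comment must be the phrase; case is ignored.
        if re.fullmatch(re.escape(phrase), text, flags=re.IGNORECASE):
            return job
    return None


print(job_for_comment("Run seed job"))
print(job_for_comment("run java jdbcio performance test"))
print(job_for_comment("LGTM"))
```

Requiring the comment to consist only of the phrase is a design choice of this
sketch: it avoids triggering a job when the phrase is merely mentioned in a
longer review comment.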

Regarding the infrastructure: currently, if a test requires any
Kubernetes infrastructure, it is set up by the PerfKitBenchmarker tool before
the test is actually run. After the test execution, all the infrastructure
is torn down. This is also done automatically, provided that all necessary
Kubernetes scripts are there.
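
The set up / run / tear down lifecycle described above can be sketched as
follows. This is a simplified Python illustration, not PerfKitBenchmarker
code: the provisioned and run_it helpers and the resource name are
hypothetical, and "setup" and "teardown" only append to a log instead of
touching a real cluster.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def provisioned(resources: List[str], log: List[str]) -> Iterator[None]:
    """Set up every resource, yield to the test, then tear everything down."""
    started: List[str] = []
    try:
        for name in resources:
            log.append(f"setup {name}")
            started.append(name)
        yield
    finally:
        # Tear down in reverse order, even if setup or the test raised.
        for name in reversed(started):
            log.append(f"teardown {name}")


def run_it(test: Callable[[], None], resources: List[str]) -> List[str]:
    """Run one test with its backend resources and return the event log."""
    log: List[str] = []
    with provisioned(resources, log):
        test()
        log.append("test passed")
    return log


# Hypothetical resource name, standing in for a Kubernetes-hosted backend.
events = run_it(lambda: None, ["postgres-service"])
print(events)
```

Putting the teardown in a finally block is what keeps the backend from staying
up after a failed test, which is the property that removes the need for
always-on infrastructure.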

Although it is possible, I must say that the whole "Performance
Testing Framework" needs improvement in the following areas (so it should be
considered ongoing work in progress):
 - documentation and instructions for the community (this is getting more
urgent!)
 - support for other runners (currently only direct and Dataflow are
supported, as there were some issues when we tried to integrate it with
Spark and Flink)
 - support for other filesystems (currently only local and HDFS are
supported)
 - rename and reorganize IT jobs in Jenkins (see: [3])

Also, I think it is worth looking at improvements to the job definitions
(seed jobs overwrite all jobs, so this can collide with other developers'
work). See the thread I started a while ago in [4] for further info.

Best regards,
Łukasz Gajowy


[1]
https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PerformanceTests_JDBC.groovy
[2]
https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PerformanceTests_FileBasedIO_IT.groovy
[3] https://issues.apache.org/jira/browse/BEAM-4298
[4]
https://lists.apache.org/thread.html/b1aaea2c7eadc7ca1d1326b94a8c4c3a67befc0753897fd7fa4a3a4e@%3Cdev.beam.apache.org%3E

2018-05-30 10:14 GMT+02:00 Etienne Chauchot <ec...@apache.org>:

> Hi guys
> As part of the CI improvement work, I suggest making it possible to run the
> integration tests (IT) of the IOs from the GitHub PR.
>
> When doing a review, either the reviewer or the author needs to
> run the IT. The problem is that the results are private. It would be good
> to be able to run the IT using a phrase in GitHub (like the validates runner
> tests) so that the results are public like any other test in the PR.
> But it would require the backend IT infrastructure (Kubernetes/Docker
> ...) to be always up, and its credentials/location to be set in the
> related Jenkins Groovy script.
>
> I opened:
> https://issues.apache.org/jira/browse/BEAM-4427
>
> Thoughts?
>
> Best
> Etienne
>