Posted to dev@drill.apache.org by James Turton <dz...@apache.org> on 2023/03/14 08:00:50 UTC

Automatically publish Calcite SNAPSHOT artefacts?

Hi Calcite and Drill devs

Here's an idea that, while it doesn't equate to the upstreaming of Drill 
tests to the Calcite test suite, could mean that all of Drill's tests 
would automatically and frequently be run with the HEAD of the Calcite 
main branch.

Something Drill started doing last year is the continuous publication of 
SNAPSHOT artefacts to the Apache snapshots repo [1]. We wanted to offer 
Drill plugin developers targeting the upcoming version of Drill the 
ability to pull in up to date libraries without having to build Drill 
from its master branch themselves.

Being concerned about causing an explosion of artefact versions we asked 
Infra whether it would be okay for Drill to republish every time a 
commit is merged into master and they told us that we are not the first 
and can publish there as often as we like. The only tricky bit to 
setting this up was obtaining Nexus credentials from GitHub Secrets and 
using them in a new GitHub workflow but we do now have a working example 
in Drill [2] and I'd be happy to help if Calcite is interested in doing the 
same.

If Drill then bases its master branch on these proposed SNAPSHOT 
artefacts then Drill's normal CI runs would continuously test it with 
Calcite as at its most recent commit and we'd be made aware of 
compatibility problems early. If declaring a dependency on a SNAPSHOT 
version in master is too much malpractice, instability or extra process 
[3] for Drill devs then I expect that the same thing could still be 
achieved in a new "calcite-next" branch. In any event we'd continue to 
have the stable Drill branch which would of course only ever depend on a 
released version of Calcite.

Regards
James

[1] https://repository.apache.org/content/repositories/snapshots/org/apache/drill/
[2] https://github.com/apache/drill/blob/master/.github/workflows/publish-snapshot.yml
[3] If we did this in Drill master then I guess a new step in the Drill 
release process would need to see us push a commit that pins the Calcite 
dependency to the latest released version. We'd probably also become 
motivated to time Drill releases so that they happen just after Calcite 
releases. Personally I'd be prepared to try this out for a while.
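
As a rough illustration of the release step described in [3], the pinning 
commit could be prepared by a small script. What follows is only a sketch 
in Python under stated assumptions: the property name "calcite.version", 
the helper name pin_calcite_version and the version numbers used with it 
are hypothetical and are not taken from Drill's actual POM or release 
tooling.

```python
import re

# Hypothetical property name; the real POM may name or locate this differently.
CALCITE_PROPERTY = "calcite.version"


def pin_calcite_version(pom_text: str, release_version: str) -> str:
    """Return pom_text with the SNAPSHOT Calcite version replaced by a release.

    Raises ValueError if the target is itself a SNAPSHOT, or if the POM does
    not contain exactly one SNAPSHOT-valued Calcite version property.
    """
    if release_version.endswith("-SNAPSHOT"):
        raise ValueError("release_version must not be a SNAPSHOT")

    name = re.escape(CALCITE_PROPERTY)
    pattern = re.compile(r"(<" + name + r">)([^<]*-SNAPSHOT)(</" + name + r">)")

    # A function replacement avoids backreference ambiguity when the
    # release version starts with a digit.
    pinned, count = pattern.subn(
        lambda m: m.group(1) + release_version + m.group(3), pom_text
    )
    if count != 1:
        raise ValueError(
            f"expected exactly one SNAPSHOT {CALCITE_PROPERTY} property, found {count}"
        )
    return pinned
```

Calling pin_calcite_version(pom_text, "1.34.0") on a POM whose property 
reads 1.35.0-SNAPSHOT (illustrative numbers) would return the text with 
the property set to 1.34.0. A POM that is already pinned raises an 
error, so the step cannot silently do nothing during a release.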

Re: Automatically publish Calcite SNAPSHOT artefacts?

Posted by James Turton <dz...@apache.org>.
Hi Calcite and Drill devs

I have a quick follow-up on the discussion of running Drill tests 
against the Calcite main branch.

Firstly, as feedback on Stamatis' comments below, the Drill unit test 
suite is big, diverse, slow, resource hungry and temperamental. By which 
I mean that it does work but a clean run needs about two hours, all of 
the RAM that can be squeezed from a GitHub Runner and any number of 
dependencies including some in Docker containers. Masochists may browse 
the Drill CI run logs [1]. That said, there are many tests that Calcite 
would never want in its CI and it would be possible to run a much more 
reasonable subset of the tests in the context of testing Calcite main.

Given the above, we have in the meantime opted to try basing Drill 
master on Calcite main [2] instead of some or other released version of 
Calcite. What's appealing about this is the one line change to make it 
happen. What could be difficult is the "two moving targets" mentioned by 
Stamatis. Nonetheless we want to try it for a while and in the worst 
case we'll simply retreat to testing with a released Calcite.

My prediction is that there will be more "Something changed! Was it in 
Drill? No, it looks like this Calcite commit. Was it a regression? No, I 
asked and it's an intentional change that happens to affect a 
customisation that Drill has made. Okay let's deal with it this way", 
and permutations thereof. That would be useful stuff, to Drill at the 
very least.

Thanks
James

[1] https://github.com/apache/drill/actions
[2] https://github.com/apache/drill/commit/b6d59eaf39ef4c2f4417b633a97121223bc2569f

On 2023/03/14 10:42, Stamatis Zampetakis wrote:
> Hi James,
>
> Thanks for starting this discussion.
>
> Regarding Calcite snapshots there is already a Jenkins job [1] publishing
> regularly artifacts to the snapshots repo [2].
>
> Apart from that, having regular integration tests between Drill and Calcite
> is a very good idea. The tricky part is to decide where the integration
> tests are going to run:
> A) Part of Calcite CI
> B) Part of Drill CI
> C) Both
>
> I will mostly comment about the option of adding extra tests in Calcite CI
> since I am most familiar with it.
> If this happens, I would prefer to use a fixed Drill commit as a reference.
> This is more or less the same with pinning the versions in other Calcite
> adapters. I am afraid that having two (or more) moving targets will make
> things very unstable.
> Second, I would be mindful of the total duration and frequency of the runs
> especially if the intention is to run it as part of every PR.
> Finally it would be nice to select a subset of Drill tests to run that are
> relevant and hopefully meaningful to calcite devs and not everything so
> that when they fail somebody familiar with Calcite can understand what
> happens.
>
> Apart from running integration tests anything that can be captured as a
> simple Calcite unit test would be of immense help in long term stability.
>
> Best,
> Stamatis
>
> [1] https://ci-builds.apache.org/job/Calcite/job/Calcite-snapshots/
> [2] https://repository.apache.org/content/groups/snapshots/org/apache/calcite/
>
>
>
>
>
> On Tue, Mar 14, 2023 at 9:02 AM James Turton <dz...@apache.org> wrote:
>
>> Hi Calcite and Drill devs
>>
>> Here's an idea that, while it doesn't equate to the upstreaming of Drill
>> tests to the Calcite test suite, could mean that all of Drill's tests
>> would automatically and frequently be run with the HEAD of the Calcite
>> main branch.
>>
>> Something Drill started doing last year is the continuous publication of
>> SNAPSHOT artefacts to the Apache snapshots repo [1]. We wanted to offer
>> Drill plugin developers targeting the upcoming version of Drill the
>> ability to pull in up to date libraries without having to build Drill
>> from its master branch themselves.
>>
>> Being concerned about causing an explosion of artefact versions we asked
>> Infra whether it would be okay for Drill to republish every time a
>> commit is merged into master and they told us that we are not the first
>> and can publish there as often as we like. The only tricky bit to
>> setting this up was obtaining Nexus credentials from GitHub Secrets and
>> using them in a new GitHub workflow but we do now have a working example
>> in Drill [2] and I'd happy to help if Calcite is interested in doing the
>> same.
>>
>> If Drill then bases its master branch on these proposed SNAPSHOT
>> artefacts then Drill's normal CI runs would continuously test it with
>> Calcite as at its most recent commit and we'd be made aware of
>> compatibility problems early. If declaring a dependency on a SNAPSHOT
>> version in master is too much malpractice, instability or extra process
>> [3] for Drill devs then I expect that the same thing could still be
>> achieved in a new "calcite-next" branch. In any event we'd continue to
>> have the stable Drill branch which would of course only ever depend on a
>> released version of Calcite.
>>
>> Regards
>> James
>>
>> [1]
>>
>> https://repository.apache.org/content/repositories/snapshots/org/apache/drill/
>> [2]
>>
>> https://github.com/apache/drill/blob/master/.github/workflows/publish-snapshot.yml
>> [3] If we did this in Drill master then I guess a new step in the Drill
>> release process would need to see us push a commit that pins the Calcite
>> dependency to the latest released version. We'd probably also become
>> motivated to time Drill releases so that they happen just after Calcite
>> releases. Personally I'd be prepared to try this out for a while.
>>


Re: Automatically publish Calcite SNAPSHOT artefacts?

Posted by Stamatis Zampetakis <za...@gmail.com>.
Hi James,

Thanks for starting this discussion.

Regarding Calcite snapshots there is already a Jenkins job [1] that regularly
publishes artifacts to the snapshots repo [2].

Apart from that, having regular integration tests between Drill and Calcite
is a very good idea. The tricky part is to decide where the integration
tests are going to run:
A) Part of Calcite CI
B) Part of Drill CI
C) Both

I will mostly comment about the option of adding extra tests in Calcite CI
since I am most familiar with it.
If this happens, I would prefer to use a fixed Drill commit as a reference.
This is more or less the same as pinning the versions in other Calcite
adapters. I am afraid that having two (or more) moving targets will make
things very unstable.
Second, I would be mindful of the total duration and frequency of the runs
especially if the intention is to run it as part of every PR.
Finally it would be nice to select a subset of Drill tests to run that are
relevant and hopefully meaningful to Calcite devs and not everything so
that when they fail somebody familiar with Calcite can understand what
happens.

Apart from running integration tests anything that can be captured as a
simple Calcite unit test would be of immense help in long term stability.

Best,
Stamatis

[1] https://ci-builds.apache.org/job/Calcite/job/Calcite-snapshots/
[2] https://repository.apache.org/content/groups/snapshots/org/apache/calcite/





On Tue, Mar 14, 2023 at 9:02 AM James Turton <dz...@apache.org> wrote:

> Hi Calcite and Drill devs
>
> Here's an idea that, while it doesn't equate to the upstreaming of Drill
> tests to the Calcite test suite, could mean that all of Drill's tests
> would automatically and frequently be run with the HEAD of the Calcite
> main branch.
>
> Something Drill started doing last year is the continuous publication of
> SNAPSHOT artefacts to the Apache snapshots repo [1]. We wanted to offer
> Drill plugin developers targeting the upcoming version of Drill the
> ability to pull in up to date libraries without having to build Drill
> from its master branch themselves.
>
> Being concerned about causing an explosion of artefact versions we asked
> Infra whether it would be okay for Drill to republish every time a
> commit is merged into master and they told us that we are not the first
> and can publish there as often as we like. The only tricky bit to
> setting this up was obtaining Nexus credentials from GitHub Secrets and
> using them in a new GitHub workflow but we do now have a working example
> in Drill [2] and I'd happy to help if Calcite is interested in doing the
> same.
>
> If Drill then bases its master branch on these proposed SNAPSHOT
> artefacts then Drill's normal CI runs would continuously test it with
> Calcite as at its most recent commit and we'd be made aware of
> compatibility problems early. If declaring a dependency on a SNAPSHOT
> version in master is too much malpractice, instability or extra process
> [3] for Drill devs then I expect that the same thing could still be
> achieved in a new "calcite-next" branch. In any event we'd continue to
> have the stable Drill branch which would of course only ever depend on a
> released version of Calcite.
>
> Regards
> James
>
> [1]
>
> https://repository.apache.org/content/repositories/snapshots/org/apache/drill/
> [2]
>
> https://github.com/apache/drill/blob/master/.github/workflows/publish-snapshot.yml
> [3] If we did this in Drill master then I guess a new step in the Drill
> release process would need to see us push a commit that pins the Calcite
> dependency to the latest released version. We'd probably also become
> motivated to time Drill releases so that they happen just after Calcite
> releases. Personally I'd be prepared to try this out for a while.
>
