Posted to dev@cassandra.apache.org by Josh McKenzie <jm...@apache.org> on 2022/10/19 19:50:21 UTC

[DISCUSS] Potential circleci config and workflow changes

While working w/Andres on CASSANDRA-17939, a variety of things came up regarding our circleci config and opportunities to improve it. Figured I'd hit the list up here to see what people's thoughts are, since many of us intersect with these systems daily and having your workflow disrupted without a chance to provide input is bad.

The ideas:
1. Tune parallelism levels per job (David and Ekaterina have insight on this)
2. Rename jobs on circle to be more indicative of their function
3. Unify j8 and j11 workflow pairs into a single workflow (sketch below the list; for 2 and 3 see: https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595)
4. Update documentation w/guidance on using circle, .circleci/generate.sh examples, etc
4a. How to commit: https://cassandra.apache.org/_/development/how_to_commit.html
4b. Testing: https://cassandra.apache.org/_/development/testing.html
5. Flag on generate.sh to allow auto-run on push
6. Clean up the -l, -m, -h flags (test and indicate -l feasibility for all suites, default to -m, deprecate -h?) <- this may not be a code-change issue and may instead be a documentation issue
7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE] temporary circleci config" as the commit message
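For idea 3, a rough sketch of the shape this could take (job names here are illustrative only, not a final proposal):

workflows:
  pre-commit:
    jobs:
      - j8_build
      - j11_build
      - j8_unit_tests:
          requires: [j8_build]
      - j11_unit_tests:
          requires: [j11_build]

i.e. a single workflow carrying both the j8 and j11 legs instead of the current separate per-JDK workflow pairs.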

Curious to see what folks think.

~Josh

Re: [DISCUSS] Potential circleci config and workflow changes

Posted by David Capwell <dc...@apple.com>.
> 1. Tune parallelism levels per job (David and Ekaterina have insight on this) 
> Question for David, do you tune only parallelism and use only xlarge? If yes, we need to talk :D 

Yes, and this is 100% because I am lazy.  Too-high parallelism is a problem for circle, as 100% of the requested resources need to be free to start a job; so if you ask for 100 resources and only 99 are free, everyone is blocked until 1 resource frees up for that job… :sadpanda:.

Now, do we need xlarge?  Nope, but I don’t change it as that doesn’t impact me… I am 100% cool with getting rid of LOW/MID/HIGH and tuning our jobs to what is actually needed… I hate that we all do something different (MID, HIGH, and custom HIGH)
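To illustrate the kind of per-job tuning I mean (numbers and names below are made up, not a recommendation), resource_class and parallelism can be set independently on each job in the config; executor/docker image omitted for brevity:

jobs:
  j8_unit_tests:
    resource_class: medium
    parallelism: 25   # sized to the suite rather than one global LOW/MID/HIGH setting
    steps:
      - run: # run this container's slice of the unit tests
  j8_jvm_dtests:
    resource_class: large
    parallelism: 10
    steps:
      - run: # run this container's slice of the jvm dtests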


> On Oct 21, 2022, at 10:39 AM, David Capwell <dc...@apple.com> wrote:
> 
> I am cool with removing circle if apache CI is stable and works, we do need to solve the non-committer issue but would argue that partially exists in circle today (you can be a non-committer with a paid account, but you can’t be a non-committer with a free account)
> 
> 
> 
>> On Oct 20, 2022, at 2:20 PM, Josh McKenzie <jmckenzie@apache.org <ma...@apache.org>> wrote:
>> 
>>> I believe its original intention was to be just about CircleCI.
>> It was but fwiw I'm good w/us exploring adjacent things regarding CI here. I'm planning on deep diving on the thread tomorrow and distilling a snapshot of the work we have a consensus on for circle and summarizing here so we don't lose that. Seems like it's fairly non-controversial.
>> 
>> On Thu, Oct 20, 2022, at 5:14 PM, Mick Semb Wever wrote:
>>> 
>>> 
>>> On Thu, 20 Oct 2022 at 22:07, Derek Chen-Becker <derek@chen-becker.org <ma...@chen-becker.org>> wrote:
>>> Would the preclusion of non-committers also prevent us from configuring Jenkins to auto-test on PR independent of who opens it?
>>> 
>>> One of my current concerns is that we're maintaining 2x the CI for 1x the benefit, and I don't currently see an easy way to unify them (perhaps a lack of imagination?). I know there's a long history behind the choice of CircleCI, so I'm not trying to be hand-wavy about all of the thought that went into that decision, but that decision has costs beyond just a paid CircleCI account. My long term, probably naive, goals for CI would be to:
>>> 
>>> 1. Have a CI system that is *fully* available to *any* contributor, modulo safeguards to prevent abuse
>>> 
>>> 
>>> This thread is going off-topic, as I believe its original intention was to be just about CircleCI.
>>> 
>>> But on your point… our community CI won't be allowed (by ASF), nor have capacity (limited donated resources), to run pre-commit testing by anyone and everyone.
>>> 
>>> Today, trusted contributors can be handed tokens to ci-cassandra.a.o (make sure to label them so they can be revoked easily), but we still face the issue that too many pre-commit runs impacts the throughput and quality of the post-commit runs (though this has improved recently).
>>> 
>>> It's on my wishlist to be able to: with a single command line; spin up the ci-cassandra.a.o stack on any k8s cluster, run any git sha through it and collect results, and tear it down. Variations on this would solve non-committers being able to repeat, use, and work on their own (or a separately donated) CI system, and folk/companies with money to be able to run their own ci-cassandra.a.o stacks for faster pre-commit turnaround time. Having this reproducibility of the CI system would make testing changes to it easier as well, so I'd expect a positive feedback loop here. 
>>> 
>>> I have some rough ideas on how to get started on this, if anyone would like to buddy up on it.
> 


Re: [DISCUSS] Potential circleci config and workflow changes

Posted by Derek Chen-Becker <de...@chen-becker.org>.
I'm writing up a more complete proposal, but here are some examples.
Parameters can be set from either the UI (not my intent) or via the
circleci CLI. Effectively, the config-2_1.yml can have parameters specified
like:

parameters:
  run_extra_test:
    type: boolean
    default: false


jobs:
  extra_test:
    steps:
      - when:
          condition: << pipeline.parameters.run_extra_test >>
          steps:
            - run: # something
            - run: # something

And then jobs can conditionally execute or not. The "when" clause can also
be applied at the workflows level. When you generate the configuration, you
pass a 2nd yaml file (or text) to the circleci CLI and it will override any
parameters with what you provide. For example, if I wanted to disable the
test:

circleci config process --pipeline-parameters "run_extra_test: false" config-2_1.yml

The tradeoff is that the config will be more verbose with conditionals. My
first proposal is actually to just use parameters to get rid of the patch
files, since the current patch files simply change values. In this case,
parameters allow us to use the config yaml like a template, which is less
error prone than patch files. However, things like conditional execution
and matrix jobs might allow us to streamline the config, especially as we
support more JDK versions.
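To make that concrete, here is a rough sketch of both ideas together: a parameter substituted like a template value, and a "when" clause applied at the workflows level (the parameter and job names are placeholders, not our actual config):

parameters:
  resource_class:
    type: string
    default: medium
  run_extra_test:
    type: boolean
    default: false

jobs:
  extra_test:
    resource_class: << pipeline.parameters.resource_class >>
    steps:
      - run: # something

workflows:
  extra:
    when: << pipeline.parameters.run_extra_test >>
    jobs:
      - extra_test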

Cheers,

Derek

On Tue, Oct 25, 2022 at 10:40 AM David Capwell <dc...@apple.com> wrote:

> This could also be a pipeline parameter instead of hacking it in
> generate.sh
>
>
> Curious how this works… I run a script that deletes all the approvals and
> removes the testing workflows… I really don’t want to use the UI at all….
> I assumed pipeline params are a UI thing, but I think the goal here for
> many of us is to ignore the UI other than looking at the results… and even
> that can be scripted...
>
>
> On Oct 24, 2022, at 4:44 PM, Derek Chen-Becker <de...@chen-becker.org>
> wrote:
>
> This could also be a pipeline parameter instead of hacking it in
> generate.sh. I promise I'll have a proposal before the end of the week.
>
> Derek
>
> On Mon, Oct 24, 2022 at 2:13 PM Josh McKenzie <jm...@apache.org>
> wrote:
>
>> @Ekaterina: I recall us going back and forth on whether default should be
>> require approval or not and there not being a consensus. I'm fine not
>> changing the status quo and just parameterizing that in generate.sh so
>> folks can locally script how they want to setup when they alias up
>> generate.sh.
>>
>> I'll add C-17113 to the epic as well and any other tickets anyone has in
>> flight we can link up.
>>
>> Maybe we should remove them from the workflow when the free option is used
>>
>> That'd put us in the position of having a "smoke testing suite" for free
>> tier users and the expectation of a committer running the full suite
>> pre-merge. Which, now that I type it out, is a lot more representative of
>> our current reality so we should probably do that.
>>
>> Noted re: the -f flag; I could have checked that but just hacked that out
>> in the email spur of the moment. We could just default to low / free /
>> smoke test and have -p for paid tier.
>>
>>
>> On Mon, Oct 24, 2022, at 3:23 PM, Andrés de la Peña wrote:
>>
>> - Ticket for: remove -h, have -f and -p (free and paid)
>>
>>
>> +1 to this, probably there isn't anyone using -h. There are some jobs
>> that can't pass with the free option. Maybe we should remove them from the
>> workflow when the free option is used. Perhaps that could save new
>> contributors some confusion. Or should we leave them because a subset of
>> the tests inside those jobs can still pass even with the free tier?
>>
>> By the way, the generate.sh script already accepts a -f flag. It's used
>> to stop checking that the specified environment variables are known. It was
>> meant to be a kind of general "--force" flag.
>>
>> On Mon, 24 Oct 2022 at 20:07, Ekaterina Dimitrova <e....@gmail.com>
>> wrote:
>>
>> Seems like my email crossed with Andres’ one.
>> My understanding is we will use the ticket CASSANDRA-17113 as
>> placeholder, the work there will be rebased/reworked etc depending on what
>> we agree with.
>> I also agree with the other points he made. Sounds reasonable to me
>>
>> On Mon, 24 Oct 2022 at 15:03, Ekaterina Dimitrova <e....@gmail.com>
>> wrote:
>>
>> Thank you Josh
>>
>> So about push with/without a single click, I guess you mean to
>> parameterize whether the build step needs approval or not? Pre-commit, the
>> new flag will use the “no-approval” version, but during development we
>> still will be able to push the tests without immediately starting all
>> tests, right?
>> - parallelism + -h being removed - just to confirm, that means we will
>> not use xlarge containers. As David confirmed, this is not needed for all
>> jobs, and it is important as otherwise whoever uses a paid account will burn
>> their credits faster for very similar duration runs.
>>
>> CASSANDRA-17930 - I will use the opportunity also to mention that many of
>> the identified missing jobs in CircleCI will be soon there - Andres is
>> working on all variations unit tests, I am doing final testing on fixing
>> the Python upgrade tests (we weren’t using the right parameters and running
>> way more jobs than we should) and Derek is looking into the rest of the
>> Python tests. I still need to check whether we need something regarding
>> in-jvm etc, the simulator ones are running only for jdk8 for now,
>> confirmed. All this should unblock us to be able to do next releases based
>> on CircleCI as we agreed. Then we move to do some
>> changes/additions/improvements to Jenkins. And of course, the future
>> improvements we agreed on.
>>
>> On Mon, 24 Oct 2022 at 14:10, Josh McKenzie <jm...@apache.org> wrote:
>>
>>
>> Auto-run on push? Can you elaborate?
>>
>> Yep - instead of having to go to circle and click, when you push your
>> branch the circle hook picks it up and kicks off the top level job
>> automatically. I tend to be paranoid and push a lot of incremental work
>> that's not ready for CI remotely so it's not great for me, but I think
>> having it be optional is the Right Thing.
>>
>> So here's the outstanding work I've distilled from this thread:
>> - Create an epic for circleci improvement work (we have a lot of little
>> augments to do here; keep it organized and try and avoid redundancy)
>> - Include CASSANDRA-17600 in epic umbrella
>> - Include CASSANDRA-17930 in epic umbrella
>> - Ticket to tune parallelism per job
>>     -
>>     > def java_parallelism(src_dir, kind, num_file_in_worker, include = lambda a, b: True):
>>     >     d = os.path.join(src_dir, 'test', kind)
>>     >     num_files = 0
>>     >     for root, dirs, files in os.walk(d):
>>     >         for f in files:
>>     >             if f.endswith('Test.java') and include(os.path.join(root, f), f):
>>     >                 num_files += 1
>>     >     return math.floor(num_files / num_file_in_worker)
>>     >
>>     > def fix_parallelism(args, contents):
>>     >     jobs = contents['jobs']
>>     >
>>     >     unit_parallelism                = java_parallelism(args.src, 'unit', 20)
>>     >     jvm_dtest_parallelism           = java_parallelism(args.src, 'distributed', 4, lambda full, name: 'upgrade' not in full)
>>     >     jvm_dtest_upgrade_parallelism   = java_parallelism(args.src, 'distributed', 2, lambda full, name: 'upgrade' in full)
>>     - `TL;DR - I find all test files we are going to run, and based off a
>> pre-defined variable that says “ideal” number of files per worker, I then
>> calculate how many workers we need.  So unit tests are num_files / 20 ~= 35
>> workers.  Can I be “smarter” by knowing which files have higher cost?
>> Sure… but the “perfect” and the “average” are so similar that it wasn’t
>> worth it...`
>> - Ticket to combine pre-commit jobs into 1 pipeline for all JDK's
>>     - Path to activate all supported JDK's for pre-commit at root
>> (one-click pre-merge full validation)
>>     - Path to activate per JDK below that (interim work partial
>> validation)
>> - Ticket to rename jobs in circleci
>>     - Reference comment:
>> https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17617016&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17617016
>>     - (buildjdk)_(runjdk)_(testsuite) format:
>>     - j8_j8_jvm_dtests
>>     - j8_j11_jvm_dtests
>>     - j11_j11_jvm_dtest_vnode
>>     etc
>> - Ticket for flag in generate.sh to support auto run on push (see
>> response above)
>> - Ticket for: remove -h, have -f and -p (free and paid) (probably
>> intersects with https://issues.apache.org/jira/browse/CASSANDRA-17600)
>>
>> Anything wrong w/the above or anything missed? If not, I'll go do some
>> JIRA'ing.
>>
>>
>> ~Josh
>>
>>
>> On Fri, Oct 21, 2022, at 3:50 PM, Josh McKenzie wrote:
>>
>> I am cool with removing circle if apache CI is stable and works, we do
>> need to solve the non-committer issue but would argue that partially exists
>> in circle today (you can be a non-committer with a paid account, but you
>> can’t be a non-committer with a free account)
>>
>> There's a few threads here:
>> 1. non-committers should be able to run ci
>> 2. People that have resources and want to run ci faster should be able to
>> do so (assuming the ci of record could serve to be faster)
>> 3. ci should be stable
>>
>> Thus far we haven't landed on 1 system that satisfies all 3. There's some
>> background discussions brainstorming how to get there; when / if things
>> come from that they'll as always be brought to the list for discussion.
>>
>> On Fri, Oct 21, 2022, at 1:44 PM, Ekaterina Dimitrova wrote:
>>
>> I agree with David with one caveat - last time I checked only some Python
>> tests lack enough resources with the free tier. The rest run slower than
>> with a paid account, but they do fine. In fact I use the free tier if I
>> want to test only unit or in-jvm tests sometimes. I guess that is what he
>> meant by partially but even being able to run the non-Python tests is a win
>> IMHO. If we find a solution for all tests though… even better.
>> @Derek your idea sounds interesting, I will be happy to see a proposal.
>> Thank you
>>
>> On Fri, 21 Oct 2022 at 13:39, David Capwell <dc...@apple.com> wrote:
>>
>> I am cool with removing circle if apache CI is stable and works, we do
>> need to solve the non-committer issue but would argue that partially exists
>> in circle today (you can be a non-committer with a paid account, but you
>> can’t be a non-committer with a free account)
>>
>>
>>
>> On Oct 20, 2022, at 2:20 PM, Josh McKenzie <jm...@apache.org> wrote:
>>
>> I believe its original intention was to be just about CircleCI.
>>
>> It was but fwiw I'm good w/us exploring adjacent things regarding CI
>> here. I'm planning on deep diving on the thread tomorrow and distilling a
>> snapshot of the work we have a consensus on for circle and summarizing here
>> so we don't lose that. Seems like it's fairly non-controversial.
>>
>> On Thu, Oct 20, 2022, at 5:14 PM, Mick Semb Wever wrote:
>>
>>
>>
>> On Thu, 20 Oct 2022 at 22:07, Derek Chen-Becker <de...@chen-becker.org>
>> wrote:
>>
>> Would the preclusion of non-committers also prevent us from configuring
>> Jenkins to auto-test on PR independent of who opens it?
>>
>> One of my current concerns is that we're maintaining 2x the CI for 1x the
>> benefit, and I don't currently see an easy way to unify them (perhaps a
>> lack of imagination?). I know there's a long history behind the choice of
>> CircleCI, so I'm not trying to be hand-wavy about all of the thought that
>> went into that decision, but that decision has costs beyond just a paid
>> CircleCI account. My long term, probably naive, goals for CI would be to:
>>
>> 1. Have a CI system that is *fully* available to *any* contributor,
>> modulo safeguards to prevent abuse
>>
>>
>>
>> This thread is going off-topic, as I believe its original intention was to
>> be just about CircleCI.
>>
>> But on your point… our community CI won't be allowed (by ASF), nor have
>> capacity (limited donated resources), to run pre-commit testing by anyone
>> and everyone.
>>
>> Today, trusted contributors can be handed tokens to ci-cassandra.a.o
>> (make sure to label them so they can be revoked easily), but we still face
>> the issue that too many pre-commit runs impacts the throughput and quality
>> of the post-commit runs (though this has improved recently).
>>
>> It's on my wishlist to be able to: with a single command line; spin up
>> the ci-cassandra.a.o stack on any k8s cluster, run any git sha through it
>> and collect results, and tear it down. Variations on this would solve
>> non-committers being able to repeat, use, and work on their own (or a
>> separately donated) CI system, and folk/companies with money to be able to
>> run their own ci-cassandra.a.o stacks for faster pre-commit turnaround
>> time. Having this reproducibility of the CI system would make testing
>> changes to it easier as well, so I'd expect a positive feedback loop here.
>>
>> I have some rough ideas on how to get started on this, if anyone would
>> like to buddy up on it.
>>
>>
>>
>>
>>
>
> --
> +---------------------------------------------------------------+
> | Derek Chen-Becker                                             |
> | GPG Key available at https://keybase.io/dchenbecker and       |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---------------------------------------------------------------+
>
>
>

-- 
+---------------------------------------------------------------+
| Derek Chen-Becker                                             |
| GPG Key available at https://keybase.io/dchenbecker and       |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---------------------------------------------------------------+

Re: [DISCUSS] Potential circleci config and workflow changes

Posted by David Capwell <dc...@apple.com>.
> This could also be a pipeline parameter instead of hacking it in generate.sh


Curious how this works… I run a script that deletes all the approvals and removes the testing workflows… I really don’t want to use the UI at all….  I assumed pipeline params are a UI thing, but I think the goal here for many of us is to ignore the UI other than looking at the results… and even that can be scripted...


> On Oct 24, 2022, at 4:44 PM, Derek Chen-Becker <de...@chen-becker.org> wrote:
> 
> This could also be a pipeline parameter instead of hacking it in generate.sh. I promise I'll have a proposal before the end of the week.
> 
> Derek
> 
> On Mon, Oct 24, 2022 at 2:13 PM Josh McKenzie <jmckenzie@apache.org <ma...@apache.org>> wrote:
> @Ekaterina: I recall us going back and forth on whether default should be require approval or not and there not being a consensus. I'm fine not changing the status quo and just parameterizing that in generate.sh so folks can locally script how they want to setup when they alias up generate.sh.
> 
> I'll add C-17113 to the epic as well and any other tickets anyone has in flight we can link up.
> 
>> Maybe we should remove them from the workflow when the free option is used
> That'd put us in the position of having a "smoke testing suite" for free tier users and the expectation of a committer running the full suite pre-merge. Which, now that I type it out, is a lot more representative of our current reality so we should probably do that.
> 
> Noted re: the -f flag; I could have checked that but just hacked that out in the email spur of the moment. We could just default to low / free / smoke test and have -p for paid tier.
> 
> 
> On Mon, Oct 24, 2022, at 3:23 PM, Andrés de la Peña wrote:
>> - Ticket for: remove -h, have -f and -p (free and paid)
>> 
>> +1 to this, probably there isn't anyone using -h. There are some jobs that can't pass with the free option. Maybe we should remove them from the workflow when the free option is used. Perhaps that could save new contributors some confusion. Or should we leave them because a subset of the tests inside those jobs can still pass even with the free tier?
>> 
>> By the way, the generate.sh script already accepts a -f flag. It's used to stop checking that the specified environment variables are known. It was meant to be a kind of general "--force" flag.
>> 
>> On Mon, 24 Oct 2022 at 20:07, Ekaterina Dimitrova <e.dimitrova@gmail.com <ma...@gmail.com>> wrote:
>> Seems like my email crossed with Andres’ one.
>> My understanding is we will use the ticket CASSANDRA-17113 as placeholder, the work there will be rebased/reworked etc depending on what we agree with. 
>> I also agree with the other points he made. Sounds reasonable to me
>> 
>> On Mon, 24 Oct 2022 at 15:03, Ekaterina Dimitrova <e.dimitrova@gmail.com <ma...@gmail.com>> wrote:
>> Thank you Josh
>> 
>> So about push with/without a single click, I guess you mean to parameterize whether the build step needs approval or not? Pre-commit, the new flag will use the “no-approval” version, but during development we still will be able to push the tests without immediately starting all tests, right?
>> - parallelism + -h being removed - just to confirm, that means we will not use xlarge containers. As David confirmed, this is not needed for all jobs, and it is important as otherwise whoever uses a paid account will burn their credits faster for very similar duration runs.
>> 
>> CASSANDRA-17930 - I will use the opportunity also to mention that many of the identified missing jobs in CircleCI will be soon there - Andres is working on all variations unit tests, I am doing final testing on fixing the Python upgrade tests (we weren’t using the right parameters and running way more jobs than we should) and Derek is looking into the rest of the Python tests. I still need to check whether we need something regarding in-jvm etc, the simulator ones are running only for jdk8 for now, confirmed. All this should unblock us to be able to do next releases based on CircleCI as we agreed. Then we move to do some changes/additions/improvements to Jenkins. And of course, the future improvements we agreed on.
>> 
>> On Mon, 24 Oct 2022 at 14:10, Josh McKenzie <jmckenzie@apache.org <ma...@apache.org>> wrote:
>> 
>>> Auto-run on push? Can you elaborate?
>> Yep - instead of having to go to circle and click, when you push your branch the circle hook picks it up and kicks off the top level job automatically. I tend to be paranoid and push a lot of incremental work that's not ready for CI remotely so it's not great for me, but I think having it be optional is the Right Thing.
>> 
>> So here's the outstanding work I've distilled from this thread:
>> - Create an epic for circleci improvement work (we have a lot of little augments to do here; keep it organized and try and avoid redundancy)
>> - Include CASSANDRA-17600 in epic umbrella  
>> - Include CASSANDRA-17930 in epic umbrella
>> - Ticket to tune parallelism per job  
>>     -  
>>     > def java_parallelism(src_dir, kind, num_file_in_worker, include = lambda a, b: True):
>>     >     d = os.path.join(src_dir, 'test', kind)
>>     >     num_files = 0
>>     >     for root, dirs, files in os.walk(d):
>>     >         for f in files:
>>     >             if f.endswith('Test.java') and include(os.path.join(root, f), f):
>>     >                 num_files += 1
>>     >     return math.floor(num_files / num_file_in_worker)
>>     > 
>>     > def fix_parallelism(args, contents):
>>     >     jobs = contents['jobs']
>>     > 
>>     >     unit_parallelism                = java_parallelism(args.src, 'unit', 20)
>>     >     jvm_dtest_parallelism           = java_parallelism(args.src, 'distributed', 4, lambda full, name: 'upgrade' not in full)
>>     >     jvm_dtest_upgrade_parallelism   = java_parallelism(args.src, 'distributed', 2, lambda full, name: 'upgrade' in full)
>>     - `TL;DR - I find all test files we are going to run, and based off a pre-defined variable that says “ideal” number of files per worker, I then calculate how many workers we need.  So unit tests are num_files / 20 ~= 35 workers.  Can I be “smarter” by knowing which files have higher cost?  Sure… but the “perfect” and the “average” are so similar that it wasn’t worth it...`  
>> - Ticket to combine pre-commit jobs into 1 pipeline for all JDK's
>>     - Path to activate all supported JDK's for pre-commit at root (one-click pre-merge full validation)
>>     - Path to activate per JDK below that (interim work partial validation)
>> - Ticket to rename jobs in circleci
>>     - Reference comment: https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17617016&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17617016 <https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17617016&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17617016>
>>     - (buildjdk)_(runjdk)_(testsuite) format:
>>     - j8_j8_jvm_dtests
>>     - j8_j11_jvm_dtests
>>     - j11_j11_jvm_dtest_vnode
>>     etc
>> - Ticket for flag in generate.sh to support auto run on push (see response above)
>> - Ticket for: remove -h, have -f and -p (free and paid) (probably intersects with https://issues.apache.org/jira/browse/CASSANDRA-17600 <https://issues.apache.org/jira/browse/CASSANDRA-17600>)
>> 
>> Anything wrong w/the above or anything missed? If not, I'll go do some JIRA'ing.
>> 
>> 
>> ~Josh
>> 
>> 
>> On Fri, Oct 21, 2022, at 3:50 PM, Josh McKenzie wrote:
>>>> I am cool with removing circle if apache CI is stable and works, we do need to solve the non-committer issue but would argue that partially exists in circle today (you can be a non-committer with a paid account, but you can’t be a non-committer with a free account)
>>> There's a few threads here:
>>> 1. non-committers should be able to run ci
>>> 2. People that have resources and want to run ci faster should be able to do so (assuming the ci of record could serve to be faster)
>>> 3. ci should be stable
>>> 
>>> Thus far we haven't landed on 1 system that satisfies all 3. There's some background discussions brainstorming how to get there; when / if things come from that they'll as always be brought to the list for discussion.
>>> 
>>> On Fri, Oct 21, 2022, at 1:44 PM, Ekaterina Dimitrova wrote:
>>>> I agree with David with one caveat - last time I checked only some Python tests lack enough resources with the free tier. The rest run slower than with a paid account, but they do fine. In fact I use the free tier if I want to test only unit or in-jvm tests sometimes. I guess that is what he meant by partially but even being able to run the non-Python tests is a win IMHO. If we find a solution for all tests though… even better.
>>>> @Derek your idea sounds interesting, I will be happy to see a proposal. Thank you
>>>> 
>>>> On Fri, 21 Oct 2022 at 13:39, David Capwell <dcapwell@apple.com <ma...@apple.com>> wrote:
>>>> I am cool with removing circle if apache CI is stable and works, we do need to solve the non-committer issue but would argue that partially exists in circle today (you can be a non-committer with a paid account, but you can’t be a non-committer with a free account)
>>>> 
>>>> 
>>>> 
>>>>> On Oct 20, 2022, at 2:20 PM, Josh McKenzie <jmckenzie@apache.org <ma...@apache.org>> wrote:
>>>>> 
>>>>>> I believe its original intention was to be just about CircleCI.
>>>>> It was but fwiw I'm good w/us exploring adjacent things regarding CI here. I'm planning on deep diving on the thread tomorrow and distilling a snapshot of the work we have a consensus on for circle and summarizing here so we don't lose that. Seems like it's fairly non-controversial.
>>>>> 
>>>>> On Thu, Oct 20, 2022, at 5:14 PM, Mick Semb Wever wrote:
>>>>>> 
>>>>>> 
>>>>>> On Thu, 20 Oct 2022 at 22:07, Derek Chen-Becker <derek@chen-becker.org <ma...@chen-becker.org>> wrote:
>>>>>> Would the preclusion of non-committers also prevent us from configuring Jenkins to auto-test on PR independent of who opens it?
>>>>>> 
>>>>>> One of my current concerns is that we're maintaining 2x the CI for 1x the benefit, and I don't currently see an easy way to unify them (perhaps a lack of imagination?). I know there's a long history behind the choice of CircleCI, so I'm not trying to be hand-wavy about all of the thought that went into that decision, but that decision has costs beyond just a paid CircleCI account. My long term, probably naive, goals for CI would be to:
>>>>>> 
>>>>>> 1. Have a CI system that is *fully* available to *any* contributor, modulo safeguards to prevent abuse
>>>>>> 
>>>>>> 
>>>>>> This thread is going off-topic, as I believe its original intention was to be just about CircleCI.
>>>>>> 
>>>>>> But on your point… our community CI won't be allowed (by ASF), nor have capacity (limited donated resources), to run pre-commit testing by anyone and everyone.
>>>>>> 
>>>>>> Today, trusted contributors can be handed tokens to ci-cassandra.a.o (make sure to label them so they can be revoked easily), but we still face the issue that too many pre-commit runs impacts the throughput and quality of the post-commit runs (though this has improved recently).
>>>>>> 
>>>>>> It's on my wishlist to be able to: with a single command line; spin up the ci-cassandra.a.o stack on any k8s cluster, run any git sha through it and collect results, and tear it down. Variations on this would solve non-committers being able to repeat, use, and work on their own (or a separately donated) CI system, and folk/companies with money to be able to run their own ci-cassandra.a.o stacks for faster pre-commit turnaround time. Having this reproducibility of the CI system would make testing changes to it easier as well, so I'd expect a positive feedback loop here. 
>>>>>> 
>>>>>> I have some rough ideas on how to get started on this, if anyone would like to buddy up on it.
>>> 
>> 
> 
> 
> 
> -- 
> +---------------------------------------------------------------+
> | Derek Chen-Becker                                             |
> | GPG Key available at https://keybase.io/dchenbecker <https://keybase.io/dchenbecker> and       |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org <https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org> |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---------------------------------------------------------------+
> 


Re: [DISCUSS] Potential circleci config and workflow changes

Posted by Derek Chen-Becker <de...@chen-becker.org>.
This could also be a pipeline parameter instead of hacking it in
generate.sh. I promise I'll have a proposal before the end of the week.

Derek

On Mon, Oct 24, 2022 at 2:13 PM Josh McKenzie <jm...@apache.org> wrote:

> @Ekaterina: I recall us going back and forth on whether default should be
> require approval or not and there not being a consensus. I'm fine not
> changing the status quo and just parameterizing that in generate.sh so
> folks can locally script how they want to setup when they alias up
> generate.sh.
>
> I'll add C-17113 to the epic as well and any other tickets anyone has in
> flight we can link up.
>
> Maybe we should remove them from the workflow when the free option is used
>
> That'd put us in the position of having a "smoke testing suite" for free
> tier users and the expectation of a committer running the full suite
> pre-merge. Which, now that I type it out, is a lot more representative of
> our current reality so we should probably do that.
>
> Noted re: the -f flag; I could have checked that but just hacked that out
> in the email spur of the moment. We could just default to low / free /
> smoke test and have -p for paid tier.
>
>
> On Mon, Oct 24, 2022, at 3:23 PM, Andrés de la Peña wrote:
>
> - Ticket for: remove -h, have -f and -p (free and paid)
>
>
> +1 to this, probably there isn't anyone using -h. There are some jobs that
> can't pass with the free option. Maybe we should remove them from the
> workflow when the free option is used. Perhaps that could save new
> contributors some confusion. Or should we leave them because a subset of
> the tests inside those jobs can still pass even with the free tier?
>
> By the way, the generate.sh script already accepts a -f flag. It's used to
> stop checking that the specified environment variables are known. It was
> meant to be a kind of general "--force" flag.
>
> On Mon, 24 Oct 2022 at 20:07, Ekaterina Dimitrova <e....@gmail.com>
> wrote:
>
> Seems like my email crossed with Andres’ one.
> My understanding is we will use the ticket CASSANDRA-17113 as
> placeholder, the work there will be rebased/reworked etc depending on what
> we agree with.
> I also agree with the other points he made. Sounds reasonable to me
>
> On Mon, 24 Oct 2022 at 15:03, Ekaterina Dimitrova <e....@gmail.com>
> wrote:
>
> Thank you Josh
>
> So about push with/without a single click, I guess you mean to
> parameterize whether the build step needs approval or not? Pre-commit, the
> new flag will use the “no-approval” version, but during development we
> still will be able to push the tests without immediately starting all
> tests, right?
> - parallelism + -h being removed - just to confirm, that means we will not
> use xlarge containers. As David confirmed, this is not needed for all jobs,
> and it is important as otherwise whoever uses a paid account will burn their
> credits faster for very similar duration runs.
>
> CASSANDRA-17930 - I will use the opportunity also to mention that many of
> the identified missing jobs in CircleCI will be soon there - Andres is
> working on all variations unit tests, I am doing final testing on fixing
> the Python upgrade tests (we weren’t using the right parameters and running
> way more jobs than we should) and Derek is looking into the rest of the
> Python tests. I still need to check whether we need something regarding
> in-jvm etc, the simulator ones are running only for jdk8 for now,
> confirmed. All this should unblock us to be able to do next releases based
> on CircleCI as we agreed. Then we move to do some
> changes/additions/improvements to Jenkins. And of course, the future
> improvements we agreed on.
>
> On Mon, 24 Oct 2022 at 14:10, Josh McKenzie <jm...@apache.org> wrote:
>
>
> Auto-run on push? Can you elaborate?
>
> Yep - instead of having to go to circle and click, when you push your
> branch the circle hook picks it up and kicks off the top level job
> automatically. I tend to be paranoid and push a lot of incremental work
> that's not ready for CI remotely so it's not great for me, but I think
> having it be optional is the Right Thing.
>
> So here's the outstanding work I've distilled from this thread:
> - Create an epic for circleci improvement work (we have a lot of little
> augments to do here; keep it organized and try and avoid redundancy)
> - Include CASSANDRA-17600 in epic umbrella
> - Include CASSANDRA-17930 in epic umbrella
> - Ticket to tune parallelism per job
>     -
>     > def java_parallelism(src_dir, kind, num_file_in_worker, include = lambda a, b: True):
>     >     d = os.path.join(src_dir, 'test', kind)
>     >     num_files = 0
>     >     for root, dirs, files in os.walk(d):
>     >         for f in files:
>     >             if f.endswith('Test.java') and include(os.path.join(root, f), f):
>     >                 num_files += 1
>     >     return math.floor(num_files / num_file_in_worker)
>     >
>     > def fix_parallelism(args, contents):
>     >     jobs = contents['jobs']
>     >
>     >     unit_parallelism                = java_parallelism(args.src, 'unit', 20)
>     >     jvm_dtest_parallelism           = java_parallelism(args.src, 'distributed', 4, lambda full, name: 'upgrade' not in full)
>     >     jvm_dtest_upgrade_parallelism   = java_parallelism(args.src, 'distributed', 2, lambda full, name: 'upgrade' in full)
>     - `TL;DR - I find all test files we are going to run, and based off a
> pre-defined variable that says “ideal” number of files per worker, I then
> calculate how many workers we need.  So unit tests are num_files / 20 ~= 35
> workers.  Can I be “smarter” by knowing which files have higher cost?
> Sure… but the “perfect” and the “average” are so similar that it wasn’t
> worth it...`
> - Ticket to combine pre-commit jobs into 1 pipeline for all JDK's
>     - Path to activate all supported JDK's for pre-commit at root
> (one-click pre-merge full validation)
>     - Path to activate per JDK below that (interim work partial validation)
> - Ticket to rename jobs in circleci
>     - Reference comment:
> https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17617016&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17617016
>     - (buildjdk)_(runjdk)_(testsuite) format:
>     - j8_j8_jvm_dtests
>     - j8_j11_jvm_dtests
>     - j11_j11_jvm_dtest_vnode
>     etc
> - Ticket for flag in generate.sh to support auto run on push (see response
> above)
> - Ticket for: remove -h, have -f and -p (free and paid) (probably
> intersects with https://issues.apache.org/jira/browse/CASSANDRA-17600)
>
> Anything wrong w/the above or anything missed? If not, I'll go do some
> JIRA'ing.
>
>
> ~Josh
>
>
> On Fri, Oct 21, 2022, at 3:50 PM, Josh McKenzie wrote:
>
> I am cool with removing circle if apache CI is stable and works, we do
> need to solve the non-committer issue but would argue that partially exists
> in circle today (you can be a non-committer with a paid account, but you
> can’t be a non-committer with a free account)
>
> There's a few threads here:
> 1. non-committers should be able to run ci
> 2. People that have resources and want to run ci faster should be able to
> do so (assuming the ci of record could serve to be faster)
> 3. ci should be stable
>
> Thus far we haven't landed on 1 system that satisfies all 3. There's some
> background discussions brainstorming how to get there; when / if things
> come from that they'll as always be brought to the list for discussion.
>
> On Fri, Oct 21, 2022, at 1:44 PM, Ekaterina Dimitrova wrote:
>
> I agree with David with one caveat - last time I checked only some Python
> tests lack enough resources with the free tier. The rest run slower than
> with a paid account, but they do fine. In fact I use the free tier if I
> want to test only unit or in-jvm tests sometimes. I guess that is what he
> meant by partially but even being able to run the non-Python tests is a win
> IMHO. If we find a solution for all tests though… even better.
> @Derek your idea sounds interesting, I will be happy to see a proposal.
> Thank you
>
> On Fri, 21 Oct 2022 at 13:39, David Capwell <dc...@apple.com> wrote:
>
> I am cool with removing circle if apache CI is stable and works, we do
> need to solve the non-committer issue but would argue that partially exists
> in circle today (you can be a non-committer with a paid account, but you
> can’t be a non-committer with a free account)
>
>
>
> On Oct 20, 2022, at 2:20 PM, Josh McKenzie <jm...@apache.org> wrote:
>
> I believe its original intention was to be just about CircleCI.
>
> It was but fwiw I'm good w/us exploring adjacent things regarding CI here.
> I'm planning on deep diving on the thread tomorrow and distilling a
> snapshot of the work we have a consensus on for circle and summarizing here
> so we don't lose that. Seems like it's fairly non-controversial.
>
> On Thu, Oct 20, 2022, at 5:14 PM, Mick Semb Wever wrote:
>
>
>
> On Thu, 20 Oct 2022 at 22:07, Derek Chen-Becker <de...@chen-becker.org>
> wrote:
>
> Would the preclusion of non-committers also prevent us from configuring
> Jenkins to auto-test on PR independent of who opens it?
>
> One of my current concerns is that we're maintaining 2x the CI for 1x the
> benefit, and I don't currently see an easy way to unify them (perhaps a
> lack of imagination?). I know there's a long history behind the choice of
> CircleCI, so I'm not trying to be hand-wavy about all of the thought that
> went into that decision, but that decision has costs beyond just a paid
> CircleCI account. My long term, probably naive, goals for CI would be to:
>
> 1. Have a CI system that is *fully* available to *any* contributor, modulo
> safeguards to prevent abuse
>
>
>
> This thread is going off-topic, as I believe its original intention was to be
> just about CircleCI.
>
> But on your point… our community CI won't be allowed (by ASF), nor have
> capacity (limited donated resources), to run pre-commit testing by anyone
> and everyone.
>
> Today, trusted contributors can be handed tokens to ci-cassandra.a.o (make
> sure to label them so they can be revoked easily), but we still face the
> issue that too many pre-commit runs impacts the throughput and quality of
> the post-commit runs (though this has improved recently).
>
> It's on my wishlist to be able to: with a single command line; spin up the
> ci-cassandra.a.o stack on any k8s cluster, run any git sha through it and
> collect results, and tear it down. Variations on this would solve
> non-committers being able to repeat, use, and work on their own (or a
> separately donated) CI system, and folk/companies with money to be able to
> run their own ci-cassandra.a.o stacks for faster pre-commit turnaround
> time. Having this reproducibility of the CI system would make testing
> changes to it easier as well, so I'd expect a positive feedback loop here.
>
> I have some rough ideas on how to get started on this, if anyone would
> like to buddy up on it.
>
>
>
>
>

-- 
+---------------------------------------------------------------+
| Derek Chen-Becker                                             |
| GPG Key available at https://keybase.io/dchenbecker and       |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---------------------------------------------------------------+

Re: [DISCUSS] Potential circleci config and workflow changes

Posted by Josh McKenzie <jm...@apache.org>.
@Ekaterina: I recall us going back and forth on whether default should be require approval or not and there not being a consensus. I'm fine not changing the status quo and just parameterizing that in generate.sh so folks can locally script how they want to setup when they alias up generate.sh.
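For reference, the approval gate in CircleCI terms is just a job with type: approval that the other jobs require, so generate.sh could include or drop it based on a flag. A rough, hypothetical sketch (job names are illustrative):

workflows:
  pre-commit:
    jobs:
      - start_tests:
          type: approval
      - j8_unit_tests:
          requires: [start_tests]
      # the auto-run-on-push variant would simply omit start_tests and the
      # requires entries, so tests start as soon as the branch is pushed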

I'll add C-17113 to the epic as well and any other tickets anyone has in flight we can link up.

> Maybe we should remove them from the workflow when the free option is used
That'd put us in the position of having a "smoke testing suite" for free tier users and the expectation of a committer running the full suite pre-merge. Which, now that I type it out, is a lot more representative of our current reality so we should probably do that.

Noted re: the -f flag; I could have checked that but just hacked that out in the email spur of the moment. We could just default to low / free / smoke test and have -p for paid tier.


On Mon, Oct 24, 2022, at 3:23 PM, Andrés de la Peña wrote:
>> - Ticket for: remove -h, have -f and -p (free and paid)
> 
> +1 to this, probably there isn't anyone using -h. There are some jobs that can't pass with the free option. Maybe we should remove them from the workflow when the free option is used. Perhaps that could save new contributors some confusion. Or should we leave them because a subset of the tests inside those jobs can still pass even with the free tier?
> 
> By the way, the generate.sh script already accepts a -f flag. It's used to stop checking that the specified environment variables are known. It was meant to be a kind of general "--force" flag.
> 
> On Mon, 24 Oct 2022 at 20:07, Ekaterina Dimitrova <e....@gmail.com> wrote:
>> Seems like my email crossed with Andres’ one.
>> My understanding is we will use the ticket CASSANDRA-17113 as placeholder, the work there will be rebased/reworked etc depending on what we agree with. 
>> I also agree with the other points he made. Sounds reasonable to me
>> 
>> On Mon, 24 Oct 2022 at 15:03, Ekaterina Dimitrova <e....@gmail.com> wrote:
>>> Thank you Josh
>>> 
>>> So about push with/without a single click, I guess you mean to parameterize whether the build step needs approval or not? Pre-commit, the new flag will use the “no-approval” version, but during development we still will be able to push the tests without immediately starting all tests, right?
>>> - parallelism + -h being removed - just to confirm, that means we will not use xlarge containers. As David confirmed, this is not needed for all jobs, and it is important as otherwise whoever uses a paid account will burn their credits faster for very similar duration runs.
>>> 
>>> CASSANDRA-17930 - I will use the opportunity also to mention that many of the identified missing jobs in CircleCI will be soon there - Andres is working on all variations unit tests, I am doing final testing on fixing the Python upgrade tests (we weren’t using the right parameters and running way more jobs than we should) and Derek is looking into the rest of the Python tests. I still need to check whether we need something regarding in-jvm etc, the simulator ones are running only for jdk8 for now, confirmed. All this should unblock us to be able to do next releases based on CircleCI as we agreed. Then we move to do some changes/additions/improvements to Jenkins. And of course, the future improvements we agreed on.
>>> 
>>> On Mon, 24 Oct 2022 at 14:10, Josh McKenzie <jm...@apache.org> wrote:
>>>>> Auto-run on push? Can you elaborate?
>>>> Yep - instead of having to go to circle and click, when you push your branch the circle hook picks it up and kicks off the top level job automatically. I tend to be paranoid and push a lot of incremental work that's not ready for CI remotely so it's not great for me, but I think having it be optional is the Right Thing.
>>>> 
>>>> So here's the outstanding work I've distilled from this thread:
>>>> - Create an epic for circleci improvement work (we have a lot of little augments to do here; keep it organized and try and avoid redundancy)
>>>> - Include CASSANDRA-17600 in epic umbrella  
>>>> - Include CASSANDRA-17930 in epic umbrella
>>>> - Ticket to tune parallelism per job  
>>>>     -  
>>>>     > def java_parallelism(src_dir, kind, num_file_in_worker, include = lambda a, b: True):
>>>>     >     d = os.path.join(src_dir, 'test', kind)
>>>>     >     num_files = 0
>>>>     >     for root, dirs, files in os.walk(d):
>>>>     >         for f in files:
>>>>     >             if f.endswith('Test.java') and include(os.path.join(root, f), f):
>>>>     >                 num_files += 1
>>>>     >     return math.floor(num_files / num_file_in_worker)
>>>>     > 
>>>>     > def fix_parallelism(args, contents):
>>>>     >     jobs = contents['jobs']
>>>>     > 
>>>>     >     unit_parallelism                = java_parallelism(args.src, 'unit', 20)
>>>>     >     jvm_dtest_parallelism           = java_parallelism(args.src, 'distributed', 4, lambda full, name: 'upgrade' not in full)
>>>>     >     jvm_dtest_upgrade_parallelism   = java_parallelism(args.src, 'distributed', 2, lambda full, name: 'upgrade' in full)
>>>>     - `TL;DR - I find all test files we are going to run, and based off a pre-defined variable that says “ideal” number of files per worker, I then calculate how many workers we need.  So unit tests are num_files / 20 ~= 35 workers.  Can I be “smarter” by knowing which files have higher cost?  Sure… but the “perfect” and the “average” are so similar that it wasn’t worth it...`  
>>>> - Ticket to combine pre-commit jobs into 1 pipeline for all JDK's
>>>>     - Path to activate all supported JDK's for pre-commit at root (one-click pre-merge full validation)
>>>>     - Path to activate per JDK below that (interim work partial validation)
>>>> - Ticket to rename jobs in circleci
>>>>     - Reference comment: https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17617016&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17617016
>>>>     - (buildjdk)_(runjdk)_(testsuite) format:
>>>>     - j8_j8_jvm_dtests
>>>>     - j8_j11_jvm_dtests
>>>>     - j11_j11_jvm_dtest_vnode
>>>>     etc
>>>> - Ticket for flag in generate.sh to support auto run on push (see response above)
>>>> - Ticket for: remove -h, have -f and -p (free and paid) (probably intersects with https://issues.apache.org/jira/browse/CASSANDRA-17600)
>>>> 
>>>> Anything wrong w/the above or anything missed? If not, I'll go do some JIRA'ing.
>>>> 
>>>> 
>>>> ~Josh
>>>> 
>>>> 
>>>> On Fri, Oct 21, 2022, at 3:50 PM, Josh McKenzie wrote:
>>>>>> I am cool with removing circle if apache CI is stable and works, we do need to solve the non-committer issue but would argue that partially exists in circle today (you can be a non-committer with a paid account, but you can’t be a non-committer with a free account)
>>>>> There's a few threads here:
>>>>> 1. non-committers should be able to run ci
>>>>> 2. People that have resources and want to run ci faster should be able to do so (assuming the ci of record could serve to be faster)
>>>>> 3. ci should be stable
>>>>> 
>>>>> Thus far we haven't landed on 1 system that satisfies all 3. There's some background discussions brainstorming how to get there; when / if things come from that they'll as always be brought to the list for discussion.
>>>>> 
>>>>> On Fri, Oct 21, 2022, at 1:44 PM, Ekaterina Dimitrova wrote:
>>>>>> I agree with David with one caveat - last time I checked only some Python tests lack enough resources with the free tier. The rest run slower than with a paid account, but they do fine. In fact I use the free tier if I want to test only unit or in-jvm tests sometimes. I guess that is what he meant by partially but even being able to run the non-Python tests is a win IMHO. If we find a solution for all tests though… even better.
>>>>>> @Derek your idea sounds interesting, I will be happy to see a proposal. Thank you
>>>>>> 
>>>>>> On Fri, 21 Oct 2022 at 13:39, David Capwell <dc...@apple.com> wrote:
>>>>>>> I am cool with removing circle if apache CI is stable and works, we do need to solve the non-committer issue but would argue that partially exists in circle today (you can be a non-committer with a paid account, but you can’t be a non-committer with a free account)
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Oct 20, 2022, at 2:20 PM, Josh McKenzie <jm...@apache.org> wrote:
>>>>>>>> 
>>>>>>>>> I believe its original intention was to be just about CircleCI.
>>>>>>>> It was but fwiw I'm good w/us exploring adjacent things regarding CI here. I'm planning on deep diving on the thread tomorrow and distilling a snapshot of the work we have a consensus on for circle and summarizing here so we don't lose that. Seems like it's fairly non-controversial.
>>>>>>>> 
>>>>>>>> On Thu, Oct 20, 2022, at 5:14 PM, Mick Semb Wever wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Thu, 20 Oct 2022 at 22:07, Derek Chen-Becker <de...@chen-becker.org> wrote:
>>>>>>>>>> Would the preclusion of non-committers also prevent us from configuring Jenkins to auto-test on PR independent of who opens it?
>>>>>>>>>> 
>>>>>>>>>> One of my current concerns is that we're maintaining 2x the CI for 1x the benefit, and I don't currently see an easy way to unify them (perhaps a lack of imagination?). I know there's a long history behind the choice of CircleCI, so I'm not trying to be hand-wavy about all of the thought that went into that decision, but that decision has costs beyond just a paid CircleCI account. My long term, probably naive, goals for CI would be to:
>>>>>>>>>> 
>>>>>>>>>> 1. Have a CI system that is *fully* available to *any* contributor, modulo safeguards to prevent abuse
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> This thread is going off-topic, as I believe its original intention was to be just about CircleCI.
>>>>>>>>> 
>>>>>>>>> But on your point… our community CI won't be allowed (by ASF), nor have capacity (limited donated resources), to run pre-commit testing by anyone and everyone.
>>>>>>>>> 
>>>>>>>>> Today, trusted contributors can be handed tokens to ci-cassandra.a.o (make sure to label them so they can be revoked easily), but we still face the issue that too many pre-commit runs impacts the throughput and quality of the post-commit runs (though this has improved recently).
>>>>>>>>> 
>>>>>>>>> It's on my wishlist to be able to: with a single command line; spin up the ci-cassandra.a.o stack on any k8s cluster, run any git sha through it and collect results, and tear it down. Variations on this would solve non-committers being able to repeat, use, and work on their own (or a separately donated) CI system, and folk/companies with money to be able to run their own ci-cassandra.a.o stacks for faster pre-commit turnaround time. Having this reproducibility of the CI system would make testing changes to it easier as well, so I'd expect a positive feedback loop here. 
>>>>>>>>> 
>>>>>>>>> I have some rough ideas on how to get started on this, if anyone would like to buddy up on it.
>>>>> 
>>>> 

Re: [DISCUSS] Potential circleci config and workflow changes

Posted by Andrés de la Peña <ad...@apache.org>.
>
> - Ticket for: remove -h, have -f and -p (free and paid)


+1 to this, probably there isn't anyone using -h. There are some jobs that
can't pass with the free option. Maybe we should remove them from the
workflow when the free option is used. Perhaps that could save new
contributors some confusion. Or should we leave them because a subset of
the tests inside those jobs can still pass even with the free tier?

By the way, the generate.sh script already accepts a -f flag. It's used to
stop checking that the specified environment variables are known. It was
meant to be a kind of general "--force" flag.
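One possible shape for that, assuming we went with pipeline parameters (the parameter and job names here are hypothetical), would be to put the jobs that can't pass on the free tier into a workflow guarded by a parameter that generate.sh only enables for the paid option:

parameters:
  paid_tier:
    type: boolean
    default: false

workflows:
  resource_heavy_tests:
    when: << pipeline.parameters.paid_tier >>
    jobs:
      - j8_dtests   # example: a job needing more resources than the free tier offers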

On Mon, 24 Oct 2022 at 20:07, Ekaterina Dimitrova <e....@gmail.com>
wrote:

> Seems like my email crossed with Andres’ one.
> My understanding is we will use the ticket CASSANDRA-17113 as
> placeholder, the work there will be rebased/reworked etc depending on what
> we agree with.
> I also agree with the other points he made. Sounds reasonable to me
>
> On Mon, 24 Oct 2022 at 15:03, Ekaterina Dimitrova <e....@gmail.com>
> wrote:
>
>> Thank you Josh
>>
>> So about push with/without a single click, I guess you mean to
>> parameterize whether the build step needs approval or not? Pre-commit, the
>> new flag will use the “no-approval” version, but during development we
>> still will be able to push the tests without immediately starting all
>> tests, right?
>> - parallelism + -h being removed - just to confirm, that means we will
>> not use xlarge containers. As David confirmed, this is not needed for all
>> jobs, and it is important as otherwise whoever uses a paid account will burn
>> their credits faster for very similar duration runs.
>>
>> CASSANDRA-17930 - I will use the opportunity also to mention that many of
>> the identified missing jobs in CircleCI will be soon there - Andres is
>> working on all variations unit tests, I am doing final testing on fixing
>> the Python upgrade tests (we weren’t using the right parameters and running
>> way more jobs then we should) and Derek is looking into the rest of the
>> Python test. I still need to check whether we need something regarding
>> in-jvm etc, the simulator ones are running only for jdk8 for now,
>> confirmed. All this should unblock us to be able to do next releases based
>> on CircleCI as we agreed. Then we move to do some
>> changes/additions/improvements to Jenkins. And of course, the future
>> improvements we agreed on.
>>
>> On Mon, 24 Oct 2022 at 14:10, Josh McKenzie <jm...@apache.org> wrote:
>>
>>> Auto-run on push? Can you elaborate?
>>>
>>> Yep - instead of having to go to circle and click, when you push your
>>> branch the circle hook picks it up and kicks off the top level job
>>> automatically. I tend to be paranoid and push a lot of incremental work
>>> that's not ready for CI remotely so it's not great for me, but I think
>>> having it be optional is the Right Thing.
>>>
>>> So here's the outstanding work I've distilled from this thread:
>>> - Create an epic for circleci improvement work (we have a lot of little
>>> augments to do here; keep it organized and try and avoid redundancy)
>>> - Include CASSANDRA-17600 in epic umbrella
>>> - Include CASSANDRA-17930 in epic umbrella
>>> - Ticket to tune parallelism per job
>>>     -
>>>     > def java_parallelism(src_dir, kind, num_file_in_worker, include =
>>> lambda a, b: True):
>>>     >     d = os.path.join(src_dir, 'test', kind)
>>>     >     num_files = 0
>>>     >     for root, dirs, files in os.walk(d):
>>>     >         for f in files:
>>>     >             if f.endswith('Test.java') and
>>> include(os.path.join(root, f), f):
>>>     >                 num_files += 1
>>>     >     return math.floor(num_files / num_file_in_worker)
>>>     >
>>>     > def fix_parallelism(args, contents):
>>>     >     jobs = contents['jobs']
>>>     >
>>>     >     unit_parallelism                = java_parallelism(args.src,
>>> 'unit', 20)
>>>     >     jvm_dtest_parallelism           = java_parallelism(args.src,
>>> 'distributed', 4, lambda full, name: 'upgrade' not in full)
>>>     >     jvm_dtest_upgrade_parallelism   = java_parallelism(args.src,
>>> 'distributed', 2, lambda full, name: 'upgrade' in full)
>>>     - `TL;DR - I find all test files we are going to run, and based off
>>> a pre-defined variable that says “idea” number of files per worker, I then
>>> calculate how many workers we need.  So unit tests are num_files / 20 ~= 35
>>> workers.  Can I be “smarter” by knowing which files have higher cost?
>>> Sure… but the “perfect” and the “average” are too similar that it wasn’t
>>> worth it...`
>>> - Ticket to combine pre-commit jobs into 1 pipeline for all JDK's
>>>     - Path to activate all supported JDK's for pre-commit at root
>>> (one-click pre-merge full validation)
>>>     - Path to activate per JDK below that (interim work partial
>>> validation)
>>> - Ticket to rename jobs in circleci
>>>     - Reference comment:
>>> https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17617016&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17617016
>>>     - (buildjdk)_(runjdk)_(testsuite) format:
>>>     - j8_j8_jvm_dtests
>>>     - j8_j11_jvm_dtests
>>>     - j11_j11_jvm_dtest_vnode
>>>     etc
>>> - Ticket for flag in generate.sh to support auto run on push (see
>>> response above)
>>> - Ticket for: remove -h, have -f and -p (free and paid) (probably
>>> intersects with https://issues.apache.org/jira/browse/CASSANDRA-17600)
>>>
>>> Anything wrong w/the above or anything missed? If not, I'll go do some
>>> JIRA'ing.
>>>
>>>
>>> ~Josh
>>>
>>>
>>> On Fri, Oct 21, 2022, at 3:50 PM, Josh McKenzie wrote:
>>>
>>> I am cool with removing circle if apache CI is stable and works, we do
>>> need to solve the non-committer issue but would argue that partially exists
>>> in circle today (you can be a non-commuter with a paid account, but you
>>> can’t be a non-committer with a free account)
>>>
>>> There's a few threads here:
>>> 1. non-committers should be able to run ci
>>> 2. People that have resources and want to run ci faster should be able
>>> to do so (assuming the ci of record could serve to be faster)
>>> 3. ci should be stable
>>>
>>> Thus far we haven't landed on 1 system that satisfies all 3. There's
>>> some background discussions brainstorming how to get there; when / if
>>> things come from that they'll as always be brought to the list for
>>> discussion.
>>>
>>> On Fri, Oct 21, 2022, at 1:44 PM, Ekaterina Dimitrova wrote:
>>>
>>> I agree with David with one caveat - last time I checked only some
>>> Python tests lack enough resources with the free tier. The rest run slower
>>> than with a paid account, but they do fine. In fact I use the free tier if
>>> I want to test only unit or in-jvm tests sometimes. I guess that is what he
>>> meant by partially but even being able to run the non-Python tests is a win
>>> IMHO. If we find a solution for all tests though… even better.
>>> @Derek your idea sounds interesting, I will be happy to see a proposal.
>>> Thank you
>>>
>>> On Fri, 21 Oct 2022 at 13:39, David Capwell <dc...@apple.com> wrote:
>>>
>>> I am cool with removing circle if apache CI is stable and works, we do
>>> need to solve the non-committer issue but would argue that partially exists
>>> in circle today (you can be a non-commuter with a paid account, but you
>>> can’t be a non-committer with a free account)
>>>
>>>
>>>
>>> On Oct 20, 2022, at 2:20 PM, Josh McKenzie <jm...@apache.org> wrote:
>>>
>>> I believe it's original intention to be just about CircleCI.
>>>
>>> It was but fwiw I'm good w/us exploring adjacent things regarding CI
>>> here. I'm planning on deep diving on the thread tomorrow and distilling a
>>> snapshot of the work we have a consensus on for circle and summarizing here
>>> so we don't lose that. Seems like it's fairly non-controversial.
>>>
>>> On Thu, Oct 20, 2022, at 5:14 PM, Mick Semb Wever wrote:
>>>
>>>
>>>
>>> On Thu, 20 Oct 2022 at 22:07, Derek Chen-Becker <de...@chen-becker.org>
>>> wrote:
>>>
>>> Would the preclusion of non-committers also prevent us from configuring
>>> Jenkins to auto-test on PR independent of who opens it?
>>>
>>> One of my current concerns is that we're maintaining 2x the CI for 1x
>>> the benefit, and I don't currently see an easy way to unify them (perhaps a
>>> lack of imagination?). I know there's a long history behind the choice of
>>> CircleCI, so I'm not trying to be hand-wavy about all of the thought that
>>> went into that decision, but that decision has costs beyond just a paid
>>> CircleCI account. My long term, probably naive, goals for CI would be to:
>>>
>>> 1. Have a CI system that is *fully* available to *any* contributor,
>>> modulo safeguards to prevent abuse
>>>
>>>
>>>
>>> This thread is going off-topic, as I believe it's original intention to
>>> be just about CircleCI.
>>>
>>> But on your point… our community CI won't be allowed (by ASF), nor have
>>> capacity (limited donated resources), to run pre-commit testing by anyone
>>> and everyone.
>>>
>>> Today, trusted contributors can be handed tokens to ci-cassandra.a.o
>>> (make sure to label them so they can be revoked easily), but we still face
>>> the issue that too many pre-commit runs impacts the throughput and quality
>>> of the post-commit runs (though this has improved recently).
>>>
>>> It's on my wishlist to be able to: with a single command line; spin up
>>> the ci-cassandra.a.o stack on any k8s cluster, run any git sha through it
>>> and collect results, and tear it down. Variations on this would solve
>>> non-committers being able to repeat, use, and work on their own (or a
>>> separately donated) CI system, and folk/companies with money to be able to
>>> run their own ci-cassandra.a.o stacks for faster pre-commit turnaround
>>> time. Having this reproducibility of the CI system would make testing
>>> changes to it easier as well, so I'd expect a positive feedback loop here.
>>>
>>> I have some rough ideas on how to get started on this, if anyone would
>>> like to buddy up on it.
>>>
>>>
>>>
>>>

Re: [DISCUSS] Potential circleci config and workflow changes

Posted by Ekaterina Dimitrova <e....@gmail.com>.
Seems like my email crossed with Andres' one.
My understanding is we will use the ticket CASSANDRA-17113 as a placeholder;
the work there will be rebased/reworked etc. depending on what we agree
on.
I also agree with the other points he made. Sounds reasonable to me.

On Mon, 24 Oct 2022 at 15:03, Ekaterina Dimitrova <e....@gmail.com>
wrote:

> Thank you Josh
>
> So about push with/without a single click, I guess you mean to
> parameterize whether the step build needs approval or not? Pre-commit the
> new flag will use the “no-approval” version, but during development we
> still will be able to push the tests without immediately starting all
> tests, right?
> - parallelism + -h being removed - just to confirm, that means we will not
> use xlarge containers. As David confirmed, this is not needed for all jibs
> and it is important as otherwise whoever uses paid account will burn their
> credits time faster for very similar duration runs.
>
> CASSANDRA-17930 - I will use the opportunity also to mention that many of
> the identified missing jobs in CircleCI will be soon there - Andres is
> working on all variations unit tests, I am doing final testing on fixing
> the Python upgrade tests (we weren’t using the right parameters and running
> way more jobs then we should) and Derek is looking into the rest of the
> Python test. I still need to check whether we need something regarding
> in-jvm etc, the simulator ones are running only for jdk8 for now,
> confirmed. All this should unblock us to be able to do next releases based
> on CircleCI as we agreed. Then we move to do some
> changes/additions/improvements to Jenkins. And of course, the future
> improvements we agreed on.
>
> On Mon, 24 Oct 2022 at 14:10, Josh McKenzie <jm...@apache.org> wrote:
>
>> Auto-run on push? Can you elaborate?
>>
>> Yep - instead of having to go to circle and click, when you push your
>> branch the circle hook picks it up and kicks off the top level job
>> automatically. I tend to be paranoid and push a lot of incremental work
>> that's not ready for CI remotely so it's not great for me, but I think
>> having it be optional is the Right Thing.
>>
>> So here's the outstanding work I've distilled from this thread:
>> - Create an epic for circleci improvement work (we have a lot of little
>> augments to do here; keep it organized and try and avoid redundancy)
>> - Include CASSANDRA-17600 in epic umbrella
>> - Include CASSANDRA-17930 in epic umbrella
>> - Ticket to tune parallelism per job
>>     -
>>     > def java_parallelism(src_dir, kind, num_file_in_worker, include =
>> lambda a, b: True):
>>     >     d = os.path.join(src_dir, 'test', kind)
>>     >     num_files = 0
>>     >     for root, dirs, files in os.walk(d):
>>     >         for f in files:
>>     >             if f.endswith('Test.java') and
>> include(os.path.join(root, f), f):
>>     >                 num_files += 1
>>     >     return math.floor(num_files / num_file_in_worker)
>>     >
>>     > def fix_parallelism(args, contents):
>>     >     jobs = contents['jobs']
>>     >
>>     >     unit_parallelism                = java_parallelism(args.src,
>> 'unit', 20)
>>     >     jvm_dtest_parallelism           = java_parallelism(args.src,
>> 'distributed', 4, lambda full, name: 'upgrade' not in full)
>>     >     jvm_dtest_upgrade_parallelism   = java_parallelism(args.src,
>> 'distributed', 2, lambda full, name: 'upgrade' in full)
>>     - `TL;DR - I find all test files we are going to run, and based off a
>> pre-defined variable that says “idea” number of files per worker, I then
>> calculate how many workers we need.  So unit tests are num_files / 20 ~= 35
>> workers.  Can I be “smarter” by knowing which files have higher cost?
>> Sure… but the “perfect” and the “average” are too similar that it wasn’t
>> worth it...`
>> - Ticket to combine pre-commit jobs into 1 pipeline for all JDK's
>>     - Path to activate all supported JDK's for pre-commit at root
>> (one-click pre-merge full validation)
>>     - Path to activate per JDK below that (interim work partial
>> validation)
>> - Ticket to rename jobs in circleci
>>     - Reference comment:
>> https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17617016&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17617016
>>     - (buildjdk)_(runjdk)_(testsuite) format:
>>     - j8_j8_jvm_dtests
>>     - j8_j11_jvm_dtests
>>     - j11_j11_jvm_dtest_vnode
>>     etc
>> - Ticket for flag in generate.sh to support auto run on push (see
>> response above)
>> - Ticket for: remove -h, have -f and -p (free and paid) (probably
>> intersects with https://issues.apache.org/jira/browse/CASSANDRA-17600)
>>
>> Anything wrong w/the above or anything missed? If not, I'll go do some
>> JIRA'ing.
>>
>>
>> ~Josh
>>
>>
>> On Fri, Oct 21, 2022, at 3:50 PM, Josh McKenzie wrote:
>>
>> I am cool with removing circle if apache CI is stable and works, we do
>> need to solve the non-committer issue but would argue that partially exists
>> in circle today (you can be a non-commuter with a paid account, but you
>> can’t be a non-committer with a free account)
>>
>> There's a few threads here:
>> 1. non-committers should be able to run ci
>> 2. People that have resources and want to run ci faster should be able to
>> do so (assuming the ci of record could serve to be faster)
>> 3. ci should be stable
>>
>> Thus far we haven't landed on 1 system that satisfies all 3. There's some
>> background discussions brainstorming how to get there; when / if things
>> come from that they'll as always be brought to the list for discussion.
>>
>> On Fri, Oct 21, 2022, at 1:44 PM, Ekaterina Dimitrova wrote:
>>
>> I agree with David with one caveat - last time I checked only some Python
>> tests lack enough resources with the free tier. The rest run slower than
>> with a paid account, but they do fine. In fact I use the free tier if I
>> want to test only unit or in-jvm tests sometimes. I guess that is what he
>> meant by partially but even being able to run the non-Python tests is a win
>> IMHO. If we find a solution for all tests though… even better.
>> @Derek your idea sounds interesting, I will be happy to see a proposal.
>> Thank you
>>
>> On Fri, 21 Oct 2022 at 13:39, David Capwell <dc...@apple.com> wrote:
>>
>> I am cool with removing circle if apache CI is stable and works, we do
>> need to solve the non-committer issue but would argue that partially exists
>> in circle today (you can be a non-commuter with a paid account, but you
>> can’t be a non-committer with a free account)
>>
>>
>>
>> On Oct 20, 2022, at 2:20 PM, Josh McKenzie <jm...@apache.org> wrote:
>>
>> I believe it's original intention to be just about CircleCI.
>>
>> It was but fwiw I'm good w/us exploring adjacent things regarding CI
>> here. I'm planning on deep diving on the thread tomorrow and distilling a
>> snapshot of the work we have a consensus on for circle and summarizing here
>> so we don't lose that. Seems like it's fairly non-controversial.
>>
>> On Thu, Oct 20, 2022, at 5:14 PM, Mick Semb Wever wrote:
>>
>>
>>
>> On Thu, 20 Oct 2022 at 22:07, Derek Chen-Becker <de...@chen-becker.org>
>> wrote:
>>
>> Would the preclusion of non-committers also prevent us from configuring
>> Jenkins to auto-test on PR independent of who opens it?
>>
>> One of my current concerns is that we're maintaining 2x the CI for 1x the
>> benefit, and I don't currently see an easy way to unify them (perhaps a
>> lack of imagination?). I know there's a long history behind the choice of
>> CircleCI, so I'm not trying to be hand-wavy about all of the thought that
>> went into that decision, but that decision has costs beyond just a paid
>> CircleCI account. My long term, probably naive, goals for CI would be to:
>>
>> 1. Have a CI system that is *fully* available to *any* contributor,
>> modulo safeguards to prevent abuse
>>
>>
>>
>> This thread is going off-topic, as I believe it's original intention to
>> be just about CircleCI.
>>
>> But on your point… our community CI won't be allowed (by ASF), nor have
>> capacity (limited donated resources), to run pre-commit testing by anyone
>> and everyone.
>>
>> Today, trusted contributors can be handed tokens to ci-cassandra.a.o
>> (make sure to label them so they can be revoked easily), but we still face
>> the issue that too many pre-commit runs impacts the throughput and quality
>> of the post-commit runs (though this has improved recently).
>>
>> It's on my wishlist to be able to: with a single command line; spin up
>> the ci-cassandra.a.o stack on any k8s cluster, run any git sha through it
>> and collect results, and tear it down. Variations on this would solve
>> non-committers being able to repeat, use, and work on their own (or a
>> separately donated) CI system, and folk/companies with money to be able to
>> run their own ci-cassandra.a.o stacks for faster pre-commit turnaround
>> time. Having this reproducibility of the CI system would make testing
>> changes to it easier as well, so I'd expect a positive feedback loop here.
>>
>> I have some rough ideas on how to get started on this, if anyone would
>> like to buddy up on it.
>>
>>
>>
>>

Re: [DISCUSS] Potential circleci config and workflow changes

Posted by Ekaterina Dimitrova <e....@gmail.com>.
Thank you Josh

So about push with/without a single click, I guess you mean to parameterize
whether the build step needs approval or not? Pre-commit, the new flag will
use the “no-approval” version, but during development we will still be able
to push without immediately starting all tests, right?
- parallelism + -h being removed - just to confirm, that means we will not
use xlarge containers. As David confirmed, this is not needed for all jobs,
and it is important because otherwise whoever uses a paid account will burn
through their credits faster for runs of very similar duration.
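
To put rough, made-up numbers on that last point (the rates and durations
below are illustrative only, not CircleCI's actual pricing):

    # Illustrative arithmetic only -- the rates and durations are invented.
    parallelism = 25
    large_rate, xlarge_rate = 1.0, 2.0            # relative credits per container-minute
    minutes_on_large, minutes_on_xlarge = 30, 27  # "very similar duration" runs

    print(parallelism * minutes_on_large * large_rate)     # 750.0 credit-units
    print(parallelism * minutes_on_xlarge * xlarge_rate)   # 1350.0 credit-units

i.e. nearly double the credits for a few minutes of wall clock saved.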

CASSANDRA-17930 - I will also use the opportunity to mention that many of
the identified missing jobs in CircleCI will soon be there - Andres is
working on all variations of the unit tests, I am doing final testing on
fixing the Python upgrade tests (we weren’t using the right parameters and
were running way more jobs than we should), and Derek is looking into the
rest of the Python tests. I still need to check whether we need something
regarding in-jvm etc; the simulator ones are confirmed to be running only
for jdk8 for now. All this should unblock us to do the next releases based
on CircleCI, as we agreed. Then we move on to making some
changes/additions/improvements to Jenkins. And of course, the future
improvements we agreed on.

On Mon, 24 Oct 2022 at 14:10, Josh McKenzie <jm...@apache.org> wrote:

> Auto-run on push? Can you elaborate?
>
> Yep - instead of having to go to circle and click, when you push your
> branch the circle hook picks it up and kicks off the top level job
> automatically. I tend to be paranoid and push a lot of incremental work
> that's not ready for CI remotely so it's not great for me, but I think
> having it be optional is the Right Thing.
>
> So here's the outstanding work I've distilled from this thread:
> - Create an epic for circleci improvement work (we have a lot of little
> augments to do here; keep it organized and try and avoid redundancy)
> - Include CASSANDRA-17600 in epic umbrella
> - Include CASSANDRA-17930 in epic umbrella
> - Ticket to tune parallelism per job
>     -
>     > def java_parallelism(src_dir, kind, num_file_in_worker, include =
> lambda a, b: True):
>     >     d = os.path.join(src_dir, 'test', kind)
>     >     num_files = 0
>     >     for root, dirs, files in os.walk(d):
>     >         for f in files:
>     >             if f.endswith('Test.java') and
> include(os.path.join(root, f), f):
>     >                 num_files += 1
>     >     return math.floor(num_files / num_file_in_worker)
>     >
>     > def fix_parallelism(args, contents):
>     >     jobs = contents['jobs']
>     >
>     >     unit_parallelism                = java_parallelism(args.src,
> 'unit', 20)
>     >     jvm_dtest_parallelism           = java_parallelism(args.src,
> 'distributed', 4, lambda full, name: 'upgrade' not in full)
>     >     jvm_dtest_upgrade_parallelism   = java_parallelism(args.src,
> 'distributed', 2, lambda full, name: 'upgrade' in full)
>     - `TL;DR - I find all test files we are going to run, and based off a
> pre-defined variable that says “idea” number of files per worker, I then
> calculate how many workers we need.  So unit tests are num_files / 20 ~= 35
> workers.  Can I be “smarter” by knowing which files have higher cost?
> Sure… but the “perfect” and the “average” are too similar that it wasn’t
> worth it...`
> - Ticket to combine pre-commit jobs into 1 pipeline for all JDK's
>     - Path to activate all supported JDK's for pre-commit at root
> (one-click pre-merge full validation)
>     - Path to activate per JDK below that (interim work partial validation)
> - Ticket to rename jobs in circleci
>     - Reference comment:
> https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17617016&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17617016
>     - (buildjdk)_(runjdk)_(testsuite) format:
>     - j8_j8_jvm_dtests
>     - j8_j11_jvm_dtests
>     - j11_j11_jvm_dtest_vnode
>     etc
> - Ticket for flag in generate.sh to support auto run on push (see response
> above)
> - Ticket for: remove -h, have -f and -p (free and paid) (probably
> intersects with https://issues.apache.org/jira/browse/CASSANDRA-17600)
>
> Anything wrong w/the above or anything missed? If not, I'll go do some
> JIRA'ing.
>
>
> ~Josh
>
>
> On Fri, Oct 21, 2022, at 3:50 PM, Josh McKenzie wrote:
>
> I am cool with removing circle if apache CI is stable and works, we do
> need to solve the non-committer issue but would argue that partially exists
> in circle today (you can be a non-commuter with a paid account, but you
> can’t be a non-committer with a free account)
>
> There's a few threads here:
> 1. non-committers should be able to run ci
> 2. People that have resources and want to run ci faster should be able to
> do so (assuming the ci of record could serve to be faster)
> 3. ci should be stable
>
> Thus far we haven't landed on 1 system that satisfies all 3. There's some
> background discussions brainstorming how to get there; when / if things
> come from that they'll as always be brought to the list for discussion.
>
> On Fri, Oct 21, 2022, at 1:44 PM, Ekaterina Dimitrova wrote:
>
> I agree with David with one caveat - last time I checked only some Python
> tests lack enough resources with the free tier. The rest run slower than
> with a paid account, but they do fine. In fact I use the free tier if I
> want to test only unit or in-jvm tests sometimes. I guess that is what he
> meant by partially but even being able to run the non-Python tests is a win
> IMHO. If we find a solution for all tests though… even better.
> @Derek your idea sounds interesting, I will be happy to see a proposal.
> Thank you
>
> On Fri, 21 Oct 2022 at 13:39, David Capwell <dc...@apple.com> wrote:
>
> I am cool with removing circle if apache CI is stable and works, we do
> need to solve the non-committer issue but would argue that partially exists
> in circle today (you can be a non-commuter with a paid account, but you
> can’t be a non-committer with a free account)
>
>
>
> On Oct 20, 2022, at 2:20 PM, Josh McKenzie <jm...@apache.org> wrote:
>
> I believe it's original intention to be just about CircleCI.
>
> It was but fwiw I'm good w/us exploring adjacent things regarding CI here.
> I'm planning on deep diving on the thread tomorrow and distilling a
> snapshot of the work we have a consensus on for circle and summarizing here
> so we don't lose that. Seems like it's fairly non-controversial.
>
> On Thu, Oct 20, 2022, at 5:14 PM, Mick Semb Wever wrote:
>
>
>
> On Thu, 20 Oct 2022 at 22:07, Derek Chen-Becker <de...@chen-becker.org>
> wrote:
>
> Would the preclusion of non-committers also prevent us from configuring
> Jenkins to auto-test on PR independent of who opens it?
>
> One of my current concerns is that we're maintaining 2x the CI for 1x the
> benefit, and I don't currently see an easy way to unify them (perhaps a
> lack of imagination?). I know there's a long history behind the choice of
> CircleCI, so I'm not trying to be hand-wavy about all of the thought that
> went into that decision, but that decision has costs beyond just a paid
> CircleCI account. My long term, probably naive, goals for CI would be to:
>
> 1. Have a CI system that is *fully* available to *any* contributor, modulo
> safeguards to prevent abuse
>
>
>
> This thread is going off-topic, as I believe it's original intention to be
> just about CircleCI.
>
> But on your point… our community CI won't be allowed (by ASF), nor have
> capacity (limited donated resources), to run pre-commit testing by anyone
> and everyone.
>
> Today, trusted contributors can be handed tokens to ci-cassandra.a.o (make
> sure to label them so they can be revoked easily), but we still face the
> issue that too many pre-commit runs impacts the throughput and quality of
> the post-commit runs (though this has improved recently).
>
> It's on my wishlist to be able to: with a single command line; spin up the
> ci-cassandra.a.o stack on any k8s cluster, run any git sha through it and
> collect results, and tear it down. Variations on this would solve
> non-committers being able to repeat, use, and work on their own (or a
> separately donated) CI system, and folk/companies with money to be able to
> run their own ci-cassandra.a.o stacks for faster pre-commit turnaround
> time. Having this reproducibility of the CI system would make testing
> changes to it easier as well, so I'd expect a positive feedback loop here.
>
> I have some rough ideas on how to get started on this, if anyone would
> like to buddy up on it.
>
>
>
>

Re: [DISCUSS] Potential circleci config and workflow changes

Posted by Andrés de la Peña <ad...@apache.org>.
>
> Yep - instead of having to go to circle and click, when you push your
> branch the circle hook picks it up and kicks off the top level job
> automatically. I tend to be paranoid and push a lot of incremental work
> that's not ready for CI remotely so it's not great for me, but I think
> having it be optional is the Right Thing.

> - Ticket for flag in generate.sh to support auto run on push (see response
> above)


CASSANDRA-17113 was created almost a year ago for this. While we can have
flags to specify whether the runs start automatically or not, we'd still
need to have a default. I think the default should be not starting anything
without either manual approval or the usage of those flags when generating
the config, as we decided during CASSANDRA-16882 and the discussions around
it.

> - Ticket to combine pre-commit jobs into 1 pipeline for all JDK's
> - Ticket to rename jobs in circleci


I'd say these two things should be in a single ticket, since the problems
with naming appear when we try to unify the two workflows.


On Mon, 24 Oct 2022 at 19:10, Josh McKenzie <jm...@apache.org> wrote:

> Auto-run on push? Can you elaborate?
>
> Yep - instead of having to go to circle and click, when you push your
> branch the circle hook picks it up and kicks off the top level job
> automatically. I tend to be paranoid and push a lot of incremental work
> that's not ready for CI remotely so it's not great for me, but I think
> having it be optional is the Right Thing.
>
> So here's the outstanding work I've distilled from this thread:
> - Create an epic for circleci improvement work (we have a lot of little
> augments to do here; keep it organized and try and avoid redundancy)
> - Include CASSANDRA-17600 in epic umbrella
> - Include CASSANDRA-17930 in epic umbrella
> - Ticket to tune parallelism per job
>     -
>     > def java_parallelism(src_dir, kind, num_file_in_worker, include =
> lambda a, b: True):
>     >     d = os.path.join(src_dir, 'test', kind)
>     >     num_files = 0
>     >     for root, dirs, files in os.walk(d):
>     >         for f in files:
>     >             if f.endswith('Test.java') and
> include(os.path.join(root, f), f):
>     >                 num_files += 1
>     >     return math.floor(num_files / num_file_in_worker)
>     >
>     > def fix_parallelism(args, contents):
>     >     jobs = contents['jobs']
>     >
>     >     unit_parallelism                = java_parallelism(args.src,
> 'unit', 20)
>     >     jvm_dtest_parallelism           = java_parallelism(args.src,
> 'distributed', 4, lambda full, name: 'upgrade' not in full)
>     >     jvm_dtest_upgrade_parallelism   = java_parallelism(args.src,
> 'distributed', 2, lambda full, name: 'upgrade' in full)
>     - `TL;DR - I find all test files we are going to run, and based off a
> pre-defined variable that says “idea” number of files per worker, I then
> calculate how many workers we need.  So unit tests are num_files / 20 ~= 35
> workers.  Can I be “smarter” by knowing which files have higher cost?
> Sure… but the “perfect” and the “average” are too similar that it wasn’t
> worth it...`
> - Ticket to combine pre-commit jobs into 1 pipeline for all JDK's
>     - Path to activate all supported JDK's for pre-commit at root
> (one-click pre-merge full validation)
>     - Path to activate per JDK below that (interim work partial validation)
> - Ticket to rename jobs in circleci
>     - Reference comment:
> https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17617016&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17617016
>     - (buildjdk)_(runjdk)_(testsuite) format:
>     - j8_j8_jvm_dtests
>     - j8_j11_jvm_dtests
>     - j11_j11_jvm_dtest_vnode
>     etc
> - Ticket for flag in generate.sh to support auto run on push (see response
> above)
> - Ticket for: remove -h, have -f and -p (free and paid) (probably
> intersects with https://issues.apache.org/jira/browse/CASSANDRA-17600)
>
> Anything wrong w/the above or anything missed? If not, I'll go do some
> JIRA'ing.
>
> ~Josh
>
>
> On Fri, Oct 21, 2022, at 3:50 PM, Josh McKenzie wrote:
>
> I am cool with removing circle if apache CI is stable and works, we do
> need to solve the non-committer issue but would argue that partially exists
> in circle today (you can be a non-commuter with a paid account, but you
> can’t be a non-committer with a free account)
>
> There's a few threads here:
> 1. non-committers should be able to run ci
> 2. People that have resources and want to run ci faster should be able to
> do so (assuming the ci of record could serve to be faster)
> 3. ci should be stable
>
> Thus far we haven't landed on 1 system that satisfies all 3. There's some
> background discussions brainstorming how to get there; when / if things
> come from that they'll as always be brought to the list for discussion.
>
> On Fri, Oct 21, 2022, at 1:44 PM, Ekaterina Dimitrova wrote:
>
> I agree with David with one caveat - last time I checked only some Python
> tests lack enough resources with the free tier. The rest run slower than
> with a paid account, but they do fine. In fact I use the free tier if I
> want to test only unit or in-jvm tests sometimes. I guess that is what he
> meant by partially but even being able to run the non-Python tests is a win
> IMHO. If we find a solution for all tests though… even better.
> @Derek your idea sounds interesting, I will be happy to see a proposal.
> Thank you
>
> On Fri, 21 Oct 2022 at 13:39, David Capwell <dc...@apple.com> wrote:
>
> I am cool with removing circle if apache CI is stable and works, we do
> need to solve the non-committer issue but would argue that partially exists
> in circle today (you can be a non-commuter with a paid account, but you
> can’t be a non-committer with a free account)
>
>
>
> On Oct 20, 2022, at 2:20 PM, Josh McKenzie <jm...@apache.org> wrote:
>
> I believe it's original intention to be just about CircleCI.
>
> It was but fwiw I'm good w/us exploring adjacent things regarding CI here.
> I'm planning on deep diving on the thread tomorrow and distilling a
> snapshot of the work we have a consensus on for circle and summarizing here
> so we don't lose that. Seems like it's fairly non-controversial.
>
> On Thu, Oct 20, 2022, at 5:14 PM, Mick Semb Wever wrote:
>
>
>
> On Thu, 20 Oct 2022 at 22:07, Derek Chen-Becker <de...@chen-becker.org>
> wrote:
>
> Would the preclusion of non-committers also prevent us from configuring
> Jenkins to auto-test on PR independent of who opens it?
>
> One of my current concerns is that we're maintaining 2x the CI for 1x the
> benefit, and I don't currently see an easy way to unify them (perhaps a
> lack of imagination?). I know there's a long history behind the choice of
> CircleCI, so I'm not trying to be hand-wavy about all of the thought that
> went into that decision, but that decision has costs beyond just a paid
> CircleCI account. My long term, probably naive, goals for CI would be to:
>
> 1. Have a CI system that is *fully* available to *any* contributor, modulo
> safeguards to prevent abuse
>
>
>
> This thread is going off-topic, as I believe it's original intention to be
> just about CircleCI.
>
> But on your point… our community CI won't be allowed (by ASF), nor have
> capacity (limited donated resources), to run pre-commit testing by anyone
> and everyone.
>
> Today, trusted contributors can be handed tokens to ci-cassandra.a.o (make
> sure to label them so they can be revoked easily), but we still face the
> issue that too many pre-commit runs impacts the throughput and quality of
> the post-commit runs (though this has improved recently).
>
> It's on my wishlist to be able to: with a single command line; spin up the
> ci-cassandra.a.o stack on any k8s cluster, run any git sha through it and
> collect results, and tear it down. Variations on this would solve
> non-committers being able to repeat, use, and work on their own (or a
> separately donated) CI system, and folk/companies with money to be able to
> run their own ci-cassandra.a.o stacks for faster pre-commit turnaround
> time. Having this reproducibility of the CI system would make testing
> changes to it easier as well, so I'd expect a positive feedback loop here.
>
> I have some rough ideas on how to get started on this, if anyone would
> like to buddy up on it.
>
>
>
>

Re: [DISCUSS] Potential circleci config and workflow changes

Posted by Josh McKenzie <jm...@apache.org>.
> Auto-run on push? Can you elaborate?
Yep - instead of having to go to circle and click, when you push your branch the circle hook picks it up and kicks off the top level job automatically. I tend to be paranoid and push a lot of incremental work remotely that's not ready for CI, so it's not great for me, but I think having it be optional is the Right Thing.
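
Mechanically I'd expect that flag to boil down to post-processing the generated config so nothing waits on a manual approval step. Rough sketch only - it assumes the gating jobs are plain CircleCI `type: approval` entries, that config.yml has already been parsed with PyYAML, and the names are illustrative:

    def drop_approval_gates(config):
        """config: the dict from yaml.safe_load of the generated .circleci/config.yml."""
        for workflow in config.get('workflows', {}).values():
            if not isinstance(workflow, dict) or 'jobs' not in workflow:
                continue
            entries = workflow['jobs']
            # approval gates look like {start_something: {'type': 'approval'}}
            gates = {next(iter(e)) for e in entries
                     if isinstance(e, dict)
                     and (next(iter(e.values())) or {}).get('type') == 'approval'}
            kept = []
            for e in entries:
                if isinstance(e, dict):
                    name, attrs = next(iter(e.items()))
                    if name in gates:
                        continue                    # drop the approval job itself
                    reqs = [r for r in attrs.get('requires', []) if r not in gates]
                    if reqs:
                        attrs['requires'] = reqs
                    else:
                        attrs.pop('requires', None) # nothing left to wait on
                kept.append(e)
            workflow['jobs'] = kept
        return config

The free-vs-paid and approval-vs-auto-run knobs could then just be flags that generate.sh passes into that post-processing step.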

So here's the outstanding work I've distilled from this thread:
- Create an epic for circleci improvement work (we have a lot of little improvements to make here; keep it organized and try to avoid redundancy)
- Include CASSANDRA-17600 in epic umbrella  
- Include CASSANDRA-17930 in epic umbrella
- Ticket to tune parallelism per job  
    -  
    > def java_parallelism(src_dir, kind, num_file_in_worker, include = lambda a, b: True):
    >     d = os.path.join(src_dir, 'test', kind)
    >     num_files = 0
    >     for root, dirs, files in os.walk(d):
    >         for f in files:
    >             if f.endswith('Test.java') and include(os.path.join(root, f), f):
    >                 num_files += 1
    >     return math.floor(num_files / num_file_in_worker)
    > 
    > def fix_parallelism(args, contents):
    >     jobs = contents['jobs']
    > 
    >     unit_parallelism                = java_parallelism(args.src, 'unit', 20)
    >     jvm_dtest_parallelism           = java_parallelism(args.src, 'distributed', 4, lambda full, name: 'upgrade' not in full)
    >     jvm_dtest_upgrade_parallelism   = java_parallelism(args.src, 'distributed', 2, lambda full, name: 'upgrade' in full)
    - `TL;DR - I find all test files we are going to run, and based off a pre-defined variable that gives the “ideal” number of files per worker, I then calculate how many workers we need.  So unit tests are num_files / 20 ~= 35 workers.  Can I be “smarter” by knowing which files have higher cost?  Sure… but the “perfect” and the “average” are so similar that it wasn’t worth it...` (a runnable completion of this sketch, with imports, follows the list below)
- Ticket to combine pre-commit jobs into 1 pipeline for all JDK's
    - Path to activate all supported JDK's for pre-commit at root (one-click pre-merge full validation)
    - Path to activate per JDK below that (interim work partial validation)
- Ticket to rename jobs in circleci
    - Reference comment: https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17617016&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17617016
    - (buildjdk)_(runjdk)_(testsuite) format:
    - j8_j8_jvm_dtests
    - j8_j11_jvm_dtests
    - j11_j11_jvm_dtest_vnode
    etc
- Ticket for flag in generate.sh to support auto run on push (see response above)
- Ticket for: remove -h, have -f and -p (free and paid) (probably intersects with https://issues.apache.org/jira/browse/CASSANDRA-17600)
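
(For reference, a self-contained take on David's sketch above - imports added, a floor of one worker, and the computed values written back onto the parsed config. The job names in the mapping are placeholders rather than our exact job names.)

    import math
    import os

    def java_parallelism(src_dir, kind, num_file_in_worker, include=lambda a, b: True):
        d = os.path.join(src_dir, 'test', kind)
        num_files = 0
        for root, dirs, files in os.walk(d):
            for f in files:
                if f.endswith('Test.java') and include(os.path.join(root, f), f):
                    num_files += 1
        return max(1, math.floor(num_files / num_file_in_worker))

    def fix_parallelism(src_dir, contents):
        # contents is the parsed config.yml; set parallelism on the matching jobs
        wanted = {
            'unit_tests':         java_parallelism(src_dir, 'unit', 20),
            'jvm_dtests':         java_parallelism(src_dir, 'distributed', 4,
                                                   lambda full, name: 'upgrade' not in full),
            'jvm_dtests_upgrade': java_parallelism(src_dir, 'distributed', 2,
                                                   lambda full, name: 'upgrade' in full),
        }
        for job_name, parallelism in wanted.items():
            if job_name in contents['jobs']:
                contents['jobs'][job_name]['parallelism'] = parallelism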

Anything wrong w/the above or anything missed? If not, I'll go do some JIRA'ing.

~Josh


On Fri, Oct 21, 2022, at 3:50 PM, Josh McKenzie wrote:
>> I am cool with removing circle if apache CI is stable and works, we do need to solve the non-committer issue but would argue that partially exists in circle today (you can be a non-commuter with a paid account, but you can’t be a non-committer with a free account)
> There's a few threads here:
> 1. non-committers should be able to run ci
> 2. People that have resources and want to run ci faster should be able to do so (assuming the ci of record could serve to be faster)
> 3. ci should be stable
> 
> Thus far we haven't landed on 1 system that satisfies all 3. There's some background discussions brainstorming how to get there; when / if things come from that they'll as always be brought to the list for discussion.
> 
> On Fri, Oct 21, 2022, at 1:44 PM, Ekaterina Dimitrova wrote:
>> I agree with David with one caveat - last time I checked only some Python tests lack enough resources with the free tier. The rest run slower than with a paid account, but they do fine. In fact I use the free tier if I want to test only unit or in-jvm tests sometimes. I guess that is what he meant by partially but even being able to run the non-Python tests is a win IMHO. If we find a solution for all tests though… even better.
>> @Derek your idea sounds interesting, I will be happy to see a proposal. Thank you
>> 
>> On Fri, 21 Oct 2022 at 13:39, David Capwell <dc...@apple.com> wrote:
>>> I am cool with removing circle if apache CI is stable and works, we do need to solve the non-committer issue but would argue that partially exists in circle today (you can be a non-commuter with a paid account, but you can’t be a non-committer with a free account)
>>> 
>>> 
>>> 
>>>> On Oct 20, 2022, at 2:20 PM, Josh McKenzie <jm...@apache.org> wrote:
>>>> 
>>>>> I believe it's original intention to be just about CircleCI.
>>>> It was but fwiw I'm good w/us exploring adjacent things regarding CI here. I'm planning on deep diving on the thread tomorrow and distilling a snapshot of the work we have a consensus on for circle and summarizing here so we don't lose that. Seems like it's fairly non-controversial.
>>>> 
>>>> On Thu, Oct 20, 2022, at 5:14 PM, Mick Semb Wever wrote:
>>>>> 
>>>>> 
>>>>> On Thu, 20 Oct 2022 at 22:07, Derek Chen-Becker <de...@chen-becker.org> wrote:
>>>>>> Would the preclusion of non-committers also prevent us from configuring Jenkins to auto-test on PR independent of who opens it?
>>>>>> 
>>>>>> One of my current concerns is that we're maintaining 2x the CI for 1x the benefit, and I don't currently see an easy way to unify them (perhaps a lack of imagination?). I know there's a long history behind the choice of CircleCI, so I'm not trying to be hand-wavy about all of the thought that went into that decision, but that decision has costs beyond just a paid CircleCI account. My long term, probably naive, goals for CI would be to:
>>>>>> 
>>>>>> 1. Have a CI system that is *fully* available to *any* contributor, modulo safeguards to prevent abuse
>>>>> 
>>>>> 
>>>>> This thread is going off-topic, as I believe it's original intention to be just about CircleCI.
>>>>> 
>>>>> But on your point… our community CI won't be allowed (by ASF), nor have capacity (limited donated resources), to run pre-commit testing by anyone and everyone.
>>>>> 
>>>>> Today, trusted contributors can be handed tokens to ci-cassandra.a.o (make sure to label them so they can be revoked easily), but we still face the issue that too many pre-commit runs impacts the throughput and quality of the post-commit runs (though this has improved recently).
>>>>> 
>>>>> It's on my wishlist to be able to: with a single command line; spin up the ci-cassandra.a.o stack on any k8s cluster, run any git sha through it and collect results, and tear it down. Variations on this would solve non-committers being able to repeat, use, and work on their own (or a separately donated) CI system, and folk/companies with money to be able to run their own ci-cassandra.a.o stacks for faster pre-commit turnaround time. Having this reproducibility of the CI system would make testing changes to it easier as well, so I'd expect a positive feedback loop here. 
>>>>> 
>>>>> I have some rough ideas on how to get started on this, if anyone would like to buddy up on it.
> 

Re: [DISCUSS] Potential circleci config and workflow changes

Posted by Josh McKenzie <jm...@apache.org>.
> I am cool with removing circle if apache CI is stable and works, we do need to solve the non-committer issue but would argue that partially exists in circle today (you can be a non-commuter with a paid account, but you can’t be a non-committer with a free account)
There are a few threads here:
1. non-committers should be able to run ci
2. People that have resources and want to run ci faster should be able to do so (assuming the ci of record could stand to be faster)
3. ci should be stable

Thus far we haven't landed on 1 system that satisfies all 3. There are some background discussions brainstorming how to get there; when/if things come of that, they'll as always be brought to the list for discussion.

On Fri, Oct 21, 2022, at 1:44 PM, Ekaterina Dimitrova wrote:
> I agree with David with one caveat - last time I checked only some Python tests lack enough resources with the free tier. The rest run slower than with a paid account, but they do fine. In fact I use the free tier if I want to test only unit or in-jvm tests sometimes. I guess that is what he meant by partially but even being able to run the non-Python tests is a win IMHO. If we find a solution for all tests though… even better.
> @Derek your idea sounds interesting, I will be happy to see a proposal. Thank you
> 
> On Fri, 21 Oct 2022 at 13:39, David Capwell <dc...@apple.com> wrote:
>> I am cool with removing circle if apache CI is stable and works, we do need to solve the non-committer issue but would argue that partially exists in circle today (you can be a non-commuter with a paid account, but you can’t be a non-committer with a free account)
>> 
>> 
>> 
>>> On Oct 20, 2022, at 2:20 PM, Josh McKenzie <jm...@apache.org> wrote:
>>> 
>>>> I believe it's original intention to be just about CircleCI.
>>> It was but fwiw I'm good w/us exploring adjacent things regarding CI here. I'm planning on deep diving on the thread tomorrow and distilling a snapshot of the work we have a consensus on for circle and summarizing here so we don't lose that. Seems like it's fairly non-controversial.
>>> 
>>> On Thu, Oct 20, 2022, at 5:14 PM, Mick Semb Wever wrote:
>>>> 
>>>> 
>>>> On Thu, 20 Oct 2022 at 22:07, Derek Chen-Becker <de...@chen-becker.org> wrote:
>>>>> Would the preclusion of non-committers also prevent us from configuring Jenkins to auto-test on PR independent of who opens it?
>>>>> 
>>>>> One of my current concerns is that we're maintaining 2x the CI for 1x the benefit, and I don't currently see an easy way to unify them (perhaps a lack of imagination?). I know there's a long history behind the choice of CircleCI, so I'm not trying to be hand-wavy about all of the thought that went into that decision, but that decision has costs beyond just a paid CircleCI account. My long term, probably naive, goals for CI would be to:
>>>>> 
>>>>> 1. Have a CI system that is *fully* available to *any* contributor, modulo safeguards to prevent abuse
>>>> 
>>>> 
>>>> This thread is going off-topic, as I believe it's original intention to be just about CircleCI.
>>>> 
>>>> But on your point… our community CI won't be allowed (by ASF), nor have capacity (limited donated resources), to run pre-commit testing by anyone and everyone.
>>>> 
>>>> Today, trusted contributors can be handed tokens to ci-cassandra.a.o (make sure to label them so they can be revoked easily), but we still face the issue that too many pre-commit runs impacts the throughput and quality of the post-commit runs (though this has improved recently).
>>>> 
>>>> It's on my wishlist to be able to: with a single command line; spin up the ci-cassandra.a.o stack on any k8s cluster, run any git sha through it and collect results, and tear it down. Variations on this would solve non-committers being able to repeat, use, and work on their own (or a separately donated) CI system, and folk/companies with money to be able to run their own ci-cassandra.a.o stacks for faster pre-commit turnaround time. Having this reproducibility of the CI system would make testing changes to it easier as well, so I'd expect a positive feedback loop here. 
>>>> 
>>>> I have some rough ideas on how to get started on this, if anyone would like to buddy up on it.

Re: [DISCUSS] Potential circleci config and workflow changes

Posted by Ekaterina Dimitrova <e....@gmail.com>.
I agree with David, with one caveat - last time I checked, only some Python
tests lack enough resources on the free tier. The rest run slower than
with a paid account, but they do fine. In fact, I sometimes use the free
tier when I only want to run unit or in-jvm tests. I guess that is what he
meant by "partially", but even being able to run the non-Python tests is a
win IMHO. If we find a solution for all tests though… even better.
@Derek your idea sounds interesting; I will be happy to see a proposal.
Thank you

On Fri, 21 Oct 2022 at 13:39, David Capwell <dc...@apple.com> wrote:

> I am cool with removing circle if apache CI is stable and works, we do
> need to solve the non-committer issue but would argue that partially exists
> in circle today (you can be a non-commuter with a paid account, but you
> can’t be a non-committer with a free account)
>
>
>
> On Oct 20, 2022, at 2:20 PM, Josh McKenzie <jm...@apache.org> wrote:
>
> I believe it's original intention to be just about CircleCI.
>
> It was but fwiw I'm good w/us exploring adjacent things regarding CI here.
> I'm planning on deep diving on the thread tomorrow and distilling a
> snapshot of the work we have a consensus on for circle and summarizing here
> so we don't lose that. Seems like it's fairly non-controversial.
>
> On Thu, Oct 20, 2022, at 5:14 PM, Mick Semb Wever wrote:
>
>
>
> On Thu, 20 Oct 2022 at 22:07, Derek Chen-Becker <de...@chen-becker.org>
> wrote:
>
> Would the preclusion of non-committers also prevent us from configuring
> Jenkins to auto-test on PR independent of who opens it?
>
> One of my current concerns is that we're maintaining 2x the CI for 1x the
> benefit, and I don't currently see an easy way to unify them (perhaps a
> lack of imagination?). I know there's a long history behind the choice of
> CircleCI, so I'm not trying to be hand-wavy about all of the thought that
> went into that decision, but that decision has costs beyond just a paid
> CircleCI account. My long term, probably naive, goals for CI would be to:
>
> 1. Have a CI system that is *fully* available to *any* contributor, modulo
> safeguards to prevent abuse
>
>
>
> This thread is going off-topic, as I believe it's original intention to be
> just about CircleCI.
>
> But on your point… our community CI won't be allowed (by ASF), nor have
> capacity (limited donated resources), to run pre-commit testing by anyone
> and everyone.
>
> Today, trusted contributors can be handed tokens to ci-cassandra.a.o (make
> sure to label them so they can be revoked easily), but we still face the
> issue that too many pre-commit runs impacts the throughput and quality of
> the post-commit runs (though this has improved recently).
>
> It's on my wishlist to be able to: with a single command line; spin up the
> ci-cassandra.a.o stack on any k8s cluster, run any git sha through it and
> collect results, and tear it down. Variations on this would solve
> non-committers being able to repeat, use, and work on their own (or a
> separately donated) CI system, and folk/companies with money to be able to
> run their own ci-cassandra.a.o stacks for faster pre-commit turnaround
> time. Having this reproducibility of the CI system would make testing
> changes to it easier as well, so I'd expect a positive feedback loop here.
>
> I have some rough ideas on how to get started on this, if anyone would
> like to buddy up on it.
>
>
>

Re: [DISCUSS] Potential circleci config and workflow changes

Posted by David Capwell <dc...@apple.com>.
I am cool with removing circle if apache CI is stable and works; we do need to solve the non-committer issue, but I would argue that it partially exists in circle today (you can be a non-committer with a paid account, but you can’t be a non-committer with a free account)



> On Oct 20, 2022, at 2:20 PM, Josh McKenzie <jmckenzie@apache.org <ma...@apache.org>> wrote:
> 
>> I believe it's original intention to be just about CircleCI.
> It was but fwiw I'm good w/us exploring adjacent things regarding CI here. I'm planning on deep diving on the thread tomorrow and distilling a snapshot of the work we have a consensus on for circle and summarizing here so we don't lose that. Seems like it's fairly non-controversial.
> 
> On Thu, Oct 20, 2022, at 5:14 PM, Mick Semb Wever wrote:
>> 
>> 
>> On Thu, 20 Oct 2022 at 22:07, Derek Chen-Becker <derek@chen-becker.org <ma...@chen-becker.org>> wrote:
>> Would the preclusion of non-committers also prevent us from configuring Jenkins to auto-test on PR independent of who opens it?
>> 
>> One of my current concerns is that we're maintaining 2x the CI for 1x the benefit, and I don't currently see an easy way to unify them (perhaps a lack of imagination?). I know there's a long history behind the choice of CircleCI, so I'm not trying to be hand-wavy about all of the thought that went into that decision, but that decision has costs beyond just a paid CircleCI account. My long term, probably naive, goals for CI would be to:
>> 
>> 1. Have a CI system that is *fully* available to *any* contributor, modulo safeguards to prevent abuse
>> 
>> 
>> This thread is going off-topic, as I believe it's original intention to be just about CircleCI.
>> 
>> But on your point… our community CI won't be allowed (by ASF), nor have capacity (limited donated resources), to run pre-commit testing by anyone and everyone.
>> 
>> Today, trusted contributors can be handed tokens to ci-cassandra.a.o (make sure to label them so they can be revoked easily), but we still face the issue that too many pre-commit runs impacts the throughput and quality of the post-commit runs (though this has improved recently).
>> 
>> It's on my wishlist to be able to: with a single command line; spin up the ci-cassandra.a.o stack on any k8s cluster, run any git sha through it and collect results, and tear it down. Variations on this would solve non-committers being able to repeat, use, and work on their own (or a separately donated) CI system, and folk/companies with money to be able to run their own ci-cassandra.a.o stacks for faster pre-commit turnaround time. Having this reproducibility of the CI system would make testing changes to it easier as well, so I'd expect a positive feedback loop here. 
>> 
>> I have some rough ideas on how to get started on this, if anyone would like to buddy up on it.


Re: [DISCUSS] Potential circleci config and workflow changes

Posted by Derek Chen-Becker <de...@chen-becker.org>.
Random thought (and on-topic, even!) now that I'm starting to understand
CircleCI config better: we should use conditionals and parameters so that
we can have a single, uniform config across version branches, and limit the
diffs across branches to version-related flags that enable or disable sets
of tests. I would even go so far as to say we might be able to put the
config into a submodule while pulling in branch-specific config from the
top-level repo.
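
Purely as a hypothetical sketch of the shape I'm imagining (branch names are
real, the flag values and suite names are made up), the per-branch piece could
shrink to a small flags block that the shared config - or the generator -
consumes:

    # Hypothetical: everything else would live in the shared config/submodule.
    BRANCH_FLAGS = {
        'cassandra-3.11': {'jdks': [8]},
        'cassandra-4.1':  {'jdks': [8, 11]},
        'trunk':          {'jdks': [8, 11]},
    }

    COMMON_SUITES = ['unit', 'jvm_dtest', 'jvm_dtest_vnode']   # placeholder suite names

    def jobs_for(branch):
        jdks = BRANCH_FLAGS[branch]['jdks']
        return [f'j{jdk}_{suite}' for jdk in jdks for suite in COMMON_SUITES]

    print(jobs_for('trunk'))   # j8_unit, j8_jvm_dtest, ..., j11_jvm_dtest_vnode

The same information could equally be carried by CircleCI pipeline parameters
guarded by when: conditionals; the point is that only this small block would
differ between branches.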

Cheers,

Derek

On Thu, Oct 20, 2022, 3:21 PM Josh McKenzie <jm...@apache.org> wrote:

> I believe it's original intention to be just about CircleCI.
>
> It was but fwiw I'm good w/us exploring adjacent things regarding CI here.
> I'm planning on deep diving on the thread tomorrow and distilling a
> snapshot of the work we have a consensus on for circle and summarizing here
> so we don't lose that. Seems like it's fairly non-controversial.
>
> On Thu, Oct 20, 2022, at 5:14 PM, Mick Semb Wever wrote:
>
>
>
> On Thu, 20 Oct 2022 at 22:07, Derek Chen-Becker <de...@chen-becker.org>
> wrote:
>
> Would the preclusion of non-committers also prevent us from configuring
> Jenkins to auto-test on PR independent of who opens it?
>
> One of my current concerns is that we're maintaining 2x the CI for 1x the
> benefit, and I don't currently see an easy way to unify them (perhaps a
> lack of imagination?). I know there's a long history behind the choice of
> CircleCI, so I'm not trying to be hand-wavy about all of the thought that
> went into that decision, but that decision has costs beyond just a paid
> CircleCI account. My long term, probably naive, goals for CI would be to:
>
> 1. Have a CI system that is *fully* available to *any* contributor, modulo
> safeguards to prevent abuse
>
>
>
> This thread is going off-topic, as I believe it's original intention to be
> just about CircleCI.
>
> But on your point… our community CI won't be allowed (by ASF), nor have
> capacity (limited donated resources), to run pre-commit testing by anyone
> and everyone.
>
> Today, trusted contributors can be handed tokens to ci-cassandra.a.o (make
> sure to label them so they can be revoked easily), but we still face the
> issue that too many pre-commit runs impacts the throughput and quality of
> the post-commit runs (though this has improved recently).
>
> It's on my wishlist to be able to: with a single command line; spin up the
> ci-cassandra.a.o stack on any k8s cluster, run any git sha through it and
> collect results, and tear it down. Variations on this would solve
> non-committers being able to repeat, use, and work on their own (or a
> separately donated) CI system, and folk/companies with money to be able to
> run their own ci-cassandra.a.o stacks for faster pre-commit turnaround
> time. Having this reproducibility of the CI system would make testing
> changes to it easier as well, so I'd expect a positive feedback loop here.
>
> I have some rough ideas on how to get started on this, if anyone would
> like to buddy up on it.
>
>
>

Re: [DISCUSS] Potential circleci config and workflow changes

Posted by Josh McKenzie <jm...@apache.org>.
> I believe it's original intention to be just about CircleCI.
It was but fwiw I'm good w/us exploring adjacent things regarding CI here. I'm planning on deep diving on the thread tomorrow and distilling a snapshot of the work we have a consensus on for circle and summarizing here so we don't lose that. Seems like it's fairly non-controversial.

On Thu, Oct 20, 2022, at 5:14 PM, Mick Semb Wever wrote:
> 
> 
> On Thu, 20 Oct 2022 at 22:07, Derek Chen-Becker <de...@chen-becker.org> wrote:
>> Would the preclusion of non-committers also prevent us from configuring Jenkins to auto-test on PR independent of who opens it?
>> 
>> One of my current concerns is that we're maintaining 2x the CI for 1x the benefit, and I don't currently see an easy way to unify them (perhaps a lack of imagination?). I know there's a long history behind the choice of CircleCI, so I'm not trying to be hand-wavy about all of the thought that went into that decision, but that decision has costs beyond just a paid CircleCI account. My long term, probably naive, goals for CI would be to:
>> 
>> 1. Have a CI system that is *fully* available to *any* contributor, modulo safeguards to prevent abuse
> 
> 
> This thread is going off-topic, as I believe its original intention to be just about CircleCI.
> 
> But on your point… our community CI won't be allowed (by ASF), nor have capacity (limited donated resources), to run pre-commit testing by anyone and everyone.
> 
> Today, trusted contributors can be handed tokens to ci-cassandra.a.o (make sure to label them so they can be revoked easily), but we still face the issue that too many pre-commit runs impacts the throughput and quality of the post-commit runs (though this has improved recently).
> 
> It's on my wishlist to be able to: with a single command line; spin up the ci-cassandra.a.o stack on any k8s cluster, run any git sha through it and collect results, and tear it down. Variations on this would solve non-committers being able to repeat, use, and work on their own (or a separately donated) CI system, and folk/companies with money to be able to run their own ci-cassandra.a.o stacks for faster pre-commit turnaround time. Having this reproducibility of the CI system would make testing changes to it easier as well, so I'd expect a positive feedback loop here. 
> 
> I have some rough ideas on how to get started on this, if anyone would like to buddy up on it.
> 

Re: [DISCUSS] Potential circleci config and workflow changes

Posted by Mick Semb Wever <mc...@apache.org>.
On Thu, 20 Oct 2022 at 22:07, Derek Chen-Becker <de...@chen-becker.org>
wrote:

> Would the preclusion of non-committers also prevent us from configuring
> Jenkins to auto-test on PR independent of who opens it?
>
> One of my current concerns is that we're maintaining 2x the CI for 1x the
> benefit, and I don't currently see an easy way to unify them (perhaps a
> lack of imagination?). I know there's a long history behind the choice of
> CircleCI, so I'm not trying to be hand-wavy about all of the thought that
> went into that decision, but that decision has costs beyond just a paid
> CircleCI account. My long term, probably naive, goals for CI would be to:
>
> 1. Have a CI system that is *fully* available to *any* contributor, modulo
> safeguards to prevent abuse
>


This thread is going off-topic, as I believe its original intention to be
just about CircleCI.

But on your point… our community CI won't be allowed (by ASF), nor have
capacity (limited donated resources), to run pre-commit testing by anyone
and everyone.

Today, trusted contributors can be handed tokens to ci-cassandra.a.o (make
sure to label them so they can be revoked easily), but we still face the
issue that too many pre-commit runs impact the throughput and quality of
the post-commit runs (though this has improved recently).

It's on my wishlist to be able to, with a single command line, spin up the
ci-cassandra.a.o stack on any k8s cluster, run any git sha through it,
collect results, and tear it down. Variations on this would solve
non-committers being able to repeat, use, and work on their own (or a
separately donated) CI system, and folk/companies with money to be able to
run their own ci-cassandra.a.o stacks for faster pre-commit turnaround
time. Having this reproducibility of the CI system would make testing
changes to it easier as well, so I'd expect a positive feedback loop here.

I have some rough ideas on how to get started on this, if anyone would like
to buddy up on it.

Re: [DISCUSS] Potential circleci config and workflow changes

Posted by Derek Chen-Becker <de...@chen-becker.org>.
Would the preclusion of non-committers also prevent us from configuring
Jenkins to auto-test on PR independent of who opens it?

One of my current concerns is that we're maintaining 2x the CI for 1x the
benefit, and I don't currently see an easy way to unify them (perhaps a
lack of imagination?). I know there's a long history behind the choice of
CircleCI, so I'm not trying to be hand-wavy about all of the thought that
went into that decision, but that decision has costs beyond just a paid
CircleCI account. My long term, probably naive, goals for CI would be to:

1. Have a CI system that is *fully* available to *any* contributor, modulo
safeguards to prevent abuse
2. Have a CI system that is easy to maintain, with clear instructions and
examples for adding a new test suite
3. Have a CI system with reporting that makes it easy to quickly identify
test failures, as well as test *stability* over time

CI isn't glamorous, is often underfunded, and yet is critical to project
velocity and stability. I really appreciate Josh and others driving these
discussions!

Cheers,

Derek

On Thu, Oct 20, 2022 at 1:25 PM Ekaterina Dimitrova <e....@gmail.com>
wrote:

> Sounds like a great plan to me.
> Just wanted to mention one caveat. Non-committers do not have access to
> ASF CI. I do not think this will change. While none of us has ever said no to
> pushing a patch for testing, it is good to have a backup plan so people can
> do it themselves. Currently this is CircleCI. Maybe we can leave it just as
> is at some point and people who want to use it can continue supporting it
> themselves? I don’t have a clear answer now.  I guess time will show
>
> On Thu, 20 Oct 2022 at 14:51, Brandon Williams <dr...@gmail.com> wrote:
>
>> On Thu, Oct 20, 2022 at 1:45 PM Josh McKenzie <jm...@apache.org>
>> wrote:
>> >
>> > My high level hope is that we can:
>> >
>> > 1. Unstick mainline yearly releases (vote to accept circle results,
>> make circle more robust <- WE ARE HERE)
>> > 2. Invest resources into the ASF CI environment to get it to being a
>> viable replacement for circle (requirements for this qualification TBD)
>> > 3. Deprecate circle
>> >
>> > I'm doing my best to help make the above a reality. There's a lot of
>> hand-waving in "ASF CI as viable replacement" but it's 2022 and there's a
>> lot of modern build and ci system's learning our industry has gained in the
>> last decade we're not yet taking advantage of.
>>
>> I'm a strong +1 on this plan and will do everything I can to assist.
>> We should have a discussion on how to get this accomplished and
>> delineate the current problems we are facing.
>>
>

-- 
+---------------------------------------------------------------+
| Derek Chen-Becker                                             |
| GPG Key available at https://keybase.io/dchenbecker and       |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---------------------------------------------------------------+

Re: [DISCUSS] Potential circleci config and workflow changes

Posted by Ekaterina Dimitrova <e....@gmail.com>.
Sounds like a great plan to me.
Just wanted to mention one caveat. Non-committers do not have access to ASF
CI. I do not think this will change. While none of us has ever said no to
pushing a patch for testing, it is good to have a backup plan so people can
do it themselves. Currently this is CircleCI. Maybe we can leave it just as
is at some point and people who want to use it can continue supporting it
themselves? I don’t have a clear answer now. I guess time will tell.

On Thu, 20 Oct 2022 at 14:51, Brandon Williams <dr...@gmail.com> wrote:

> On Thu, Oct 20, 2022 at 1:45 PM Josh McKenzie <jm...@apache.org>
> wrote:
> >
> > My high level hope is that we can:
> >
> > 1. Unstick mainline yearly releases (vote to accept circle results, make
> circle more robust <- WE ARE HERE)
> > 2. Invest resources into the ASF CI environment to get it to being a
> viable replacement for circle (requirements for this qualification TBD)
> > 3. Deprecate circle
> >
> > I'm doing my best to help make the above a reality. There's a lot of
> hand-waving in "ASF CI as viable replacement" but it's 2022 and there's a
> lot of modern build and ci system's learning our industry has gained in the
> last decade we're not yet taking advantage of.
>
> I'm a strong +1 on this plan and will do everything I can to assist.
> We should have a discussion on how to get this accomplished and
> delineate the current problems we are facing.
>

Re: [DISCUSS] Potential circleci config and workflow changes

Posted by Brandon Williams <dr...@gmail.com>.
On Thu, Oct 20, 2022 at 1:45 PM Josh McKenzie <jm...@apache.org> wrote:
>
> My high level hope is that we can:
>
> 1. Unstick mainline yearly releases (vote to accept circle results, make circle more robust <- WE ARE HERE)
> 2. Invest resources into the ASF CI environment to get it to being a viable replacement for circle (requirements for this qualification TBD)
> 3. Deprecate circle
>
> I'm doing my best to help make the above a reality. There's a lot of hand-waving in "ASF CI as viable replacement" but it's 2022, and there's a lot of modern build and CI system learning our industry has gained in the last decade that we're not yet taking advantage of.

I'm a strong +1 on this plan and will do everything I can to assist.
We should have a discussion on how to get this accomplished and
delineate the current problems we are facing.

Re: [DISCUSS] Potential circleci config and workflow changes

Posted by Josh McKenzie <jm...@apache.org>.
My high level hope is that we can:

1. Unstick mainline yearly releases (vote to accept circle results, make circle more robust <- WE ARE HERE)
2. Invest resources into the ASF CI environment to get it to being a viable replacement for circle (requirements for this qualification TBD)
3. Deprecate circle

I'm doing my best to help make the above a reality. There's a lot of hand-waving in "ASF CI as viable replacement" but it's 2022, and there's a lot of modern build and CI system learning our industry has gained in the last decade that we're not yet taking advantage of.

~Josh

On Thu, Oct 20, 2022, at 1:17 PM, Derek Chen-Becker wrote:
> +1 to all of these, especially improving CircleCI generation and ergonomics. I still have a bunch of reservations around CircleCI in general, but in the short term we can make it less painful (to a point).
> 
> Cheers,
> 
> Derek
> 
> On Thu, Oct 20, 2022 at 6:38 AM Ekaterina Dimitrova <e....@gmail.com> wrote:
>> Yes, they do. This is the only test suite that gets max resources with -m. Probably you had some other issue Berenguer as I can confirm I was running them successfully these days
>> 
>> On Thu, 20 Oct 2022 at 6:54, Brandon Williams <dr...@gmail.com> wrote:
>>> They passed with -m for me recently.
>>> 
>>> Kind Regards,
>>> Brandon
>>> 
>>> On Thu, Oct 20, 2022 at 12:03 AM Berenguer Blasi
>>> <be...@gmail.com> wrote:
>>> >
>>> > Can python upgrade tests be run without -h? Last time I tried iirc they fail on -m
>>> >
>>> > On 20/10/22 4:11, Ekaterina Dimitrova wrote:
>>> >
>>> > Thank you Josh. Glad to see that our CI is getting more attention. As no Cassandra feature will be there if we don't do proper testing, right? Important as all the suites and tools we have. With that being said I am glad to see Derek is volunteering to spend more time on this as I believe this is always the main issue - ideas and willingness for improvements are there but people are swamped with other things and we lack manpower for something so important.
>>> > 1. Tune parallelism levels per job (David and Ekaterina have insight on this)
>>> > Question for David, do you tune only parallelism and use only xlarge? If yes, we need to talk :D
>>> > Reading what Stefan shared as experience/feedback, I think we can revise the current config and move to a more reasonable config that can work for most people but there will always be someone who needs something a bit different. With that said maybe we can add to our scripts/menu an option to change from command line through parameters parallelism and/or resources? For those who want further customization? I see this as a separate additional ticket probably. In that case we might probably skip the use of circleci config process for that part of the menu. (but not for adding new jobs and meaningful permanent updates)
>>> > 2. Rename jobs on circle to be more indicative of their function
>>> > +0 I am probably super used to the current names but Derek brought it to my attention that there are names which are confusing for someone new to the cassandra world. With that said I would say we can do this in a separate ticket, mass update.
>>> > 3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see: https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595)
>>> > I am against unifying per JDK workflows but I am all in for unifying the pre-commit/separate workflows and getting back to 2 workflows as suggested by Andres. If we think of how that will look in the UI I think it will be super hard to follow. (the case of having unified both jdks in one workflow)
>>> > 4. Update documentation w/guidance on using circle, .circleci/generate.sh examples, etc 4a. How to commit: https://cassandra.apache.org/_/development/how_to_commit.html 4b. Testing: https://cassandra.apache.org/_/development/testing.html
>>> > I will open a ticket and post the guide I was working on. But it also doesn't make sense to fully update it now if we are going to significantly change the workflow soon. Until then I believe Andres has updated the circleci readme and provided good usage examples.
>>> > 5. Flag on generate.sh to allow auto-run on push
>>> > Auto-run on push? Can you elaborate? Like to start your whole workflow directly without using the UI? There is an approval step in the config file, we can probably add some flags to change pre-commit workflows to start build without approval when we use those mentioned flags. But having by default everything to start on push is an overkill in my opinion. People will be forgetting it and pushing builds for nothing on WIP branches. Talking from experience :D
>>> > 6. Clean up the -l, -m, -h flags (test and indicate -l feasibility for all suites, default to -m, deprecate -h?) <- may not be a code-change issue and instead be a documentation issue
>>> > If we agree that, besides the free tier config file, we want one more reasonable config which doesn't bump resources to the max without a need but provides balanced use of resources - absolutely. -h was kept as there was an understanding that there are people in the community actively using it.
>>> > 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE] temporary circleci config" as the commit message
>>> > +0
>>> > I also wanted to address a few of the points David made.
>>> > "Ekaterina is probably dealing with with her JDK17 work" - if you mean to ensure we have all jobs for all jdks properly, yes. That was my plan. Until Derek was so good at suggesting to work on adding missing jobs in CircleCI now so my work on that will be a bit less for certain things. This is an effort related to the recent changes in our release document. Ticket CASSANDRA-17950 :-) I am helping with mentoring/reviews. Everyone is welcome to join the party.
>>> > "1) resource_class used is not because its needed… in HIGHER file we default to xlarge but only python upgrade tests need that… reported in CASSANDRA-17600" - one of the reasons. we had the MIDRES in the first place as I mentioned in my other email the other day. [1]
>>> >
>>> > "our current patching allows MID/HIGHER to drift as changes need new patches else patching may do the wrong thing… reported in CASSANDRA-17600" - I'd say the patching is annoying sometimes, indeed but with/without the patching any changes to config mean we need to check it by reading through diff and pushing a run to CI before commit. With that said I am all in for automation but this will not change the fact we need to push test runs and verify the changes did not hurt us in a way. Same as testing patches on all branches, running all needed tests and confirming no regressions. Nothing new or changing here IMHO
>>> >
>>> > "CI is a combinatorial problem, we need to run all jobs for all JDKs, vnode on/of, cdc on/off, compression on/of, etc…. But this is currently controlled and fleshed out by humans who want to add new jobs.. we should move away from maintaining .circleci/config-2_1.yml and instead auto-generate it. Simple example of this problem is jdk11 support… we run a subset of tests on jdk11 and say its supported… will jdk17 have the same issue? Will it be even less tests? Why does the burden lie on everyone to “do the right thing” when all they want is a simple job?"
>>> >  Controlled and fleshed by humans it will always be but I agree we need to automate the steps to make it easier for people to add most of the combinations and not to skip any because it is too much work. We will always need a human to decide which jdks, cdc, vnodes, etc. With that said I shared your ticket/patch with Derek as he had similar thoughts, we need to get back to that one at some point. (CASSANDRA-17600) Thanks for working on that!
>>> >
>>> > "why do we require people to install “circleci” command to contribute? If you rename .circleci/config-2_1.yml to .circleci/config.yml then CI will work just fine… we don’t need to call “circleci config process” every time we touch circle config…. Also, seems that w/e someone new to circle config (but not cassandra) touch it they always mutate LOW/MID/HIGH and not .circleci/config-2_1.yml… so I keep going back to fix .circleci/config-2_1.yml…."
>>> > I'd say config-2_1.yml is mainly for those who will make permanent changes to config (like adding/removing jobs). config-2_1.yml is actually created as per the CircleCI automation rules - 1st we add and reuse executors, parameters and commands but I think we can reduce further things if we add even more parameters probably. I have to look more into the current file. I am sure there is room for further improvement. 2nd circleci cli tool can verify the config file for errors and helps with debugging before we push to CircleCI. There is circleci config validate. If we make changes manually we are on our own to verify the long yml and also deal with duplication in config.yml. My concern is that things that need to be almost identical might start to diverge easier. Though I made my suggestion in point 1 for what cases probably we can add menu options that potentially will not require using circleci cli tool. There might be more cases though.
>>> > Currently config-2_1.yml is 2256 lines while config.yml is 5793 lines. I'd say lots of duplication there
>>> >
>>> > [1] https://lists.apache.org/thread/htxoh60zt8zxc4vgxj9zh71trk0zxwhl
>>> >
>>> > On Wed, 19 Oct 2022 at 17:20, David Capwell <dc...@apple.com> wrote:
>>> >>
>>> >> 1. Tune parallelism levels per job (David and Ekaterina have insight on this)
>>> >>
>>> >>
>>> >> +1 to this!  I drastically lower our parallelism as only python-dtest upgrade tests need many resources…
>>> >>
>>> >> What I do for JVM unit/jvm-dtest is the following
>>> >>
>>> >> def java_parallelism(src_dir, kind, num_file_in_worker, include = lambda a, b: True):
>>> >>     d = os.path.join(src_dir, 'test', kind)
>>> >>     num_files = 0
>>> >>     for root, dirs, files in os.walk(d):
>>> >>         for f in files:
>>> >>             if f.endswith('Test.java') and include(os.path.join(root, f), f):
>>> >>                 num_files += 1
>>> >>     return math.floor(num_files / num_file_in_worker)
>>> >>
>>> >> def fix_parallelism(args, contents):
>>> >>     jobs = contents['jobs']
>>> >>
>>> >>     unit_parallelism                = java_parallelism(args.src, 'unit', 20)
>>> >>     jvm_dtest_parallelism           = java_parallelism(args.src, 'distributed', 4, lambda full, name: 'upgrade' not in full)
>>> >>     jvm_dtest_upgrade_parallelism   = java_parallelism(args.src, 'distributed', 2, lambda full, name: 'upgrade' in full)
>>> >>
>>> >> TL;DR - I find all test files we are going to run, and based on a pre-defined variable that gives the “ideal” number of files per worker, I then calculate how many workers we need.  So unit tests are num_files / 20 ~= 35 workers.  Can I be “smarter” by knowing which files have higher cost?  Sure… but the “perfect” and the “average” are so similar that it wasn’t worth it...
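
For reference, a self-contained sketch of the heuristic quoted above might look like the following; the imports and the example invocation are additions, and the checkout path and files-per-worker numbers are only illustrative:

# Sketch: count test files under test/<kind> and derive a CircleCI
# parallelism value from a target number of files per worker.
import math
import os

def java_parallelism(src_dir, kind, num_file_in_worker, include=lambda full, name: True):
    d = os.path.join(src_dir, 'test', kind)
    num_files = 0
    for root, dirs, files in os.walk(d):
        for f in files:
            if f.endswith('Test.java') and include(os.path.join(root, f), f):
                num_files += 1
    return math.floor(num_files / num_file_in_worker)

if __name__ == '__main__':
    src = '.'  # assumes it is run from the root of a cassandra checkout
    print('unit:', java_parallelism(src, 'unit', 20))
    print('jvm-dtest:', java_parallelism(src, 'distributed', 4, lambda full, name: 'upgrade' not in full))
    print('jvm-dtest-upgrade:', java_parallelism(src, 'distributed', 2, lambda full, name: 'upgrade' in full))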
>>> >>
>>> >> 2. Rename jobs on circle to be more indicative of their function
>>> >>
>>> >>
>>> >> Have an example?  I am not against, I just don’t know the problem you are referring to.
>>> >>
>>> >> 3. Unify j8 and j11 workflow pairs into single
>>> >>
>>> >>
>>> >> Fine by me, but we need to keep in mind j17 is coming.  Also, most developmental CI builds don’t really need to run across every JDK, so we need some way to disable different JDKs…
>>> >>
>>> >> When I am testing out a patch I tend to run the following (my script): "circleci-enable.py --no-jdk11”; this will remove the JDK11 builds.  I know I am going to run them pre-merge so I know it’s safe for me.
>>> >>
>>> >> 5. Flag on generate.sh to allow auto-run on push
>>> >>
>>> >>
>>> >> I really hate that we don’t do this by default… I still to this day strongly feel you should opt-out of CI rather than opt-in… seen several commits get merged as they didn’t see an error in circle… because circle didn’t do any work…. Yes, I am fully aware that I am beating a dead horse…
>>> >>
>>> >> TL;DR +1
>>> >>
>>> >> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE] temporary circleci config" as the commit message
>>> >>
>>> >>
>>> >> +0 from me… I have seen people not realize you have to commit after typing “higher” (wrapper around my personal circleci-enable.py script to apply my defaults to the build) but not an issue I have… so I don’t mind if people want the tool to integrate with git…
>>> >>
>>> >>
>>> >> With all that said, I do feel there is more, and something I feel Ekaterina is probably dealing with in her JDK17 work…
>>> >>
>>> >> 1) resource_class used is not because it’s needed… in HIGHER file we default to xlarge but only python upgrade tests need that… reported in CASSANDRA-17600
>>> >> 2) our current patching allows MID/HIGHER to drift as changes need new patches else patching may do the wrong thing… reported in CASSANDRA-17600
>>> >> 3) CI is a combinatorial problem, we need to run all jobs for all JDKs, vnode on/off, cdc on/off, compression on/off, etc…. But this is currently controlled and fleshed out by humans who want to add new jobs…  we should move away from maintaining .circleci/config-2_1.yml and instead auto-generate it.  Simple example of this problem is jdk11 support… we run a subset of tests on jdk11 and say it’s supported… will jdk17 have the same issue?  Will it be even fewer tests?  Why does the burden lie on everyone to “do the right thing” when all they want is a simple job?
>>> >> 4) why do we require people to install “circleci” command to contribute?  If you rename .circleci/config-2_1.yml to .circleci/config.yml then CI will work just fine… we don’t need to call “circleci config process” every time we touch circle config…. Also, it seems that whenever someone new to circle config (but not cassandra) touches it they always mutate LOW/MID/HIGH and not .circleci/config-2_1.yml… so I keep going back to fix .circleci/config-2_1.yml….
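
As a rough illustration of point 3 above, auto-generating the combinatorial job matrix could start from simply enumerating the axes; everything in this sketch (the axes, the job-name format, the print placeholder) is hypothetical and not the project's actual generator:

# Toy sketch: enumerate the CI job matrix instead of hand-maintaining it.
# Axes and naming are illustrative only.
import itertools

AXES = {
    'jdk':         ['j8', 'j11'],
    'vnodes':      ['vnode', 'novnode'],
    'cdc':         ['cdc', 'nocdc'],
    'compression': ['compression', 'nocompression'],
}

def job_matrix(suite):
    for combo in itertools.product(*AXES.values()):
        yield '{}_{}'.format(suite, '_'.join(combo))

if __name__ == '__main__':
    for job in job_matrix('utests'):
        print(job)  # a real generator would emit circleci job/workflow stanzas instead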
>>> >>
>>> >>
>>> >> On Oct 19, 2022, at 1:32 PM, Miklosovic, Stefan <St...@netapp.com> wrote:
>>> >>
>>> >> 1) would be nice to have. The first thing I do is change the parallelism to 20. None of the committed config.yaml files is appropriate for our company CircleCI, so I have to tweak this manually. I think we cannot run more than 25/30 containers in parallel, something like that. HIGHRES has 100 and MIDRES has some jobs with parallelism equal to 50 or so, so that is not good either. I would be happy with a simple way to modify the default config.yaml's parallelism. I use "sed" to change parallelism: 4 to parallelism: 20 and leave parallelism: 1 where it does not make sense to increase it. However I noticed that "4" is not set everywhere, some jobs have it set to "1", so I have to take extra care of these cases (I consider that to be a bug, I think there are two or three, I do not remember). Once set, I have that config in "git stash" so I just apply it every time I need it.
>>> >>
>>> >> 5) would be nice too.
>>> >> 7) is nice but not crucial, it takes no time to commit that.
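
Picking up Stefan's point 1) above about bulk-editing parallelism, a rough Python sketch of the same change might look like this; it assumes PyYAML is installed, and the target value and file path are only illustrative:

# Sketch: raise parallelism for every multi-container job in a generated
# CircleCI config, leaving parallelism: 1 jobs untouched (mirrors the sed
# workflow described above). Assumes PyYAML; path and target are examples.
import yaml

def set_parallelism(path, target=20):
    with open(path) as f:
        config = yaml.safe_load(f)
    for name, job in config.get('jobs', {}).items():
        if job.get('parallelism', 1) > 1:
            job['parallelism'] = target
    with open(path, 'w') as f:
        yaml.safe_dump(config, f, default_flow_style=False)

if __name__ == '__main__':
    set_parallelism('.circleci/config.yml')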
>>> >>
>>> >> ________________________________________
>>> >> From: Josh McKenzie <jm...@apache.org>
>>> >> Sent: Wednesday, October 19, 2022 21:50
>>> >> To: dev
>>> >> Subject: [DISCUSS] Potential circleci config and workflow changes
>>> >>
>>> >> NetApp Security WARNING: This is an external email. Do not click links or open attachments unless you recognize the sender and know the content is safe.
>>> >>
>>> >>
>>> >>
>>> >> While working w/Andres on CASSANDRA-17939 a variety of things came up regarding our circleci config and opportunities to improve it. Figured I'd hit the list up here to see what people's thoughts are since many of us intersect with these systems daily and having your workflow disrupted without having a chance to provide input is bad.
>>> >>
>>> >> The ideas:
>>> >> 1. Tune parallelism levels per job (David and Ekaterina have insight on this)
>>> >> 2. Rename jobs on circle to be more indicative of their function
>>> >> 3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see: https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595)
>>> >> 4. Update documentation w/guidance on using circle, .circleci/generate.sh examples, etc
>>> >> 4a. How to commit: https://cassandra.apache.org/_/development/how_to_commit.html
>>> >> 4b. Testing: https://cassandra.apache.org/_/development/testing.html
>>> >> 5. Flag on generate.sh to allow auto-run on push
>>> >> 6. Clean up the -l, -m, -h flags (test and indicate -l feasibility for all suites, default to -m, deprecate -h?) <- may not be a code-change issue and instead be a documentation issue
>>> >> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE] temporary circleci config" as the commit message
>>> >>
>>> >> Curious to see what folks think.
>>> >>
>>> >> ~Josh
>>> >>
>>> >>
> 
> 
> -- 
> +---------------------------------------------------------------+
> | Derek Chen-Becker                                             |
> | GPG Key available at https://keybase.io/dchenbecker and       |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---------------------------------------------------------------+
> 

Re: [DISCUSS] Potential circleci config and workflow changes

Posted by Derek Chen-Becker <de...@chen-becker.org>.
+1 to all of these, especially improving CircleCI generation and
ergonomics. I still have a bunch of reservations around CircleCI in
general, but in the short term we can make it less painful (to a point).

Cheers,

Derek

On Thu, Oct 20, 2022 at 6:38 AM Ekaterina Dimitrova <e....@gmail.com>
wrote:

> Yes, they do. This is the only test suite that gets max resources with -m.
> Probably you had some other issue Berenguer as I can confirm I was running
> them successfully these days
>
> On Thu, 20 Oct 2022 at 6:54, Brandon Williams <dr...@gmail.com> wrote:
>
>> They passed with -m for me recently.
>>
>> Kind Regards,
>> Brandon
>>
>> On Thu, Oct 20, 2022 at 12:03 AM Berenguer Blasi
>> <be...@gmail.com> wrote:
>> >
>> > Can python upgrade tests be ran without -h? Last time I tried iirc they
>> fail on -m
>> >
>> > On 20/10/22 4:11, Ekaterina Dimitrova wrote:
>> >
>> > Thank you Josh. Glad to see that our CI is getting more attention. As
>> no Cassandra feature will be there if we don't do proper testing, right?
>> Important as all the suites and tools we have. With that being said I am
>> glad to see Derek is volunteering to spend more time on this as I believe
>> this is always the main issue - ideas and willingness for improvements are
>> there but people are swamped with other things and we lack manpower for
>> something so important.
>> > 1. Tune parallelism levels per job (David and Ekaterina have insight on
>> this)
>> > Question for David, do you tune only parallelism and use only xlarge?
>> If yes, we need to talk :D
>> > Reading what Stefan shared as experience/feedback, I think we can
>> revise the current config and move to a more reasonable config that can
>> work for most people but there will always be someone who needs something a
>> bit different. With that said maybe we can add to our scripts/menu an
>> option to change from command line through parameters parallelism and/or
>> resources? For those who want further customization? I see this as a
>> separate additional ticket probably. In that case we might probably skip
>> the use of circleci config process for that part of the menu. (but not for
>> adding new jobs and meaningful permanent updates)
>> > 2. Rename jobs on circle to be more indicative of their function
>> > +0 I am probably super used to the current names but Derek brought it
>> to my attention that there are names which are confusing for someone new to
>> the cassandra world. With that said I would say we can do this in a
>> separate ticket, mass update.
>> > 3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see:
>> https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595
>> )
>> > I am against unifying per JDK workflows but I am all in for unifying
>> the pre-commit/separate workflows and getting back to 2 workflows as
>> suggested by Andres. If we think of how that will look in the UI I think it
>> will be super hard to follow. (the case of having unified both jdks in one
>> workflow)
>> > 4. Update documentation w/guidance on using circle,
>> .circleci/generate.sh examples, etc 4a. How to commit:
>> https://cassandra.apache.org/_/development/how_to_commit.html 4b.
>> Testing: https://cassandra.apache.org/_/development/testing.html
>> > I will open a ticket and post the guide I was working on. But it also
>> doesn't make sense to fully update it now if we are going to significantly
>> change the workflow soon. Until then I believe Andres has updated the
>> circleci readme and provided good usage examples.
>> > 5. Flag on generate.sh to allow auto-run on push
>> > Auto-run on push? Can you elaborate? Like to start your whole workflow
>> directly without using the UI? There is an approval step in the config
>> file, we can probably add some flags to change pre-commit workflows to
>> start build without approval when we use those mentioned flags. But having
>> by default everything to start on push is an overkill in my opinion. People
>> will be forgetting it and pushing builds for nothing on WIP branches.
>> Talking from experience :D
>> > 6. Clean up the -l, -m, -h flags (test and indicate -l feasibility for
>> all suites, default to -m, deprecate -h?) <- may not be a code-change issue
>> and instead be a documentation issue
>> > If we agree except the free tier config file we want one more
>> reasonable config which doesn't bump resources to the max without a need
>> but provides balanced use of resources - absolutely. -h was kept as there
>> was understanding there are people in the community actively using it.
>> > 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE]
>> temporary circleci config" as the commit message
>> > +0
>> > I also wanted to address a few of the points David made.
>> > "Ekaterina is probably dealing with with her JDK17 work" - if you mean
>> to ensure we have all jobs for all jdks properly, yes. That was my plan.
>> Until Derek was so good at suggesting to work on adding missing jobs in
>> CircleCI now so my work on that will be a bit less for certain things. This
>> is an effort related to the recent changes in our release document. Ticket
>> CASSANDRA-17950 :-) I am helping with mentoring/reviews. Everyone is
>> welcome to join the party.
>> > "1) resource_class used is not because its needed… in HIGHER file we
>> default to xlarge but only python upgrade tests need that… reported in
>> CASSANDRA-17600" - one of the reasons. we had the MIDRES in the first place
>> as I mentioned in my other email the other day. [1]
>> >
>> > "our current patching allows MID/HIGHER to drift as changes need new
>> patches else patching may do the wrong thing… reported in CASSANDRA-17600"
>> - I'd say the patching is annoying sometimes, indeed but with/without the
>> patching any changes to config mean we need to check it by reading through
>> diff and pushing a run to CI before commit. With that said I am all in for
>> automation but this will not change the fact we need to push test runs and
>> verify the changes did not hurt us in a way. Same as testing patches on all
>> branches, running all needed tests and confirming no regressions. Nothing
>> new or changing here IMHO
>> >
>> > "CI is a combinatorial problem, we need to run all jobs for all JDKs,
>> vnode on/of, cdc on/off, compression on/of, etc…. But this is currently
>> controlled and fleshed out by humans who want to add new jobs.. we should
>> move away from maintaining .circleci/config-2_1.yml and instead
>> auto-generate it. Simple example of this problem is jdk11 support… we run a
>> subset of tests on jdk11 and say its supported… will jdk17 have the same
>> issue? Will it be even less tests? Why does the burden lie on everyone to
>> “do the right thing” when all they want is a simple job?"
>> >  Controlled and fleshed by humans it will always be but I agree we need
>> to automate the steps to make it easier for people to add most of the
>> combinations and not to skip any because it is too much work. We will
>> always need a human to decide which jdks, cdc, vnodes, etc. With that said
>> I shared your ticket/patch with Derek as he had similar thoughts, we need
>> to get back to that one at some point. (CASSANDRA-17600) Thanks for working
>> on that!
>> >
>> > "why do we require people to install “circleci” command to contribute?
>> If you rename .circleci/config-2_1.yml to .circleci/config.yml then CI will
>> work just fine… we don’t need to call “circleci config process” every time
>> we touch circle config…. Also, seems that w/e someone new to circle config
>> (but not cassandra) touch it they always mutate LOW/MID/HIGH and not
>> .circleci/config-2_1.yml… so I keep going back to fix
>> .circleci/config-2_1.yml…."
>> > I'd say config-2_1.yml is mainly for those who will make permanent
>> changes to config (like adding/removing jobs). config-2_1.yml is actually
>> created as per the CircleCI automation rules - 1st we add and reuse
>> executors, parameters and commands but I think we can reduce further things
>> if we add even more parameters probably. I have to look more into the
>> current file. I am sure there is room for further improvement. 2nd circleci
>> cli tool can verify the config file for errors and helps with debugging
>> before we push to CircleCI. There is circleci config validate. If we make
>> changes manually we are on our own to verify the long yml and also deal
>> with duplication in config.yml. My concern is that things that need to be
>> almost identical might start to diverge easier. Though I made my suggestion
>> in point 1 for what cases probably we can add menu options that potentially
>> will not require using circleci cli tool. There might be more cases though.
>> > Currently config-2_1.yml is 2256 lines while config.yml is 5793 lines.
>> I'd say lots of duplication there
>> >
>> > [1] https://lists.apache.org/thread/htxoh60zt8zxc4vgxj9zh71trk0zxwhl
>> >
>> > On Wed, 19 Oct 2022 at 17:20, David Capwell <dc...@apple.com> wrote:
>> >>
>> >> 1. Tune parallelism levels per job (David and Ekaterina have insight
>> on this)
>> >>
>> >>
>> >> +1 to this!  I drastically lower our parallelism as only python-dtest
>> upgrade tests need many resources…
>> >>
>> >> What I do for JVM unit/jvm-dtest is the following
>> >>
>> >> def java_parallelism(src_dir, kind, num_file_in_worker, include = lambda a, b: True):
>> >>     d = os.path.join(src_dir, 'test', kind)
>> >>     num_files = 0
>> >>     for root, dirs, files in os.walk(d):
>> >>         for f in files:
>> >>             if f.endswith('Test.java') and include(os.path.join(root, f), f):
>> >>                 num_files += 1
>> >>     return math.floor(num_files / num_file_in_worker)
>> >>
>> >> def fix_parallelism(args, contents):
>> >>     jobs = contents['jobs']
>> >>
>> >>     unit_parallelism                = java_parallelism(args.src, 'unit', 20)
>> >>     jvm_dtest_parallelism           = java_parallelism(args.src, 'distributed', 4, lambda full, name: 'upgrade' not in full)
>> >>     jvm_dtest_upgrade_parallelism   = java_parallelism(args.src, 'distributed', 2, lambda full, name: 'upgrade' in full)
>> >>
>> >> TL;DR - I find all test files we are going to run, and based off a
>> pre-defined variable that says “idea” number of files per worker, I then
>> calculate how many workers we need.  So unit tests are num_files / 20 ~= 35
>> workers.  Can I be “smarter” by knowing which files have higher cost?
>> Sure… but the “perfect” and the “average” are too similar that it wasn’t
>> worth it...
>> >>
>> >> 2. Rename jobs on circle to be more indicative of their function
>> >>
>> >>
>> >> Have an example?  I am not against, I just don’t know the problem you
>> are referring to.
>> >>
>> >> 3. Unify j8 and j11 workflow pairs into single
>> >>
>> >>
>> >> Fine by me, but we need to keep in mind j17 is coming.  Also, most
>> developmental CI builds don’t really need to run cross every JDK, so we
>> need some way to disable different JDKs…
>> >>
>> >> When I am testing out a patch I tend to run the following (my script):
>> "circleci-enable.py --no-jdk11”; this will remove the JDK11 builds.  I know
>> I am going to run them pre-merge so I know its safe for me.
>> >>
>> >> 5. Flag on generate.sh to allow auto-run on push
>> >>
>> >>
>> >> I really hate that we don’t do this by default… I still to this day
>> strongly feel you should opt-out of CI rather than opt-in… seen several
>> commits get merged as they didn’t see a error in circle… because circle
>> didn’t do any work…. Yes, I am fully aware that I am beating a dead horse…
>> >>
>> >> TL;DR +1
>> >>
>> >> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE]
>> temporary circleci config" as the commit message
>> >>
>> >>
>> >> +0 from me… I have seen people not realize you have to commit after
>> typing “higher” (wrapper around my personal circleci-enable.py script to
>> apply my defaults to the build) but not an issue I have… so I don’t mind if
>> people want the tool to integrate with git…
>> >>
>> >>
>> >> With all that said, I do feel there is more, and something I feel
>> Ekaterina is probably dealing with with her JDK17 work…
>> >>
>> >> 1) resource_class used is not because its needed… in HIGHER file we
>> default to xlarge but only python upgrade tests need that… reported in
>> CASSANDRA-17600
>> >> 2) our current patching allows MID/HIGHER to drift as changes need new
>> patches else patching may do the wrong thing… reported in CASSANDRA-17600
>> >> 3) CI is a combinatorial problem, we need to run all jobs for all
>> JDKs, vnode on/of, cdc on/off, compression on/of, etc…. But this is
>> currently controlled and fleshed out by humans who want to add new jobs..
>> we should move away from maintaining .circleci/config-2_1.yml and instead
>> auto-generate it.  Simple example of this problem is jdk11 support… we run
>> a subset of tests on jdk11 and say its supported… will jdk17 have the same
>> issue?  Will it be even less tests?  Why does the burden lie on everyone to
>> “do the right thing” when all they want is a simple job?
>> >> 4) why do we require people to install “circleci” command to
>> contribute?  If you rename .circleci/config-2_1.yml to .circleci/config.yml
>> then CI will work just fine… we don’t need to call “circleci config
>> process” every time we touch circle config…. Also, seems that w/e someone
>> new to circle config (but not cassandra) touch it they always mutate
>> LOW/MID/HIGH and not .circleci/config-2_1.yml… so I keep going back to fix
>> .circleci/config-2_1.yml….
>> >>
>> >>
>> >> On Oct 19, 2022, at 1:32 PM, Miklosovic, Stefan <
>> Stefan.Miklosovic@netapp.com> wrote:
>> >>
>> >> 1) would be nice to have. The first thing I do is that I change the
>> parallelism to 20. None of committed config.yaml's are appropriate for our
>> company CircleCI so I have to tweak this manually. I think we can not run
>> more that 25/30 containers in parallel, something like that. HIGHRES has
>> 100 and MIDRES has some jobs having parallelism equal to 50 or so so that
>> is not good either. I would be happy with simple way to modify default
>> config.yaml on parallelism. I use "sed" to change parallelism: 4 to
>> parallelism: 20 and leave parallelism: 1 where it does not make sense to
>> increase it. However I noticed that there is not "4" set everywhere, some
>> jobs have it set to "1" so I have to take extra care of these cases (I
>> consider that to be a bug, I think there are two or three, I do not
>> remember). Once set, I have that config in "git stash" so I just apply it
>> every time I need it.
>> >>
>> >> 5) would be nice too.
>> >> 7) is nice but not crucial, it takes no time to commit that.
>> >>
>> >> ________________________________________
>> >> From: Josh McKenzie <jm...@apache.org>
>> >> Sent: Wednesday, October 19, 2022 21:50
>> >> To: dev
>> >> Subject: [DISCUSS] Potential circleci config and workflow changes
>> >>
>> >> NetApp Security WARNING: This is an external email. Do not click links
>> or open attachments unless you recognize the sender and know the content is
>> safe.
>> >>
>> >>
>> >>
>> >> While working w/Andres on CASSANDRA-17939 a variety of things came up
>> regarding our circleci config and opportunities to improve it. Figured I'd
>> hit the list up here to see what people's thoughts are since many of us
>> intersect with these systems daily and having your workflow disrupted
>> without having a chance to provide input is bad.
>> >>
>> >> The ideas:
>> >> 1. Tune parallelism levels per job (David and Ekaterina have insight
>> on this)
>> >> 2. Rename jobs on circle to be more indicative of their function
>> >> 3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see:
>> https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595
>> )
>> >> 4. Update documentation w/guidance on using circle,
>> .circleci/generate.sh examples, etc
>> >> 4a. How to commit:
>> https://cassandra.apache.org/_/development/how_to_commit.html
>> >> 4b. Testing: https://cassandra.apache.org/_/development/testing.html
>> >> 5. Flag on generate.sh to allow auto-run on push
>> >> 6. Clean up the -l, -m, -h flags (test and indicate -l feasibility for
>> all suites, default to -m, deprecate -h?) <- may not be a code-change issue
>> and instead be a documentation issue
>> >> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE]
>> temporary circleci config" as the commit message
>> >>
>> >> Curious to see what folks think.
>> >>
>> >> ~Josh
>> >>
>> >>
>>
>

-- 
+---------------------------------------------------------------+
| Derek Chen-Becker                                             |
| GPG Key available at https://keybase.io/dchenbecker and       |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---------------------------------------------------------------+

Re: [DISCUSS] Potential circleci config and workflow changes

Posted by Ekaterina Dimitrova <e....@gmail.com>.
Yes, they do. This is the only test suite that gets max resources with -m.
Probably you had some other issue, Berenguer, as I can confirm I was running
them successfully these days.

On Thu, 20 Oct 2022 at 6:54, Brandon Williams <dr...@gmail.com> wrote:

> They passed with -m for me recently.
>
> Kind Regards,
> Brandon
>
> On Thu, Oct 20, 2022 at 12:03 AM Berenguer Blasi
> <be...@gmail.com> wrote:
> >
> > Can python upgrade tests be ran without -h? Last time I tried iirc they
> fail on -m
> >
> > On 20/10/22 4:11, Ekaterina Dimitrova wrote:
> >
> > Thank you Josh. Glad to see that our CI is getting more attention. As no
> Cassandra feature will be there if we don't do proper testing, right?
> Important as all the suites and tools we have. With that being said I am
> glad to see Derek is volunteering to spend more time on this as I believe
> this is always the main issue - ideas and willingness for improvements are
> there but people are swamped with other things and we lack manpower for
> something so important.
> > 1. Tune parallelism levels per job (David and Ekaterina have insight on
> this)
> > Question for David, do you tune only parallelism and use only xlarge? If
> yes, we need to talk :D
> > Reading what Stefan shared as experience/feedback, I think we can revise
> the current config and move to a more reasonable config that can work for
> most people but there will always be someone who needs something a bit
> different. With that said maybe we can add to our scripts/menu an option to
> change from command line through parameters parallelism and/or resources?
> For those who want further customization? I see this as a separate
> additional ticket probably. In that case we might probably skip the use of
> circleci config process for that part of the menu. (but not for adding new
> jobs and meaningful permanent updates)
> > 2. Rename jobs on circle to be more indicative of their function
> > +0 I am probably super used to the current names but Derek brought it to
> my attention that there are names which are confusing for someone new to
> the cassandra world. With that said I would say we can do this in a
> separate ticket, mass update.
> > 3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see:
> https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595
> )
> > I am against unifying per JDK workflows but I am all in for unifying the
> pre-commit/separate workflows and getting back to 2 workflows as suggested
> by Andres. If we think of how that will look in the UI I think it will be
> super hard to follow. (the case of having unified both jdks in one workflow)
> > 4. Update documentation w/guidance on using circle,
> .circleci/generate.sh examples, etc 4a. How to commit:
> https://cassandra.apache.org/_/development/how_to_commit.html 4b.
> Testing: https://cassandra.apache.org/_/development/testing.html
> > I will open a ticket and post the guide I was working on. But it also
> doesn't make sense to fully update it now if we are going to significantly
> change the workflow soon. Until then I believe Andres has updated the
> circleci readme and provided good usage examples.
> > 5. Flag on generate.sh to allow auto-run on push
> > Auto-run on push? Can you elaborate? Like to start your whole workflow
> directly without using the UI? There is an approval step in the config
> file, we can probably add some flags to change pre-commit workflows to
> start build without approval when we use those mentioned flags. But having
> by default everything to start on push is an overkill in my opinion. People
> will be forgetting it and pushing builds for nothing on WIP branches.
> Talking from experience :D
> > 6. Clean up the -l, -m, -h flags (test and indicate -l feasibility for
> all suites, default to -m, deprecate -h?) <- may not be a code-change issue
> and instead be a documentation issue
> > If we agree except the free tier config file we want one more reasonable
> config which doesn't bump resources to the max without a need but provides
> balanced use of resources - absolutely. -h was kept as there was
> understanding there are people in the community actively using it.
> > 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE]
> temporary circleci config" as the commit message
> > +0
> > I also wanted to address a few of the points David made.
> > "Ekaterina is probably dealing with with her JDK17 work" - if you mean
> to ensure we have all jobs for all jdks properly, yes. That was my plan.
> Until Derek was so good at suggesting to work on adding missing jobs in
> CircleCI now so my work on that will be a bit less for certain things. This
> is an effort related to the recent changes in our release document. Ticket
> CASSANDRA-17950 :-) I am helping with mentoring/reviews. Everyone is
> welcome to join the party.
> > "1) resource_class used is not because its needed… in HIGHER file we
> default to xlarge but only python upgrade tests need that… reported in
> CASSANDRA-17600" - one of the reasons. we had the MIDRES in the first place
> as I mentioned in my other email the other day. [1]
> >
> > "our current patching allows MID/HIGHER to drift as changes need new
> patches else patching may do the wrong thing… reported in CASSANDRA-17600"
> - I'd say the patching is annoying sometimes, indeed but with/without the
> patching any changes to config mean we need to check it by reading through
> diff and pushing a run to CI before commit. With that said I am all in for
> automation but this will not change the fact we need to push test runs and
> verify the changes did not hurt us in a way. Same as testing patches on all
> branches, running all needed tests and confirming no regressions. Nothing
> new or changing here IMHO
> >
> > "CI is a combinatorial problem, we need to run all jobs for all JDKs,
> vnode on/of, cdc on/off, compression on/of, etc…. But this is currently
> controlled and fleshed out by humans who want to add new jobs.. we should
> move away from maintaining .circleci/config-2_1.yml and instead
> auto-generate it. Simple example of this problem is jdk11 support… we run a
> subset of tests on jdk11 and say its supported… will jdk17 have the same
> issue? Will it be even less tests? Why does the burden lie on everyone to
> “do the right thing” when all they want is a simple job?"
> >  Controlled and fleshed by humans it will always be but I agree we need
> to automate the steps to make it easier for people to add most of the
> combinations and not to skip any because it is too much work. We will
> always need a human to decide which jdks, cdc, vnodes, etc. With that said
> I shared your ticket/patch with Derek as he had similar thoughts, we need
> to get back to that one at some point. (CASSANDRA-17600) Thanks for working
> on that!
> >
> > "why do we require people to install “circleci” command to contribute?
> If you rename .circleci/config-2_1.yml to .circleci/config.yml then CI will
> work just fine… we don’t need to call “circleci config process” every time
> we touch circle config…. Also, seems that w/e someone new to circle config
> (but not cassandra) touch it they always mutate LOW/MID/HIGH and not
> .circleci/config-2_1.yml… so I keep going back to fix
> .circleci/config-2_1.yml…."
> > I'd say config-2_1.yml is mainly for those who will make permanent
> changes to config (like adding/removing jobs). config-2_1.yml is actually
> created as per the CircleCI automation rules - 1st we add and reuse
> executors, parameters and commands but I think we can reduce further things
> if we add even more parameters probably. I have to look more into the
> current file. I am sure there is room for further improvement. 2nd circleci
> cli tool can verify the config file for errors and helps with debugging
> before we push to CircleCI. There is circleci config validate. If we make
> changes manually we are on our own to verify the long yml and also deal
> with duplication in config.yml. My concern is that things that need to be
> almost identical might start to diverge easier. Though I made my suggestion
> in point 1 for what cases probably we can add menu options that potentially
> will not require using circleci cli tool. There might be more cases though.
> > Currently config-2_1.yml is 2256 lines while config.yml is 5793 lines.
> I'd say lots of duplication there
> >
> > [1] https://lists.apache.org/thread/htxoh60zt8zxc4vgxj9zh71trk0zxwhl
> >
> > On Wed, 19 Oct 2022 at 17:20, David Capwell <dc...@apple.com> wrote:
> >>
> >> 1. Tune parallelism levels per job (David and Ekaterina have insight on
> this)
> >>
> >>
> >> +1 to this!  I drastically lower our parallelism as only python-dtest
> upgrade tests need many resources…
> >>
> >> What I do for JVM unit/jvm-dtest is the following
> >>
> >> def java_parallelism(src_dir, kind, num_file_in_worker, include = lambda a, b: True):
> >>     d = os.path.join(src_dir, 'test', kind)
> >>     num_files = 0
> >>     for root, dirs, files in os.walk(d):
> >>         for f in files:
> >>             if f.endswith('Test.java') and include(os.path.join(root, f), f):
> >>                 num_files += 1
> >>     return math.floor(num_files / num_file_in_worker)
> >>
> >> def fix_parallelism(args, contents):
> >>     jobs = contents['jobs']
> >>
> >>     unit_parallelism                = java_parallelism(args.src, 'unit', 20)
> >>     jvm_dtest_parallelism           = java_parallelism(args.src, 'distributed', 4, lambda full, name: 'upgrade' not in full)
> >>     jvm_dtest_upgrade_parallelism   = java_parallelism(args.src, 'distributed', 2, lambda full, name: 'upgrade' in full)
> >>
> >> TL;DR - I find all test files we are going to run, and based off a
> pre-defined variable that says “idea” number of files per worker, I then
> calculate how many workers we need.  So unit tests are num_files / 20 ~= 35
> workers.  Can I be “smarter” by knowing which files have higher cost?
> Sure… but the “perfect” and the “average” are too similar that it wasn’t
> worth it...
> >>
> >> 2. Rename jobs on circle to be more indicative of their function
> >>
> >>
> >> Have an example?  I am not against, I just don’t know the problem you
> are referring to.
> >>
> >> 3. Unify j8 and j11 workflow pairs into single
> >>
> >>
> >> Fine by me, but we need to keep in mind j17 is coming.  Also, most
> developmental CI builds don’t really need to run cross every JDK, so we
> need some way to disable different JDKs…
> >>
> >> When I am testing out a patch I tend to run the following (my script):
> "circleci-enable.py --no-jdk11”; this will remove the JDK11 builds.  I know
> I am going to run them pre-merge so I know its safe for me.
> >>
> >> 5. Flag on generate.sh to allow auto-run on push
> >>
> >>
> >> I really hate that we don’t do this by default… I still to this day
> strongly feel you should opt-out of CI rather than opt-in… seen several
> commits get merged as they didn’t see a error in circle… because circle
> didn’t do any work…. Yes, I am fully aware that I am beating a dead horse…
> >>
> >> TL;DR +1
> >>
> >> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE]
> temporary circleci config" as the commit message
> >>
> >>
> >> +0 from me… I have seen people not realize you have to commit after
> typing “higher” (wrapper around my personal circleci-enable.py script to
> apply my defaults to the build) but not an issue I have… so I don’t mind if
> people want the tool to integrate with git…
> >>
> >>
> >> With all that said, I do feel there is more, and something I feel
> Ekaterina is probably dealing with with her JDK17 work…
> >>
> >> 1) resource_class used is not because its needed… in HIGHER file we
> default to xlarge but only python upgrade tests need that… reported in
> CASSANDRA-17600
> >> 2) our current patching allows MID/HIGHER to drift as changes need new
> patches else patching may do the wrong thing… reported in CASSANDRA-17600
> >> 3) CI is a combinatorial problem, we need to run all jobs for all JDKs,
> vnode on/of, cdc on/off, compression on/of, etc…. But this is currently
> controlled and fleshed out by humans who want to add new jobs..  we should
> move away from maintaining .circleci/config-2_1.yml and instead
> auto-generate it.  Simple example of this problem is jdk11 support… we run
> a subset of tests on jdk11 and say its supported… will jdk17 have the same
> issue?  Will it be even less tests?  Why does the burden lie on everyone to
> “do the right thing” when all they want is a simple job?
> >> 4) why do we require people to install “circleci” command to
> contribute?  If you rename .circleci/config-2_1.yml to .circleci/config.yml
> then CI will work just fine… we don’t need to call “circleci config
> process” every time we touch circle config…. Also, seems that w/e someone
> new to circle config (but not cassandra) touch it they always mutate
> LOW/MID/HIGH and not .circleci/config-2_1.yml… so I keep going back to fix
> .circleci/config-2_1.yml….
> >>
> >>
> >> On Oct 19, 2022, at 1:32 PM, Miklosovic, Stefan <
> Stefan.Miklosovic@netapp.com> wrote:
> >>
> >> 1) would be nice to have. The first thing I do is that I change the
> parallelism to 20. None of committed config.yaml's are appropriate for our
> company CircleCI so I have to tweak this manually. I think we can not run
> more that 25/30 containers in parallel, something like that. HIGHRES has
> 100 and MIDRES has some jobs having parallelism equal to 50 or so so that
> is not good either. I would be happy with simple way to modify default
> config.yaml on parallelism. I use "sed" to change parallelism: 4 to
> parallelism: 20 and leave parallelism: 1 where it does not make sense to
> increase it. However I noticed that there is not "4" set everywhere, some
> jobs have it set to "1" so I have to take extra care of these cases (I
> consider that to be a bug, I think there are two or three, I do not
> remember). Once set, I have that config in "git stash" so I just apply it
> every time I need it.
> >>
> >> 5) would be nice too.
> >> 7) is nice but not crucial, it takes no time to commit that.
> >>
> >> ________________________________________
> >> From: Josh McKenzie <jm...@apache.org>
> >> Sent: Wednesday, October 19, 2022 21:50
> >> To: dev
> >> Subject: [DISCUSS] Potential circleci config and workflow changes
> >>
> >> NetApp Security WARNING: This is an external email. Do not click links
> or open attachments unless you recognize the sender and know the content is
> safe.
> >>
> >>
> >>
> >> While working w/Andres on CASSANDRA-17939 a variety of things came up
> regarding our circleci config and opportunities to improve it. Figured I'd
> hit the list up here to see what people's thoughts are since many of us
> intersect with these systems daily and having your workflow disrupted
> without having a chance to provide input is bad.
> >>
> >> The ideas:
> >> 1. Tune parallelism levels per job (David and Ekaterina have insight on
> this)
> >> 2. Rename jobs on circle to be more indicative of their function
> >> 3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see:
> https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595
> )
> >> 4. Update documentation w/guidance on using circle,
> .circleci/generate.sh examples, etc
> >> 4a. How to commit:
> https://cassandra.apache.org/_/development/how_to_commit.html
> >> 4b. Testing: https://cassandra.apache.org/_/development/testing.html
> >> 5. Flag on generate.sh to allow auto-run on push
> >> 6. Clean up the -l, -m, -h flags (test and indicate -l feasibility for
> all suites, default to -m, deprecate -h?) <- may not be a code-change issue
> and instead be a documentation issue
> >> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE]
> temporary circleci config" as the commit message
> >>
> >> Curious to see what folks think.
> >>
> >> ~Josh
> >>
> >>
>

Re: [DISCUSS] Potential circleci config and workflow changes

Posted by Brandon Williams <dr...@gmail.com>.
They passed with -m for me recently.

Kind Regards,
Brandon

On Thu, Oct 20, 2022 at 12:03 AM Berenguer Blasi
<be...@gmail.com> wrote:
>
> Can python upgrade tests be ran without -h? Last time I tried iirc they fail on -m
>
> On 20/10/22 4:11, Ekaterina Dimitrova wrote:
>
> Thank you Josh. Glad to see that our CI is getting more attention. As no Cassandra feature will be there if we don't do proper testing, right? Important as all the suites and tools we have. With that being said I am glad to see Derek is volunteering to spend more time on this as I believe this is always the main issue - ideas and willingness for improvements are there but people are swamped with other things and we lack manpower for something so important.
> 1. Tune parallelism levels per job (David and Ekaterina have insight on this)
> Question for David, do you tune only parallelism and use only xlarge? If yes, we need to talk :D
> Reading what Stefan shared as experience/feedback, I think we can revise the current config and move to a more reasonable config that can work for most people but there will always be someone who needs something a bit different. With that said maybe we can add to our scripts/menu an option to change from command line through parameters parallelism and/or resources? For those who want further customization? I see this as a separate additional ticket probably. In that case we might probably skip the use of circleci config process for that part of the menu. (but not for adding new jobs and meaningful permanent updates)
> 2. Rename jobs on circle to be more indicative of their function
> +0 I am probably super used to the current names but Derek brought it to my attention that there are names which are confusing for someone new to the cassandra world. With that said I would say we can do this in a separate ticket, mass update.
> 3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see: https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595)
> I am against unifying per JDK workflows but I am all in for unifying the pre-commit/separate workflows and getting back to 2 workflows as suggested by Andres. If we think of how that will look in the UI I think it will be super hard to follow. (the case of having unified both jdks in one workflow)
> 4. Update documentation w/guidance on using circle, .circleci/generate.sh examples, etc 4a. How to commit: https://cassandra.apache.org/_/development/how_to_commit.html 4b. Testing: https://cassandra.apache.org/_/development/testing.html
> I will open a ticket and post the guide I was working on. But it also doesn't make sense to fully update it now if we are going to significantly change the workflow soon. Until then I believe Andres has updated the circleci readme and provided good usage examples.
> 5. Flag on generate.sh to allow auto-run on push
> Auto-run on push? Can you elaborate? Like to start your whole workflow directly without using the UI? There is an approval step in the config file, we can probably add some flags to change pre-commit workflows to start build without approval when we use those mentioned flags. But having by default everything to start on push is an overkill in my opinion. People will be forgetting it and pushing builds for nothing on WIP branches. Talking from experience :D
> 6. Clean up the -l, -m, -h flags (test and indicate -l feasibility for all suites, default to -m, deprecate -h?) <- may not be a code-change issue and instead be a documentation issue
> If we agree except the free tier config file we want one more reasonable config which doesn't bump resources to the max without a need but provides balanced use of resources - absolutely. -h was kept as there was understanding there are people in the community actively using it.
> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE] temporary circleci config" as the commit message
> +0
> I also wanted to address a few of the points David made.
> "Ekaterina is probably dealing with with her JDK17 work" - if you mean to ensure we have all jobs for all jdks properly, yes. That was my plan. Until Derek was so good at suggesting to work on adding missing jobs in CircleCI now so my work on that will be a bit less for certain things. This is an effort related to the recent changes in our release document. Ticket CASSANDRA-17950 :-) I am helping with mentoring/reviews. Everyone is welcome to join the party.
> "1) resource_class used is not because its needed… in HIGHER file we default to xlarge but only python upgrade tests need that… reported in CASSANDRA-17600" - one of the reasons. we had the MIDRES in the first place as I mentioned in my other email the other day. [1]
>
> "our current patching allows MID/HIGHER to drift as changes need new patches else patching may do the wrong thing… reported in CASSANDRA-17600" - I'd say the patching is annoying sometimes, indeed but with/without the patching any changes to config mean we need to check it by reading through diff and pushing a run to CI before commit. With that said I am all in for automation but this will not change the fact we need to push test runs and verify the changes did not hurt us in a way. Same as testing patches on all branches, running all needed tests and confirming no regressions. Nothing new or changing here IMHO
>
> "CI is a combinatorial problem, we need to run all jobs for all JDKs, vnode on/of, cdc on/off, compression on/of, etc…. But this is currently controlled and fleshed out by humans who want to add new jobs.. we should move away from maintaining .circleci/config-2_1.yml and instead auto-generate it. Simple example of this problem is jdk11 support… we run a subset of tests on jdk11 and say its supported… will jdk17 have the same issue? Will it be even less tests? Why does the burden lie on everyone to “do the right thing” when all they want is a simple job?"
>  Controlled and fleshed by humans it will always be but I agree we need to automate the steps to make it easier for people to add most of the combinations and not to skip any because it is too much work. We will always need a human to decide which jdks, cdc, vnodes, etc. With that said I shared your ticket/patch with Derek as he had similar thoughts, we need to get back to that one at some point. (CASSANDRA-17600) Thanks for working on that!
>
> "why do we require people to install “circleci” command to contribute? If you rename .circleci/config-2_1.yml to .circleci/config.yml then CI will work just fine… we don’t need to call “circleci config process” every time we touch circle config…. Also, seems that w/e someone new to circle config (but not cassandra) touch it they always mutate LOW/MID/HIGH and not .circleci/config-2_1.yml… so I keep going back to fix .circleci/config-2_1.yml…."
> I'd say config-2_1.yml is mainly for those who will make permanent changes to config (like adding/removing jobs). config-2_1.yml is actually created as per the CircleCI automation rules - 1st we add and reuse executors, parameters and commands but I think we can reduce further things if we add even more parameters probably. I have to look more into the current file. I am sure there is room for further improvement. 2nd circleci cli tool can verify the config file for errors and helps with debugging before we push to CircleCI. There is circleci config validate. If we make changes manually we are on our own to verify the long yml and also deal with duplication in config.yml. My concern is that things that need to be almost identical might start to diverge easier. Though I made my suggestion in point 1 for what cases probably we can add menu options that potentially will not require using circleci cli tool. There might be more cases though.
> Currently config-2_1.yml is 2256 lines while config.yml is 5793 lines. I'd say lots of duplication there
>
> [1] https://lists.apache.org/thread/htxoh60zt8zxc4vgxj9zh71trk0zxwhl
>
> On Wed, 19 Oct 2022 at 17:20, David Capwell <dc...@apple.com> wrote:
>>
>> 1. Tune parallelism levels per job (David and Ekaterina have insight on this)
>>
>>
>> +1 to this!  I drastically lower our parallelism as only python-dtest upgrade tests need many resources…
>>
>> What I do for JVM unit/jvm-dtest is the following
>>
>> def java_parallelism(src_dir, kind, num_file_in_worker, include = lambda a, b: True):
>>     d = os.path.join(src_dir, 'test', kind)
>>     num_files = 0
>>     for root, dirs, files in os.walk(d):
>>         for f in files:
>>             if f.endswith('Test.java') and include(os.path.join(root, f), f):
>>                 num_files += 1
>>     return math.floor(num_files / num_file_in_worker)
>>
>> def fix_parallelism(args, contents):
>>     jobs = contents['jobs']
>>
>>     unit_parallelism                = java_parallelism(args.src, 'unit', 20)
>>     jvm_dtest_parallelism           = java_parallelism(args.src, 'distributed', 4, lambda full, name: 'upgrade' not in full)
>>     jvm_dtest_upgrade_parallelism   = java_parallelism(args.src, 'distributed', 2, lambda full, name: 'upgrade' in full)
>>
>> TL;DR - I find all test files we are going to run, and based off a pre-defined variable that says “idea” number of files per worker, I then calculate how many workers we need.  So unit tests are num_files / 20 ~= 35 workers.  Can I be “smarter” by knowing which files have higher cost?  Sure… but the “perfect” and the “average” are too similar that it wasn’t worth it...
>>
>> 2. Rename jobs on circle to be more indicative of their function
>>
>>
>> Have an example?  I am not against, I just don’t know the problem you are referring to.
>>
>> 3. Unify j8 and j11 workflow pairs into single
>>
>>
>> Fine by me, but we need to keep in mind j17 is coming.  Also, most developmental CI builds don’t really need to run cross every JDK, so we need some way to disable different JDKs…
>>
>> When I am testing out a patch I tend to run the following (my script): "circleci-enable.py --no-jdk11”; this will remove the JDK11 builds.  I know I am going to run them pre-merge so I know its safe for me.
>>
>> 5. Flag on generate.sh to allow auto-run on push
>>
>>
>> I really hate that we don’t do this by default… I still to this day strongly feel you should opt-out of CI rather than opt-in… seen several commits get merged as they didn’t see a error in circle… because circle didn’t do any work…. Yes, I am fully aware that I am beating a dead horse…
>>
>> TL;DR +1
>>
>> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE] temporary circleci config" as the commit message
>>
>>
>> +0 from me… I have seen people not realize you have to commit after typing “higher” (wrapper around my personal circleci-enable.py script to apply my defaults to the build) but not an issue I have… so I don’t mind if people want the tool to integrate with git…
>>
>>
>> With all that said, I do feel there is more, and something I feel Ekaterina is probably dealing with with her JDK17 work…
>>
>> 1) resource_class used is not because its needed… in HIGHER file we default to xlarge but only python upgrade tests need that… reported in CASSANDRA-17600
>> 2) our current patching allows MID/HIGHER to drift as changes need new patches else patching may do the wrong thing… reported in CASSANDRA-17600
>> 3) CI is a combinatorial problem, we need to run all jobs for all JDKs, vnode on/of, cdc on/off, compression on/of, etc…. But this is currently controlled and fleshed out by humans who want to add new jobs..  we should move away from maintaining .circleci/config-2_1.yml and instead auto-generate it.  Simple example of this problem is jdk11 support… we run a subset of tests on jdk11 and say its supported… will jdk17 have the same issue?  Will it be even less tests?  Why does the burden lie on everyone to “do the right thing” when all they want is a simple job?
>> 4) why do we require people to install “circleci” command to contribute?  If you rename .circleci/config-2_1.yml to .circleci/config.yml then CI will work just fine… we don’t need to call “circleci config process” every time we touch circle config…. Also, seems that w/e someone new to circle config (but not cassandra) touch it they always mutate LOW/MID/HIGH and not .circleci/config-2_1.yml… so I keep going back to fix .circleci/config-2_1.yml….
>>
>>
>> On Oct 19, 2022, at 1:32 PM, Miklosovic, Stefan <St...@netapp.com> wrote:
>>
>> 1) would be nice to have. The first thing I do is that I change the parallelism to 20. None of committed config.yaml's are appropriate for our company CircleCI so I have to tweak this manually. I think we can not run more that 25/30 containers in parallel, something like that. HIGHRES has 100 and MIDRES has some jobs having parallelism equal to 50 or so so that is not good either. I would be happy with simple way to modify default config.yaml on parallelism. I use "sed" to change parallelism: 4 to parallelism: 20 and leave parallelism: 1 where it does not make sense to increase it. However I noticed that there is not "4" set everywhere, some jobs have it set to "1" so I have to take extra care of these cases (I consider that to be a bug, I think there are two or three, I do not remember). Once set, I have that config in "git stash" so I just apply it every time I need it.
>>
>> 5) would be nice too.
>> 7) is nice but not crucial, it takes no time to commit that.
>>
>> ________________________________________
>> From: Josh McKenzie <jm...@apache.org>
>> Sent: Wednesday, October 19, 2022 21:50
>> To: dev
>> Subject: [DISCUSS] Potential circleci config and workflow changes
>>
>> NetApp Security WARNING: This is an external email. Do not click links or open attachments unless you recognize the sender and know the content is safe.
>>
>>
>>
>> While working w/Andres on CASSANDRA-17939 a variety of things came up regarding our circleci config and opportunities to improve it. Figured I'd hit the list up here to see what people's thoughts are since many of us intersect with these systems daily and having your workflow disrupted without having a chance to provide input is bad.
>>
>> The ideas:
>> 1. Tune parallelism levels per job (David and Ekaterina have insight on this)
>> 2. Rename jobs on circle to be more indicative of their function
>> 3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see: https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595)
>> 4. Update documentation w/guidance on using circle, .circleci/generate.sh examples, etc
>> 4a. How to commit: https://cassandra.apache.org/_/development/how_to_commit.html
>> 4b. Testing: https://cassandra.apache.org/_/development/testing.html
>> 5. Flag on generate.sh to allow auto-run on push
>> 6. Clean up the -l, -m, -h flags (test and indicate -l feasibility for all suites, default to -m, deprecate -h?) <- may not be a code-change issue and instead be a documentation issue
>> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE] temporary circleci config" as the commit message
>>
>> Curious to see what folks think.
>>
>> ~Josh
>>
>>

Re: [DISCUSS] Potential circleci config and workflow changes

Posted by Berenguer Blasi <be...@gmail.com>.
Can python upgrade tests be run without -h? Last time I tried, IIRC they 
fail on -m

On 20/10/22 4:11, Ekaterina Dimitrova wrote:
> Thank you Josh. Glad to see that our CI is getting more attention. As 
> no Cassandra feature will be there if we don't do proper testing, 
> right? Important as all the suites and tools we have. With that being 
> said I am glad to see Derek is volunteering to spend more time on this 
> as I believe this is always the main issue - ideas and willingness for 
> improvements are there but people are swamped with other things and we 
> lack manpower for something so important.
> 1. Tune parallelism levels per job (David and Ekaterina have insight 
> on this)
> Question for David, do you tune only parallelism and use only xlarge? 
> If yes, we need to talk :D
> Reading what Stefan shared as experience/feedback, I think we can 
> revise the current config and move to a more reasonable config that 
> can work for most people but there will always be someone who needs 
> something a bit different. With that said maybe we can add to our 
> scripts/menu an option to change from command line through parameters 
> parallelism and/or resources? For those who want further 
> customization? I see this as a separate additional ticket probably. In 
> that case we might probably skip the use of circleci config process 
> for that part of the menu. (but not for adding new jobs and meaningful 
> permanent updates)
> 2. Rename jobs on circle to be more indicative of their function
> +0 I am probably super used to the current names but Derek brought it 
> to my attention that there are names which are confusing for someone 
> new to the cassandra world. With that said I would say we can do this 
> in a separate ticket, mass update.
> 3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see: 
> https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595 
> <https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595>) 
>
> I am against unifying per JDK workflows but I am all in for unifying 
> the pre-commit/separate workflows and getting back to 2 workflows as 
> suggested by Andres. If we think of how that will look in the UI I 
> think it will be super hard to follow. (the case of having unified 
> both jdks in one workflow)
> 4. Update documentation w/guidance on using circle, 
> .circleci/generate.sh examples, etc 4a. How to commit: 
> https://cassandra.apache.org/_/development/how_to_commit.html4b. 
> Testing: https://cassandra.apache.org/_/development/testing.html
> I will open a ticket and post the guide I was working on. But it also 
> doesn't make sense to fully update it now if we are going to 
> significantly change the workflow soon. Until then I believe Andres 
> has updated the circleci readme and provided good usage examples.
> 5. Flag on generate.sh to allow auto-run on push
> Auto-run on push? Can you elaborate? Like to start your whole workflow 
> directly without using the UI? There is an approval step in the config 
> file, we can probably add some flags to change pre-commit workflows to 
> start build without approval when we use those mentioned flags. But 
> having by default everything to start on push is an overkill in my 
> opinion. People will be forgetting it and pushing builds for nothing 
> on WIP branches. Talking from experience :D
> 6. Clean up the -l, -m, -h flags (test and indicate -l feasibility for 
> all suites, default to -m, deprecate -h?) <- may not be a code-change 
> issue and instead be a documentation issue
> If we agree except the free tier config file we want one more 
> reasonable config which doesn't bump resources to the max without a 
> need but provides balanced use of resources - absolutely. -h was kept 
> as there was understanding there are people in the community actively 
> using it.
> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE] 
> temporary circleci config" as the commit message
> +0
> I also wanted to address a few of the points David made.
> "Ekaterina is probably dealing with with her JDK17 work" - if you mean 
> to ensure we have all jobs for all jdks properly, yes. That was my 
> plan. Until Derek was so good at suggesting to work on adding missing 
> jobs in CircleCI now so my work on that will be a bit less for certain 
> things. This is an effort related to the recent changes in our release 
> document. Ticket CASSANDRA-17950 
> <https://issues.apache.org/jira/browse/CASSANDRA-17950> :-) I am 
> helping with mentoring/reviews. Everyone is welcome to join the party.
> "1) resource_class used is not because its needed… in HIGHER file we 
> default to xlarge but only python upgrade tests need that… reported in 
> CASSANDRA-17600" - one of the reasons. we had the MIDRES in the first 
> place as I mentioned in my other email the other day. [1]
>
> "our current patching allows MID/HIGHER to drift as changes need new 
> patches else patching may do the wrong thing… reported in 
> CASSANDRA-17600" - I'd say the patching is annoying sometimes, indeed 
> but with/without the patching any changes to config mean we need to 
> check it by reading through diff and pushing a run to CI before 
> commit. With that said I am all in for automation but this will not 
> change the fact we need to push test runs and verify the changes did 
> not hurt us in a way. Same as testing patches on all branches, running 
> all needed tests and confirming no regressions. Nothing new or 
> changing here IMHO
>
> "CI is a combinatorial problem, we need to run all jobs for all JDKs, 
> vnode on/of, cdc on/off, compression on/of, etc…. But this is 
> currently controlled and fleshed out by humans who want to add new 
> jobs.. we should move away from maintaining .circleci/config-2_1.yml 
> and instead auto-generate it. Simple example of this problem is jdk11 
> support… we run a subset of tests on jdk11 and say its supported… will 
> jdk17 have the same issue? Will it be even less tests? Why does the 
> burden lie on everyone to “do the right thing” when all they want is a 
> simple job?"
>  Controlled and fleshed by humans it will always be but I agree we 
> need to automate the steps to make it easier for people to add most of 
> the combinations and not to skip any because it is too much work. We 
> will always need a human to decide which jdks, cdc, vnodes, etc. With 
> that said I shared your ticket/patch with Derek as he had similar 
> thoughts, we need to get back to that one at some point. 
> (CASSANDRA-17600) Thanks for working on that!
> "why do we require people to install “circleci” command to contribute? 
> If you rename .circleci/config-2_1.yml to .circleci/config.yml then CI 
> will work just fine… we don’t need to call “circleci config process” 
> every time we touch circle config…. Also, seems that w/e someone new 
> to circle config (but not cassandra) touch it they always mutate 
> LOW/MID/HIGH and not .circleci/config-2_1.yml… so I keep going back to 
> fix .circleci/config-2_1.yml…."
> I'd say config-2_1.yml is mainly for those who will make permanent 
> changes to config (like adding/removing jobs). config-2_1.yml is 
> actually created as per the CircleCI automation rules - 1st we add and 
> reuse executors, parameters and commands but I think we can reduce 
> further things if we add even more parameters probably. I have to look 
> more into the current file. I am sure there is room for further 
> improvement. 2nd circleci cli tool can verify the config file for 
> errors and helps with debugging before we push to CircleCI. There is 
> circleci config validate. If we make changes manually we are on our 
> own to verify the long yml and also deal with duplication in 
> config.yml. My concern is that things that need to be almost identical 
> might start to diverge easier. Though I made my suggestion in point 1 
> for what cases probably we can add menu options that potentially will 
> not require using circleci cli tool. There might be more cases though.
> Currently config-2_1.yml is 2256 lines while config.yml is 5793 lines. 
> I'd say lots of duplication there
> [1] https://lists.apache.org/thread/htxoh60zt8zxc4vgxj9zh71trk0zxwhl
>
> On Wed, 19 Oct 2022 at 17:20, David Capwell <dc...@apple.com> wrote:
>
>>     1. Tune parallelism levels per job (David and Ekaterina have
>>     insight on this)
>
>     +1 to this!  I drastically lower our parallelism as
>     only python-dtest upgrade tests need many resources…
>
>     What I do for JVM unit/jvm-dtest is the following
>
>         def java_parallelism(src_dir, kind, num_file_in_worker, include = lambda a, b: True):
>             d = os.path.join(src_dir, 'test', kind)
>             num_files = 0
>             for root, dirs, files in os.walk(d):
>                 for f in files:
>                     if f.endswith('Test.java') and include(os.path.join(root, f), f):
>                         num_files += 1
>             return math.floor(num_files / num_file_in_worker)
>
>         def fix_parallelism(args, contents):
>             jobs = contents['jobs']
>
>             unit_parallelism                = java_parallelism(args.src, 'unit', 20)
>             jvm_dtest_parallelism           = java_parallelism(args.src, 'distributed', 4, lambda full, name: 'upgrade' not in full)
>             jvm_dtest_upgrade_parallelism   = java_parallelism(args.src, 'distributed', 2, lambda full, name: 'upgrade' in full)
>
>     TL;DR - I find all test files we are going to run, and based off a
>     pre-defined variable that says “idea” number of files per
>     worker, I then calculate how many workers we need.  So unit tests
>     are num_files / 20 ~= 35 workers.  Can I be “smarter” by knowing
>     which files have higher cost?  Sure… but the “perfect” and
>     the “average” are too similar that it wasn’t worth it...
>
>>     2. Rename jobs on circle to be more indicative of their function
>
>     Have an example?  I am not against, I just don’t know the problem
>     you are referring to.
>
>>     3. Unify j8 and j11 workflow pairs into single
>
>     Fine by me, but we need to keep in mind j17 is coming.  Also, most
>     developmental CI builds don’t really need to run cross every JDK,
>     so we need some way to disable different JDKs…
>
>     When I am testing out a patch I tend to run the following (my
>     script): "circleci-enable.py --no-jdk11”; this will remove the
>     JDK11 builds.  I know I am going to run them pre-merge so I know
>     its safe for me.
>
>>     5. Flag on generate.sh to allow auto-run on push
>
>     I really hate that we don’t do this by default… I still to this
>     day strongly feel you should *opt-out* of CI rather than opt-in…
>     seen several commits get merged as they didn’t see a error in
>     circle… because circle didn’t do any work…. Yes, I am fully aware
>     that I am beating a dead horse…
>
>     TL;DR +1
>
>>     7. Consider flag on generate.sh to run and commit with "[DO NOT
>>     MERGE] temporary circleci config" as the commit message
>
>     +0 from me… I have seen people not realize you have to commit
>     after typing “higher” (wrapper around my personal
>     circleci-enable.py script to apply my defaults to the build) but
>     not an issue I have… so I don’t mind if people want the tool to
>     integrate with git…
>
>
>     With all that said, I do feel there is more, and something I feel
>     Ekaterina is probably dealing with with her JDK17 work…
>
>     1) resource_class used is not because its needed… in HIGHER file
>     we default to xlarge but only python upgrade tests need that…
>     reported in CASSANDRA-17600
>     2) our current patching allows MID/HIGHER to drift as changes need
>     new patches else patching may do the wrong thing… reported in
>     CASSANDRA-17600
>     3) CI is a combinatorial problem, we need to run all jobs for all
>     JDKs, vnode on/of, cdc on/off, compression on/of, etc…. But this
>     is currently controlled and fleshed out by humans who want to add
>     new jobs..  we should move away from
>     maintaining .circleci/config-2_1.yml and instead auto-generate
>     it.  Simple example of this problem is jdk11 support… we run a
>     subset of tests on jdk11 and say its supported… will jdk17 have
>     the same issue? Will it be even less tests?  Why does the burden
>     lie on everyone to “do the right thing” when all they want is a
>     simple job?
>     4) why do we require people to install “circleci” command to
>     contribute? If you rename .circleci/config-2_1.yml
>     to .circleci/config.yml then CI will work just fine… we don’t need
>     to call “circleci config process” every time we touch circle
>     config…. Also, seems that w/e someone new to circle config (but
>     not cassandra) touch it they always mutate LOW/MID/HIGH and
>     not .circleci/config-2_1.yml… so I keep going back to
>     fix .circleci/config-2_1.yml….
>
>
>>     On Oct 19, 2022, at 1:32 PM, Miklosovic, Stefan
>>     <St...@netapp.com> wrote:
>>
>>     1) would be nice to have. The first thing I do is that I change
>>     the parallelism to 20. None of committed config.yaml's are
>>     appropriate for our company CircleCI so I have to tweak this
>>     manually. I think we can not run more that 25/30 containers in
>>     parallel, something like that. HIGHRES has 100 and MIDRES has
>>     some jobs having parallelism equal to 50 or so so that is not
>>     good either. I would be happy with simple way to modify default
>>     config.yaml on parallelism. I use "sed" to change parallelism: 4
>>     to parallelism: 20 and leave parallelism: 1 where it does not
>>     make sense to increase it. However I noticed that there is not
>>     "4" set everywhere, some jobs have it set to "1" so I have to
>>     take extra care of these cases (I consider that to be a bug, I
>>     think there are two or three, I do not remember). Once set, I
>>     have that config in "git stash" so I just apply it every time I
>>     need it.
>>
>>     5) would be nice too.
>>     7) is nice but not crucial, it takes no time to commit that.
>>
>>     ________________________________________
>>     From: Josh McKenzie <jm...@apache.org>
>>     Sent: Wednesday, October 19, 2022 21:50
>>     To: dev
>>     Subject: [DISCUSS] Potential circleci config and workflow changes
>>
>>     NetApp Security WARNING: This is an external email. Do not click
>>     links or open attachments unless you recognize the sender and
>>     know the content is safe.
>>
>>
>>
>>     While working w/Andres on CASSANDRA-17939 a variety of things
>>     came up regarding our circleci config and opportunities to
>>     improve it. Figured I'd hit the list up here to see what people's
>>     thoughts are since many of us intersect with these systems daily
>>     and having your workflow disrupted without having a chance to
>>     provide input is bad.
>>
>>     The ideas:
>>     1. Tune parallelism levels per job (David and Ekaterina have
>>     insight on this)
>>     2. Rename jobs on circle to be more indicative of their function
>>     3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see:
>>     https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595
>>     <https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595>)
>>     4. Update documentation w/guidance on using circle,
>>     .circleci/generate.sh examples, etc
>>     4a. How to commit:
>>     https://cassandra.apache.org/_/development/how_to_commit.html
>>     4b. Testing: https://cassandra.apache.org/_/development/testing.html
>>     5. Flag on generate.sh to allow auto-run on push
>>     6. Clean up the -l, -m, -h flags (test and indicate -l
>>     feasibility for all suites, default to -m, deprecate -h?) <- may
>>     not be a code-change issue and instead be a documentation issue
>>     7. Consider flag on generate.sh to run and commit with "[DO NOT
>>     MERGE] temporary circleci config" as the commit message
>>
>>     Curious to see what folks think.
>>
>>     ~Josh
>>
>

Re: [DISCUSS] Potential circleci config and workflow changes

Posted by Ekaterina Dimitrova <e....@gmail.com>.
Thank you Josh. Glad to see that our CI is getting more attention. No
Cassandra feature will ship if we don't do proper testing, right? CI is as
important as all the suites and tools we have. With that being said, I am
glad to see Derek is volunteering to spend more time on this, as I believe
this is always the main issue - ideas and willingness for improvements are
there, but people are swamped with other things and we lack manpower for
something so important.
1. Tune parallelism levels per job (David and Ekaterina have insight on
this)
Question for David, do you tune only parallelism and use only xlarge? If
yes, we need to talk :D

Reading what Stefan shared as experience/feedback, I think we can revise
the current config and move to a more reasonable config that can work for
most people, but there will always be someone who needs something a bit
different. With that said, maybe we can add an option to our scripts/menu
to change parallelism and/or resources from the command line through
parameters, for those who want further customization? (A rough sketch of
what I mean is at the end of this list of points.) I see this as a separate
additional ticket probably. In that case we might probably skip the use of
circleci config process for that part of the menu. (but not for adding new
jobs and meaningful permanent updates)

2. Rename jobs on circle to be more indicative of their function
+0 I am probably super used to the current names but Derek brought it to my
attention that there are names which are confusing for someone new to the
cassandra world. With that said I would say we can do this in a separate
ticket, mass update.

3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see:
https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595
)
I am against unifying per-JDK workflows, but I am all in for unifying the
pre-commit/separate workflows and getting back to 2 workflows as suggested
by Andres. If we think of how having both JDKs unified in one workflow
would look in the UI, I think it will be super hard to follow.
4. Update documentation w/guidance on using circle, .circleci/generate.sh
examples, etc 4a. How to commit:
https://cassandra.apache.org/_/development/how_to_commit.html 4b. Testing:
https://cassandra.apache.org/_/development/testing.html
I will open a ticket and post the guide I was working on. But it also
doesn't make sense to fully update it now if we are going to significantly
change the workflow soon. Until then I believe Andres has updated the
circleci readme and provided good usage examples.
5. Flag on generate.sh to allow auto-run on push
Auto-run on push? Can you elaborate? Like to start your whole workflow
directly without using the UI? There is an approval step in the config
file, we can probably add some flags to change pre-commit workflows to
start build without approval when we use those mentioned flags. But having
by default everything to start on push is an overkill in my opinion. People
will be forgetting it and pushing builds for nothing on WIP branches.
Talking from experience :D

6. Clean up the -l, -m, -h flags (test and indicate -l feasibility for all
suites, default to -m, deprecate -h?) <- may not be a code-change issue and
instead be a documentation issue
If we agree that, besides the free-tier config file, we want one more
reasonable config which doesn't bump resources to the max without a need
but provides balanced use of resources - absolutely. -h was kept as there
was an understanding that there are people in the community actively using it.
7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE]
temporary circleci config" as the commit message
+0
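
Coming back to the command-line parameters idea under point 1, a very rough
sketch of the kind of options I mean (flag names and defaults are made up
for illustration, not a final interface):

import argparse

def parse_overrides(argv=None):
    # Hypothetical flags for a generate.sh-style helper; names/defaults are made up.
    parser = argparse.ArgumentParser(description='Generate a tuned circleci config')
    parser.add_argument('--parallelism', type=int, default=None,
                        help='override parallelism for jobs that fan out')
    parser.add_argument('--resource-class', default=None,
                        choices=['medium', 'large', 'xlarge'],
                        help='override resource_class for all jobs')
    return parser.parse_args(argv)

# e.g. parse_overrides(['--parallelism', '20', '--resource-class', 'large'])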

I also wanted to address a few of the points David made.
"Ekaterina is probably dealing with with her JDK17 work" - if you mean to
ensure we have all jobs for all jdks properly, yes. That was my plan. Until
Derek was so good at suggesting to work on adding missing jobs in CircleCI
now so my work on that will be a bit less for certain things. This is an
effort related to the recent changes in our release document. Ticket
CASSANDRA-17950 <https://issues.apache.org/jira/browse/CASSANDRA-17950> :-)
I am helping with mentoring/reviews. Everyone is welcome to join the party.
"1) resource_class used is not because its needed… in HIGHER file we
default to xlarge but only python upgrade tests need that… reported in
CASSANDRA-17600" - one of the reasons. we had the MIDRES in the first place
as I mentioned in my other email the other day. [1]

"our current patching allows MID/HIGHER to drift as changes need new
patches else patching may do the wrong thing… reported in CASSANDRA-17600"
- I'd say the patching is annoying sometimes, indeed but with/without the
patching any changes to config mean we need to check it by reading through
diff and pushing a run to CI before commit. With that said I am all in for
automation but this will not change the fact we need to push test runs and
verify the changes did not hurt us in a way. Same as testing patches on all
branches, running all needed tests and confirming no regressions. Nothing
new or changing here IMHO

"CI is a combinatorial problem, we need to run all jobs for all JDKs, vnode
on/of, cdc on/off, compression on/of, etc…. But this is currently
controlled and fleshed out by humans who want to add new jobs.. we should
move away from maintaining .circleci/config-2_1.yml and instead
auto-generate it. Simple example of this problem is jdk11 support… we run a
subset of tests on jdk11 and say its supported… will jdk17 have the same
issue? Will it be even less tests? Why does the burden lie on everyone to
“do the right thing” when all they want is a simple job?"
 Controlled and fleshed out by humans it will always be, but I agree we need to
automate the steps to make it easier for people to add most of the
combinations and not to skip any because it is too much work. We will
always need a human to decide which jdks, cdc, vnodes, etc. With that said
I shared your ticket/patch with Derek as he had similar thoughts, we need
to get back to that one at some point. (CASSANDRA-17600) Thanks for working
on that!

"why do we require people to install “circleci” command to contribute? If
you rename .circleci/config-2_1.yml to .circleci/config.yml then CI will
work just fine… we don’t need to call “circleci config process” every time
we touch circle config…. Also, seems that w/e someone new to circle config
(but not cassandra) touch it they always mutate LOW/MID/HIGH and not
.circleci/config-2_1.yml… so I keep going back to fix
.circleci/config-2_1.yml…."
I'd say config-2_1.yml is mainly for those who will make permanent changes
to config (like adding/removing jobs). config-2_1.yml is actually created
as per the CircleCI automation rules - 1st we add and reuse executors,
parameters and commands but I think we can reduce further things if we add
even more parameters probably. I have to look more into the current file. I
am sure there is room for further improvement. 2nd circleci cli tool can
verify the config file for errors and helps with debugging before we push
to CircleCI. There is circleci config validate. If we make changes manually
we are on our own to verify the long yml and also deal with duplication in
config.yml. My concern is that things that need to be almost identical
might start to diverge easier. Though I made my suggestion in point 1 for
what cases probably we can add menu options that potentially will not
require using circleci cli tool. There might be more cases though.
Currently config-2_1.yml is 2256 lines while config.yml is 5793 lines. I'd
say lots of duplication there
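
As a concrete example of that flow, a small wrapper around the two commands
mentioned above could look like this sketch (it assumes the circleci CLI is
installed and on PATH; the wrapper itself is only illustrative):

import subprocess

def regenerate(src='.circleci/config-2_1.yml', dst='.circleci/config.yml'):
    # Expand the short config and validate the result before committing it.
    processed = subprocess.run(['circleci', 'config', 'process', src],
                               check=True, capture_output=True, text=True).stdout
    with open(dst, 'w') as f:
        f.write(processed)
    subprocess.run(['circleci', 'config', 'validate', dst], check=True)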

[1] https://lists.apache.org/thread/htxoh60zt8zxc4vgxj9zh71trk0zxwhl

On Wed, 19 Oct 2022 at 17:20, David Capwell <dc...@apple.com> wrote:

> 1. Tune parallelism levels per job (David and Ekaterina have insight on
> this)
>
>
> +1 to this!  I drastically lower our parallelism as only python-dtest
> upgrade tests need many resources…
>
> What I do for JVM unit/jvm-dtest is the following
>
> def java_parallelism(src_dir, kind, num_file_in_worker, include = lambda a, b: True):
>     d = os.path.join(src_dir, 'test', kind)
>     num_files = 0
>     for root, dirs, files in os.walk(d):
>         for f in files:
>             if f.endswith('Test.java') and include(os.path.join(root, f), f):
>                 num_files += 1
>     return math.floor(num_files / num_file_in_worker)
>
> def fix_parallelism(args, contents):
>     jobs = contents['jobs']
>
>     unit_parallelism                = java_parallelism(args.src, 'unit', 20)
>     jvm_dtest_parallelism           = java_parallelism(args.src, 'distributed', 4, lambda full, name: 'upgrade' not in full)
>     jvm_dtest_upgrade_parallelism   = java_parallelism(args.src, 'distributed', 2, lambda full, name: 'upgrade' in full)
>
> TL;DR - I find all test files we are going to run, and based off a
> pre-defined variable that says “idea” number of files per worker, I then
> calculate how many workers we need.  So unit tests are num_files / 20 ~= 35
> workers.  Can I be “smarter” by knowing which files have higher cost?
> Sure… but the “perfect” and the “average” are too similar that it wasn’t
> worth it...
>
> 2. Rename jobs on circle to be more indicative of their function
>
>
> Have an example?  I am not against, I just don’t know the problem you
> are referring to.
>
> 3. Unify j8 and j11 workflow pairs into single
>
>
> Fine by me, but we need to keep in mind j17 is coming.  Also, most
> developmental CI builds don’t really need to run cross every JDK, so we
> need some way to disable different JDKs…
>
> When I am testing out a patch I tend to run the following (my script):
> "circleci-enable.py --no-jdk11”; this will remove the JDK11 builds.  I
> know I am going to run them pre-merge so I know its safe for me.
>
> 5. Flag on generate.sh to allow auto-run on push
>
>
> I really hate that we don’t do this by default… I still to this day
> strongly feel you should *opt-out* of CI rather than opt-in… seen several
> commits get merged as they didn’t see a error in circle… because circle
> didn’t do any work…. Yes, I am fully aware that I am beating a dead horse…
>
> TL;DR +1
>
> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE]
> temporary circleci config" as the commit message
>
>
> +0 from me… I have seen people not realize you have to commit after
> typing “higher” (wrapper around my personal circleci-enable.py script to
> apply my defaults to the build) but not an issue I have… so I don’t mind if
> people want the tool to integrate with git…
>
>
> With all that said, I do feel there is more, and something I feel
> Ekaterina is probably dealing with with her JDK17 work…
>
> 1) resource_class used is not because its needed… in HIGHER file we
> default to xlarge but only python upgrade tests need that… reported in
> CASSANDRA-17600
> 2) our current patching allows MID/HIGHER to drift as changes need new
> patches else patching may do the wrong thing… reported in CASSANDRA-17600
> 3) CI is a combinatorial problem, we need to run all jobs for all JDKs,
> vnode on/of, cdc on/off, compression on/of, etc…. But this is currently
> controlled and fleshed out by humans who want to add new jobs..  we should
> move away from maintaining .circleci/config-2_1.yml and instead
> auto-generate it.  Simple example of this problem is jdk11 support… we run
> a subset of tests on jdk11 and say its supported… will jdk17 have the same
> issue?  Will it be even less tests?  Why does the burden lie on everyone
> to “do the right thing” when all they want is a simple job?
> 4) why do we require people to install “circleci” command to contribute?
> If you rename .circleci/config-2_1.yml to .circleci/config.yml then CI will
> work just fine… we don’t need to call “circleci config process” every time
> we touch circle config…. Also, seems that w/e someone new to circle config
> (but not cassandra) touch it they always mutate LOW/MID/HIGH and
> not .circleci/config-2_1.yml… so I keep going back to
> fix .circleci/config-2_1.yml….
>
>
> On Oct 19, 2022, at 1:32 PM, Miklosovic, Stefan <
> Stefan.Miklosovic@netapp.com> wrote:
>
> 1) would be nice to have. The first thing I do is that I change the
> parallelism to 20. None of committed config.yaml's are appropriate for our
> company CircleCI so I have to tweak this manually. I think we can not run
> more that 25/30 containers in parallel, something like that. HIGHRES has
> 100 and MIDRES has some jobs having parallelism equal to 50 or so so that
> is not good either. I would be happy with simple way to modify default
> config.yaml on parallelism. I use "sed" to change parallelism: 4 to
> parallelism: 20 and leave parallelism: 1 where it does not make sense to
> increase it. However I noticed that there is not "4" set everywhere, some
> jobs have it set to "1" so I have to take extra care of these cases (I
> consider that to be a bug, I think there are two or three, I do not
> remember). Once set, I have that config in "git stash" so I just apply it
> every time I need it.
>
> 5) would be nice too.
> 7) is nice but not crucial, it takes no time to commit that.
>
> ________________________________________
> From: Josh McKenzie <jm...@apache.org>
> Sent: Wednesday, October 19, 2022 21:50
> To: dev
> Subject: [DISCUSS] Potential circleci config and workflow changes
>
> NetApp Security WARNING: This is an external email. Do not click links or
> open attachments unless you recognize the sender and know the content is
> safe.
>
>
>
> While working w/Andres on CASSANDRA-17939 a variety of things came up
> regarding our circleci config and opportunities to improve it. Figured I'd
> hit the list up here to see what people's thoughts are since many of us
> intersect with these systems daily and having your workflow disrupted
> without having a chance to provide input is bad.
>
> The ideas:
> 1. Tune parallelism levels per job (David and Ekaterina have insight on
> this)
> 2. Rename jobs on circle to be more indicative of their function
> 3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see:
> https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595
> )
> 4. Update documentation w/guidance on using circle, .circleci/generate.sh
> examples, etc
> 4a. How to commit:
> https://cassandra.apache.org/_/development/how_to_commit.html
> 4b. Testing: https://cassandra.apache.org/_/development/testing.html
> 5. Flag on generate.sh to allow auto-run on push
> 6. Clean up the -l, -m, -h flags (test and indicate -l feasibility for all
> suites, default to -m, deprecate -h?) <- may not be a code-change issue and
> instead be a documentation issue
> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE]
> temporary circleci config" as the commit message
>
> Curious to see what folks think.
>
> ~Josh
>
>
>

Re: [DISCUSS] Potential circleci config and workflow changes

Posted by David Capwell <dc...@apple.com>.
> 1. Tune parallelism levels per job (David and Ekaterina have insight on this)

+1 to this!  I drastically lower our parallelism as only python-dtest upgrade tests need many resources…

What I do for JVM unit/jvm-dtest is the following

import math
import os

# Count the *Test.java files under test/<kind> and derive a worker count from
# the desired number of files per worker.
def java_parallelism(src_dir, kind, num_file_in_worker, include = lambda a, b: True):
    d = os.path.join(src_dir, 'test', kind)
    num_files = 0
    for root, dirs, files in os.walk(d):
        for f in files:
            if f.endswith('Test.java') and include(os.path.join(root, f), f):
                num_files += 1
    return math.floor(num_files / num_file_in_worker)

def fix_parallelism(args, contents):
    jobs = contents['jobs']

    unit_parallelism                = java_parallelism(args.src, 'unit', 20)
    jvm_dtest_parallelism           = java_parallelism(args.src, 'distributed', 4, lambda full, name: 'upgrade' not in full)
    jvm_dtest_upgrade_parallelism   = java_parallelism(args.src, 'distributed', 2, lambda full, name: 'upgrade' in full)

TL;DR - I find all the test files we are going to run, and based off a pre-defined variable that says the “ideal” number of files per worker, I then calculate how many workers we need.  So unit tests are num_files / 20 ~= 35 workers.  Can I be “smarter” by knowing which files have higher cost?  Sure… but the “perfect” and the “average” are similar enough that it wasn’t worth it...
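
The write-back step that the snippet above cuts off could look roughly like
the following sketch; it assumes PyYAML, a top-level 'jobs' map in the
generated config, and placeholder job names rather than the real ones:

import yaml

def apply_parallelism(config_path, overrides):
    # overrides: {job_name: parallelism}, e.g. the values computed above
    with open(config_path) as f:
        contents = yaml.safe_load(f)
    for job_name, parallelism in overrides.items():
        job = contents['jobs'].get(job_name)
        if job is not None and job.get('parallelism', 1) > 1:
            job['parallelism'] = parallelism  # only touch jobs that already fan out
    with open(config_path, 'w') as f:
        yaml.dump(contents, f, default_flow_style=False)

# e.g. apply_parallelism('.circleci/config.yml', {'j8_unit_tests': unit_parallelism})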

> 2. Rename jobs on circle to be more indicative of their function


Have an example?  I am not against, I just don’t know the problem you are referring to.

> 3. Unify j8 and j11 workflow pairs into single


Fine by me, but we need to keep in mind j17 is coming.  Also, most developmental CI builds don’t really need to run across every JDK, so we need some way to disable different JDKs…

When I am testing out a patch I tend to run the following (my script): "circleci-enable.py --no-jdk11”; this will remove the JDK11 builds.  I know I am going to run them pre-merge so I know it's safe for me.
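
As a rough illustration of what such a filter can boil down to (this is a
sketch, not the actual circleci-enable.py, and the assumption that JDK11
jobs have 'j11' in their names is just that, an assumption):

def drop_jdk11(contents):
    # Sketch only: remove jobs and workflow entries whose names contain 'j11'.
    contents['jobs'] = {name: job for name, job in contents['jobs'].items()
                        if 'j11' not in name}
    for workflow in contents.get('workflows', {}).values():
        if isinstance(workflow, dict) and 'jobs' in workflow:
            workflow['jobs'] = [entry for entry in workflow['jobs']
                                if 'j11' not in _entry_name(entry)]
    return contents

def _entry_name(entry):
    # workflow job entries are either plain strings or single-key dicts
    return entry if isinstance(entry, str) else next(iter(entry))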

> 5. Flag on generate.sh to allow auto-run on push


I really hate that we don’t do this by default… I still to this day strongly feel you should opt-out of CI rather than opt-in… I've seen several commits get merged as they didn’t see an error in circle… because circle didn’t do any work…. Yes, I am fully aware that I am beating a dead horse… 

TL;DR +1

> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE] temporary circleci config" as the commit message


+0 from me… I have seen people not realize you have to commit after typing “higher” (wrapper around my personal circleci-enable.py script to apply my defaults to the build) but not an issue I have… so I don’t mind if people want the tool to integrate with git…


With all that said, I do feel there is more, and something I feel Ekaterina is probably dealing with in her JDK17 work…

1) resource_class used is not because it's needed… in the HIGHER file we default to xlarge but only python upgrade tests need that… reported in CASSANDRA-17600
2) our current patching allows MID/HIGHER to drift, as changes need new patches else patching may do the wrong thing… reported in CASSANDRA-17600
3) CI is a combinatorial problem, we need to run all jobs for all JDKs, vnode on/off, cdc on/off, compression on/off, etc…. But this is currently controlled and fleshed out by humans who want to add new jobs..  we should move away from maintaining .circleci/config-2_1.yml and instead auto-generate it (see the sketch after this list).  Simple example of this problem is jdk11 support… we run a subset of tests on jdk11 and say it's supported… will jdk17 have the same issue?  Will it be even fewer tests?  Why does the burden lie on everyone to “do the right thing” when all they want is a simple job?
4) why do we require people to install the “circleci” command to contribute?  If you rename .circleci/config-2_1.yml to .circleci/config.yml then CI will work just fine… we don’t need to call “circleci config process” every time we touch circle config…. Also, it seems that whenever someone new to circle config (but not cassandra) touches it they always mutate LOW/MID/HIGH and not .circleci/config-2_1.yml… so I keep going back to fix .circleci/config-2_1.yml….
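
To make point 3 a bit more concrete, generating the matrix rather than
hand-writing it could look roughly like the sketch below; the axes, naming
and steps are illustrative assumptions, not a proposal for the real generator:

from itertools import product

JDKS        = ['jdk8', 'jdk11']            # 'jdk17' just becomes one more entry
VNODES      = ['vnodes', 'novnodes']
COMPRESSION = ['compression', 'nocompression']

def generate_jobs(suite):
    # One job per combination, instead of a human remembering to add each one.
    jobs = {}
    for jdk, vnode, compression in product(JDKS, VNODES, COMPRESSION):
        name = '{}_{}_{}_{}'.format(jdk, suite, vnode, compression)
        jobs[name] = {
            'executor': jdk,               # executor per JDK, defined elsewhere
            'parallelism': 4,
            'steps': ['checkout', {'run': 'ant ' + suite}],
        }
    return jobs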


> On Oct 19, 2022, at 1:32 PM, Miklosovic, Stefan <St...@netapp.com> wrote:
> 
> 1) would be nice to have. The first thing I do is that I change the parallelism to 20. None of committed config.yaml's are appropriate for our company CircleCI so I have to tweak this manually. I think we can not run more that 25/30 containers in parallel, something like that. HIGHRES has 100 and MIDRES has some jobs having parallelism equal to 50 or so so that is not good either. I would be happy with simple way to modify default config.yaml on parallelism. I use "sed" to change parallelism: 4 to parallelism: 20 and leave parallelism: 1 where it does not make sense to increase it. However I noticed that there is not "4" set everywhere, some jobs have it set to "1" so I have to take extra care of these cases (I consider that to be a bug, I think there are two or three, I do not remember). Once set, I have that config in "git stash" so I just apply it every time I need it.
> 
> 5) would be nice too.
> 7) is nice but not crucial, it takes no time to commit that.
> 
> ________________________________________
> From: Josh McKenzie <jm...@apache.org>
> Sent: Wednesday, October 19, 2022 21:50
> To: dev
> Subject: [DISCUSS] Potential circleci config and workflow changes
> 
> NetApp Security WARNING: This is an external email. Do not click links or open attachments unless you recognize the sender and know the content is safe.
> 
> 
> 
> While working w/Andres on CASSANDRA-17939 a variety of things came up regarding our circleci config and opportunities to improve it. Figured I'd hit the list up here to see what people's thoughts are since many of us intersect with these systems daily and having your workflow disrupted without having a chance to provide input is bad.
> 
> The ideas:
> 1. Tune parallelism levels per job (David and Ekaterina have insight on this)
> 2. Rename jobs on circle to be more indicative of their function
> 3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see: https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595)
> 4. Update documentation w/guidance on using circle, .circleci/generate.sh examples, etc
> 4a. How to commit: https://cassandra.apache.org/_/development/how_to_commit.html
> 4b. Testing: https://cassandra.apache.org/_/development/testing.html
> 5. Flag on generate.sh to allow auto-run on push
> 6. Clean up the -l, -m, -h flags (test and indicate -l feasibility for all suites, default to -m, deprecate -h?) <- may not be a code-change issue and instead be a documentation issue
> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE] temporary circleci config" as the commit message
> 
> Curious to see what folks think.
> 
> ~Josh
> 


Re: [DISCUSS] Potential circleci config and workflow changes

Posted by "Miklosovic, Stefan" <St...@netapp.com>.
1) would be nice to have. The first thing I do is change the parallelism to 20. None of the committed config.yaml's are appropriate for our company CircleCI so I have to tweak this manually. I think we can not run more than 25/30 containers in parallel, something like that. HIGHRES has 100 and MIDRES has some jobs with parallelism equal to 50 or so, so that is not good either. I would be happy with a simple way to modify the default config.yaml's parallelism. I use "sed" to change parallelism: 4 to parallelism: 20 and leave parallelism: 1 where it does not make sense to increase it. However I noticed that "4" is not set everywhere, some jobs have it set to "1" so I have to take extra care of these cases (I consider that to be a bug, I think there are two or three, I do not remember). Once set, I have that config in "git stash" so I just apply it every time I need it. (A small scripted version of this tweak is sketched at the end of this message.)

5) would be nice too.
7) is nice but not crucial, it takes no time to commit that.
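
On 1), the sed step can also be scripted so the parallelism: 1 jobs are
skipped automatically; a small sketch (the path and target value are only
illustrative):

import re
import sys

def retune(path, target=20):
    # Bump every job that already fans out (parallelism >= 2); leave 1 as-is.
    with open(path) as f:
        text = f.read()
    text = re.sub(r'parallelism: (\d+)',
                  lambda m: m.group(0) if int(m.group(1)) <= 1
                  else 'parallelism: {}'.format(target),
                  text)
    with open(path, 'w') as f:
        f.write(text)

if __name__ == '__main__':
    retune(sys.argv[1] if len(sys.argv) > 1 else '.circleci/config.yml')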

________________________________________
From: Josh McKenzie <jm...@apache.org>
Sent: Wednesday, October 19, 2022 21:50
To: dev
Subject: [DISCUSS] Potential circleci config and workflow changes

NetApp Security WARNING: This is an external email. Do not click links or open attachments unless you recognize the sender and know the content is safe.



While working w/Andres on CASSANDRA-17939 a variety of things came up regarding our circleci config and opportunities to improve it. Figured I'd hit the list up here to see what people's thoughts are since many of us intersect with these systems daily and having your workflow disrupted without having a chance to provide input is bad.

The ideas:
1. Tune parallelism levels per job (David and Ekaterina have insight on this)
2. Rename jobs on circle to be more indicative of their function
3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see: https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595)
4. Update documentation w/guidance on using circle, .circleci/generate.sh examples, etc
4a. How to commit: https://cassandra.apache.org/_/development/how_to_commit.html
4b. Testing: https://cassandra.apache.org/_/development/testing.html
5. Flag on generate.sh to allow auto-run on push
6. Clean up the -l, -m, -h flags (test and indicate -l feasibility for all suites, default to -m, deprecate -h?) <- may not be a code-change issue and instead be a documentation issue
7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE] temporary circleci config" as the commit message

Curious to see what folks think.

~Josh