
Posted to commits@cassandra.apache.org by "Josh McKenzie (Jira)" <ji...@apache.org> on 2022/11/02 19:44:00 UTC

[jira] [Updated] (CASSANDRA-18009) Tune parallelism for circleci jobs

     [ https://issues.apache.org/jira/browse/CASSANDRA-18009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh McKenzie updated CASSANDRA-18009:
--------------------------------------
    Component/s: CI
                     (was: Test/dtest/java)
                     (was: Test/dtest/python)
                     (was: Test/unit)

> Tune parallelism for circleci jobs
> ----------------------------------
>
>                 Key: CASSANDRA-18009
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18009
>             Project: Cassandra
>          Issue Type: Task
>          Components: CI
>            Reporter: Josh McKenzie
>            Priority: Normal
>
> We should tune the parallelism parameters in our circleci config so that each job gets a sensible number of workers. From the email / slack conversations on the topic:
> {code}
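> > import math
> > import os
> > 
> > def java_parallelism(src_dir, kind, num_file_in_worker, include=lambda a, b: True):
> >     # Count the *Test.java files under <src_dir>/test/<kind> that pass the include filter.
> >     d = os.path.join(src_dir, 'test', kind)
> >     num_files = 0
> >     for root, dirs, files in os.walk(d):
> >         for f in files:
> >             if f.endswith('Test.java') and include(os.path.join(root, f), f):
> >                 num_files += 1
> >     # One worker per num_file_in_worker files, rounded down.
> >     return math.floor(num_files / num_file_in_worker)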
> > 
> > def fix_parallelism(args, contents):
> >     jobs = contents['jobs']
> > 
> >     unit_parallelism                = java_parallelism(args.src, 'unit', 20)
> >     jvm_dtest_parallelism           = java_parallelism(args.src, 'distributed', 4, lambda full, name: 'upgrade' not in full)
> >     jvm_dtest_upgrade_parallelism   = java_parallelism(args.src, 'distributed', 2, lambda full, name: 'upgrade' in full)
> {code}
> bq. `TL;DR - I find all test files we are going to run, and based off a pre-defined variable that says the “ideal” number of files per worker, I then calculate how many workers we need. So unit tests are num_files / 20 ~= 35 workers. Can I be “smarter” by knowing which files have higher cost? Sure… but the “perfect” and the “average” are so similar that it wasn’t worth it...`
> Quoting [~dcapwell]
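
The calculation described in the quote can be tried out with a small standalone sketch. This is not the project's actual script: the helper names (count_test_files, workers_needed, make_demo_tree), the demo directory layout, and the figure of roughly 700 unit test files (inferred from "num_files / 20 ~= 35") are all assumptions made for illustration. It also differs from the quoted snippet in two ways: the filter receives a path relative to the test directory, and the result is clamped to at least one worker, since a plain floor gives 0 when there are fewer files than the per-worker target.

```python
import math
import os
import tempfile


def count_test_files(test_dir, include=lambda rel_path, name: True):
    """Count *Test.java files under test_dir accepted by the include filter."""
    num_files = 0
    for root, _dirs, files in os.walk(test_dir):
        for name in files:
            # Filter on the path relative to test_dir so that the location of
            # the checkout itself cannot affect the match (a deviation from
            # the quoted snippet, which passes the full path).
            rel_path = os.path.relpath(os.path.join(root, name), test_dir)
            if name.endswith('Test.java') and include(rel_path, name):
                num_files += 1
    return num_files


def workers_needed(num_files, files_per_worker):
    """Floor division as in the quoted snippet, clamped to a minimum of 1.

    The clamp is an addition for this sketch: plain floor() yields 0 workers
    when there are fewer files than files_per_worker.
    """
    return max(1, math.floor(num_files / files_per_worker))


def make_demo_tree(root, num_regular, num_upgrade):
    """Create a hypothetical test/distributed layout with empty test files."""
    regular = os.path.join(root, 'test', 'distributed', 'test')
    upgrade = os.path.join(root, 'test', 'distributed', 'upgrade')
    os.makedirs(regular)
    os.makedirs(upgrade)
    for i in range(num_regular):
        open(os.path.join(regular, f'Demo{i}Test.java'), 'w').close()
    for i in range(num_upgrade):
        open(os.path.join(upgrade, f'Demo{i}Test.java'), 'w').close()
    # A helper class that must not be counted: it does not end in Test.java.
    open(os.path.join(regular, 'TestUtil.java'), 'w').close()


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as src:
        make_demo_tree(src, num_regular=10, num_upgrade=4)
        d = os.path.join(src, 'test', 'distributed')
        regular = count_test_files(d, lambda rel, name: 'upgrade' not in rel)
        upgrade = count_test_files(d, lambda rel, name: 'upgrade' in rel)
        print('jvm dtest:', regular, 'files,', workers_needed(regular, 4), 'workers')
        print('jvm upgrade dtest:', upgrade, 'files,', workers_needed(upgrade, 2), 'workers')
    # The quoted estimate, assuming about 700 unit test files at 20 per worker.
    print('unit:', workers_needed(700, 20), 'workers')
```

Keeping the file count separate from the division makes it easy to experiment with different files-per-worker targets without walking the tree again.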



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
