Posted to user@poi.apache.org by Andreas Reichel <an...@manticore-projects.com> on 2021/11/01 07:36:55 UTC

Running tests in parallel?

Greetings.

Pardon me for asking: am I right that most of the use-case tests are
executed serially (only)?
Executing the tests seems to take a surprisingly long time (8+
minutes?!) at almost no load on my CPU cores.

If my observation is right: is that intentional, and why would we not
run the tests in parallel?
With my limited understanding, most of the tests create a workbook,
sheets and cells, do something and then assert. I think this could be
done in parallel, unless multiple tests access the same workbook?
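
For context, JUnit 5 (which the POI build configures, as a later mail in this thread shows) can opt into parallel execution through configuration parameters. A minimal, hypothetical sketch of what that looks like in a `junit-platform.properties` file (POI itself passes the equivalent settings as JVM system properties instead):

```properties
# Hypothetical junit-platform.properties sketch
junit.jupiter.execution.parallel.enabled = true
# run test classes and methods concurrently by default
junit.jupiter.execution.parallel.mode.default = concurrent
# size the worker pool from the number of available processors
junit.jupiter.execution.parallel.config.strategy = dynamic
```

Tests that share mutable state (e.g. the same workbook file) would still need serializing, for example with `@Execution(ExecutionMode.SAME_THREAD)` or a `@ResourceLock` on the shared resource.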

(I am new to Apache projects and I am not a developer, so please go
easy on me.)

Cheers
Andreas


Re: Running tests in parallel?

Posted by Andreas Beeker <ki...@apache.org>.
Hi Andreas,

the two files build/ooxml-lite-report.clazz & .xsb are generated via the OOXMLLiteAgent.
The agent is used on the modules with XmlBeans dependencies (poi-ooxml, poi-excelant and poi-integration) and incrementally gathers the used XmlBeans classes and .xsb files.

I haven't explicitly tried build caching yet, but I assume it would skip generating poi-ooxml-full if the schemas haven't changed.

AFAIK poi-ooxml-lite is not used in the main Gradle build, and we still need to migrate the poi-integration/distsourcebuild (build.xml) Ant and Jenkins jobs to actually test the lite jar. Maybe we can exclude the poi-ooxml-lite tasks in the "check" phase and only activate them in the "jenkins" phase.
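
A hedged sketch of what such conditional wiring could look like in the Gradle build (the property name and task path here are illustrative, not taken from the actual build script):

```groovy
// Hypothetical build.gradle fragment: only hook the poi-ooxml-lite
// tasks into the "check" lifecycle when a -Pjenkins property is set.
if (project.hasProperty('jenkins')) {
    tasks.named('check').configure {
        dependsOn ':poi-ooxml-lite:build'
    }
}
```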

Best wishes,
Andi



---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Running tests in parallel?

Posted by Andreas Reichel <an...@manticore-projects.com>.
Dear All,

I submitted PR #275, which reduced the build time by an estimated 27%.
It's low-hanging fruit, just using parallel building.

However, I was not able to address the big elephant in the room: Gradle
Build Caching.
When activating it, the subproject `POI-OOXML-LITE` will fail on the
task `generateModuleInfo` when a file is not found:
`Property '$1' specifies file '/home/are/Documents/src/poi/build/ooxml-lite-report.clazz' which doesn't exist.`

Still, I would love to drive that further, because when I excluded
POI-OOXML-LITE from the project, I was able to activate Gradle Build
Caching, and a `gradle clean jar` (without changes) rebuilt in 14
seconds instead of the optimized 6:20 minutes.

(I understand, of course, that this works only when nothing has
changed, but that is a typical development scenario: you change only a
few lines and then want to run your tests as fast as possible before
moving on. Without caching, a clean jar build ALWAYS takes 6:20, even
when nothing has changed!)

So in my limited understanding, POI-OOXML-LITE:generateModuleInfo
seems to be the only showstopper.
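
For reference, the build cache itself is switched on with a one-line setting; a minimal sketch, assuming a standard Gradle setup:

```properties
# Hypothetical gradle.properties fragment.
# Reuse cached task outputs instead of rebuilding unchanged tasks:
org.gradle.caching=true
# Run independent subprojects in parallel, as in the parallel-building
# change mentioned above:
org.gradle.parallel=true
```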

My question is: can anyone tell me which step/task creates the file

`File clazzFile = file("${OOXML_LITE_REPORT}.clazz")`

I would love to try my luck and force that step/task to run before
POI-OOXML-LITE:generateModuleInfo kicks in.
Build Caching looks too sweet for me to pass up.
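
If the producing task can be identified, forcing it to run first could in principle be a one-line dependency declaration. A hypothetical sketch, where `produceOoxmlLiteReport` is a placeholder name, since the real producer is exactly the open question here:

```groovy
// Hypothetical build.gradle fragment: 'produceOoxmlLiteReport' stands
// in for whichever task actually writes build/ooxml-lite-report.clazz.
tasks.named('generateModuleInfo').configure {
    dependsOn 'produceOoxmlLiteReport'
}
```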

Thanks in advance for any advice, and cheers
Andreas

Re: Running tests in parallel?

Posted by Andreas Reichel <an...@manticore-projects.com>.
Dear All.

On Mon, 2021-11-01 at 14:36 +0700, Andreas Reichel wrote:
> Am I right that most of the Use Cases tests are executed serially
> (only)?

Looks like I was semi-right:
// set heap size for the test JVM(s)
minHeapSize = "128m"
maxHeapSize = "1512m"

// Specifying the locale via system properties did not work, so we set them this way
jvmArgs << [
    '-Djava.io.tmpdir=build',
    '-DPOI.testdata.path=../test-data',
    '-Djava.awt.headless=true',
    '-Djava.locale.providers=JRE,CLDR',
    '-Duser.language=en',
    '-Duser.country=US',
    '-Djavax.xml.stream.XMLInputFactory=com.sun.xml.internal.stream.XMLInputFactoryImpl',
    "-Dversion.id=${project.version}",
    '-ea',
    '-Djunit.jupiter.execution.parallel.config.strategy=fixed',
    '-Djunit.jupiter.execution.parallel.config.fixed.parallelism=2'
    // -Xjit:verbose={compileStart|compileEnd},vlog=build/jit.log${no.jit.sherlock}   ... if ${isIBMVM}
]


Questions, please:

1) Why do we not allocate maxHeapSize dynamically based on the free
memory of the OS, e.g. use 50% of that memory?
2) Why do we not use all CPU cores, but just 2? What advantage do
the following lines have:
'-Djunit.jupiter.execution.parallel.config.strategy=fixed',
'-Djunit.jupiter.execution.parallel.config.fixed.parallelism=2'
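
For what it's worth, both values could in principle be derived from the machine; a hypothetical Gradle sketch, not taken from the POI build:

```groovy
// Hypothetical build.gradle fragment: size test parallelism from the
// host instead of hard-coding it.
def cores = Runtime.runtime.availableProcessors()

tasks.withType(Test).configureEach {
    // fork up to half the cores as separate test JVMs
    maxParallelForks = Math.max(1, cores.intdiv(2))
    // let JUnit 5 size its worker pool dynamically instead of fixed=2
    systemProperty 'junit.jupiter.execution.parallel.config.strategy',
                   'dynamic'
}
```

One reason for a fixed, small parallelism is reproducibility: CI agents and contributor machines differ, and memory-hungry test JVMs can starve each other if every core gets its own fork.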

Best regards
Andreas