You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tika.apache.org by "Xinbo Lu (Jira)" <ji...@apache.org> on 2023/11/03 03:40:00 UTC
[jira] [Updated] (TIKA-4165) Fix a flaky test, caused by nondeterministic iteration order of HashMap
[ https://issues.apache.org/jira/browse/TIKA-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xinbo Lu updated TIKA-4165:
---------------------------
Attachment: diff
> Fix a flaky test, caused by nondeterministic iteration order of HashMap
> -----------------------------------------------------------------------
>
> Key: TIKA-4165
> URL: https://issues.apache.org/jira/browse/TIKA-4165
> Project: Tika
> Issue Type: Improvement
> Reporter: Xinbo Lu
> Priority: Minor
> Attachments: diff
>
>
> Fix a flaky test, caused by nondeterministic iteration order of HashMap
> h3. Related Test
> [org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest.testVarious|https://github.com/lxb007981/tika/blob/971f0cbd9b46c1d7fb96b0a3732c3fc870920aba/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSParserTest.java#L59]
> h3. Root Cause
>
> [tika/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSExtractorDecorator.java|https://github.com/lxb007981/tika/blob/d466492e0a01c8ee28c108bc3022f1a03ff530de/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSExtractorDecorator.java#L119]
> Line 119 in [d466492|https://github.com/lxb007981/tika/commit/d466492e0a01c8ee28c108bc3022f1a03ff530de]
> ||for (Map.Entry<String, Metadata> embeddedImage : embeddedImages.entrySet()) {||
>
> When recursively extracting metadata from an XPS file, a simple {{HashMap}} is used and iterated. However, note that {{HashMap}} does not gurantee the order of iteration, the extracted metadata have no guaranteed order in the resulting metadata list. And later in the test {{{}org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest.testVarious{}}}, the test assumes the order of metadata in the list is the same as the {{HashMap}} insertion order, thus renders the test flaky.
> h3. Fix
> We sort the metadata list before doing comparison.
> h3. How to reproduce the test
> {*}Java version{*}: 11.0.20.1
> {*}Maven version{*}: 3.6.3
> # Build the module
> {{mvn clean install -DskipTests -pl tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module -am}}
> # Test without shuffling
> {{mvn -pl tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module test -Dtest=org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest#testVarious}}
> This test passed.
> # Test with shuffling using [NonDex|https://github.com/TestingResearchIllinois/NonDex]
> {{{}mvn -pl tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module edu.illinois:nondex-maven-plugin:2.1.1:nondex -Dtest=org.apache.tika.parser.microsoft.ooxml.xps.XPSParserTest#testVarious{}}}This test passed with the proposed fix but failed without it.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)