Posted to dev@tika.apache.org by Peter Kronenberg <pe...@torch.ai> on 2021/04/15 14:04:02 UTC

Test failure

We're getting a test failure. I don't see any recent check-ins that would be causing this, so maybe it's been there for a while (I don't always run the tests).

[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR]   TesseractOCRParserTest.testOCROutputsHOCR:105->TikaTest.assertContains:79 <span class="ocrx_word" id="word_1_1" not found in:
<html xmlns=http://www.w3.org/1999/xhtml>
<head>
<meta name="pdf:docinfo:custom:AAPL:Keywords" content="" />
<meta name="pdf:PDFVersion" content="1.3" />
<meta name="pdf:docinfo:title" content="Presentation1" />
<meta name="xmp:CreatorTool" content="PowerPoint" />
<meta name="pdf:hasXFA" content="false" />
<meta name="access_permission:modify_annotations" content="true" />
<meta name="access_permission:can_print_degraded" content="true" />
<meta name="AAPL:Keywords" content="" />
<meta name="dc:creator" content="grantingersoll" />
<meta name="dcterms:created" content="2014-02-08T19:57:12Z" />
<meta name="dcterms:modified" content="2014-02-08T19:57:12Z" />
<meta name="dc:format" content="application/pdf; version=1.3" />
<meta name="pdf:docinfo:creator_tool" content="PowerPoint" />
<meta name="access_permission:fill_in_form" content="true" />
<meta name="pdf:docinfo:keywords" content="" />
<meta name="pdf:docinfo:modified" content="2014-02-08T19:57:12Z" />
<meta name="pdf:encrypted" content="false" />
<meta name="dc:title" content="Presentation1" />
<meta name="cp:subject" content="" />
<meta name="pdf:docinfo:subject" content="" />
<meta name="pdf:hasMarkedContent" content="false" />
<meta name="Content-Type" content="application/pdf" />
<meta name="pdf:docinfo:creator" content="grantingersoll" />
<meta name="dc:subject" content="" />
<meta name="dc:subject" content="" />
<meta name="dc:subject" content="" />
<meta name="dc:subject" content="" />
<meta name="pdf:producer" content="Mac OS X 10.9.1 Quartz PDFContext" />
<meta name="access_permission:extract_for_accessibility" content="true" />
<meta name="access_permission:assemble_document" content="true" />
<meta name="xmpTPg:NPages" content="1" />
<meta name="pdf:hasXMP" content="false" />
<meta name="access_permission:extract_content" content="true" />
<meta name="access_permission:can_print" content="true" />
<meta name="X-TIKA:Parsed-By" content="org.apache.tika.parser.DefaultParser" />
<meta name="X-TIKA:Parsed-By" content="org.apache.tika.parser.pdf.PDFParser" />
<meta name="meta:keyword" content="" />
<meta name="access_permission:can_modify" content="true" />
<meta name="pdf:docinfo:producer" content="Mac OS X 10.9.1 Quartz PDFContext" />
<meta name="pdf:docinfo:created" content="2014-02-08T19:57:12Z" />
<title>Presentation1</title>
</head>
<body><div class="page"><p />
<img src="embedded:image0.png" alt="image0.png" /></div>
</body></html><html xmlns=http://www.w3.org/1999/xhtml>
<head>
<meta name="Transparency Alpha" content="none" />
<meta name="tiff:ImageLength" content="261" />
<meta name="Compression CompressionTypeName" content="deflate" />
<meta name="Data BitsPerSample" content="8 8 8" />
<meta name="Data PlanarConfiguration" content="PixelInterleaved" />
<meta name="Dimension VerticalPixelSize" content="0.35273367" />
<meta name="IHDR" content="width=934, height=261, bitDepth=8, colorType=RGB, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none" />
<meta name="embeddedResourceType" content="INLINE" />
<meta name="Chroma ColorSpaceType" content="RGB" />
<meta name="tiff:BitsPerSample" content="8 8 8" />
<meta name="Content-Type" content="image/png" />
<meta name="height" content="261" />
<meta name="pHYs" content="pixelsPerUnitXAxis=2835, pixelsPerUnitYAxis=2835, unitSpecifier=meter" />
<meta name="Dimension PixelAspectRatio" content="1.0" />
<meta name="resourceName" content="image0.png" />
<meta name="pdf:hasXMP" content="false" />
<meta name="Compression NumProgressiveScans" content="1" />
<meta name="Content-Type-Parser-Override" content="image/ocr-png" />
<meta name="X-TIKA:Parsed-By" content="org.apache.tika.parser.DefaultParser" />
<meta name="X-TIKA:Parsed-By" content="org.apache.tika.parser.image.ImageParser" />
<meta name="X-TIKA:Parsed-By" content="org.apache.tika.parser.ocr.TesseractOCRParser" />
<meta name="Dimension HorizontalPixelSize" content="0.35273367" />
<meta name="Chroma BlackIsZero" content="true" />
<meta name="Compression Lossless" content="true" />
<meta name="X-TIKA:embedded_depth" content="1" />
<meta name="width" content="934" />
<meta name="Dimension ImageOrientation" content="Normal" />
<meta name="X-TIKA:embedded_resource_path" content="/image0.png" />
<meta name="tiff:ImageWidth" content="934" />
<meta name="Chroma NumChannels" content="3" />
<meta name="Data SampleFormat" content="UnsignedIntegral" />
<title></title>
</head>
<body /></html>
[INFO]
[ERROR] Tests run: 305, Failures: 1, Errors: 0, Skipped: 10
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Apache Tika parent 2.0.0-SNAPSHOT:
[INFO]
[INFO] Apache Tika parent ................................. SUCCESS [  2.952 s]
[INFO] Apache Tika core ................................... SUCCESS [ 37.037 s]
[INFO] tika-parsers ....................................... SUCCESS [  0.225 s]
[INFO] Apache Tika classic parser modules and package ..... SUCCESS [  0.500 s]
[INFO] Apache Tika classic parser modules ................. SUCCESS [  0.261 s]
[INFO] tika-parser-html-commons ........................... SUCCESS [  1.773 s]
[INFO] tika-parser-digest-commons ......................... SUCCESS [  0.998 s]
[INFO] tika-parser-mail-commons ........................... SUCCESS [  1.627 s]
[INFO] tika-parser-xmp-commons ............................ SUCCESS [  2.008 s]
[INFO] tika-parser-zip-commons ............................ SUCCESS [  2.405 s]
[INFO] tika-parser-image-module ........................... SUCCESS [  4.140 s]
[INFO] tika-parser-ocr-module ............................. SUCCESS [ 16.227 s]
[INFO] tika-parser-audiovideo-module ...................... SUCCESS [  2.998 s]
[INFO] tika-parser-text-module ............................ SUCCESS [  3.578 s]
[INFO] tika-parser-code-module ............................ SUCCESS [  3.739 s]
[INFO] tika-parser-html-module ............................ SUCCESS [  3.842 s]
[INFO] tika-parser-font-module ............................ SUCCESS [  2.291 s]
[INFO] tika-parser-xml-module ............................. SUCCESS [  2.637 s]
[INFO] tika-parser-microsoft-module ....................... SUCCESS [ 46.829 s]
[INFO] tika-parser-pkg-module ............................. SUCCESS [  3.862 s]
[INFO] tika-parser-pdf-module ............................. SUCCESS [ 15.538 s]
[INFO] tika-parser-apple-module ........................... SUCCESS [  3.497 s]
[INFO] tika-parser-cad-module ............................. SUCCESS [  2.195 s]
[INFO] tika-parser-mail-module ............................ SUCCESS [  9.893 s]
[INFO] tika-parser-miscoffice-module ...................... SUCCESS [  8.474 s]
[INFO] tika-parser-news-module ............................ SUCCESS [  1.982 s]
[INFO] tika-parser-crypto-module .......................... SUCCESS [  2.624 s]
[INFO] Apache Tika classic parser package ................. FAILURE [02:15 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  05:21 min
[INFO] Finished at: 2021-04-15T10:00:49-04:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M4:test (default-test) on project tika-parsers-classic-package: There are test failures.
[ERROR]
[ERROR] Please refer to C:\tika\tika-parsers\tika-parsers-classic\tika-parsers-classic-package\target\surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :tika-parsers-classic-package

c:\tika>
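
For anyone reading the failure line cold: "not found in:" is the kind of message a substring assertion prints when the expected fragment is missing from the parser output. Below is a minimal sketch of that sort of check. It is not Tika's actual TikaTest.assertContains; the class name AssertContainsSketch and the shortened sample output are assumptions made for illustration only.

```java
public class AssertContainsSketch {

    // Throws an AssertionError with a "not found in:" message when needle is absent.
    static void assertContains(String needle, String haystack) {
        if (!haystack.contains(needle)) {
            throw new AssertionError(needle + " not found in:\n" + haystack);
        }
    }

    public static void main(String[] args) {
        // The fragment the failing test looks for (taken from the log above).
        String needle = "<span class=\"ocrx_word\" id=\"word_1_1\"";
        // Shortened, hypothetical stand-in for the real output: the image part has an empty body.
        String haystack = "<html><head><title></title></head><body /></html>";
        try {
            assertContains(needle, haystack);
            System.out.println("hOCR markup present");
        } catch (AssertionError e) {
            System.out.println("FAILED: " + e.getMessage());
        }
    }
}
```

In the log above, the second document lists TesseractOCRParser under X-TIKA:Parsed-By but ends in an empty body, so no hOCR span is present for the check to find.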





Peter Kronenberg  |  Senior AI Analytic ENGINEER
C: 703.887.5623
[Torch AI]<http://www.torch.ai/>
4303 W. 119th St., Leawood, KS 66209
WWW.TORCH.AI<http://www.torch.ai/>



RE: Test failure

Posted by Peter Kronenberg <pe...@torch.ai>.
It wasn't a totally clean pull, but I didn't have anything else there. I had stuff in other branches, but when it failed, I tried it on a clean main. Here's the command line I used:

mvn clean install -pl :tika-parsers-classic-package -am

Since my pull request was processed with no problem, it's clearly not happening on that system.




Re: Test failure

Posted by Tim Allison <ta...@apache.org>.
Hmmmm....I found a couple of other things that I fixed on Windows just now,
but I'm not able to replicate it.  Are you getting that failure with a
clean pull/clone?


Re: Test failure

Posted by Tim Allison <ta...@apache.org>.
Thank you for sharing!

I can't replicate this on Linux, so I'm trying my Windows laptop next.

Unrelated: the XHTML in that output is badly broken. There are two bodies,
because a second <html> document starts right after the first </html>.
That part I can replicate on Linux. I'll open an issue.
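
For anyone who wants to check their own output for the two-body problem, here is a minimal sketch. It is not part of Tika or its test suite; the class name BodyCount and the helper countBodies are made up for illustration, and the sample string is a trimmed-down stand-in for the output quoted below.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not Tika code: counts opening <body> tags in a string.
class BodyCount {
    // Matches "<body>", "<body ...>" and the self-closing "<body />" or "<body/>".
    // It does not match the closing tag "</body>".
    private static final Pattern BODY_OPEN = Pattern.compile("<body[\\s/>]");

    static int countBodies(String xhtml) {
        Matcher m = BODY_OPEN.matcher(xhtml);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Trimmed-down stand-in for the failing output: two <html> documents
        // back to back, the second with an empty body.
        String sample = "<html><head></head><body><div class=\"page\"><p /></div>\n"
                + "</body></html><html><head><title></title></head>\n"
                + "<body /></html>";
        // A well-formed XHTML document should report exactly 1 here.
        System.out.println("body elements: " + countBodies(sample));
    }
}
```

Anything other than 1 means the output is not a single well-formed document, which is the situation in the quoted output.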

On Thu, Apr 15, 2021 at 10:04 AM Peter Kronenberg <pe...@torch.ai>
wrote:

> We’re getting a test failure.  I don’t see any recent check-ins that would
> be causing this, so maybe it’s been there for awhile (I don’t always run
> the tests)
>
>
>
> [INFO] Results:
>
> [INFO]
>
> [ERROR] Failures:
>
> [ERROR]
> TesseractOCRParserTest.testOCROutputsHOCR:105->TikaTest.assertContains:79
> <span class="ocrx_word" id="word_1_1" not found in:
>
> <html xmlns=http://www.w3.org/1999/xhtml>
>
> <head>
>
> <meta name="pdf:docinfo:custom:AAPL:Keywords" content="" />
>
> <meta name="pdf:PDFVersion" content="1.3" />
>
> <meta name="pdf:docinfo:title" content="Presentation1" />
>
> <meta name="xmp:CreatorTool" content="PowerPoint" />
>
> <meta name="pdf:hasXFA" content="false" />
>
> <meta name="access_permission:modify_annotations" content="true" />
>
> <meta name="access_permission:can_print_degraded" content="true" />
>
> <meta name="AAPL:Keywords" content="" />
>
> <meta name="dc:creator" content="grantingersoll" />
>
> <meta name="dcterms:created" content="2014-02-08T19:57:12Z" />
>
> <meta name="dcterms:modified" content="2014-02-08T19:57:12Z" />
>
> <meta name="dc:format" content="application/pdf; version=1.3" />
>
> <meta name="pdf:docinfo:creator_tool" content="PowerPoint" />
>
> <meta name="access_permission:fill_in_form" content="true" />
>
> <meta name="pdf:docinfo:keywords" content="" />
>
> <meta name="pdf:docinfo:modified" content="2014-02-08T19:57:12Z" />
>
> <meta name="pdf:encrypted" content="false" />
>
> <meta name="dc:title" content="Presentation1" />
>
> <meta name="cp:subject" content="" />
>
> <meta name="pdf:docinfo:subject" content="" />
>
> <meta name="pdf:hasMarkedContent" content="false" />
>
> <meta name="Content-Type" content="application/pdf" />
>
> <meta name="pdf:docinfo:creator" content="grantingersoll" />
>
> <meta name="dc:subject" content="" />
>
> <meta name="dc:subject" content="" />
>
> <meta name="dc:subject" content="" />
>
> <meta name="dc:subject" content="" />
>
> <meta name="pdf:producer" content="Mac OS X 10.9.1 Quartz PDFContext" />
>
> <meta name="access_permission:extract_for_accessibility" content="true" />
>
> <meta name="access_permission:assemble_document" content="true" />
>
> <meta name="xmpTPg:NPages" content="1" />
>
> <meta name="pdf:hasXMP" content="false" />
>
> <meta name="access_permission:extract_content" content="true" />
>
> <meta name="access_permission:can_print" content="true" />
>
> <meta name="X-TIKA:Parsed-By"
> content="org.apache.tika.parser.DefaultParser" />
>
> <meta name="X-TIKA:Parsed-By"
> content="org.apache.tika.parser.pdf.PDFParser" />
>
> <meta name="meta:keyword" content="" />
>
> <meta name="access_permission:can_modify" content="true" />
>
> <meta name="pdf:docinfo:producer" content="Mac OS X 10.9.1 Quartz
> PDFContext" />
>
> <meta name="pdf:docinfo:created" content="2014-02-08T19:57:12Z" />
>
> <title>Presentation1</title>
>
> </head>
>
> <body><div class="page"><p />
>
> <img src="embedded:image0.png" alt="image0.png" /></div>
>
> </body></html><html xmlns=http://www.w3.org/1999/xhtml>
>
> <head>
>
> <meta name="Transparency Alpha" content="none" />
>
> <meta name="tiff:ImageLength" content="261" />
>
> <meta name="Compression CompressionTypeName" content="deflate" />
>
> <meta name="Data BitsPerSample" content="8 8 8" />
>
> <meta name="Data PlanarConfiguration" content="PixelInterleaved" />
>
> <meta name="Dimension VerticalPixelSize" content="0.35273367" />
>
> <meta name="IHDR" content="width=934, height=261, bitDepth=8,
> colorType=RGB, compressionMethod=deflate, filterMethod=adaptive,
> interlaceMethod=none" />
>
> <meta name="embeddedResourceType" content="INLINE" />
>
> <meta name="Chroma ColorSpaceType" content="RGB" />
>
> <meta name="tiff:BitsPerSample" content="8 8 8" />
>
> <meta name="Content-Type" content="image/png" />
>
> <meta name="height" content="261" />
>
> <meta name="pHYs" content="pixelsPerUnitXAxis=2835,
> pixelsPerUnitYAxis=2835, unitSpecifier=meter" />
>
> <meta name="Dimension PixelAspectRatio" content="1.0" />
>
> <meta name="resourceName" content="image0.png" />
>
> <meta name="pdf:hasXMP" content="false" />
>
> <meta name="Compression NumProgressiveScans" content="1" />
>
> <meta name="Content-Type-Parser-Override" content="image/ocr-png" />
>
> <meta name="X-TIKA:Parsed-By"
> content="org.apache.tika.parser.DefaultParser" />
>
> <meta name="X-TIKA:Parsed-By"
> content="org.apache.tika.parser.image.ImageParser" />
>
> <meta name="X-TIKA:Parsed-By"
> content="org.apache.tika.parser.ocr.TesseractOCRParser" />
>
> <meta name="Dimension HorizontalPixelSize" content="0.35273367" />
>
> <meta name="Chroma BlackIsZero" content="true" />
>
> <meta name="Compression Lossless" content="true" />
>
> <meta name="X-TIKA:embedded_depth" content="1" />
>
> <meta name="width" content="934" />
>
> <meta name="Dimension ImageOrientation" content="Normal" />
>
> <meta name="X-TIKA:embedded_resource_path" content="/image0.png" />
>
> <meta name="tiff:ImageWidth" content="934" />
>
> <meta name="Chroma NumChannels" content="3" />
>
> <meta name="Data SampleFormat" content="UnsignedIntegral" />
>
> <title></title>
>
> </head>
>
> <body /></html>
>
> [INFO]
>
> [ERROR] Tests run: 305, Failures: 1, Errors: 0, Skipped: 10
>
> [INFO]
>
> [INFO]
> ------------------------------------------------------------------------
>
> [INFO] Reactor Summary for Apache Tika parent 2.0.0-SNAPSHOT:
>
> [INFO]
>
> [INFO] Apache Tika parent ................................. SUCCESS [
> 2.952 s]
>
> [INFO] Apache Tika core ................................... SUCCESS [
> 37.037 s]
>
> [INFO] tika-parsers ....................................... SUCCESS [
> 0.225 s]
>
> [INFO] Apache Tika classic parser modules and package ..... SUCCESS [
> 0.500 s]
>
> [INFO] Apache Tika classic parser modules ................. SUCCESS [
> 0.261 s]
>
> [INFO] tika-parser-html-commons ........................... SUCCESS [
> 1.773 s]
>
> [INFO] tika-parser-digest-commons ......................... SUCCESS [
> 0.998 s]
>
> [INFO] tika-parser-mail-commons ........................... SUCCESS [
> 1.627 s]
>
> [INFO] tika-parser-xmp-commons ............................ SUCCESS [
> 2.008 s]
>
> [INFO] tika-parser-zip-commons ............................ SUCCESS [
> 2.405 s]
>
> [INFO] tika-parser-image-module ........................... SUCCESS [
> 4.140 s]
>
> [INFO] tika-parser-ocr-module ............................. SUCCESS [
> 16.227 s]
>
> [INFO] tika-parser-audiovideo-module ...................... SUCCESS [
> 2.998 s]
>
> [INFO] tika-parser-text-module ............................ SUCCESS [
> 3.578 s]
>
> [INFO] tika-parser-code-module ............................ SUCCESS [
> 3.739 s]
>
> [INFO] tika-parser-html-module ............................ SUCCESS [
> 3.842 s]
>
> [INFO] tika-parser-font-module ............................ SUCCESS [
> 2.291 s]
>
> [INFO] tika-parser-xml-module ............................. SUCCESS [
> 2.637 s]
>
> [INFO] tika-parser-microsoft-module ....................... SUCCESS [
> 46.829 s]
>
> [INFO] tika-parser-pkg-module ............................. SUCCESS [
> 3.862 s]
>
> [INFO] tika-parser-pdf-module ............................. SUCCESS [
> 15.538 s]
>
> [INFO] tika-parser-apple-module ........................... SUCCESS [
> 3.497 s]
>
> [INFO] tika-parser-cad-module ............................. SUCCESS [
> 2.195 s]
>
> [INFO] tika-parser-mail-module ............................ SUCCESS [
> 9.893 s]
>
> [INFO] tika-parser-miscoffice-module ...................... SUCCESS [
> 8.474 s]
>
> [INFO] tika-parser-news-module ............................ SUCCESS [
> 1.982 s]
>
> [INFO] tika-parser-crypto-module .......................... SUCCESS [
> 2.624 s]
>
> [INFO] Apache Tika classic parser package ................. FAILURE [02:15
> min]
>
> [INFO]
> ------------------------------------------------------------------------
>
> [INFO] BUILD FAILURE
>
> [INFO]
> ------------------------------------------------------------------------
>
> [INFO] Total time:  05:21 min
>
> [INFO] Finished at: 2021-04-15T10:00:49-04:00
>
> [INFO]
> ------------------------------------------------------------------------
>
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M4:test (default-test)
> on project tika-parsers-classic-package: There are test failures.
>
> [ERROR]
>
> [ERROR] Please refer to
> C:\tika\tika-parsers\tika-parsers-classic\tika-parsers-classic-package\target\surefire-reports
> for the individual test results.
>
> [ERROR] Please refer to dump files (if any exist) [date].dump,
> [date]-jvmRun[N].dump and [date].dumpstream.
>
> [ERROR] -> [Help 1]
>
> [ERROR]
>
> [ERROR] To see the full stack trace of the errors, re-run Maven with the
> -e switch.
>
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
>
> [ERROR]
>
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
>
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
>
> [ERROR]
>
> [ERROR] After correcting the problems, you can resume the build with the
> command
>
> [ERROR]   mvn <args> -rf :tika-parsers-classic-package
>
>
>
> c:\tika>