Posted to dev@tika.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/04/06 19:08:13 UTC

[jira] [Updated] (TIKA-1421) Tika-Parsers tests fail on CentOS6 if tesseract isn't installed

     [ https://issues.apache.org/jira/browse/TIKA-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated TIKA-1421:
---------------------------------------
    Labels: memex  (was: )

> Tika-Parsers tests fail on CentOS6 if tesseract isn't installed
> ---------------------------------------------------------------
>
>                 Key: TIKA-1421
>                 URL: https://issues.apache.org/jira/browse/TIKA-1421
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>         Environment: CentOS6 AWS VM for DARPA Memex
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>            Priority: Blocker
>              Labels: memex
>             Fix For: 1.7
>
>
> While testing TIKA-93 on CentOS6, I ran into several test failures in tika-parsers on a fresh 1.7-trunk install of Tika:
> {noformat}
> Running org.apache.tika.parser.chm.TestChmLzxcControlData
> Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.008 sec
> Running org.apache.tika.parser.chm.TestChmBlockInfo
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 sec
> Running org.apache.tika.parser.chm.TestChmItsfHeader
> Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.005 sec
> Running org.apache.tika.parser.txt.TXTParserTest
> Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.016 sec
> Running org.apache.tika.parser.txt.CharsetDetectorTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.02 sec
> Running org.apache.tika.parser.image.xmp.JempboxExtractorTest
> Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.014 sec
> Running org.apache.tika.parser.image.PSDParserTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 sec
> Running org.apache.tika.parser.image.ImageParserTest
> Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.034 sec
> Running org.apache.tika.parser.image.ImageMetadataExtractorTest
> Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.241 sec
> Running org.apache.tika.parser.image.MetadataFieldsTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec
> Running org.apache.tika.parser.image.TiffParserTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.02 sec
> Running org.apache.tika.parser.font.FontParsersTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.192 sec
> Running org.apache.tika.parser.mp4.MP4ParserTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.07 sec
> Running org.apache.tika.parser.mp3.Mp3ParserTest
> Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.046 sec
> Running org.apache.tika.parser.mp3.MpegStreamTest
> Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 sec
> Running org.apache.tika.parser.dwg.DWGParserTest
> Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.02 sec
> Running org.apache.tika.parser.pkg.GzipParserTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.252 sec
> Running org.apache.tika.parser.pkg.Seven7ParserTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.37 sec
> Running org.apache.tika.parser.pkg.TarParserTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.118 sec
> Running org.apache.tika.parser.pkg.Bzip2ParserTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.233 sec
> Running org.apache.tika.parser.pkg.ArParserTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.017 sec
> Running org.apache.tika.parser.pkg.ZipParserTest
> Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.302 sec
> Running org.apache.tika.parser.video.FLVParserTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.026 sec
> Running org.apache.tika.parser.solidworks.SolidworksParserTest
> Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.019 sec
> Running org.apache.tika.parser.ibooks.iBooksParserTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.019 sec
> Running org.apache.tika.parser.ParsingReaderTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.018 sec
> Running org.apache.tika.parser.mail.RFC822ParserTest
> Tests run: 8, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 0.31 sec <<< FAILURE!
> Running org.apache.tika.parser.mbox.MboxParserTest
> Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.026 sec
> Running org.apache.tika.parser.mbox.OutlookPSTParserTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.094 sec
> Running org.apache.tika.parser.jpeg.JpegParserTest
> Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.153 sec
> Running org.apache.tika.parser.executable.ExecutableParserTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 sec
> Running org.apache.tika.parser.rtf.RTFParserTest
> Tests run: 31, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.221 sec
> Running org.apache.tika.parser.fork.ForkParserIntegrationTest
> Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.322 sec
> Running org.apache.tika.parser.envi.EnviHeaderParserTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 sec
> Running org.apache.tika.parser.AutoDetectParserTest
> Tests run: 22, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.439 sec <<< FAILURE!
> Running org.apache.tika.parser.epub.EpubParserTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.005 sec
> Running org.apache.tika.parser.code.SourceCodeParserTest
> Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.069 sec
> Running org.apache.tika.parser.netcdf.NetCDFParserTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.125 sec
> Running org.apache.tika.parser.pdf.PDFParserTest
>  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 205317
>  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 56931
>  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 56931
>  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 56931
>  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 56931
>  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 205317
>  INFO [main] (PDFParser.java:248) - Document is encrypted
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5592
>  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 51851
>  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 51851
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5592
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 12324
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5969
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5687
>  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 44785
>  WARN [main] (FontManager.java:312) - Font not found: Times New Roman
>  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 44785
>  WARN [main] (FontManager.java:312) - Font not found: Times New Roman
>  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 56931
>  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 56931
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 26441
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5592
>  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 205317
>  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 205317
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 8777
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 2314576
>  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 68229
>  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 68229
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
> ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5500
>  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 56931
>  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 51851
>  INFO [main] (PDFParser.java:248) - Document is encrypted
>  INFO [main] (PDFParser.java:248) - Document is encrypted
> Tests run: 27, Failures: 3, Errors: 0, Skipped: 0, Time elapsed: 14.305 sec <<< FAILURE!
> Running org.apache.tika.parser.RecursiveParserWrapperTest
> Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.233 sec
> Running org.apache.tika.parser.prt.PRTParserTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.014 sec
> Running org.apache.tika.parser.html.HtmlParserTest
> Tests run: 38, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.162 sec
> Running org.apache.tika.parser.mat.MatParserTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.543 sec
> Running org.apache.tika.parser.feed.FeedParserTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.011 sec
> Running org.apache.tika.parser.ocr.TesseractOCRTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 0.007 sec
> Running org.apache.tika.parser.odf.ODFParserTest
> Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.098 sec
> Running org.apache.tika.parser.hdf.HDFParserTest
>  WARN [main] (H4header.java:392) - **dimension length=0 for TagVGroup= *refno=53 tag= VG (1965) Vgroup length=34 class= Dim0.0 name= Longitude using data 52
>  WARN [main] (H4header.java:392) - **dimension length=0 for TagVGroup= *refno=55 tag= VG (1965) Vgroup length=33 class= Dim0.0 name= Latitude using data 54
>  WARN [main] (H4header.java:392) - **dimension length=0 for TagVGroup= *refno=57 tag= VG (1965) Vgroup length=33 class= Dim0.0 name= fakeDim2 using data 56
>  WARN [main] (H4header.java:392) - **dimension length=0 for TagVGroup= *refno=59 tag= VG (1965) Vgroup length=33 class= Dim0.0 name= fakeDim3 using data 58
>  WARN [main] (H4header.java:844) - data tag missing vgroup= 70 Sea Surface Temperature
>  WARN [main] (H4header.java:844) - data tag missing vgroup= 73 Number of Observations per Bin
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.087 sec
> Running org.apache.tika.embedder.ExternalEmbedderTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.014 sec
> Running org.apache.tika.mime.MimeTypesTest
> Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 sec
> Running org.apache.tika.mime.TestMimeTypes
> Tests run: 47, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.163 sec
> Running org.apache.tika.mime.MimeTypeTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 sec
> Running org.apache.tika.detect.TestContainerAwareDetector
> Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.277 sec
> Running org.apache.tika.TestParsers
>  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 68229
>  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 44785
>  WARN [main] (FontManager.java:312) - Font not found: Times New Roman
> Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.2 sec
> Results :
> Failed tests:   testMultipart(org.apache.tika.parser.mail.RFC822ParserTest): Exception thrown: TIKA-198: Illegal IOException from org.apache.tika.parser.ocr.TesseractOCRParser@2657d8a0
>   testInlineSelector(org.apache.tika.parser.pdf.PDFParserTest): expected:<2> but was:<0>
>   testInlineConfig(org.apache.tika.parser.pdf.PDFParserTest): expected:<2> but was:<0>
>   testEmbeddedFilesInChildren(org.apache.tika.parser.pdf.PDFParserTest): expected:<5> but was:<3>
> Tests in error: 
>   testUnusualFromAddress(org.apache.tika.parser.mail.RFC822ParserTest): TIKA-198: Illegal IOException from org.apache.tika.parser.ocr.TesseractOCRParser@1574a7af
>   testImages(org.apache.tika.parser.AutoDetectParserTest): TIKA-198: Illegal IOException from org.apache.tika.parser.ocr.TesseractOCRParser@107aac4a
> Tests run: 538, Failures: 4, Errors: 2, Skipped: 4
> {noformat}
> I tried installing Tesseract from here:
> http://pkgs.org/centos-6/naulinux-school-x86_64/tesseract-3.01-2.el6.x86_64.rpm.html
> However, with that package installed, the other tests pass but the Tesseract ones fail (I think there is something wrong with the English config and am looking into it).
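
The failures above all surface as an IOException from TesseractOCRParser on a machine without the tesseract binary. One way to reproduce that condition in isolation is to probe for the binary before relying on it. The sketch below is not Tika's code and not the fix applied for this issue: the class name TesseractProbe and the method canRun are hypothetical, the "-v" flag is assumed to be accepted by the installed tesseract, and ProcessBuilder.Redirect.DISCARD requires Java 9 or later.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Tika: checks whether an external
// command can be launched at all on the current machine.
class TesseractProbe {

    /**
     * Returns true if the command starts and exits within five seconds.
     * The exit code is deliberately ignored; only launchability is tested.
     */
    static boolean canRun(String... command) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    // Discard output so the child never blocks on a full pipe.
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            boolean finished = process.waitFor(5, TimeUnit.SECONDS);
            if (!finished) {
                process.destroyForcibly();
            }
            return finished;
        } catch (IOException e) {
            // Thrown when the binary is missing or not executable.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the binary is named "tesseract" and is on the PATH.
        System.out.println("tesseract available: " + canRun("tesseract", "-v"));
    }
}
```

In a JUnit 4 test, a guard such as Assume.assumeTrue(TesseractProbe.canRun("tesseract", "-v")) would mark the test as skipped rather than failed when the binary is absent, which matches how TesseractOCRTest is reported in the log above.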



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)