Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2017/08/02 11:36:00 UTC
[jira] [Resolved] (TIKA-2438) Test failure at OOXMLParserTest.testBigIntegersWGeneralFormat:1350->TikaTest.assertContains:102
[ https://issues.apache.org/jira/browse/TIKA-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison resolved TIKA-2438.
-------------------------------
Resolution: Fixed
Fix Version/s: 1.17
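There was a mismatch in the locale settings. The unit test was taking its locale from POI's LocaleUtil, but the parsers were taking theirs from Locale.getDefault(). That is why the reported output shows a comma as the decimal separator (1,23456789012345E+15) where the test expected a period (1.23456789012345E+15). I switched the parsers to take the locale from LocaleUtil when the user doesn't specify a Locale via the ParseContext.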
I also added a unit test to make sure that Locale.ITALIAN works. This failed before the modifications.
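For anyone who wants to see the effect in isolation, here is a minimal sketch using only the JDK. It is not the actual Tika or POI code: the class LocaleFallbackSketch and the methods resolveLocale and formatCell are hypothetical names made up for illustration, and the number pattern is an assumption. It shows the "user-supplied locale, else one shared default" lookup and how the decimal separator changes with the locale.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Illustrative sketch only: these names are hypothetical and are not Tika or POI classes.
class LocaleFallbackSketch {

    // Prefer the locale the caller supplied; otherwise fall back to a single
    // shared default, so that the parser and the test read from the same source.
    static Locale resolveLocale(Locale userLocale, Locale sharedDefault) {
        return userLocale != null ? userLocale : sharedDefault;
    }

    // Format a numeric cell value using the decimal symbols of the given locale.
    static String formatCell(double value, Locale locale) {
        DecimalFormat format =
                new DecimalFormat("0.##########", DecimalFormatSymbols.getInstance(locale));
        return format.format(value);
    }

    public static void main(String[] args) {
        double cell = 13.1211231321;

        // No user locale given: the shared default (US here) decides the separator.
        System.out.println(formatCell(cell, resolveLocale(null, Locale.US)));           // 13.1211231321

        // User locale given: it wins over the shared default.
        System.out.println(formatCell(cell, resolveLocale(Locale.ITALIAN, Locale.US))); // 13,1211231321
    }
}
```

The point of the single fallback is that a test asserting on "13.1211231321" only passes reliably if the code under test and the test itself resolve the locale the same way.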
Many thanks to [~krichter] for raising this issue! Please let us know if this doesn't fix the build for you.
> Test failure at OOXMLParserTest.testBigIntegersWGeneralFormat:1350->TikaTest.assertContains:102
> -----------------------------------------------------------------------------------------------
>
> Key: TIKA-2438
> URL: https://issues.apache.org/jira/browse/TIKA-2438
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.16
> Reporter: Karl Richter
> Fix For: 1.17
>
>
> `mvn clean install` fails due to
> {code:java}
> Failed tests:
> OOXMLParserTest.testBigIntegersWGeneralFormat:1350->TikaTest.assertContains:102 1.23456789012345E+15</td> <td>1.23456789012345E+15 not found in:
> <html xmlns="http://www.w3.org/1999/xhtml">
> <head>
> <meta name="date" content="2016-07-22T11:32:25Z" />
> <meta name="extended-properties:AppVersion" content="16.0300" />
> <meta name="dc:creator" content="Allison, Timothy B." />
> <meta name="extended-properties:Company" content="" />
> <meta name="dcterms:created" content="2016-06-29T16:29:27Z" />
> <meta name="Last-Modified" content="2016-07-22T11:32:25Z" />
> <meta name="dcterms:modified" content="2016-07-22T11:32:25Z" />
> <meta name="Last-Save-Date" content="2016-07-22T11:32:25Z" />
> <meta name="protected" content="false" />
> <meta name="meta:save-date" content="2016-07-22T11:32:25Z" />
> <meta name="Application-Name" content="Microsoft Excel" />
> <meta name="modified" content="2016-07-22T11:32:25Z" />
> <meta name="Content-Type" content="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" />
> <meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser" />
> <meta name="X-Parsed-By" content="org.apache.tika.parser.microsoft.ooxml.OOXMLParser" />
> <meta name="creator" content="Allison, Timothy B." />
> <meta name="meta:author" content="Allison, Timothy B." />
> <meta name="meta:creation-date" content="2016-06-29T16:29:27Z" />
> <meta name="extended-properties:Application" content="Microsoft Excel" />
> <meta name="meta:last-author" content="Allison, Timothy B." />
> <meta name="Creation-Date" content="2016-06-29T16:29:27Z" />
> <meta name="Last-Author" content="Allison, Timothy B." />
> <meta name="Application-Version" content="16.0300" />
> <meta name="Author" content="Allison, Timothy B." />
> <meta name="publisher" content="" />
> <meta name="dc:publisher" content="" />
> <title></title>
> </head>
> <body><div><h1>Sheet1</h1>
> <table><tbody><tr> <td>123456789012345</td> <td>123456789012346</td></tr>
> <tr> <td>1,23456789012345E+15</td> <td>1,23456789012345E+15</td></tr>
> <tr />
> </tbody></table>
> </div>
> </body></html>
> OOXMLParserTest.testXLSBVarious:1552->TikaTest.assertContains:102 <td>13.1211231321</td> not found in:
> <html xmlns="http://www.w3.org/1999/xhtml">
> <head>
> <meta name="date" content="2017-03-10T14:58:49Z" />
> <meta name="extended-properties:AppVersion" content="16.0300" />
> <meta name="dc:creator" content="Allison, Timothy B." />
> <meta name="extended-properties:Company" content="" />
> <meta name="dcterms:created" content="2017-03-09T12:24:26Z" />
> <meta name="Last-Modified" content="2017-03-10T14:58:49Z" />
> <meta name="dcterms:modified" content="2017-03-10T14:58:49Z" />
> <meta name="Last-Save-Date" content="2017-03-10T14:58:49Z" />
> <meta name="protected" content="false" />
> <meta name="meta:save-date" content="2017-03-10T14:58:49Z" />
> <meta name="Application-Name" content="Microsoft Excel" />
> <meta name="modified" content="2017-03-10T14:58:49Z" />
> <meta name="Content-Type" content="application/vnd.ms-excel.sheet.binary.macroenabled.12" />
> <meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser" />
> <meta name="X-Parsed-By" content="org.apache.tika.parser.microsoft.ooxml.OOXMLParser" />
> <meta name="creator" content="Allison, Timothy B." />
> <meta name="meta:author" content="Allison, Timothy B." />
> <meta name="meta:creation-date" content="2017-03-09T12:24:26Z" />
> <meta name="extended-properties:Application" content="Microsoft Excel" />
> <meta name="meta:last-author" content="Allison, Timothy B." />
> <meta name="Creation-Date" content="2017-03-09T12:24:26Z" />
> <meta name="Last-Author" content="Allison, Timothy B." />
> <meta name="X-TIKA:origResourceName" content="C:\Users\tallison\Desktop\working\xlsb\" />
> <meta name="Application-Version" content="16.0300" />
> <meta name="Author" content="Allison, Timothy B." />
> <meta name="publisher" content="" />
> <meta name="dc:publisher" content="" />
> <title></title>
> </head>
> <body><div><h1>mySheet1</h1>
> <table><tbody><tr> <td>String</td> <td>This is a string</td></tr>
> <tr> <td>integer</td> <td>13</td></tr>
> <tr> <td>float</td> <td>13,1211231321</td></tr>
> <tr> <td>currency</td> <td>$ 3.03</td></tr>
> <tr> <td>percent</td> <td>20%</td></tr>
> <tr> <td>float 2</td> <td>13,12</td></tr>
> <tr> <td>long int</td> <td>123456789012345</td></tr>
> <tr> <td>longer int</td> <td>1,23456789012345E+15</td> <td><br />
> Allison, Timothy B.: Allison, Timothy B.:
> test comment2
> </td></tr>
> <tr> <td>fraction</td> <td>1/4</td></tr>
> <tr> <td>date</td> <td>3/9/17</td></tr>
> <tr> <td>comment</td> <td>contents<br />
> Allison, Timothy B.: Allison, Timothy B.:
> test comment
> </td></tr>
> <tr> <td>hyperlink</td> <td>tika_link</td></tr>
> <tr> <td>formula</td> <td>4</td> <td>2</td></tr>
> <tr> <td>formulaErr</td> <td>ERROR</td></tr>
> <tr> <td>formulaFloat</td> <td>0,5</td> <td>March</td> <td>April</td></tr>
> <tr> <td>customFormat1</td> <td> 46/1963</td> <td>merchant1</td> <td>1</td> <td>3</td></tr>
> <tr> <td>customFormat2</td> <td> 3/128</td> <td>merchant2</td> <td>2</td> <td>4</td></tr>
> <tr> <td>text test</td></tr>
> <tr> <td><br />
> Allison, Timothy B.: Allison, Timothy B.:
> comment1
> </td></tr>
> <tr> <td><br />
> Allison, Timothy B.: Allison, Timothy B.:
> comment2
> </td></tr>
> <tr> <td><br />
> Allison, Timothy B.: Allison, Timothy B.:
> comment3
> </td></tr>
> <tr> <td>the</td> <td><br />
> Allison, Timothy B.: Allison, Timothy B.:
> comment4 (end of row)
> </td></tr>
> <tr> <td>the</td> <td><br />
> Allison, Timothy B.: Allison, Timothy B.:
> comment5 between cells
> </td> <td>quick</td></tr>
> <tr> <td>comment6<br />
> Allison, Timothy B.: Allison, Timothy B.:
> comment6 actually in cell
> </td></tr>
> <tr> <td><br />
> Allison, Timothy B.: Allison, Timothy B.:
> comment7 end of file
> </td></tr>
> <tr> <td><br />
> Allison, Timothy B.: Allison, Timothy B.:
> comment8 end of file</td></tr>
> </tbody></table>
> <p>OddLeftHeader OddCenterHeader OddRightHeader</p>
> <p>EvenLeftHeader EvenCenterHeader EvenRightHeader
> </p>
> <p>FirstPageLeftHeader FirstPageCenterHeader FirstPageRightHeader</p>
> <p>OddLeftFooter OddCenterFooter OddRightFooter</p>
> <p>EvenLeftFooter EvenCenterFooter EvenRightFooter</p>
> <p>FirstPageLeftFooter FirstPageCenterFooter FirstPageRightFooter</p>
> <p>test textbox
> </p>
> <a href="http://lucene.apache.org/">http://lucene.apache.org/</a><p>myChartTitle</p>
> <p />
> merchant1 March April 1 3 merchant2 March April 2 4 <p />
> <p />
> <p />
> <p />
> <p>test WordArt</p>
> <p>myChartTitle</p>
> <p />
> merchant1 March April 1 3 merchant2 March April 2 4 <p />
> <p />
> <p />
> <p />
> <p>myChartTitle</p>
> <p />
> merchant1 March April 1 3 merchant2 March April 2 4 <p />
> <p />
> <p />
> <p />
> <a href="http://tika.apache.org/">http://tika.apache.org/</a></div>
> <div class="package-entry" /><div class="package-entry" /><div class="package-entry" /></body></html>
> {code}
> Observed with build 1.16-75-g4455a6f08.