Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2017/08/02 11:36:00 UTC

[jira] [Resolved] (TIKA-2438) Test failure at OOXMLParserTest.testBigIntegersWGeneralFormat:1350->TikaTest.assertContains:102

     [ https://issues.apache.org/jira/browse/TIKA-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-2438.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.17

There was a mismatch in the locale settings. The unit test was taking the locale from POI's LocaleUtil, but the parsers were taking it from Locale.getDefault(). I switched the parsers to take the locale from LocaleUtil when a user doesn't specify a Locale via the ParseContext.

I also added a unit test to make sure that Locale.ITALIAN works.  This failed before the modifications.
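
For readers following along, here is a rough JDK-only sketch of the fallback idea and of why the decimal separator flips. It is not the actual Tika change and does not call POI or Tika: the class LocaleFallbackSketch, the field sharedUserLocale (a stand-in for whatever locale LocaleUtil holds), and the methods resolve and format are all hypothetical names made up for illustration.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class LocaleFallbackSketch {

    // Hypothetical stand-in for a library-wide user locale (the role POI's LocaleUtil plays).
    static Locale sharedUserLocale = Locale.US;

    // Prefer the locale supplied by the caller; otherwise fall back to the shared
    // locale rather than Locale.getDefault(), so tests and parsers agree.
    static Locale resolve(Locale fromContext) {
        return fromContext != null ? fromContext : sharedUserLocale;
    }

    // Format a number with the decimal symbols of the given locale.
    static String format(double value, Locale locale) {
        DecimalFormat df = new DecimalFormat("0.##########",
                DecimalFormatSymbols.getInstance(locale));
        return df.format(value);
    }

    public static void main(String[] args) {
        // No locale supplied: the shared locale (US here) is used.
        System.out.println(format(13.1211231321, resolve(null)));
        // Caller explicitly asks for Italian: the decimal separator becomes a comma.
        System.out.println(format(13.1211231321, resolve(Locale.ITALIAN)));
    }
}
```

With a comma-decimal locale in effect, the same cell value renders as 13,1211231321 instead of 13.1211231321, which matches the shape of the failures quoted below.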

Many thanks to [~krichter] for raising this issue!  Please let us know if this doesn't fix the build for you.

> Test failure at OOXMLParserTest.testBigIntegersWGeneralFormat:1350->TikaTest.assertContains:102
> -----------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2438
>                 URL: https://issues.apache.org/jira/browse/TIKA-2438
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.16
>            Reporter: Karl Richter
>             Fix For: 1.17
>
>
> `mvn clean install` fails due to
> {code:java}
> Failed tests: 
>   OOXMLParserTest.testBigIntegersWGeneralFormat:1350->TikaTest.assertContains:102 1.23456789012345E+15</td>	<td>1.23456789012345E+15 not found in:
> <html xmlns="http://www.w3.org/1999/xhtml">
> <head>
> <meta name="date" content="2016-07-22T11:32:25Z" />
> <meta name="extended-properties:AppVersion" content="16.0300" />
> <meta name="dc:creator" content="Allison, Timothy B." />
> <meta name="extended-properties:Company" content="" />
> <meta name="dcterms:created" content="2016-06-29T16:29:27Z" />
> <meta name="Last-Modified" content="2016-07-22T11:32:25Z" />
> <meta name="dcterms:modified" content="2016-07-22T11:32:25Z" />
> <meta name="Last-Save-Date" content="2016-07-22T11:32:25Z" />
> <meta name="protected" content="false" />
> <meta name="meta:save-date" content="2016-07-22T11:32:25Z" />
> <meta name="Application-Name" content="Microsoft Excel" />
> <meta name="modified" content="2016-07-22T11:32:25Z" />
> <meta name="Content-Type" content="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" />
> <meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser" />
> <meta name="X-Parsed-By" content="org.apache.tika.parser.microsoft.ooxml.OOXMLParser" />
> <meta name="creator" content="Allison, Timothy B." />
> <meta name="meta:author" content="Allison, Timothy B." />
> <meta name="meta:creation-date" content="2016-06-29T16:29:27Z" />
> <meta name="extended-properties:Application" content="Microsoft Excel" />
> <meta name="meta:last-author" content="Allison, Timothy B." />
> <meta name="Creation-Date" content="2016-06-29T16:29:27Z" />
> <meta name="Last-Author" content="Allison, Timothy B." />
> <meta name="Application-Version" content="16.0300" />
> <meta name="Author" content="Allison, Timothy B." />
> <meta name="publisher" content="" />
> <meta name="dc:publisher" content="" />
> <title></title>
> </head>
> <body><div><h1>Sheet1</h1>
> <table><tbody><tr>	<td>123456789012345</td>	<td>123456789012346</td></tr>
> <tr>	<td>1,23456789012345E+15</td>	<td>1,23456789012345E+15</td></tr>
> <tr />
> </tbody></table>
> </div>
> </body></html>
>   OOXMLParserTest.testXLSBVarious:1552->TikaTest.assertContains:102 <td>13.1211231321</td> not found in:
> <html xmlns="http://www.w3.org/1999/xhtml">
> <head>
> <meta name="date" content="2017-03-10T14:58:49Z" />
> <meta name="extended-properties:AppVersion" content="16.0300" />
> <meta name="dc:creator" content="Allison, Timothy B." />
> <meta name="extended-properties:Company" content="" />
> <meta name="dcterms:created" content="2017-03-09T12:24:26Z" />
> <meta name="Last-Modified" content="2017-03-10T14:58:49Z" />
> <meta name="dcterms:modified" content="2017-03-10T14:58:49Z" />
> <meta name="Last-Save-Date" content="2017-03-10T14:58:49Z" />
> <meta name="protected" content="false" />
> <meta name="meta:save-date" content="2017-03-10T14:58:49Z" />
> <meta name="Application-Name" content="Microsoft Excel" />
> <meta name="modified" content="2017-03-10T14:58:49Z" />
> <meta name="Content-Type" content="application/vnd.ms-excel.sheet.binary.macroenabled.12" />
> <meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser" />
> <meta name="X-Parsed-By" content="org.apache.tika.parser.microsoft.ooxml.OOXMLParser" />
> <meta name="creator" content="Allison, Timothy B." />
> <meta name="meta:author" content="Allison, Timothy B." />
> <meta name="meta:creation-date" content="2017-03-09T12:24:26Z" />
> <meta name="extended-properties:Application" content="Microsoft Excel" />
> <meta name="meta:last-author" content="Allison, Timothy B." />
> <meta name="Creation-Date" content="2017-03-09T12:24:26Z" />
> <meta name="Last-Author" content="Allison, Timothy B." />
> <meta name="X-TIKA:origResourceName" content="C:\Users\tallison\Desktop\working\xlsb\" />
> <meta name="Application-Version" content="16.0300" />
> <meta name="Author" content="Allison, Timothy B." />
> <meta name="publisher" content="" />
> <meta name="dc:publisher" content="" />
> <title></title>
> </head>
> <body><div><h1>mySheet1</h1>
> <table><tbody><tr>	<td>String</td>	<td>This is a string</td></tr>
> <tr>	<td>integer</td>	<td>13</td></tr>
> <tr>	<td>float</td>	<td>13,1211231321</td></tr>
> <tr>	<td>currency</td>	<td>$   3.03</td></tr>
> <tr>	<td>percent</td>	<td>20%</td></tr>
> <tr>	<td>float 2</td>	<td>13,12</td></tr>
> <tr>	<td>long int</td>	<td>123456789012345</td></tr>
> <tr>	<td>longer int</td>	<td>1,23456789012345E+15</td>	<td><br />
> Allison, Timothy B.: Allison, Timothy B.:
> test comment2
> </td></tr>
> <tr>	<td>fraction</td>	<td>1/4</td></tr>
> <tr>	<td>date</td>	<td>3/9/17</td></tr>
> <tr>	<td>comment</td>	<td>contents<br />
> Allison, Timothy B.: Allison, Timothy B.:
> test comment
> </td></tr>
> <tr>	<td>hyperlink</td>	<td>tika_link</td></tr>
> <tr>	<td>formula</td>	<td>4</td>	<td>2</td></tr>
> <tr>	<td>formulaErr</td>	<td>ERROR</td></tr>
> <tr>	<td>formulaFloat</td>	<td>0,5</td>	<td>March</td>	<td>April</td></tr>
> <tr>	<td>customFormat1</td>	<td>   46/1963</td>	<td>merchant1</td>	<td>1</td>	<td>3</td></tr>
> <tr>	<td>customFormat2</td>	<td>  3/128</td>	<td>merchant2</td>	<td>2</td>	<td>4</td></tr>
> <tr>	<td>text test</td></tr>
> <tr>	<td><br />
> Allison, Timothy B.: Allison, Timothy B.:
> comment1
> </td></tr>
> <tr>	<td><br />
> Allison, Timothy B.: Allison, Timothy B.:
> comment2
> </td></tr>
> <tr>	<td><br />
> Allison, Timothy B.: Allison, Timothy B.:
> comment3
> </td></tr>
> <tr>	<td>the</td>	<td><br />
> Allison, Timothy B.: Allison, Timothy B.:
> comment4 (end of row)
> </td></tr>
> <tr>	<td>the</td>	<td><br />
> Allison, Timothy B.: Allison, Timothy B.:
> comment5 between cells
> </td>	<td>quick</td></tr>
> <tr>	<td>comment6<br />
> Allison, Timothy B.: Allison, Timothy B.:
> comment6 actually in cell
> </td></tr>
> <tr>	<td><br />
> Allison, Timothy B.: Allison, Timothy B.:
> comment7 end of file
> </td></tr>
> <tr>	<td><br />
> Allison, Timothy B.: Allison, Timothy B.:
> comment8 end of file</td></tr>
> </tbody></table>
> <p>OddLeftHeader OddCenterHeader OddRightHeader</p>
> <p>EvenLeftHeader EvenCenterHeader EvenRightHeader
> </p>
> <p>FirstPageLeftHeader FirstPageCenterHeader FirstPageRightHeader</p>
> <p>OddLeftFooter OddCenterFooter OddRightFooter</p>
> <p>EvenLeftFooter EvenCenterFooter EvenRightFooter</p>
> <p>FirstPageLeftFooter FirstPageCenterFooter FirstPageRightFooter</p>
> <p>test textbox
> </p>
> <a href="http://lucene.apache.org/">http://lucene.apache.org/</a><p>myChartTitle</p>
> <p />
> merchant1	March	April	1	3	merchant2	March	April	2	4	<p />
> <p />
> <p />
> <p />
> <p>test WordArt</p>
> <p>myChartTitle</p>
> <p />
> merchant1	March	April	1	3	merchant2	March	April	2	4	<p />
> <p />
> <p />
> <p />
> <p>myChartTitle</p>
> <p />
> merchant1	March	April	1	3	merchant2	March	April	2	4	<p />
> <p />
> <p />
> <p />
> <a href="http://tika.apache.org/">http://tika.apache.org/</a></div>
> <div class="package-entry" /><div class="package-entry" /><div class="package-entry" /></body></html>
> {code}
> experienced with 1.16-75-g4455a6f08



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)