Posted to dev@tika.apache.org by "Erik Peterson (JIRA)" <ji...@apache.org> on 2012/05/15 21:47:07 UTC
[jira] [Created] (TIKA-922) iWork number cell formats which are being modified in parsing
Erik Peterson created TIKA-922:
----------------------------------
Summary: iWork number cell formats which are being modified in parsing
Key: TIKA-922
URL: https://issues.apache.org/jira/browse/TIKA-922
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.0
Environment: Windows 7, 64 bit
Reporter: Erik Peterson
Tika parses iWork number cell formats, but returns the values in a modified form:
Percentage turns into a decimal, e.g. 90% becomes .9000000002
Accounting displays a $, but the $ is missing from the parsed data
Fraction is turned into a decimal
Number System (e.g. Binary) is translated to decimal, e.g. '11001000' becomes '200'
Scientific numbers are translated to decimal, e.g. 9.0000E-03 becomes 9000
Drop-down menu: all the menu items are parsed, but not which one is selected
Currency and Number aren't displayed properly, e.g. $0.60 becomes .59999
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TIKA-922) iWork number cell formats which are being modified in parsing
Posted by "Nick Burch (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276390#comment-13276390 ]
Nick Burch commented on TIKA-922:
---------------------------------
Tika is returning the values/text stored in the file itself, and is not doing any interpretation of them. If iWork stores 90% as 0.9 (or as close to that as floating point allows), then that's what we'll return.
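To illustrate the "as close to that as floating point allows" point, here is a minimal Java sketch using only the JDK. It assumes the cell value ends up as a binary double; the class and method names are made up for this example and are not part of Tika.

```java
import java.math.BigDecimal;

public class StoredDoubleDemo {
    // Returns the exact decimal expansion of the double that is actually stored.
    static String exactValue(double stored) {
        return new BigDecimal(stored).toPlainString();
    }

    public static void main(String[] args) {
        // Assumption: a "90%" cell is stored as the double nearest to 0.9.
        double stored = 0.9;
        // The stored double is not exactly 0.9, so extra digits appear
        // when the raw value is written out in full.
        System.out.println(exactValue(stored));
        // 0.5 is a power of two, so it is stored exactly.
        System.out.println(exactValue(0.5));
    }
}
```

A value such as 0.5 survives unchanged, while 0.9 does not, which is consistent with the kind of trailing digits described in the report.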
For the Excel formats, something very similar gets stored in the files too. However, for the Excel formats, we have a full library (Apache POI) around it to handle formatting.
As there's no such library for iWork at the moment, I wonder how close the iWork formatting rules are to the Excel ones? If they're close enough, then we might be able to re-use some of the formatting support in POI.
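As a rough sketch of what a formatting layer would have to do, the following Java example turns raw stored values back into the displayed forms mentioned in the report. It uses the JDK's java.text.DecimalFormat as a stand-in, not POI and not the actual iWork rules; the patterns and the class and method names are assumptions chosen for illustration.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class DisplayFormatSketch {
    // Fixed symbols so the output does not depend on the default locale.
    private static final DecimalFormatSymbols US =
            DecimalFormatSymbols.getInstance(Locale.US);

    // Hypothetical "percentage" cell format: scale by 100, no fraction digits.
    static String asPercent(double raw) {
        return new DecimalFormat("0%", US).format(raw);
    }

    // Hypothetical "currency" cell format: literal $ prefix, two fraction digits.
    static String asDollars(double raw) {
        return new DecimalFormat("$0.00", US).format(raw);
    }

    // Hypothetical "number system: binary" cell format.
    static String asBinary(int raw) {
        return Integer.toBinaryString(raw);
    }

    public static void main(String[] args) {
        System.out.println(asPercent(0.9000000002)); // 90%
        System.out.println(asDollars(0.59999));      // $0.60
        System.out.println(asBinary(200));           // 11001000
    }
}
```

The point is only that the raw value plus a per-cell format description is enough to reproduce what the user sees; whether iWork's format descriptions map cleanly onto Excel-style ones is the open question.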
> iWork number cell formats which are being modified in parsing
> -------------------------------------------------------------
>
> Key: TIKA-922
> URL: https://issues.apache.org/jira/browse/TIKA-922
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.0
> Environment: Windows 7, 64 bit
> Reporter: Erik Peterson
> Labels: iwork