Posted to dev@tika.apache.org by "Erik Peterson (JIRA)" <ji...@apache.org> on 2012/05/15 21:47:07 UTC

[jira] [Created] (TIKA-922) iWork number cell formats which are being modified in parsing

Erik Peterson created TIKA-922:
----------------------------------

             Summary: iWork number cell formats which are being modified in parsing
                 Key: TIKA-922
                 URL: https://issues.apache.org/jira/browse/TIKA-922
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.0
         Environment: Windows 7, 64 bit
            Reporter: Erik Peterson


The Tika parser extracts the following iWork number cell formats, but in a modified form:

  Percentage is turned into a decimal, e.g. 90% becomes .9000000002
  Accounting: the format adds a $, but the $ is missing from the parsed data
  Fraction is turned into a decimal
  Number System (e.g. Binary) is translated to decimal, e.g. '11001000' becomes '200'
  Scientific numbers are translated to decimal, e.g. 9.0000E-03 becomes 9000
  Drop-down menu: all the menu items are parsed, but not which one is selected
  Currency & Number aren't displayed properly, e.g. $0.60 becomes .59999

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (TIKA-922) iWork number cell formats which are being modified in parsing

Posted by "Nick Burch (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276390#comment-13276390 ] 

Nick Burch commented on TIKA-922:
---------------------------------

Tika returns the values/text stored in the file itself and does not interpret them. If iWork stores 90% as 0.9 (or as close to that as floating point allows), then that's what we'll return.

For the Excel formats, something very similar gets stored in the files too. However, for those formats we have a full library (Apache POI) to handle the formatting.

As there's no such library for iWork at the moment, I wonder how close the iWork formatting rules are to the Excel ones? If they're close enough, we might be able to reuse some of the formatting support in POI.
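
To illustrate the gap between a stored value and its displayed form, here is a minimal sketch using only java.text from the JDK. It is not Tika or POI code: the class name CellFormatSketch, its method names, and the format patterns are all assumptions chosen to mirror the examples in the report, and the raw value 0.009 is simply the mathematical value of 9.0000E-03. Real iWork format rules may well differ.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper: applies a display format to a raw stored cell value.
final class CellFormatSketch {
    // Fixed symbols so the output does not depend on the default locale.
    private static final DecimalFormatSymbols US =
            DecimalFormatSymbols.getInstance(Locale.US);

    // "0%" multiplies by 100 and rounds to a whole percentage.
    static String percent(double raw) {
        return new DecimalFormat("0%", US).format(raw);
    }

    // Literal dollar sign followed by exactly two fraction digits.
    static String currency(double raw) {
        return new DecimalFormat("'$'0.00", US).format(raw);
    }

    // One integer digit, four fraction digits, two-digit exponent.
    static String scientific(double raw) {
        return new DecimalFormat("0.0000E00", US).format(raw);
    }

    // Base-2 rendering of an integer value.
    static String binary(int raw) {
        return Integer.toBinaryString(raw);
    }

    public static void main(String[] args) {
        System.out.println(percent(0.9000000002)); // raw value from the report
        System.out.println(currency(0.59999));     // raw value from the report
        System.out.println(scientific(0.009));     // assumed raw value
        System.out.println(binary(200));           // raw value from the report
    }
}
```

The point of the sketch is only that the raw numbers can be turned back into the displayed strings once the format for each cell is known; finding where iWork stores that format is the open question.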
                
> iWork number cell formats which are being modified in parsing
> -------------------------------------------------------------
>
>                 Key: TIKA-922
>                 URL: https://issues.apache.org/jira/browse/TIKA-922
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>         Environment: Windows 7, 64 bit
>            Reporter: Erik Peterson
>              Labels: iwork
>
> The Tika parser extracts the following iWork number cell formats, but in a modified form:
>   Percentage is turned into a decimal, e.g. 90% becomes .9000000002
>   Accounting: the format adds a $, but the $ is missing from the parsed data
>   Fraction is turned into a decimal
>   Number System (e.g. Binary) is translated to decimal, e.g. '11001000' becomes '200'
>   Scientific numbers are translated to decimal, e.g. 9.0000E-03 becomes 9000
>   Drop-down menu: all the menu items are parsed, but not which one is selected
>   Currency & Number aren't displayed properly, e.g. $0.60 becomes .59999
