Posted to dev@tika.apache.org by "Tyler Palsulich (JIRA)" <ji...@apache.org> on 2015/03/04 01:42:05 UTC

[jira] [Closed] (TIKA-1054) Problem with parsing excel date formats

     [ https://issues.apache.org/jira/browse/TIKA-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tyler Palsulich closed TIKA-1054.
---------------------------------
    Resolution: Not a Problem

Closing as Not a Problem, following the Locale comments above. I don't know whether it applies to this issue, but hopefully most code no longer depends on the default Locale now that we have added forbiddenapischecker.
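A minimal sketch of what "not depending on the default Locale" looks like, assuming plain Java standard-library formatting. This is not Tika's actual code, and the class and method names are hypothetical. Because the locale is passed explicitly, the same value formats the same way on every machine, and the Swedish separators and AM/PM markers seen in the report below only appear when Swedish is requested.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class LocaleExplicitFormatting {

    // Formats a number with an Excel-style "#,##0.00" pattern, taking the decimal
    // and grouping separators from the given locale instead of the JVM default.
    static String formatNumber(double value, Locale locale) {
        DecimalFormat format = new DecimalFormat("#,##0.00", DecimalFormatSymbols.getInstance(locale));
        return format.format(value);
    }

    // Formats a time with a pattern roughly equivalent to Excel's "h:mm AM/PM";
    // the AM/PM marker text comes from the given locale.
    static String formatTime(LocalTime time, Locale locale) {
        return DateTimeFormatter.ofPattern("h:mm a", locale).format(time);
    }

    public static void main(String[] args) {
        Locale swedish = Locale.forLanguageTag("sv-SE");
        System.out.println(formatNumber(1599.99, Locale.ROOT));          // 1,599.99
        System.out.println(formatNumber(1599.99, swedish));              // decimal comma, e.g. 1 599,99
        System.out.println(formatTime(LocalTime.of(6, 15), Locale.US));  // 6:15 AM
        System.out.println(formatTime(LocalTime.of(6, 15), swedish));    // Swedish marker, e.g. 6:15 fm
    }
}
```

Passing Locale.ROOT (or a fixed locale such as Locale.US) instead of silently relying on Locale.getDefault() is the kind of change that forbidden-API checks push code towards.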

> Problem with parsing excel date formats
> ---------------------------------------
>
>                 Key: TIKA-1054
>                 URL: https://issues.apache.org/jira/browse/TIKA-1054
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.2
>            Reporter: Olof Jonasson
>
> I'm using Solr 4.0 and Tika 1.2 and have problems indexing Excel files that contain date formats. I've read TIKA-125, TIKA-371, TIKA-103 and TIKA-360, and from them I got the impression that the date formatting problem is solved (for some cases at least).
> I've used testEXCEL-formats.xls from TIKA-103, and also resaved it as .xlsx and tested that as well. The default locale on my computer is Swedish. This is what I get (sorry for the occasional Swedish: "fm"/"em" mean AM/PM, "maj" is May and "torsdag" is Thursday):
> Content of testEXCEL-formats.xlsx and testEXCEL-formats.xls:
> Number #,##0.00 1 599,99 -1 599,99
> Currency $#,##0.00;[Red]($#,##0.00) $1 599,99 ($1 599,99)
> Scientific 0.00E+00 1,98E+08 -1,98E+08
> Percentage (0.025) 3% 2,50%
> Fraction (2.5) 2 1/2
> Time Format: h:mm AM/PM 6:15 AM 6:15 PM
> Time Format: h:mm 06:15 18:15
> Date Format: m/d/yy 2009-10-03
> Date Format: d-mmm-yy 17-maj-07
> Date/Time Format 2008-01-19 04:35
> Custom Number: 19 dollars and ,99 cents
> Custom Date: At 4:20 AM on torsdag maj 17, 2007
> What the Tika 1.2 parser returns for the .xlsx file (and what Solr indexes):
> Number #,##0.00 1 599,99 -1 599,99
> Currency $#,##0.00;[Red]($#,##0.00) $1 599,99 ($1 599,99)
> Scientific 0.00E+00 1,98E+08 -1,98E+08
> Percentage (0.025) 3% 2,50%
> Fraction (2.5) 2 1/2
> Time Format: h:mm AM/PM 6:15 fm 6:15 em
> Time Format: h:mm 6:15 18:15
> Date Format: m/d/yy 2009/10/03
> Date Format: d-mmm-yy 17-maj-07
> Date/Time Format 1/19/08 4:35
> Custom Number: 19,99 dollars and cents
> Custom Date: 39219.18056369212 
> What the Tika 1.2 parser returns for the .xls file (and what Solr indexes):
> Number #,##0.00 1 599,99 -1 599,99
> Currency $#,##0.00;[Red]($#,##0.00) $1 599,99 ($1 599,99)
> Scientific 0.00E+00 1,98E+08 -1,98E+08
> Percentage (0.025) 3% 2,50%
> Fraction (2.5) 2 1/2
> Time Format: h:mm AM/PM 6:15 fm 6:15 em
> Time Format: h:mm 6:15 18:15
> Date Format: m/d/yy 10/3/09
> Date Format: d-mmm-yy 17-maj-07
> Date/Time Format 1/19/08 4:35
> Custom Number: 19,99 dollars and cents
> Custom Date: 39219.18056369212
> --- 
> Unexpected formats for:
> Date Format: m/d/yy 2009-10-03
> Date/Time Format 2008-01-19 04:35
> Custom Date: At 4:20 AM on torsdag maj 17, 2007
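The 39219.18056369212 returned for the custom date is the raw Excel serial value of the cell rather than formatted text. In Excel's default 1900 date system the integer part counts days and the fractional part is the time of day. A minimal sketch using only java.time (class and method names are hypothetical; it ignores the 1904 date system and rejects serials before March 1900, where Excel's fictitious 1900-02-29 complicates things) decodes it. In real code, Apache POI's DateUtil.getJavaDate(double) performs this conversion.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

public class ExcelSerialDate {

    // Day zero for serial numbers >= 61 in Excel's 1900 date system. Counting from
    // 1899-12-30 (not 1899-12-31) absorbs Excel's nonexistent 1900-02-29.
    private static final LocalDate EPOCH_1900 = LocalDate.of(1899, 12, 30);

    static LocalDateTime fromExcelSerial(double serial) {
        if (serial < 61) {
            throw new IllegalArgumentException("serials before 1900-03-01 are not handled in this sketch");
        }
        long days = (long) Math.floor(serial);            // integer part: days since the epoch
        double fraction = serial - days;                  // fractional part: share of a 24-hour day
        long millis = Math.round(fraction * 86_400_000d); // milliseconds into that day
        return EPOCH_1900.atStartOfDay().plusDays(days).plus(millis, ChronoUnit.MILLIS);
    }

    public static void main(String[] args) {
        LocalDateTime value = fromExcelSerial(39219.18056369212);
        System.out.println(value);                 // 2007-05-17T04:20:00.703
        System.out.println(value.getDayOfWeek());  // THURSDAY
    }
}
```

The decoded value, Thursday 17 May 2007 at 4:20 AM, matches what Excel displays for the custom date, so the data is intact and only the formatting step is missing.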



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)