Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2018/06/13 15:32:00 UTC
[jira] [Commented] (TIKA-2666) Document last printed in the year 27321
[ https://issues.apache.org/jira/browse/TIKA-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511302#comment-16511302 ]
Tim Allison commented on TIKA-2666:
-----------------------------------
Thank you for noticing and sharing this with us!
When I step through the processing in a debugger, the low and high int file times are both large negative numbers for the last printed date.
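For context on why negative low and high ints matter: a Windows FILETIME is a 64-bit count of 100-nanosecond intervals since 1601-01-01 UTC, stored as two 32-bit halves. The following is a minimal sketch of the conversion, not POI's actual code. The class and method names are hypothetical, and treating the combined value as a signed long is an assumption made for illustration.

```java
import java.time.Instant;

// Hypothetical helper, not part of POI or Tika.
class FiletimeSketch {
    // FILETIME counts 100 ns ticks, so there are 10 million ticks per second.
    static final long TICKS_PER_SECOND = 10_000_000L;
    // Seconds from 1601-01-01T00:00:00Z to 1970-01-01T00:00:00Z.
    static final long EPOCH_DIFF_SECONDS = 11_644_473_600L;

    // Combine the two 32-bit halves; the low half is masked so its sign bit
    // does not smear into the high half.
    static long combine(int high, int low) {
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    // Assumption: the 64-bit value is read as signed, so a negative high int
    // produces an instant before 1601.
    static Instant toInstant(long filetime) {
        long seconds = Math.floorDiv(filetime, TICKS_PER_SECOND);
        long ticks = Math.floorMod(filetime, TICKS_PER_SECOND);
        return Instant.ofEpochSecond(seconds - EPOCH_DIFF_SECONDS, ticks * 100);
    }

    public static void main(String[] args) {
        // A zero FILETIME is the 1601 epoch itself.
        System.out.println(toInstant(combine(0, 0)));
        // Made-up negative halves, standing in for the values seen in the debugger.
        System.out.println(toInstant(combine(-123456789, -987654321)));
    }
}
```

With a negative high int the combined value is negative, so the resulting instant falls far outside any plausible print date. How a given library then formats that value is a separate question.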
I don't know this chunk of POI well enough to determine whether this is a bug or a corrupt file, but I suspect the latter. MSOffice handles this corruption by showing you "00:00"; I see "12:00" when I check the properties from the file system, and I see no value for last printed when I open it in PPT2016.
If you'd like POI to add a sanity check on dates, please open an issue on their bugzilla: https://bz.apache.org/bugzilla/describecomponents.cgi?product=POI
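A sanity check of the kind mentioned above could look roughly like the sketch below. It is not an existing POI or Tika API: the class name, the method, and both bounds are assumptions chosen for illustration, and a real check would need to settle on its own limits.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.Optional;

// Hypothetical helper, not part of POI or Tika.
class DateSanityCheck {
    // Assumed lower bound: the FILETIME epoch. Anything earlier cannot be a
    // genuine timestamp in this format.
    static final Instant EARLIEST = Instant.parse("1601-01-01T00:00:00Z");

    // Returns the candidate only if it lies in [EARLIEST, latest];
    // otherwise returns empty so the caller can drop the metadata field.
    static Optional<Instant> plausible(Instant candidate, Instant latest) {
        if (candidate == null
                || candidate.isBefore(EARLIEST)
                || candidate.isAfter(latest)) {
            return Optional.empty();
        }
        return Optional.of(candidate);
    }

    public static void main(String[] args) {
        // The date reported in this issue, built explicitly rather than parsed.
        Instant reported = LocalDateTime.of(27321, 1, 23, 8, 20, 12)
                .toInstant(ZoneOffset.UTC);
        // Assumed upper bound: a document cannot have been printed in the future.
        System.out.println(plausible(reported, Instant.now()).isPresent());
    }
}
```

Returning an empty Optional rather than throwing lets downstream processing skip the field without failing on the whole document.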
> Document last printed in the year 27321
> ---------------------------------------
>
> Key: TIKA-2666
> URL: https://issues.apache.org/jira/browse/TIKA-2666
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.17
> Reporter: Isabelle Giguere
> Priority: Minor
> Attachments: Genetic_Factors_and_the_Directionality_of.ppt, PPT_lastPrinted_00.png, tika-app-1.17.metadata.txt
>
>
> Tika extracts a strange last print date for the attached PowerPoint (97-2003).
> In the attached screenshot PPT_lastPrinted_00.png, the date for last print was set to 00:00.
> But when Tika extracts metadata from this document, the last print date is in the year 27321!
> Last-Printed: 27321-01-23T08:20:12Z
> meta:print-date: 27321-01-23T08:20:12Z
> Attached metadata obtained using Tika 1.17
> This weird date is causing issues further down in processing. We can probably filter it out for now, but I do wonder how 00:00 turns into 27321-01-23T08:20:12Z
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)