Posted to dev@tika.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2018/09/05 03:27:00 UTC

[jira] [Commented] (TIKA-2722) Don't call Date.toString (Possible issue with JDK 11)

    [ https://issues.apache.org/jira/browse/TIKA-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603861#comment-16603861 ] 

David Smiley commented on TIKA-2722:
------------------------------------

BTW I believe I found a JDK bug so I reported it, including a demonstration program.  When I get the official/public bug ID, I will report back here with it.

> Don't call Date.toString (Possible issue with JDK 11)
> -----------------------------------------------------
>
>                 Key: TIKA-2722
>                 URL: https://issues.apache.org/jira/browse/TIKA-2722
>             Project: Tika
>          Issue Type: Bug
>         Environment: Tika 1.18, JDK 11 with locale set to "ar-EG".  
>            Reporter: David Smiley
>            Priority: Major
>
> I'm troubleshooting [a test failure in Apache Lucene/Solr|https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/22799/] "extracting" contrib that occurs in JDK 11 with locale "ar-EG".  JDK 8 and 9 pass; I don't know about JDK 10. It has to do with extracting date metadata from a PDF, particularly the created date but perhaps others too.
> I stepped through the code into Tika and I think I've found where the troublesome code is.  First note PDFParser line 271: {{addMetadata(metadata, "created", info.getCreationDate());}}.  That addMetadata overload calls toString on a Date.  IMO that's asking for trouble since the output is Locale-dependent.  I think that's okay to show to a user but not for machine-to-machine information exchange.  In the case of the test, it yielded this odd-looking date string:
> Thu Nov 13 18:35:51 GMT+٠٥:٠٠ 2008
> I pasted that in and it looks consistent with what I see in IntelliJ and in Jenkins logs; hopefully will post correctly to JIRA.  The odd part is the hour & minutes relative to GMT.  I won't be certain until after I click "Create".
> Perhaps this problem is also indicative of a JDK 11 bug?  Nevertheless I think Tika should avoid calling Date.toString().
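
For illustration only, here is a minimal sketch of one locale-independent alternative to Date.toString(): formatting the Date as an ISO-8601 instant in UTC. The class name IsoDateExample and the helper toIsoString are hypothetical and are not part of Tika's API; this is not necessarily how Tika would or did address the issue. The sample timestamp is assumed to correspond to the date string quoted above.

```java
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;

public class IsoDateExample {

    // Hypothetical helper (not Tika's API): renders a Date as an ISO-8601
    // instant in UTC, which does not depend on the default locale or zone.
    public static String toIsoString(Date date) {
        return DateTimeFormatter.ISO_INSTANT.format(date.toInstant());
    }

    public static void main(String[] args) {
        // Simulate the environment from the report.
        Locale.setDefault(Locale.forLanguageTag("ar-EG"));

        // Assumed sample value: 2008-11-13 18:35:51 at GMT+05:00,
        // i.e. 13:35:51 UTC, expressed as epoch milliseconds.
        Date created = new Date(1226583351000L);

        // Locale-sensitive form, fine for display but fragile for parsing.
        System.out.println(created);

        // Machine-readable form: prints 2008-11-13T13:35:51Z
        System.out.println(toIsoString(created));
    }
}
```

A consumer can parse the second form back with Instant.parse regardless of the locale either side is running under.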



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)