Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2018/07/18 17:07:00 UTC
[jira] [Commented] (SOLR-12561) Port ExtractionDateUtil to java.time API
[ https://issues.apache.org/jira/browse/SOLR-12561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548094#comment-16548094 ]
David Smiley commented on SOLR-12561:
-------------------------------------
Patch:
* Use java.time thoroughly; no java.text remnants nor use of Date. Always Instant.
* Enhanced TestExtractionDateUtil a lot so that it tests more of the supported patterns and their idiosyncrasies. I want to ensure we don't break back-compat here! These better tests helped uncover some issues during development of this switch.
* Two of the default patterns had a lowercase "hh" for hour of AM/PM instead of "HH" for hour of day. SimpleDateFormat seemed to deal with this but I think they are fundamentally invalid without an AM/PM qualifier. I switched them to HH. If someone custom configures the patterns in their solr config, they'll need to use the correct designator.
* Use parsed DateTimeFormatter instances instead of Strings in SolrContentHandler and its factory. Since order might be significant or might be used for performance reasons, I also switched to LinkedHashSet from HashSet for the impl in ExtractingRequestHandler's config parser.
This seems safe for 7.x; any break would seem to be very obscure IMO. On the other hand, 8.0 will be out this fall or so.
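To illustrate the "hh" versus "HH" point above, here is a minimal sketch of how java.time treats the two designators. The class name HourPatternDemo, the helper parses(), and the sample pattern are hypothetical and are not taken from the patch; the sketch only uses standard java.time calls.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical demo class; not part of the SOLR-12561 patch.
public class HourPatternDemo {

    // Returns true if 'text' can be parsed into a LocalDateTime using 'pattern'.
    static boolean parses(String pattern, String text) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern, Locale.ROOT);
        try {
            LocalDateTime.parse(text, fmt);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "2018-07-18 17:07:00";
        // "HH" is hour-of-day (0-23), so 17 is accepted.
        System.out.println(parses("yyyy-MM-dd HH:mm:ss", text)); // true
        // "hh" is clock-hour-of-am-pm (1-12); 17 is out of range, and even an
        // in-range hour cannot be resolved to a time without an AM/PM field.
        System.out.println(parses("yyyy-MM-dd hh:mm:ss", text)); // false
    }
}
```

This is why a custom pattern that relied on the old lenient handling of "hh" would likely need to be changed to "HH" after the switch.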
> Port ExtractionDateUtil to java.time API
> ----------------------------------------
>
> Key: SOLR-12561
> URL: https://issues.apache.org/jira/browse/SOLR-12561
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: contrib - Solr Cell (Tika extraction)
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Minor
> Fix For: master (8.0)
>
> Attachments: SOLR-12561.patch
>
>
> The ExtractionDateUtil class in the extraction contrib uses SimpleDateFormat. The Java 8 java.time API is superior; you can find articles out there explaining why. One thing that comes to mind is less timezone bugginess – SOLR-10243. That said, the API may be a bit baroque IMO (over-engineered). Here I'd like to switch over to the java.time API and furthermore have the patterns be pre-parsed so that at runtime we don't need to re-parse the patterns.
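The pre-parsing idea in the description could look roughly like the sketch below: pattern strings are compiled into DateTimeFormatter instances once, kept in configured order in a LinkedHashSet, and tried in turn at parse time to produce an Instant. The class PreParsedPatterns, its methods, the sample patterns, and the choice of UTC as the default zone are all assumptions made for illustration; the actual ExtractionDateUtil code may differ.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch; names and patterns are not taken from the patch.
public class PreParsedPatterns {

    // Compile each pattern string once, preserving the configured order.
    // UTC is assumed here for text that carries no zone of its own.
    static Collection<DateTimeFormatter> compile(String... patterns) {
        Collection<DateTimeFormatter> formatters = new LinkedHashSet<>();
        for (String pattern : patterns) {
            formatters.add(
                DateTimeFormatter.ofPattern(pattern, Locale.ROOT).withZone(ZoneOffset.UTC));
        }
        return formatters;
    }

    // Try each pre-parsed formatter in order; return the first successful Instant.
    static Optional<Instant> parse(String text, Collection<DateTimeFormatter> formatters) {
        for (DateTimeFormatter formatter : formatters) {
            try {
                return Optional.of(formatter.parse(text, Instant::from));
            } catch (DateTimeParseException e) {
                // This pattern did not match; fall through to the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Collection<DateTimeFormatter> formatters =
            compile("yyyy-MM-dd'T'HH:mm:ss", "yyyy-MM-dd HH:mm:ss");
        System.out.println(parse("2018-07-18 17:07:00", formatters)); // Optional[2018-07-18T17:07:00Z]
        System.out.println(parse("not a date", formatters));          // Optional.empty
    }
}
```

Because the formatters are built once up front, the per-document cost is only the parse attempts themselves, and iteration order matches the order in which the patterns were configured.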