Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2018/07/18 17:07:00 UTC

[jira] [Commented] (SOLR-12561) Port ExtractionDateUtil to java.time API

    [ https://issues.apache.org/jira/browse/SOLR-12561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548094#comment-16548094 ] 

David Smiley commented on SOLR-12561:
-------------------------------------

Patch:
* Use java.time thoroughly; no java.text remnants nor use of Date.  Always Instant.
* Enhanced TestExtractionDateUtil considerably so that it covers more of the supported patterns and their idiosyncrasies.  I want to ensure we don't break back-compat here!  These better tests helped uncover some issues during development of this switch.
* Two of the default patterns had a lowercase "hh" for hour of AM/PM instead of "HH" for hour of day.  SimpleDateFormat seemed to deal with this but I think they are fundamentally invalid without an AM/PM qualifier.  I switched them to HH.  If someone custom configures the patterns in their solr config, they'll need to use the correct designator.
* Use parsed DateTimeFormatter instances instead of Strings in SolrContentHandler and its factory.  Since order might be significant or might be used for performance reasons, I also switched to LinkedHashSet from HashSet for the impl in ExtractingRequestHandler's config parser.
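
To illustrate the "hh" vs. "HH" point, here is a minimal standalone sketch; it is not code from the patch.  The class name, method names, pattern, and sample input are all made up for illustration.  It shows a lenient SimpleDateFormat accepting an hour of 17 with "hh", while DateTimeFormatter (in its default resolver style) rejects the same pattern and accepts it once switched to "HH".

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical demo class; not part of Solr or of the attached patch.
class HourPatternDemo {

    /** True if java.time can parse the text into a LocalDateTime with this pattern. */
    static boolean parsesWithJavaTime(String pattern, String text) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern, Locale.ROOT);
        try {
            LocalDateTime.parse(text, fmt);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    /** True if the legacy (lenient by default) SimpleDateFormat can parse the text. */
    static boolean parsesWithLegacy(String pattern, String text) {
        try {
            new SimpleDateFormat(pattern, Locale.ROOT).parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "2018-07-18 17:07:00"; // sample input, chosen for illustration

        // "hh" is clock-hour-of-am-pm (1-12); 17 is out of range and there is no AM/PM marker.
        System.out.println(parsesWithLegacy("yyyy-MM-dd hh:mm:ss", text));   // true (lenient)
        System.out.println(parsesWithJavaTime("yyyy-MM-dd hh:mm:ss", text)); // false

        // "HH" is hour-of-day (0-23), which is what the input actually contains.
        System.out.println(parsesWithJavaTime("yyyy-MM-dd HH:mm:ss", text)); // true
    }
}
```

Anyone with custom patterns in their solr config could run a check along these lines against their own patterns before upgrading.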

This seems safe for 7.x; any break would seem to be very obscure IMO.  On the other hand, 8.0 will be out this fall or so.

> Port ExtractionDateUtil to java.time API
> ----------------------------------------
>
>                 Key: SOLR-12561
>                 URL: https://issues.apache.org/jira/browse/SOLR-12561
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: contrib - Solr Cell (Tika extraction)
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: master (8.0)
>
>         Attachments: SOLR-12561.patch
>
>
> The ExtractionDateUtil class in the extraction contrib uses SimpleDateFormat.  The Java 8 java.time API is superior; you can find articles out there explaining why.  One thing that comes to mind is less timezone bugginess – SOLR-10243.  That said, the API may be a bit baroque IMO (over-engineered).  Here I'd like to switch over to the java.time API and furthermore have the patterns be pre-parsed so that at runtime we don't need to re-parse them.
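
A rough sketch of the pre-parsed pattern idea, for readers unfamiliar with java.time.  This is not the patch's implementation: the class and method names are hypothetical, the two patterns are examples only, and defaulting to UTC when the text carries no zone is an assumption made here.  Patterns are compiled once into DateTimeFormatter instances held in a LinkedHashSet (so configured order is preserved), and each input is tried against them in order to produce an Instant.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical demo class; names are illustrative, not Solr's actual API.
class PreParsedPatternsDemo {

    /** Compile pattern strings once, keeping the configured order. */
    static Set<DateTimeFormatter> compile(String... patterns) {
        Set<DateTimeFormatter> formatters = new LinkedHashSet<>();
        for (String pattern : patterns) {
            // Assumption: text without zone information is interpreted as UTC.
            formatters.add(DateTimeFormatter.ofPattern(pattern, Locale.ROOT)
                    .withZone(ZoneOffset.UTC));
        }
        return formatters;
    }

    /** Try each pre-parsed formatter in order; the first one that matches wins. */
    static Instant parseDate(String text, Collection<DateTimeFormatter> formatters) {
        for (DateTimeFormatter formatter : formatters) {
            try {
                return formatter.parse(text, Instant::from);
            } catch (DateTimeParseException e) {
                // This pattern did not match; fall through to the next one.
            }
        }
        throw new IllegalArgumentException("Unparseable date: " + text);
    }

    public static void main(String[] args) {
        Set<DateTimeFormatter> formatters =
                compile("yyyy-MM-dd'T'HH:mm:ss", "yyyy-MM-dd HH:mm:ss");
        System.out.println(parseDate("2018-07-18 17:07:00", formatters));
        // prints 2018-07-18T17:07:00Z
    }
}
```

Because the formatters are built once up front, the per-document cost is only the parse attempts themselves, and an invalid pattern surfaces when the configuration is read rather than at extraction time.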


