Posted to issues@maven.apache.org by "Michael Osipov (Jira)" <ji...@apache.org> on 2022/08/02 16:33:00 UTC

[jira] [Commented] (DOXIA-669) Improve/rework CachedFileEntityResolver

    [ https://issues.apache.org/jira/browse/DOXIA-669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17574317#comment-17574317 ] 

Michael Osipov commented on DOXIA-669:
--------------------------------------

Note: Validation cannot be turned off/skipped for the HTML5 doctype since both the public id and the system id are null. It is best not to validate these documents at all.
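
To illustrate the point, here is a minimal sketch (not the Doxia code) that uses the standard SAX LexicalHandler callback to record what the parser reports for a document type declaration. The names DoctypeProbe, Recorder and shouldValidate are hypothetical, and the "validate only if an id is present" policy is just one assumed way to act on the observation.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper, for illustration only.
final class DoctypeProbe {

    /** Records the ids the parser reports for the DOCTYPE declaration. */
    static final class Recorder extends DefaultHandler2 {
        boolean sawDoctype;
        String publicId;
        String systemId;

        @Override
        public void startDTD(String name, String publicId, String systemId) {
            this.sawDoctype = true;
            this.publicId = publicId;
            this.systemId = systemId;
        }
    }

    /** Parses the document without validation and returns what was recorded. */
    static Recorder probe(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        Recorder recorder = new Recorder();
        reader.setContentHandler(recorder);
        // Standard SAX2 property id for registering a LexicalHandler.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", recorder);
        reader.parse(new InputSource(new StringReader(xml)));
        return recorder;
    }

    /** Assumed policy: validate only if the document names something to validate against. */
    static boolean shouldValidate(Recorder recorder) {
        return recorder.publicId != null || recorder.systemId != null;
    }
}
```

For a document starting with <!DOCTYPE html> the recorder should see the declaration but neither id, so an entity resolver has nothing to key on.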

> Improve/rework CachedFileEntityResolver
> ---------------------------------------
>
>                 Key: DOXIA-669
>                 URL: https://issues.apache.org/jira/browse/DOXIA-669
>             Project: Maven Doxia
>          Issue Type: Improvement
>    Affects Versions: 2.0.0-M3
>            Reporter: Michael Osipov
>            Assignee: Michael Osipov
>            Priority: Major
>             Fix For: 2.0.0-M4
>
>
> While working on a few other issues, the following flaws have been detected in the {{CachedFileEntityResolver}} which need to be addressed:
> * If a resource is available only through HTTP and the target redirects to HTTPS, {{HttpURLConnection}} will not follow the redirect by default and, even worse, it will not notify us
> * Our required resources, fml.xsd, xdoc.xsd and xml.xsd, are only checked by system id and not by public id, which means you need to take care of multiple URLs
> * It performs outbound connections for resources which could be available offline, e.g., the schemas from above
> * It logs nothing about what is happening, which makes debugging very hard
> * If a document does not supply a schema or DTD, the validation fails although logically there is nothing to validate. E.g., HTML5 is now schemaless with a mere {{<!DOCTYPE html>}}.
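
The redirect flaw from the list above can be made visible with a small check. This is a sketch under assumptions, not the resolver's code: RedirectCheck and its methods are hypothetical names. As far as I know, HttpURLConnection follows same-protocol redirects on its own but silently hands back the 3xx response when the protocol changes, so the sketch disables automatic redirects and inspects the Location header itself.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, for illustration only.
final class RedirectCheck {

    /** True for the 3xx codes that carry a Location header to follow. */
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    /** Resolves Location against the requested URI; null if the response is no redirect. */
    static URI redirectTarget(URI requested, int status, String location) {
        if (!isRedirect(status) || location == null) {
            return null;
        }
        return requested.resolve(location);
    }

    /** True if following the redirect would switch schemes, e.g. http to https. */
    static boolean changesProtocol(URI requested, URI target) {
        return target != null
                && !requested.getScheme().equalsIgnoreCase(target.getScheme());
    }

    /** Asks the server where the URI points without following redirects (needs network). */
    static URI probe(URI requested) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) requested.toURL().openConnection();
        conn.setInstanceFollowRedirects(false);
        conn.setRequestMethod("HEAD");
        try {
            return redirectTarget(requested, conn.getResponseCode(),
                    conn.getHeaderField("Location"));
        } finally {
            conn.disconnect();
        }
    }
}
```

A caller could log the target at debug level and either retry against it or fail with a clear message, instead of treating the redirect body as the schema.
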
> Things to be done:
> * Have all supported schemas in the classpath for fast access
> * Remove all unused schemas
> * Provide a public id to classpath resource mapping to avoid alternating system ids
> * Add debug logging to assist analysis
> * Don't fail if a schema is not required
> * If the URL is a file, load it directly because file I/O is fast.
> Likely other points.
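
A rough sketch of how the public id to classpath resource mapping with debug logging might look. This is not the actual rework: ClasspathEntityResolver, its constructor arguments and the public id used in the usage note are assumptions made for illustration. The resource opener is passed in as a function so that a class loader's getResourceAsStream can be plugged in.

```java
import java.io.InputStream;
import java.util.Map;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver, for illustration only.
final class ClasspathEntityResolver implements EntityResolver {
    private static final Logger LOG =
            Logger.getLogger(ClasspathEntityResolver.class.getName());

    private final Map<String, String> resourcesByPublicId;
    private final Function<String, InputStream> opener;

    ClasspathEntityResolver(Map<String, String> resourcesByPublicId,
                            Function<String, InputStream> opener) {
        this.resourcesByPublicId = resourcesByPublicId;
        this.opener = opener;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Key on the public id so that alternating system ids (http vs. https,
        // different hosts) all end up at the same bundled resource.
        String resource = publicId == null ? null : resourcesByPublicId.get(publicId);
        if (resource == null) {
            LOG.log(Level.FINE, "No bundled resource for publicId={0}, systemId={1}",
                    new Object[] {publicId, systemId});
            return null; // null lets the parser use its default resolution
        }
        InputStream in = opener.apply(resource);
        if (in == null) {
            LOG.log(Level.FINE, "Mapped resource {0} could not be opened", resource);
            return null;
        }
        LOG.log(Level.FINE, "Resolved publicId={0} to resource {1}",
                new Object[] {publicId, resource});
        InputSource source = new InputSource(in);
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }
}
```

Usage could look like new ClasspathEntityResolver(Map.of("-//EXAMPLE//DTD Demo 1.0//EN", "schemas/demo.dtd"), getClass().getClassLoader()::getResourceAsStream), where both the public id and the resource path are made up.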



--
This message was sent by Atlassian Jira
(v8.20.10#820010)