Posted to j-dev@xerces.apache.org by "Mukul Gandhi (Jira)" <xe...@xml.apache.org> on 2021/11/19 11:04:00 UTC

[jira] [Resolved] (XERCESJ-1716) Validating XML against XSD is slow for long strings if pattern restrictions are defined, even if maxLength is restricted.

     [ https://issues.apache.org/jira/browse/XERCESJ-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mukul Gandhi resolved XERCESJ-1716.
-----------------------------------
      Assignee: Mukul Gandhi
    Resolution: Workaround

> Validating XML against XSD is slow for long strings if pattern restrictions are defined, even if maxLength is restricted.
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: XERCESJ-1716
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1716
>             Project: Xerces2-J
>          Issue Type: Improvement
>            Reporter: Márk Petrényi
>            Assignee: Mukul Gandhi
>            Priority: Major
>         Attachments: long_string.xml, unsafe.xsd, workaround.xsd
>
>
> Validating XML against XSD is slow for long strings if pattern restrictions are defined, even if maxLength is restricted.
> We have the following simple type defined in our xsd (unsafe.xsd):
> {code:xml}
> <xsd:simpleType name="SimpleText255NotBlankType">
>  <xsd:annotation>
>  <xsd:documentation xml:lang="en">String of maximum 255 characters, not blank</xsd:documentation>
>  </xsd:annotation>
>  <xsd:restriction base="xsd:string">
>  <xsd:minLength value="1"/>
>  <xsd:maxLength value="255"/>
>  <xsd:pattern value=".*[^\s].*"/>
>  </xsd:restriction>
> </xsd:simpleType>
> {code}
> The problem is that when a very long string (ca. 1,000,000 characters) is provided as a value in the input XML, we would expect it to be rejected quickly because of its length. In practice, validation takes several minutes because the regex is evaluated before the maxLength restriction.
> We found a workaround for the issue: define the simpleType this way (workaround.xsd):
> {code:xml}
>  <xsd:simpleType name="SimpleText255Type">
>  <xsd:annotation>
>  <xsd:documentation xml:lang="en">String of maximum 255 characters</xsd:documentation>
>  </xsd:annotation>
>  <xsd:restriction base="xsd:string">
>  <xsd:minLength value="1"/>
>  <xsd:maxLength value="255"/>
>  <xsd:pattern value=".{1,255}"/>
>  </xsd:restriction>
>  </xsd:simpleType>
>  <xsd:simpleType name="SimpleText255NotBlankType">
>  <xsd:annotation>
>  <xsd:documentation xml:lang="en">String of maximum 255 characters, not blank</xsd:documentation>
>  </xsd:annotation>
>  <xsd:restriction base="SimpleText255Type">
>  <xsd:pattern value=".*[^\s].*"/>
>  </xsd:restriction>
>  </xsd:simpleType>
> {code}
> The workaround only works because the XSSimpleType implementation builds a Vector of the regex patterns: the {{.{1,255}}} pattern is evaluated first and fails relatively quickly, so the time-consuming second regex is never checked.
> It would be great to have the regex patterns checked after the other XSD restrictions (minLength, maxLength, etc.) have been validated, or to have control over the validation order. That would avoid unnecessarily slow validations and the need for a workaround based on undocumented behavior.
> I attached the XSDs referenced above and an XML file containing a long string value. The problem can be reproduced using the SourceValidator from the Xerces2-J samples:
> The original xsd with slow validation:
> {code:java}
> java jaxp.SourceValidator -a unsafe.xsd -i long_string.xml
> {code}
> The workaround xsd with normal run-time:
> {code:java}
> java jaxp.SourceValidator -a workaround.xsd -i long_string.xml
> {code}
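
For readers without the attachments at hand, the comparison described above can be approximated with the standard JAXP validation API. The following is a minimal sketch, not the reporter's test setup: the class name PatternOrderDemo, the wrapper element "text", the helper methods, and the default length of 1000 are all assumptions made for illustration. The two schema strings reuse the facets quoted in the description. Note that the JDK bundles its own Xerces-derived validator, so timings may differ from Apache Xerces2-J; running against Xerces2-J itself presumably requires putting xercesImpl.jar on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical demo class; none of these names come from the issue itself.
final class PatternOrderDemo {
    // Assumed wrapper element so that a bare string value can be validated.
    private static final String HEAD =
        "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xsd:element name=\"text\" type=\"SimpleText255NotBlankType\"/>";

    // Same facets as the type quoted from unsafe.xsd.
    static final String UNSAFE_XSD = HEAD
        + "<xsd:simpleType name=\"SimpleText255NotBlankType\">"
        + "<xsd:restriction base=\"xsd:string\">"
        + "<xsd:minLength value=\"1\"/><xsd:maxLength value=\"255\"/>"
        + "<xsd:pattern value=\".*[^\\s].*\"/>"
        + "</xsd:restriction></xsd:simpleType></xsd:schema>";

    // Same two-step derivation as the types quoted from workaround.xsd.
    static final String WORKAROUND_XSD = HEAD
        + "<xsd:simpleType name=\"SimpleText255Type\">"
        + "<xsd:restriction base=\"xsd:string\">"
        + "<xsd:minLength value=\"1\"/><xsd:maxLength value=\"255\"/>"
        + "<xsd:pattern value=\".{1,255}\"/>"
        + "</xsd:restriction></xsd:simpleType>"
        + "<xsd:simpleType name=\"SimpleText255NotBlankType\">"
        + "<xsd:restriction base=\"SimpleText255Type\">"
        + "<xsd:pattern value=\".*[^\\s].*\"/>"
        + "</xsd:restriction></xsd:simpleType></xsd:schema>";

    // Compiles a schema held in a string; a broken schema is a programming error here.
    static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Returns true if <text>value</text> validates; value must not contain XML markup.
    static boolean isValid(Schema schema, String value) {
        Validator validator = schema.newValidator();
        String xml = "<text>" + value + "</text>";
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the default error handler throws on validation errors
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a string of the given length consisting of one repeated character.
    static String repeat(char c, int length) {
        char[] chars = new char[length];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        // Default length is kept small; pass e.g. 1000000 to approximate the report.
        int length = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        String value = repeat('a', length);
        String[] names = {"unsafe", "workaround"};
        String[] schemas = {UNSAFE_XSD, WORKAROUND_XSD};
        for (int i = 0; i < schemas.length; i++) {
            Schema schema = compile(schemas[i]);
            long start = System.nanoTime();
            boolean valid = isValid(schema, value);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println(names[i] + ": valid=" + valid + ", " + millis + " ms");
        }
    }
}
```

Both schemas should reject any value longer than 255 characters; only the time taken to reach that verdict is expected to differ. The timing covers validation only, since each schema is compiled before the clock starts.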



--
This message was sent by Atlassian Jira
(v8.20.1#820001)