Posted to j-dev@xerces.apache.org by "Mukul Gandhi (JIRA)" <xe...@xml.apache.org> on 2019/01/15 05:55:00 UTC

[jira] [Comment Edited] (XERCESJ-1705) Validation against asserts (1.1) is slow and takes up a lot of memory for larger files.

    [ https://issues.apache.org/jira/browse/XERCESJ-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16741769#comment-16741769 ] 

Mukul Gandhi edited comment on XERCESJ-1705 at 1/15/19 5:54 AM:
----------------------------------------------------------------

Here are my findings after analyzing your bug report, using the file attachments you provided. Your XML file is about 2.3 MB in size. This is a reasonably sized XML file, not a large one, and asserts should run fast on such a file. Within your XML, there are 91000 sibling A elements. The structure of element A is very shallow (i.e. its tree height is very small). The assert in your XSD is evaluated on each element A. On my localhost, the XSD validation took about 25 sec (the same order as the timing you've reported).

I think an assert evaluation on a single A element takes very little time. The total time (25 sec. or 20 sec.) for all assert evaluations is the time needed to repeat one fast assert 91000 times. I don't think this is a performance bug in Xerces. As a comparison, consider the following Java code (a simple repetition done 91000 times),

long start = System.currentTimeMillis();
for (int idx = 0; idx < 91000; idx++) {
    System.out.println(idx);
}
long end = System.currentTimeMillis();
System.out.println((end - start) + "ms");

The time reported by this code on my localhost is about 15 sec. Is this a performance bug in this Java code? I don't think so. There are simply too many repetitions.
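To put these totals on a per-repetition basis, here is a small sketch that divides each reported total by the 91000 repetitions. The class and method names are made up for illustration; the inputs are only the figures quoted in this thread (25 sec and 20 sec for validation, 15 sec for the loop), not new measurements.

```java
public class PerRepetitionCost {

    // Average cost of one repetition in milliseconds, given a total in seconds.
    static double perRepetitionMillis(double totalSeconds, int repetitions) {
        return totalSeconds * 1000.0 / repetitions;
    }

    public static void main(String[] args) {
        int repetitions = 91_000; // number of A elements in the attached XML

        // Totals as quoted in this thread.
        System.out.printf("validation, 25 sec total: %.3f ms per assert%n",
                perRepetitionMillis(25, repetitions));
        System.out.printf("validation, 20 sec total: %.3f ms per assert%n",
                perRepetitionMillis(20, repetitions));
        System.out.printf("println loop, 15 sec total: %.3f ms per iteration%n",
                perRepetitionMillis(15, repetitions));
    }
}
```

Each figure works out to a fraction of a millisecond per repetition, which is the sense in which a single assert evaluation is fast.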

If we remove your assert from its original place and instead place an assert as shown in the XSD fragment below (the non-assert parts are copied from your example),

<xsd:element name="root">
   <xsd:complexType>
      <xsd:sequence ...>
         ...
      </xsd:sequence>
      <xsd:assert test="every $a in A satisfies ((not($a/B) and $a/C) or ($a/B and not($a/C)))"/>
   </xsd:complexType>
</xsd:element>

This new assert conceptually does the same thing as your assert, but it is evaluated only once (on a much bigger XML tree fragment). With this change, the XML validation time on my localhost is about 6.5 sec, a large improvement over the original 25 sec.
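The logical equivalence of the two placements can be sketched outside Xerces with the JDK's built-in XPath API. This is only an illustration under stated assumptions: the JDK supports XPath 1.0, so the "every ... satisfies" test is rewritten as "no A violates the condition", and the class name, helper names, and tiny inline documents are hypothetical, not taken from the attachments. It does not exercise the Xerces assert engine and says nothing about its timings.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class AssertPlacementSketch {

    // Condition checked with an A element as context: exactly one of B or C is present.
    static final String PER_A = "(not(B) and C) or (B and not(C))";

    // XPath 1.0 rewrite of "every $a in A satisfies (...)", with root as context:
    // true when no A child violates the condition.
    static final String AT_ROOT = "not(A[not((not(B) and C) or (B and not(C)))])";

    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    // Parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates an XPath expression as a boolean against the given context node.
    static boolean test(String expr, Node context) {
        try {
            return (Boolean) XPATH.evaluate(expr, context, XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // One evaluation per A child, analogous to an assert declared on A.
    static boolean checkPerElement(Document doc) {
        Element root = doc.getDocumentElement();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && "A".equals(n.getNodeName()) && !test(PER_A, n)) {
                return false;
            }
        }
        return true;
    }

    // A single evaluation with root as context, analogous to an assert declared on root.
    static boolean checkAtRoot(Document doc) {
        return test(AT_ROOT, doc.getDocumentElement());
    }

    public static void main(String[] args) {
        Document valid = parse("<root><A><B/></A><A><C/></A></root>");
        Document invalid = parse("<root><A><B/><C/></A></root>");
        System.out.println("valid:   perElement=" + checkPerElement(valid)
                + " atRoot=" + checkAtRoot(valid));
        System.out.println("invalid: perElement=" + checkPerElement(invalid)
                + " atRoot=" + checkAtRoot(invalid));
    }
}
```

Both checks accept and reject the same documents; the difference is only whether the expression is evaluated once per A or once for the whole root.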

My conclusion is that the issue reported by your examples is not a Xerces performance bug.


was (Author: mukul_gandhi):
Here are my findings after analyzing your bug report with the file attachments. Your XML file is of the size about 2.3 MB. This is a reasonably sized XML file and is not large, and asserts should run fast on such a XML file. Within your XML, the total number of sibling element A are 91000. The structure of element A is very shallow. The assert in your XSD evaluates on each element A. On my localhost, the XSD validation took about 25 sec (about the same order as the timing you've reported).

I think, an assert evaluation on one A takes very tiny time. The total time (25 sec. or 20 sec.) for all assert evaluations, is the time to repeat one fast assert 91000 times. I don't think this is a performance bug in Xerces. As a comparison, consider following java code (which is a simple repetition done 91000 times),

long start = System.currentTimeMillis();
 for (int idx = 0; idx < 91000; idx++) {
     System.out.println(idx);
 }
 long end = System.currentTimeMillis();
 System.out.println((end - start) + "ms");

The time reported by this code on my localhost is, about 15 sec. Is this a performance bug with this java code? I don't think so. Just the repetitions are too many.

If we remove your assert from its original place, and instead have assert as shown in below XSD fragment (copied from your example),

<xsd:element name="root">
    <xsd:complexType>

       <xsd:sequence ...>

          ...

       </xsd:sequence>
       <xsd:assert test="every $a in A satisfies ((not($a/B) and $a/C) or ($a/B and not($a/C)))"/>
    </xsd:complexType>
 </xsd:element>

This new assert conceptually does the same thing as your assert, but it is evaluated only once (on a much much bigger tree fragment). The XML validation time reported by this change, on my localhost is about 6.5 sec (this is a huge improvement as compared to the original time).

My conclusion is that the issue reported by your examples is not a Xerces performance bug.

> Validation against asserts (1.1) is slow and takes up a lot of memory for larger files.
> ---------------------------------------------------------------------------------------
>
>                 Key: XERCESJ-1705
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1705
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: XML Schema 1.1 Structures
>    Affects Versions: 2.12.0
>            Reporter: Gerben Abbink
>            Priority: Major
>         Attachments: PROBLEM.xml, PROBLEM.xsd
>
>
> The validation of XML against asserts in XML Schema 1.1 is slow and takes up a lot of memory for larger XML files. I have created a simple test XML file with lots of repetition and a corresponding XML schema to show the problem.
> It takes 20 sec. to validate the XML against the XML schema. When I remove the asserts in the XML schema it takes just 1 second to validate. Testing was done from the command prompt on a modern Windows machine with 8 GB of memory.
> To compare, I have also validated the XML file against the XML schema in XMLSpy. With asserts it takes 2 sec., without the asserts 1 sec. (XMLSpy does not use Xerces.)
>  


