Posted to commons-dev@ws.apache.org by "Andreas Veithen (JIRA)" <ji...@apache.org> on 2008/12/15 17:51:44 UTC

[jira] Reopened: (WSCOMMONS-394) StAXUtils: Add Network Detached XMLStreamReader capability

     [ https://issues.apache.org/jira/browse/WSCOMMONS-394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Veithen reopened WSCOMMONS-394:
---------------------------------------

      Assignee: Andreas Veithen  (was: Rich Scheuerle)

While I agree with the analysis, I think the proposed solution is suboptimal:

1) The "network detached" XMLStreamReader will still try to connect to the network to retrieve the external DTD subset. This doesn't solve the performance problem. It might even make it worse if the network error is only triggered after a timeout.

2) In order to provide predictable results, Axiom should either attempt to load the DTD and report an error if it fails, or not attempt to load the DTD at all. The current solution might lead to subtle bugs when a machine that normally is connected to the network is suddenly disconnected.

3) The current solution simply ignores the error and continues to pull events from the parser. However, there is a risk that the parser remains in an inconsistent state after the error. WSCOMMONS-372 shows a case where, after XMLStreamReader#getText() throws an exception caused by an unexpected end of stream, Woodstox happily continues to return events. This might also happen with the current solution.

One of the problems is that even if IS_SUPPORTING_EXTERNAL_ENTITIES is set to false, Woodstox still tries to load the external DTD subset. This can be avoided by registering a custom XMLResolver that simply returns an empty document when asked to load the DTD. I tested this solution and it gives the expected result. In particular, the parser no longer throws an exception, so that we can get rid of the workaround implemented in StAXOMBuilder#getDTDText().
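
To illustrate the resolver idea described above, here is a minimal sketch that uses only the standard javax.xml.stream API. It is not the code that was tested or committed for this issue: the class name DetachedReaderSketch and its two methods are hypothetical, and whether a given parser actually consults the XMLResolver for the external DTD subset depends on the StAX implementation on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration only; not part of Axiom or StAXUtils.
final class DetachedReaderSketch {

    // Creates a factory whose resolver answers every lookup with an empty
    // document, so the parser has no reason to open a network connection.
    static XMLInputFactory newDetachedFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        // XMLResolver has a single method, so a lambda can implement it.
        factory.setXMLResolver((publicID, systemID, baseURI, namespace) ->
                new ByteArrayInputStream(new byte[0]));
        return factory;
    }

    // Pulls events until the first START_ELEMENT and returns its local name,
    // or null if the document contains no element.
    static String rootElementName(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                newDetachedFactory().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getLocalName();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }
}
```

A caller could pass a document whose DOCTYPE points at an unreachable system ID to rootElementName(...) and check that the root element is still reported without an exception.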

If there are no objections, I will clean up my solution and then commit it.

I also noticed that the test case in OMDTDTest is not entirely correct. It tries to simulate a network error using a malformed URL. However, even if the parser didn't try to load the DTD, it would still be allowed to complain about the invalid URL. The test case should use a well-formed URL but make sure that there is no document at that URL.
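
The difference between the two kinds of system ID can be sketched as follows. Both example strings are hypothetical (the actual URL used in OMDTDTest is not reproduced here); the second one relies on the ".invalid" top-level domain, which is reserved and never resolves. The class name DtdUrlSketch is also made up for this example.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper for illustration only.
final class DtdUrlSketch {

    // Returns true if the system ID parses as a URL. This is a purely
    // syntactic check; no connection is opened.
    @SuppressWarnings("deprecation") // new URL(String) is deprecated in recent JDKs
    static boolean isWellFormed(String systemId) {
        try {
            new URL(systemId);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }
}
```

Here isWellFormed("no-scheme/missing.dtd") returns false, so a parser may reject such a system ID without any network access, whereas isWellFormed("http://dtd.invalid/missing.dtd") returns true and can only fail once something actually tries to fetch it.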

> StAXUtils: Add Network Detached XMLStreamReader capability
> ----------------------------------------------------------
>
>                 Key: WSCOMMONS-394
>                 URL: https://issues.apache.org/jira/browse/WSCOMMONS-394
>             Project: WS-Commons
>          Issue Type: Improvement
>          Components: AXIOM
>            Reporter: Rich Scheuerle
>            Assignee: Andreas Veithen
>             Fix For: Axiom 1.2.8
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Background:
> The JSR 173 (StAX) Specification did not do an adequate job defining the semantics for processing DTD DOCTYPE constructs.
> The reference implementation's getValue() returns the entire subset of the DOCTYPE instead of returning the instance (docinfo) information.
> This is a known issue and has been discussed on the forum.
> http://markmail.org/message/im6f2yu2y544k3he
> The problem is worse if the DOCTYPE references an external location.  To get the subset, the parser implementation must do a network call.
> This is (a) ill-performant and (b) requires the application to be attached to a network.
> In addition, the various parser implementations have different mechanisms for getting the DOCTYPE subset.  Some implementations apparently defer
> the processing until the getText() call...while other implementations load the subset when the tag is processed.
> Problem Scenario:
> Configuration and deployment files (e.g. web.xml) often contain DOCTYPE constructs.   In many situations, the deployer may not be connected to the
> network when processing the file.   In such a scenario, the deployer needs a mechanism to process the file without being hindered by the DOCTYPE
> processing.
> Solution:
> The proposed solution is to add new methods to StAXUtils:
>    XMLStreamReader getNetworkDetachedXMLStreamReader(...)
> A caller (e.g. a deployer application) can use the new methods to safely obtain an XMLStreamReader that is configured for a network detached environment.
> As StAX changes, we can update the implementation of the methods.
> Next Action:
> I am working on the proposed solution and tests.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.