Posted to j-dev@xerces.apache.org by bu...@apache.org on 2001/11/26 10:20:31 UTC
DO NOT REPLY [Bug 5077] New: - Report top-level whitespace

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=5077>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=5077

           Summary: Report top-level whitespace
           Product: Xerces2-J
           Version: 2.0.0 [beta 2]
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: XNI
        AssignedTo: xerces-j-dev@xml.apache.org
        ReportedBy: jjc@jclark.com


I would like XMLDocumentHandler to get an additional method that
reports top-level whitespace (whitespace not inside declarations,
comments or processing instructions) in the prolog or epilog.

The main reason I want this is that I want to be able to report the
location of the first character of the markup for a node.  At present
the locator usually gives the location of the first character following
the markup for a particular event.  So in almost all cases I can get the
first character of the current event by using the saved location from
the previous event.  However, this fails for objects in the prolog
and epilog and for the document element because there is no
event for top-level whitespace. For example, given a document:

  <!--a comment-->
  <doc/>

the whitespace between the comment and the document element is not
reported, so I cannot accurately determine the location of the first
character of the start-tag; the closest I can come is the first
character of the whitespace preceding the start-tag.
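
To make the problem concrete, here is a small simulation of the
saved-location technique described above.  It is only a sketch and does
not use Xerces: the Event class, the event names and the offsets are
hypothetical stand-ins for XNI callbacks and for the locator position
seen when each callback fires.  It also assumes the example document
has no leading indentation, i.e. it is exactly the comment, a newline
and the empty-element tag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation; none of these names exist in XNI.
class SavedLocationDemo {
    static final String DOC = "<!--a comment-->\n<doc/>";

    /** A callback plus the offset the locator reports when it fires:
     *  the first character after the markup for that event. */
    static final class Event {
        final String name;
        final int locatorOffset;
        Event(String name, int locatorOffset) {
            this.name = name;
            this.locatorOffset = locatorOffset;
        }
    }

    /** Estimated start of each event: the offset saved at the previous event. */
    static List<Integer> estimatedStarts(List<Event> events) {
        List<Integer> starts = new ArrayList<>();
        int saved = 0; // start of the document entity
        for (Event e : events) {
            starts.add(saved);
            saved = e.locatorOffset;
        }
        return starts;
    }

    /** Events without a whitespace callback: nothing for the newline. */
    static List<Event> eventsToday() {
        int commentEnd = DOC.indexOf("-->") + 3;
        return Arrays.asList(
            new Event("comment", commentEnd),
            new Event("emptyElement", DOC.length()));
    }

    /** Events with the proposed (hypothetical) whitespace callback added. */
    static List<Event> eventsWithWhitespace() {
        int commentEnd = DOC.indexOf("-->") + 3;
        int tagStart = DOC.indexOf("<doc/>");
        return Arrays.asList(
            new Event("comment", commentEnd),
            new Event("topLevelWhitespace", tagStart),
            new Event("emptyElement", DOC.length()));
    }

    public static void main(String[] args) {
        System.out.println("true start of <doc/>: " + DOC.indexOf("<doc/>"));
        System.out.println("estimate today:       "
            + estimatedStarts(eventsToday()).get(1));
        System.out.println("estimate with event:  "
            + estimatedStarts(eventsWithWhitespace()).get(2));
    }
}
```

Without the whitespace event the estimate for the start-tag is offset
16 (the newline); with it the estimate is 17, the true position of the
"<" of the start-tag.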

This information is easy to provide, and providing it has minimal
impact on applications that are not interested in it.

I believe the .NET Framework XML parser provides this information.

I think it's consistent with the XNI philosophy of providing a
lossless information set.  I would suggest that as a general
principle, for every character in the document entity, there should be
a callback with which that character is associated.  At the moment, I
believe the only exception to this is top-level whitespace.
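
The principle can be sketched with a toy scanner.  Everything here is
hypothetical: TopLevelHandler, topLevelWhitespace and TopLevelScanner
are illustrative names, not XNI interfaces, and the scanner only copes
with comments, processing instructions and simple tags at the top level
(no element content, no DOCTYPE with an internal subset).  The
whitespace callback is a no-op by default, which is one way a handler
that is not interested could remain unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these names are illustrative and not part of XNI.
interface TopLevelHandler {
    void markup(String text);

    // No-op by default, so handlers that do not care need not change.
    default void topLevelWhitespace(String text) { }
}

class TopLevelScanner {
    /** Walks a string of top-level markup and whitespace, reporting
     *  each piece to the handler in document order. */
    static void scan(String doc, TopLevelHandler handler) {
        int i = 0;
        while (i < doc.length()) {
            int start = i;
            if (isXmlWhitespace(doc.charAt(i))) {
                while (i < doc.length() && isXmlWhitespace(doc.charAt(i))) {
                    i++;
                }
                handler.topLevelWhitespace(doc.substring(start, i));
            } else {
                // Pick the closing delimiter that matches the opening one.
                String open = doc.startsWith("<!--", i) ? "<!--"
                            : doc.startsWith("<?", i) ? "<?" : "<";
                String close = open.equals("<!--") ? "-->"
                             : open.equals("<?") ? "?>" : ">";
                int end = doc.indexOf(close, i + open.length());
                i = (end < 0) ? doc.length() : end + close.length();
                handler.markup(doc.substring(start, i));
            }
        }
    }

    static boolean isXmlWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }
}

/** A handler that is not interested in whitespace. */
class MarkupOnlyHandler implements TopLevelHandler {
    final List<String> seen = new ArrayList<>();
    public void markup(String text) { seen.add(text); }
}

/** A handler that records everything, so no character is lost. */
class LosslessHandler extends MarkupOnlyHandler {
    @Override
    public void topLevelWhitespace(String text) { seen.add(text); }
}
```

With LosslessHandler, concatenating the recorded pieces gives back the
scanned string character for character; with MarkupOnlyHandler the
whitespace between the pieces is lost, as it is today.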

Consider the document:

<?xml version="1.0" encoding="utf-8"?>
<doc/>

If this is parsed as an external entity, then the whitespace between
the XML declaration and the element will be preserved; but if it's
parsed as a document entity, the whitespace is totally thrown away.
It would be more intuitive if XNI could preserve all whitespace except
whitespace within markup (i.e. tags, declarations, comments,
processing instructions).

If a method is added to XMLDocumentHandler, then it would also be
natural to add a similar method to XMLDTDHandler providing whitespace
between markup declarations.
