Posted to j-dev@xerces.apache.org by ji...@apache.org on 2004/04/23 23:43:53 UTC

[jira] Closed: (XERCESJ-76) Report top-level whitespace

Message:

   The following issue has been closed.

   Resolver: Andy Clark
       Date: Fri, 23 Apr 2004 2:42 PM

This issue is an old feature request that was never implemented. So I am closing it. If someone wants this functionality and will actually spend the time to implement it, then feel free to re-open this issue.
---------------------------------------------------------------------
View the issue:
  http://issues.apache.org/jira/browse/XERCESJ-76

Here is an overview of the issue:
---------------------------------------------------------------------
        Key: XERCESJ-76
    Summary: Report top-level whitespace
       Type: Bug

     Status: Closed
 Resolution: WON'T FIX

    Project: Xerces2-J
 Components: 
             XNI
   Versions:
             2.0.0 [beta 2]

   Assignee: 
   Reporter: James Clark

    Created: Mon, 26 Nov 2001 1:20 AM
    Updated: Fri, 23 Apr 2004 2:42 PM
Environment: Operating System: Other
Platform: Other

Description:
I would like XMLDocumentHandler to get an additional method that
reports top-level whitespace (whitespace not inside declarations,
comments or processing instructions) in the prolog or epilog.

The main reason I want this is that I want to be able to report the
location of the first character of the markup for a node.  Now the
locator usually gives the first character of the markup following the
markup for a particular event.  So in almost all cases I can get the
first character of the current event by using the saved location from
the previous event.  However, this fails for objects in the prolog
and epilog and for the document element because there is no
event for top-level whitespace. For example, given a document:

  <!--a comment-->
  <doc/>

the whitespace between the comment and the document element is not
reported, so I cannot accurately determine the location of the first
character of the start-tag; the closest I can come is the first
character of the whitespace preceding the start-tag.
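
The saved-location technique described above can be illustrated with a
minimal sketch. It uses the JDK's standard SAX API rather than XNI, which
is an assumption made only to keep the example self-contained; the class
and method names (PreviousLocationDemo, Tracker, trace) are hypothetical.
Each event logs the location saved from the previous event as its
estimated start, next to the location the locator reports for the event
itself. Exact line and column values depend on the parser in use.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class; uses SAX (not XNI) purely for illustration.
class PreviousLocationDemo {

    /** Logs, per event, the saved previous location and the location reported now. */
    static class Tracker extends DefaultHandler2 {
        final List<String> log = new ArrayList<>();
        private Locator locator;
        // Location saved from the previous event; the document starts at 1:1.
        private int prevLine = 1;
        private int prevCol = 1;

        @Override
        public void setDocumentLocator(Locator l) {
            locator = l;
        }

        private void record(String event) {
            log.add(event + ": estimated start " + prevLine + ":" + prevCol
                    + ", reported " + locator.getLineNumber() + ":" + locator.getColumnNumber());
            // Save the current location as the estimated start of the next event.
            prevLine = locator.getLineNumber();
            prevCol = locator.getColumnNumber();
        }

        @Override
        public void comment(char[] ch, int start, int length) {
            record("comment");
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            record("startElement " + qName);
        }
    }

    static List<String> trace(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        Tracker tracker = new Tracker();
        // Comments are only delivered if a LexicalHandler is registered.
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", tracker);
        parser.parse(new InputSource(new StringReader(xml)), tracker);
        return tracker.log;
    }

    public static void main(String[] args) throws Exception {
        for (String line : trace("<!--a comment-->\n<doc/>")) {
            System.out.println(line);
        }
    }
}
```

Because no event is delivered for the newline between the comment and
the start-tag, the estimated start logged for the doc element is whatever
was saved at the comment event, which need not be the position of the
"<" of the start-tag.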

It's easy to provide this information and has minimal impact on
applications that are not interested in it.

I believe the .NET Framework XML parser provides this information.

I think it's consistent with the XNI philosophy of providing a
lossless information set.  I would suggest that as a general
principle, for every character in the document entity, there should be
a callback with which that character is associated.  At the moment, I
believe the only exception to this is top-level whitespace.
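
As a rough sketch of this principle: if every character is associated
with some callback, then appending the text reported by each callback
reproduces the document entity exactly. The Handler interface below is
hypothetical and is not the real XMLDocumentHandler; the method name
topLevelWhitespace is an assumed name for the proposed addition, and the
replay method hard-codes the events for the two-line example document
instead of running a real scanner.

```java
// Hypothetical sketch; none of these types are part of XNI.
class LosslessEventsSketch {

    /** Simplified stand-in for a document handler. */
    interface Handler {
        void comment(String markup);
        void element(String markup);
        void topLevelWhitespace(String text); // proposed addition (assumed name)
    }

    /** Rebuilds the source text by appending whatever each callback reports. */
    static class Rebuilder implements Handler {
        final StringBuilder out = new StringBuilder();

        public void comment(String markup) { out.append(markup); }
        public void element(String markup) { out.append(markup); }
        public void topLevelWhitespace(String text) { out.append(text); }
    }

    /** Replays the events for the example document, with or without the whitespace event. */
    static String replay(boolean reportWhitespace) {
        Rebuilder r = new Rebuilder();
        r.comment("<!--a comment-->");
        if (reportWhitespace) {
            r.topLevelWhitespace("\n");
        }
        r.element("<doc/>");
        return r.out.toString();
    }

    public static void main(String[] args) {
        String source = "<!--a comment-->\n<doc/>";
        System.out.println(replay(true).equals(source));  // prints "true"
        System.out.println(replay(false).equals(source)); // prints "false"
    }
}
```

Without the whitespace event, the rebuilt text loses the newline, so the
event stream is no longer a lossless account of the document entity.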

Consider the document:

<?xml version="1.0" encoding="utf-8"?>
<doc/>

If this is parsed as an external entity, then the whitespace between
the XML declaration and the element will be preserved; but if it's
parsed as a document entity, the whitespace is totally thrown away.
It would be more intuitive if XNI could preserve all whitespace except
whitespace within markup (i.e. tags, declarations, comments,
processing instructions).

If a method is added to XMLDocumentHandler, then it would also be
natural to add a similar method to XMLDTDHandler providing whitespace
between markup declarations.



