Posted to dev@xalan.apache.org by md...@us.britannica.com on 2000/07/13 17:03:42 UTC

fixing HTML special characters (was: RE: Bug in DTMLiaison.java.. .)

Hey David,

I've been having problems with HTML special characters too.  I haven't been
able to pin down the bug, but I did create a workaround.

The problem seems to show up only in large documents (perhaps documents that
exceed the 8k character buffer?) and only seems to occur with special
characters greater than   (e.g. em dashes, left and right quotes).

Xalan attempts to convert these entities from their numeric values to a
single character, and somehow those characters get garbled.  However, when I
changed the characters() method of FormatterToHTML to retain the numbered
entities, the problem went away.  I used this logic:

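  // Workaround: write characters 127-159 as numbered entities
  // (&#NNN;) instead of emitting the raw character.
  if ((ch > 126) && (ch < 160)) {
    accum('&');
    accum('#');
    // Decimal digits of the character's code value
    String intStr = Integer.toString(ch);
    int nIntStr = intStr.length();
    for (int k = 0; k < nIntStr; k++)
    {
      accum(intStr.charAt(k));
    }
    accum(';');
  }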
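For anyone who wants to try the same rule outside of Xalan, here is a minimal standalone sketch of it. It is not FormatterToHTML code: the class name EntityWorkaround and the method escapeRange are hypothetical, and a StringBuilder stands in for the formatter's accum() buffer.

```java
public class EntityWorkaround {

    // Hypothetical helper mirroring the workaround above: characters in
    // the range 127-159 are written as numbered entities (&#NNN;), and
    // every other character is copied through unchanged.
    public static String escapeRange(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch > 126 && ch < 160) {
                // Appending an int writes its decimal digits, e.g. 151
                out.append("&#").append((int) ch).append(';');
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // (char) 151 is the value an "&#151;" reference decodes to.
        String input = "a" + (char) 151 + "b";
        System.out.println(escapeRange(input));
    }
}
```

Building the whole string at once is only for illustration; the original patch writes one character at a time through accum(), which keeps it inside the formatter's existing buffering.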

I guess the question is, do you prefer using numbered entities or single
characters in this case?  It's really a judgement call, it seems; the HTML
specification gives no clear preference to either.
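As background for that judgement call, and only as an assumption not stated above: em dashes and curly quotes fall in the 127-159 range in the Windows-1252 code page, whereas in Unicode (which Java chars use) the values 128-159 are C1 control characters. So a reference like &#151; decodes to a control character (U+0097), not to an em dash (U+2014). The sketch below, with a hypothetical class name Cp1252Demo, prints both interpretations using the JDK's windows-1252 decoder.

```java
import java.nio.charset.Charset;

public class Cp1252Demo {

    // Decode a single byte value (0-255) with the JDK's windows-1252 charset.
    static char decodeCp1252(int value) {
        byte[] bytes = { (byte) value };
        return new String(bytes, Charset.forName("windows-1252")).charAt(0);
    }

    public static void main(String[] args) {
        // Left/right single quotes, left/right double quotes, em dash
        int[] values = { 145, 146, 147, 148, 151 };
        for (int v : values) {
            char asUnicode = (char) v;        // what "&#v;" decodes to as a Unicode code point
            char asCp1252 = decodeCp1252(v);  // what the same value means in windows-1252
            System.out.printf("&#%d; -> U+%04X as Unicode, U+%04X as windows-1252%n",
                    v, (int) asUnicode, (int) asCp1252);
        }
    }
}
```

This does not settle which output form is preferable; it only shows why characters in this range are easy to garble when a single character is written instead of the numbered entity.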

Morgan Delagrange
Britannica.com


> -----Original Message-----
> From: David_Marston@lotus.com [mailto:David_Marston@lotus.com]
> Sent: Wednesday, July 12, 2000 10:28 AM
> To: xalan-dev@xml.apache.org
> Subject: Re: Bug in DTMLiaison.java...
> 
> ...
> By the way, if anyone is in a bug-fixing mood, they could look into
> any of these areas:
> 1. Namespace handling: functions, namespace:: axis, exclusion
> 2. HTML special characters, after discussion on this list
> 3. Error-checking: sub-element rules, function argument counts
> Let me know if you want to hear more!
> .................David Marston
>