Posted to j-dev@xerces.apache.org by Assaf Arkin <ar...@intalio.com> on 2000/08/22 06:41:17 UTC

Re: Newline bug in BaseMarkupSerializer + a fix

How do you end up getting \r\n inside the document?

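A parser is supposed to parse the document so that every \r\n pair
converts to a single \n. It's perfectly legal to have a \r or a \n in
the document, but a \r\n that survives parsing cannot have come from a
plain \r\n in the original text file; the \r would have to be written
as a character reference such as &#13;.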
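
To illustrate, here is a minimal sketch of that line-end handling as I
read section 2.11 of XML 1.0 (which also maps a lone literal \r to \n).
The class and method names are made up for the example and are not
Xerces API; a real parser does this while reading the entity, not as a
string pass afterwards.

```java
final class LineEndNormalization {
    // Hypothetical helper, not part of Xerces: mimics what an XML 1.0
    // parser hands to the application for literal line ends.
    static String normalize(String raw) {
        // First collapse each \r\n pair, then map any remaining lone \r.
        return raw.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String dosText = "one\r\ntwo\r\n";
        String parsed = normalize(dosText);
        // After normalization no \r is left, so a serializer that breaks
        // the line once per \n emits exactly one break per original line.
        System.out.println(parsed.indexOf('\r') < 0);
        System.out.println(parsed.equals("one\ntwo\n"));
    }
}
```

Both lines print true: text that went through the parser has nothing
left for the serializer to consolidate.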

The serializer might be pushing it a bit by printing a new line for a
\r, but it definitely should not attempt to consolidate \r\n. That would
violate the XML 1.0 specification.

arkin


mdusseault@home.com wrote:
> 
> Hi,
> 
> I recently hit a bug in the way newlines are handled in the
> BaseMarkupSerializer class.  What was happening is that the
> newlines were getting doubled.  If there were 2 linebreaks,
> I'd get 4.
> 
> The lines in the text were terminated with the braindead
> DOS end of line (I *hate* that!).  I've isolated it to this
> part of the BaseMarkupSerializer class (in 1.1.1 dist source -
> hope you don't mind my reformatting of the code in this message):
> 
> for ( index = 0 ; index < text.length() ; ++index ) {
>   ch = text.charAt( index );
>   if ( ch == '\n' || ch == '\r' )
>     _printer.breakLine( true );
>   else if ( unescaped )
>     _printer.printText( ch );
>   else
>     printEscaped( ch );
> }
> 
> The problem is at line number 1219 (of 1.1.1 dist).  It should
> be obvious by now that it will generate a newline for every
> \n and \r char, which means any \r\n combo will end up creating
> two linebreaks.
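
The doubling described here is easy to reproduce outside the serializer.
This is a minimal sketch, with a hypothetical countBreaks helper standing
in for the quoted loop; it only counts the places where
_printer.breakLine would be called and does not touch any Xerces class.

```java
final class NewlineDoubling {
    // Hypothetical stand-in for the quoted loop: one break per \n or \r.
    static int countBreaks(String text) {
        int breaks = 0;
        for (int index = 0; index < text.length(); ++index) {
            char ch = text.charAt(index);
            if (ch == '\n' || ch == '\r') {
                breaks++;
            }
        }
        return breaks;
    }

    public static void main(String[] args) {
        // Two DOS-terminated lines: each \r\n pair is counted twice.
        System.out.println(countBreaks("one\r\ntwo\r\n")); // prints 4
        // The same two lines with bare \n terminators.
        System.out.println(countBreaks("one\ntwo\n"));     // prints 2
    }
}
```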
> 
> I'm thinking about the best way to fix it...  I figure
> something like making sure there isn't a \r in front of a \n
> before printing a newline might be the best way.  Hopefully
> nothing ever puts them in the reverse order...
> 
> Something like this seems to work:
> 
> --- snip ---
> char last_ch = 0;
> 
> if ( preserveSpace ) {
>   // Preserving spaces: the text must print exactly as it is,
>   // without breaking when spaces appear in the text and without
>   // consolidating spaces. If a line terminator is used, a line
>   // break will occur.
>   for ( index = 0 ; index < text.length() ; ++index ) {
>     ch = text.charAt( index );
>     if ( ch == '\n' || ch == '\r' ) {
>       // Skip the \n of a \r\n pair; the \r already broke the line.
>       if ( ch == '\r' || last_ch != '\r' )
>         _printer.breakLine( true );
>     } else if ( unescaped )
>       _printer.printText( ch );
>     else
>       printEscaped( ch );
>     last_ch = ch;
>   }
> --- snip ---
> 
> There's probably a more elegant way to fix it, but I'm in the
> middle of beta testing and I'm *really* busy.  Since it works
> and I can continue with my testing, I'll leave it for now.
> 
> I wouldn't be surprised if this is actually present in other
> places too.  I see a few similar lines in the same file which
> probably trigger the problem.
> 
> P.S. You guys/gals rock!  Thanks for the nice tools.  Every time I
> find and fix a bug in open code like this, I make sure to remind my
> boss how much trouble the bug would have been if I didn't have the
> source code... :-)
> 
> Later,
> 
> Mike.
> 

-- 
----------------------------------------------------------------------
Assaf Arkin                                            www.intalio.com
CTO, Intalio Inc.                                       www.exolab.org