Posted to j-users@xerces.apache.org by Thimo von Rauchhaupt <Th...@empic.de> on 2007/11/30 14:54:56 UTC

Xerces Unmarshaller bug removing single whitespace

Hello,

When I use Xerces (2.9.0 as well as 2.9.1) for unmarshalling, it removes the
single whitespace between "specific" and "subject" in this element (line 101
of the attached file):

<subjectmark><![CDATA[No specific subject]]></subjectmark>

In the loaded object, the String value " No specificsubject" can be found.

Strangely, if I insert some line breaks above the last object tag (question),
changing

   </question>
   <question>

to

   </question>






   <question>

the bug does not occur. It is also strange that the same tag (subjectmark)
with the same value occurs many times in the file, but only this one is parsed
wrongly.

My questions are:
1) Can anybody tell me whether I did something wrong?
2) Is this a bug? If so, can anybody tell me how to report it, and against
which component? The bug reporting page is awfully complicated: I can only
read old bug reports and cannot find a page for entering a new one.

Many thanks in advance, 
Thimo 


P.S.: My Java code is:

FileInputStream fis = new FileInputStream(aFileToImport); // the attached file AnonymizedImport.xml
InputStreamReader isr = new InputStreamReader(fis, Exporter.DEFAULT_ENCODING); // means UTF-8

Unmarshaller tempUnmarshaller = new Unmarshaller();
Mapping tempMapping = new Mapping();

tempMapping.loadMapping(Exporter.class.getClassLoader().getResource(Exporter.XML_MAPPING_FILE)); // see attached file import.xml
tempUnmarshaller.setMapping(tempMapping);
tempUnmarshaller.setDebug(stdlog.isDebugEnabled());
ImportExportBean tempImportBean = (ImportExportBean) tempUnmarshaller.unmarshal(isr);

Re: Xerces Unmarshaller bug removing single whitespace

Posted by Thimo von Rauchhaupt <Th...@empic.de>.
Hello Michael,

Thanks a lot for your answer. You're right (sorry about that): I use
Castor-XML for unmarshalling.

And that was the problem: I tried an older Xerces version and everything was
OK. I have now also checked a newer Castor, and everything is OK with it,
too.

-> It was an incompatibility between my old Castor and the new Xerces.
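
When a version mismatch like this is suspected, it can help to see which SAX parser implementation is actually picked up at runtime and where it is loaded from. The following is only a minimal sketch using the standard JAXP API; the class name ParserInfo and the helper describe are made up for illustration. It reports JAXP's default parser, and Castor may select its parser through its own configuration, so treat the output as a hint rather than proof.

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

public class ParserInfo {

    /** Returns "class name (location)" for the given class (hypothetical helper). */
    static String describe(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes from the bootstrap class loader usually report no code source.
        String location = (source == null || source.getLocation() == null)
                ? "no code source, typically the JDK's built-in parser"
                : source.getLocation().toString();
        return type.getName() + " (" + location + ")";
    }

    public static void main(String[] args) throws Exception {
        // JAXP resolves the factory from system properties, the service
        // loader mechanism, or the JDK default.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        XMLReader reader = factory.newSAXParser().getXMLReader();

        System.out.println("factory: " + describe(factory.getClass()));
        System.out.println("reader:  " + describe(reader.getClass()));
    }
}
```

If a Xerces jar is on the classpath and selected by JAXP, the printed location should point at that jar, which makes it easier to confirm which Xerces version is really in use.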

Thanks,
Thimo

--------------------
Thimo von Rauchhaupt
Technical Engineering Director 
/ Sales Europe

EMPIC GmbH, Werner-von-Siemens-Str. 61
CEO: Joerg K. Kottenbrink
91052 Erlangen, Germany
Reg. No: 2873 in Fuerth, Germany
Phone: +49/9131/877 276
Fax: +49/9131/877 265
Mobile: +49/172/23 43 189
Skype: thimo_von_rauchhaupt
eMail: thimo.rauchhaupt@empic.de
http://www.empic.eu

-----Original Message-----
From: Michael Glavassevich [mailto:mrglavas@ca.ibm.com]
Sent: Friday, 30 November 2007 15:48
To: j-users@xerces.apache.org
Subject: Re: Xerces Unmarshaller bug removing single whitespace

Hi Thimo,

There's no such thing as a "Xerces Unmarshaller", so I have no idea what
library you're referring to, but it certainly doesn't come from this
project. I doubt this is a problem with Xerces. I suspect the Unmarshaller
classes you're using are the source of the odd behaviour, possibly because
they're not handling multiple calls to the SAX characters() callback [1]
correctly.

A ContentHandler written like:

private StringBuffer buf;
public void characters(char[] ch, int start, int length)
   throws SAXException {
   buf.append(new String(ch, start, length).trim());
}

would cause whitespace to be dropped from seemingly random points in the
document (like you're seeing).

Thanks.

[1]
http://xerces.apache.org/xerces2-j/javadocs/api/org/xml/sax/ContentHandler.html#characters(char[],%20int,%20int)

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org




Re: Xerces Unmarshaller bug removing single whitespace

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hi Thimo,

There's no such thing as a "Xerces Unmarshaller", so I have no idea what
library you're referring to, but it certainly doesn't come from this
project. I doubt this is a problem with Xerces. I suspect the Unmarshaller
classes you're using are the source of the odd behaviour, possibly because
they're not handling multiple calls to the SAX characters() callback [1]
correctly.

A ContentHandler written like:

private StringBuffer buf;
public void characters(char[] ch, int start, int length)
   throws SAXException {
   buf.append(new String(ch, start, length).trim());
}

would cause whitespace to be dropped from seemingly random points in the
document (like you're seeing).
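
For contrast, here is a minimal sketch of a handler that tolerates chunked delivery: it appends every characters() call untrimmed and trims only once, when the element ends. The class name TextCollector and the accessor getLastText are made up for illustration, and the sketch assumes simple leaf elements (it resets the buffer on every start tag, so it is not suitable for mixed content). It is not how Castor or any other library does it.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TextCollector extends DefaultHandler {
    private final StringBuilder buf = new StringBuilder();
    private String lastText; // text of the most recently closed element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buf.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append the chunk as-is. The parser may split one text node
        // across several calls, so trimming here would eat inner spaces.
        buf.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        lastText = buf.toString().trim(); // trim once, on the complete text
    }

    public String getLastText() {
        return lastText;
    }

    public static void main(String[] args) throws Exception {
        // Simulate a parser that delivers the text in two chunks.
        TextCollector simulated = new TextCollector();
        char[] first = "No specific".toCharArray();
        char[] second = " subject".toCharArray();
        simulated.startElement("", "subjectmark", "subjectmark", null);
        simulated.characters(first, 0, first.length);
        simulated.characters(second, 0, second.length);
        simulated.endElement("", "subjectmark", "subjectmark");
        System.out.println("[" + simulated.getLastText() + "]");

        // The same handler driven by whatever SAX parser JAXP provides.
        String xml = "<subjectmark><![CDATA[No specific subject]]></subjectmark>";
        TextCollector parsed = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), parsed);
        System.out.println("[" + parsed.getLastText() + "]");
    }
}
```

Both lines print [No specific subject]. With the trim() moved into characters(), the simulated two-chunk case would lose the space before "subject".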

Thanks.

[1]
http://xerces.apache.org/xerces2-j/javadocs/api/org/xml/sax/ContentHandler.html#characters(char[],%20int,%20int)

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org