Posted to j-users@xerces.apache.org by Praveen Peddi <pp...@contextmedia.com> on 2004/07/13 17:35:57 UTC

Best way to read non-utf xml documents

I have input XML files in "windows-1252" encoding, and I have to convert them to UTF-8 and send them to a server (the server assumes that all input XML files are UTF-8 encoded). When I read the files and write them out in UTF-8, I lose some special characters, such as the registered trademark and copyright symbols.

I am reading the files in the OS native encoding (that is, I do not specify any encoding for the input stream) and writing them out in UTF-8.

What's the best way to read non-UTF-8 encoded XML files and output them in UTF-8?

Any help would be appreciated...


Thanks
Praveen

************************************************************** 
Praveen Peddi
Sr Software Engg, Context Media, Inc. 
email:ppeddi@contextmedia.com 
Tel:  401.854.3475 
Fax:  401.861.3596 
web: http://www.contextmedia.com 
************************************************************** 
Context Media- "The Leader in Enterprise Content Integration" 

Re: Best way to read non-utf xml documents

Posted by Suresh Babu Koya <sk...@in-reality.com>.
When no encoding is specified, the streams use the default encoding taken from the system property "file.encoding".
Use the Xerces serializer API to compose the XML and set the output encoding.

/Suresh
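
The advice above could be sketched roughly as follows. This is only an illustration, not the code from the thread: it swaps in the standard JAXP identity Transformer in place of the Xerces serializer classes, and the class name XmlToUtf8 and its convert methods are hypothetical. It assumes each input file carries an XML declaration naming its real encoding (for example encoding="windows-1252"), so that the parser, given raw bytes rather than a Reader, can decode the characters itself instead of relying on "file.encoding".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper: re-encodes an XML document as UTF-8.
public class XmlToUtf8 {

    // Parses raw bytes and serializes the resulting DOM as UTF-8.
    // Assumption: the input's XML declaration states its actual encoding.
    public static void convert(InputStream in, OutputStream out) throws Exception {
        // Pass the byte stream, not a Reader, so the parser reads the
        // encoding from the XML declaration instead of the platform default.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(in);

        // An identity transform copies the DOM to the output unchanged.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Write UTF-8 bytes and a matching encoding in the output declaration.
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        // Write to a byte stream, not a Writer, so the transformer applies the encoding.
        transformer.transform(new DOMSource(doc), new StreamResult(out));
    }

    // Convenience overload for in-memory documents.
    public static byte[] convert(byte[] xml) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        convert(new ByteArrayInputStream(xml), out);
        return out.toByteArray();
    }
}
```

For files, the same method can be called with a FileInputStream and a FileOutputStream. If the input files have no encoding declaration, the parser will assume UTF-8 and this sketch will not help as written; one option in that case is to wrap the stream in an org.xml.sax.InputSource and call setEncoding("windows-1252") before parsing.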
