Posted to j-users@xerces.apache.org by ev...@rocketrainer.com on 2002/10/22 14:30:22 UTC

What is the best way to remove XML header and serialize it to String

I have XML as a String and I need only to remove the XML header <?xml 
version="1.0" encoding="UTF-8"?> from it and get the XML back as a String.
Of course I can cut the string at the second < symbol, but I can't be sure
the XML has a header.
So, what is the best way to remove the XML header?

Thanks
Jenya 

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org


Re: What is the best way to remove XML header and serialize it to String

Posted by Andy Clark <an...@apache.org>.
Joseph Kesselman wrote:
> Actually, the optional Byte Order Mark may precede the XML Declaration... 
> but that's the only exception.

Yes, when you're talking about a stream. But the
question was about a bit of XML in a String.
So for the string to be created, the XML would have
already been parsed correctly which takes care of
the byte order mark (BOM).

However, your remark does remind me of a problem
with the UTF-8 reader supplied with Sun's JDK. It
cannot handle the UTF-8 byte order mark correctly.
(Strictly speaking, UTF-8 doesn't need a BOM but
we all know that Microsoft tools add one to UTF-8
files for the ability to auto-detect the encoding.)

Therefore, any UTF-8 stream that contains a BOM
*and* is read by the standard Java UTF-8 reader
would pass the BOM through as another character in
the stream. So this is a problem that should be
handled as well.
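
As a rough sketch of what handling it could look like once
the bytes have already been decoded: a UTF-8 BOM (bytes
EF BB BF) decodes to the single character U+FEFF, so it can
be dropped from the front of the String. The class and
method names here (BomStripper, stripBom) are made up for
illustration and are not part of Xerces or the JDK.

```java
import java.nio.charset.StandardCharsets;

final class BomStripper {
    // The character a UTF-8 byte order mark (EF BB BF) decodes to.
    private static final char BOM = '\uFEFF';

    /** Returns s without a leading BOM character, or s unchanged if there is none. */
    static String stripBom(String s) {
        if (s != null && !s.isEmpty() && s.charAt(0) == BOM) {
            return s.substring(1);
        }
        return s;
    }

    public static void main(String[] args) {
        // Bytes of "<a/>" preceded by a UTF-8 BOM, as some editors write them.
        byte[] bytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        // The standard decoder keeps the BOM as the first character of the String.
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(decoded.length());           // 5: BOM plus four characters
        System.out.println(stripBom(decoded));          // <a/>
    }
}
```

Stripping after decoding keeps the sketch independent of how
the stream was read; a String without a BOM passes through
untouched.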

-- 
Andy Clark * andyc@apache.org


Re: What is the best way to remove XML header and serialize it to String

Posted by Joseph Kesselman <ke...@us.ibm.com>.
On Thursday, 10/24/2002 at 12:09 ZE9, Andy Clark <an...@apache.org> wrote:
> This should be simple to detect and remove. The XMLDecl
> line *must* be the very first sequence of characters in
> the document, if present, as required by the XML spec.

Actually, the optional Byte Order Mark may precede the XML Declaration... 
but that's the only exception.

 ______________________________________
Joe Kesselman  / IBM Research


Re: What is the best way to remove XML header and serialize it to String

Posted by Andy Clark <an...@apache.org>.
evgeniy.strokin@rocketrainer.com wrote:
> I have XML as a String and I need only to remove an XML header <?xml 
> version="1.0" encoding="UTF-8"?> from this and get XML as a String back.
> Of course I can cut this string by second symbol < but there is no sure 
> I have XML with header.
> So, what is the best way to remove XML header?

This should be simple to detect and remove. The XMLDecl
line *must* be the very first sequence of characters in
the document, if present, as required by the XML spec.
So just do something like the following:

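   // Strip the declaration only when the string starts with one
   // and its closing "?>" is actually present.
   int end = s.startsWith("<?xml ") ? s.indexOf("?>") : -1;
   if (end >= 0) {
     s = s.substring(end + 2);
   }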
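
A slightly more defensive variant of the check above, as a
sketch only. It assumes the input is already a decoded
String, accepts any of the four XML white space characters
(space, tab, CR, LF) after "<?xml", and leaves the input
alone when no closing "?>" is found. The names
XmlDeclStripper, stripXmlDecl and isXmlSpace are
hypothetical helpers, not Xerces API.

```java
final class XmlDeclStripper {
    /** True for the four characters the XML spec treats as white space. */
    private static boolean isXmlSpace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    /** Removes a leading XML declaration; returns s unchanged if there is none. */
    static String stripXmlDecl(String s) {
        // "<?xml" is five characters, so index 5 must exist and be white space.
        if (s == null || s.length() < 6
                || !s.startsWith("<?xml") || !isXmlSpace(s.charAt(5))) {
            return s;
        }
        int end = s.indexOf("?>");
        // A declaration with no terminator is malformed; do not guess.
        return end < 0 ? s : s.substring(end + 2);
    }

    public static void main(String[] args) {
        System.out.println(stripXmlDecl(
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>")); // <root/>
        System.out.println(stripXmlDecl("<root/>"));                    // <root/>
    }
}
```

Anything after the declaration, including a line break, is
returned as it was, so the caller decides whether to trim.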

-- 
Andy Clark * andyc@apache.org


RE: What is the best way to remove XML header and serialize it to String

Posted by Aleksandar Milanovic <am...@galdosinc.com>.
Well, if there is no header then I suppose the first thing in the string is
an element that starts with < and not <?. I am not sure though if <? can
appear in other places. You should check the XML spec.
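
For what it is worth, the XML 1.0 spec does allow processing
instructions (which also start with <?) before the root
element and inside it, so a leading <? alone does not prove
there is an XML declaration. The sketch below, in Java,
anchors a regular expression at the very start of the input
so that only a leading "<?xml" followed by white space is
removed. The class name LeadingXmlDecl and the method strip
are invented for this example.

```java
import java.util.regex.Pattern;

final class LeadingXmlDecl {
    // "<?xml" + one XML white space character + anything up to the
    // first "?>", matched only at the very start of the input (\A).
    private static final Pattern XML_DECL =
            Pattern.compile("\\A<\\?xml[ \\t\\r\\n].*?\\?>", Pattern.DOTALL);

    /** Removes a leading XML declaration and nothing else. */
    static String strip(String s) {
        return XML_DECL.matcher(s).replaceFirst("");
    }

    public static void main(String[] args) {
        // A real declaration is removed.
        System.out.println(strip("<?xml version=\"1.0\"?><a/>"));
        // A processing instruction at the start is kept.
        System.out.println(strip("<?xml-stylesheet href=\"s.css\"?><a/>"));
        // A processing instruction inside the document is kept.
        System.out.println(strip("<a><?target data?></a>"));
    }
}
```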

Alex

