Posted to j-users@xalan.apache.org by s7...@netscape.net on 2008/01/23 12:46:41 UTC

Output a new line after the XML declaration using indent="yes"

[Resending my original message, as it didn't appear on the list
after four attempts.]

Using the example serialization code (see the end of this message) and
Sun's built-in Java 1.4 JAXP implementation, I get this result file:

<?xml version="1.0" encoding="UTF-8"?>
<doc>
   <para>foo bar</para>
</doc>

However, when I plug in Xalan 2.7.1, I get this result file:

<?xml version="1.0" encoding="UTF-8"?><doc>
   <para>foo bar</para>
</doc>
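The two outputs come from two different TransformerFactory implementations found by the JAXP lookup. A small sketch, not part of the original post, for checking which one is active: the standard `javax.xml.transform.TransformerFactory` system property selects the implementation, and `org.apache.xalan.processor.TransformerFactoryImpl` is Xalan-J's factory class (assuming xalan.jar is on the classpath). The class name `WhichTransformer` is only illustrative.

```java
import javax.xml.transform.TransformerFactory;

public class WhichTransformer
{
    /** Returns the class name of the TransformerFactory that JAXP picks up. */
    static String implementationName()
    {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args)
    {
        // To force Xalan-J (xalan.jar must be on the classpath), run with:
        // -Djavax.xml.transform.TransformerFactory=
        //     org.apache.xalan.processor.TransformerFactoryImpl
        System.out.println(implementationName());
    }
}
```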

Is there a way to make the document element appear on a new line 
after the XML declaration when using the indent="yes" output option?

-----XMLSerializationTest.java
import java.io.File;
import java.io.FileOutputStream;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

public class XMLSerializationTest
{

     static final String XALAN_INDENT_AMOUNT =
             "{http://xml.apache.org/xslt}" + "indent-amount";

     public static void main(String[] args) throws Exception
     {
         File resultFile = new File("test.xml");

         SAXTransformerFactory stf = (SAXTransformerFactory)
                 TransformerFactory.newInstance();
         TransformerHandler handler = stf.newTransformerHandler();
         Transformer transformer = handler.getTransformer();
         transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
         transformer.setOutputProperty(OutputKeys.INDENT, "yes");
         transformer.setOutputProperty(XALAN_INDENT_AMOUNT, "2");
         // Keep a reference to the stream so it can be closed afterwards.
         FileOutputStream out = new FileOutputStream(resultFile);
         try
         {
             handler.setResult(new StreamResult(out));

             Attributes noAtts = new AttributesImpl();
             String text = "foo bar";
             handler.startDocument();
             handler.startElement("", "", "doc", noAtts);
             handler.startElement("", "", "para", noAtts);
             handler.characters(text.toCharArray(), 0, text.length());
             handler.endElement("", "", "para");
             handler.endElement("", "", "doc");
             handler.endDocument();
         }
         finally
         {
             out.close();  // release the file handle even if serialization fails
         }

         System.out.println("Done.");
     }

}
-----XMLSerializationTest.java--

-- 
Stanimir

Re: Output a new line after the XML declaration using indent="yes"

Posted by Jörg Hohwiller <jo...@j-hohwiller.de>.
Hi Dave,

thanks for your response...

> > 
> > 2. Having a newline after the XML declaration, as well
> > as at the end of the file, is a common need that
> > JAXP users have. I can NOT accept your point
> > that this is generally NOT supported because of
> > some trouble you may have implementing it.
> I find the tone of your post to be offensive.  No one on this list owes 
> you anything in particular, and I suggest you be more polite in future 
> posts.

Sorry for that. Of course nobody owes me anything.
I just ran into this problem and read this thread.
It gave me the impression that a user demand, which
seems obvious to me, is not really recognized here (this
is just my impression).
Also, English is not my native language, so I do
not know whether there is something offensive in my words.
I just wanted to make my personal position very clear.
Besides, I tend to always write "NOT" in capitals,
because I often read over this word and get the wrong
message. A friend explained to me that this is not good style
here because it overemphasizes what I wanted to say...

> > This has a large impact, since everybody using a plain
> > JDK has to live with it. It is a lot worse if people
> > start writing newlines manually to the OutputStream,
> > because this causes real trouble with encodings, especially
> > if the encoding is NOT based on 8-bit units (e.g. UTF-16) and
> > the newlines get broken because whoever hacks such a workaround
> > does NOT think of these problems.
> Any XML parser that requires a newline between the XML declaration and 
> the root element, or after the root element, is non-conforming and 
> should be fixed.

Absolutely true. But XML is also read by humans.
I wrote an open-source tool (a Maven plugin) that modifies
XML files which may have been written by hand. Users tell me
that there is a bug in my product because newlines get lost.
I think they are right, but I see the root of the problem
in Xalan-J.

> > 
> > Please provide a solution here.
> You can always post-process the document and 
> add the new line characters.

Of course I can do that. I already wrote about this option
before, and about the pitfalls one can run into with encodings, etc.
I chose Xalan-J because I was using JAXP
and did not want to add third-party libraries (like XOM, JDOM or
dom4j). But if I need such a workaround, I'd rather use one
of those external libraries instead.

Besides, Xalan-J also strips the newline between XML comments
and the root element. I cannot simply work around this problem without
reimplementing an XML writer or switching to another product.

> Dave

Regards
  Jörg


-- 
View this message in context: http://www.nabble.com/Output-a-new-line-after-the-XML-declaration-using-indent%3D%22yes%22-tp15040090p24129270.html
Sent from the Xalan - J - Users mailing list archive at Nabble.com.


Re: Output a new line after the XML declaration using indent="yes"

Posted by David Bertoni <db...@apache.org>.
Jörg Hohwiller wrote:
> Hi there,
> 
> 
> Brian Minchau wrote:
>> Hi Stanimir.
>>
>> The Xalan serializer doesn't know whether the serialized XML will
>> later be used as an external general parsed entity and included
>> in yet another XML file.
>>
>> It is possible that the XML will be included next to a text node that
>> is not all whitespace, and the extra whitespace that we inject after the
>> XML header would end up next to that non-whitespace text and become
>> part of that text node, modifying it.
>>
>> Extra whitespace added for indentation is only inserted in ignorable
>> locations, but this particular one (just after the header) might not
>> be ignored.
>>
>> Adding indentation or extra whitespace before the document element
>> is not always correct, so Xalan doesn't do it.
>>
>> There is no Xalan-specific option to control this behavior.
>>
>> - Brian
>>
> 
> 1. Could you please give an example or a link to the
> XML specification showing why a newline after the
> XML declaration or at the end of the file should make the
> XML illegal? I am NOT talking about your problems
> implementing this properly, but why
> <?xml ....?>^n
> <root>...</root>^n
> should be illegal!?!?
Brian didn't say it would make the result "illegal."  He simply said
that it would modify the content of the result inappropriately.  The
processor generates an external general parsed entity, and a newline
between the XML declaration and the document element would introduce
whitespace into the content of the entity.
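A minimal sketch of the effect described here, assuming a hypothetical wrapper document (`WRAPPER`) that includes serialized output through an external entity named `part`, placed between two runs of text. An `EntityResolver` serves the entity from a string, so nothing touches the file system; the class and constant names are illustrative only. Inside an external entity, the `<?xml ...?>` line acts as a text declaration, and whatever follows it is entity content:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class EntityWhitespaceDemo
{
    // What a serializer would emit if it added a newline after the declaration.
    static final String PART =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<doc/>";

    // Hypothetical document that includes that output as entity "part",
    // right next to non-whitespace text.
    static final String WRAPPER =
            "<!DOCTYPE wrap [<!ENTITY part SYSTEM \"part.xml\">]>"
            + "<wrap>before&part;after</wrap>";

    /** Returns the character data of <wrap> after the entity is expanded. */
    static String wrapperText(final String part) throws Exception
    {
        DocumentBuilder db =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Serve the external entity from memory instead of the file system.
        db.setEntityResolver(new EntityResolver()
        {
            public InputSource resolveEntity(String publicId, String systemId)
            {
                return new InputSource(new StringReader(part));
            }
        });
        Document d = db.parse(new InputSource(new StringReader(WRAPPER)));
        return d.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception
    {
        // The newline after the declaration became part of the text content.
        System.out.println(wrapperText(PART).equals("before\nafter")); // true
    }
}
```

Passing a `part` string without the newline yields `beforeafter`, i.e. the included entity adds no character data of its own.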

> 
> 2. Having a newline after the XML declaration, as well
> as at the end of the file, is a common need that
> JAXP users have. I can NOT accept your point
> that this is generally NOT supported because of
> some trouble you may have implementing it.
I find the tone of your post to be offensive.  No one on this list owes 
you anything in particular, and I suggest you be more polite in future 
posts.

> This has a large impact, since everybody using a plain
> JDK has to live with it. It is a lot worse if people
> start writing newlines manually to the OutputStream,
> because this causes real trouble with encodings, especially
> if the encoding is NOT based on 8-bit units (e.g. UTF-16) and
> the newlines get broken because whoever hacks such a workaround
> does NOT think of these problems.
Any XML parser that requires a newline between the XML declaration and 
the root element, or after the root element, is non-conforming and 
should be fixed.
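This follows from the XML 1.0 grammar: `document ::= prolog element Misc*` and `prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?`, where `Misc` may be whitespace, so the line break is optional in both places. A quick check with the JDK's default DOM parser (a sketch, not from the thread; `PrologCheck` is an illustrative name) shows that both forms parse to the same tree, and that in a document entity this whitespace is not even reported as a node:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class PrologCheck
{
    /** Parses an XML string with the default JAXP DOM parser. */
    static Document parse(String xml) throws Exception
    {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception
    {
        String decl = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
        String body = "<doc><para>foo bar</para></doc>";

        Document compact = parse(decl + body);
        Document spaced = parse(decl + "\n" + body + "\n");

        // Both parse; prolog/epilog whitespace is not a child of the Document.
        System.out.println(compact.getDocumentElement().getNodeName()); // doc
        System.out.println(spaced.getChildNodes().getLength());         // 1
    }
}
```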

> 
> Please provide a solution here.
You can always post-process the document and add the new line characters.
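One way to post-process without the byte-level encoding pitfalls raised elsewhere in this thread is to serialize into characters (a `StringWriter`), insert the line break there, and encode only once at the end. A sketch under those assumptions; `newlineAfterDeclaration` is a hypothetical helper, not a Xalan or JAXP API:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PostProcessNewline
{
    /** Identity-transforms XML into a String (characters, not bytes). */
    static String serialize(String xml) throws Exception
    {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter sw = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(sw));
        return sw.toString();
    }

    /** Hypothetical helper: puts a line break right after "<?xml ...?>". */
    static String newlineAfterDeclaration(String s)
    {
        if (!s.startsWith("<?xml"))
            return s;                       // no declaration to fix
        int end = s.indexOf("?>");
        if (end < 0)
            return s;                       // malformed; leave untouched
        end += 2;
        if (end < s.length()
                && (s.charAt(end) == '\n' || s.charAt(end) == '\r'))
            return s;                       // already on its own line
        return s.substring(0, end) + "\n" + s.substring(end);
    }

    public static void main(String[] args) throws Exception
    {
        String out = newlineAfterDeclaration(
                serialize("<doc><para>foo bar</para></doc>"));
        // Encode once, with the charset named in the declaration (e.g. via
        // new OutputStreamWriter(stream, "UTF-8")), never as raw bytes.
        System.out.print(out);
    }
}
```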

Dave

Re: Output a new line after the XML declaration using indent="yes"

Posted by Jörg Hohwiller <jo...@j-hohwiller.de>.
Hi there,


Brian Minchau wrote:
> 
> Hi Stanimir.
> 
> The Xalan serializer doesn't know whether the serialized XML will
> later be used as an external general parsed entity and included
> in yet another XML file.
> 
> It is possible that the XML will be included next to a text node that
> is not all whitespace, and the extra whitespace that we inject after the
> XML header would end up next to that non-whitespace text and become
> part of that text node, modifying it.
> 
> Extra whitespace added for indentation is only inserted in ignorable
> locations, but this particular one (just after the header) might not
> be ignored.
> 
> Adding indentation or extra whitespace before the document element
> is not always correct, so Xalan doesn't do it.
> 
> There is no Xalan-specific option to control this behavior.
> 
> - Brian
> 

1. Could you please give an example or a link to the
XML specification showing why a newline after the
XML declaration or at the end of the file should make the
XML illegal? I am NOT talking about your problems
implementing this properly, but why
<?xml ....?>^n
<root>...</root>^n
should be illegal!?!?

2. Having a newline after the XML declaration, as well
as at the end of the file, is a common need that
JAXP users have. I can NOT accept your point
that this is generally NOT supported because of
some trouble you may have implementing it.
This has a large impact, since everybody using a plain
JDK has to live with it. It is a lot worse if people
start writing newlines manually to the OutputStream,
because this causes real trouble with encodings, especially
if the encoding is NOT based on 8-bit units (e.g. UTF-16) and
the newlines get broken because whoever hacks such a workaround
does NOT think of these problems.
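The encoding pitfall in point 2 is easy to see at the byte level. This small sketch (not from the thread; `NewlineBytes` and `lineFeed` are illustrative names) encodes a single line feed: it is one byte in US-ASCII, two bytes in UTF-16, and encoding a fragment on its own with Java's "UTF-16" charset even prepends a byte order mark. Bytes written straight to the OutputStream therefore have to match the serializer's encoding exactly:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NewlineBytes
{
    /** Encodes a single line feed (U+000A) in the given charset. */
    static byte[] lineFeed(Charset cs)
    {
        return "\n".getBytes(cs);
    }

    public static void main(String[] args)
    {
        System.out.println(Arrays.toString(lineFeed(StandardCharsets.US_ASCII))); // [10]
        System.out.println(Arrays.toString(lineFeed(StandardCharsets.UTF_16BE))); // [0, 10]
        // The "UTF-16" charset also prepends a byte order mark (FE FF).
        System.out.println(Arrays.toString(lineFeed(StandardCharsets.UTF_16)));   // [-2, -1, 0, 10]
    }
}
```

Writing `"\n".getBytes("US-ASCII")` into a UTF-16 stream would leave one stray byte and misalign every character after it.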

Please provide a solution here.

Thanks
  Jörg




Re: Output a new line after the XML declaration using indent="yes"

Posted by s7...@netscape.net.
Wed, 23 Jan 2008 11:10:44 -0500, /Brian Minchau/:

> The Xalan serializer doesn't know whether the serialized XML will
> later be used as an external general parsed entity and included
> in yet another XML file.
[...]
> There is no Xalan-specific option to control this behavior.

I see.  Still, I think it would be nice if the user could explicitly
control this behavior.  As a workaround, I suppress the serializer's XML
declaration and write the declaration and a new line manually:

         SAXTransformerFactory stf = (SAXTransformerFactory)
                 TransformerFactory.newInstance();
         TransformerHandler handler = stf.newTransformerHandler();
         Transformer transformer = handler.getTransformer();
         ...
         transformer.setOutputProperty(OutputKeys
                 .OMIT_XML_DECLARATION, "yes");
         ...
         OutputStream out = ...;
         handler.setResult(new StreamResult(out));

         String xmlDecl =
                 "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                 + System.getProperty("line.separator");
         out.write(xmlDecl.getBytes("US-ASCII"));

         handler.startDocument();
         ...
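A self-contained variant of this workaround, offered as a sketch rather than the poster's exact code: it writes the declaration through the same `Writer` that is handed to `StreamResult`, so the declaration, the newline and the serialized content always share one encoding (which also sidesteps the UTF-16 concern raised elsewhere in this thread). The class name `DeclarationWorkaround` is hypothetical, and `encoding` is assumed to be a name both Java and XML accept, such as "UTF-8":

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

public class DeclarationWorkaround
{
    /** Serializes the test document, writing the declaration by hand. */
    static void serialize(OutputStream out, String encoding) throws Exception
    {
        SAXTransformerFactory stf = (SAXTransformerFactory)
                TransformerFactory.newInstance();
        TransformerHandler handler = stf.newTransformerHandler();
        Transformer transformer = handler.getTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        // The serializer must not emit its own declaration.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        // One Writer encodes everything, so the hand-written declaration and
        // the serialized content always use the same encoding.
        Writer writer = new OutputStreamWriter(out, encoding);
        writer.write("<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>\n");
        handler.setResult(new StreamResult(writer));

        AttributesImpl noAtts = new AttributesImpl();
        String text = "foo bar";
        handler.startDocument();
        handler.startElement("", "", "doc", noAtts);
        handler.startElement("", "", "para", noAtts);
        handler.characters(text.toCharArray(), 0, text.length());
        handler.endElement("", "", "para");
        handler.endElement("", "", "doc");
        handler.endDocument();
        writer.flush();  // the caller still owns (and closes) the stream
    }

    public static void main(String[] args) throws Exception
    {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        serialize(buf, "UTF-8");
        System.out.print(buf.toString("UTF-8"));
    }
}
```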

-- 
Stanimir

Re: Output a new line after the XML declaration using indent="yes"

Posted by Brian Minchau <mi...@ca.ibm.com>.
Hi Stanimir.

The Xalan serializer doesn't know whether the serialized XML will
later be used as an external general parsed entity and included
in yet another XML file.

It is possible that the XML will be included next to a text node that
is not all whitespace, and the extra whitespace that we inject after the
XML header would end up next to that non-whitespace text and become
part of that text node, modifying it.

Extra whitespace added for indentation is only inserted in ignorable
locations, but this particular one (just after the header) might not
be ignored.

Adding indentation or extra whitespace before the document element
is not always correct, so Xalan doesn't do it.

There is no Xalan-specific option to control this behavior.

- Brian
- - - - - - - - - - - - - - - - - - - -
Brian Minchau, Ph.D.
XSLT Development, IBM Toronto
(780) 431-2633
e-mail:        minchau@ca.ibm.com



                                                                           