You are viewing a plain text version of this content. The canonical link for it is here.
Posted to general@xerces.apache.org by Andy Clark <an...@apache.org> on 2000/01/13 01:59:01 UTC

Serializers: Do They Work?

I looked at the org.apache.xml.serialize classes for the first
time today and I can't seem to get a simple XMLSerializer
example to work. I keep getting NullPointerExceptions all over
the place. 

Has anyone else seen these problems? Is it a usage problem?
Perhaps we should have part of the documentation talk about 
how to use the new serializer APIs. Anybody interested in
writing those docs?

Also, I think I saw some bugs in the code. But I'd like to
get an answer on whether I'm using them correctly before I
start fixing bugs in the source.

I've included the program below as well as the stack trace of
the exception.

// BEGIN Test.java
package test;

import org.apache.xerces.parsers.DOMParser;

import org.apache.xml.serialize.Method;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.Serializer;
import org.apache.xml.serialize.SerializerFactory;
import org.apache.xml.serialize.XMLSerializer;

import org.w3c.dom.Document;

public class Test {

    public static void main(String argv[]) throws Exception {

        DOMParser parser = new DOMParser();
        SerializerFactory factory = SerializerFactory.getSerializerFactory(Method.XML);

        for (int i = 0; i < argv.length; i++) {
            final String arg = argv[i];
            System.err.println("argv["+i+"]: "+arg);
            parser.parse(arg);
            Document document = parser.getDocument();
            OutputFormat format = new OutputFormat(document, "UTF-8", false);
            Serializer serializer = factory.makeSerializer(format);
            serializer.asDOMSerializer().serialize(document);
        }

    }

}
// END Test.java

// BEGIN personal.xml
<?xml version="1.0"?>
<!DOCTYPE personnel SYSTEM "personal.dtd">
<?xml-stylesheet type='text/xsl' href='personal.xsl'?>
<personnel>
  <person id="H.MARUYAMA" >
    <name><family>MARUYAMA</family> <given>Hiroshi</given></name>
    <email>maruyama@jp.ibm.com</email>
    <link subordinates="  N.URAMOTO    K.TAMURA "/>
  </person>
  <person id="N.URAMOTO">
    <name><family>URAMOTO</family> <given>Naohiko</given></name>
    <email>uramoto@jp.ibm.com</email>
    <link manager="		H.MARUYAMA"/>
  </person>
  <person id="K.TAMURA">
    <name>
      <family>TAMURA</family> <given>Kent</given>
    </name>
    <!-- This URL is mail address.-->
    <url href="mailto:kent@trl.ibm.co.jp"/>
    <url href="mailto:tkent@jp.ibm.com"/>
    <link manager="H.MARUYAMA"/>
  </person>
</personnel>
// END personal.xml

// BEGIN personal.dtd
<?xml encoding="US-ASCII"?>

<!ELEMENT personnel (person)+>
<!ATTLIST person note CDATA #IMPLIED
		 contr (true|false) "false"
                 id ID #REQUIRED>
<!ELEMENT person (name,email*,url*,link?)>
<!ELEMENT family (#PCDATA)>
<!ELEMENT given (#PCDATA)>
<!ELEMENT name (family,given?)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT url EMPTY>
<!ATTLIST url href CDATA "http://">
<!ELEMENT link EMPTY>
<!ATTLIST link
  manager IDREF #IMPLIED
  subordinates IDREFS #IMPLIED>

<!NOTATION gif SYSTEM "photoshop.exe">

<!ENTITY internal-entity "entity-value">
<!ENTITY external-entity PUBLIC "-//ENTITY" "external-entity.ent">
<!ENTITY unparsed-entity SYSTEM "unparsed-entity.ent" NDATA gif>
// END personal.dtd

// BEGIN Stack Trace
Exception in thread "main" java.lang.NullPointerException:
        at org.apache.xml.serialize.BaseMarkupSerializer.printDoctypeURL(BaseMarkupSerializer.java:1261)
        at org.apache.xml.serialize.BaseMarkupSerializer.unparsedEntityDecl(BaseMarkupSerializer.java:638)
        at org.apache.xml.serialize.BaseMarkupSerializer.serializeNode(BaseMarkupSerializer.java:773)
        at org.apache.xml.serialize.BaseMarkupSerializer.serialize(BaseMarkupSerializer.java:385)
        at test.Test.main(Test.java:27)
// END Stack Trace

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Serializers: Do They Work?

Posted by Assaf Arkin <ar...@exoffice.com>.
For the public/system Id, I have to agree with you 100%. Can you add
your reflection code to both BaseMarkupSerializer and OutputFormat.

For the constructor there are two possibilities:

1. Document the behavior. That is already the case.

2. Eliminate the constructor, preventing the possiblity of getting the
serializer in an inconsistent state.

As far as I can recall there was no such constructor in the original
code base, but at least two people asked for it, so I added it. I never
use it myself. It's a +1 on my side to drop it.

arkin




Andy Clark wrote:
> 
> Assaf Arkin wrote:
> > They have no right to complain, but they will and I don't want to answer
> > them. Throwing IllegalStateException solves that in an elegant way.
> 
> I would disagree with this solution as being elegant. My
> personal opinion is that objects should not start off in an
> unusable state. Requiring the programmer to have explicit
> knowledge of what needs to be done in order to get the
> object into a usable state is asking for trouble. Also, the
> default action of throwing an exception is a problem because
> you can't know that your program is in a usable state until
> runtime. While we can't perform compile time checks, we can
> make the object safe by default construction.
> 
> > I was rather hoping that people will use the OutputFormat to do that for
> > the time being. OutputFormat will also take the system/public ids from
> > the document type in DOM Level 2.
> 
> Yes, but if you assume that the DOM tree that you are
> handed is DOM Level 2 and attempt to call the method (when
> it's really a DOM Level 1 implementation), you will get a
> runtime exception thrown by the VM. At least the reflection
> makes it work before (and after) the W3C interfaces become
> standard and while they're being adopted.
> 
> > I don't think reflection is a good way. Besides I can always cast it to
> > the proper Xerces type as a temporary solution.
> 
> Should we assume that the DOM tree handed to the serializers
> is always going to be from Xerces? I would vote for allowing
> everyone to use our serializers, regardless of whether they
> use our DOM implementation.
> 
> > Because it has no way of knowing what is external and what is internal.
> > The DOM serializer simply dumps everything. The only solution is for the
> > DOM serializer to not print anything, which is what I think I'll do
> > after I get it to print correctly.
> 
> I agree with not printing anything.
> 
> --
> Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

-- 
----------------------------------------------------------------------
Assaf Arkin                                           www.exoffice.com
CTO, Exoffice Technologies, Inc.                        www.exolab.org

Re: Serializers: Do They Work?

Posted by Andy Clark <an...@apache.org>.
Assaf Arkin wrote:
> They have no right to complain, but they will and I don't want to answer
> them. Throwing IllegalStateException solves that in an elegant way.

I would disagree with this solution as being elegant. My
personal opinion is that objects should not start off in an
unusable state. Requiring the programmer to have explicit
knowledge of what needs to be done in order to get the
object into a usable state is asking for trouble. Also, the
default action of throwing an exception is a problem because
you can't know that your program is in a usable state until
runtime. While we can't perform compile time checks, we can
make the object safe by default construction.

> I was rather hoping that people will use the OutputFormat to do that for
> the time being. OutputFormat will also take the system/public ids from
> the document type in DOM Level 2.

Yes, but if you assume that the DOM tree that you are 
handed is DOM Level 2 and attempt to call the method (when
it's really a DOM Level 1 implementation), you will get a 
runtime exception thrown by the VM. At least the reflection 
makes it work before (and after) the W3C interfaces become 
standard and while they're being adopted.

> I don't think reflection is a good way. Besides I can always cast it to
> the proper Xerces type as a temporary solution.

Should we assume that the DOM tree handed to the serializers
is always going to be from Xerces? I would vote for allowing
everyone to use our serializers, regardless of whether they
use our DOM implementation.

> Because it has no way of knowing what is external and what is internal.
> The DOM serializer simply dumps everything. The only solution is for the
> DOM serializer to not print anything, which is what I think I'll do
> after I get it to print correctly.

I agree with not printing anything.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Serializers: Do They Work?

Posted by Assaf Arkin <ar...@exoffice.com>.
> I don't think that people have a right to complain that they're
> seeing output on the console if they never set the output to
> where they want it! So I agree with Keith on this one.

They have no right to complain, but they will and I don't want to answer
them. Throwing IllegalStateException solves that in an elegant way.


> Doing some more work, I added some code to enable printing of
> the DOCTYPE line when serializing the DOM tree. Since the methods
> for determing the public and system ids on the DocumentType node
> aren't included until DOM Level 2 *and* DOM Level 2 is not a
> standard, yet, I used reflection to see if the implementation
> already had the appropriate methods to invoke. In this way we
> don't have to just pass in null to startDTD() which always has
> the effect of not printing the DOCTYPE line.

I was rather hoping that people will use the OutputFormat to do that for
the time being. OutputFormat will also take the system/public ids from
the document type in DOM Level 2.

I don't think reflection is a good way. Besides I can always cast it to
the proper Xerces type as a temporary solution.


> So now that the DOCTYPE line is printing, I'm wondering why is
> the serializer printing entities and notations in the internal
> subset? In my sample, these are declared in the external DTD
> and should not be duplicated in the internal subset. Unfortunately,
> we still don't have a way of determining if entities and notations
> are internal or external so I'm undecided on what the default
> behavior in this case should be. Any thoughts? preferences?

Because it has no way of knowing what is external and what is internal.
The DOM serializer simply dumps everything. The only solution is for the
DOM serializer to not print anything, which is what I think I'll do
after I get it to print correctly.

> 
> In any case, the patch for the following changes is included
> below. If don't mind me making changes to your serializer code
> directly, I'll go ahead and make future bug fixes straight in
> CVS. Otherwise, I'll continue posting patches. Just let me know.
> 
>   Change 0: Removed debugging printlns left in code.
>   Change 1: Added ability to determine public & system ids
>   Change 2: Removed printing of entities and notation decls
>             in the internal subset.

I'm fixing some stuff right now, already did 0, will do 1 and 2 and
commit as soon as it's working.

arkin



> 
> // BEGIN Patch
> Index: src/org/apache/xml/serialize/BaseMarkupSerializer.java
> ===================================================================
> RCS file: /home/cvs/xml-xerces/java/src/org/apache/xml/serialize/BaseMarkupSerializer.java,v
> retrieving revision 1.3
> diff -c -r1.3 BaseMarkupSerializer.java
> *** BaseMarkupSerializer.java   2000/01/14 20:40:54     1.3
> --- BaseMarkupSerializer.java   2000/01/14 23:51:31
> ***************
> *** 705,712 ****
>       {
>         // Only works if we're going out of DTD mode.
>         if ( _writer == _dtdWriter ) {
> - System.out.println( "Writer " + _writer );
> - System.out.println( "DocWriter " + _docWriter );
>             _line.append( _text );
>             _text = new StringBuffer( 20 );
>             flushLine( false );
> --- 705,710 ----
> ***************
> *** 776,797 ****
>             // serialize it.
>             docType = ( (Document) node ).getDoctype();
>             if ( docType != null ) {
> !               startDTD( docType.getName(), null, null );
> !               map = docType.getEntities();
> !               if ( map != null ) {
> !                   for ( i = 0 ; i < map.getLength() ; ++i ) {
> !                       entity = (Entity) map.item( i );
> !                       unparsedEntityDecl( entity.getNodeName(), entity.getPublicId(),
> !                                           entity.getSystemId(), entity.getNotationName() );
> !                   }
> !               }
> !               map = docType.getNotations();
> !               if ( map != null ) {
> !                   for ( i = 0 ; i < map.getLength() ; ++i ) {
> !                       notation = (Notation) map.item( i );
> !                       notationDecl( notation.getNodeName(), notation.getPublicId(), notation.getSystemId() );
> !                   }
> !               }
>                 endDTD();
>             }
>             // !! Fall through
> --- 774,803 ----
>             // serialize it.
>             docType = ( (Document) node ).getDoctype();
>             if ( docType != null ) {
> !             Class docTypeClass = docType.getClass();
> !
> !             String docTypePublicId = null;
> !             String docTypeSystemId = null;
> !             try {
> !                 java.lang.reflect.Method getPublicId = docTypeClass.getMethod("getPublicId", null);
> !                 if (getPublicId.getReturnType().equals(String.class)) {
> !                     docTypePublicId = (String)getPublicId.invoke(docType, null);
> !                 }
> !             }
> !             catch (Exception e) {
> !                 // ignore
> !             }
> !             try {
> !                 java.lang.reflect.Method getSystemId = docTypeClass.getMethod("getSystemId", null);
> !                 if (getSystemId.getReturnType().equals(String.class)) {
> !                     docTypeSystemId = (String)getSystemId.invoke(docType, null);
> !                 }
> !             }
> !             catch (Exception e) {
> !                 // ignore
> !             }
> !
> !               startDTD( docType.getName(), docTypePublicId, docTypeSystemId );
>                 endDTD();
>             }
>             // !! Fall through
> // END Patch
> 
> --
> Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

-- 
----------------------------------------------------------------------
Assaf Arkin                                           www.exoffice.com
CTO, Exoffice Technologies, Inc.                        www.exolab.org

Re: Serializers: Do They Work?

Posted by Assaf Arkin <ar...@exoffice.com>.
Andy,

Grab it from the CVS let me know if it works. And thanks for helping me
figure out how to solve this one.

arkin


Andy Clark wrote:
> 
> Assaf Arkin wrote:
> > Keith also suggested I use System.out by default, but then I'll get
> > people complaining of seeing stuff in the console instead of the output
> > file.
> 
> I don't think that people have a right to complain that they're
> seeing output on the console if they never set the output to
> where they want it! So I agree with Keith on this one.
> 
> > You're prefectly right on the entity side, I'll try to get your patch in
> > and update the CVS.
> 
> Doing some more work, I added some code to enable printing of
> the DOCTYPE line when serializing the DOM tree. Since the methods
> for determing the public and system ids on the DocumentType node
> aren't included until DOM Level 2 *and* DOM Level 2 is not a
> standard, yet, I used reflection to see if the implementation
> already had the appropriate methods to invoke. In this way we
> don't have to just pass in null to startDTD() which always has
> the effect of not printing the DOCTYPE line.
> 
> So now that the DOCTYPE line is printing, I'm wondering why is
> the serializer printing entities and notations in the internal
> subset? In my sample, these are declared in the external DTD
> and should not be duplicated in the internal subset. Unfortunately,
> we still don't have a way of determining if entities and notations
> are internal or external so I'm undecided on what the default
> behavior in this case should be. Any thoughts? preferences?
> 
> In any case, the patch for the following changes is included
> below. If don't mind me making changes to your serializer code
> directly, I'll go ahead and make future bug fixes straight in
> CVS. Otherwise, I'll continue posting patches. Just let me know.
> 
>   Change 0: Removed debugging printlns left in code.
>   Change 1: Added ability to determine public & system ids
>   Change 2: Removed printing of entities and notation decls
>             in the internal subset.
> 
> // BEGIN Patch
> Index: src/org/apache/xml/serialize/BaseMarkupSerializer.java
> ===================================================================
> RCS file: /home/cvs/xml-xerces/java/src/org/apache/xml/serialize/BaseMarkupSerializer.java,v
> retrieving revision 1.3
> diff -c -r1.3 BaseMarkupSerializer.java
> *** BaseMarkupSerializer.java   2000/01/14 20:40:54     1.3
> --- BaseMarkupSerializer.java   2000/01/14 23:51:31
> ***************
> *** 705,712 ****
>       {
>         // Only works if we're going out of DTD mode.
>         if ( _writer == _dtdWriter ) {
> - System.out.println( "Writer " + _writer );
> - System.out.println( "DocWriter " + _docWriter );
>             _line.append( _text );
>             _text = new StringBuffer( 20 );
>             flushLine( false );
> --- 705,710 ----
> ***************
> *** 776,797 ****
>             // serialize it.
>             docType = ( (Document) node ).getDoctype();
>             if ( docType != null ) {
> !               startDTD( docType.getName(), null, null );
> !               map = docType.getEntities();
> !               if ( map != null ) {
> !                   for ( i = 0 ; i < map.getLength() ; ++i ) {
> !                       entity = (Entity) map.item( i );
> !                       unparsedEntityDecl( entity.getNodeName(), entity.getPublicId(),
> !                                           entity.getSystemId(), entity.getNotationName() );
> !                   }
> !               }
> !               map = docType.getNotations();
> !               if ( map != null ) {
> !                   for ( i = 0 ; i < map.getLength() ; ++i ) {
> !                       notation = (Notation) map.item( i );
> !                       notationDecl( notation.getNodeName(), notation.getPublicId(), notation.getSystemId() );
> !                   }
> !               }
>                 endDTD();
>             }
>             // !! Fall through
> --- 774,803 ----
>             // serialize it.
>             docType = ( (Document) node ).getDoctype();
>             if ( docType != null ) {
> !             Class docTypeClass = docType.getClass();
> !
> !             String docTypePublicId = null;
> !             String docTypeSystemId = null;
> !             try {
> !                 java.lang.reflect.Method getPublicId = docTypeClass.getMethod("getPublicId", null);
> !                 if (getPublicId.getReturnType().equals(String.class)) {
> !                     docTypePublicId = (String)getPublicId.invoke(docType, null);
> !                 }
> !             }
> !             catch (Exception e) {
> !                 // ignore
> !             }
> !             try {
> !                 java.lang.reflect.Method getSystemId = docTypeClass.getMethod("getSystemId", null);
> !                 if (getSystemId.getReturnType().equals(String.class)) {
> !                     docTypeSystemId = (String)getSystemId.invoke(docType, null);
> !                 }
> !             }
> !             catch (Exception e) {
> !                 // ignore
> !             }
> !
> !               startDTD( docType.getName(), docTypePublicId, docTypeSystemId );
>                 endDTD();
>             }
>             // !! Fall through
> // END Patch
> 
> --
> Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

-- 
----------------------------------------------------------------------
Assaf Arkin                                           www.exoffice.com
CTO, Exoffice Technologies, Inc.                        www.exolab.org

Re: Serializers: Do They Work?

Posted by Assaf Arkin <ar...@exoffice.com>.
Apparently docType.getSystemId()/getPublicId() is working, so no need
for the reflection. Ultra cool.

arkin


Andy Clark wrote:
> 
> Assaf Arkin wrote:
> > Keith also suggested I use System.out by default, but then I'll get
> > people complaining of seeing stuff in the console instead of the output
> > file.
> 
> I don't think that people have a right to complain that they're
> seeing output on the console if they never set the output to
> where they want it! So I agree with Keith on this one.
> 
> > You're prefectly right on the entity side, I'll try to get your patch in
> > and update the CVS.
> 
> Doing some more work, I added some code to enable printing of
> the DOCTYPE line when serializing the DOM tree. Since the methods
> for determing the public and system ids on the DocumentType node
> aren't included until DOM Level 2 *and* DOM Level 2 is not a
> standard, yet, I used reflection to see if the implementation
> already had the appropriate methods to invoke. In this way we
> don't have to just pass in null to startDTD() which always has
> the effect of not printing the DOCTYPE line.
> 
> So now that the DOCTYPE line is printing, I'm wondering why is
> the serializer printing entities and notations in the internal
> subset? In my sample, these are declared in the external DTD
> and should not be duplicated in the internal subset. Unfortunately,
> we still don't have a way of determining if entities and notations
> are internal or external so I'm undecided on what the default
> behavior in this case should be. Any thoughts? preferences?
> 
> In any case, the patch for the following changes is included
> below. If don't mind me making changes to your serializer code
> directly, I'll go ahead and make future bug fixes straight in
> CVS. Otherwise, I'll continue posting patches. Just let me know.
> 
>   Change 0: Removed debugging printlns left in code.
>   Change 1: Added ability to determine public & system ids
>   Change 2: Removed printing of entities and notation decls
>             in the internal subset.
> 
> // BEGIN Patch
> Index: src/org/apache/xml/serialize/BaseMarkupSerializer.java
> ===================================================================
> RCS file: /home/cvs/xml-xerces/java/src/org/apache/xml/serialize/BaseMarkupSerializer.java,v
> retrieving revision 1.3
> diff -c -r1.3 BaseMarkupSerializer.java
> *** BaseMarkupSerializer.java   2000/01/14 20:40:54     1.3
> --- BaseMarkupSerializer.java   2000/01/14 23:51:31
> ***************
> *** 705,712 ****
>       {
>         // Only works if we're going out of DTD mode.
>         if ( _writer == _dtdWriter ) {
> - System.out.println( "Writer " + _writer );
> - System.out.println( "DocWriter " + _docWriter );
>             _line.append( _text );
>             _text = new StringBuffer( 20 );
>             flushLine( false );
> --- 705,710 ----
> ***************
> *** 776,797 ****
>             // serialize it.
>             docType = ( (Document) node ).getDoctype();
>             if ( docType != null ) {
> !               startDTD( docType.getName(), null, null );
> !               map = docType.getEntities();
> !               if ( map != null ) {
> !                   for ( i = 0 ; i < map.getLength() ; ++i ) {
> !                       entity = (Entity) map.item( i );
> !                       unparsedEntityDecl( entity.getNodeName(), entity.getPublicId(),
> !                                           entity.getSystemId(), entity.getNotationName() );
> !                   }
> !               }
> !               map = docType.getNotations();
> !               if ( map != null ) {
> !                   for ( i = 0 ; i < map.getLength() ; ++i ) {
> !                       notation = (Notation) map.item( i );
> !                       notationDecl( notation.getNodeName(), notation.getPublicId(), notation.getSystemId() );
> !                   }
> !               }
>                 endDTD();
>             }
>             // !! Fall through
> --- 774,803 ----
>             // serialize it.
>             docType = ( (Document) node ).getDoctype();
>             if ( docType != null ) {
> !             Class docTypeClass = docType.getClass();
> !
> !             String docTypePublicId = null;
> !             String docTypeSystemId = null;
> !             try {
> !                 java.lang.reflect.Method getPublicId = docTypeClass.getMethod("getPublicId", null);
> !                 if (getPublicId.getReturnType().equals(String.class)) {
> !                     docTypePublicId = (String)getPublicId.invoke(docType, null);
> !                 }
> !             }
> !             catch (Exception e) {
> !                 // ignore
> !             }
> !             try {
> !                 java.lang.reflect.Method getSystemId = docTypeClass.getMethod("getSystemId", null);
> !                 if (getSystemId.getReturnType().equals(String.class)) {
> !                     docTypeSystemId = (String)getSystemId.invoke(docType, null);
> !                 }
> !             }
> !             catch (Exception e) {
> !                 // ignore
> !             }
> !
> !               startDTD( docType.getName(), docTypePublicId, docTypeSystemId );
>                 endDTD();
>             }
>             // !! Fall through
> // END Patch
> 
> --
> Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

-- 
----------------------------------------------------------------------
Assaf Arkin                                           www.exoffice.com
CTO, Exoffice Technologies, Inc.                        www.exolab.org

Re: Serializers: Do They Work?

Posted by Andy Clark <an...@apache.org>.
Assaf Arkin wrote:
> Keith also suggested I use System.out by default, but then I'll get
> people complaining of seeing stuff in the console instead of the output
> file.

I don't think that people have a right to complain that they're
seeing output on the console if they never set the output to
where they want it! So I agree with Keith on this one.

> You're prefectly right on the entity side, I'll try to get your patch in
> and update the CVS.

Doing some more work, I added some code to enable printing of
the DOCTYPE line when serializing the DOM tree. Since the methods
for determing the public and system ids on the DocumentType node
aren't included until DOM Level 2 *and* DOM Level 2 is not a
standard, yet, I used reflection to see if the implementation
already had the appropriate methods to invoke. In this way we
don't have to just pass in null to startDTD() which always has
the effect of not printing the DOCTYPE line.

So now that the DOCTYPE line is printing, I'm wondering why is 
the serializer printing entities and notations in the internal
subset? In my sample, these are declared in the external DTD
and should not be duplicated in the internal subset. Unfortunately,
we still don't have a way of determining if entities and notations
are internal or external so I'm undecided on what the default
behavior in this case should be. Any thoughts? preferences?

In any case, the patch for the following changes is included
below. If don't mind me making changes to your serializer code 
directly, I'll go ahead and make future bug fixes straight in 
CVS. Otherwise, I'll continue posting patches. Just let me know.

  Change 0: Removed debugging printlns left in code.
  Change 1: Added ability to determine public & system ids
  Change 2: Removed printing of entities and notation decls
            in the internal subset.

// BEGIN Patch
Index: src/org/apache/xml/serialize/BaseMarkupSerializer.java
===================================================================
RCS file: /home/cvs/xml-xerces/java/src/org/apache/xml/serialize/BaseMarkupSerializer.java,v
retrieving revision 1.3
diff -c -r1.3 BaseMarkupSerializer.java
*** BaseMarkupSerializer.java   2000/01/14 20:40:54     1.3
--- BaseMarkupSerializer.java   2000/01/14 23:51:31
***************
*** 705,712 ****
      {
        // Only works if we're going out of DTD mode.
        if ( _writer == _dtdWriter ) {
- System.out.println( "Writer " + _writer );
- System.out.println( "DocWriter " + _docWriter );
            _line.append( _text );
            _text = new StringBuffer( 20 );
            flushLine( false );
--- 705,710 ----
***************
*** 776,797 ****
            // serialize it.
            docType = ( (Document) node ).getDoctype();
            if ( docType != null ) {
!               startDTD( docType.getName(), null, null );
!               map = docType.getEntities();
!               if ( map != null ) {
!                   for ( i = 0 ; i < map.getLength() ; ++i ) {
!                       entity = (Entity) map.item( i );
!                       unparsedEntityDecl( entity.getNodeName(), entity.getPublicId(),
!                                           entity.getSystemId(), entity.getNotationName() );
!                   }
!               }
!               map = docType.getNotations();
!               if ( map != null ) {
!                   for ( i = 0 ; i < map.getLength() ; ++i ) {
!                       notation = (Notation) map.item( i );
!                       notationDecl( notation.getNodeName(), notation.getPublicId(), notation.getSystemId() );
!                   }
!               }
                endDTD();
            }
            // !! Fall through
--- 774,803 ----
            // serialize it.
            docType = ( (Document) node ).getDoctype();
            if ( docType != null ) {
!             Class docTypeClass = docType.getClass();
!
!             String docTypePublicId = null;
!             String docTypeSystemId = null;
!             try {
!                 java.lang.reflect.Method getPublicId = docTypeClass.getMethod("getPublicId", null);
!                 if (getPublicId.getReturnType().equals(String.class)) {
!                     docTypePublicId = (String)getPublicId.invoke(docType, null);
!                 }
!             }
!             catch (Exception e) {
!                 // ignore
!             }
!             try {
!                 java.lang.reflect.Method getSystemId = docTypeClass.getMethod("getSystemId", null);
!                 if (getSystemId.getReturnType().equals(String.class)) {
!                     docTypeSystemId = (String)getSystemId.invoke(docType, null);
!                 }
!             }
!             catch (Exception e) {
!                 // ignore
!             }
!
!               startDTD( docType.getName(), docTypePublicId, docTypeSystemId );
                endDTD();
            }
            // !! Fall through
// END Patch

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Serializers: Do They Work?

Posted by Assaf Arkin <ar...@exoffice.com>.
That's what I was suspecting, so I made sure it throws an
IllegalStateException with a proper message and just committed that
change. So if you do this honest mistake it will complain and explain
why.

Keith also suggested I use System.out by default, but then I'll get
people complaining of seeing stuff in the console instead of the output
file.

You're prefectly right on the entity side, I'll try to get your patch in
and update the CVS.

arkin


Andy Clark wrote:
> 
> Assaf,
> 
> One of the problems has been fixed -- the null pointer caused
> by not specifying a writer (or output stream) to the serializer.
> I was surprised to find out that the serializer doesn't just
> write to System.out in the case that no writer/output stream
> is specified. I would suggest either 1) not having a constructor
> that is guaranteed to cause an exception when you serialize
> (assuming, of course, that you haven't set the output), or 2)
> default the writer/output stream to System.out. I would prefer
> the second option.
> 
> The serializer is still unable to handle text entities. The
> solution to this requires some additional work because you
> have to serialize all of the Entity node's children into a
> string and then write *that* as the value of the entity. The
> patch for this particular fix is included below. I wasn't
> sure how many variables need to be swapped out during the
> serialization of the children so I included _writer, _text,
> and _line. Please review this change and then update CVS
> appropriately.
> 
> // BEGIN Patch
> Index: src/org/apache/xml/serialize/BaseMarkupSerializer.java
> ===================================================================
> RCS file: /home/cvs/xml-xerces/java/src/org/apache/xml/serialize/BaseMarkupSerializer.java,v
> retrieving revision 1.3
> diff -c -r1.3 BaseMarkupSerializer.java
> *** BaseMarkupSerializer.java   2000/01/14 20:40:54     1.3
> --- BaseMarkupSerializer.java   2000/01/14 22:46:21
> ***************
> *** 780,789 ****
>                 map = docType.getEntities();
>                 if ( map != null ) {
>                     for ( i = 0 ; i < map.getLength() ; ++i ) {
> !                       entity = (Entity) map.item( i );
> !                       unparsedEntityDecl( entity.getNodeName(), entity.getPublicId(),
> !                                           entity.getSystemId(), entity.getNotationName() );
> !                   }
>                 }
>                 map = docType.getNotations();
>                 if ( map != null ) {
> --- 780,812 ----
>                 map = docType.getEntities();
>                 if ( map != null ) {
>                     for ( i = 0 ; i < map.getLength() ; ++i ) {
> !                       entity = (Entity) map.item( i );
> !                 String publicId = entity.getPublicId();
> !                 String systemId = entity.getSystemId();
> !                 if (publicId == null && systemId == null) {
> !                     Writer oldWriter = _writer;
> !                     StringBuffer oldText = _text;
> !                     StringBuffer oldLine = _line;
> !                     _writer = new StringWriter();
> !                     _text = new StringBuffer();
> !                     _line = new StringBuffer();
> !                     Node child = entity.getFirstChild();
> !                     while (child != null) {
> !                         serializeNode(child);
> !                         child = child.getNextSibling();
> !                     }
> !                     flushLine(true);
> !                     String value = _writer.toString();
> !                     _writer = oldWriter;
> !                     _text = oldText;
> !                     _line = oldLine;
> !                     internalEntityDecl(entity.getNodeName(), value);
> !                 }
> !                 else {
> !                               unparsedEntityDecl( entity.getNodeName(), publicId, systemId,
> !                                           entity.getNotationName() );
> !                       }
> !             }
>                 }
>                 map = docType.getNotations();
>                 if ( map != null ) {
> // END Patch
> 
> --
> Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

-- 
----------------------------------------------------------------------
Assaf Arkin                                           www.exoffice.com
CTO, Exoffice Technologies, Inc.                        www.exolab.org

Re: Serializers: Do They Work?

Posted by Andy Clark <an...@apache.org>.
Assaf,

One of the problems has been fixed -- the null pointer caused
by not specifying a writer (or output stream) to the serializer.
I was surprised to find out that the serializer doesn't just
write to System.out in the case that no writer/output stream
is specified. I would suggest either 1) not having a constructor
that is guaranteed to cause an exception when you serialize
(assuming, of course, that you haven't set the output), or 2) 
default the writer/output stream to System.out. I would prefer
the second option.

The serializer is still unable to handle text entities. The
solution to this requires some additional work because you
have to serialize all of the Entity node's children into a
string and then write *that* as the value of the entity. The
patch for this particular fix is included below. I wasn't
sure how many variables need to be swapped out during the
serialization of the children so I included _writer, _text,
and _line. Please review this change and then update CVS
appropriately.

// BEGIN Patch
Index: src/org/apache/xml/serialize/BaseMarkupSerializer.java
===================================================================
RCS file: /home/cvs/xml-xerces/java/src/org/apache/xml/serialize/BaseMarkupSerializer.java,v
retrieving revision 1.3
diff -c -r1.3 BaseMarkupSerializer.java
*** BaseMarkupSerializer.java   2000/01/14 20:40:54     1.3
--- BaseMarkupSerializer.java   2000/01/14 22:46:21
***************
*** 780,789 ****
                map = docType.getEntities();
                if ( map != null ) {
                    for ( i = 0 ; i < map.getLength() ; ++i ) {
!                       entity = (Entity) map.item( i );
!                       unparsedEntityDecl( entity.getNodeName(), entity.getPublicId(),
!                                           entity.getSystemId(), entity.getNotationName() );
!                   }
                }
                map = docType.getNotations();
                if ( map != null ) {
--- 780,812 ----
                map = docType.getEntities();
                if ( map != null ) {
                    for ( i = 0 ; i < map.getLength() ; ++i ) {
!                       entity = (Entity) map.item( i );
!                 String publicId = entity.getPublicId();
!                 String systemId = entity.getSystemId();
!                 if (publicId == null && systemId == null) {
!                     Writer oldWriter = _writer;
!                     StringBuffer oldText = _text;
!                     StringBuffer oldLine = _line;
!                     _writer = new StringWriter();
!                     _text = new StringBuffer();
!                     _line = new StringBuffer();
!                     Node child = entity.getFirstChild();
!                     while (child != null) {
!                         serializeNode(child);
!                         child = child.getNextSibling();
!                     }
!                     flushLine(true);
!                     String value = _writer.toString();
!                     _writer = oldWriter;
!                     _text = oldText;
!                     _line = oldLine;
!                     internalEntityDecl(entity.getNodeName(), value);
!                 }
!                 else {
!                               unparsedEntityDecl( entity.getNodeName(), publicId, systemId,
!                                           entity.getNotationName() );
!                       }
!             }
                }
                map = docType.getNotations();
                if ( map != null ) {
// END Patch

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Serializers: Do They Work?

Posted by Assaf Arkin <ar...@exoffice.com>.
Andy Clark wrote:
> 
> Assaf Arkin wrote:
> >
> > Sorry about that. Line 634 of BaseMarkupSerializer should be changed
> > from
> 
> That's not the only bug. I get a lot of NullPointerExceptions in
> other places as well. Each time I add a null pointer check just
> to get it to run, it stops later on because of another problem.

Just send me the stack traces. I use these serializers all the time, but
only for SAX, so I don't have too many DOM cases to test against.

So far I got one report from you and one from Ricardo, but he can't
figure out where it breaks.

I'll update the CVS right away with this fix, one that affects the HTML
DOM, and one that solves a pretty printing glitch.

arkin


> 
> When are you planning on updating CVS with your changes?
> 
> --
> Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Serializers: Do They Work?

Posted by Andy Clark <an...@apache.org>.
Assaf Arkin wrote:
> 
> Sorry about that. Line 634 of BaseMarkupSerializer should be changed
> from

That's not the only bug. I get a lot of NullPointerExceptions in 
other places as well. Each time I add a null pointer check just
to get it to run, it stops later on because of another problem.

When are you planning on updating CVS with your changes?

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Serializers: Do They Work?

Posted by Assaf Arkin <ar...@exoffice.com>.
Sorry about that. Line 634 of BaseMarkupSerializer should be changed
from

if ( publicId != null )

to 

if ( publicId == null )

and it should fix the problem you reported. I'm adding two other fixes
and will update the CVS tomorrow.

arkin



Andy Clark wrote:
> 
> I looked at the org.apache.xml.serialize classes for the first
> time today and I can't seem to get a simple XMLSerializer
> example to work. I keep getting NullPointerExceptions all over
> the place.
> 
> Has anyone else seen these problems? Is it a usage problem?
> Perhaps we should have part of the documentation talk about
> how to use the new serializer APIs. Anybody interested in
> writing those docs?
> 
> Also, I think I saw some bugs in the code. But I'd like to
> get an answer on whether I'm using them correctly before I
> start fixing bugs in the source.
> 
> I've included the program below as well as the stack trace of
> the exception.
> 
> // BEGIN Test.java
> package test;
> 
> import org.apache.xerces.parsers.DOMParser;
> 
> import org.apache.xml.serialize.Method;
> import org.apache.xml.serialize.OutputFormat;
> import org.apache.xml.serialize.Serializer;
> import org.apache.xml.serialize.SerializerFactory;
> import org.apache.xml.serialize.XMLSerializer;
> 
> import org.w3c.dom.Document;
> 
> public class Test {
> 
>     public static void main(String argv[]) throws Exception {
> 
>         DOMParser parser = new DOMParser();
>         SerializerFactory factory = SerializerFactory.getSerializerFactory(Method.XML);
> 
>         for (int i = 0; i < argv.length; i++) {
>             final String arg = argv[i];
>             System.err.println("argv["+i+"]: "+arg);
>             parser.parse(arg);
>             Document document = parser.getDocument();
>             OutputFormat format = new OutputFormat(document, "UTF-8", false);
>             Serializer serializer = factory.makeSerializer(format);
>             serializer.asDOMSerializer().serialize(document);
>         }
> 
>     }
> 
> }
> // END Test.java
> 
> // BEGIN personal.xml
> <?xml version="1.0"?>
> <!DOCTYPE personnel SYSTEM "personal.dtd">
> <?xml-stylesheet type='text/xsl' href='personal.xsl'?>
> <personnel>
>   <person id="H.MARUYAMA" >
>     <name><family>MARUYAMA</family> <given>Hiroshi</given></name>
>     <email>maruyama@jp.ibm.com</email>
>     <link subordinates="  N.URAMOTO    K.TAMURA "/>
>   </person>
>   <person id="N.URAMOTO">
>     <name><family>URAMOTO</family> <given>Naohiko</given></name>
>     <email>uramoto@jp.ibm.com</email>
>     <link manager="             H.MARUYAMA"/>
>   </person>
>   <person id="K.TAMURA">
>     <name>
>       <family>TAMURA</family> <given>Kent</given>
>     </name>
>     <!-- This URL is mail address.-->
>     <url href="mailto:kent@trl.ibm.co.jp"/>
>     <url href="mailto:tkent@jp.ibm.com"/>
>     <link manager="H.MARUYAMA"/>
>   </person>
> </personnel>
> // END personal.xml
> 
> // BEGIN personal.dtd
> <?xml encoding="US-ASCII"?>
> 
> <!ELEMENT personnel (person)+>
> <!ATTLIST person note CDATA #IMPLIED
>                  contr (true|false) "false"
>                  id ID #REQUIRED>
> <!ELEMENT person (name,email*,url*,link?)>
> <!ELEMENT family (#PCDATA)>
> <!ELEMENT given (#PCDATA)>
> <!ELEMENT name (family,given?)>
> <!ELEMENT email (#PCDATA)>
> <!ELEMENT url EMPTY>
> <!ATTLIST url href CDATA "http://">
> <!ELEMENT link EMPTY>
> <!ATTLIST link
>   manager IDREF #IMPLIED
>   subordinates IDREFS #IMPLIED>
> 
> <!NOTATION gif SYSTEM "photoshop.exe">
> 
> <!ENTITY internal-entity "entity-value">
> <!ENTITY external-entity PUBLIC "-//ENTITY" "external-entity.ent">
> <!ENTITY unparsed-entity SYSTEM "unparsed-entity.ent" NDATA gif>
> // END personal.dtd
> 
> // BEGIN Stack Trace
> Exception in thread "main" java.lang.NullPointerException:
>         at org.apache.xml.serialize.BaseMarkupSerializer.printDoctypeURL(BaseMarkupSerializer.java:1261)
>         at org.apache.xml.serialize.BaseMarkupSerializer.unparsedEntityDecl(BaseMarkupSerializer.java:638)
>         at org.apache.xml.serialize.BaseMarkupSerializer.serializeNode(BaseMarkupSerializer.java:773)
>         at org.apache.xml.serialize.BaseMarkupSerializer.serialize(BaseMarkupSerializer.java:385)
>         at test.Test.main(Test.java:27)
> // END Stack Trace
> 
> --
> Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org