Posted to general@xerces.apache.org by Assaf Arkin <ar...@exoffice.com> on 2000/03/16 09:51:27 UTC

Review: Serializer API

I've attached a draft of the new serializer API, please review and
comment.


The new API fixes the following issues:

* Separates interfaces from implementation, all interfaces have been
placed in the package org.xml.serialize, implementation remains in
org.apache.xml.serialize

* Better defines the Serializer interface and how to use it

* Adds support for reusing serializers

* Simplifies SerializerFactory: one method call to get a serializer

* Adds QName for namespace support in OutputFormat

* Adds SerializerHandler for supporting non-escaping and whitespace
preserving contents

arkin


-- 
----------------------------------------------------------------------
Assaf Arkin                                           www.exoffice.com
CTO, Exoffice Technologies, Inc.                        www.exolab.org

Re: Review: Serializer API

Posted by Andy Clark <an...@apache.org>.

Assaf Arkin wrote:
> I've attached a draft of the new serializer API, please review and
> comment.

I'm finally getting back to looking over the serializer stuff.
Unfortunately, it's getting late so I'll write up my thoughts
tomorrow and post it to the mailing list.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Review: Serializer API

Posted by Andy Clark <an...@apache.org>.

How about changing Method not to MimeType but rather to ContentType?
A construct I use *very* often in my HTML files is the following:

  <meta http-equiv='content-type' content='text/html; charset=x-sjis'>

Which would correspond to an equivalent response line from a web
server or a header line in an email message:

  Content-Type: text/html; charset="x-sjis"

Then the ContentType could be the one used to specify the encoding
of the output stream. Which brings me back to my binary vs.
character serializer comment...

Clearly, in Java, OutputStream is for binary output and Writer is 
for character output. However, what's implied is that the program
has constructed the appropriate writer that converts the Unicode
characters to the appropriate byte sequences in that encoding. So
a Writer object is really writing to an OutputStream. 

But by changing character serializers to only support Writer, it
would appear that we're putting the onus on the programmer to
both specify the charset for the encoding type (so that any
reference to the charset in the output is correct -- e.g. the
XML encoding names, IANA, are *not* the same as the Java encoding
names) *and* create an output writer with the appropriate Java
encoding! However, I think that we can work through this by
providing a convenience mapping that creates the appropriate
writer from a given output stream. Here's a quick example:

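  import java.io.OutputStream;
  import java.io.OutputStreamWriter;
  import java.io.UnsupportedEncodingException;
  import java.io.Writer;

  public class ContentType {

    // Data
    protected String type;
    protected String charset;

    // Constructors
    public ContentType(String type, String charset) {
      this.type = type;
      this.charset = charset;
    }

    // Public methods

    // Creates a Writer that encodes characters in this content
    // type's charset before writing them to the given stream.
    public Writer createWriter(OutputStream out)
      throws UnsupportedEncodingException {

      String javaEncoding = toJavaEncoding(charset);
      return new OutputStreamWriter(out, javaEncoding);
    }

    // Placeholder: do the mapping from the IANA charset name to the
    // Java encoding name here; this version passes it through as-is.
    protected String toJavaEncoding(String ianaCharset) {
      return ianaCharset;
    }

    // ... etc ...

  }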

What do you think?

Or maybe a static method would be better? Hmmm... I'm just
brainstorming here. I'd really like to hear other people's
opinions.

(I noticed that Sun's JavaMail extension has something
similar: javax.mail.internet.ContentType)
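
For illustration, here is a minimal sketch of the static-method variant
mentioned above. It assumes that java.nio.charset.Charset (which was added
to Java after this thread) is an acceptable stand-in for the IANA-to-Java
name mapping. ContentTypeWriters and its method names are hypothetical and
not part of the proposed API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper, not part of the serializer API under review.
final class ContentTypeWriters {

    private ContentTypeWriters() {}

    // Resolves a charset name (IANA name or alias known to the JVM) and
    // wraps the stream in a Writer that encodes with that charset.
    static Writer createWriter(OutputStream out, String ianaCharset)
            throws UnsupportedEncodingException {
        try {
            Charset cs = Charset.forName(ianaCharset);
            return new OutputStreamWriter(out, cs);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            throw new UnsupportedEncodingException(ianaCharset);
        }
    }

    // Convenience for experiments: encodes text through createWriter
    // and returns the resulting bytes.
    static byte[] encode(String text, String ianaCharset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = createWriter(bytes, ianaCharset)) {
            w.write(text);
        }
        return bytes.toByteArray();
    }
}
```

With this shape the caller names the charset once, and the same name can be
reused for the charset reference written into the output.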

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Mutation Events

Posted by Gerd Mueller <ge...@softwarebuero.de>.

Hi,

> Events in the DOM can "bubble" up the tree. Section 6.2.3 of the
> candidate recommendation specification describes how this works.
> But the important thing is that you only have to register your
> listener on the root and not on *every* node.
> 
> Section 6.6.4 from the specification tells you which MutationEvents
> bubble up the tree. Some don't. For the others, you should be able
> to register your event listener on the root of the tree and receive
> the notification.
> 
> If our implementation doesn't behave the way that the spec details,
> then we need to know so that it can be fixed by someone. For more
> information on DOM Level 2 events, check out the specification at
> the following link: http://www.w3.org/TR/DOM-Level-2/events.html

Thank you, it works now. Probably my example was somehow 'misconstructed'.

Regards
Gerd

-- 
________________________________________________________________
Gerd Mueller                               gerd@softwarebuero.de
softwarebuero m&b                    http://www.softwarebuero.de

Re: Mutation Events

Posted by Andy Clark <an...@apache.org>.

Gerd Mueller wrote:
> But if I understand the specification and the implementation of Xerces right I
> have to add event listeners for _each_ mutation event type to _every_ node of
> the tree to be notified of any mutation - is this right ?

Events in the DOM can "bubble" up the tree. Section 6.2.3 of the
candidate recommendation specification describes how this works.
But the important thing is that you only have to register your
listener on the root and not on *every* node.

Section 6.6.4 from the specification tells you which MutationEvents
bubble up the tree. Some don't. For the others, you should be able
to register your event listener on the root of the tree and receive
the notification.

If our implementation doesn't behave the way that the spec details,
then we need to know so that it can be fixed by someone. For more
information on DOM Level 2 events, check out the specification at
the following link: http://www.w3.org/TR/DOM-Level-2/events.html
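
As a rough sketch of the root-listener approach described above: the code
below registers one listener for the bubbling "DOMNodeInserted" event on
the root element and then inserts nodes deeper in the tree. It assumes the
DOM implementation returned by DocumentBuilderFactory implements the DOM
Level 2 EventTarget interface, which is not guaranteed for every
implementation. The class name RootMutationListenerSketch is made up for
this example.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.events.EventListener;
import org.w3c.dom.events.EventTarget;

// Hypothetical demo class: one listener on the root, mutations below it.
final class RootMutationListenerSketch {

    private RootMutationListenerSketch() {}

    // Returns the names of the nodes whose insertion was reported
    // to the single listener registered on the root element.
    static List<String> insertedNodeNames() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        Element child = doc.createElement("child");
        root.appendChild(child);

        if (!(root instanceof EventTarget)) {
            // Assumption not met: this DOM does not expose DOM Level 2 events.
            throw new UnsupportedOperationException("DOM events not supported");
        }

        List<String> seen = new ArrayList<>();
        EventListener listener =
                evt -> seen.add(((Node) evt.getTarget()).getNodeName());
        // useCapture = false: the listener is called during the bubbling phase.
        ((EventTarget) root).addEventListener("DOMNodeInserted", listener, false);

        // Mutations two levels below the root; the events bubble up to it.
        child.appendChild(doc.createElement("grandchild"));
        child.appendChild(doc.createTextNode("hi"));
        return seen;
    }
}
```

One such registration per bubbling event type on the root is enough; event
types that do not bubble still need listeners on the nodes concerned.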

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Mutation Events

Posted by Gerd Mueller <ge...@softwarebuero.de>.

Hi,

I've got a question regarding the mutation events: 

I am writing an application that needs to be notified about _any_ mutation of a
DOM tree, and I think the DOM event model is suitable to handle this (?).

But if I understand the specification and the implementation of Xerces right I
have to add event listeners for _each_ mutation event type to _every_ node of
the tree to be notified of any mutation - is this right ? 

It seems to me a big waste of resources. Wouldn't it be smarter to be able
to add one event listener for each event type to the root node
and catch there all events that may happen somewhere in the DOM tree? But I'm
not sure whether this conforms to the W3C specification.

Or is there another solution for this problem ?

Best Regards,
Gerd

-- 
________________________________________________________________
Gerd Mueller                               gerd@softwarebuero.de
softwarebuero m&b                    http://www.softwarebuero.de

Re: Review: Serializer API

Posted by Arkin <ar...@exoffice.com>.

The method name is a String. Method defines some common names, but not
all of them. You can have other serializers responding to other
methods, so it's just a placeholder for names.

arkin

Wong Kok Wai wrote:
> 
> Similar comments for setters that accept a string: should you check for both null
> and zero-length string before assigning?
> 
> For setMethod method in OutputFormat, why not pass an instance of Method instead of
> String?
> 
> cheers,
> Wong
> 
> Assaf Arkin wrote:
> 
> > You have some sharp eyes :-)
> >
> > arkin
> >
> > Wong Kok Wai wrote:
> > >
> > > A quick comment: In QName, the localname checking is only for null. Should add
> > > checking for zero-length string too?

-- 
----------------------------------------------------------------------
Assaf Arkin                                           www.exoffice.com
CTO, Exoffice Technologies, Inc.                        www.exolab.org

Re: Review: Serializer API

Posted by Wong Kok Wai <wo...@pacific.net.sg>.

Similar comments for setters that accept a string: should you check for both null
and zero-length string before assigning?

For setMethod method in OutputFormat, why not pass an instance of Method instead of
String?
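
To make the suggestion concrete, here is a minimal sketch of a check that
rejects both null and zero-length strings. SimpleQName is a hypothetical
stand-in for the QName class in the draft, whose actual code is not shown
in this thread; the choice of exception types is also an assumption.

```java
// Hypothetical stand-in for the QName class under review.
final class SimpleQName {

    private final String namespaceURI;
    private final String localName;

    SimpleQName(String namespaceURI, String localName) {
        this.namespaceURI = namespaceURI; // null is taken to mean "no namespace"
        this.localName = requireNonEmpty(localName, "localName");
    }

    // Rejects null and zero-length strings; returns the value otherwise.
    static String requireNonEmpty(String value, String what) {
        if (value == null) {
            throw new NullPointerException(what + " must not be null");
        }
        if (value.length() == 0) {
            throw new IllegalArgumentException(what + " must not be empty");
        }
        return value;
    }

    String getNamespaceURI() {
        return namespaceURI;
    }

    String getLocalName() {
        return localName;
    }
}
```

The same helper could be called from any setter that accepts a string.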

cheers,
Wong

Assaf Arkin wrote:

> You have some sharp eyes :-)
>
> arkin
>
> Wong Kok Wai wrote:
> >
> > A quick comment: In QName, the localname checking is only for null. Should add
> > checking for zero-length string too?

Re: Review: Serializer API

Posted by Assaf Arkin <ar...@exoffice.com>.

You have some sharp eyes :-)

arkin


Wong Kok Wai wrote:
> 
> A quick comment: In QName, the localname checking is only for null. Should add
> checking for zero-length string too?

Re: Review: Serializer API

Posted by Wong Kok Wai <wo...@pacific.net.sg>.

A quick comment: In QName, the localname checking is only for null. Should add
checking for zero-length string too?

Re: Review: Serializer API

Posted by Arkin <ar...@exoffice.com>.

Andy Clark wrote:
> 
> Assaf Arkin wrote:
> > * Separates interfaces from implementation, all interfaces have been
> > placed in the package org.xml.serialize, implementation remains in
> > org.apache.xml.serialize
> 
> I thought that org.xml package was owned by OASIS. Are we allowed
> to drop things in there?

Not yet, that's why the code is not in the CVS, it's just a
recommendation.

 
> Thanks for posting this; can't wait to look at the details!

Apparently, I forgot to attach the sources :-)

Here they are.

arkin

> 
> --
> Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

-- 
----------------------------------------------------------------------
Assaf Arkin                                           www.exoffice.com
CTO, Exoffice Technologies, Inc.                        www.exolab.org

Re: Review: Serializer API

Posted by Andy Clark <an...@apache.org>.

Assaf Arkin wrote:
> * Separates interfaces from implementation, all interfaces have been
> placed in the package org.xml.serialize, implementation remains in
> org.apache.xml.serialize

I thought that org.xml package was owned by OASIS. Are we allowed
to drop things in there?

Thanks for posting this; can't wait to look at the details!

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Review: Serializer API - the related failure.

Posted by Boris Garbuzov <bo...@keystrokenet.com>.

My observation may be related to this subject. DOMSerializer serializes a
newly created Element and DocumentFragment for me, but fails on the root
element, throwing
    java.lang.ClassCastException: org.apache.xerces.dom.TextImpl
      at org.apache.xml.serialize.XMLSerializer.serializeElement(XMLSerializer.java:577)
      at org.apache.xml.serialize.BaseMarkupSerializer.serializeNode(BaseMarkupSerializer.java:827)
      at org.apache.xml.serialize.XMLSerializer.serializeElement(XMLSerializer.java:608)
      at org.apache.xml.serialize.BaseMarkupSerializer.serializeNode(BaseMarkupSerializer.java:827)
      at org.apache.xml.serialize.BaseMarkupSerializer.serialize(BaseMarkupSerializer.java:373)
      at com.keystrokenet.loanproduct.xml.test.Lab.executeTestBody(Lab.java:397)
Is this a bug or my misuse of the API?
------------------------------------------------------------------------------------------



Andy Clark wrote:

> First, I'd like to look at what's currently in the API and then
> discuss some points of design that I'd like to see in the
> serializers.
>
> DOMSerializer: I'm sort of surprised that there are methods to
> serialize a Document, Element, and DocumentFragment but nothing
> for a generic Node. In fact, if you wanted to serialize a text
> node or entity reference, you would first have to remove or
> clone it into a DocumentFragment and serialize that. And it is
> impossible to serialize things like attributes outside of their
> container elements. Would it be enough to have the following
> method?
>
>   public void serialize(Node node) throws IOException;
>
> Method: I don't see a need for this class. If all it's doing
> is holding string constants, then I would say get rid of it
> completely. Otherwise, make it a Java "enumeration", like so:
>
>   public class Method implements Serializable {
>
>     // Constants
>     public static final Method XML = new Method("text/xml");
>     public static final Method HTML = new Method("text/html");
>     public static final Method Text = new Method("text/plain");
>
>     // Data
>     private String type;
>
>     // Constructors
>     protected Method(String type) { this.type = type; }
>
>     // Object methods: equals, hashCode, and toString
>   }
>
> But I think that we could do without it altogether and just
> make it possible to register new methods with the serializer
> factory. But I'll get to that in a minute.
>
> And the type of the method could be the mime type which would
> avoid the need of a set/getMediaType on the OutputFormat object.
> And if this thing is really representing the mime type, perhaps
> it should be called such instead of "Method". It would tie in
> better with existing standards.
>
> OutputFormat: It seems like a good idea to have a kind of
> properties object like OutputFormat. But it seems that the
> OutputFormat (and in fact the whole serializer API) is based
> on serializing to a text markup syntax. This sort of jumps
> the gun on what I'd like to say in general about the
> serialization API so I won't go any further at this point.
> Check out my comments below regarding this matter.
>
> Serializer: I noticed that this design makes use of the SAX
> interfaces but not of the traversal APIs added with DOM Level
> 2. Is there a way that we could leverage those interfaces?
>
> SerializerFactory: There's no way to dynamically register
> OutputMethods or Serializers. I think that there should be
> a way to do this.
>
> And overall, I'm not sure if we'd be allowed to drop stuff
> into the org.xml package namespace. Arkin: have you checked
> on this? And will any of this be superseded by DOM Level 3?
> at least on the DOM serialization side, that is... Perhaps
> Arnaud or someone else on the W3C committee can shed light
> on this.
>
> Okay, now I'd like to make a few comments about what I'd
> like to see in a serialization API. First, I don't strictly
> see serialization as an output to some text markup. As
> such, I would like a split between binary and character
> serializers. Currently, there are both setOutputStream()
> and setWriter() methods on the Serializer objects. If
> possible, I'd like setOutputStream() only be on binary
> serializers and setWriter() be used on character serializers.
>
> All of the current serializer implementations (XML, HTML,
> XHTML) would be character serializers and the OutputFormat
> object seems to go very well with this. On the binary side,
> however, I can see a situation where SVG gets serialized to
> a JPEG image. I realize that this overlaps XSL Formatting
> Objects, though. Perhaps a better example would be an XML
> serializer that outputs to WBXML.
>
> Is anyone else thinking along these lines?
>
> --
> Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

--
Boris Garbuzov.
Mailing address:
Box 715, Seattle, Washington, 98111-0715, USA.
E-mail: garbuzov@hotmail.com, boris@keystroke.com.
Telephone: 1(206)781-5165 (home), 1(206)576-4549 (office).
Residential address: 139 NW 104 Street, Seattle, 98177, Wa, USA

Re: Review: Serializer API

Posted by Arkin <ar...@exoffice.com>.

> In the case where the serializer instance doesn't support
> serializing that node type, it could throw an exception.

Why? I mean, how many serializers do you expect to be able to serialize
an attribute or a text node?

When will you use that functionality?


> processor. And there's nothing that says that a general
> serialization API has to follow exactly what the XSLT
> specification details. As long as there is a way to map
> from what an XSLT document may specify via <xsl:output> to
> our serialization scheme, I don't see a problem.

I was just following the XSLT guidelines since they are well specified.
There is no specification of how serialization should occur outside of
the XSLT spec, which is pretty clear about that. I assume that if there ever
were one, it would borrow from the XSLT spec, so any different naming on our
side would be in conflict with such a specification.


> Is this a strict design goal? I'd rather have a good design
> that *allows* XSLT output "hints" than design around that
> optional part of the XSLT specification.

Once again, I don't have to stick to XSLT specs, but I think there was a
lot of work to define the spec in that area and any W3C proposal for
just specifying serialization will follow the same names and ideas.


> > No, the serializer API does not assume markup, it was designed to
> > support PDF, JPEG, and other binary formats. An implementation should by
> > default support the three common text formats, but the API is designed
> > so other formats can be introduced as well.
> 
> So you do not see the usefulness of separating binary from
> character serializers?

No.


> Just because ASCII is a one-byte encoding doesn't mean that
> OutputStream *should* be used. It may but Writer is definitely
> the way to go even if you're outputting ASCII, ISO Latin 1, or
> any other one-byte encoding. (Even ISO Latin ? would be wrong
> due to various mappings from Unicode to the single byte form
> if you use an OutputStream.)

I didn't say *should* I said *would*.

I actually use OutputStream with the serializers, I let the serializer
pick the Writer for me based on the encoding. Otherwise, I have to
create a Writer and make sure to use the same encoding. More lines of
code, same end result.

If you are outputting to a file, I think new FileOutputStream is a better
approach than new FileWriter: let the serializer pick the Writer for
you.
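
A small sketch of that idea, under stated assumptions: TinyXmlSerializer
below is a made-up toy, not the real XMLSerializer, and it does no
escaping. It only shows how a serializer that is handed an OutputStream
can create the Writer itself from the encoding it was configured with, so
the encoding named in the XML declaration and the bytes written cannot
drift apart.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import java.io.Writer;

// Hypothetical toy serializer; names and behavior are illustrative only.
final class TinyXmlSerializer {

    private final String encoding;
    private Writer writer;

    TinyXmlSerializer(String encoding) {
        this.encoding = encoding;
    }

    // The serializer picks the Writer, using its own configured encoding.
    void setOutputStream(OutputStream out) throws UnsupportedEncodingException {
        this.writer = new OutputStreamWriter(out, encoding);
    }

    // The caller supplies the Writer and is responsible for its encoding.
    void setWriter(Writer writer) {
        this.writer = writer;
    }

    // Writes a declaration and a single element with text content.
    // No escaping is performed; this is a sketch, not a real serializer.
    void serializeText(String rootName, String text) throws IOException {
        if (writer == null) {
            throw new IllegalStateException("no output has been set");
        }
        writer.write("<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>");
        writer.write("<" + rootName + ">" + text + "</" + rootName + ">");
        writer.flush();
    }
}
```

With setWriter, by contrast, nothing ties the Writer's encoding to the one
named in the declaration, which is the extra care referred to above.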


> > some transformations like that make sense. With that in mind, the
> > serializer API is part of the XSLT API and should support outputting
> > all such possible transformations.
> 
> Huh? Perhaps we should go back to listing the design goals of
> the serialization API. I know that Scott wrote his own
> pretty printers when he wrote LotusXSL but I didn't realize
> that the serialization API was being designed around Xalan.

Not around Xalan, just to support Xalan, so you can plug serializers
that do not exist in Xalan into Xalan.

I think we got our messages crossed at this point.

The serializers should fulfill the following:

* Support XML, HTML, Text

* Support any other output format (textual, binary)

* Allow an XSLT processor to use them, so you can extend the
capabilities of an XSLT processor by creating new serializers

* Allow them to be used without an XSLT processor

In keeping with that separation, extensibility and usability, I tried to
do the following:

* Define an API that can take care of all the serialization details for
XML, HTML, Text

* Make the API sufficient to support other output formats
(textual and binary)

* Allow the API to be extended to support other output formats that
require more properties/methods

* Make the API separate from the TRAX API, yet make sure it's sufficient
for what XSLT needs 

The decision to support Xalan had little effect on the definition of the
API; it comes free if the API is independent of any particular
implementation details and well defined. The decision to stick to the XSLT
spec was not brought up by Xalan either, only by the fact that the XSLT
spec specifies how serialization should occur.

The fact that there are some differences between the XSLT spec and the
Java spec is merely a point of conflict that does not affect just the
serializer. Consider that setContentType in the Servlet API is modeled
after the HTTP content-type. The fact that XSLT adopted media-type and
not content-type is a mistake in my opinion, but this is a conflict
between XSLT and HTTP as well.

If XML 2.0 were to add (not saying that it will, just an example) a content
type specification to the XML document declaration, would that be
media-type or content-type? Should an XML parser use the XML 2.0 term,
or the Java term?

arkin

> 
> --
> Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

-- 
----------------------------------------------------------------------
Assaf Arkin                                           www.exoffice.com
CTO, Exoffice Technologies, Inc.                        www.exolab.org

Re: Review: Serializer API

Posted by Andy Clark <an...@apache.org>.

Arkin wrote:
> As for serializing Node that happens to be an Attribute, keep in mind
> that we're trying to define an API used by a lot of serializers. The
> question that should be raised is: would it be trivial for them to
> support it? Would a PDF serializer support that?

Perhaps not but you're precluding other types of serializers.
Here's an idea: how about a way to query the serializer to
ask it for which DOM nodes it supports serialization? For 
example:

  public boolean supportsNodeType(short nodeType);

In the case where the serializer instance doesn't support 
serializing that node type, it could throw an exception. 
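
A minimal sketch of how such a capability query could look. Only
supportsNodeType(short) and serialize(Node) come from this discussion; the
interface name NodeSerializer, the TextOnlySerializer class, and the choice
of UnsupportedOperationException are assumptions made for illustration.

```java
import java.io.IOException;
import org.w3c.dom.Node;

// Hypothetical interface combining the two methods discussed in this thread.
interface NodeSerializer {

    // True if this serializer can handle nodes of the given DOM node type.
    boolean supportsNodeType(short nodeType);

    // Serializes the node, or throws if its type is not supported.
    void serialize(Node node) throws IOException;
}

// Toy implementation that only knows how to write character data.
final class TextOnlySerializer implements NodeSerializer {

    private final StringBuilder out = new StringBuilder();

    @Override
    public boolean supportsNodeType(short nodeType) {
        return nodeType == Node.TEXT_NODE || nodeType == Node.CDATA_SECTION_NODE;
    }

    @Override
    public void serialize(Node node) {
        if (!supportsNodeType(node.getNodeType())) {
            throw new UnsupportedOperationException(
                    "cannot serialize node type " + node.getNodeType());
        }
        out.append(node.getNodeValue());
    }

    // Everything serialized so far.
    String result() {
        return out.toString();
    }
}
```

A caller can test supportsNodeType(node.getNodeType()) first and fall back
to another serializer instead of relying on the exception.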

> XSLT defines an output method which has one of three names (xml, html,
> text) or a qualified name for additional methods (like PDF, SVG, etc.). It
> then defines media-type as a separate value. I don't like it, but it's
> part of the spec and the serializers have to support that for the sake
> of XSLT processing.

But <xsl:output> is not required to be observed by an XSLT
processor. And there's nothing that says that a general
serialization API has to follow exactly what the XSLT
specification details. As long as there is a way to map
from what an XSLT document may specify via <xsl:output> to
our serialization scheme, I don't see a problem.

> Not the best design, I agree, but one which follows the XSLT specs.

Is this a strict design goal? I'd rather have a good design
that *allows* XSLT output "hints" than design around that
optional part of the XSLT specification.

> No, the serializer API does not assume markup, it was designed to
> support PDF, JPEG, and other binary formats. An implementation should by
> default support the three common text formats, but the API is designed
> so other formats can be introduced as well.

So you do not see the usefulness of separating binary from
character serializers?

> I don't see why the serializers should not support both. An output
> stream is also used in many applications for character output (even if
> we both agree they should use Writer). It should certainly be understood
> that a GIF serializer only uses setOutputStream, but then, a PDF
> serializer to Base64 encoding could use either one.

Just because ASCII is a one-byte encoding doesn't mean that
OutputStream *should* be used. It may but Writer is definitely
the way to go even if you're outputting ASCII, ISO Latin 1, or
any other one-byte encoding. (Even ISO Latin ? would be wrong
due to various mappings from Unicode to the single byte form
if you use an OutputStream.)

> some transformations like that make sense. With that in mind, the
> serializer API is part of the XSLT API and should support outputting
> all such possible transformations.

Huh? Perhaps we should go back to listing the design goals of 
the serialization API. I know that Scott wrote his own
pretty printers when he wrote LotusXSL but I didn't realize
that the serialization API was being designed around Xalan.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Review: Serializer API

Posted by Arkin <ar...@exoffice.com>.

Andy Clark wrote:
> 
> First, I'd like to look at what's currently in the API and then
> discuss some points of design that I'd like to see in the
> serializers.
> 
> DOMSerializer: I'm sort of surprised that there are methods to
> serialize a Document, Element, and DocumentFragment but nothing
> for a generic Node. In fact, if you wanted to serialize a text
> node or entity reference, you would first have to remove or
> clone it into a DocumentFragment and serialize that. And it is
> impossible to serialize things like attributes outside of their
> container elements. Would it be enough to have the following
> method?
> 
>   public void serialize(Node node) throws IOException;

I tried to stick to the W3C model which defines a document or a document
fragment, so if you want to print just an element, I think it makes
sense to use a document fragment.
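
For illustration, a minimal sketch of the wrapping step this implies for a
caller who holds a single node such as a text node or entity reference. It
uses only standard DOM calls; FragmentWrapper is a made-up helper name. It
does not help with attributes, since the DOM does not allow Attr nodes as
children of a DocumentFragment.

```java
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;

// Hypothetical helper: prepares a lone node for a fragment-based serializer.
final class FragmentWrapper {

    private FragmentWrapper() {}

    // Returns a new DocumentFragment holding a deep clone of the node,
    // leaving the original node in place in its tree.
    static DocumentFragment wrap(Node node) {
        Document owner = node.getOwnerDocument();
        if (owner == null) {
            // Document nodes have no owner document; serialize them directly.
            throw new IllegalArgumentException("node has no owner document");
        }
        DocumentFragment fragment = owner.createDocumentFragment();
        fragment.appendChild(node.cloneNode(true));
        return fragment;
    }
}
```

The resulting fragment can then be passed to whatever method the
DOMSerializer offers for DocumentFragment.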

As for serializing Node that happens to be an Attribute, keep in mind
that we're trying to define an API used by a lot of serializers. The
question that should be raised is: would it be trivial for them to
support it? Would a PDF serializer support that?

> But I think that we could do without it altogether and just
> make it possible to register new methods with the serializer
> factory. But I'll get to that in a minute.

If there is an agreement on that, I'll just make Method (which is
designed to hold the default output method names, nothing more) part of
the helpers class or kill it. I think it makes sense for documenting
the common methods (see comments below), but it's not essential for anything
to work.

> And the type of the method could be the mime type which would
> avoid the need of a set/getMediaType on the OutputFormat object.
> And if this thing is really representing the mime type, perhaps
> it should be called such instead of "Method". It would tie in
> better with existing standards.

XSLT defines an output method which has one of three names (xml, html,
text) or a qualified name for additional methods (like PDF, SVG, etc.). It
then defines media-type as a separate value. I don't like it, but it's
part of the spec and the serializers have to support that for the sake
of XSLT processing.

To select a serializer you use the method name. Generally serializers do
not care about the media type, but if we have a Servlet getting an XSLT
response, it would probably want to use the media type as the content
type. This is why getOutputFormat() exists, to extract the output format
and determine the media type.

The default output formats (and more can be supported) are defined in
the helpers class, all of which provide values for both method and media
type. In addition, the factory allows one to get an output format
suitable for a given output method, so you can determine the media type.

Not the best design, I agree, but one which follows the XSLT specs.

> OutputFormat: It seems like a good idea to have a kind of
> properties object like OutputFormat. But it seems that the
> OutputFormat (and in fact the whole serializer API) is based
> on serializing to a text markup syntax. This sort of jumps
> the gun on what I'd like to say in general about the
> serialization API so I won't go any further at this point.
> Check out my comments below regarding this matter.

No, the serializer API does not assume markup, it was designed to
support PDF, JPEG, and other binary formats. An implementation should by
default support the three common text formats, but the API is designed
so other formats can be introduced as well.

Once again, if you read the XSLT spec it clearly defines xml, html and
text, does not define, but allows, other output methods. I followed the
same guidelines in coming up with this API.

> Serializer: I noticed that this design makes use of the SAX
> interfaces but not of the traversal APIs added with DOM Level
> 2. Is there a way that we could leverage those interfaces?

It would make sense to support traversal for the DOMSerializer.

What would be the API requirements for that (other than
serialize(iterator))?
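
As a starting point for that question, here is a rough sketch of what a
serialize(iterator) method could do with a DOM Level 2 NodeIterator. It
assumes the Document returned by DocumentBuilderFactory implements
DocumentTraversal, which is optional for DOM implementations.
IteratorSerializerSketch and its output format are invented for this
example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;

// Hypothetical sketch of an iterator-driven serializer.
final class IteratorSerializerSketch {

    private IteratorSerializerSketch() {}

    // Walks the iterator in document order, writing a start tag for each
    // element and the value of each text node. Other node types are skipped.
    static String serialize(NodeIterator iterator) {
        StringBuilder sb = new StringBuilder();
        for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
            switch (n.getNodeType()) {
                case Node.ELEMENT_NODE:
                    sb.append('<').append(n.getNodeName()).append('>');
                    break;
                case Node.TEXT_NODE:
                    sb.append(n.getNodeValue());
                    break;
                default:
                    break;
            }
        }
        return sb.toString();
    }

    // Builds <po><item>pen</item></po> and serializes it via an iterator.
    static String demo() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element po = doc.createElement("po");
        doc.appendChild(po);
        Element item = doc.createElement("item");
        po.appendChild(item);
        item.appendChild(doc.createTextNode("pen"));

        if (!(doc instanceof DocumentTraversal)) {
            // Assumption not met: this DOM does not support DOM Level 2 traversal.
            throw new UnsupportedOperationException("traversal not supported");
        }
        NodeIterator it = ((DocumentTraversal) doc)
                .createNodeIterator(po, NodeFilter.SHOW_ALL, null, true);
        return serialize(it);
    }
}
```

The sketch emits no end tags: a NodeIterator presents a flat sequence, so
the serializer would have to recover nesting itself (for example from
parent links), or work from a TreeWalker instead.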

> SerializerFactory: There's no way to dynamically register
> OutputMethods or Serializers. I think that there should be
> a way to do this.

By definition the SerializerFactory is one way - but not the only way -
of obtaining serializers. You can also construct them directly. So there is
no need to go overboard with over-generalizing it.

For registering serializers, I actually had a method for it, but I had
to pull it off and rethink it, since it would work better if it
registers both a serializer and a default OutputFormat.

I would definitely like to see a registration mechanism in the final
API.
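
One possible shape for such a mechanism, sketched under loud assumptions:
SerializerRegistry and FormatDefaults are invented names, and
java.util.function.Supplier is used for brevity although it postdates this
thread. The one idea taken from the text above is that a single
registration records both how to create the serializer and the default
output format (here reduced to method name and media type).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, minimal stand-in for a default OutputFormat.
final class FormatDefaults {
    final String method;
    final String mediaType;

    FormatDefaults(String method, String mediaType) {
        this.method = method;
        this.mediaType = mediaType;
    }
}

// Hypothetical registry keyed by output method name.
final class SerializerRegistry<S> {

    private final Map<String, Supplier<? extends S>> factories = new HashMap<>();
    private final Map<String, FormatDefaults> defaults = new HashMap<>();

    // Registers a serializer factory together with its default format.
    void register(String method, Supplier<? extends S> factory, String mediaType) {
        if (method == null || method.isEmpty() || factory == null) {
            throw new IllegalArgumentException("method and factory are required");
        }
        factories.put(method, factory);
        defaults.put(method, new FormatDefaults(method, mediaType));
    }

    // Creates a new serializer for the given output method.
    S newSerializer(String method) {
        Supplier<? extends S> factory = factories.get(method);
        if (factory == null) {
            throw new IllegalArgumentException("no serializer for method: " + method);
        }
        return factory.get();
    }

    // Returns the default format registered for the given output method.
    FormatDefaults defaultFormat(String method) {
        FormatDefaults format = defaults.get(method);
        if (format == null) {
            throw new IllegalArgumentException("no format for method: " + method);
        }
        return format;
    }
}
```

A caller that only knows the method name can then look up the media type
through defaultFormat without constructing a serializer.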

> And overall, I'm not sure if we'd be allowed to drop stuff
> into the org.xml package namespace. Arkin: have you checked
> on this? And will any of this be superseded by DOM Level 3?
> at least on the DOM serialization side, that is... Perhaps
> Arnaud or someone else on the W3C committee can shed light
> on this.

We are not yet dropping anything. There are two proposals, the
Serializer API and the XSLT processing API (TRAX) which we are proposing
in a larger forum as a vendor-neutral API. We intend to use org.xml for
that, if we get permission for that. Until we get that, it's only
available as a proposal and not in the CVS.

> Okay, now I'd like to make a few comments about what I'd
> like to see in a serialization API. First, I don't strictly
> see serialization as an output to some text markup. As

+1

> such, I would like a split between binary and character
> serializers. Currently, there are both setOutputStream()
> and setWriter() methods on the Serializer objects. If
> possible, I'd like setOutputStream() only be on binary
> serializers and setWriter() be used on character serializers.

I don't see why the serializers should not support both. An output
stream is also used in many applications for character output (even if
we both agree they should use Writer). It should certainly be understood
that a GIF serializer only uses setOutputStream, but then, a PDF
serializer to Base64 encoding could use either one.

> All of the current serializer implementations (XML, HTML,
> XHTML) would be character serializers and the OutputFormat
> object seems to go very well with this. On the binary side,
> however, I can see a situation where SVG gets serialized to
> a JPEG image. I realize that this overlaps XSL Formatting
> Objects, though. Perhaps a better example would be an XML
> serializer that outputs to WBXML.

SVG to JPEG is certainly something within the scope of serializers, just
like XML to PDF. Although I would advocate that XML support be added
directly to Acrobat Reader and rely on FO, instead of XML -> FO -> PDF,
some transformations like that make sense. With that in mind, the
serializer API is part of the XSLT API and should support outputting
all such possible transformations.

XML, HTML, XHTML and Text are considered the default output methods,
which I assume every XSLT processor or even XML parser would like to
support.

There should not be a conflict with getting XML to DB. The Serializer
and OutputFormat objects are extensible, so they should allow you to add
additional properties, e.g.:

  DBSerializer   ser;
  DBOutputFormat format;

  ser = new DBSerializer();
  format = new DBOutputFormat();
  format.setSQLSyntax( "SQL92" );
  ser.setConnection( jdbc.getConnection() );
  ser.setTableName( "po" );
  ser.asDOMSerializer().serialize( doc );

If these were registered in the factory then:

  format = SerializerFactory.getOutputFormat( "wbxml:db" );
  ser = SerializerFactory.getSerializer( format );

arkin

> 
> Is anyone else thinking along these lines?
> 
> --
> Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

-- 
----------------------------------------------------------------------
Assaf Arkin                                           www.exoffice.com
CTO, Exoffice Technologies, Inc.                        www.exolab.org

Upcoming Re: Review: Serializer API

Posted by Arkin <ar...@exoffice.com>.

Sorry for not being available lately, we have two major releases
scheduled for the O'Reilly conference. No time to breathe.

The following is scheduled for release RSN:

The serializers have been revised to include some bug fixes, performance
improvement and preliminary support for encodings. They have also been
brought up to speed with the proposed API.

A WMLSerializer will be introduced along with a WML DOM (contributed by
David Li).

Minor bug fixes to the HTML DOM, and a version of the HTML parser for
testing purposes.

arkin

-- 
----------------------------------------------------------------------
Assaf Arkin                                           www.exoffice.com
CTO, Exoffice Technologies, Inc.                        www.exolab.org

Re: Review: Serializer API

Posted by Wong Kok Wai <wo...@pacific.net.sg>.

Here's a good tutorial on how to use the serialiser:
http://metalab.unc.edu/xml/slides/sd2000west/xmlandjava/185.html

Boris Garbuzov wrote:

> Even if I misuse the API (I should not call this directly?) it should have failed friendlier than this:
>

Re: Review: Serializer API

Posted by Arkin <ar...@exoffice.com>.

I could not understand: were you just sending an endElement without a
startElement?

(I can't check the line number right now, I have a newer copy on my
machine that hopefully fixes this bug.)

arkin

Boris Garbuzov wrote:
> 
>     String unexistingName = "unexistingName";
>     documentHandler.endElement (unexistingName);
> 
> Even if I misuse the API (I should not call this directly?) it should have failed friendlier than this:
> 
> java.lang.NullPointerException:
>  at
> org.apache.xml.serialize.XMLSerializer.endElement(XMLSerializer.java:307)
> 
>  at
> org.apache.xml.serialize.XMLSerializer.endElement(XMLSerializer.java:421)
> 
>  at
> com.keystrokenet.loanproduct.xml.test.Lab.executeTestBody(Lab.java:442)
> 
> 

-- 
----------------------------------------------------------------------
Assaf Arkin                                           www.exoffice.com
CTO, Exoffice Technologies, Inc.                        www.exolab.org

Re: Review: Serializer API

Posted by Boris Garbuzov <bo...@keystrokenet.com>.

    String unexistingName = "unexistingName";
    documentHandler.endElement (unexistingName);

Even if I misuse the API (I should not call this directly?) it should have failed friendlier than this:

java.lang.NullPointerException:
 at
org.apache.xml.serialize.XMLSerializer.endElement(XMLSerializer.java:307)
 at
org.apache.xml.serialize.XMLSerializer.endElement(XMLSerializer.java:421)
 at com.keystrokenet.loanproduct.xml.test.Lab.executeTestBody(Lab.java:442)

Re: Review: Serializer API

Posted by Wong Kok Wai <wo...@pacific.net.sg>.

Andy Clark wrote:

>
> Method: I don't see a need for this class. If all it's doing
> is holding string constants, then I would say get rid of it
> completely. Otherwise, make it a Java "enumeration", like so:
>
>

I had exactly the same thoughts but didn't articulate them well in my
last email. In addition, the current design makes Method a final class,
which does not allow for future extensions.

>
> But I think that we could do without it altogether and just
> make it possible to register new methods with the serializer
> factory. But I'll get to that in a minute.
>
> And the type of the method could be the mime type which would
> avoid the need of a set/getMediaType on the OutputFormat object.
> And if this thing is really representing the mime type, perhaps
> it should be called such instead of "Method". It would tie in
> better with existing standards.

I feel "MimeType" is a better choice. "Media" would include "screen",
"braille", "aural" etc. from the CSS2 specs. In this case, serialisation
only supports "print".

Re: Review: Serializer API

Posted by Andy Clark <an...@apache.org>.

First, I'd like to look at what's currently in the API and then
discuss some points of design that I'd like to see in the
serializers.

DOMSerializer: I'm sort of surprised that there are methods to
serialize a Document, Element, and DocumentFragment but nothing
for a generic Node. In fact, if you wanted to serialize a text
node or entity reference, you would first have to remove or
clone it into a DocumentFragment and serialize that. And it is
impossible to serialize things like attributes outside of their
container elements. Would it be enough to have the following
method?

  public void serialize(Node node) throws IOException;
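
A minimal sketch of how a single serialize(Node) entry point could
dispatch on node type. The class name NodeWriterSketch and its escaping
rules are illustrative assumptions, not part of the proposed API; only
the org.w3c.dom calls are real.

```java
import java.io.IOException;
import java.io.Writer;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical helper, not part of the proposed org.xml.serialize API.
final class NodeWriterSketch {

    // Serializes any supported DOM node, dispatching on its node type.
    static void serialize(Node node, Writer out) throws IOException {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
            case Node.DOCUMENT_FRAGMENT_NODE:
                writeChildren(node, out);
                break;
            case Node.ELEMENT_NODE: {
                out.write("<" + node.getNodeName());
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    out.write(" ");
                    serialize(attrs.item(i), out);
                }
                if (node.hasChildNodes()) {
                    out.write(">");
                    writeChildren(node, out);
                    out.write("</" + node.getNodeName() + ">");
                } else {
                    out.write("/>");
                }
                break;
            }
            case Node.ATTRIBUTE_NODE:
                // An attribute can be written on its own, outside its element.
                out.write(node.getNodeName() + "=\"" + escape(node.getNodeValue()) + "\"");
                break;
            case Node.TEXT_NODE:
                out.write(escape(node.getNodeValue()));
                break;
            case Node.ENTITY_REFERENCE_NODE:
                out.write("&" + node.getNodeName() + ";");
                break;
            default:
                // Comments, CDATA sections, etc. are left out of this sketch.
                throw new IOException("Unsupported node type: " + node.getNodeType());
        }
    }

    private static void writeChildren(Node parent, Writer out) throws IOException {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            serialize(c, out);
        }
    }

    // Escapes the characters that are unsafe in text and attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

With something along these lines, a text node or an attribute could be
passed directly without first wrapping it in a DocumentFragment.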

Method: I don't see a need for this class. If all it's doing
is holding string constants, then I would say get rid of it
completely. Otherwise, make it a Java "enumeration", like so:

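  import java.io.Serializable;

  public class Method implements Serializable {

    // Constants
    public static final Method XML = new Method("text/xml");
    public static final Method HTML = new Method("text/html");
    public static final Method Text = new Method("text/plain");

    // Data
    private final String type;

    // Constructors
    protected Method(String type) { this.type = type; }

    // Object methods: compare by type string, since a deserialized
    // copy is not the same object as the constants above
    public boolean equals(Object obj) {
      return obj instanceof Method && type.equals(((Method)obj).type);
    }
    public int hashCode() { return type.hashCode(); }
    public String toString() { return type; }
  }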

But I think that we could do without it altogether and just
make it possible to register new methods with the serializer
factory. But I'll get to that in a minute.

And the type of the method could be the mime type which would
avoid the need of a set/getMediaType on the OutputFormat object.
And if this thing is really representing the mime type, perhaps
it should be called such instead of "Method". It would tie in
better with existing standards.

OutputFormat: It seems like a good idea to have a kind of 
properties object like OutputFormat. But it seems that the
OutputFormat (and in fact the whole serializer API) is based
on serializing to a text markup syntax. This sort of jumps
the gun on what I'd like to say in general about the 
serialization API so I won't go any further at this point.
Check out my comments below regarding this matter.

Serializer: I noticed that this design makes use of the SAX
interfaces but not of the traversal APIs added with DOM Level
2. Is there a way that we could leverage those interfaces?

SerializerFactory: There's no way to dynamically register
OutputMethods or Serializers. I think that there should be
a way to do this.
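
A rough sketch of what dynamic registration might look like, keyed by
method name. SketchSerializer and SerializerRegistrySketch are
hypothetical names invented for illustration and do not appear in the
draft API.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in for the Serializer interface under review.
interface SketchSerializer {
    String method();
}

// Hypothetical registry; not part of the draft SerializerFactory.
final class SerializerRegistrySketch {

    private final Map<String, Supplier<SketchSerializer>> factories =
            new ConcurrentHashMap<>();

    // Registers (or replaces) the factory used for the given method name.
    void register(String method, Supplier<SketchSerializer> factory) {
        factories.put(key(method), factory);
    }

    boolean isRegistered(String method) {
        return factories.containsKey(key(method));
    }

    // Creates a new serializer for the method, or fails with a clear message.
    SketchSerializer create(String method) {
        Supplier<SketchSerializer> factory = factories.get(key(method));
        if (factory == null) {
            throw new IllegalArgumentException(
                    "No serializer registered for method: " + method);
        }
        return factory.get();
    }

    // Method names are treated as case-insensitive in this sketch.
    private static String key(String method) {
        return method.toLowerCase(Locale.ROOT);
    }
}
```

A third party could then call register("wbxml", ...) once at startup
and obtain instances by name without changes to the factory itself.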

And overall, I'm not sure if we'd be allowed to drop stuff
into the org.xml package namespace. Arkin: have you checked
on this? And will any of this be superseded by DOM Level 3,
at least on the DOM serialization side? Perhaps
Arnaud or someone else on the W3C committee can shed light
on this.

Okay, now I'd like to make a few comments about what I'd
like to see in a serialization API. First, I don't strictly
see serialization as an output to some text markup. As
such, I would like a split between binary and character
serializers. Currently, there are both setOutputStream() 
and setWriter() methods on the Serializer objects. If
possible, I'd like setOutputStream() to be only on binary
serializers and setWriter() only on character serializers.
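
To illustrate the split, here is a small sketch with two separate
interfaces plus an adapter for the common case of sending character
output to a byte stream in a given encoding. All three type names are
assumptions made up for this example; only the java.io classes are
real.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import java.io.Writer;

// Hypothetical: serializers that produce text markup (XML, HTML, XHTML).
interface CharacterSerializerSketch {
    void setWriter(Writer writer);
}

// Hypothetical: serializers that produce raw bytes (e.g. WBXML, JPEG).
interface BinarySerializerSketch {
    void setOutputStream(OutputStream stream);
}

// Hypothetical helper bridging a character serializer to a byte stream.
final class SerializerAdapters {

    // Wraps the stream in a Writer using the named encoding, so the
    // character serializer never has to deal with bytes itself.
    static void attach(CharacterSerializerSketch serializer,
                       OutputStream stream,
                       String encoding) throws UnsupportedEncodingException {
        serializer.setWriter(new OutputStreamWriter(stream, encoding));
    }
}
```

The encoding argument here is where a charset taken from a content
type could be plugged in.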

All of the current serializer implementations (XML, HTML, 
XHTML) would be character serializers and the OutputFormat 
object seems to go very well with this. On the binary side,
however, I can see a situation where SVG gets serialized to 
a JPEG image. I realize that this overlaps XSL Formatting 
Objects, though. Perhaps a better example would be an XML 
serializer that outputs to WBXML.

Is anyone else thinking along these lines?

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org