You are viewing a plain text version of this content. The canonical link for it is here.
Posted to doxia-dev@maven.apache.org by Hervé BOUTEMY <he...@free.fr> on 2008/11/07 13:45:00 UTC

(output) encoding support in doxia-sink-api

Hi folks,

For the next 1.0-beta-1 version, 2 methods have been added to SinkFactory 
interface to improve encoding support:
- Sink createSink( File outputDir, String outputName, String encoding )
- Sink createSink( Writer writer )
See [1] for the full interface.

I worked with Vincent to implement output encoding in Doxia, and we faced 
problems that lead us think that forcing a fixed encoding was the right 
approach to have something simple and reliable: with UTF-8 as this fixed 
encoding, this didn't limit end-users from any country in the world.

But now, I'm convinced it's not the right approach and API:
1. some formats need to output the encoding (like HTML, or XML): we need an 
encoding parameter, as we can't get it from a Writer instance
2. some formats embed images, like RTF or PDF, then need direct stream access 
to write binary data

Then I think "Sink createSink( Writer writer )" should be removed.
Or if we want an API without filename, this method could be transformed into 
Sink createSink( OutputStream output ) + Sink createSink( OutputStream 
output, String encoding ).

Any objection or idea?

Hervé

[1] 
http://svn.apache.org/viewvc/maven/doxia/doxia/trunk/doxia-sink-api/src/main/java/org/apache/maven/doxia/sink/SinkFactory.java?view=markup

Re: (output) encoding support in doxia-sink-api

Posted by Hervé BOUTEMY <he...@free.fr>.
Le vendredi 07 novembre 2008, Benjamin Bentmann a écrit :
> It might however be convenient to create an AbstractTextSinkFactory from
> which all/most text-based sinks could inherit.
Seems a good idea.
This would ensure too that every text-based sinks have the same default 
encoding: it gives some consistency. Is it too limitating? I don't think.

> > Or if we want an API without filename, this method could be transformed
> > into Sink createSink( OutputStream output ) + Sink createSink(
> > OutputStream output, String encoding ).
>
> Unless we actually have need for this (testing with in-memory streams?),
if we want to generate and zip or transfer on-the-fly, for example. It's only 
an general idea: I don't really have any plans. I was just guessing that if a 
method with a Writer was added, some had something in mind.

> I would keep the interface slim to ease its implementation.
ok with that

> Another question might be: Does the relation
>    1 sink -> 1 output file/stream
> always hold or are there formats that might need to output multiple
> files? In the later case, createSink( OutputStream ) would be troublesome.
AFAIK, we always have 1 sink -> 1 file/stream
but I don't have sufficient experience with Doxia use-cases to affirm that.

>
>
> Benjamin
thanks for your comments

Hervé


Re: (output) encoding support in doxia-sink-api

Posted by Vincent Siveton <vi...@gmail.com>.
Hi Benjamin

2008/11/7 Benjamin Bentmann <be...@udo.edu>:

[SNIP]

> It might however be convenient to create an AbstractTextSinkFactory from
> which all/most text-based sinks could inherit. For instance,
> XhtmlSinkFactory and XdocSinkFactory look pretty much the same. In more
> detail, how about

+1 too

Vincent

Re: (output) encoding support in doxia-sink-api

Posted by Vincent Siveton <vi...@gmail.com>.
> Another question might be: Does the relation
>  1 sink -> 1 output file/stream
> always hold or are there formats that might need to output multiple files?
> In the later case, createSink( OutputStream ) would be troublesome.

The relation should persist.

Cheers,

Vincent

Re: (output) encoding support in doxia-sink-api

Posted by Benjamin Bentmann <be...@udo.edu>.
Hervé BOUTEMY wrote:

> For the next 1.0-beta-1 version, 2 methods have been added to SinkFactory 
> interface to improve encoding support:
> - Sink createSink( File outputDir, String outputName, String encoding )

Sounds good. This would enable the caller to configure the desired 
output encoding for text-based sinks like XHTML.

> - Sink createSink( Writer writer )
> [...]
> 2. some formats embed images, like RTF or PDF, then need direct stream access 
> to write binary data
> 
> Then I think "Sink createSink( Writer writer )" should be removed.

Makes sense, too. One can setup a writer on top of a stream but the 
reverse is not possible. So it's sensible to not limit the API to 
writers and support binary-based sinks.

It might however be convenient to create an AbstractTextSinkFactory from 
which all/most text-based sinks could inherit. For instance, 
XhtmlSinkFactory and XdocSinkFactory look pretty much the same. In more 
detail, how about

   public abstract class AbstractTextSinkFactory
   {
     /**
       * @param writer The writer for the sink output, never 
<code>null</code>.
       * @param encoding The character encoding used by the writer.
       */
     protected abstract Sink createSink( Writer writer, String encoding )
       throws IOException;

     public Sink createSink( File outputDir, String outputName )
     {
       return createSink( outputDir, outputName, "UTF-8" );
     }

     public Sink createSink( File outputDir, String outputName, String 
encoding )
     {
       // check args, create out dir, yadayada
       Writer writer = WriterFactory.newWriter( new File( outputDir, 
outputName ), encoding );
       return createSink( writer, encoding );
     }
   }

Based on this, subclasses could be trimmed down to

   public class XHtmlSinkFactory
   {
     protected abstract Sink createSink( Writer writer, String encoding )
     {
       return new XhtmlSink( writer, encoding );
     }
   }

The encoding parameter passed in here is apparently only informative. It 
would merely allow the sinks to determine the encoding, e.g. for the XML 
declaration.

> Or if we want an API without filename, this method could be transformed into 
> Sink createSink( OutputStream output ) + Sink createSink( OutputStream 
> output, String encoding ).

Unless we actually have need for this (testing with in-memory streams?), 
I would keep the interface slim to ease its implementation.

Another question might be: Does the relation
   1 sink -> 1 output file/stream
always hold or are there formats that might need to output multiple 
files? In the later case, createSink( OutputStream ) would be troublesome.


Benjamin

Re: (output) encoding support in doxia-sink-api

Posted by Vincent Siveton <vi...@gmail.com>.
Other comments which maybe related to this thread:

* Parser needs at least 2 parsing: one for macro and another one for
processing, and now a third one to validate XML (DOXIA-263)
* Sink uses some time StringWriter to play with the writed content and
to create a valid content (DOXIA-177)

So, maybe it will more easy to have directly String in input/output
instead of stream.

Cheers,

Vincent


2008/11/7 Vincent Siveton <vi...@gmail.com>:
> Hi,
>
> 2008/11/7 Hervé BOUTEMY <he...@free.fr>:
>> Hi folks,
>>
>> For the next 1.0-beta-1 version, 2 methods have been added to SinkFactory
>> interface to improve encoding support:
>> - Sink createSink( File outputDir, String outputName, String encoding )
>> - Sink createSink( Writer writer )
>> See [1] for the full interface.
>
> +1
>
>>
>> I worked with Vincent to implement output encoding in Doxia, and we faced
>> problems that lead us think that forcing a fixed encoding was the right
>> approach to have something simple and reliable: with UTF-8 as this fixed
>> encoding, this didn't limit end-users from any country in the world.
>>
>> But now, I'm convinced it's not the right approach and API:
>> 1. some formats need to output the encoding (like HTML, or XML): we need an
>> encoding parameter, as we can't get it from a Writer instance
>
> see also DOXIA-185
>
>> 2. some formats embed images, like RTF or PDF, then need direct stream access
>> to write binary data
>>
>> Then I think "Sink createSink( Writer writer )" should be removed.
>
> I prefer deprecated for backward compatibility issue.
>
>> Or if we want an API without filename, this method could be transformed into
>> Sink createSink( OutputStream output ) + Sink createSink( OutputStream
>> output, String encoding ).
>
> +1
>
> Thanks Hervé for that.
>
> Vincent
>
>> Any objection or idea?
>>
>> Hervé
>>
>> [1]
>> http://svn.apache.org/viewvc/maven/doxia/doxia/trunk/doxia-sink-api/src/main/java/org/apache/maven/doxia/sink/SinkFactory.java?view=markup
>>
>

Re: (output) encoding support in doxia-sink-api

Posted by Hervé BOUTEMY <he...@free.fr>.
> > Then I think "Sink createSink( Writer writer )" should be removed.
>
> I prefer deprecated for backward compatibility issue.

this method was added in 1.0-beta-1-SNAPSHOT: 1.0-beta-1 hasn't been released 
yet. Is there really any backward compatibility issue here?


Re: (output) encoding support in doxia-sink-api

Posted by Vincent Siveton <vi...@gmail.com>.
2008/11/7 Benjamin Bentmann <be...@udo.edu>:
> Vincent Siveton wrote:
>
>>> Then I think "Sink createSink( Writer writer )" should be removed.
>>
>> I prefer deprecated for backward compatibility issue.
>
> The source code says "@since 1.0-beta-1", i.e. this method was never part of
> a released Doxia version, right? Then we should be able to safely removed
> it. Having both "@since 1.0-beta-1" and "@deprecated since 1.0-beta-1" on a
> method would be quite strange ;-)

mmh seems logical  :)

Vincent

>
> Benjamin
>

Re: (output) encoding support in doxia-sink-api

Posted by Benjamin Bentmann <be...@udo.edu>.
Vincent Siveton wrote:

>> Then I think "Sink createSink( Writer writer )" should be removed.
> 
> I prefer deprecated for backward compatibility issue.

The source code says "@since 1.0-beta-1", i.e. this method was never 
part of a released Doxia version, right? Then we should be able to 
safely removed it. Having both "@since 1.0-beta-1" and "@deprecated 
since 1.0-beta-1" on a method would be quite strange ;-)


Benjamin

Re: (output) encoding support in doxia-sink-api

Posted by Vincent Siveton <vi...@gmail.com>.
Hi,

2008/11/7 Hervé BOUTEMY <he...@free.fr>:
> Hi folks,
>
> For the next 1.0-beta-1 version, 2 methods have been added to SinkFactory
> interface to improve encoding support:
> - Sink createSink( File outputDir, String outputName, String encoding )
> - Sink createSink( Writer writer )
> See [1] for the full interface.

+1

>
> I worked with Vincent to implement output encoding in Doxia, and we faced
> problems that lead us think that forcing a fixed encoding was the right
> approach to have something simple and reliable: with UTF-8 as this fixed
> encoding, this didn't limit end-users from any country in the world.
>
> But now, I'm convinced it's not the right approach and API:
> 1. some formats need to output the encoding (like HTML, or XML): we need an
> encoding parameter, as we can't get it from a Writer instance

see also DOXIA-185

> 2. some formats embed images, like RTF or PDF, then need direct stream access
> to write binary data
>
> Then I think "Sink createSink( Writer writer )" should be removed.

I prefer deprecated for backward compatibility issue.

> Or if we want an API without filename, this method could be transformed into
> Sink createSink( OutputStream output ) + Sink createSink( OutputStream
> output, String encoding ).

+1

Thanks Hervé for that.

Vincent

> Any objection or idea?
>
> Hervé
>
> [1]
> http://svn.apache.org/viewvc/maven/doxia/doxia/trunk/doxia-sink-api/src/main/java/org/apache/maven/doxia/sink/SinkFactory.java?view=markup
>