You are viewing a plain text version of this content. The canonical link for it is here.
Posted to doxia-dev@maven.apache.org by Hervé BOUTEMY <he...@free.fr> on 2008/11/07 13:45:00 UTC
(output) encoding support in doxia-sink-api
Hi folks,
For the next 1.0-beta-1 version, 2 methods have been added to SinkFactory
interface to improve encoding support:
- Sink createSink( File outputDir, String outputName, String encoding )
- Sink createSink( Writer writer )
See [1] for the full interface.
I worked with Vincent to implement output encoding in Doxia, and we faced
problems that lead us think that forcing a fixed encoding was the right
approach to have something simple and reliable: with UTF-8 as this fixed
encoding, this didn't limit end-users from any country in the world.
But now, I'm convinced it's not the right approach and API:
1. some formats need to output the encoding (like HTML, or XML): we need an
encoding parameter, as we can't get it from a Writer instance
2. some formats embed images, like RTF or PDF, then need direct stream access
to write binary data
Then I think "Sink createSink( Writer writer )" should be removed.
Or if we want an API without filename, this method could be transformed into
Sink createSink( OutputStream output ) + Sink createSink( OutputStream
output, String encoding ).
Any objection or idea?
Hervé
[1]
http://svn.apache.org/viewvc/maven/doxia/doxia/trunk/doxia-sink-api/src/main/java/org/apache/maven/doxia/sink/SinkFactory.java?view=markup
Re: (output) encoding support in doxia-sink-api
Posted by Hervé BOUTEMY <he...@free.fr>.
Le vendredi 07 novembre 2008, Benjamin Bentmann a écrit :
> It might however be convenient to create an AbstractTextSinkFactory from
> which all/most text-based sinks could inherit.
Seems a good idea.
This would ensure too that every text-based sinks have the same default
encoding: it gives some consistency. Is it too limitating? I don't think.
> > Or if we want an API without filename, this method could be transformed
> > into Sink createSink( OutputStream output ) + Sink createSink(
> > OutputStream output, String encoding ).
>
> Unless we actually have need for this (testing with in-memory streams?),
if we want to generate and zip or transfer on-the-fly, for example. It's only
an general idea: I don't really have any plans. I was just guessing that if a
method with a Writer was added, some had something in mind.
> I would keep the interface slim to ease its implementation.
ok with that
> Another question might be: Does the relation
> 1 sink -> 1 output file/stream
> always hold or are there formats that might need to output multiple
> files? In the later case, createSink( OutputStream ) would be troublesome.
AFAIK, we always have 1 sink -> 1 file/stream
but I don't have sufficient experience with Doxia use-cases to affirm that.
>
>
> Benjamin
thanks for your comments
Hervé
Re: (output) encoding support in doxia-sink-api
Posted by Vincent Siveton <vi...@gmail.com>.
Hi Benjamin
2008/11/7 Benjamin Bentmann <be...@udo.edu>:
[SNIP]
> It might however be convenient to create an AbstractTextSinkFactory from
> which all/most text-based sinks could inherit. For instance,
> XhtmlSinkFactory and XdocSinkFactory look pretty much the same. In more
> detail, how about
+1 too
Vincent
Re: (output) encoding support in doxia-sink-api
Posted by Vincent Siveton <vi...@gmail.com>.
> Another question might be: Does the relation
> 1 sink -> 1 output file/stream
> always hold or are there formats that might need to output multiple files?
> In the later case, createSink( OutputStream ) would be troublesome.
The relation should persist.
Cheers,
Vincent
Re: (output) encoding support in doxia-sink-api
Posted by Benjamin Bentmann <be...@udo.edu>.
Hervé BOUTEMY wrote:
> For the next 1.0-beta-1 version, 2 methods have been added to SinkFactory
> interface to improve encoding support:
> - Sink createSink( File outputDir, String outputName, String encoding )
Sounds good. This would enable the caller to configure the desired
output encoding for text-based sinks like XHTML.
> - Sink createSink( Writer writer )
> [...]
> 2. some formats embed images, like RTF or PDF, then need direct stream access
> to write binary data
>
> Then I think "Sink createSink( Writer writer )" should be removed.
Makes sense, too. One can setup a writer on top of a stream but the
reverse is not possible. So it's sensible to not limit the API to
writers and support binary-based sinks.
It might however be convenient to create an AbstractTextSinkFactory from
which all/most text-based sinks could inherit. For instance,
XhtmlSinkFactory and XdocSinkFactory look pretty much the same. In more
detail, how about
public abstract class AbstractTextSinkFactory
{
/**
* @param writer The writer for the sink output, never
<code>null</code>.
* @param encoding The character encoding used by the writer.
*/
protected abstract Sink createSink( Writer writer, String encoding )
throws IOException;
public Sink createSink( File outputDir, String outputName )
{
return createSink( outputDir, outputName, "UTF-8" );
}
public Sink createSink( File outputDir, String outputName, String
encoding )
{
// check args, create out dir, yadayada
Writer writer = WriterFactory.newWriter( new File( outputDir,
outputName ), encoding );
return createSink( writer, encoding );
}
}
Based on this, subclasses could be trimmed down to
public class XHtmlSinkFactory
{
protected abstract Sink createSink( Writer writer, String encoding )
{
return new XhtmlSink( writer, encoding );
}
}
The encoding parameter passed in here is apparently only informative. It
would merely allow the sinks to determine the encoding, e.g. for the XML
declaration.
> Or if we want an API without filename, this method could be transformed into
> Sink createSink( OutputStream output ) + Sink createSink( OutputStream
> output, String encoding ).
Unless we actually have need for this (testing with in-memory streams?),
I would keep the interface slim to ease its implementation.
Another question might be: Does the relation
1 sink -> 1 output file/stream
always hold or are there formats that might need to output multiple
files? In the later case, createSink( OutputStream ) would be troublesome.
Benjamin
Re: (output) encoding support in doxia-sink-api
Posted by Vincent Siveton <vi...@gmail.com>.
Other comments which maybe related to this thread:
* Parser needs at least 2 parsing: one for macro and another one for
processing, and now a third one to validate XML (DOXIA-263)
* Sink uses some time StringWriter to play with the writed content and
to create a valid content (DOXIA-177)
So, maybe it will more easy to have directly String in input/output
instead of stream.
Cheers,
Vincent
2008/11/7 Vincent Siveton <vi...@gmail.com>:
> Hi,
>
> 2008/11/7 Hervé BOUTEMY <he...@free.fr>:
>> Hi folks,
>>
>> For the next 1.0-beta-1 version, 2 methods have been added to SinkFactory
>> interface to improve encoding support:
>> - Sink createSink( File outputDir, String outputName, String encoding )
>> - Sink createSink( Writer writer )
>> See [1] for the full interface.
>
> +1
>
>>
>> I worked with Vincent to implement output encoding in Doxia, and we faced
>> problems that lead us think that forcing a fixed encoding was the right
>> approach to have something simple and reliable: with UTF-8 as this fixed
>> encoding, this didn't limit end-users from any country in the world.
>>
>> But now, I'm convinced it's not the right approach and API:
>> 1. some formats need to output the encoding (like HTML, or XML): we need an
>> encoding parameter, as we can't get it from a Writer instance
>
> see also DOXIA-185
>
>> 2. some formats embed images, like RTF or PDF, then need direct stream access
>> to write binary data
>>
>> Then I think "Sink createSink( Writer writer )" should be removed.
>
> I prefer deprecated for backward compatibility issue.
>
>> Or if we want an API without filename, this method could be transformed into
>> Sink createSink( OutputStream output ) + Sink createSink( OutputStream
>> output, String encoding ).
>
> +1
>
> Thanks Hervé for that.
>
> Vincent
>
>> Any objection or idea?
>>
>> Hervé
>>
>> [1]
>> http://svn.apache.org/viewvc/maven/doxia/doxia/trunk/doxia-sink-api/src/main/java/org/apache/maven/doxia/sink/SinkFactory.java?view=markup
>>
>
Re: (output) encoding support in doxia-sink-api
Posted by Hervé BOUTEMY <he...@free.fr>.
> > Then I think "Sink createSink( Writer writer )" should be removed.
>
> I prefer deprecated for backward compatibility issue.
this method was added in 1.0-beta-1-SNAPSHOT: 1.0-beta-1 hasn't been released
yet. Is there really any backward compatibility issue here?
Re: (output) encoding support in doxia-sink-api
Posted by Vincent Siveton <vi...@gmail.com>.
2008/11/7 Benjamin Bentmann <be...@udo.edu>:
> Vincent Siveton wrote:
>
>>> Then I think "Sink createSink( Writer writer )" should be removed.
>>
>> I prefer deprecated for backward compatibility issue.
>
> The source code says "@since 1.0-beta-1", i.e. this method was never part of
> a released Doxia version, right? Then we should be able to safely removed
> it. Having both "@since 1.0-beta-1" and "@deprecated since 1.0-beta-1" on a
> method would be quite strange ;-)
mmh seems logical :)
Vincent
>
> Benjamin
>
Re: (output) encoding support in doxia-sink-api
Posted by Benjamin Bentmann <be...@udo.edu>.
Vincent Siveton wrote:
>> Then I think "Sink createSink( Writer writer )" should be removed.
>
> I prefer deprecated for backward compatibility issue.
The source code says "@since 1.0-beta-1", i.e. this method was never
part of a released Doxia version, right? Then we should be able to
safely removed it. Having both "@since 1.0-beta-1" and "@deprecated
since 1.0-beta-1" on a method would be quite strange ;-)
Benjamin
Re: (output) encoding support in doxia-sink-api
Posted by Vincent Siveton <vi...@gmail.com>.
Hi,
2008/11/7 Hervé BOUTEMY <he...@free.fr>:
> Hi folks,
>
> For the next 1.0-beta-1 version, 2 methods have been added to SinkFactory
> interface to improve encoding support:
> - Sink createSink( File outputDir, String outputName, String encoding )
> - Sink createSink( Writer writer )
> See [1] for the full interface.
+1
>
> I worked with Vincent to implement output encoding in Doxia, and we faced
> problems that lead us think that forcing a fixed encoding was the right
> approach to have something simple and reliable: with UTF-8 as this fixed
> encoding, this didn't limit end-users from any country in the world.
>
> But now, I'm convinced it's not the right approach and API:
> 1. some formats need to output the encoding (like HTML, or XML): we need an
> encoding parameter, as we can't get it from a Writer instance
see also DOXIA-185
> 2. some formats embed images, like RTF or PDF, then need direct stream access
> to write binary data
>
> Then I think "Sink createSink( Writer writer )" should be removed.
I prefer deprecated for backward compatibility issue.
> Or if we want an API without filename, this method could be transformed into
> Sink createSink( OutputStream output ) + Sink createSink( OutputStream
> output, String encoding ).
+1
Thanks Hervé for that.
Vincent
> Any objection or idea?
>
> Hervé
>
> [1]
> http://svn.apache.org/viewvc/maven/doxia/doxia/trunk/doxia-sink-api/src/main/java/org/apache/maven/doxia/sink/SinkFactory.java?view=markup
>