Posted to dev@httpd.apache.org by Maxime Petazzoni <ma...@bulix.org> on 2005/07/03 04:46:56 UTC

mod_mbox development plan

Hi,

As you probably already know, a couple of students will be working on
ASF projects during the summer within Google's Summer Of Code
program. I have the chance to be one of the few selected for this
adventure, and I will be working on improving mod_mbox's interface.

Members of the mod_mbox development mailing list (mbox-dev@httpd) have
recently decided to shut down the list and fall back to this one in
order to have a larger audience for comments and ideas. Today, I would
like to take advantage of this new audience and write an RFC for what
we could call a "development plan" for mod_mbox.

First, let's do a quick state of the art. The last version of mod_mbox
is httpd-mbox 0.1, released July 17, 2001 (according to the project's
STATUS file). It's currently set up and serving ASF mailing list
archives at http://mail-archives.apache.org/mod_mbox/ . mod_mbox only
serves archives for *one* mailing list : the "lists list" page is not
auto-generated.

The output generated by mod_mbox is very simple : basic XHTML 1.0
(non-validating), no CSS stylesheet. It's neither user-friendly nor
very usable.

The main goal of my SoC project is to enhance mod_mbox's interface by
using newer web development techniques and/or technologies, while
avoiding any noticeable slowdowns.

The first thing that has to be done is, of course, get rid of the hard
coded HTML and switch to something more flexible. Two main solutions
are available :

 - a template system such as ClearSilver
 - XML + XSLT

Since I already have a good knowledge of XML and XSLT, I chose these
technologies for mod_mbox's output. Making the necessary changes to
mod_mbox was a good introduction to its source code, and I already
sent the resulting patch (also featuring email obfuscation).
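
The email obfuscation mentioned above could look roughly like the
following. This is only a sketch in Python, not the code from the
patch: the function name obfuscate_email and the keep parameter are
hypothetical, and the output style simply mirrors the truncated
addresses visible in the archive headers.

```python
def obfuscate_email(address: str, keep: int = 2) -> str:
    """Hide most of the local part of an address, keeping the domain.

    Hypothetical helper: the real patch may obfuscate differently.
    """
    local, sep, domain = address.partition("@")
    if not sep:
        # Not of the form local@domain; leave the string untouched.
        return address
    # Keep only the first `keep` characters of the local part.
    return local[:keep] + "..." + "@" + domain
```

The domain is left readable on purpose, so a human can still guess who
wrote a message while address harvesters get an unusable string.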

The DTD is not yet written for mod_mbox's XML output format because
what has currently been done may need changes (I'm far from being an
expert in XML data semantics).
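
To make the idea of an XML output format concrete, here is a minimal
Python sketch that builds a well-formed message index. The element and
attribute names (mbox, message, from, subject, id, list) are
assumptions made for illustration; they are not the format produced by
the patch.

```python
import xml.etree.ElementTree as ET


def build_index(list_name, messages):
    """Serialize a list of message dicts as a small XML document."""
    root = ET.Element("mbox", {"list": list_name})
    for msg in messages:
        item = ET.SubElement(root, "message", {"id": msg["id"]})
        # ElementTree escapes &, < and > in text, so the result
        # stays well-formed whatever the subject line contains.
        ET.SubElement(item, "from").text = msg["from"]
        ET.SubElement(item, "subject").text = msg["subject"]
    return ET.tostring(root, encoding="unicode")
```

Building the tree through a library rather than by string concatenation
is what guarantees well-formedness here, with or without a DTD.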

XML output brings about another question : where should the XSL
transformation be done ? Two solutions are available :

 - server side processing with mod_transform. No slowdown thanks to
the XSLT caching feature. This solution has two main drawbacks :

   * The client receives HTML code instead of XML, which will make
additional features such as dynamic interface (with AJAX) difficult or
impossible to implement (since we don't know what the DOM tree will
look like after the XSL transformation, we can't implement DOM dynamic
updates)

   * mod_transform is not part of the HTTPd project yet. It needs to
be compiled, installed and set up separately. Using server side
processing would bring a dependency to an "unofficial" 3rd party
module.

 - client side processing : just give the XML output and the XSLT to
the client's browser, and let it manage the transformation itself. On
the one hand, this solution allows nice (and wanted) features that
AJAX could provide ; on the other hand, it requires an XSLT capable
browser (Gecko-based).

In the light of the above, I personally prefer the client-side
processing solution. Anyway, I've tested and now know how to set up
both of these solutions.

The XML output is a good start, but it's not enough to make mod_mbox's
interface good enough : we need more interactivity, especially when
browsing mail threads. The AJAX (Asynchronous Javascript and XML)
development technique is an interesting solution to this need. By
providing a quick, neat and dynamic interface, it can make the archive
browsing more user-friendly.

I'm currently making some AJAX experiments (I've never used it before,
but I'm here to learn :) in order to figure out what it can do for
mod_mbox and how. I'm working on an interface mockup that I'll soon
submit for review, too.

There are some other things I'm planning (or willing) to do on
mod_mbox. I've attached my local and up-to-date STATUS file for
additional information.

Connection closed by brain.localdomain. That's all for tonight ! Ideas
and/or comments are welcome.

Regards,
- Maxime Petazzoni

-- 
Maxime Petazzoni (http://www.bulix.org)
 -- gone crazy, back soon. leave message.

Re: mod_mbox development plan

Posted by "Roy T. Fielding" <fi...@gbiv.com>.

On Jul 3, 2005, at 1:29 AM, Paul Querna wrote:
> I believe the core goal is separation of data from presentation.

No, the core goal is to provide an ultra-efficient interface to
large mail archives.  The data is already available in mbox form.

> The standard reply is to use some kind of template language.  A glue.
> Let it be PHP, XSLT, ClearSilver, or any of the other hundreds out there.

XSLT isn't a template language -- it is a transformation engine.

> Why do we want glue?
>
> I believe that XHTML, no matter how masterfully done, is messy.  Its
> not a good method to sort data into logical groupings.  Yes, CSS does allow
> some separation of the presentation, but I do not believe it is
> completely viable for all design desires.
>
> I believe that hand coded (X)HTML inside C source files is wrong.  It
> hurts developer time.  Why should I have to recompile my module, stop
> and start apache, just to change simple things like a font or the order
> of something?

Use CSS.  CSS files are template presentation mechanisms that can
be specialized on a per-site basis regardless of the data sent from
server to browser.  Have a look at WordPress to see how effective
this can be without requiring additional overhead on the server.
They use PHP as a templating mechanism, but most people ignore that
and simply modify the css files to show or hide data as they wish.

> This is where the jump to a specific solution from the goal came.  We
> could debate the possible solutions for months, even write our own
> language, but the simple answer is that none of them are perfect, but I
> believe most people will agree that doing raw XHTML even with masterful
> CSS is painful.

I don't agree.  It is far less painful than doing XML with browser
negotiation and high-overhead transforms on the server-side.  We
are not talking about an arbitrary data interface wherein the admin
needs to configure the placement and names of data fields. Mail
messages have their own naming mechanism and data items can be
selectively configured for delivery using a simple configuration
mechanism.  The "template" mechanism can be limited to header and
footer files without any loss of generality to the user.
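
A rough sketch of that approach, in Python for brevity: the module
emits a fixed XHTML fragment whose fields carry class names, and the
only "template" is a header and a footer supplied by the site. The
function name render_page and the class names are hypothetical.

```python
from html import escape


def render_page(header: str, footer: str, messages) -> str:
    """Wrap a fixed XHTML message list between site header and footer."""
    rows = []
    for msg in messages:
        # Field names go in class attributes, so a stylesheet can
        # restyle or hide them without any change on the server.
        rows.append(
            '<li class="message">'
            f'<span class="from">{escape(msg["from"])}</span> '
            f'<span class="subject">{escape(msg["subject"])}</span>'
            "</li>"
        )
    body = '<ul class="msglist">\n' + "\n".join(rows) + "\n</ul>\n"
    return header + body + footer
```

A site that does not want to show senders could then use a CSS rule
such as ".from { display: none }" instead of a different template.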

> Alternatives welcome. We already have a patch that does XML+XSLT.

Agreed.  Just keep an open mind, try the alternatives, and be prepared
to defend the implementation on the basis of its performance.

> Just because we use XSL for some parts, it doesn't mean that we have to
> apply it to all output.  Data meant for ajax can easily be passed
> through, and not converted to XHTML.

Right, the ajax interface can be a completely separate tree of URIs
that only needs to worry about its particular style of index/threading.

>> +1 to client-side.
>
> Good Luck.  I don't want to spend the summer testing XSL support on 6
> or 7 different browsers/versions.  This is not even counting alternative
> browsers like Lynx, which have no support at all for XSLT on the client
> side.

XSL support?  Forget that -- I was talking about CSS and ajax.
XHTML works on all relevant browsers and looks as good as it gets
on Lynx/links.  We only have to make it look pretty on Firefox,
Safari, and Konqueror.  And there is no problem at all if the ajax
stuff only works on Firefox.

> Doing this kind of thing server-side gets us to (X)HTML. That is a
> better known quantity. I believe there is a better expectation for that
> to work on all browsers.
>
> Regardless of how we get there, I think the browser should be
> downloading XHTML.

Then it doesn't make any sense to use XSLT.  I wouldn't use a
sledge-hammer for tapping nails.

....Roy

Re: mod_mbox development plan

Posted by "Roy T. Fielding" <fi...@gbiv.com>.

On Jul 3, 2005, at 4:18 AM, Maxime Petazzoni wrote:
> +1. XML has the particularity to describe only the data, as opposed to
> XHTML which stores data *and* structure information.

No, XML is a data structuring (mark-up) metalanguage.  The only
difference between arbitrary XML and XHTML is that XHTML elements
have names that are backwards-compatible with HTML presentation
semantics, which provide an appropriate default presentation for
those browsers that do not support CSS.  The field names are placed
in the class attribute instead of the element name.  An XML or CSS
processing engine doesn't care which one is used -- they are
equally expressive and they are both XML.

> XML semantics
> allows us to represent mailing list data with mailing list data
> semantics instead of web page semantics. Given mailing list semantics
> in the problem space, you can then transform to any solution space
> semantics using XSL.

There are no XML semantics -- it is just a structuring language with
a bit of hierarchical containment.  An RDF/XML interface would
provide some added value, but that's even more of a mess.

> According to me, this is the cleaner way of doing things (regarding
> the module's source code).

Why don't you just make the elements table-driven?
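
One possible reading of "table-driven", sketched in Python: the
mapping from message fields to output elements lives in a data table,
so changing the order or the markup means editing the table rather
than the rendering code. FIELD_TABLE and render_fields are hypothetical
names, and a real module would presumably use a C array instead.

```python
from html import escape

# Each row: (message field, XHTML element, class name).
# The order of the rows is the order of the output.
FIELD_TABLE = [
    ("from", "span", "from"),
    ("date", "span", "date"),
    ("subject", "strong", "subject"),
]


def render_fields(msg, table=FIELD_TABLE):
    """Render the fields of one message as driven by the table."""
    parts = []
    for key, element, css_class in table:
        if key not in msg:
            continue  # Skip fields this message does not have.
        parts.append(
            f'<{element} class="{css_class}">{escape(msg[key])}</{element}>'
        )
    return "".join(parts)
```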

>>> Don't bother with the DTD -- just be sure it is well-formed.
>
> Of course I will :) Writing a DTD may only be done when we'll have
> settled on what we output (and how), and when I'll have time to do so
> (or during a boring afternoon, something like this).

DTDs have no useful purpose in XML -- they are not extensible and
do not comprehend namespaces.  If you really want a formalism, then
define a schema (RELAX-NG or XML Schema).

....Roy

Re: mod_mbox development plan

Posted by Maxime Petazzoni <ma...@bulix.org>.

Hi,

> I believe the core goal is separation of data from presentation.  The
> standard reply is to use some kind of template language.  A glue.  Let
> it be PHP, XSLT, ClearSilver, or any of the other hundreds out there.
> 
> Why do we want glue?
> 
> I believe that XHTML, no matter how masterfully done, is messy.  Its not
> a good method to sort data into logical groupings.  Yes, CSS does allow
> some separation of the presentation, but I do not believe it is
> completely viable for all design desires.
> 
> I believe that hand coded (X)HTML inside C source files is wrong.  It
> hurts developer time.  Why should I have to recompile my module, stop
> and start apache, just to change simple things like a font or the order
> of something?

+1. XML has the particularity to describe only the data, as opposed to
XHTML which stores data *and* structure information. XML semantics
allows us to represent mailing list data with mailing list data
semantics instead of web page semantics. Given mailing list semantics
in the problem space, you can then transform to any solution space
semantics using XSL.

In my opinion, this is the cleaner way of doing things (regarding
the module's source code).

> Alternatives welcome. We already have a patch that does XML+XSLT.

Except for the small XML structure change you suggested (removing
year grouping in mailing list month index, and detect it while doing
the XSL transformation), we have the patch.

> > Don't bother with the DTD -- just be sure it is well-formed.

Of course I will :) The DTD can only be written once we have
settled on what we output (and how), and when I have time to do so
(or during a boring afternoon, something like this).

> >>    * The client receives HTML code instead of XML, which will make
> >> additional features such as dynamic interface (with AJAX) difficult or
> >> impossible to implement (since we don't know what the DOM tree will
> >> look like after the XSL transformation, we can't implement DOM dynamic
> >> updates)
> 
> Just because we use XSL for some parts, it doesn't mean that we have to
> apply it to all output.  Data meant for ajax can easily be passed
> through, and not converted to XHTML.

I've made another AJAX experiment to check which DOM our Javascript
functions operate on when we send XML : is it the XML node tree, or
the XHTML tree that results from the XSL transformation ?

You can check it out at
http://skikda.bulix.org/~sam/ajax/exp-2/index.xml . It's the same as
the first one, except that I use an XML document and an XSL stylesheet
instead of direct XHTML.

The result is that Javascript operates on the XHTML tree, that is, on
the output of the XSL transformation. This makes my client-side
processing argument absolutely void :)

> There are two type of caches here.  One is of the parsed XSL File.
> Another is a layer provided by mod_cache.  The first one should have
> an excellent hit rate, while like you mention, mod_cache would not.
> No slow down isn't truthful compared to a static file, but
> mod_transform can be pretty fast when the XSL cache is active.

+1 for the cache explanation. I was of course talking about the XSL
cache provided by mod_transform.

> Doing this kind of thing server-side gets us to (X)HTML. That is a
> better known quantity. I believe there is a better expectation for that
> to work on all browsers.
> 
> Regardless of how we get there, I think the browser should be
> downloading XHTML.

Well, finally, I don't know which solution is best. If mod_transform
becomes part of HTTPd by the end of the summer, then I surely prefer
server-side processing. But if it does not, the 3rd party module
dependency will make me prefer the client-side processing solution.

- Sam

-- 
Maxime Petazzoni (http://www.bulix.org)
 -- gone crazy, back soon. leave message.

Re: mod_mbox development plan

Posted by Paul Querna <ch...@force-elite.com>.

Roy T. Fielding wrote:
> On Jul 2, 2005, at 7:46 PM, Maxime Petazzoni wrote:
> 
>> The main goal of my SoC project is to enhance mod_mbox's interface by
>> using newer web development techniques and/or technologies, while
>> avoiding any noticeable slowdowns.
> 
> 
> That's good, but keep in mind the general design principle to
> avoid doing things on the server that could be done on the client.
> 
>> The first thing that has to be done is, of course, get rid of the hard
>> coded HTML and switch to something more flexible. Two main solutions
>> are available :
>>
>>  - a template system such as ClearSilver
>>  - XML + XSLT
> 
> 
> I guess I am having a hard time bridging that leap from goal
> to a fairly specific solution.  What are the real advantages of
> being flexible on the server?  I mean, as opposed to simply using
> a fixed data format with XHTML and class names?

I believe the core goal is separation of data from presentation.  The
standard reply is to use some kind of template language.  A glue.  Let
it be PHP, XSLT, ClearSilver, or any of the other hundreds out there.

Why do we want glue?

I believe that XHTML, no matter how masterfully done, is messy.  It's not
a good method to sort data into logical groupings.  Yes, CSS does allow
some separation of the presentation, but I do not believe it is
completely viable for all design desires.

I believe that hand coded (X)HTML inside C source files is wrong.  It
hurts developer time.  Why should I have to recompile my module, stop
and start apache, just to change simple things like a font or the order
of something?

This is where the jump to a specific solution from the goal came.  We
could debate the possible solutions for months, even write our own
language, but the simple answer is that none of them are perfect, but I
believe most people will agree that doing raw XHTML even with masterful
CSS is painful.

Alternatives welcome. We already have a patch that does XML+XSLT.

>> Since I already have a good knowledge of XML and XSLT, I chose these
>> technologies for mod_mbox's output. Making the necessary changes to
>> mod_mbox was a good introduction to it's source code, and I already
>> sent the resulting patch (also featuring email obfuscation).
>>
>> The DTD is not yet written for mod_mbox's XML output format because
>> what has currently been done may need changes (I'm far from being an
>> expert in XML data semantics).
> 
> 
> Don't bother with the DTD -- just be sure it is well-formed.
> 
>> XML output brings about another question : where should the XSL
>> transformation be done ? Two solutions are available :
>>
>>  - server side processing with mod_transform. No slowdown thanks to
>> the XSLT caching feature. This solution has two main drawbacks :
>>
>>    * The client receives HTML code instead of XML, which will make
>> additional features such as dynamic interface (with AJAX) difficult or
>> impossible to implement (since we don't know what the DOM tree will
>> look like after the XSL transformation, we can't implement DOM dynamic
>> updates)

Just because we use XSL for some parts, it doesn't mean that we have to
apply it to all output.  Data meant for ajax can easily be passed
through, and not converted to XHTML.

>>
>>    * mod_transform is not part of the HTTPd project yet. It needs to
>> be compiled, installed and setup separately. Using server side
>> processing would bring a dependency to an "unofficial" 3rd party
>> module.

For what it's worth, the holders of the copyright on that module are all
HTTPD committers. I don't believe it would be a significant issue to
relicense it if people felt it was necessary.

> 
> Also, "no slowdown due to caching" is only applicable for request
> patterns that involve frequent duplication of a small set.  It doesn't
> work that way when the archive contains several million messages,
> since the cache hit rate will be too low to compensate for the
> transform cost.

There are two types of caches here.  One is of the parsed XSL file.
Another is a layer provided by mod_cache.  The first one should have an
excellent hit rate while, as you mention, mod_cache would not.  "No
slowdown" isn't accurate compared to a static file, but mod_transform
can be pretty fast when the XSL cache is active.
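
The first kind of cache can be pictured with the Python sketch below.
It is not mod_transform's implementation: StylesheetCache and its
parse and stat arguments are hypothetical, and parse stands in for
whatever compiles an XSL file. The point is only that the stylesheet
is parsed once and reused until the file changes, whatever message is
being requested.

```python
import os


class StylesheetCache:
    """Keep parsed stylesheets in memory, keyed by path and mtime."""

    def __init__(self, parse, stat=os.path.getmtime):
        self._parse = parse    # hypothetical: compiles a stylesheet file
        self._stat = stat      # returns the file's modification time
        self._entries = {}     # path -> (mtime, parsed stylesheet)
        self.hits = 0
        self.misses = 0

    def get(self, path):
        mtime = self._stat(path)
        entry = self._entries.get(path)
        if entry is not None and entry[0] == mtime:
            self.hits += 1
            return entry[1]
        # Missing or stale: parse once and remember the result.
        self.misses += 1
        parsed = self._parse(path)
        self._entries[path] = (mtime, parsed)
        return parsed
```

Since an archive uses a handful of stylesheets for millions of
messages, nearly every request after the first is a hit here, which is
why this cache behaves so differently from a cache of rendered pages.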

>>  - client side processing : just give the XML output and the XSLT to
>> the client's browser, and let it manage the transformation itself. On
>> the one hand, this solution allows nice (and wanted) features that
>> AJAX could provide ; on the other hand, it requires an XSLT capable
>> browser (Gecko-based).
> 
> 
> This is not a problem with XHTML.
> 
>> In the light of the above, I personally prefer the client-side
>> processing solution. Anyway, I've tested and now know how to set up
>> both of these solutions.
> 
> 
> +1 to client-side.

Good Luck.  I don't want to spend the summer testing XSL support on 6 or
7 different browsers/versions.  This is not even counting alternative
browsers like Lynx, which have no support at all for XSLT on the client
side.

Doing this kind of thing server-side gets us to (X)HTML. That is a
better known quantity. I believe there is a better expectation for that
to work on all browsers.

Regardless of how we get there, I think the browser should be
downloading XHTML.

-Paul

Re: mod_mbox development plan

Posted by Nick Kew <ni...@webthing.com>.

Roy T.Fielding wrote:
> On Jul 2, 2005, at 7:46 PM, Maxime Petazzoni wrote:
>
>> The first thing that has to be done is, of course, get rid of the hard
>> coded HTML and switch to something more flexible.

Careful with "getting rid".  I haven't looked at it myself, but I wonder
if cleaning it up and keeping it as an option at least would be
preferable.

>> Two main solutions
>> are available :
>>
>>  - a template system such as ClearSilver
>>  - XML + XSLT
> 
> 
> I guess I am having a hard time bridging that leap from goal
> to a fairly specific solution.  What are the real advantages of
> being flexible on the server?  I mean, as opposed to simply using
> a fixed data format with XHTML and class names?

That could be the basis for a "best of both worlds" approach:

(1) An output format that is basically clean XHTML with hooks for CSS.
(2) Where that is considered inadequate, expand it with your own
    elements from other namespaces such as FoaF and DC, or if
    necessary your own invention.  Try to make them optional!
(3) Now this can be the basis for your XML+XSLT approach.  But it
    leaves server admins with more flexibility than that, and
    those who want to avoid the overhead of XSLT have the option
    to do so - e.g. using the XMLNS framework.
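
Step (2) can be illustrated with a short Python sketch that mixes a
Dublin Core element into an XHTML fragment. The function name
message_item and the choice of dc:creator for the author are
assumptions made for the example; only the two namespace URIs are the
standard ones.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Serialize XHTML as the default namespace and Dublin Core as "dc:".
ET.register_namespace("", XHTML_NS)
ET.register_namespace("dc", DC_NS)


def message_item(author, subject):
    """Build one XHTML list item carrying an extra dc:creator element."""
    li = ET.Element(f"{{{XHTML_NS}}}li", {"class": "message"})
    span = ET.SubElement(li, f"{{{XHTML_NS}}}span", {"class": "subject"})
    span.text = subject
    # Optional metadata from another namespace; consumers that only
    # understand XHTML are expected to ignore the unknown element.
    creator = ET.SubElement(li, f"{{{DC_NS}}}creator")
    creator.text = author
    return ET.tostring(li, encoding="unicode")
```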

> [chop]

-- 
Nick Kew

Re: mod_mbox development plan

Posted by "Roy T. Fielding" <fi...@gbiv.com>.

On Jul 2, 2005, at 7:46 PM, Maxime Petazzoni wrote:

> The main goal of my SoC project is to enhance mod_mbox's interface by
> using newer web development techniques and/or technologies, while
> avoiding any noticeable slowdowns.

That's good, but keep in mind the general design principle to
avoid doing things on the server that could be done on the client.

> The first thing that has to be done is, of course, get rid of the hard
> coded HTML and switch to something more flexible. Two main solutions
> are available :
>
>  - a template system such as ClearSilver
>  - XML + XSLT

I guess I am having a hard time bridging that leap from goal
to a fairly specific solution.  What are the real advantages of
being flexible on the server?  I mean, as opposed to simply using
a fixed data format with XHTML and class names?

> Since I already have a good knowledge of XML and XSLT, I chose these
> technologies for mod_mbox's output. Making the necessary changes to
> mod_mbox was a good introduction to it's source code, and I already
> sent the resulting patch (also featuring email obfuscation).
>
> The DTD is not yet written for mod_mbox's XML output format because
> what has currently been done may need changes (I'm far from being an
> expert in XML data semantics).

Don't bother with the DTD -- just be sure it is well-formed.

> XML output brings about another question : where should the XSL
> transformation be done ? Two solutions are available :
>
>  - server side processing with mod_transform. No slowdown thanks to
> the XSLT caching feature. This solution has two main drawbacks :
>
>    * The client receives HTML code instead of XML, which will make
> additional features such as dynamic interface (with AJAX) difficult or
> impossible to implement (since we don't know what the DOM tree will
> look like after the XSL transformation, we can't implement DOM dynamic
> updates)
>
>    * mod_transform is not part of the HTTPd project yet. It needs to
> be compiled, installed and setup separately. Using server side
> processing would bring a dependency to an "unofficial" 3rd party
> module.

Also, "no slowdown due to caching" is only applicable for request
patterns that involve frequent duplication of a small set.  It doesn't
work that way when the archive contains several million messages,
since the cache hit rate will be too low to compensate for the
transform cost.
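
The arithmetic behind this argument can be written down as a small
Python sketch. The function name and the cost model (a request is
either served from cache or pays the full transform) are simplifying
assumptions, and the numbers in the note below are made up purely to
show the shape of the curve.

```python
def expected_cost(hit_rate, cached_cost, transform_cost):
    """Average cost per request for a cache of rendered pages.

    hit_rate is the fraction of requests served from the cache.
    """
    return hit_rate * cached_cost + (1.0 - hit_rate) * transform_cost
```

With an invented cached cost of 1 unit and transform cost of 9 units,
a 50% hit rate gives an average of 5 units per request, and the average
climbs toward the full 9 as the hit rate drops toward zero.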

>  - client side processing : just give the XML output and the XSLT to
> the client's browser, and let it manage the transformation itself. On
> the one hand, this solution allows nice (and wanted) features that
> AJAX could provide ; on the other hand, it requires an XSLT capable
> browser (Gecko-based).

This is not a problem with XHTML.

> In the light of the above, I personally prefer the client-side
> processing solution. Anyway, I've tested and now know how to set up
> both of these solutions.

+1 to client-side.

> The XML output is a good start, but it's not enough to make mod_mbox's
> interface good enough : we need more interactivity, especially when
> browsing mail threads. The AJAX (Asynchronous Javascript and XML)
> development technique is an interesting solution to this need. By
> providing a quick, neat and dynamic interface, it can make the archive
> browsing more user-friendly.
>
> I'm currently making some AJAX experiments (I've never used it before,
> but I'm here to learn :) in order to figure out what it can do for
> mod_mbox and how. I'm working on an interface mockup that I'll soon
> submit for review, too.

That sounds like a good plan to me.

....Roy