
Posted to docs@httpd.apache.org by Joshua Slive <jo...@slive.ca> on 2003/01/09 21:20:13 UTC

Re: PDF transforms (was: PDF transforms, was Re: Stop shipping XML)

On Tue, 31 Dec 2002, André Malo wrote:
> Whohoo!
> ok, I think the current stuff is now applicable (but requires some further
> work :)
> You can get an impression at <http://test.perlig.de/manual/>.
>
> All of our XML source files now have a PDF counterpart _optimized for print_.

This is very nice.  BUT, I think the biggest gain from PDF (and what most
users want) comes from having one big PDF file with the entire
documentation.  There is probably a way to do that by assembling the
smaller ones with another program.

The available languages thing is also very nice.  Can you be a little more
specific about what changes we need to make to have that work?

Joshua.

---------------------------------------------------------------------
To unsubscribe, e-mail: docs-unsubscribe@httpd.apache.org
For additional commands, e-mail: docs-help@httpd.apache.org

Re: PDF transforms

Posted by Erik Abele <er...@codefaktor.de>.

> From: Astrid Keßler <ke...@kess-net.de>
> Reply-To: docs@httpd.apache.org
> Date: Fri, 10 Jan 2003 10:01:23 +0100
> To: docs@httpd.apache.org
> Subject: Re: PDF transforms
> 
>> We can easily do that with our current build system. We will just need to
>> render all the single XML docs into one big XML doc. Then we can transform
>> this one doc into FOP and then into PDF.
> 
> This is a nice idea. It has a lot of advantages for creating a big PDF
> file. It will make it easy to create a table of contents. But maybe we will
> run out of memory for some things which need a walk through the whole
> document. This is not only the toc but also the footnotes, if we keep them.
> We should try it.
> 

Yes for sure, this will be a memory-eater but we will have to try... I have
done this before on some other (customer-related) projects and it went quite
well, but our docs tree is much more complicated.

We will definitely have to find a way to avoid the current memory
problems. I haven't had enough time to look through your and nd's
build.xml patches; perhaps they will help with this task.

cheers,
Erik



Re: PDF transforms

Posted by Astrid Keßler <ke...@kess-net.de>.

> We can easily do that with our current build system. We will just need to
> render all the single XML docs into one big XML doc. Then we can transform
> this one doc into FOP and then into PDF. 

This is a nice idea. It has a lot of advantages for creating a big PDF 
file. It will make it easy to create a table of contents. But maybe we will 
run out of memory for some things which need a walk through the whole 
document. This is not only the toc but also the footnotes, if we keep them. 
We should try it. 

 Kess


Re: PDF transforms (was: PDF transforms, was Re: Stop shipping XML)

Posted by Erik Abele <er...@codefaktor.de>.

Joshua Slive wrote:
> On Tue, 31 Dec 2002, André Malo wrote:
> 
>>Whohoo!
>>ok, I think the current stuff is now applicable (but requires some further
>>work :)
>>You can get an impression at <http://test.perlig.de/manual/>.
>>
>>All of our XML source files now have a PDF counterpart _optimized for print_.
> 
> This is very nice.  BUT, I think the biggest gain from PDF (and what most
> users want) comes from having one big PDF file with the entire
> documentation.  There is probably a way to do that by assembling the
> smaller ones with another program.

We can easily do that with our current build system. We will just need to render all the single XML docs into one big XML doc. Then we can transform this one doc into XSL-FO and then, with FOP, into PDF. Probably we can use sitemap.xml plus a simple XSL stylesheet to wrap all the single docs into the big one. The transformation into FO/PDF should be similar to the current transformation of the single docs, except for a table of contents and the linkage within the big file, plus some other nice features...?
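To make the "wrap all single docs into one big doc" step concrete, here is a minimal sketch in Python rather than XSL. It assumes a hypothetical sitemap format that lists documents as <doc href="..."/> elements and a hypothetical <manual>/<section> wrapper; the real sitemap.xml schema may differ, and in the actual build this would more likely be an XSL stylesheet run from Ant.

```python
"""Sketch: wrap single manual docs into one big XML doc, driven by a sitemap.

Assumptions (hypothetical, not the real httpd-docs schema): the sitemap lists
documents as <doc href="..."/> elements, and the wrapper root is <manual>.
"""
import xml.etree.ElementTree as ET
from typing import Callable


def assemble_docs(sitemap_xml: str, load: Callable[[str], str]) -> ET.Element:
    """Return one <manual> element containing every document the sitemap lists.

    `load` maps an href to that document's XML source, so the caller decides
    whether it reads from disk, a cache, etc.
    """
    sitemap = ET.fromstring(sitemap_xml)
    manual = ET.Element("manual")
    for doc in sitemap.iter("doc"):
        href = doc.get("href")
        # Keep the source name on each section so a later stylesheet can turn
        # cross-document links into internal PDF anchors.
        section = ET.SubElement(manual, "section", {"source": href})
        section.append(ET.fromstring(load(href)))
    return manual


if __name__ == "__main__":
    docs = {
        "install.xml": "<manualpage><title>Install</title></manualpage>",
        "upgrading.xml": "<manualpage><title>Upgrading</title></manualpage>",
    }
    sitemap = '<sitemap><doc href="install.xml"/><doc href="upgrading.xml"/></sitemap>'
    big = assemble_docs(sitemap, docs.__getitem__)
    print(ET.tostring(big, encoding="unicode"))
```

Note that this builds the whole tree in memory, which is exactly where the memory concerns raised in this thread would show up.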

Andre, sorry for my delay, I haven't forgotten this one :-) I'm just short on time as always, but I promise to get to it this weekend!

cheers,
erik

> The available languages thing is also very nice.  Can you be a little more
> specific about what changes we need to make to have that work?
> 
> Joshua.


Re: Selectable languages

Posted by André Malo <nd...@perlig.de>.

* Joshua Slive wrote:

> Some thoughts:
> 
> 1. Having a perl script generate the metafiles is not a big deal.  We
> don't add or change files very often, so really the perl script can just
> be used for the initial change and we can even do it by hand after that.

ah right. This was mainly a leftover from earlier trials. It does some 
additional dependency checking anyway (metafile change -> touch all 
language variants). If we use the foreach task (see the posting about the 
out-of-memory/xalan-cache fix), we can define a <dependset> that does 
that work (since we can determine every single filename).
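The "metafile change -> touch all language variants" rule can be sketched as a simple timestamp comparison. This is a minimal Python illustration, not the existing perl script or an Ant task; the file names used with it are hypothetical.

```python
"""Sketch of the metafile dependency check: if a document's metafile is newer
than one of its language variants, touch that variant so the next build
regenerates its output.
"""
from pathlib import Path
from typing import Iterable, List


def touch_stale_variants(metafile: Path, variants: Iterable[Path]) -> List[Path]:
    """Touch every variant whose mtime is older than `metafile`.

    Returns the list of variants that were touched.
    """
    meta_mtime = metafile.stat().st_mtime
    touched = []
    for variant in variants:
        if variant.stat().st_mtime < meta_mtime:
            # Path.touch() on an existing file sets its mtime to "now", so the
            # XSLT step will treat the variant as out of date.
            variant.touch()
            touched.append(variant)
    return touched
```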

> 2. Having the xml docs reference the metafile is not nice, but I agree
> probably unavoidable.  Again, it is a one-time thing for each doc, so we
> can live with it.

yep, exactly my thought.
By the way, the metafile can contain the relativepath, too (in fact, 
currently it does). So we can omit this particular element from the actual 
document source code, which is a good thing, IMHO (since it actually 
describes metadata).

> 3. Once we have the metafiles, we can use some fancy xslt to generate
> mod_negotiation type maps.  This should be a real performance and
> simplicity improvement.

yes (performance) and yes/no (simplicity); see below.

> 4. Here's an idea to avoid the need for mod_rewrite, and thereby allow
> easier distribution:
> 
> manual/file1.meta.xml
> manual/file1.html (generated type-map)
> manual/file2.meta.xml
> manual/file2.html (generated type-map)
> manual/en/file1.xml
> manual/en/file1.html (generated html)
> manual/en/file2.xml
> manual/en/file2.html (generated html)
> manual/de/file1.xml
> manual/de/file1.html (generated html)
> manual/de/file2.xml
> manual/de/file2.html (generated html)
> 
> Now all references in the html are relative.  By default, the url looks
> like http://example.com/manual/file1.html which hits the typemap for
> content-negotiation.  A relative link "<a href="file2.html">" keeps
> you in the type-map file directory.  Now file1.xml contains auto-generated
> links (generated by looking at file1.meta.xml) to each language
> specific version (<a href="../en/file1.html"> and <a
> href="../de/file1.html">).  Once you are at the
> http://example.com/manual/de/file1.html, then all (relative) links keep you under
> the de/ tree and no content negotiation occurs.
> 
> This, of course, requires an
> AddHandler type-map .html
> and a RemoveHandler .html in each language sub-directory.
> 
> Is this a good idea?

Partially. I'm very +1 on type maps in general, since they improve 
performance considerably (all the more, the more files the directory contains). 
However, it doesn't solve all problems:

- if you're in the negotiated "root" branch (which actually uses files from 
  a particular language branch), switching to an explicit language branch 
  won't work with a relative link. Currently (with the rewrite rules) it's 
  solved the other way round: the links are considered to be relative to 
  the "root" branch, so within a language branch you get links like 
  "/manual/de/en/foo.html" when switching from the de branch to the en 
  branch. This can be detected by a RewriteRule (currently) or a RedirectMatch.
- The support files (meaning CSS, images etc.) have to be copied into 
  every branch or aliased.
- At the moment (and for a long time to come, I guess) the documents are 
  not translated entirely. This could be solved by turning on MultiViews 
  within the language subdirectories and using (also autogenerated) 
  .html.var files instead of the not-yet-translated .html files (they need 
  a different extension in order to catch them). MultiViews detects the 
  type-map and evaluates it. (Hmm, after some further thinking, we also 
  have the problem of different charsets, so MultiViews is probably really 
  necessary - or we configure them statically, i.e. one Location or 
  Directory section for every language *hmpf*)

So a complete solution with separated directories would be something like 
this:

manual/file1.meta.xml
manual/file1.html (generated type-map)
manual/file2.meta.xml
manual/file2.html (generated type-map)
...
manual/de/file1.xml
manual/de/file1.html (generated html)
manual/de/file2.xml
manual/de/file2.html.var (not transl., generated type-map)

the links to switch to another language would point to ./en/file.html 
instead of ../en/file.html (etc).
The config would be roughly the following: [untested, just written down]

# alias /images/ and /style/ under the language dirs to the shared copies
AliasMatch ^/manual(?:/(?:en|de|ru|ja|pt-br))?(/(?:images|style).*)$ \
            /path/to/manual$1

# alias the rest
Alias /manual /path/to/manual

# solve nested languages (lang2lang switch)
RedirectMatch ^/manual/(?:en|de|ru|ja|pt-br)/(en|de|ru|ja|pt-br)(/.*)?$ \
               /manual/$1$2

<Location /manual/>
  AddHandler type-map .html
  [... general stuff ...]
</Location>

<Location ~ ^/+manual/+(en|de|ru|ja|pt-br)>
  RemoveHandler .html
  AddHandler type-map .var
  Options +MultiViews
</Location>
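The two regexes in the config above can be sanity-checked outside Apache. A minimal sketch using Python's re module, which handles these particular patterns (non-capturing groups, alternation, anchors) the same way PCRE does; it only exercises the regex matching and substitution, not Apache's Alias/Redirect precedence.

```python
import re
from typing import Optional

LANGS = "en|de|ru|ja|pt-br"

# Same patterns as the AliasMatch and RedirectMatch lines above.
ALIAS_STATIC = re.compile(rf"^/manual(?:/(?:{LANGS}))?(/(?:images|style).*)$")
LANG2LANG = re.compile(rf"^/manual/(?:{LANGS})/({LANGS})(/.*)?$")


def alias_static(uri: str, root: str = "/path/to/manual") -> Optional[str]:
    """Filesystem path the AliasMatch would map `uri` to, or None."""
    m = ALIAS_STATIC.match(uri)
    return root + m.group(1) if m else None


def redirect_target(uri: str) -> Optional[str]:
    """Target of the lang2lang RedirectMatch (/manual/$1$2), or None."""
    m = LANG2LANG.match(uri)
    if m is None:
        return None
    # An unmatched optional group expands to the empty string, as in Apache.
    return "/manual/" + m.group(1) + (m.group(2) or "")


if __name__ == "__main__":
    print(redirect_target("/manual/de/en/foo.html"))
    print(alias_static("/manual/de/images/feather.gif"))
```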

-----------

That said, the problem kept going round in my head, too. I think it's 
worth presenting here as well, to get some opinions :). The result of my 
thoughts was an extension of mod_negotiation. It would introduce a new 
special variable, say "prefer-language", evaluated by mod_negotiation such 
that it would first try to serve the preferred language and, if that's 
not possible, negotiate over all variants (a manipulation similar to what 
"no-gzip" and "gzip-only-text/html" do for mod_deflate). That solution 
would rely on holding the different languages within the same directory 
again. The configuration would be similar to the one above, but we 
wouldn't need as many files, for example:

manual/file1.meta.xml
manual/file1.html (generated type-map)
manual/file1.html.en
manual/file1.html.de
manual/file2.meta.xml
manual/file2.html (generated type-map)
manual/file2.html.en
...

httpd.conf:

# alias manual directory (and virtual language dirs)
AliasMatch ^/manual(?:/(?:en|de|ru|ja|pt-br))?(/.*)?$ \
            /path/to/manual$1

# solve nested languages (lang2lang switch)
RedirectMatch ^/manual/(?:en|de|ru|ja|pt-br)/(en|de|ru|ja|pt-br)(/.*)?$ \
               /manual/$1$2

<Directory /path/to/manual>
  AddHandler type-map .html
  SetEnvIf Request_URI ^/manual/(en|de|ru|ja|pt-br)(?:/.*)?$ \
           prefer-language=$1

  <Files *.html.*>
    RemoveHandler .html
  </Files>

  [... general stuff ...]
</Directory>

The prefer-language feature should be easy to implement (afaics; I just 
took a look at the mod_negotiation code). Capturing regex groups in 
SetEnvIf also doesn't work at the moment, but should be no problem to 
build in either (and would be useful in general, too). I think I'm able to 
code it up, but both changes probably have to be approved by some of the 
developers.
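The proposed prefer-language behaviour can be sketched as a two-step selection: take the language from the virtual language directory in the URI, serve that variant if it exists, and otherwise fall back to ordinary negotiation. This is a hypothetical, greatly simplified Python model, not mod_negotiation's actual algorithm (it ignores q-values, charsets and other dimensions). The URI regex is the SetEnvIf pattern from the config above, anchored with $ so that e.g. /manual/english.html is not taken as "en".

```python
import re
from typing import Dict, List, Optional

# SetEnvIf pattern from the config above, anchored at the end.
PREFER_RE = re.compile(r"^/manual/(en|de|ru|ja|pt-br)(?:/.*)?$")


def prefer_language(uri: str) -> Optional[str]:
    """Value SetEnvIf would put into prefer-language, or None if unset."""
    m = PREFER_RE.match(uri)
    return m.group(1) if m else None


def choose_variant(variants: Dict[str, str], accept: List[str],
                   prefer: Optional[str]) -> Optional[str]:
    """Pick a variant file.

    variants: language -> file name (e.g. "de" -> "file1.html.de")
    accept:   client's languages, most preferred first (no q-values here)
    prefer:   the prefer-language value, or None
    """
    # Step 1: the preferred language wins if that variant exists.
    if prefer in variants:
        return variants[prefer]
    # Step 2: otherwise negotiate over all variants as usual (simplified).
    for lang in accept:
        if lang in variants:
            return variants[lang]
    return None
```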

Sorry for the long post, I couldn't get it shorter ;-)

nd
-- 
Treat your password like your toothbrush. Don't let anybody else
use it, and get a new one every six months.  -- Clifford Stoll

                                    (found in ssl_engine_pphrase.c)


Selectable languages (was Re: PDF transforms)

Posted by Joshua Slive <jo...@slive.ca>.

On Sat, 11 Jan 2003, André Malo wrote:
> yep. As said before, the printer friendly pdf files were more an exercise
> to become familiar with the xsl-fo stuff. My idea to assemble them to a big
> one is similar to Erik's. I've just thought to transform the fo-files
> first, then collect them via a script or a java task (which has to be
> written by someone with java knowledge ;-) and put the names into an xml
> files and run another transformation over this file, which finally feeds
> fop. voila. (hopefully ;-)

Sounds good.  +1 from me.

>
> > The available languages thing is also very nice.  Can you be a little more
> > specific about what changes we need to make to have that work?
>
> Oh, I already was (some weeks ago). Seems, the posting disappeared in the
> noise ;-)

I haven't been paying much attention lately.

Some thoughts:

1. Having a perl script generate the metafiles is not a big deal.  We
don't add or change files very often, so really the perl script can just
be used for the initial change and we can even do it by hand after that.

2. Having the xml docs reference the metafile is not nice, but I agree
probably unavoidable.  Again, it is a one-time thing for each doc, so we
can live with it.

3. Once we have the metafiles, we can use some fancy xslt to generate
mod_negotiation type maps.  This should be a real performance and
simplicity improvement.

4. Here's an idea to avoid the need for mod_rewrite, and thereby allow
easier distribution:

manual/file1.meta.xml
manual/file1.html (generated type-map)
manual/file2.meta.xml
manual/file2.html (generated type-map)
manual/en/file1.xml
manual/en/file1.html (generated html)
manual/en/file2.xml
manual/en/file2.html (generated html)
manual/de/file1.xml
manual/de/file1.html (generated html)
manual/de/file2.xml
manual/de/file2.html (generated html)

Now all references in the html are relative.  By default, the url looks
like http://example.com/manual/file1.html which hits the typemap for
content-negotiation.  A relative link "<a href="file2.html">" keeps
you in the type-map file directory.  Now file1.xml contains auto-generated
links (generated by looking at file1.meta.xml) to each language
specific version (<a href="../en/file1.html"> and <a
href="../de/file1.html">).  Once you are at the
http://example.com/manual/de/file1.html, then all (relative) links keep you under
the de/ tree and no content negotiation occurs.

This, of course, requires an
AddHandler type-map .html
and a RemoveHandler .html in each language sub-directory.

Is this a good idea?
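Point 3 (generating mod_negotiation type-maps from the metafiles) could look roughly like this for the layout in point 4. A minimal Python sketch, not the proposed XSLT; the metafile schema (<metafile><basename/><variants><variant/>...) and the charset table are assumptions for illustration. Type-map URIs are relative to the map file, so manual/file1.html can point at en/file1.html and de/file1.html.

```python
"""Sketch: generate a mod_negotiation type-map from a document's metafile.

The metafile schema used here is hypothetical, for illustration only.
"""
import xml.etree.ElementTree as ET
from typing import Dict

DEFAULT_CHARSET = "ISO-8859-1"


def type_map(metafile_xml: str, charsets: Dict[str, str]) -> str:
    """Return type-map text: one record per language variant."""
    meta = ET.fromstring(metafile_xml)
    base = meta.findtext("basename")
    # First record names the map itself.
    records = [f"URI: {base}.html\n"]
    for variant in meta.iter("variant"):
        lang = variant.text.strip()
        charset = charsets.get(lang, DEFAULT_CHARSET)
        records.append(
            f"URI: {lang}/{base}.html\n"
            f"Content-Type: text/html; charset={charset}\n"
            f"Content-Language: {lang}\n"
        )
    # Records in a type-map are separated by blank lines.
    return "\n".join(records)
```

Written to manual/file1.html, the result would serve as the type-map in that layout; the per-language charsets are where the charset problem mentioned in this thread would be handled.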

Joshua.


Re: PDF transforms

Posted by André Malo <nd...@perlig.de>.

* Joshua Slive wrote:

> This is very nice.  BUT, I think the biggest gain from PDF (and what most
> users want) comes from having one big PDF file with the entire
> documentation.  There is probably a way to do that by assembling the
> smaller ones with another program.

yep. As I said before, the printer-friendly PDF files were more of an 
exercise to become familiar with the XSL-FO stuff. My idea for assembling 
them into a big one is similar to Erik's. I was thinking of transforming 
the FO files first, then collecting them via a script or a Java task (which 
has to be written by someone with Java knowledge ;-), putting the names 
into an XML file and running another transformation over this file, which 
finally feeds FOP. Voila. (hopefully ;-)
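The "collect the FO files and put the names into an XML file" step might be as small as the following Python sketch (a stand-in for the script or Java task mentioned above). The <fo-files>/<file href="..."/> element names are hypothetical, and files are ordered alphabetically here; a real build would presumably take the order from the sitemap.

```python
"""Sketch: list generated FO files in an XML index for a final transformation."""
import xml.etree.ElementTree as ET
from pathlib import Path


def fo_index(fo_dir: Path) -> ET.Element:
    """Build <fo-files><file href="..."/>...</fo-files> for *.fo in fo_dir."""
    index = ET.Element("fo-files")
    for fo in sorted(fo_dir.glob("*.fo")):
        ET.SubElement(index, "file", {"href": fo.name})
    return index


def write_index(fo_dir: Path, out: Path) -> None:
    """Write the index so an XSLT step (and then FOP) can consume it."""
    ET.ElementTree(fo_index(fo_dir)).write(
        str(out), encoding="utf-8", xml_declaration=True
    )
```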

> The available languages thing is also very nice.  Can you be a little more
> specific about what changes we need to make to have that work?

Oh, I already was (some weeks ago). Seems, the posting disappeared in the 
noise ;-)
However:
- We have to change the build system to create metafiles that contain the
  available variants (languages, PDF files). This will happen 
  automatically (but the metafiles have to be checked in, otherwise the 
  script has no chance of checking the timestamp dependencies correctly).

  The perl script which maintains the metafiles could be rewritten in java, 
  so that we don't require perl for the build process. But again, this has 
  to be done by someone with java knowledge ;-)

- The metafiles have to be referenced from within the XSLT anyway. There 
  are two ways to achieve this:
  * reference the particular metafile in an attribute of the document's 
    root element (metafile="documentbasename.meta")
  * inject the filename via Ant into the transformation process.

  The latter would initially be easier, but has some drawbacks: we lose 
  the last remnant of browser compatibility, since the XSLT relies on 
  information that's only available if we run the transformation with Ant.
  Besides, it requires the <foreach> task.
  Conclusion: I'd prefer the first variant. It seems cleaner 
  anyway.

- We have to set up the rewrite rules on daedalus. (That is independent of 
  the changes above.) IMHO the rewrite map file itself should go into 
  style/lang, so we can maintain it via CVS (but then we must be careful 
  not to commit errors in that file, since it could break the whole 
  daedalus Apache...)

  But we have to decide how to distribute the docs within a release 
  package. I'd be reluctant to require mod_rewrite just for viewing the 
  docs locally, but currently I have no good idea how to solve that 
  problem.
  (Note that there are some paths to adjust in the attached ruleset)

- some minor changes in CSS and XSLT, to build in the links.
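The metafile generation described in the first point above could be sketched as follows. This is a hypothetical Python stand-in for the perl script, assuming a made-up naming convention <basename>.xml.<lang> for the translated sources and a made-up metafile schema; PDF variants could be recorded the same way.

```python
"""Sketch: build a metafile listing the available language variants."""
import re
import xml.etree.ElementTree as ET
from typing import Iterable


def build_metafile(basename: str, relativepath: str,
                   filenames: Iterable[str]) -> ET.Element:
    """Return <metafile> with basename, relativepath and sorted variants.

    Hypothetical convention: variants are named <basename>.xml.<lang>,
    e.g. foo.xml.de or foo.xml.pt-br.
    """
    variant_re = re.compile(
        re.escape(basename) + r"\.xml\.([a-z]{2}(?:-[a-z]{2})?)$"
    )
    langs = set()
    for name in filenames:
        m = variant_re.match(name)
        if m:
            langs.add(m.group(1))

    meta = ET.Element("metafile")
    ET.SubElement(meta, "basename").text = basename
    # The relative path lives here instead of in each document's source.
    ET.SubElement(meta, "relativepath").text = relativepath
    variants = ET.SubElement(meta, "variants")
    for lang in sorted(langs):
        ET.SubElement(variants, "variant").text = lang
    return meta
```

Since the metafiles are checked in, a script like this would only need to rewrite one when the set of variants actually changes.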

I'm going to commit the necessary changes (I think to the 2.1 docs for now, 
so that we can see how it works, and port them back later), if the 
suggestions get acknowledged.
            ^^^ ? ;-)

nd
-- 
"Meanwhile, the basement levels of the Semper Gallery will remain flooded
 for statistical reasons." -- Spiegel Online