Posted to docs@httpd.apache.org by Erik Abele <er...@codefaktor.de> on 2002/12/30 23:28:45 UTC
PDF transforms, was Re: Stop shipping XML
André Malo wrote:
> Erik Abele wrote:
>>André Malo wrote:
>>
>>>For example, I'm currently working on PDF for print using fop (will
>>>introduce them this evening or tomorrow, whenever it gets ready). The html
>>>files are not parseable by the processor and therefore get no pdf.
>
>>Great, I was working on this two months ago but then I ran totally out of
>>time. Do you plan to generate one big pdf document which contains all the
>>xml sources or several pdf docs?
>
> The first step was to learn xsl-fo and the limitations of fop (*sigh*).
> The current stage consists of a pdf file per document, optimized for
> print. There are just some final nits that I'm currently picking.
>
Cool... I'm keen on seeing the first pages... I will have more time in the next days and would really like to help pick out some nits :)
> The next stage then can base on this work and merge the stuff together.
> (But there are still some problems to solve before.)
>
Okay... any special problems? Perhaps I can help. The merging into one big doc shouldn't be a problem, except for a nice sidebar with all the links (probably taken from sitemap.xml) to hop directly through the doc; this part could be a bit harder.
Hmmm... but let's wait for a working base...
cheers,
Erik
---------------------------------------------------------------------
To unsubscribe, e-mail: docs-unsubscribe@httpd.apache.org
For additional commands, e-mail: docs-help@httpd.apache.org
Re: Selectable languages
Posted by André Malo <nd...@perlig.de>.
* Joshua Slive wrote:
> Some thoughts:
>
> 1. Having a perl script generate the metafiles is not a big deal. We
> don't add or change files very often, so really the perl script can just
> be used for the initial change and we can even do it by hand after that.
Ah, right. This was mainly a rudiment from earlier trials. It does
additional dependency checking anyway (a metafile change touches all
language variants). If we use the foreach task (see the posting about the
out-of-memory/xalan-cache fix), we gain the possibility to define a
<dependset> that does that work (since we can determine every single
filename).
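For illustration, a minimal Ant sketch of that idea (the property names here are invented, the real build.xml will differ):

```xml
<!-- sketch: if the metafile is newer than any generated language
     variant, delete the stale variants so they get rebuilt -->
<dependset>
  <srcfilelist dir="${doc.dir}" files="${doc.base}.meta.xml"/>
  <targetfileset dir="${doc.dir}" includes="${doc.base}.html.*"/>
</dependset>
```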
> 2. Having the xml docs reference the metafile is not nice, but I agree
> probably unavoidable. Again, it is a one-time thing for each doc, so we
> can live with it.
Yep, exactly my thought.
By the way, the metafile can contain the relative path, too (currently
it does). So we can omit this particular element from the actual document
source code, which is a good thing, IMHO (since it actually describes
metadata).
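A hedged sketch of what such a metafile might look like (all element names here are invented for illustration, not the actual format):

```xml
<?xml version="1.0"?>
<metafile>
  <basename>mod/core</basename>
  <relativepath>..</relativepath>
  <variants>
    <variant>en</variant>
    <variant>de</variant>
  </variants>
</metafile>
```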
> 3. Once we have the metafiles, we can use some fancy xslt to generate
> mod_negotiation type maps. This should be a real performance and
> simplicity improvement.
yes (performance) and yes/no (simplicity); see below.
> 4. Here's an idea to avoid the need for mod_rewrite, and thereby allow
> easier distribution:
>
> manual/file1.meta.xml
> manual/file1.html (generated type-map)
> manual/file2.meta.xml
> manual/file2.html (generated type-map)
> manual/en/file1.xml
> manual/en/file1.html (generated html)
> manual/en/file2.xml
> manual/en/file2.html (generated html)
> manual/de/file1.xml
> manual/de/file1.html (generated html)
> manual/de/file2.xml
> manual/de/file2.html (generated html)
>
> Now all references in the html are relative. By default, the url looks
> like http://example.com/manual/file1.html which hits the typemap for
> content-negotiation. A relative link "<a href="file2.html">" keeps
> you in the type-map file directory. Now file1.xml contains auto-generated
> links (generated by looking at file1.meta.xml) to each language
> specific version (<a href="../en/file1.html"> and <a
> href="../de/file1.html">). Once you are at the
> http://example.com/de/file1.html, then all (relative) links keep you under
> the de/ tree and no content negotiation occurs.
>
> This, of course, requires a
> AddType type-map .var
> and a RemoveType in each language sub-directory.
>
> Is this a good idea?
Partially. I'm very +1 on type maps in general, since they improve
performance a lot (the more files are present in the current directory,
the bigger the gain). However, it doesn't solve all problems:
- If you're in the negotiated "root" branch (which actually serves files from
a particular language branch), the switch to an explicit language branch
won't work via the relative link. Currently (with the rewrite rules) it's
solved the other way round: the links are considered relative to
the "root" branch, so within a language branch you get links like
"/manual/de/en/foo.html" when switching from the de-branch to the en-branch.
This can be detected by a RewriteRule (currently) or a RedirectMatch.
- The static files (meaning CSS and images etc.) have to be copied into
every branch or have to be aliased.
- At the moment (and for a long time to come, I guess) the documents are not
translated entirely. This could be solved by turning on MultiViews within
the language subdirectories and using (also autogenerated) .html.var files
instead of the not-yet-translated .html files (they need different extensions
in order to catch them). MultiViews detects the type-map and evaluates
it. (Hmm, after some further thinking, we also have the problem of
different charsets, so the MultiViews are probably really necessary -
or we configure them statically, i.e. one Location or Directory section
for every language *hmpf*)
So a complete solution with separated directories would be something like
this:
manual/file1.meta.xml
manual/file1.html (generated type-map)
manual/file2.meta.xml
manual/file2.html (generated type-map)
...
manual/de/file1.xml
manual/de/file1.html (generated html)
manual/de/file2.xml
manual/de/file2.html.var (not transl., generated type-map)
The links to switch to another language would point to ./en/file.html
instead of ../en/file.html (etc.).
The config would be about the following: [untested, just written down]
# alias per-language /images/ and /style/
AliasMatch ^/manual(?:/(?:en|de|ru|ja|pt-br))?(/(?:images|style).*)$ \
    /path/to/manual$1
# alias the rest
Alias /manual /path/to/manual
# solve nested languages (lang2lang switch)
RedirectMatch ^/manual/(?:en|de|ru|ja|pt-br)/(en|de|ru|ja|pt-br)(/.*)?$ \
    /manual/$1$2
<Location /manual/>
    AddHandler type-map .html
    [... general stuff ...]
</Location>
<Location ~ ^/+manual/+(en|de|ru|ja|pt-br)>
    RemoveHandler .html
    AddHandler type-map .var
    Options +MultiViews
</Location>
-----------
Having said all of that, the problem rolled around in my head, too. I think
it's worth presenting it here, too, to grab an opinion :). The result of my
thoughts was an extension of mod_negotiation. It would introduce a new
special variable, say "prefer-language", evaluated by mod_negotiation in
such a way that it would first try to serve the preferred language and, if
that's not possible, negotiate over all variants. (A manipulation similar
to what "no-gzip" and "gzip-only-text/html" do for mod_deflate.) That
solution would again rely on holding the different languages within the
same directory. The configuration would be similar to the one above, but we
wouldn't need so many files, for example:
manual/file1.meta.xml
manual/file1.html (generated type-map)
manual/file1.html.en
manual/file1.html.de
manual/file2.meta.xml
manual/file2.html (generated type-map)
manual/file2.html.en
...
httpd.conf:
# alias the manual directory (and virtual language dirs)
AliasMatch ^/manual(?:/(?:en|de|ru|ja|pt-br))?(/.*)? \
    /path/to/manual$1
# solve nested languages (lang2lang switch)
RedirectMatch ^/manual/(?:en|de|ru|ja|pt-br)/(en|de|ru|ja|pt-br)(/.*)? \
    /manual/$1$2
<Directory /path/to/manual>
    AddHandler type-map .html
    SetEnvIf Request_URI ^/manual/(en|de|ru|ja|pt-br)(?:/.*)? \
        prefer-language=$1
    <Files *.html.*>
        RemoveHandler .html
    </Files>
    [... general stuff ...]
</Directory>
The prefer-language feature should be easy to implement (afaics; I just
took a look into the mod_negotiation code). Capturing results within
SetEnvIf doesn't work at the moment either, but should also be no problem
to build in (and would be useful in general, too). I think I'm able to code
it up, but both changes probably have to be approved by some of the
developers.
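To make the intended semantics concrete, here is a small Python model of the proposed behaviour. This is only an illustration of the logic, not mod_negotiation code; the function and parameter names are invented:

```python
def negotiate(variants, accept, prefer=None):
    """Pick a language variant.

    variants: languages we actually have, e.g. ["en", "de"]
    accept:   (language, quality) pairs from Accept-Language
    prefer:   value of the proposed "prefer-language" variable
    """
    # the proposed extension: serve the preferred language outright
    # if a matching variant exists ...
    if prefer and prefer in variants:
        return prefer
    # ... otherwise fall back to ordinary negotiation over all
    # variants (highest quality value wins)
    for lang, _q in sorted(accept, key=lambda p: p[1], reverse=True):
        if lang in variants:
            return lang
    return None

# a German page exists, so prefer-language short-circuits negotiation
print(negotiate(["en", "de"], [("ja", 1.0), ("en", 0.7)], prefer="de"))
# no German page: negotiate normally over what is available
print(negotiate(["en"], [("ja", 1.0), ("en", 0.7)], prefer="de"))
```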
Sorry for the long post, I couldn't get it shorter ;-)
nd
--
Treat your password like your toothbrush. Don't let anybody else
use it, and get a new one every six months. -- Clifford Stoll
(found in ssl_engine_pphrase.c)
Selectable languages (was Re: PDF transforms)
Posted by Joshua Slive <jo...@slive.ca>.
On Sat, 11 Jan 2003, André Malo wrote:
> yep. As said before, the printer-friendly pdf files were more an exercise
> to become familiar with the xsl-fo stuff. My idea to assemble them into a
> big one is similar to Erik's. I've just thought to transform the fo-files
> first, then collect them via a script or a java task (which has to be
> written by someone with java knowledge ;-), put the names into an xml
> file and run another transformation over this file, which finally feeds
> fop. voila. (hopefully ;-)
Sounds good. +1 from me.
>
> > The available languages thing is also very nice. Can you be a little more
> > specific about what changes we need to make to have that work?
>
> Oh, I already was (some weeks ago). Seems, the posting disappeared in the
> noise ;-)
I haven't been paying much attention lately.
Some thoughts:
1. Having a perl script generate the metafiles is not a big deal. We
don't add or change files very often, so really the perl script can just
be used for the initial change and we can even do it by hand after that.
2. Having the xml docs reference the metafile is not nice, but I agree
probably unavoidable. Again, it is a one-time thing for each doc, so we
can live with it.
3. Once we have the metafiles, we can use some fancy xslt to generate
mod_negotiation type maps. This should be a real performance and
simplicity improvement.
4. Here's an idea to avoid the need for mod_rewrite, and thereby allow
easier distribution:
manual/file1.meta.xml
manual/file1.html (generated type-map)
manual/file2.meta.xml
manual/file2.html (generated type-map)
manual/en/file1.xml
manual/en/file1.html (generated html)
manual/en/file2.xml
manual/en/file2.html (generated html)
manual/de/file1.xml
manual/de/file1.html (generated html)
manual/de/file2.xml
manual/de/file2.html (generated html)
Now all references in the html are relative. By default, the url looks
like http://example.com/manual/file1.html which hits the typemap for
content-negotiation. A relative link "<a href="file2.html">" keeps
you in the type-map file directory. Now file1.xml contains auto-generated
links (generated by looking at file1.meta.xml) to each language
specific version (<a href="../en/file1.html"> and <a
href="../de/file1.html">). Once you are at the
http://example.com/de/file1.html, then all (relative) links keep you under
the de/ tree and no content negotiation occurs.
This, of course, requires a
AddType type-map .var
and a RemoveType in each language sub-directory.
Is this a good idea?
Joshua.
Re: PDF transforms
Posted by André Malo <nd...@perlig.de>.
* Joshua Slive wrote:
> This is very nice. BUT, I think the biggest gain from PDF (and what most
> users want) comes from having one big PDF file with the entire
> documentation. There is probably a way to do that by assembling the
> smaller ones with another program.
yep. As said before, the printer-friendly pdf files were more an exercise
to become familiar with the xsl-fo stuff. My idea to assemble them into a
big one is similar to Erik's. I've just thought to transform the fo-files
first, then collect them via a script or a java task (which has to be
written by someone with java knowledge ;-), put the names into an xml
file and run another transformation over this file, which finally feeds
fop. voila. (hopefully ;-)
> The available languages thing is also very nice. Can you be a little more
> specific about what changes we need to make to have that work?
Oh, I already was (some weeks ago). Seems, the posting disappeared in the
noise ;-)
However:
- We have to change the build system to create metafiles that contain the
available variants (languages, pdf files). This will happen
automatically. (But the metafiles have to be checked in, otherwise the
script has no chance to check the timestamp dependencies correctly.)
The perl script which maintains the metafiles could be rewritten in java,
so that we don't require perl for the build process. But again, this has
to be done by someone with java knowledge ;-)
- The metafiles have to be referenced anyway from within the xslt. There
are two possibilities to achieve this:
* reference the particular metafile in an attribute in the document's
root element (metafile="documentbasename.meta")
* inject the filename via ant into the transformation process.
The latter would initially be easier, but it has some drawbacks: we lose
the last remnant of browser compatibility, since the xslt then relies on
information that's only available if we run the transformation with ant.
Besides, it requires the <foreach> task.
Conclusion: I'd prefer the first variant. It seems cleaner
anyway.
- We have to set up the rewrite rules on daedalus. (That is independent of
the changes above.) IMHO the rewrite map file itself should go into
style/lang, so we can maintain it via CVS (but no errors may occur in that
file then, since they could break the whole daedalus apache...)
But we have to decide how to distribute the docs
within a release package. I'm reluctant to require mod_rewrite just for
viewing the docs locally, but currently I have no good idea how to solve
that problem.
(Note that there are some paths to adjust in the attached ruleset.)
- some minor changes in CSS and XSLT, to build in the links.
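The actual ruleset is attached to the original mail and not reproduced here; purely to illustrate the kind of setup meant, a hypothetical sketch (file names and map contents invented):

```apache
# hypothetical sketch only, not the attached ruleset:
# look up the language extension for a branch in a map file
# that is kept under CVS control in style/lang
RewriteEngine On
RewriteMap langmap txt:/path/to/style/lang/manual.map
RewriteRule ^/manual/(en|de|ru|ja|pt-br)(/.*\.html)$ \
    /manual$2.${langmap:$1|en} [L]
```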
I'm going to commit the necessary changes (I think to the 2.1 docs for now,
so that we can see how it works, and port them back later), if the
suggestions get acknowledged.
^^^ ? ;-)
nd
--
"Meanwhile, the basements of the Sempergalerie remain flooded
for statistical reasons." -- Spiegel Online
Re: PDF transforms
Posted by Astrid Keßler <ke...@kess-net.de>.
> [language cross linking and PDF]
>
> hmmmm. Maybe I'm too impatient. But I'm somehow unsettled.
> Nothing to say? Too complex stuff? Do you have questions?
> Vacation time?
I saw it growing and I like it, both the language cross links and the PDF.
I hope the language links will be online soon; they are very helpful. I'm
working with a local version containing them and I miss them online.
Also +1 for PDF. Personally I'm more interested in one big PDF file for
screen use as well as printing, containing document-internal and external
links and bookmarks. The per-document version is nice for printing a single
chapter. You noted some possible improvements yourself. Additionally there
is only one small aesthetic flaw: some footnote numbers are separated by a
large space from the text they belong to (because of the justified text).
> Sorry, I'm just askin' for some feedback (comments, flames, suggestions,
> ovations ;-), whatever). If there's something wrong with the work, don't
> hesitate to say it. I think I can bear it ;-)
I have to admit that I did not look very closely at the build process, but
at the result. The result is easy to use. I'm +1 to commit the stuff,
update docsformat.html (transformation process) and gather more experience
while using it.
Kess
Re: PDF transforms
Posted by André Malo <nd...@perlig.de>.
* André Malo wrote:
[language cross linking and PDF]
hmmmm. Maybe I'm too impatient. But I'm somehow unsettled.
Nothing to say? Too complex stuff? Do you have questions?
Vacation time?
Sorry, I'm just askin' for some feedback (comments, flames, suggestions,
ovations ;-), whatever). If there's something wrong with the work, don't
hesitate to say it. I think I can bear it ;-)
Thanks, nd
--
package Hacker::Perl::Another::Just;print
qq~@{[reverse split/::/ =>__PACKAGE__]}~;
# André Malo # http://www.perlig.de #
Re: PDF transforms
Posted by Erik Abele <er...@codefaktor.de>.
> From: Astrid Keßler <ke...@kess-net.de>
> Reply-To: docs@httpd.apache.org
> Date: Fri, 10 Jan 2003 10:01:23 +0100
> To: docs@httpd.apache.org
> Subject: Re: PDF transforms
>
>> We can easily do that with our current build system. We will just need to
>> render all the single XML docs into one big XML doc. Then we can transform
>> this one doc into XSL-FO and then into PDF.
>
> This is a nice idea. It has a lot of advantages for creating a big PDF
> file. It will make it easy to create a table of contents. But maybe we will
> run out of memory for some things which need a walk through the whole
> document. This is not only the toc but also the footnotes, if we keep them.
> We should try it.
>
Yes, for sure, this will be a memory-eater, but we will have to try... I
have done this before on some other (customer-related) projects and it went
quite well, though our docs tree is much more complicated.
We will definitely have to find a way around the current memory problems.
I haven't had enough time to look through your and nd's build.xml patches;
perhaps they will help to accomplish this task.
cheers,
Erik
Re: PDF transforms
Posted by Astrid Keßler <ke...@kess-net.de>.
> We can easily do that with our current build system. We will just need to
> render all the single XML docs into one big XML doc. Then we can transform
this one doc into XSL-FO and then into PDF.
This is a nice idea. It has a lot of advantages for creating a big PDF
file. It will make it easy to create a table of contents. But maybe we will
run out of memory for some things which need a walk through the whole
document. This is not only the toc but also the footnotes, if we keep them.
We should try it.
Kess
Re: PDF transforms (was: PDF transforms, was Re: Stop shipping XML)
Posted by Erik Abele <er...@codefaktor.de>.
Joshua Slive wrote:
> On Tue, 31 Dec 2002, André Malo wrote:
>
>>Whohoo!
>>ok, I think the current stuff is now applicable (but requires some further
>>work :)
>>You can get an impression at <http://test.perlig.de/manual/>.
>>
>>All of our XML source files got a PDF pendant _optimized for print_.
>
> This is very nice. BUT, I think the biggest gain from PDF (and what most
> users want) comes from having one big PDF file with the entire
> documentation. There is probably a way to do that by assembling the
> smaller ones with another program.
We can easily do that with our current build system. We will just need to render all the single XML docs into one big XML doc. Then we can transform this one doc into XSL-FO and then into PDF. Probably we can use sitemap.xml plus an easy XSL to wrap all the single docs into the big one. The transformation into XSL-FO/PDF should be similar to the current transformation of the single docs, except for a table of contents and the linkage in the big file, plus some other nice features...?
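As a rough sketch of that wrapping step (this assumes sitemap.xml lists the pages with some kind of href attribute; the real element names may differ):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- run over sitemap.xml and pull every referenced document
       into one big wrapper element for the later FO transform -->
  <xsl:template match="/">
    <manual-book>
      <xsl:for-each select="//page">
        <xsl:copy-of select="document(@href)"/>
      </xsl:for-each>
    </manual-book>
  </xsl:template>
</xsl:stylesheet>
```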
Andre, sorry for my delay, I haven't forgotten this one :-) I'm just out of time as always, but I promise to get at it at the weekend!
cheers,
erik
> The available languages thing is also very nice. Can you be a little more
> specific about what changes we need to make to have that work?
>
> Joshua.
>
Re: PDF transforms (was: PDF transforms, was Re: Stop shipping XML)
Posted by Joshua Slive <jo...@slive.ca>.
On Tue, 31 Dec 2002, André Malo wrote:
> Whohoo!
> ok, I think the current stuff is now applicable (but requires some further
> work :)
> You can get an impression at <http://test.perlig.de/manual/>.
>
> All of our XML source files got a PDF pendant _optimized for print_.
This is very nice. BUT, I think the biggest gain from PDF (and what most
users want) comes from having one big PDF file with the entire
documentation. There is probably a way to do that by assembling the
smaller ones with another program.
The available languages thing is also very nice. Can you be a little more
specific about what changes we need to make to have that work?
Joshua.
Re: PDF transforms (was: PDF transforms, was Re: Stop shipping XML)
Posted by André Malo <nd...@perlig.de>.
* Erik Abele wrote:
> André Malo wrote:
>> The first step was to learn xsl-fo and the limitations of fop (*sigh*).
>> The current stage consists of a pdf file per document, optimized for
>> print. There are just some final nits that I'm currently picking.
> Cool... I'm keen on seeing the first pages... I will have more time in the
> next days and would really like to help pick out some nits :)
<snip>
Whohoo!
ok, I think the current stuff is now applicable (but requires some further
work :)
You can get an impression at <http://test.perlig.de/manual/>.
All of our XML source files got a PDF pendant _optimized for print_.
The PDF files don't contain any clickable links or other online reading
stuff. The layout is more or less obtained from the manual-print.css with
some enhancements that are not possible with pure CSS.
Instead of making links clickable, which is not useful for printing ;-), I
decided to extract the relevant URLs from the particular href attributes
and put them there as footnotes (issue 4, see below).
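A rough XSLT fragment of that extraction idea (namespace declarations omitted; the template name and source element names are assumptions, not the actual stylesheet):

```xml
<!-- sketch: number every link and emit its target URL in an
     endnote list at the end of the document -->
<xsl:template name="endnotes">
  <fo:block font-size="8pt" space-before="1em">
    <xsl:for-each select="//a[@href]">
      <fo:block>
        <xsl:number level="any" count="a[@href]"/>
        <xsl:text>. </xsl:text>
        <xsl:value-of select="@href"/>
      </fo:block>
    </xsl:for-each>
  </fo:block>
</xsl:template>
```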
However, the PDF stuff has a lot of implications:
1) Including appropriate links into the corresponding HTML files requires
the metafiles I also proposed for the language links (we need the filename
of the pdf). (I combined both on the example page, of course ;-)
2) Extracting the hrefs mangles the given URLs and makes them absolute.
This requires knowledge of the current path (from the view of the
document). Also solved by the metafiles.
3) For non-latin scripts we cannot use the standard PDF fonts; we have to
embed others. My generated PDFs currently use Unicode Times and Courier
from my Win2k installation for Russian PDFs. For Japanese I'm currently
using MS Mincho from the Japanese language pack, but I wasn't able to use
bold or italic variants. I also don't know which monospace font is
applicable for Japanese. I hope I'll get some hints here :)
Font embedding in the current variant also has some general drawbacks:
- You cannot copy & paste from the non-latin pdfs, since no characters are
stored, only *references to glyphs*. This has to be solved anyway for an
all-in-one pdf.
- I'm not sure about license issues. The TTFReader of fop says "no
restrictions", but who knows? It would be better in general, I think, to
use some free fonts that we can put into CVS or so.
- The build system is currently somewhat specialized, since fop has some
serious bugs with path names etc. (I needed the latest beta to make it
work at all! *sigh*)
The whole pdf build system needs some cleanup.
4) Footnote support in fop is buggy (it sometimes produces notes
overlapping with regular content etc.), so I decided to put them into an
extra section which appears last in the document. (Look at a sample pdf
file if you don't understand what I mean.)
5) Table support is limited. fop doesn't support automatic table layout, so
we have to manage that manually. Not such a problem, I think, since once
created, the table layout file will seldom be touched.
I put the table definitions of all xml files into one file per language.
6) fop doesn't support a lot of useful things, keep-conditions etc. But I
can live with that until it's implemented. (For example, sometimes headings
appear at the bottom of one page and the text follows on the subsequent
page...)
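For point 5, a minimal example of what such a manual table definition amounts to (the column widths and cell contents here are made up):

```xml
<!-- without automatic table layout, every column width has to be
     given explicitly via table-layout="fixed" -->
<fo:table table-layout="fixed" width="100%">
  <fo:table-column column-width="30%"/>
  <fo:table-column column-width="70%"/>
  <fo:table-body>
    <fo:table-row>
      <fo:table-cell><fo:block>Directive</fo:block></fo:table-cell>
      <fo:table-cell><fo:block>Description</fo:block></fo:table-cell>
    </fo:table-row>
  </fo:table-body>
</fo:table>
```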
However, I think I'm missing a lot of stuff in this description that I
can't remember now; I will post it later then ;-)
Comments, help and questions are welcome :)
The xsl stuff can be found at <http://test.perlig.de/manual/style/pdf/>,
the build stuff at <http://test.perlig.de/manual/build/> (including the fop
directory).
wishing you all a happy new year, etc.
nd
--
Treat your password like your toothbrush. Don't let anybody else
use it, and get a new one every six months. -- Clifford Stoll
(found in ssl_engine_pphrase.c)