Posted to docs@httpd.apache.org by Erik Abele <er...@codefaktor.de> on 2002/12/30 23:28:45 UTC
PDF transforms, was Re: Stop shipping XML
André Malo wrote:
> Erik Abele wrote:
>>André Malo wrote:
>>
>>>For example, I'm currently working on PDF for print using fop (will
>>>introduce them this evening or tomorrow, whenever it gets ready). The html
>>>files are not parseable by the processor and therefore get no pdf.
>
>>Great, I was working on this two months ago but then I ran totally out of
>>time. Do you plan to generate one big pdf document which contains all the
>>xml sources or several pdf docs?
>
> The first step was to learn xsl-fo and the limitations of fop (*sigh*).
> The current stage consists of a pdf file per document, optimized for
> print. There are just some final nits that I'm currently picking.
>
Cool... I'm keen on seeing the first pages... I will have more time in the next days and would really like to help pick out some nits :)
> The next stage then can base on this work and merge the stuff together.
> (But there are still some problems to solve before.)
>
Okay... any special problems? Perhaps I can help. The merging into one big doc shouldn't be a problem, except for a nice sidebar with all the links (probably taken from sitemap.xml) to hop directly through the doc; this part could be a bit harder.
Hmmm... but let's wait for a working base...
cheers,
Erik
---------------------------------------------------------------------
To unsubscribe, e-mail: docs-unsubscribe@httpd.apache.org
For additional commands, e-mail: docs-help@httpd.apache.org
Re: Selectable languages
Posted by André Malo <nd...@perlig.de>.
* Joshua Slive wrote:
> Some thoughts:
>
> 1. Having a perl script generate the metafiles is not a big deal. We
> don't add or change files very often, so really the perl script can just
> be used for the initial change and we can even do it by hand after that.
Ah, right. This was mainly a rudiment from earlier trials. It does
additional dependency checking anyway (a metafile change touches all
language variants). If we use the foreach task (see the posting about the
out-of-memory/xalan-cache fix), we gain the possibility to define a
<dependset> that does that work (since we can determine every single
filename).
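For illustration, a minimal Ant sketch of that idea (the property names here are invented, the real build.xml will differ):

```xml
<!-- sketch: if the metafile is newer than any generated language
     variant, delete the stale variants so they get rebuilt -->
<dependset>
  <srcfilelist dir="${doc.dir}" files="${doc.base}.meta.xml"/>
  <targetfileset dir="${doc.dir}" includes="${doc.base}.html.*"/>
</dependset>
```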
> 2. Having the xml docs reference the metafile is not nice, but I agree
> probably unavoidable. Again, it is a one-time thing for each doc, so we
> can live with it.
Yep, exactly my thought.
By the way, the metafile can contain the relative path, too (currently
it does). So we can omit this particular element from the actual document
source code, which is a good thing, IMHO (since it actually describes
metadata).
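A hedged sketch of what such a metafile might look like (all element names here are invented for illustration, not the actual format):

```xml
<?xml version="1.0"?>
<metafile>
  <basename>mod/core</basename>
  <relativepath>..</relativepath>
  <variants>
    <variant>en</variant>
    <variant>de</variant>
  </variants>
</metafile>
```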
> 3. Once we have the metafiles, we can use some fancy xslt to generate
> mod_negotiation type maps. This should be a real performance and
> simplicity improvement.
yes (performance) and yes/no (simplicity); see below.
> 4. Here's an idea to avoid the need for mod_rewrite, and thereby allow
> easier distribution:
>
> manual/file1.meta.xml
> manual/file1.html (generated type-map)
> manual/file2.meta.xml
> manual/file2.html (generated type-map)
> manual/en/file1.xml
> manual/en/file1.html (generated html)
> manual/en/file2.xml
> manual/en/file2.html (generated html)
> manual/de/file1.xml
> manual/de/file1.html (generated html)
> manual/de/file2.xml
> manual/de/file2.html (generated html)
>
> Now all references in the html are relative. By default, the url looks
> like http://example.com/manual/file1.html which hits the typemap for
> content-negotiation. A relative link "<a href="file2.html">" keeps
> you in the type-map file directory. Now file1.xml contains auto-generated
> links (generated by looking at file1.meta.xml) to each language
> specific version (<a href="../en/file1.html"> and <a
> href="../de/file1.html">). Once you are at the
> http://example.com/de/file1.html, then all (relative) links keep you under
> the de/ tree and no content negotiation occurs.
>
> This, of course, requires a
> AddType type-map .var
> and a RemoveType in each language sub-directory.
>
> Is this a good idea?
Partially. I'm very +1 on type maps in general, since they improve
performance a lot (the more files are present in the current directory,
the bigger the gain). However, it doesn't solve all problems:
- If you're in the negotiated "root" branch (which actually serves files from
a particular language branch), the switch to an explicit language branch
won't work via the relative link. Currently (with the rewrite rules) it's
solved the other way round: the links are considered relative to
the "root" branch, so within a language branch you get links like
"/manual/de/en/foo.html" when switching from the de-branch to the en-branch.
This can be detected by a RewriteRule (currently) or a RedirectMatch.
- The static files (meaning CSS and images etc.) have to be copied into
every branch or have to be aliased.
- At the moment (and for a long time to come, I guess) the documents are not
translated entirely. This could be solved by turning on MultiViews within
the language subdirectories and using (also autogenerated) .html.var files
instead of the not-yet-translated .html files (they need different extensions
in order to catch them). MultiViews detects the type-map and evaluates
it. (Hmm, after some further thinking, we also have the problem of
different charsets, so the MultiViews are probably really necessary -
or we configure them statically, i.e. one Location or Directory section
for every language *hmpf*)
So a complete solution with separated directories would be something like
this:
manual/file1.meta.xml
manual/file1.html (generated type-map)
manual/file2.meta.xml
manual/file2.html (generated type-map)
...
manual/de/file1.xml
manual/de/file1.html (generated html)
manual/de/file2.xml
manual/de/file2.html.var (not transl., generated type-map)
The links to switch to another language would point to ./en/file.html
instead of ../en/file.html (etc.).
The config would be about the following: [untested, just written down]
# alias per-language /images/ and /style/
AliasMatch ^/manual(?:/(?:en|de|ru|ja|pt-br))?(/(?:images|style).*)$ \
    /path/to/manual$1
# alias the rest
Alias /manual /path/to/manual
# solve nested languages (lang2lang switch)
RedirectMatch ^/manual/(?:en|de|ru|ja|pt-br)/(en|de|ru|ja|pt-br)(/.*)?$ \
    /manual/$1$2
<Location /manual/>
    AddHandler type-map .html
    [... general stuff ...]
</Location>
<Location ~ ^/+manual/+(en|de|ru|ja|pt-br)>
    RemoveHandler .html
    AddHandler type-map .var
    Options +MultiViews
</Location>
-----------
Having said all of that, the problem rolled around in my head, too. I think
it's worth presenting it here, too, to grab an opinion :). The result of my
thoughts was an extension of mod_negotiation. It would introduce a new
special variable, say "prefer-language", evaluated by mod_negotiation in
such a way that it would first try to serve the preferred language and, if
that's not possible, negotiate over all variants. (A manipulation similar
to what "no-gzip" and "gzip-only-text/html" do for mod_deflate.) That
solution would again rely on holding the different languages within the
same directory. The configuration would be similar to the one above, but we
wouldn't need so many files, for example:
manual/file1.meta.xml
manual/file1.html (generated type-map)
manual/file1.html.en
manual/file1.html.de
manual/file2.meta.xml
manual/file2.html (generated type-map)
manual/file2.html.en
...
httpd.conf:
# alias the manual directory (and virtual language dirs)
AliasMatch ^/manual(?:/(?:en|de|ru|ja|pt-br))?(/.*)? \
    /path/to/manual$1
# solve nested languages (lang2lang switch)
RedirectMatch ^/manual/(?:en|de|ru|ja|pt-br)/(en|de|ru|ja|pt-br)(/.*)? \
    /manual/$1$2
<Directory /path/to/manual>
    AddHandler type-map .html
    SetEnvIf Request_URI ^/manual/(en|de|ru|ja|pt-br)(?:/.*)? \
        prefer-language=$1
    <Files *.html.*>
        RemoveHandler .html
    </Files>
    [... general stuff ...]
</Directory>
The prefer-language feature should be easy to implement (afaics; I just
took a look into the mod_negotiation code). Capturing results within
SetEnvIf doesn't work at the moment either, but should also be no problem
to build in (and would be useful in general, too). I think I'm able to code
it up, but both changes probably have to be approved by some of the
developers.
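To make the intended semantics concrete, here is a small Python model of the proposed behaviour. This is only an illustration of the logic, not mod_negotiation code; the function and parameter names are invented:

```python
def negotiate(variants, accept, prefer=None):
    """Pick a language variant.

    variants: languages we actually have, e.g. ["en", "de"]
    accept:   (language, quality) pairs from Accept-Language
    prefer:   value of the proposed "prefer-language" variable
    """
    # the proposed extension: serve the preferred language outright
    # if a matching variant exists ...
    if prefer and prefer in variants:
        return prefer
    # ... otherwise fall back to ordinary negotiation over all
    # variants (highest quality value wins)
    for lang, _q in sorted(accept, key=lambda p: p[1], reverse=True):
        if lang in variants:
            return lang
    return None

# a German page exists, so prefer-language short-circuits negotiation
print(negotiate(["en", "de"], [("ja", 1.0), ("en", 0.7)], prefer="de"))
# no German page: negotiate normally over what is available
print(negotiate(["en"], [("ja", 1.0), ("en", 0.7)], prefer="de"))
```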
Sorry for the long post, I couldn't get it shorter ;-)
nd
--
Treat your password like your toothbrush. Don't let anybody else
use it, and get a new one every six months. -- Clifford Stoll
(found in ssl_engine_pphrase.c)
Selectable languages (was Re: PDF transforms)
Posted by Joshua Slive <jo...@slive.ca>.
On Sat, 11 Jan 2003, André Malo wrote:
> yep. As said before, the printer-friendly pdf files were more an exercise
> to become familiar with the xsl-fo stuff. My idea to assemble them into a
> big one is similar to Erik's. I've just thought to transform the fo-files
> first, then collect them via a script or a java task (which has to be
> written by someone with java knowledge ;-), put the names into an xml
> file and run another transformation over this file, which finally feeds
> fop. voila. (hopefully ;-)
Sounds good. +1 from me.
>
> > The available languages thing is also very nice. Can you be a little more
> > specific about what changes we need to make to have that work?
>
> Oh, I already was (some weeks ago). Seems, the posting disappeared in the
> noise ;-)
I haven't been paying much attention lately.
Some thoughts:
1. Having a perl script generate the metafiles is not a big deal. We
don't add or change files very often, so really the perl script can just
be used for the initial change and we can even do it by hand after that.
2. Having the xml docs reference the metafile is not nice, but I agree
probably unavoidable. Again, it is a one-time thing for each doc, so we
can live with it.
3. Once we have the metafiles, we can use some fancy xslt to generate
mod_negotiation type maps. This should be a real performance and
simplicity improvement.
4. Here's an idea to avoid the need for mod_rewrite, and thereby allow
easier distribution:
manual/file1.meta.xml
manual/file1.html (generated type-map)
manual/file2.meta.xml
manual/file2.html (generated type-map)
manual/en/file1.xml
manual/en/file1.html (generated html)
manual/en/file2.xml
manual/en/file2.html (generated html)
manual/de/file1.xml
manual/de/file1.html (generated html)
manual/de/file2.xml
manual/de/file2.html (generated html)
Now all references in the html are relative. By default, the url looks
like http://example.com/manual/file1.html which hits the typemap for
content-negotiation. A relative link "<a href="file2.html">" keeps
you in the type-map file directory. Now file1.xml contains auto-generated
links (generated by looking at file1.meta.xml) to each language
specific version (<a href="../en/file1.html"> and <a
href="../de/file1.html">). Once you are at the
http://example.com/de/file1.html, then all (relative) links keep you under
the de/ tree and no content negotiation occurs.
This, of course, requires a
AddType type-map .var
and a RemoveType in each language sub-directory.
Is this a good idea?
Joshua.
Re: PDF transforms
Posted by André Malo <nd...@perlig.de>.
* Joshua Slive wrote:
> This is very nice. BUT, I think the biggest gain from PDF (and what most
> users want) comes from having one big PDF file with the entire
> documentation. There is probably a way to do that by assembling the
> smaller ones with another program.
yep. As said before, the printer-friendly pdf files were more an exercise
to become familiar with the xsl-fo stuff. My idea to assemble them into a
big one is similar to Erik's. I've just thought to transform the fo-files
first, then collect them via a script or a java task (which has to be
written by someone with java knowledge ;-), put the names into an xml
file and run another transformation over this file, which finally feeds
fop. voila. (hopefully ;-)
> The available languages thing is also very nice. Can you be a little more
> specific about what changes we need to make to have that work?
Oh, I already was (some weeks ago). Seems, the posting disappeared in the
noise ;-)
However:
- We have to change the build system to create metafiles that contain the
available variants (languages, pdf files). This will happen
automatically. (But the metafiles have to be checked in, otherwise the
script has no chance to check the timestamp dependencies correctly.)
The perl script which maintains the metafiles could be rewritten in java,
so that we don't require perl for the build process. But again, this has
to be done by someone with java knowledge ;-)
- The metafiles have to be referenced anyway from within the xslt. There
are two possibilities to achieve this:
* reference the particular metafile in an attribute in the document's
root element (metafile="documentbasename.meta")
* inject the filename via ant into the transformation process.
The latter would initially be easier, but it has some drawbacks: we lose
the last remnant of browser compatibility, since the xslt then relies on
information that's only available if we run the transformation with ant.
Besides, it requires the <foreach> task.
Conclusion: I'd prefer the first variant. It seems cleaner
anyway.
- We have to set up the rewrite rules on daedalus. (That is independent of
the changes above.) IMHO the rewrite map file itself should go into
style/lang, so we can maintain it via CVS (but no errors may occur in that
file then, since they could break the whole daedalus apache...)
But we have to decide how to distribute the docs
within a release package. I'm reluctant to require mod_rewrite just for
viewing the docs locally, but currently I have no good idea how to solve
that problem.
(Note that there are some paths to adjust in the attached ruleset.)
- some minor changes in CSS and XSLT, to build in the links.
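The actual ruleset is attached to the original mail and not reproduced here; purely to illustrate the kind of setup meant, a hypothetical sketch (file names and map contents invented):

```apache
# hypothetical sketch only, not the attached ruleset:
# look up the language extension for a branch in a map file
# that is kept under CVS control in style/lang
RewriteEngine On
RewriteMap langmap txt:/path/to/style/lang/manual.map
RewriteRule ^/manual/(en|de|ru|ja|pt-br)(/.*\.html)$ \
    /manual$2.${langmap:$1|en} [L]
```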
I'm going to commit the necessary changes (I think to the 2.1 docs for now,
so that we can see how it works, and port them back later), if the
suggestions get acknowledged.
^^^ ? ;-)
nd
--
"Meanwhile, the basements of the Sempergalerie remain flooded
for statistical reasons." -- Spiegel Online
Re: PDF transforms
Posted by Astrid Keßler <ke...@kess-net.de>.
> [language cross linking and PDF]
>
> hmmmm. Maybe I'm too impatient. But I'm somehow unsettled.
> Nothing to say? Too complex stuff? Do you have questions?
> Vacation time?
I saw it growing and I like it, both the language cross links and the PDF.
I hope the language links will be online soon; they are very helpful. I'm
working with a local version containing them and I miss them online.
Also +1 for PDF. Personally I'm more interested in one big PDF file for
screen use as well as printing, containing document-internal and external
links and bookmarks. The per-document version is nice for printing a single
chapter. You noted some possible improvements yourself. Additionally there
is only one small aesthetic flaw: some footnote numbers are separated by a
large space from the text they belong to (because of the justified text).
> Sorry, I'm just askin' for some feedback (comments, flames, suggestions,
> ovations ;-), whatever). If there's something wrong with the work, don't
> hesitate to say it. I think I can bear it ;-)
I have to admit that I did not look very closely at the build process, but
at the result. The result is easy to use. I'm +1 to commit the stuff,
update docsformat.html (transformation process) and gather more experience
while using it.
Kess
Re: PDF transforms
Posted by André Malo <nd...@perlig.de>.
* André Malo wrote:
[language cross linking and PDF]
hmmmm. Maybe I'm too impatient. But I'm somehow unsettled.
Nothing to say? Too complex stuff? Do you have questions?
Vacation time?
Sorry, I'm just askin' for some feedback (comments, flames, suggestions,
ovations ;-), whatever). If there's something wrong with the work, don't
hesitate to say it. I think I can bear it ;-)
Thanks, nd
--
package Hacker::Perl::Another::Just;print
qq~@{[reverse split/::/ =>__PACKAGE__]}~;
# André Malo # http://www.perlig.de #
Re: PDF transforms
Posted by Erik Abele <er...@codefaktor.de>.
> From: Astrid Keßler <ke...@kess-net.de>
> Reply-To: docs@httpd.apache.org
> Date: Fri, 10 Jan 2003 10:01:23 +0100
> To: docs@httpd.apache.org
> Subject: Re: PDF transforms
>
>> We can easily do that with our current build system. We will just need to
>> render all the single XML docs into one big XML doc. Then we can transform
>> this one doc into XSL-FO and then into PDF.
>
> This is a nice idea. It has a lot of advantages for creating a big PDF
> file. It will make it easy to create a table of contents. But maybe we will
> run out of memory for some things which need a walk through the whole
> document. This is not only the toc but also the footnotes, if we keep them.
> We should try it.
>
Yes, for sure, this will be a memory-eater, but we will have to try... I
have done this before on some other (customer-related) projects and it went
quite well, though our docs tree is much more complicated.
We will definitely have to find a way around the current memory problems.
I haven't had enough time to look through your and nd's build.xml patches;
perhaps they will help to accomplish this task.
cheers,
Erik
Re: PDF transforms
Posted by Astrid Keßler <ke...@kess-net.de>.
> We can easily do that with our current build system. We will just need to
> render all the single XML docs into one big XML doc. Then we can transform
this one doc into XSL-FO and then into PDF.
This is a nice idea. It has a lot of advantages for creating a big PDF
file. It will make it easy to create a table of contents. But maybe we will
run out of memory for some things which need a walk through the whole
document. This is not only the toc but also the footnotes, if we keep them.
We should try it.
Kess
Re: PDF transforms (was: PDF transforms, was Re: Stop shipping XML)
Posted by Erik Abele <er...@codefaktor.de>.
Joshua Slive wrote:
> On Tue, 31 Dec 2002, André Malo wrote:
>
>>Whohoo!
>>ok, I think the current stuff is now applicable (but requires some further
>>work :)
>>You can get an impression at <http://test.perlig.de/manual/>.
>>
>>All of our XML source files got a PDF pendant _optimized for print_.
>
> This is very nice. BUT, I think the biggest gain from PDF (and what most
> users want) comes from having one big PDF file with the entire
> documentation. There is probably a way to do that by assembling the
> smaller ones with another program.
We can easily do that with our current build system. We will just need to render all the single XML docs into one big XML doc. Then we can transform this one doc into XSL-FO and then into PDF. Probably we can use sitemap.xml plus an easy XSL to wrap all the single docs into the big one. The transformation into XSL-FO/PDF should be similar to the current transformation of the single docs, except for a table of contents and the linkage in the big file, plus some other nice features...?
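As a rough sketch of that wrapping step (this assumes sitemap.xml lists the pages with some kind of href attribute; the real element names may differ):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- run over sitemap.xml and pull every referenced document
       into one big wrapper element for the later FO transform -->
  <xsl:template match="/">
    <manual-book>
      <xsl:for-each select="//page">
        <xsl:copy-of select="document(@href)"/>
      </xsl:for-each>
    </manual-book>
  </xsl:template>
</xsl:stylesheet>
```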
Andre, sorry for my delay, I haven't forgotten this one :-) I'm just out of time as always, but I promise to get at it at the weekend!
cheers,
erik
> The available languages thing is also very nice. Can you be a little more
> specific about what changes we need to make to have that work?
>
> Joshua.
>
Re: PDF transforms (was: PDF transforms, was Re: Stop shipping XML)
Posted by Joshua Slive <jo...@slive.ca>.
On Tue, 31 Dec 2002, André Malo wrote:
> Whohoo!
> ok, I think the current stuff is now applicable (but requires some further
> work :)
> You can get an impression at <http://test.perlig.de/manual/>.
>
> All of our XML source files got a PDF pendant _optimized for print_.
This is very nice. BUT, I think the biggest gain from PDF (and what most
users want) comes from having one big PDF file with the entire
documentation. There is probably a way to do that by assembling the
smaller ones with another program.
The available languages thing is also very nice. Can you be a little more
specific about what changes we need to make to have that work?
Joshua.
Re: PDF transforms (was: PDF transforms, was Re: Stop shipping XML)
Posted by André Malo <nd...@perlig.de>.
* Erik Abele wrote:
> André Malo wrote:
>> The first step was to learn xsl-fo and the limitations of fop (*sigh*).
>> The current stage consists of a pdf file per document, optimized for
>> print. There are just some final nits that I'm currently picking.
> Cool... I'm keen on seeing the first pages... I will have more time in the
> next days and would really like to help pick out some nits :)
<snip>
Whohoo!
ok, I think the current stuff is now applicable (but requires some further
work :)
You can get an impression at <http://test.perlig.de/manual/>.
All of our XML source files got a PDF pendant _optimized for print_.
The PDF files don't contain any clickable links or other online reading
stuff. The layout is more or less obtained from the manual-print.css with
some enhancements that are not possible with pure CSS.
Instead of making links clickable, which is not useful for printing ;-), I
decided to extract the relevant URLs from the particular href attributes
and put them there as footnotes (issue 4, see below).
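A rough XSLT fragment of that extraction idea (namespace declarations omitted; the template name and source element names are assumptions, not the actual stylesheet):

```xml
<!-- sketch: number every link and emit its target URL in an
     endnote list at the end of the document -->
<xsl:template name="endnotes">
  <fo:block font-size="8pt" space-before="1em">
    <xsl:for-each select="//a[@href]">
      <fo:block>
        <xsl:number level="any" count="a[@href]"/>
        <xsl:text>. </xsl:text>
        <xsl:value-of select="@href"/>
      </fo:block>
    </xsl:for-each>
  </fo:block>
</xsl:template>
```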
However, the PDF stuff has a lot of implications:
1) Including appropriate links into the corresponding HTML files requires
the metafiles I also proposed for the language links (we need the filename
of the pdf). (I combined both on the example page, of course ;-)
2) Extracting the hrefs mangles the given URLs and makes them absolute.
This requires knowledge of the current path (from the view of the
document). Also solved by the metafiles.
3) For non-latin scripts we cannot use the standard PDF fonts; we have to
embed others. My generated PDFs currently use Unicode Times and Courier
from my Win2k installation for Russian PDFs. For Japanese I'm currently
using MS Mincho from the Japanese language pack, but I wasn't able to use
bold or italic variants. I also don't know which monospace font is
applicable for Japanese. I hope I'll get some hints here :)
Font embedding in the current variant also has some general drawbacks:
- You cannot copy & paste from the non-latin pdfs, since no characters are
stored, only *references to glyphs*. This has to be solved anyway for an
all-in-one pdf.
- I'm not sure about license issues. The TTFReader of fop says "no
restrictions", but who knows? It would be better in general, I think, to
use some free fonts that we can put into CVS or so.
- The build system is currently somewhat specialized, since fop has some
serious bugs with path names etc. (I needed the latest beta to make it
work at all! *sigh*)
The whole pdf build system needs some cleanup.
4) Footnote support in fop is buggy (it sometimes produces notes
overlapping with regular content etc.), so I decided to put them into an
extra section which appears last in the document. (Look at a sample pdf
file if you don't understand what I mean.)
5) Table support is limited. fop doesn't support automatic table layout, so
we have to manage that manually. Not such a problem, I think, since once
created, the table layout file will seldom be touched.
I put the table definitions of all xml files into one file per language.
6) fop doesn't support a lot of useful things, keep-conditions etc. But I
can live with that until it's implemented. (For example, sometimes headings
appear at the bottom of one page and the text follows on the subsequent
page...)
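For point 5, a minimal example of what such a manual table definition amounts to (the column widths and cell contents here are made up):

```xml
<!-- without automatic table layout, every column width has to be
     given explicitly via table-layout="fixed" -->
<fo:table table-layout="fixed" width="100%">
  <fo:table-column column-width="30%"/>
  <fo:table-column column-width="70%"/>
  <fo:table-body>
    <fo:table-row>
      <fo:table-cell><fo:block>Directive</fo:block></fo:table-cell>
      <fo:table-cell><fo:block>Description</fo:block></fo:table-cell>
    </fo:table-row>
  </fo:table-body>
</fo:table>
```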
However, I think I'm missing a lot of stuff in this description that I
can't remember now; I will post it later then ;-)
Comments, help and questions are welcome :)
The xsl stuff can be found at <http://test.perlig.de/manual/style/pdf/>,
the build stuff at <http://test.perlig.de/manual/build/> (including the fop
directory).
wishing you all a happy new year, etc.
nd
--
Treat your password like your toothbrush. Don't let anybody else
use it, and get a new one every six months. -- Clifford Stoll
(found in ssl_engine_pphrase.c)