Posted to users@pdfbox.apache.org by Thomas Fischer <fi...@aon.at> on 2010/04/16 19:50:40 UTC

pdfbox 1.1.0 & -sort

Hello,

I'm testing pdfbox 1.1.0 on my MacBook with some TeX-generated PDF files and am trying to determine whether the "-sort" parameter improves the result.
Here are the results for one particular test file; I would be interested in hints on how to improve the conversion.

I get

´E. Cartan	w/o sort and
E´. Cartan	with sort.

The latter is better (Unicode combining diacritics are written after the letter), but still not correct: "É. Cartan" would be needed, which is what the PDF viewer shows.
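If the sorted output is kept, a small post-processing step can turn "E´" into "É". Below is a minimal Java sketch using only the JDK's java.text.Normalizer. It assumes the extracted accent is the spacing character U+00B4 (ACUTE ACCENT) sitting right after its letter: the accent is replaced by the combining mark U+0301, and the string is NFC-normalized, which composes E + U+0301 into the precomposed É (U+00C9). The class name SortedAccentFix is made up for illustration. The heuristic only helps when sorting leaves the accent directly next to its letter.

```java
import java.text.Normalizer;
import java.util.Map;

// Hypothetical helper (name made up for illustration): repairs "-sort" output
// in which a spacing accent directly follows its letter, e.g. "E" + U+00B4 -> U+00C9.
final class SortedAccentFix {
    // Spacing accent as extracted -> combining mark that Unicode places after
    // the base letter. Only the acute accent occurs in this post; the
    // diaeresis entry is an assumption.
    private static final Map<Character, Character> COMBINING = Map.of(
            '\u00B4', '\u0301',  // ACUTE ACCENT -> COMBINING ACUTE ACCENT
            '\u00A8', '\u0308'); // DIAERESIS -> COMBINING DIAERESIS

    static String fix(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            Character mark = COMBINING.get(c);
            boolean afterLetter = out.length() > 0
                    && Character.isLetter(out.charAt(out.length() - 1));
            if (mark != null && afterLetter) {
                out.append(mark.charValue()); // accent joins the preceding letter
            } else {
                out.append(c);                // everything else is copied unchanged
            }
        }
        // NFC composes letter + combining mark into one precomposed character.
        return Normalizer.normalize(out, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        System.out.println(fix("E\u00B4. Cartan"));
    }
}
```

An accent that follows punctuation or a space (as in "E.´ :") is left alone, since there is no letter for it to attach to.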

In some cases, sorting gives a fairly nice result:

lim sup ζ(c, t) = 0.
t→0 c∈C

as opposed to the output without sort:

lim
t→0
µ
sup
c∈C
ζ(c, t)
¶
= 0.
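For comparison, the formula that both versions approximate is presumably the following. This reading is an assumption: it takes µ and ¶ to be the large parentheses from TeX's math extension font (cmex10) as they come out of the extraction, and it takes "t→0" and "c∈C" as the limits under lim and sup. The sorted excerpt contains no trace of the parentheses.

```latex
\lim_{t \to 0} \biggl( \sup_{c \in C} \zeta(c, t) \biggr) = 0.
```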

But in other cases, sorting degrades the conversion.
With sort:

Let ξ denote the normalized
Haar measure onZ K ×K. ThenZforZall f ∈ Cc(G)
f(g)dµ(g) = f(kah)dξ(k, h)dη(a).
G X K×K

W/o sort:

Let ξ denote the normalized
Haar measure on K ×K. Then for all f ∈ Cc(G)Z
G
f(g)dµ(g) =
Z
X
Z
K×K
f(kah)dξ(k, h)dη(a).

The "ThenZforZall" is definitely a problem.
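The stray Z characters are most likely the display-size integral signs: in TeX's math extension font (cmex10) that glyph occupies the slot of the letter Z, and because it is taller than a text line, sorting by position can merge it into the line above. Under that assumption, and reading G, X and K×K as the lower limits of the three integrals, the intended formula is presumably:

```latex
\int_G f(g)\, d\mu(g) = \int_X \int_{K \times K} f(kah)\, d\xi(k, h)\, d\eta(a).
```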

Finally, there are additional problems with accents:

with sort:
[1] Cartan, E.´ : Sur certaines formes riemanniennes remarquables des
géométries à groupe fondamental simple. Annales Sci. Ecol´ e Norm. Sup.

w/o sort:
[1] Cartan, ´E.: Sur certaines formes riemanniennes remarquables des
géométries à groupe fondamental simple. Annales Sci. ´Ecole Norm. Sup.

Here, sorting separates the accent '´' from the letter it belongs to, which rules out any further post-processing with a script.
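Without -sort, the accent consistently lands immediately before its letter ("´E.", "´Ecole"), and that is what makes a script feasible. A minimal Java sketch, using only the JDK's java.text.Normalizer: each spacing accent followed by a letter is moved behind that letter as the corresponding combining mark, and NFC normalization then composes the pair, so "´E" becomes "É". The class name UnsortedAccentFix is hypothetical, and the accent table covers only the acute accent seen here plus an assumed diaeresis entry.

```java
import java.text.Normalizer;
import java.util.Map;

// Hypothetical helper (name made up for illustration): repairs output produced
// without "-sort", where a spacing accent directly precedes its letter,
// e.g. U+00B4 + "E" -> U+00C9.
final class UnsortedAccentFix {
    // Spacing accent as extracted -> combining mark. Only the acute accent
    // is observed in this post; the diaeresis entry is an assumption.
    private static final Map<Character, Character> COMBINING = Map.of(
            '\u00B4', '\u0301',  // ACUTE ACCENT -> COMBINING ACUTE ACCENT
            '\u00A8', '\u0308'); // DIAERESIS -> COMBINING DIAERESIS

    static String fix(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            Character mark = COMBINING.get(c);
            if (mark != null && i + 1 < text.length()
                    && Character.isLetter(text.charAt(i + 1))) {
                // Emit the letter first, then the accent as a combining mark.
                out.append(text.charAt(i + 1)).append(mark.charValue());
                i += 2;
            } else {
                out.append(c); // not an accent+letter pair: copy unchanged
                i++;
            }
        }
        // NFC composes letter + combining mark into one precomposed character.
        return Normalizer.normalize(out, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String line = "[1] Cartan, \u00B4E.: Sur certaines formes riemanniennes remarquables des "
                + "g\u00E9om\u00E9tries \u00E0 groupe fondamental simple. Annales Sci. \u00B4Ecole Norm. Sup.";
        System.out.println(fix(line));
    }
}
```

On the sorted line, by contrast, "Ecol´ e" is left untouched because the accent is followed by a space. Once sorting has moved the accent away from its letter, a character-level script has no reliable way to tell which letter it belongs to.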

The PDF viewer shows: