Posted to users@pdfbox.apache.org by Andreas Lehmkuehler <an...@lehmi.de> on 2012/03/01 20:12:21 UTC
Re: Help needed to resolve issue with converting Arabic characters to presentation forms
Hi,
On 29.02.2012 09:49, Hamed Iravanchi wrote:
> Hi Andreas,
>
> Regarding the glyph-drawing issue, since I didn't hear anything from you I
> decided to take a shot myself, so I checked out the code (1.6 release tag)
> and started modifying it to see if I can get the result I expect, but I am
> confused and need help :)
Sorry, but I haven't had any free cycles in the last week.
> I managed to convert the sample PDF that I provided to an image correctly,
> but I broke almost everything else! Here's what I did:
>
> I added a "drawGlyph" method to PDFont, next to "drawString", like this:
>
>     public abstract void drawString(String string, Graphics g, float fontSize,
>             AffineTransform at, float x, float y) throws IOException;
>
>     public abstract void drawGlyph(int[] codeString, Graphics g, float fontSize,
>             AffineTransform at, float x, float y) throws IOException;
>
> I tried to use the codes extracted from the page stream. In PDFStreamEngine
> -> processEncodedText -> for loop, when "font.encode" succeeds I keep the
> same integer code for drawing glyphs: I pass it along with the string to
> "processTextPosition" and call "drawGlyph" there instead of "drawString".
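The loop described above can be modelled in isolation. This is a minimal sketch, not PDFBox's `processEncodedText`: `CodeSplitter`, `DecodedCode` and `CodeLookup` are hypothetical names, and the lookup merely stands in for `font.encode`. It assumes a 1-byte code is tried first and a 2-byte big-endian code only when that fails, and it keeps the raw code next to the decoded text so the same integer can later be handed to a glyph-drawing method.

```java
import java.util.ArrayList;
import java.util.List;

/** One unit from a content-stream string: the raw code plus its decoded text. */
final class DecodedCode {
    final int code;       // raw code as read from the page stream
    final int length;     // bytes consumed: 1 or 2
    final String unicode; // decoded text, or null if the lookup failed

    DecodedCode(int code, int length, String unicode) {
        this.code = code;
        this.length = length;
        this.unicode = unicode;
    }
}

final class CodeSplitter {
    /** Hypothetical stand-in for font.encode: text for a code of the given byte length, or null. */
    interface CodeLookup {
        String lookup(int code, int length);
    }

    static List<DecodedCode> split(byte[] bytes, CodeLookup lookup) {
        List<DecodedCode> result = new ArrayList<>();
        int i = 0;
        while (i < bytes.length) {
            int length = 1;
            int code = bytes[i] & 0xFF;
            String text = lookup.lookup(code, length);
            if (text == null && i + 1 < bytes.length) {
                // maybe a multi-byte encoding: retry with two bytes, high byte first
                length = 2;
                code = (code << 8) | (bytes[i + 1] & 0xFF);
                text = lookup.lookup(code, length);
            }
            // keep the raw code even if no text was found, so glyph drawing can still use it
            result.add(new DecodedCode(code, length, text));
            i += length;
        }
        return result;
    }
}
```

Keeping `code` alongside `unicode` is the point of the design: the text is what extraction needs, while the raw code is what a glyph-based renderer needs.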
>
> Here's the drawGlyph code that I wrote, according to your guidance:
>
>     @Override
>     public void drawGlyph(int[] codeString, Graphics g, float fontSize,
>             AffineTransform at, float x, float y) throws IOException
>     {
>         Font awtFont = getawtFont();
>         Graphics2D g2d = (Graphics2D) g;
>         g2d.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
>                 RenderingHints.VALUE_ANTIALIAS_ON);
>         writeFont(g2d, at, awtFont, x, y, codeString);
>     }
>
>
> Which uses an overload of writeFont similar to the original:
>
>
>     protected void writeFont(final Graphics2D g2d, final AffineTransform at,
>             final Font awtFont, final float x, final float y, final int[] codeString)
>     {
>         FontRenderContext frc = new FontRenderContext(null, true, true);
>
>         // check if we have a rotation
>         if (!at.isIdentity())
>         {
>             try
>             {
>                 AffineTransform atInv = at.createInverse();
>                 // only apply the size of the transform; the rotation is realized by
>                 // rotating the graphics, otherwise HP printers will not render the font
>                 Font derivedFont = awtFont.deriveFont(1f);
>                 g2d.setFont(derivedFont);
>
>                 // codeString is passed as glyph codes, not as characters
>                 GlyphVector glyphs = derivedFont.createGlyphVector(frc, codeString);
>
>                 // apply the transformation to the graphics instead of to the text
>                 g2d.transform(at);
>                 // map the origin through the inverse so it still lands at (x, y)
>                 Point2D.Float newXy = new Point2D.Float(x, y);
>                 atInv.transform(new Point2D.Float(x, y), newXy);
>                 g2d.drawGlyphVector(glyphs, (float) newXy.getX(), (float) newXy.getY());
>
>                 // restore the original transformation
>                 g2d.transform(atInv);
>             }
>             catch (NoninvertibleTransformException e)
>             {
>                 log.error("Error in " + getClass().getName() + ".writeFont", e);
>             }
>         }
>         else
>         {
>             Font derivedFont = awtFont.deriveFont(at);
>             g2d.setFont(derivedFont);
>
>             GlyphVector glyphs = derivedFont.createGlyphVector(frc, codeString);
>             g2d.drawGlyphVector(glyphs, x, y);
>         }
>     }
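Why the rotated branch draws at `atInv` applied to `(x, y)`: once `g2d.transform(at)` is in effect, drawing at a point p puts it at at(p) in the original space, so choosing p = atInv(x, y) puts the text origin back at (x, y). The sketch below isolates that calculation using `java.awt.geom`; `RotatedGlyphs` and its methods are illustrative names, not PDFBox API. As a design choice not taken by the quoted code, it snapshots the transform with `getTransform()` and restores it with `setTransform()` instead of calling `g2d.transform(atInv)`, so the caller's transform comes back exactly rather than through a second floating-point multiplication.

```java
import java.awt.Graphics2D;
import java.awt.font.GlyphVector;
import java.awt.geom.AffineTransform;
import java.awt.geom.NoninvertibleTransformException;
import java.awt.geom.Point2D;

final class RotatedGlyphs {
    /**
     * Point to pass to drawGlyphVector after g2d.transform(at), chosen so that the
     * glyph origin still lands at (x, y) in the untransformed space.
     * Returns null if 'at' cannot be inverted (the quoted code logs and skips then).
     */
    static Point2D.Float originInTransformedSpace(AffineTransform at, float x, float y) {
        try {
            Point2D.Float p = new Point2D.Float();
            at.createInverse().transform(new Point2D.Float(x, y), p);
            return p;
        } catch (NoninvertibleTransformException e) {
            return null;
        }
    }

    /** Draws the glyphs under 'at', then restores the caller's transform exactly. */
    static void draw(Graphics2D g2d, GlyphVector glyphs, AffineTransform at, float x, float y) {
        Point2D.Float p = originInTransformedSpace(at, x, y);
        if (p == null) {
            return; // singular transform: nothing sensible to draw
        }
        AffineTransform saved = g2d.getTransform(); // snapshot instead of re-applying the inverse
        try {
            g2d.transform(at);
            g2d.drawGlyphVector(glyphs, p.x, p.y);
        } finally {
            g2d.setTransform(saved);
        }
    }
}
```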
>
> Well, that made everything work for the sample PDF that I was working on.
> But then I realized that it only works because the glyph codes in the font
> happen to be equal to the codes used in the page stream.
>
> For example, a simple English PDF has no "toUnicode" table and the page
> stream uses the ordinary character codes, but the glyph codes in the font
> are different.
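A simple TrueType font shows why the two sets of codes differ: the code in the page stream is not a glyph ID but a key that is looked up in one of the font's own `cmap` subtables. For a symbolic font with a (3,0) subtable, the PDF specification (ISO 32000-1, 9.6.6.4) has each single byte prefixed with the high byte of the subtable's range (0x00, 0xF0, 0xF1 or 0xF2) before the lookup. A minimal sketch, assuming the subtable has already been parsed into a `Map` from character code to glyph ID; the parsing and the `SymbolicCmap` name are hypothetical.

```java
import java.util.Map;

final class SymbolicCmap {
    // High bytes of the code ranges a (3,0) subtable may use, in the order tried here.
    private static final int[] HIGH_BYTES = {0x00, 0xF0, 0xF1, 0xF2};

    /**
     * Maps a one-byte code from the page stream to a glyph ID via a parsed (3,0) cmap.
     * Returns 0 (.notdef) when no candidate code is present.
     */
    static int glyphIdFor(int code, Map<Integer, Integer> cmap30) {
        for (int high : HIGH_BYTES) {
            Integer gid = cmap30.get((high << 8) | (code & 0xFF));
            if (gid != null) {
                return gid;
            }
        }
        return 0;
    }
}
```

With a toy cmap containing only `0xF041 -> 36`, stream code 0x41 reaches glyph 36, the same sort of gap between stream code and glyph ID described above. A non-symbolic font would instead go through the PDF encoding and a (3,1) Unicode subtable; that path is not shown.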
>
> In another PDF (which is RTL and uses connected characters) the code
> sequence in the page stream starts at 1 (like 1, 2, 3, 4, 5, 3, 6, ...),
> there is no "toUnicode" in it either, and the glyph codes in the font are
> different from those codes; I couldn't find any relation between the two.
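One mapping that can make stream codes look unrelated to a font's glyph IDs is a Type 0 font whose descendant CIDFontType2 carries a `CIDToGIDMap` stream. With an `Identity-H` encoding the two-byte code is the CID, and per the PDF specification the glyph index for CID c is the big-endian 16-bit value stored at bytes 2c and 2c+1 of that stream (a `/Identity` map instead means GID = CID). Whether the RTL file above uses this mechanism is an assumption; `CidToGid` is an illustrative name, and the stream is assumed to be already decoded into bytes.

```java
final class CidToGid {
    /** GID for 'cid' from a decoded CIDToGIDMap stream; 0 (.notdef) if the CID is out of range. */
    static int lookup(byte[] cidToGidMap, int cid) {
        long offset = 2L * cid; // two bytes per CID, high byte first
        if (cid < 0 || offset + 1 >= cidToGidMap.length) {
            return 0;
        }
        int i = (int) offset;
        return ((cidToGidMap[i] & 0xFF) << 8) | (cidToGidMap[i + 1] & 0xFF);
    }

    /** Maps CIDs to glyph IDs, e.g. as input for Font.createGlyphVector(FontRenderContext, int[]). */
    static int[] toGlyphIds(byte[] cidToGidMap, int[] cids) {
        int[] gids = new int[cids.length];
        for (int k = 0; k < cids.length; k++) {
            gids[k] = lookup(cidToGidMap, cids[k]);
        }
        return gids;
    }
}
```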
>
> So I don't know how to decide when to draw the characters using glyphs and
> when using the extracted text (string). Or is there a way to convert
> everything to glyph codes and draw all the text using glyphs?
There are a lot of different ways to encode the text/glyph mapping, as you
have already found out. ;-) I'm afraid it's too much to write down here.
I'm almost done, but I have to get rid of some unwanted side effects. I hope to
find some time at the weekend to finish my work.
> BTW, in your sample code to draw glyphs (quoted below) there's a
> "CIDstring" which I didn't understand; I thought it might have something
> to do with my current trouble.
The CIDstring in my example contains the codes for the glyphs and not the
readable text.
> Thanks in advance,
> -Hamed
>
>
> On Sat, Feb 18, 2012 at 10:58 PM, Andreas Lehmkuehler <an...@lehmi.de> wrote:
>
>> Hi,
>>
>> On 18.02.2012 18:52, Hamed Iravanchi wrote:
>>
>>> Hi again.
>>>
>>> Thanks for ur attention to the issue.
>>> I actually checked, and saw that the font itself (ttf stream) contains the
>>> correct cmap. If we can draw the text using glyph IDs instead of
>>> characters, the font knows the right characters to draw.
>>>
>>> I checked the Font class instance in the debugger; it contains a cmap
>>> which is exactly right. At first I was looking for ways to take the mapping
>>> from the font (but it is a private member, specific to the Sun implementation).
>>>
>>> But I realized we could ask the font to draw glyphs instead of characters.
>>> However, I still couldn't find the right way to draw a glyph on a Graphics.
>>>
>> That's exactly what I'm doing. It looks roughly like the following:
>>
>> Create the needed glyphs:
>>
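>> FontRenderContext frc = new FontRenderContext(null, true, true);
>> int stringLength = CIDstring.length();
>> int[] glyphCodes = new int[stringLength];
>> for (int i = 0; i < stringLength; i++)
>> {
>>     // each char of CIDstring is a 16-bit glyph code; charAt avoids merging
>>     // chars that fall in the surrogate range, as codePointAt would
>>     glyphCodes[i] = CIDstring.charAt(i);
>> }
>> GlyphVector glyphs = awtFont.createGlyphVector(frc, glyphCodes);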
>>
>> ...
>>
>> Draw the glyphs:
>>
>> g2d.drawGlyphVector(glyphs, x, y);
>>
>>
>>> BTW, I can also do the implementation and send u a patch once I realize
>>> what to do. Thanks for ur encouragement :-)
>>>
>> Thanks for the offer, I'm already on it; I just have to clean up the
>> code and run some tests to avoid unwanted side effects.
>> Once my code is available you might want to double-check it.
>>
>>
>>> - Hamed
>>> On Feb 18, 2012 7:05 PM, "Andreas Lehmkuehler"<an...@lehmi.de> wrote:
>>>
>>> <SNIP>
>>
>> BR
>> Andreas Lehmkühler
BR
Andreas Lehmkühler
Re: Help needed to resolve issue with converting Arabic characters to presentation forms
Posted by Hamed Iravanchi <ir...@gmail.com>.
Hi,
I saw you updated the issue in JIRA, so I downloaded the trunk code
and tested it.
I confirm that the case that I was investigating is now fixed, and it
is converted to image correctly. Thanks a lot.
I tested it with some other PDF files. Most of them worked well, but I
found a few that don't and show a similar problem, most notably PDF files
created with OpenOffice.org Writer. I also found ordinary (English) PDF
files that don't work either.
I extracted a page from each of them and I'm attaching them to this
email. I didn't comment on the JIRA issue because I wasn't sure whether
they are related to the same issue. I haven't tried to debug any of
these files yet; I'll keep you posted if I manage to analyse them.
Files attached to this email:
j.pdf: Farsi PDF file created by OpenOffice.org
k.pdf: Another Farsi PDF file, created by Jaws PDF Creator
l.pdf: A page from an English e-book, created by PDF-XChange
Note: the previous sample that I sent (which works correctly now) was
created by Microsoft Word.
Thanks again for all your efforts,
-Hamed