Posted to users@pdfbox.apache.org by Andreas Lehmkuehler <an...@lehmi.de> on 2012/03/01 20:12:21 UTC

Re: Help needed to resolve issue with converting Arabic characters to presentation forms

Hi,

Am 29.02.2012 09:49, schrieb Hamed Iravanchi:
> Hi Andreas,
>
> Regarding the glyph-drawing issue: since I hadn't heard anything from you, I
> decided to take a shot myself, so I checked out the code (1.6 release tag)
> and started modifying it to see if I could get the result I expected, but I am
> confused and need help :)
Sorry, but I haven't had any free cycles in the last week....

> I managed to convert the sample PDF I provided to an image correctly, but
> almost everything else came out corrupt! Here's what I did:
>
> I added a "drawGlyph" method to PDFont, next to "drawString", like this:
>
>     public abstract void drawString( String string, Graphics g, float fontSize,
>         AffineTransform at, float x, float y ) throws IOException;
>
>     public abstract void drawGlyph(int[] codeString, Graphics g, float fontSize,
>         AffineTransform at, float x, float y) throws IOException;
>
> I tried to use the codes extracted from the page stream. In PDFStreamEngine
> -> processEncodedText -> the for loop, when "font.encode" succeeds, I keep
> the same integer code for drawing glyphs: I pass it along with the string to
> "processTextPosition" and call "drawGlyph" there instead of "drawString".
>
> Here's the drawGlyph code that I wrote, according to your guidance:
>
>     @Override
>     public void drawGlyph(int[] codeString, Graphics g, float fontSize,
>             AffineTransform at, float x, float y) throws IOException
>     {
>         Font _awtFont = getawtFont();
>         Graphics2D g2d = (Graphics2D)g;
>         g2d.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
>                 RenderingHints.VALUE_ANTIALIAS_ON);
>         writeFont(g2d, at, _awtFont, x, y, codeString);
>     }
>
>
> It uses an overload of writeFont similar to the original one:
>
>
>     protected void writeFont(final Graphics2D g2d, final AffineTransform at,
>             final Font awtFont, final float x, final float y, final int[] codeString)
>     {
>         FontRenderContext frc = new FontRenderContext(null, true, true);
>
>         // check if we have a rotation
>         if (!at.isIdentity())
>         {
>             try
>             {
>                 AffineTransform atInv = at.createInverse();
>                 // only apply the size here; the rotation is realized by rotating
>                 // the graphics, otherwise HP printers will not render the font
>                 Font derivedFont = awtFont.deriveFont(1f);
>                 g2d.setFont(derivedFont);
>
>                 GlyphVector glyphs = derivedFont.createGlyphVector(frc, codeString);
>
>                 // apply the transformation to the graphics and draw at the
>                 // inverse-transformed position, which should be the same as
>                 // applying the transformation to the text itself
>                 g2d.transform(at);
>                 // translate the coordinates
>                 Point2D.Float newXy = new Point2D.Float(x, y);
>                 atInv.transform(new Point2D.Float(x, y), newXy);
>                 g2d.drawGlyphVector(glyphs, (float)newXy.getX(), (float)newXy.getY());
>
>                 // restore the original transformation
>                 g2d.transform(atInv);
>             }
>             catch (NoninvertibleTransformException e)
>             {
>                 log.error("Error in " + getClass().getName() + ".writeFont", e);
>             }
>         }
>         else
>         {
>             Font derivedFont = awtFont.deriveFont(at);
>             g2d.setFont(derivedFont);
>
>             GlyphVector glyphs = derivedFont.createGlyphVector(frc, codeString);
>             g2d.drawGlyphVector(glyphs, x, y);
>         }
>     }
>
> Well, that made everything work for the sample PDF I was working on.
> But then I realized that it only worked because the glyph codes in the font
> are equal to the codes used in the page stream.
>
> For example, a simple English PDF has no "toUnicode" table and uses plain
> character codes in the page stream, but the glyph codes in the font are
> different.
>
> In another PDF (which is RTL and uses connected characters), the code
> sequence in the page stream starts from 1 (like 1, 2, 3, 4, 5, 3, 6, ...),
> but there is no "toUnicode" in it, the glyph codes in the font differ from
> those codes, and I couldn't find any relation between the two.
>
> So I don't know how to decide when to use glyph codes and when to use
> the extracted text (string) to draw the characters. Or is there a way to
> convert everything to glyph codes and draw all the text using glyphs?
There are many different ways to encode the text/glyph mapping, as you have
already found out ;-) I'm afraid it's too much to write down here.

I'm almost done, but I have to get rid of some unwanted side-effects. I hope to 
find some time at the weekend to finish my work.
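For illustration, here is a minimal plain Java 2D sketch of the two routes (not PDFBox code; the class `GlyphRoutes` and its methods are hypothetical). `Font.createGlyphVector(FontRenderContext, String)` maps each character to a glyph through the font's own cmap, while the `createGlyphVector(FontRenderContext, int[])` overload takes the ints as glyph codes and performs no lookup. That is why passing page-stream codes straight to the `int[]` overload only works when the PDF's encoding happens to produce the font's glyph IDs, as with the first sample discussed above.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;

public class GlyphRoutes {
    // Unicode route: AWT maps each character to a glyph via the font's cmap.
    static int[] glyphCodesForText(Font font, FontRenderContext frc, String text) {
        GlyphVector gv = font.createGlyphVector(frc, text);
        return gv.getGlyphCodes(0, gv.getNumGlyphs(), null);
    }

    // Glyph route: the ints are used as glyph codes directly, no cmap lookup.
    static GlyphVector glyphVectorForCodes(Font font, FontRenderContext frc, int[] glyphCodes) {
        return font.createGlyphVector(frc, glyphCodes);
    }

    public static void main(String[] args) {
        Font font = new Font(Font.SERIF, Font.PLAIN, 12);
        FontRenderContext frc = new FontRenderContext(null, true, true);
        String text = "Hello";

        int[] viaCmap = glyphCodesForText(font, frc, text);
        GlyphVector direct = glyphVectorForCodes(font, frc, viaCmap);

        // Character code vs. the glyph code the cmap assigned to it.
        for (int i = 0; i < text.length(); i++) {
            System.out.println("char " + (int) text.charAt(i)
                    + " -> glyph " + viaCmap[i]
                    + " (direct: " + direct.getGlyphCode(i) + ")");
        }
    }
}
```

The exact glyph numbers printed depend on which physical font backs the `Serif` logical font on a given system; only the relationship between the two routes is the point here.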

> BTW, in your sample code for drawing glyphs (quoted below) there's a
> "CIDstring" that I didn't understand, and I thought it might have something
> to do with my current trouble.
The CIDstring in my example contains the codes for the glyphs and not the 
readable text.
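A minimal sketch of the same idea as the CIDstring snippet quoted below, assuming the string holds one 16-bit glyph code per `char` (the helper names `toGlyphCodes` and `drawGlyphCodes` are hypothetical). It reads each value with `charAt` rather than `codePointAt`: if two consecutive glyph codes happened to fall into the surrogate ranges, `codePointAt` would merge them into one supplementary code point, whereas `charAt` keeps every 16-bit value as-is.

```java
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.font.GlyphVector;
import java.awt.image.BufferedImage;

public class CidGlyphs {
    // Each char of cidString is taken as one glyph code, not as readable text.
    static int[] toGlyphCodes(String cidString) {
        int[] codes = new int[cidString.length()];
        for (int i = 0; i < codes.length; i++) {
            codes[i] = cidString.charAt(i);
        }
        return codes;
    }

    // Draws the glyph codes at (x, y) without any character-to-glyph lookup.
    static GlyphVector drawGlyphCodes(Graphics2D g2d, Font font, int[] codes, float x, float y) {
        GlyphVector glyphs = font.createGlyphVector(g2d.getFontRenderContext(), codes);
        g2d.drawGlyphVector(glyphs, x, y);
        return glyphs;
    }

    public static void main(String[] args) {
        BufferedImage image = new BufferedImage(200, 50, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g2d = image.createGraphics();
        g2d.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
        try {
            Font font = new Font(Font.SERIF, Font.PLAIN, 24);
            // Hypothetical glyph IDs 1, 2, 3 stored as chars.
            int[] codes = toGlyphCodes("\u0001\u0002\u0003");
            GlyphVector gv = drawGlyphCodes(g2d, font, codes, 10f, 35f);
            System.out.println("drew " + gv.getNumGlyphs() + " glyphs");
        } finally {
            g2d.dispose();
        }
    }
}
```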

> Thanks in advance,
> -Hamed
>
>
> On Sat, Feb 18, 2012 at 10:58 PM, Andreas Lehmkuehler <an...@lehmi.de> wrote:
>
>> Hi,
>>
>> Am 18.02.2012 18:52, schrieb Hamed Iravanchi:
>>
>>> Hi again.
>>>
>>> Thanks for your attention to the issue.
>>> I actually checked and saw that the font itself (TTF stream) contains the
>>> correct cmap. If we draw the text using glyph IDs instead of characters,
>>> the font knows the right characters to draw.
>>>
>>> I checked the Font class instance in the debugger; it contains a cmap
>>> that is exactly right. At first I was looking for ways to take the mapping
>>> from the font (it is a private member, specific to the Sun implementation).
>>>
>>> But then I realized we could ask the font to draw glyphs instead of
>>> characters. However, I still couldn't find the right way to draw a glyph
>>> on a Graphics object.
>>>
>> That's exactly what I'm doing. It looks roughly like the following:
>>
>> Create the needed glyphs:
>>
>> FontRenderContext frc = new FontRenderContext(null, true, true);
>> int stringLength = CIDstring.length();
>> int[] codePoints = new int[stringLength];
>> for (int i=0;i<stringLength;i++)
>>    codePoints[i] = CIDstring.codePointAt(i);
>> GlyphVector glyphs = awtFont.createGlyphVector(frc, codePoints);
>>
>> ...
>>
>> Draw the glyphs:
>>
>> g2d.drawGlyphVector(glyphs, x, y);
>>
>>
>>> BTW, I can also do the implementation and send you a patch once I figure
>>> out what to do. Thanks for your encouragement :-)
>>>
>> Thanks for the offer, but I'm already on it; I just have to clean up the
>> code and run some tests to avoid unwanted side effects.
>> Once my code is available, you might want to double-check it.
>>
>>
>>> - Hamed
>>>
>>> On Feb 18, 2012 7:05 PM, "Andreas Lehmkuehler" <an...@lehmi.de> wrote:
>>>
>>> <SNIP>
>>
>> BR
>> Andreas Lehmkühler

BR
Andreas Lehmkühler
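The rotation branch of the writeFont overload quoted above applies `at` to the Graphics2D, draws at the inverse-transformed position, and then undoes the change by applying `atInv`. Below is a minimal plain Java 2D sketch of the same placement idea (the class `RotatedGlyphs` and method `drawRotated` are hypothetical, not PDFBox API). It assumes the caller wants the glyph origin to land at (x, y) in the original user space, and it saves the transform with `getTransform()` and restores it with `setTransform()` instead of transforming back, which avoids accumulating floating-point error. The inverse is computed before the graphics state is touched, so a non-invertible transform leaves the Graphics2D unchanged.

```java
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.font.GlyphVector;
import java.awt.geom.AffineTransform;
import java.awt.geom.NoninvertibleTransformException;
import java.awt.geom.Point2D;
import java.awt.image.BufferedImage;

public class RotatedGlyphs {
    // Draws glyph codes under the text transform `at`, anchored at (x, y) in
    // the caller's user space, and restores the caller's transform afterwards.
    static void drawRotated(Graphics2D g2d, Font font, int[] codes,
                            AffineTransform at, float x, float y)
            throws NoninvertibleTransformException {
        // Map (x, y) into the coordinate system that `at` will establish,
        // so that at(p) == (x, y) once the graphics are transformed.
        Point2D p = at.createInverse().transform(new Point2D.Float(x, y), null);
        GlyphVector glyphs = font.createGlyphVector(g2d.getFontRenderContext(), codes);

        AffineTransform saved = g2d.getTransform();
        try {
            g2d.transform(at);
            g2d.drawGlyphVector(glyphs, (float) p.getX(), (float) p.getY());
        } finally {
            g2d.setTransform(saved); // exact restore, no inverse round trip
        }
    }

    public static void main(String[] args) throws NoninvertibleTransformException {
        BufferedImage image = new BufferedImage(200, 200, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g2d = image.createGraphics();
        try {
            Font font = new Font(Font.SERIF, Font.PLAIN, 20);
            AffineTransform rotate = AffineTransform.getRotateInstance(Math.toRadians(30));
            drawRotated(g2d, font, new int[] {1, 2, 3}, rotate, 100f, 100f);
            System.out.println("transform restored: " + g2d.getTransform().isIdentity());
        } finally {
            g2d.dispose();
        }
    }
}
```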


Re: Help needed to resolve issue with converting Arabic characters to presentation forms

Posted by Hamed Iravanchi <ir...@gmail.com>.
Hi,

I saw you updated the issue in JIRA, so I downloaded the trunk code
and tested it.
I can confirm that the case I was investigating is now fixed: it is
converted to an image correctly. Thanks a lot.

I tested it with some other PDF files. Most of them worked well, but I
found a few that didn't, showing a similar problem, most notably PDF
files created with OpenOffice.org Writer.

I also found ordinary (English) PDF files that didn't work either.

I extracted a page from each of them and attached them to this
email. I didn't comment on the JIRA issue because I wasn't sure whether
they are related to the same issue. I haven't tried to debug any of
these files yet; I'll keep you posted if I manage to analyse them.

Files attached to this email:
j.pdf: Farsi PDF file created by OpenOffice.org
k.pdf: Another Farsi PDF file, created by Jaws PDF Creator
l.pdf: A page from an English e-book, created by PDF-XChange

Note: the previous sample that I sent (which works correctly now) was
created by Microsoft Word.

Thanks again for all your efforts,
-Hamed

