Posted to fop-users@xmlgraphics.apache.org by Neeraj <ne...@gmail.com> on 2013/01/17 14:58:51 UTC

Fwd: FOP 1.1 - Unable to copy/paste text is not working

Hi,

I am using FOP 1.1 from the command line to create a PDF in Arabic.
When I copy Arabic text from the PDF into a UTF-8 editor, some Arabic
characters are not shown; boxes appear instead.

Please help me to resolve this issue. I have attached corresponding files.

command line:

fop -c fop.xconf -fo arabic.fo -pdf out.pdf

Re: Fwd: FOP 1.1 - Unable to copy/paste text is not working

Posted by Luis Bernardo <lm...@gmail.com>.
The two lines look the same to me. Maybe you copied and pasted the same 
content twice?

The only reason I suggested Arial was because I didn't have your font 
and I know Arial has Arabic glyphs and it is known by all text editors.

If you use a text editor (say, OpenOffice) and export to PDF, are you 
then able to copy and paste from that PDF? If so, can you send that PDF 
and the one generated by FOP (with full embedding) so that we can 
compare them?

On 1/30/13 1:44 PM, Neeraj wrote:
> Hi Luis,
>
> Thanks for the reply.
>
> Yes, my editor can handle the font I used.
> To your question "If you highlight the text in the editor and set the font
> to Arial do you see any glyph?": for the text copied from the PDF, no.
>
> As for embedding, I may have added embedding mode full later, after
> generating the PDF, but the result is the same in both cases.
>
> The issue I reported was for a non-Base14 font. You are using Arial, which
> is a Base14 font, and FOP has full support for these kinds of fonts.
>
> Still, as you suggested, I tried the same thing with the Arial font and
> found the same issue in a different form.
>
> Original Arabic text - هذا تعليق الاختبار. تتم كتابة الكلمات بشكل صحيح
> PDF Arabic text      - ھذا تعلیق الاختبار. تتم كتابة الكلمات بشكل صحیح
>
> If I compare the PDF and MS Word files, they look exactly the same, but
> when I copy the text into an editor (with font support), the words look
> different (glyphs are missing). You can check the text above.
>
> Why am I losing text when doing copy/paste?
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: fop-users-unsubscribe@xmlgraphics.apache.org
> For additional commands, e-mail: fop-users-help@xmlgraphics.apache.org
>


Re: Fwd: FOP 1.1 - Unable to copy/paste text is not working

Posted by Vincent Hennebert <vh...@gmail.com>.
[Moving over to fop-dev as this is getting technical]

On 30/01/13 15:58, Glenn Adams wrote:
> On Wed, Jan 30, 2013 at 6:44 AM, Neeraj <ne...@gmail.com> wrote:
> 
>>
>> Yes, my editor can handle the font I used.
>> To your question "If you highlight the text in the editor and set the
>> font to Arial do you see any glyph?": for the text copied from the PDF,
>> no.
>>
>> As for embedding, I may have added embedding mode full later, after
>> generating the PDF, but the result is the same in both cases.
>>
>> The issue I reported was for a non-Base14 font. You are using Arial,
>> which is a Base14 font, and FOP has full support for these kinds of
>> fonts.
>>
>> Still, as you suggested, I tried the same thing with the Arial font and
>> found the same issue in a different form.
>>
>> Original Arabic text - هذا تعليق الاختبار. تتم كتابة الكلمات بشكل صحيح
>> PDF Arabic text      - ھذا تعلیق الاختبار. تتم كتابة الكلمات بشكل صحیح
>>
>> If I compare the PDF and MS Word files, they look exactly the same, but
>> when I copy the text into an editor (with font support), the words look
>> different (glyphs are missing). You can check the text above.
>>
>> Why am I losing text when doing copy/paste?
> 
> 
> One thing to keep in mind is that some fonts do not include entries in the
> CMAP table for all glyphs that can be referenced by performing the
> character to glyph transformation process. In this case, FOP synthesizes a
> CMAP entry which is used in the embedded font, where this entry uses a
> dynamically generated Unicode value in the PUA (private use area). The
> latter is necessary since PDF requires specifying *some* character code
> (and not glyph index directly) when performing text drawing.

I may be missing something, but I don’t understand this ‘PDF requires
specifying some character code’. AFAIU you can put glyph indices
directly in the PDF string; you just have to specify Identity-H as the
font’s encoding and Identity in the CIDToGIDMap. So I’m not sure why it
is necessary to use codes in the private use area.

Then, to have copy-paste working, you ‘just’ have to provide an
appropriate ToUnicode CMap, that re-maps the shaped glyph to the
original Unicode code point(s).
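
As a rough illustration of such a ToUnicode CMap, here is a minimal Python
sketch that writes one for 2-byte glyph codes (the Identity-H case). This is
not FOP code: the function name and the glyph indices in the example are
hypothetical, and a real producer would take the glyph-to-text mapping from
its own shaping step.

```python
def build_tounicode_cmap(glyph_to_text):
    """Return the text of a ToUnicode CMap for 2-byte glyph codes.

    glyph_to_text maps a glyph index (0..0xFFFF) to the original Unicode
    string that the glyph stands for (several characters for a ligature).
    """
    entries = []
    for gid, text in sorted(glyph_to_text.items()):
        if not 0 <= gid <= 0xFFFF:
            raise ValueError(f"glyph index out of range: {gid}")
        # Destination strings are written as UTF-16BE in hex.
        dst = text.encode("utf-16-be").hex().upper()
        entries.append(f"<{gid:04X}> <{dst}>")

    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<0000> <FFFF>",
        "endcodespacerange",
    ]
    # A single bfchar section may hold at most 100 entries.
    for start in range(0, len(entries), 100):
        chunk = entries[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        lines.extend(chunk)
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical glyph indices: glyph 3 is a shaped heh, glyph 10 a
    # lam-alef ligature that maps back to two code points.
    print(build_tounicode_cmap({3: "\u0647", 10: "\u0644\u0627"}))
```

The ligature entry shows the point of the re-mapping: one shaped glyph maps
back to the two original code points, so a viewer that honours ToUnicode can
return the pre-shaping text on copy.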


> If you then attempt to copy this text and paste into another editor that
> isn't aware of this dynamic mapping using the embedded font's CMAP, then
> you may lose that mapping information. One possible way to fix this, which
> I haven't investigated in detail, is to provide a separately encoded
> Unicode string that contains the original, pre-transformed text, and
> associate this string with the displayed post-transformed character string
> that may contain these dynamic PUA characters. The PDF viewer would then
> need to make use of the pre-transformed string when performing copy
> operations. However, I haven't researched this to see if PDF supports it.
> 
> Anyway, I suspect this is what is causing your problem. I've opened a bug
> on this at [1].
> 
> [1] https://issues.apache.org/jira/browse/FOP-2204

Vincent

Re: Fwd: FOP 1.1 - Unable to copy/paste text is not working

Posted by Glenn Adams <gl...@skynav.com>.
On Wed, Jan 30, 2013 at 6:44 AM, Neeraj <ne...@gmail.com> wrote:

>
> Yes, my editor can handle the font I used.
> To your question "If you highlight the text in the editor and set the font
> to Arial do you see any glyph?": for the text copied from the PDF, no.
>
> As for embedding, I may have added embedding mode full later, after
> generating the PDF, but the result is the same in both cases.
>
> The issue I reported was for a non-Base14 font. You are using Arial, which
> is a Base14 font, and FOP has full support for these kinds of fonts.
>
> Still, as you suggested, I tried the same thing with the Arial font and
> found the same issue in a different form.
>
> Original Arabic text - هذا تعليق الاختبار. تتم كتابة الكلمات بشكل صحيح
> PDF Arabic text      - ھذا تعلیق الاختبار. تتم كتابة الكلمات بشكل صحیح
>
> If I compare the PDF and MS Word files, they look exactly the same, but
> when I copy the text into an editor (with font support), the words look
> different (glyphs are missing). You can check the text above.
>
> Why am I losing text when doing copy/paste?


One thing to keep in mind is that some fonts do not include entries in the
CMAP table for all glyphs that can be referenced by performing the
character to glyph transformation process. In this case, FOP synthesizes a
CMAP entry which is used in the embedded font, where this entry uses a
dynamically generated Unicode value in the PUA (private use area). The
latter is necessary since PDF requires specifying *some* character code
(and not glyph index directly) when performing text drawing.

If you then attempt to copy this text and paste into another editor that
isn't aware of this dynamic mapping using the embedded font's CMAP, then
you may lose that mapping information. One possible way to fix this, which
I haven't investigated in detail, is to provide a separately encoded
Unicode string that contains the original, pre-transformed text, and
associate this string with the displayed post-transformed character string
that may contain these dynamic PUA characters. The PDF viewer would then
need to make use of the pre-transformed string when performing copy
operations. However, I haven't researched this to see if PDF supports it.
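
One way to check whether pasted text actually contains such PUA values is to
scan it for characters in Unicode category "Co" (private use). A minimal
Python sketch, using only the standard library; the function name is made up
for illustration:

```python
import unicodedata


def find_private_use(text):
    """Return (index, 'U+XXXX') for every private-use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        # "Co" covers U+E000..U+F8FF and the two supplementary PUA planes.
        if unicodedata.category(ch) == "Co"
    ]


if __name__ == "__main__":
    # Illustrative sample: one PUA character between ordinary letters.
    print(find_private_use("a\ue000b"))
```

If the list comes back empty for text copied from the PDF, the boxes or
substituted letters are more likely caused by something other than PUA
codes, for example a different but similar-looking code point.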

Anyway, I suspect this is what is causing your problem. I've opened a bug
on this at [1].

[1] https://issues.apache.org/jira/browse/FOP-2204

Re: Fwd: FOP 1.1 - Unable to copy/paste text is not working

Posted by Neeraj <ne...@gmail.com>.
Hi Luis,

Thanks for the reply.

Yes, my editor can handle the font I used.
To your question "If you highlight the text in the editor and set the font
to Arial do you see any glyph?": for the text copied from the PDF, no.

As for embedding, I may have added embedding mode full later, after
generating the PDF, but the result is the same in both cases.

The issue I reported was for a non-Base14 font. You are using Arial, which
is a Base14 font, and FOP has full support for these kinds of fonts.

Still, as you suggested, I tried the same thing with the Arial font and
found the same issue in a different form.

Original Arabic text - هذا تعليق الاختبار. تتم كتابة الكلمات بشكل صحيح
PDF Arabic text      - ھذا تعلیق الاختبار. تتم كتابة الكلمات بشكل صحیح

If I compare the PDF and MS Word files, they look exactly the same, but
when I copy the text into an editor (with font support), the words look
different (glyphs are missing). You can check the text above.

Why am I losing text when doing copy/paste?
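
Because the two lines above look almost identical on screen, comparing them
code point by code point can show exactly which characters were replaced.
A minimal Python sketch follows. The function name is hypothetical, and the
sample strings use U+0647 and U+06BE only as an illustrative pair; paste the
real "Original" and "PDF" lines in their place to check the actual text.

```python
import unicodedata


def diff_codepoints(original, pasted):
    """Return (index, 'U+XXXX', 'U+YYYY') for each position that differs.

    Only the common prefix length is compared; a length mismatch should be
    checked separately by the caller.
    """
    return [
        (i, f"U+{ord(a):04X}", f"U+{ord(b):04X}")
        for i, (a, b) in enumerate(zip(original, pasted))
        if a != b
    ]


if __name__ == "__main__":
    original = "\u0647\u0630\u0627"  # illustrative sample word
    pasted = "\u06be\u0630\u0627"    # same word with the first letter swapped
    for index, cp_a, cp_b in diff_codepoints(original, pasted):
        name_a = unicodedata.name(original[index], "<unnamed>")
        name_b = unicodedata.name(pasted[index], "<unnamed>")
        print(f"{index}: {cp_a} {name_a} -> {cp_b} {name_b}")
```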


Re: Fwd: FOP 1.1 - Unable to copy/paste text is not working

Posted by Luis Bernardo <lm...@gmail.com>.
Does your editor know about the font you are using? If you highlight the 
text in the editor and set the font to Arial, do you see any glyph?

I tried your example but with Arial, copied and pasted to OpenOffice and 
I am able to see the glyphs.

Now, not that this makes a difference for your problem, but are you sure 
the output file you sent was generated with the fop.xconf you sent? In 
your conf file you have embedding mode full, but in the output file the 
font is being subset.

On 1/17/13 1:58 PM, Neeraj wrote:
> Hi,
>
> I am using FOP 1.1 from the command line to create a PDF in Arabic.
> When I copy Arabic text from the PDF into a UTF-8 editor, some Arabic
> characters are not shown; boxes appear instead.
>
> Please help me to resolve this issue. I have attached corresponding files.
>
> command line:
>
> fop -c fop.xconf -fo arabic.fo -pdf out.pdf
>