Posted to users@pdfbox.apache.org by Jerry <je...@gmail.com> on 2016/03/30 00:00:15 UTC

bold and italic font variants misbehaving

I have written an application that generates an .epub document from user 
input.

I am now trying to use PDFBox to add PDF output of the same source text.
But I have encountered problems when trying to render bold or italic text:

- In the italic font, the characters u and i in the word "quick"
overlap.

- In the word-pair "brown fox" (where "brown" is in plain font and "fox" 
is italic) there is no space between the words but there is an extra 
space between the f and o in "fox".

- In the phrase "dog and ran" (which is bold) the single space between
"and" and "ran" is too wide, and there is no space between "ran" and
the next word.

And yet, the same string is rendered with correct spacing when output as 
plain text (no font changes).

See the output files at:

https://www.dropbox.com/s/ox4arbrfiv5jqfu/withNoHtmlTags.pdf?dl=0
https://www.dropbox.com/s/wgj029hm4wre1x5/withItalicsAndBoldFonts.pdf?dl=0

As a newbie to both PDF and PDFBox, I started with a tutorial I found at
http://www.coderanch.com/t/659953/Wiki/PDFBox. Once I verified that I 
had entered the tutorial correctly by running it and viewing the output, 
I began experimenting by displaying a simple test string that is long 
enough to require word wrapping. When I got that to work, I tried adding 
bold and italic HTML tags to the string (since the end goal is to create 
PDF from .epub source).

Here is my test code:

https://www.dropbox.com/s/k9d22s0xsgg8tz8/TestBed.java?dl=0

In TestBed.java, doTutorial() is the unmodified tutorial.

The method doMyCode() displays the test string by breaking it into 
individual whole words. If I mark words with <i> and <b> tags, they are 
correctly rendered with bold and italic fonts. But this limits font 
changes to whole words only, which rules out a font change in the middle 
of a string of characters. To handle that I need to output individual 
characters, not words.

The method doMyCode2() displays the test string word by word unless the
word contains an HTML tag, in which case the text is rendered character
by character.

If the test string contains no tags, it renders correctly.

See the sample file withNoHtmlTags.pdf.

When <i> and <b> tags are encountered, fonts get changed to 
PDType1Font.TIMES_BOLD or PDType1Font.TIMES_ITALIC as required, and the 
string is rendered, but the character spacing is mangled.

See the sample file withItalicsAndBoldFonts.pdf.

Both of these files were generated by the same code---the doMyCode2()
method---with the only change being the addition or removal of <i>
and <b> tags in the string paraText.

It does not appear to be a font problem, but rather a rendering problem.
I get the same (well, nearly the same) results with both Times and
Helvetica---the "nearly the same" being the positioning of the u and i
characters in the word "quick"---still overlapping, but in the Helvetica
rendering, the i is in the middle of the u, while in the Times rendering,
the i overlaps the last stroke of the u so that it looks like a u with a
dot over its tail.

What can I do to fix this?

Thanks.

Jerry

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: bold and italic font variants misbehaving

Posted by John Hewson <jo...@jahewson.com>.
> On 30 Mar 2016, at 16:29, Jerry <je...@gmail.com> wrote:
> 
>> On 3/29/2016 6:25 PM, John Hewson wrote:
>> Do you really need to handle that? Changing fonts mid-word is generally not a done thing. -- John
> John:
> 
> Yes---<i>oh, yes!</i> absolutely <b>required</b>---I'm certain I really need to handle it.
> 
> My app is a conduit for text created elsewhere. And one of the texts I'm using as a template for app development has multiple instances of italicized words followed (or preceded, or both) by hyphen/long dashes followed by non-italic text with no intervening spaces.

Ah yes, I see. Not a mid-word font change but a font change at punctuation with no spaces in-between. You might want to think about breaking your text into runs of one consistent font/style to avoid bloating the PDF with a text drawing operation for each character.
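
A minimal sketch of that idea, assuming the input contains only plain <b>, </b>, <i> and </i> tags (no attributes, no other markup). The class and method names (StyleRuns, Run, split) are hypothetical and not part of PDFBox or of TestBed.java; the code only groups characters into runs that share one bold/italic state.

```java
import java.util.ArrayList;
import java.util.List;

public class StyleRuns {
    /** One stretch of text that shares a single bold/italic state. */
    static final class Run {
        final String text;
        final boolean bold;
        final boolean italic;

        Run(String text, boolean bold, boolean italic) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
        }

        @Override
        public String toString() {
            return "Run[\"" + text + "\", bold=" + bold + ", italic=" + italic + "]";
        }
    }

    // The only tags this sketch understands (an assumption, not an HTML parser).
    private static final String[] TAGS = {"<b>", "</b>", "<i>", "</i>"};

    /** Splits tagged text into runs; a new run starts wherever a tag changes the style. */
    static List<Run> split(String tagged) {
        List<Run> runs = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        boolean bold = false;
        boolean italic = false;
        int i = 0;
        while (i < tagged.length()) {
            String tag = null;
            for (String t : TAGS) {
                if (tagged.startsWith(t, i)) {
                    tag = t;
                    break;
                }
            }
            if (tag == null) {
                // Ordinary character: it belongs to the current run.
                buf.append(tagged.charAt(i));
                i++;
                continue;
            }
            // The style is about to change, so close the current run first.
            if (buf.length() > 0) {
                runs.add(new Run(buf.toString(), bold, italic));
                buf.setLength(0);
            }
            boolean opening = tag.charAt(1) != '/';
            if (tag.indexOf('b') >= 0) {
                bold = opening;
            } else {
                italic = opening;
            }
            i += tag.length();
        }
        if (buf.length() > 0) {
            runs.add(new Run(buf.toString(), bold, italic));
        }
        return runs;
    }

    public static void main(String[] args) {
        for (Run r : split("Yes---<i>oh, yes!</i> absolutely <b>required</b>---really")) {
            System.out.println(r);
        }
    }
}
```

Each run could then be written with one font selection and one text-showing call (in PDFBox 2.x, presumably a setFont/showText pair) instead of one per character. Within a single text object, each text-showing operation leaves the text position at the end of what it drew, so consecutive runs on the same line should need no manually computed offsets.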

-- John



Re: bold and italic font variants misbehaving

Posted by Jerry <je...@gmail.com>.
On 3/29/2016 6:25 PM, John Hewson wrote:
> Do you really need to handle that? Changing fonts mid-word is 
> generally not a done thing. -- John
John:

Yes---<i>oh, yes!</i> absolutely <b>required</b>---I'm certain I really 
need to handle it.

My app is a conduit for text created elsewhere. And one of the texts I'm 
using as a template for app development has multiple instances of 
italicized words followed (or preceded, or both) by hyphen/long dashes 
followed by non-italic text with no intervening spaces.

Jerry



Re: bold and italic font variants misbehaving

Posted by John Hewson <jo...@jahewson.com>.
> On 29 Mar 2016, at 23:00, Jerry <je...@gmail.com> wrote:
> 
> The method doMyCode() displays the test string by breaking it into individual whole words. If I mark words with <i> and <b> tags, they are correctly rendered with bold and italic fonts. But this limits font changes to whole words only, which rules out a font change in the middle of a string of characters. To handle that I need to output individual characters, not words.

Do you really need to handle that? Changing fonts mid-word is generally not a done thing.

-- John



Re: bold and italic font variants misbehaving

Posted by Jerry <je...@gmail.com>.
Tilman:

Yes, you are right---the cause was a programming problem. After seeing
your comments, I revisited the code, and it seems my attempt at mixing
whole-word and single-character output scrambled my calculations of the
required offset from the left margin.

I have reworked the code to output everything single character by single 
character, and it now works as expected.

Thanks.

Jerry

On 3/29/2016 3:33 PM, Tilman Hausherr wrote:
> Hi,
>
> I suspect that the cause is a programming problem: your comments
> mention separation of words, but I see this:
>
> BT
>   /F2 12 Tf
>   112 652 Td
>   (q) Tj
> ET
> BT
>   /F2 12 Tf
>   118 652 Td
>   (u) Tj
> ET
> BT
>   /F2 12 Tf
>   121 652 Td
>   (i) Tj
> ET
> BT
>   /F2 12 Tf
>   126 652 Td
>   (c) Tj
> ET
> BT
>   /F2 12 Tf
>   131 652 Td
>   (k) Tj
> ET
>
>
> I do also suspect that you are calculating the offset based on what 
> you are intending to write, but you need to do this based on what you 
> just wrote.
>
> Tilman
>


Re: bold and italic font variants misbehaving

Posted by Tilman Hausherr <TH...@t-online.de>.
Hi,

I suspect that the cause is a programming problem: your comments mention
separation of words, but I see this:

BT
   /F2 12 Tf
   112 652 Td
   (q) Tj
ET
BT
   /F2 12 Tf
   118 652 Td
   (u) Tj
ET
BT
   /F2 12 Tf
   121 652 Td
   (i) Tj
ET
BT
   /F2 12 Tf
   126 652 Td
   (c) Tj
ET
BT
   /F2 12 Tf
   131 652 Td
   (k) Tj
ET


I do also suspect that you are calculating the offset based on what you 
are intending to write, but you need to do this based on what you just 
wrote.
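
The point about offsets can be sketched as follows. This is only an illustration under stated assumptions: the width table holds hypothetical values in 1/1000 of the font size, and the names CursorAdvance, glyphWidth and positions are made up for this example. Real code would ask the font for its widths instead (in PDFBox, PDFont.getStringWidth(String) reports widths in 1/1000 units, to be scaled by the font size).

```java
import java.util.Map;

public class CursorAdvance {
    // Hypothetical glyph widths in 1/1000 of the font size (illustrative values only).
    static final Map<Character, Integer> WIDTHS =
            Map.of('q', 500, 'u', 500, 'i', 278, 'c', 444, 'k', 444);

    /** Width of one glyph in points; only characters listed in WIDTHS are supported. */
    static float glyphWidth(char c, float fontSize) {
        return WIDTHS.get(c) / 1000f * fontSize;
    }

    /** Returns the x position at which each character of word should be drawn. */
    static float[] positions(String word, float startX, float fontSize) {
        float[] xs = new float[word.length()];
        float x = startX;
        for (int i = 0; i < word.length(); i++) {
            xs[i] = x;                                  // draw glyph i at the current cursor
            x += glyphWidth(word.charAt(i), fontSize);  // then advance by the glyph just drawn
        }
        return xs;
    }

    public static void main(String[] args) {
        String word = "quick";
        float[] xs = positions(word, 112f, 12f);
        for (int i = 0; i < xs.length; i++) {
            System.out.printf("%c at x=%.2f%n", word.charAt(i), xs[i]);
        }
    }
}
```

With these illustrative widths, "i" would start at x = 124 rather than at the 121 seen in the stream above, because the cursor moves by the width of the "u" that was just drawn and not by the width of the "i" that comes next.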

Tilman
