Posted to users@pdfbox.apache.org by Jürgen Uhl <ju...@gmx.de> on 2015/08/24 20:11:08 UTC

quotedbl causes NullPointerException

I have a PDF document that uses, among other fonts, CourierNewPS-BoldMT,
and some text in this font contains a double quote (").

Calling PDFont.encode on this text results in a NullPointerException,
for the following reasons:

 1. The font encoding is built using the PDF /Differences array, which
    overwrites the original "quotedbl" at index 34 with "A". The entries
    for quotedblbase/left/right are left unchanged. As a result, the
    inverted encoding (glyph name -> code) does not contain "quotedbl"
    as a key.
 2. Within encode, the character code 34 is assigned the name
    "quotedbl", which is then not found in the inverted encoding
    (PDTrueTypeFont.encode -> int code = inverted.get(name)).
 3. Right before the line that causes the NullPointerException, there
    is a check whether ttf.hasGlyph("quotedbl") (false in this case)
    and, if not, whether ttf.hasGlyph("uni0022") (true in this case).
    However, the result of this check does not affect what happens
    next: the code goes on to call inverted.get("quotedbl"), which
    returns null, and assigning that null to an int throws the
    exception.
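The failure mechanism in steps 1 to 3 can be reproduced outside PDFBox with plain Java maps. This is a minimal sketch, not PDFBox code: the class and method names are hypothetical, and the code values for quotedblbase/left/right are assumed WinAnsiEncoding-style positions chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class QuotedblNpeDemo {

    // Hypothetical helper: inverts a code -> glyph name encoding into
    // a glyph name -> code map (the "inverted" map from step 1).
    static Map<String, Integer> invert(Map<Integer, String> codeToName) {
        Map<String, Integer> inverted = new HashMap<>();
        for (Map.Entry<Integer, String> e : codeToName.entrySet()) {
            inverted.put(e.getValue(), e.getKey());
        }
        return inverted;
    }

    public static void main(String[] args) {
        // Assumed base encoding entries (illustrative, WinAnsi-style codes).
        Map<Integer, String> encoding = new HashMap<>();
        encoding.put(34, "quotedbl");
        encoding.put(65, "A");
        encoding.put(132, "quotedblbase");
        encoding.put(147, "quotedblleft");
        encoding.put(148, "quotedblright");

        // /Differences overwrites code 34 with "A"; the other quote
        // entries stay unchanged.
        encoding.put(34, "A");

        Map<String, Integer> inverted = invert(encoding);
        System.out.println(inverted.containsKey("quotedbl")); // prints false

        String name = "quotedbl";
        try {
            // Auto-unboxing a null Integer into an int throws the NPE.
            int code = inverted.get(name);
            System.out.println(code);
        } catch (NullPointerException ex) {
            System.out.println("NPE while unboxing inverted.get(\"" + name + "\")");
        }
    }
}
```

The exception comes from the implicit Integer-to-int unboxing, not from the map lookup itself, which simply returns null for a missing key.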

I believe this is a bug in PDFBox, but I have no idea whether the
handling within encode should be changed (maybe by using the "else"
branch when ttf.hasGlyph("quotedbl") is false), whether code 34 should
be assigned to quotedblbase in the first place, or something else
entirely.
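Whichever of these options is chosen, the lookup line itself can at least avoid the bare NullPointerException. The following is a minimal sketch, assuming the inverted encoding is a Map<String, Integer>; the method name codeFor and the choice of IllegalArgumentException are hypothetical and not PDFBox's actual API or fix. It only keeps the result boxed and reports which glyph name is missing.

```java
import java.util.HashMap;
import java.util.Map;

public class SafeEncodeSketch {

    // Hypothetical null-safe variant of "int code = inverted.get(name)":
    // keep the result as an Integer, check for null, and name the missing
    // glyph instead of failing with an unboxing NullPointerException.
    static int codeFor(Map<String, Integer> inverted, String glyphName) {
        Integer code = inverted.get(glyphName);
        if (code == null) {
            throw new IllegalArgumentException(
                    "Glyph name '" + glyphName + "' is not in the font's encoding");
        }
        return code;
    }

    public static void main(String[] args) {
        Map<String, Integer> inverted = new HashMap<>();
        inverted.put("A", 34); // code 34 was remapped to "A" by /Differences

        System.out.println(codeFor(inverted, "A")); // prints 34
        try {
            codeFor(inverted, "quotedbl");
        } catch (IllegalArgumentException ex) {
            // prints: Glyph name 'quotedbl' is not in the font's encoding
            System.out.println(ex.getMessage());
        }
    }
}
```

A descriptive, documented exception would also give callers something specific to catch, for example to substitute or skip a character the font's encoding cannot represent.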

In any case, as a PDFBox user I'd of course be eager to learn about
ways to work around this situation.

Juergen


Re: quotedbl causes NullPointerException

Posted by John Hewson <jo...@jahewson.com>.
Hi Juergen,

Thanks for letting us know about this. The NullPointerException certainly sounds like a PDFBox bug.
Please open an issue on JIRA (https://issues.apache.org/jira/browse/PDFBOX/) and upload the problem PDF (via More > Attach Files).

Thanks,

— John
