Posted to dev@pdfbox.apache.org by "John Hewson (JIRA)" <ji...@apache.org> on 2014/06/17 21:45:12 UTC
[jira] [Closed] (PDFBOX-2009) PDFStreamEngine.processEncodedText incorrectly handling UTF-16 text with BOM FEFF
[ https://issues.apache.org/jira/browse/PDFBOX-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Hewson closed PDFBOX-2009.
-------------------------------
Resolution: Not a Problem
It turns out that the section of the PDF spec about the BOM does not apply to content stream text; it applies to bookmarks, annotations, etc. The code length of content stream text should instead be determined by the CMap of the current font.
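To illustrate the point, here is a minimal sketch of splitting a content stream string into character codes using codespace ranges rather than a BOM. It is not the PDFBox implementation: the class CodeLengthSketch, its CodespaceRange type, and the methods hexToBytes and splitCodes are hypothetical names invented for this example. It also assumes the current font's CMap declares a single two-byte codespace range <0000> <FFFF>; the CMap actually used in the attached PDF is not stated in this issue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these names are PDFBox API.
class CodeLengthSketch {

    /** A codespace range such as <0000> <FFFF>; its byte count is the code length. */
    static final class CodespaceRange {
        final int[] low;
        final int[] high;

        CodespaceRange(int[] low, int[] high) {
            this.low = low;
            this.high = high;
        }

        /** True if the bytes starting at offset fall inside this range, byte by byte. */
        boolean matches(byte[] data, int offset) {
            if (offset + low.length > data.length) {
                return false;
            }
            for (int i = 0; i < low.length; i++) {
                int b = data[offset + i] & 0xFF;
                if (b < low[i] || b > high[i]) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Converts the contents of a PDF hex string (without the angle brackets) to bytes. */
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Splits the string into codes; the code length comes only from the matching range. */
    static List<Integer> splitCodes(byte[] data, List<CodespaceRange> ranges) {
        List<Integer> codes = new ArrayList<>();
        int offset = 0;
        while (offset < data.length) {
            int length = 1; // assumed fallback: consume one byte if no range matches
            for (CodespaceRange range : ranges) {
                if (range.matches(data, offset)) {
                    length = range.low.length;
                    break;
                }
            }
            int code = 0;
            for (int i = 0; i < length; i++) {
                code = (code << 8) | (data[offset + i] & 0xFF);
            }
            codes.add(code);
            offset += length;
        }
        return codes;
    }

    public static void main(String[] args) {
        byte[] data = hexToBytes("FEFF21222193219103B103A003A6");
        // Assumed CMap: one two-byte codespace range <0000> <FFFF>.
        List<CodespaceRange> twoByte = Collections.singletonList(
                new CodespaceRange(new int[] {0x00, 0x00}, new int[] {0xFF, 0xFF}));
        for (int code : splitCodes(data, twoByte)) {
            System.out.printf("%04X%n", code);
        }
    }
}
```

Under that assumed two-byte range, the leading FEFF is just the first character code of the string. It is neither skipped nor used to switch the code length; what the code maps to is then up to the font's CMap.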
> PDFStreamEngine.processEncodedText incorrectly handling UTF-16 text with BOM FEFF
> ---------------------------------------------------------------------------------
>
> Key: PDFBOX-2009
> URL: https://issues.apache.org/jira/browse/PDFBOX-2009
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.0
> Reporter: Philip Helger
> Fix For: 2.0.0
>
> Attachments: test-properties.pdf
>
>
> Given a text-showing operation like
> <FEFF21222193219103B103A003A6> Tj
> PDFStreamEngine.processEncodedText does not handle it correctly.
> Am I correct that if a BOM is detected, the code length should be set to 2 (and not be changed)? Or, alternatively, should the BOM simply be skipped?
> It may be related to PDFBOX-920