Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2017/12/21 10:01:00 UTC
[jira] [Updated] (PDFBOX-4000) Wrong line break detection for the before ordinal indicator superscripts.
[ https://issues.apache.org/jira/browse/PDFBOX-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr updated PDFBOX-4000:
------------------------------------
Attachment: nk7-p19.pdf
File nk7-p19.pdf was provided by Dan Liu on the mailing list; the full file is at http://proj.gz-yibo.com:2880/nk7.pdf
{quote}
A text line on page 19:
7.放射性核素扫描应用133 氙或99m 锝-二乙三胺五乙酸(99mTc-DTPA)雾化吸人。99m 锝
becomes:
133 99m 99m 99m
7.放射性核素扫描应用 氙或 锝-二乙三胺五乙酸(Tc-DTPA)雾化吸人。 锝
{quote}
> Wrong line break detection for the before ordinal indicator superscripts.
> -------------------------------------------------------------------------
>
> Key: PDFBOX-4000
> URL: https://issues.apache.org/jira/browse/PDFBOX-4000
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.6, 2.0.7, 2.0.8
> Environment: Windows 10 64-bit
> Reporter: Harun Reşit Zafer
> Attachments: contract_00569_SEDAR-experimental.txt, contract_00569_SEDAR-marked-1.png, contract_00569_SEDAR.pdf, contract_00882_SEDAR.pdf, contract_00968_SEDAR.pdf, nk7-p19.pdf
>
>
> The 3 attached documents have lines similar to {{THIS AGREEMENT is made as of the 5th day of February, 2016.}} PDFBox returns this line as 3 separate lines:
> {{THIS AGREEMENT is made as of the 5}}
> {{th}}
> {{day of February, 2016.}}
> In each document, the affected line is close to the top.
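
The symptom described above can be worked around after extraction, at least for the English ordinal case. The following is only a minimal sketch under stated assumptions, not a fix inside PDFBox and not how PDFBox detects line breaks: the class OrdinalLineMerger and its merge method are hypothetical names, and the input is assumed to be the list of lines already returned by text extraction. It joins a line ending in a digit with a following line that consists solely of an ordinal suffix, then appends the next line as the rest of the sentence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical post-processing helper; not part of the PDFBox API.
public class OrdinalLineMerger {
    // English ordinal suffixes that may be extracted as a line of their own.
    private static final Pattern SUFFIX =
            Pattern.compile("st|nd|rd|th", Pattern.CASE_INSENSITIVE);

    public static List<String> merge(List<String> lines) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < lines.size()) {
            String head = lines.get(i).trim();
            boolean endsWithDigit = !head.isEmpty()
                    && Character.isDigit(head.charAt(head.length() - 1));
            boolean suffixFollows = i + 1 < lines.size()
                    && SUFFIX.matcher(lines.get(i + 1).trim()).matches();
            if (endsWithDigit && suffixFollows) {
                // Re-attach the suffix directly to the number ("5" + "th").
                StringBuilder sb = new StringBuilder(head)
                        .append(lines.get(i + 1).trim());
                i += 2;
                // Assumption: the line after the suffix continues the sentence.
                if (i < lines.size()) {
                    sb.append(' ').append(lines.get(i).trim());
                    i++;
                }
                out.add(sb.toString());
            } else {
                // Unaffected lines are passed through unchanged.
                out.add(lines.get(i));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> extracted = List.of(
                "THIS AGREEMENT is made as of the 5",
                "th",
                "day of February, 2016.");
        merge(extracted).forEach(System.out::println);
    }
}
```

This heuristic is purely textual and can over-merge when the suffix ends a sentence. It also does not cover the nk7-p19.pdf case, where the superscripts are emitted on a separate line before the base text.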