Posted to dev@pdfbox.apache.org by "Navnath Kumbhar (JIRA)" <ji...@apache.org> on 2017/10/31 11:40:02 UTC

[jira] [Updated] (PDFBOX-3986) Bounding box of mathematical symbols are not proper

     [ https://issues.apache.org/jira/browse/PDFBOX-3986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navnath Kumbhar updated PDFBOX-3986:
------------------------------------
    Description: 
Hello Support Team,

I am working on a task where I have to extract formulas from a PDF document and convert them into images.

But when I extract them using PDFBox, some symbols, such as *Summation*, *Integral*, or *Big Parenthesis*, get mixed up with the previous line.

I checked the output of the DrawPrintTextLocations example with that particular PDF document, and the result does not look normal.
The red boxes are not aligned properly in the output, as you will see in the attached files.

I am attaching the output of two pages and the PDF document itself.

*Please refer to page 34 or 37 for this issue.*

Thank you in advance!

  was:
Hello Support Team,

I am working on a task where I have to extract formulas from a PDF document and convert them into images.

But when I extract them using PDFBox, some symbols, such as *Summation*, *Integral*, or *Big Parenthesis*, get mixed up with the previous line.

I checked the output of the DrawPrintTextLocations example with that particular PDF document, and the result does not look normal.
The red boxes are not aligned properly in the output, as you can see.

I am attaching the output of two pages and the PDF document itself.

*Please refer to page 34 or 37 for this issue.*

Thank you in advance!


> Bounding box of mathematical symbols are not proper
> ---------------------------------------------------
>
>                 Key: PDFBOX-3986
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3986
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>         Environment: Windows 7 (64 bit)
>            Reporter: Navnath Kumbhar
>         Attachments: formula-marked-34.png, formula-marked-37.png, formula.pdf
>
>
> Hello Support Team,
> I am working on a task where I have to extract formulas from a PDF document and convert them into images.
> But when I extract them using PDFBox, some symbols, such as *Summation*, *Integral*, or *Big Parenthesis*, get mixed up with the previous line.
> I checked the output of the DrawPrintTextLocations example with that particular PDF document, and the result does not look normal.
> The red boxes are not aligned properly in the output, as you will see in the attached files.
> I am attaching the output of two pages and the PDF document itself.
> *Please refer to page 34 or 37 for this issue.*
> Thank you in advance!


