Posted to users@pdfbox.apache.org by S S Satyanarayana Damarla <sa...@oracle.com> on 2017/11/22 12:36:23 UTC
How to retrieve rectangle bounds for Chart element in the PDF document
Hi,
I have attached a PDF document which contains a Chart.
For our project, we need to retrieve the rectangle bounds of the chart in the attached PDF document. The chart is not recognized as an image object (PDImageXObject); it appears to be drawn directly in the content stream.
I would appreciate sample code for retrieving the rectangle bounds of the chart in the attached PDF document.
Thanks
-Satya
RE: How to retrieve rectangle bounds for Chart element in the PDF document
Posted by S S Satyanarayana Damarla <sa...@oracle.com>.
Thanks much Tilman.
I will check this.
-Satya
On 2017-11-23 22:23, Tilman Hausherr <T....@t-online.de> wrote:
> There is no out-of-the-box solution for this (and the other posting).
> PDF is not a format that has a <TABLE>...</TABLE> or <CHART>...</CHART> syntax. PDF is just graphics. You can get the lines / shapes with this:
> https://stackoverflow.com/questions/38931422/pdfbox-2-0-2-calling-of-pagedrawer-processpage-method-caught-exceptions
> However you'll still have to do something to find out where your table / chart is.
> To get some understanding on how tricky this is, open your file with PDFDebugger and look at the "contents" part. The operators you see are explained in the PDF 32000 specification ( https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf ), in the segment "operator summary". (start with operators m, l, c, f and s).
> Your shape object is this:
>
> 0.357 0.608 0.835 rg
> 125.06 715.44 m
> 125.06 717.96 127.1 720 129.61 720 c
> 204 720 l
> 206.51 720 208.56 717.96 208.56 715.44 c
> 208.56 697.21 l
> 208.56 694.69 206.51 692.65 204 692.65 c
> 129.61 692.65 l
> 127.1 692.65 125.06 694.69 125.06 697.21 c
> h
> f*
> 1 w
> 0.255 0.443 0.612 RG
> 125.06 715.44 m
> 125.06 717.96 127.1 720 129.61 720 c
> 204 720 l
> 206.51 720 208.56 717.96 208.56 715.44 c
> 208.56 697.21 l
> 208.56 694.69 206.51 692.65 204 692.65 c
> 129.61 692.65 l
> 127.1 692.65 125.06 694.69 125.06 697.21 c
> h
> S
>
> The chart in the other file is more difficult to find, I didn't even try.
>
> Tilman
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org
Re: How to retrieve rectangle bounds for Chart element in the PDF document
Posted by Tilman Hausherr <TH...@t-online.de>.
There is no out-of-the-box solution for this (and the other posting).
PDF is not a format that has a <TABLE>...</TABLE> or <CHART>...</CHART>
syntax. PDF is just graphics. You can get the lines / shapes with this:
https://stackoverflow.com/questions/38931422/pdfbox-2-0-2-calling-of-pagedrawer-processpage-method-caught-exceptions
However you'll still have to do something to find out where your table /
chart is.
To get some understanding on how tricky this is, open your file with
PDFDebugger and look at the "contents" part. The operators you see are
explained in the PDF 32000 specification (
https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf
), in the segment "operator summary". (start with operators m, l, c, f
and s).
Your shape object is this:
0.357 0.608 0.835 rg
125.06 715.44 m
125.06 717.96 127.1 720 129.61 720 c
204 720 l
206.51 720 208.56 717.96 208.56 715.44 c
208.56 697.21 l
208.56 694.69 206.51 692.65 204 692.65 c
129.61 692.65 l
127.1 692.65 125.06 694.69 125.06 697.21 c
h
f*
1 w
0.255 0.443 0.612 RG
125.06 715.44 m
125.06 717.96 127.1 720 129.61 720 c
204 720 l
206.51 720 208.56 717.96 208.56 715.44 c
208.56 697.21 l
208.56 694.69 206.51 692.65 204 692.65 c
129.61 692.65 l
127.1 692.65 125.06 694.69 125.06 697.21 c
h
S
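As a rough illustration of what these operators encode, the bounds of this one path can be read off by collecting the coordinates passed to m, l and c. The sketch below is plain Java with no PDFBox dependency; the class name PathBounds and the method bounds are hypothetical names chosen for this example. It assumes the operators are already available as whitespace-separated text, that no transformation (cm) is in effect, and that only m, l and c build the path. For curves it uses the control points, so the result is a conservative box that can be larger than the drawn curve. It does nothing to decide which paths belong to the chart.

```java
import java.util.ArrayList;
import java.util.List;

class PathBounds {

    /**
     * Returns {minX, minY, maxX, maxY} over every point passed to the
     * m, l and c operators, or null if the text contains none of them.
     * Assumption: coordinates are taken as-is, without applying any matrix.
     */
    static double[] bounds(String content) {
        List<Double> operands = new ArrayList<>();
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        boolean found = false;

        for (String token : content.trim().split("\\s+")) {
            // Numbers are operands; they are collected until an operator follows.
            if (token.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)")) {
                operands.add(Double.parseDouble(token));
                continue;
            }
            // m and l take one x/y pair, c takes three (two control points, one end point).
            if (token.equals("m") || token.equals("l") || token.equals("c")) {
                for (int i = 0; i + 1 < operands.size(); i += 2) {
                    double x = operands.get(i);
                    double y = operands.get(i + 1);
                    minX = Math.min(minX, x);
                    maxX = Math.max(maxX, x);
                    minY = Math.min(minY, y);
                    maxY = Math.max(maxY, y);
                    found = true;
                }
            }
            // Any other operator (rg, RG, w, h, f*, S, ...) only consumes its operands here.
            operands.clear();
        }
        return found ? new double[] {minX, minY, maxX, maxY} : null;
    }

    public static void main(String[] args) {
        // The filled path from the listing above.
        String fillPath = String.join("\n",
                "0.357 0.608 0.835 rg",
                "125.06 715.44 m",
                "125.06 717.96 127.1 720 129.61 720 c",
                "204 720 l",
                "206.51 720 208.56 717.96 208.56 715.44 c",
                "208.56 697.21 l",
                "208.56 694.69 206.51 692.65 204 692.65 c",
                "129.61 692.65 l",
                "127.1 692.65 125.06 694.69 125.06 697.21 c",
                "h",
                "f*");
        double[] b = bounds(fillPath);
        System.out.println("x: " + b[0] + " .. " + b[2]);
        System.out.println("y: " + b[1] + " .. " + b[3]);
    }
}
```

For the path above this gives x from 125.06 to 208.56 and y from 692.65 to 720, which is the rounded rectangle. With a real document the same min/max bookkeeping would have to run on the operators delivered by a content stream parser, with the current transformation matrix applied.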
The chart in the other file is more difficult to find, I didn't even try.
Tilman
On 23.11.2017 at 05:00, S S Satyanarayana Damarla wrote:
> Looks like PDF document attachment didn't get through.
>
> I have uploaded the PDF document at the following location:
>
> https://drive.google.com/file/d/1uYoQweCVbO4cNQiMnJuVjM1WZu7Cr7Ae/view
>
> Please look into above link for accessing the PDF document that contains this Chart.
>
> Appreciate any help on this.
>
> Thanks,
> -Satya
>
> On 2017-11-22 18:06, S S Satyanarayana Damarla <s...@oracle.com> wrote:
>
>> Hi,
>> I have attached a PDF document which contains a Chart.
>> For our project, we need to retrieve the rectangle bounds of the chart in the attached PDF document. The chart is not recognized as an image object (PDImageXObject); it appears to be drawn directly in the content stream.
>> I would appreciate sample code for retrieving the rectangle bounds of the chart in the attached PDF document.
>> Thanks
>> -Satya
RE: How to retrieve rectangle bounds for Chart element in the PDF document
Posted by S S Satyanarayana Damarla <sa...@oracle.com>.
Looks like PDF document attachment didn't get through.
I have uploaded the PDF document at the following location:
https://drive.google.com/file/d/1uYoQweCVbO4cNQiMnJuVjM1WZu7Cr7Ae/view
Please look into above link for accessing the PDF document that contains this Chart.
Appreciate any help on this.
Thanks,
-Satya
On 2017-11-22 18:06, S S Satyanarayana Damarla <s...@oracle.com> wrote:
> Hi,
> I have attached a PDF document which contains a Chart.
>
> For our project, we need to retrieve the rectangle bounds of the chart in the attached PDF document. The chart is not recognized as an image object (PDImageXObject); it appears to be drawn directly in the content stream.
>
> I would appreciate sample code for retrieving the rectangle bounds of the chart in the attached PDF document.
>
> Thanks
> -Satya