Posted to users@pdfbox.apache.org by S S Satyanarayana Damarla <sa...@oracle.com> on 2017/11/22 12:36:23 UTC

How to retrieve rectangle bounds for Chart element in the PDF document

Hi,

I have attached a PDF document that contains a chart.

For our project, we need to retrieve the rectangle bounds of the chart in the attached PDF document. The chart is not recognized as an image object (PDImageXObject); it appears to be drawn directly in the content stream.

I would appreciate it if you could help me with sample code for retrieving the rectangle bounds of this chart.

 

Thanks

-Satya

RE: How to retrieve rectangle bounds for Chart element in the PDF document

Posted by S S Satyanarayana Damarla <sa...@oracle.com>.
Thanks much Tilman.

I will check this.

-Satya


On 2017-11-23 22:23, Tilman Hausherr <T....@t-online.de> wrote: 
> There is no out-of-the-box solution for this (and the other posting). 
> PDF is not a format that has a <TABLE>...</TABLE>  or <CHART>...</CHART> syntax. PDF is just graphics. You can get the lines / shapes with this:
> https://stackoverflow.com/questions/38931422/pdfbox-2-0-2-calling-of-pagedrawer-processpage-method-caught-exceptions
> However you'll still have to do something to find out where your table / chart is.

> To get some understanding on how tricky this is, open your file with PDFDebugger and look at the "contents" part. The operators you see are explained in the PDF 32000 specification ( https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf ), in the segment "operator summary". (start with operators m, l, c, f and s).

> Your shape object is this:
>
>    0.357 0.608 0.835 rg
>    125.06 715.44 m
>    125.06 717.96 127.1 720 129.61 720 c
>    204 720 l
>    206.51 720 208.56 717.96 208.56 715.44 c
>    208.56 697.21 l
>    208.56 694.69 206.51 692.65 204 692.65 c
>    129.61 692.65 l
>    127.1 692.65 125.06 694.69 125.06 697.21 c
>    h
>    f*
>    1 w
>    0.255 0.443 0.612 RG
>    125.06 715.44 m
>    125.06 717.96 127.1 720 129.61 720 c
>    204 720 l
>    206.51 720 208.56 717.96 208.56 715.44 c
>    208.56 697.21 l
>    208.56 694.69 206.51 692.65 204 692.65 c
>    129.61 692.65 l
>    127.1 692.65 125.06 694.69 125.06 697.21 c
>    h
>    S
>
> The chart in the other file is more difficult to find, I didn't even try.
>
> Tilman

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: How to retrieve rectangle bounds for Chart element in the PDF document

Posted by Tilman Hausherr <TH...@t-online.de>.
There is no out-of-the-box solution for this (or for the other posting).
PDF is not a format that has a <TABLE>...</TABLE> or <CHART>...</CHART>
syntax. PDF is just graphics. You can get the lines and shapes with this:
https://stackoverflow.com/questions/38931422/pdfbox-2-0-2-calling-of-pagedrawer-processpage-method-caught-exceptions
However, you'll still have to do something to find out where your table or
chart is.
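
For that last step, one possible heuristic (an assumption for illustration, not something PDFBox provides) is to collect a bounding box for every drawn shape and merge boxes that lie close together, so that the pieces of one chart end up in a single region. A minimal Java sketch follows; the class name, the method names and the gap tolerance are hypothetical, and a real page would also need text and images taken into account.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: groups shape bounding boxes
// into larger regions by merging boxes that are close to each other.
class RegionGroupingSketch {

    /** Merges rectangles that touch or overlap once grown by 'gap' units. */
    static List<Rectangle2D> group(List<Rectangle2D> shapes, double gap) {
        List<Rectangle2D> regions = new ArrayList<>();
        for (Rectangle2D shape : shapes) {
            Rectangle2D merged = shape;
            boolean changed = true;
            // A merge can bring the grown box near other regions, so repeat
            // until no existing region is close enough any more.
            while (changed) {
                changed = false;
                for (int i = 0; i < regions.size(); i++) {
                    if (near(merged, regions.get(i), gap)) {
                        merged = merged.createUnion(regions.remove(i));
                        changed = true;
                        break;
                    }
                }
            }
            regions.add(merged);
        }
        return regions;
    }

    /** True if the two boxes are at most 'gap' apart on both axes. */
    static boolean near(Rectangle2D a, Rectangle2D b, double gap) {
        return a.getMinX() - gap <= b.getMaxX() && b.getMinX() - gap <= a.getMaxX()
            && a.getMinY() - gap <= b.getMaxY() && b.getMinY() - gap <= a.getMaxY();
    }

    public static void main(String[] args) {
        List<Rectangle2D> shapes = new ArrayList<>();
        shapes.add(new Rectangle2D.Double(0, 0, 10, 10));
        shapes.add(new Rectangle2D.Double(12, 0, 10, 10));   // 2 units from the first
        shapes.add(new Rectangle2D.Double(100, 100, 5, 5));  // far away
        for (Rectangle2D region : group(shapes, 5)) {
            System.out.println(region);
        }
    }
}
```

With a gap of 5, the first two boxes end up in one region and the third stays on its own. Choosing the gap, and deciding which of the resulting regions is actually the chart, still depends on the document.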

To get some understanding of how tricky this is, open your file with
PDFDebugger and look at the "Contents" part. The operators you see are
explained in the PDF 32000 specification
(https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf),
in the section "Operator Summary". Start with the operators m, l, c, f
and s. Operator names are case-sensitive: the shape below also uses h
(close the subpath), f* (fill, even-odd rule) and S (stroke).

Your shape object is this:

   0.357 0.608 0.835 rg
   125.06 715.44 m
   125.06 717.96 127.1 720 129.61 720 c
   204 720 l
   206.51 720 208.56 717.96 208.56 715.44 c
   208.56 697.21 l
   208.56 694.69 206.51 692.65 204 692.65 c
   129.61 692.65 l
   127.1 692.65 125.06 694.69 125.06 697.21 c
   h
   f*
   1 w
   0.255 0.443 0.612 RG
   125.06 715.44 m
   125.06 717.96 127.1 720 129.61 720 c
   204 720 l
   206.51 720 208.56 717.96 208.56 715.44 c
   208.56 697.21 l
   208.56 694.69 206.51 692.65 204 692.65 c
   129.61 692.65 l
   127.1 692.65 125.06 694.69 125.06 697.21 c
   h
   S
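
As a rough illustration of how a rectangle can be derived from these operators, the following Java sketch replays the listing above into a java.awt.geom.Path2D and asks it for its bounds. It is not PDFBox code and not a general content-stream parser: the names PathBoundsSketch and paintedPathBounds are made up for this example, only m, l, c, h, re and the path-painting operators are handled, and it assumes that no transformation matrix (cm) is in effect, so the result is in the same coordinates as the operands. In a real program the same moveTo / lineTo / curveTo bookkeeping would more likely live in a graphics stream engine such as a subclass of PDFBox's PDFGraphicsStreamEngine.

```java
import java.awt.geom.Path2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of PDFBox. Handles only a few path operators
// and ignores the transformation matrix, clipping, text and XObjects.
class PathBoundsSketch {

    // Outline of the shape from the listing above.
    static final String OUTLINE = String.join("\n",
        "125.06 715.44 m",
        "125.06 717.96 127.1 720 129.61 720 c",
        "204 720 l",
        "206.51 720 208.56 717.96 208.56 715.44 c",
        "208.56 697.21 l",
        "208.56 694.69 206.51 692.65 204 692.65 c",
        "129.61 692.65 l",
        "127.1 692.65 125.06 694.69 125.06 697.21 c",
        "h");

    // Fill pass followed by stroke pass, as in the content stream.
    static final String SHAPE = "0.357 0.608 0.835 rg\n" + OUTLINE + "\nf*\n"
        + "1 w\n0.255 0.443 0.612 RG\n" + OUTLINE + "\nS";

    /** Returns one bounding box per painted path, in content-stream order. */
    static List<Rectangle2D> paintedPathBounds(String content) {
        List<Rectangle2D> bounds = new ArrayList<>();
        List<Double> operands = new ArrayList<>();
        Path2D.Double path = new Path2D.Double();
        for (String token : content.trim().split("\\s+")) {
            if (token.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)")) {
                operands.add(Double.parseDouble(token)); // operand: remember it
                continue;
            }
            // Anything else is treated as an operator that consumes the operands.
            double[] a = operands.stream().mapToDouble(Double::doubleValue).toArray();
            operands.clear();
            boolean started = path.getCurrentPoint() != null;
            switch (token) {
                case "m": path.moveTo(a[0], a[1]); break;
                case "l": path.lineTo(a[0], a[1]); break;
                case "c": path.curveTo(a[0], a[1], a[2], a[3], a[4], a[5]); break;
                case "h": if (started) path.closePath(); break;
                case "re": // x y width height
                    path.moveTo(a[0], a[1]);
                    path.lineTo(a[0] + a[2], a[1]);
                    path.lineTo(a[0] + a[2], a[1] + a[3]);
                    path.lineTo(a[0], a[1] + a[3]);
                    path.closePath();
                    break;
                case "f": case "F": case "f*": case "S": case "s":
                case "B": case "B*": case "b": case "b*":
                    // Painting ends the current path: record its box, start over.
                    if (started) bounds.add(path.getBounds2D());
                    path = new Path2D.Double();
                    break;
                case "n": // path ended without painting
                    path = new Path2D.Double();
                    break;
                default:
                    break; // colour, line width and other non-path operators
            }
        }
        return bounds;
    }

    public static void main(String[] args) {
        for (Rectangle2D r : paintedPathBounds(SHAPE)) {
            System.out.println(String.format(Locale.ROOT,
                "x=%.2f y=%.2f width=%.2f height=%.2f",
                r.getX(), r.getY(), r.getWidth(), r.getHeight()));
        }
    }
}
```

For this listing, both the fill pass and the stroke pass should report a box of about 83.5 by 27.35 units whose lower-left corner is at (125.06, 692.65); PDF user space has its origin at the bottom left, so y is the bottom edge. The stroke's line width (1 w) is not included in that box.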

The chart in the other file is more difficult to find, I didn't even try.

Tilman

On 2017-11-23 at 05:00, S S Satyanarayana Damarla wrote:
> Looks like PDF document attachment didn't get through.
>
> I have uploaded the PDF document at the following location:
>
> https://drive.google.com/file/d/1uYoQweCVbO4cNQiMnJuVjM1WZu7Cr7Ae/view
>
> Please look into above link for accessing the PDF document that contains this Chart.
>
> Appreciate any help on this.
>
> Thanks,
>
> -Satya
>
> On 2017-11-22 18:06, S S Satyanarayana Damarla <s...@oracle.com> wrote:
>
>> Hi,
>> I have attached a PDF document which contains a Chart.
>> For our project, we need the ability to retrieve rectangle bounds for the Chart present in the attached PDF document. This chart is not recognized as image object (PDImageXObject). Looks like it is embedded in the content stream.
>> Appreciate if you can help me with a sample code in retrieving rectangle bounds for the chart present in the attached PDF document.
>> Thanks
>> -Satya





RE: How to retrieve rectangle bounds for Chart element in the PDF document

Posted by S S Satyanarayana Damarla <sa...@oracle.com>.
It looks like the PDF document attachment didn't get through.

I have uploaded the PDF document to the following location:

https://drive.google.com/file/d/1uYoQweCVbO4cNQiMnJuVjM1WZu7Cr7Ae/view

Please use the link above to access the PDF document that contains the chart.

I would appreciate any help on this.

Thanks,

-Satya

 

 

On 2017-11-22 18:06, S S Satyanarayana Damarla <s...@oracle.com> wrote:

> Hi,
>
> I have attached a PDF document which contains a Chart.
>
> For our project, we need the ability to retrieve rectangle bounds for the Chart present in the attached PDF document. This chart is not recognized as image object (PDImageXObject). Looks like it is embedded in the content stream.
>
> Appreciate if you can help me with a sample code in retrieving rectangle bounds for the chart present in the attached PDF document.
>
> Thanks
> -Satya