Posted to users@pdfbox.apache.org by "Conlin, Joshua [USA]" <co...@bah.com> on 2016/08/24 16:22:59 UTC
Extracting non-form checkbox values
Hello,
I am trying to extract checkbox values from a document where the AcroForm is null. I have seen several previous inquiries about this scenario but haven't found a definitive answer. Is there a suggested approach?
Alternatively, is there a way to take a subsection of a PDF page and create an image from it? To be clear, I am not talking about extracting an embedded image, but about creating an image from a rectangle or similar area within a page. With this (maybe naive) approach I could render the checkbox location as an image and determine whether it is checked or not. Any help or insight you could provide would be appreciated.
Thanks,
Josh
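A rough sketch of the "image from a rectangle" idea, not a definitive answer. It assumes the page has already been rendered to a BufferedImage (in PDFBox 2.0 that would be something like new PDFRenderer(document).renderImageWithDPI(pageIndex, dpi)), that the page is not rotated, and that its MediaBox starts at 0,0. The class and method names (PageRegionCropper, toPixels, crop) are made up for illustration. The only real work is converting from PDF points (origin bottom-left, 72 per inch) to image pixels (origin top-left):

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

/** Sketch: map a rectangle given in PDF points onto a rendered page image. */
final class PageRegionCropper {

    /**
     * Converts a rectangle in PDF user space (origin bottom-left, 72 units
     * per inch) to pixel coordinates in an image rendered at the given DPI
     * (origin top-left). Assumes no page rotation and a MediaBox at 0,0.
     */
    static Rectangle toPixels(float x, float y, float width, float height,
                              float pageHeightPts, float dpi) {
        float scale = dpi / 72f;
        int px = Math.round(x * scale);
        // Flip the y axis: the top edge of the box in PDF space is y + height.
        int py = Math.round((pageHeightPts - (y + height)) * scale);
        int pw = Math.round(width * scale);
        int ph = Math.round(height * scale);
        return new Rectangle(px, py, pw, ph);
    }

    /** Crops the region from the page image, clamped to the image bounds. */
    static BufferedImage crop(BufferedImage page, Rectangle region) {
        Rectangle clipped = region.intersection(
                new Rectangle(0, 0, page.getWidth(), page.getHeight()));
        if (clipped.isEmpty()) {
            throw new IllegalArgumentException("region lies outside the page image");
        }
        return page.getSubimage(clipped.x, clipped.y, clipped.width, clipped.height);
    }

    public static void main(String[] args) {
        // Blank stand-in for a rendered US Letter page (612 x 792 pt) at 144 DPI.
        BufferedImage page = new BufferedImage(1224, 1584, BufferedImage.TYPE_INT_RGB);
        // Hypothetical checkbox location in PDF points.
        Rectangle r = toPixels(100f, 700f, 12f, 12f, 792f, 144f);
        BufferedImage box = crop(page, r);
        System.out.println(r + " -> " + box.getWidth() + "x" + box.getHeight());
    }
}
```

The checkbox coordinates themselves still have to come from somewhere, for example measured once by hand if every document uses the same layout.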
Re: [External] Re: Extracting non-form checkbox values
Posted by "Conlin, Joshua [USA]" <co...@bah.com>.
First off, thanks for your quick reply and help. I am new to PDFBox, and
am using version 2.0.1. XFA is indeed unavailable. I am unable to upload
a sample PDF due to privacy concerns. I ran the PDFDebugger against this
file and it produced some output. Here is the general structure for page
1 (which contains 48 check boxes):
Page:1
[] Annots: (0)
<<>> Contents: (2) [5 0 R]
/ Filter: FlateDecode
84 Length: 7141
[] MediaBox: (4)
84 0: 0
84 1: 0
84 2: 612
84 3: 792
<<>> Parent: (4) [ 4 0 R] /T:Pages (not sure if more detail is needed on this)
<<>> Resources: (2) [7 0 R]
<<>> Font (4)
<<>> TT1: (8) [8 0 R] /T:Font /S:TrueType
<<>> TT2: (8) [9 0 R] /T:Font /S:TrueType
<<>> TT3: (8) [10 0 R] /T:Font /S:TrueType
<<>> TT4: (8) [11 0 R] /T:Font /S:TrueType
[]ProcSet: (2)
/ 0: PDF
/ 1: Text
I'm sort of leaning towards the image capture idea but am not sure where to
start (extracting a PDF subsection as an image). Any insight there? Worst
case scenario, I suppose I could export the entire page to an image and do
some analysis there. The solution doesn't necessarily have to be
performant.
I'd like to avoid using a separate OCR framework and just stick with
PDFBox if possible.
Thanks again for your help.
Josh
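For the "do some analysis there" part, one simple heuristic that needs no OCR is to count dark pixels inside the cropped checkbox, skipping a few pixels at the edges so the box outline is not counted. What follows is only a sketch using plain java.awt. The class name, the inset, the luminance threshold of 128 and the 5% cutoff in main are illustrative assumptions that would need tuning against real renderings; nothing here comes from PDFBox itself.

```java
import java.awt.image.BufferedImage;

/** Sketch: decide whether a cropped checkbox image contains a mark. */
final class CheckboxInkHeuristic {

    /** Fraction of interior pixels whose luminance is below the threshold (0..255). */
    static double darkRatio(BufferedImage img, int inset, int threshold) {
        int dark = 0;
        int total = 0;
        for (int y = inset; y < img.getHeight() - inset; y++) {
            for (int x = inset; x < img.getWidth() - inset; x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of perceived brightness.
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                if (luma < threshold) {
                    dark++;
                }
                total++;
            }
        }
        return total == 0 ? 0.0 : (double) dark / total;
    }

    /** True when enough of the interior is dark to suggest a tick or cross. */
    static boolean looksChecked(BufferedImage box, int inset, double minRatio) {
        return darkRatio(box, inset, 128) >= minRatio;
    }

    /** Builds a 24x24 test image: white box, 2 px black border, optional X mark. */
    static BufferedImage syntheticBox(boolean checked) {
        int size = 24;
        BufferedImage img = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < size; y++) {
            for (int x = 0; x < size; x++) {
                boolean border = x < 2 || y < 2 || x >= size - 2 || y >= size - 2;
                boolean cross = checked && (x == y || x == size - 1 - y);
                img.setRGB(x, y, (border || cross) ? 0x000000 : 0xFFFFFF);
            }
        }
        return img;
    }

    public static void main(String[] args) {
        // Inset of 3 px skips the 2 px border of the synthetic box.
        System.out.println("empty:   " + looksChecked(syntheticBox(false), 3, 0.05));
        System.out.println("checked: " + looksChecked(syntheticBox(true), 3, 0.05));
    }
}
```

Rendering at a higher DPI gives the heuristic more pixels to work with, at the cost of speed and memory.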
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org
Re: Extracting non-form checkbox values
Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
> Am 24.08.2016 um 19:24 schrieb Tilman Hausherr <TH...@t-online.de>:
>
> Am 24.08.2016 um 18:22 schrieb Conlin, Joshua [USA]:
>> Hello,
>>
>>
>> I am trying to extract checkbox values from a document where the acro form is null. I have seen several previous inquiries to this scenario but haven't found a definitive answer. I was wondering if there is a suggested approach?
>
> Maybe XFA?
AFAIU, if there is no AcroForm there will also be no XFA.
Would it be possible to upload a sample PDF to a public location so we can take a look?
BR
Maruan
Re: Extracting non-form checkbox values
Posted by Tilman Hausherr <TH...@t-online.de>.
Am 24.08.2016 um 18:22 schrieb Conlin, Joshua [USA]:
> Hello,
>
>
> I am trying to extract checkbox values from a document where the acro form is null. I have seen several previous inquiries to this scenario but haven't found a definitive answer. I was wondering if there is a suggested approach?
Maybe XFA?
Tilman