Posted to users@pdfbox.apache.org by "Conlin, Joshua [USA]" <co...@bah.com> on 2016/08/24 16:22:59 UTC

Extracting non-form checkbox values

Hello,


I am trying to extract checkbox values from a document where the AcroForm is null.  I have seen several previous inquiries about this scenario but haven't found a definitive answer.  Is there a suggested approach?


Alternatively, is there a way to render a subsection of a PDF page as an image?  To be clear, I am not talking about extracting an embedded image, but about creating an image from a rectangle or similar area within a page.  With this perhaps naive approach, I could capture each checkbox location as an image and determine whether it is checked.  Any help or insight you could provide would be appreciated.


Thanks,


Josh

Re: [External] Re: Extracting non-form checkbox values

Posted by "Conlin, Joshua [USA]" <co...@bah.com>.
First off, thanks for your quick reply and help.  I am new to PDFBox, and
am using version 2.0.1.  XFA is indeed unavailable.  I am unable to upload
a sample PDF due to privacy concerns.  I ran the PDFDebugger against this
file and it produced some output.  Here is the general structure for page
1 (which contains 48 check boxes):

Page:1
 [] Annots: (0)
 <<>> Contents: (2) [5 0 R]
     / Filter: FlateDecode
     84 Length: 7141
 []  MediaBox: (4)
     84 0: 0
     84 1: 0
     84 2: 612
     84 3: 792
 <<>> Parent: (4) [ 4 0 R] /T:Pages (not sure if more detail is needed on this)
 <<>> Resources: (2) [7 0 R]
       <<>> Font (4)
          <<>> TT1: (8) [8 0 R] /T:Font /S:TrueType
          <<>> TT2: (8) [9 0 R] /T:Font /S:TrueType
          <<>> TT3: (8) [10 0 R] /T:Font /S:TrueType
          <<>> TT4: (8) [11 0 R] /T:Font /S:TrueType

 []ProcSet: (2)
      / 0: PDF
      / 1: Text

I'm sort of leaning towards the image capture idea but am not sure where to
start (rendering a subsection of a PDF page as an image).  Any insight
there?  Worst case, I suppose I could export the entire page to an image
and do some analysis there.  The solution doesn't necessarily have to be
performant.

I'd like to avoid using a separate OCR framework and just stick with
PDFBox if possible.
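
A minimal sketch of the image-capture idea, not a confirmed PDFBox recipe. It assumes the page has already been rendered to a BufferedImage (in PDFBox 2.0.x, for example, via new PDFRenderer(document).renderImageWithDPI(0, 150)), that each checkbox rectangle is known in PDF points, and that the page is unrotated with a MediaBox starting at (0, 0). The class CheckboxProbe, its method names, the luminance threshold of 128, and the 15% cutoff are hypothetical choices for illustration; the code itself uses only java.awt classes.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

/** Hypothetical helper for illustration; not part of PDFBox. */
final class CheckboxProbe {

    /**
     * Converts a rectangle in PDF user space (points, origin at the bottom-left
     * of the page) to pixel coordinates in a rendered page image (origin at the
     * top-left). Assumes an unrotated page whose MediaBox starts at (0, 0).
     */
    static Rectangle toPixels(float x, float y, float width, float height,
                              float pageHeightPt, float dpi) {
        float scale = dpi / 72f;                                   // 72 points per inch
        int px = Math.round(x * scale);
        int py = Math.round((pageHeightPt - y - height) * scale);  // flip the y axis
        int pw = Math.max(1, Math.round(width * scale));
        int ph = Math.max(1, Math.round(height * scale));
        return new Rectangle(px, py, pw, ph);
    }

    /** Returns the fraction of pixels in the region whose luminance is below the threshold (0-255). */
    static double darkRatio(BufferedImage page, Rectangle region, int threshold) {
        // Clip the region to the image so getRGB never reads out of bounds.
        Rectangle r = region.intersection(new Rectangle(page.getWidth(), page.getHeight()));
        if (r.isEmpty()) {
            return 0.0;
        }
        long dark = 0;
        for (int yy = r.y; yy < r.y + r.height; yy++) {
            for (int xx = r.x; xx < r.x + r.width; xx++) {
                int rgb = page.getRGB(xx, yy);
                int red = (rgb >> 16) & 0xFF;
                int green = (rgb >> 8) & 0xFF;
                int blue = rgb & 0xFF;
                int luma = (299 * red + 587 * green + 114 * blue) / 1000;
                if (luma < threshold) {
                    dark++;
                }
            }
        }
        return (double) dark / ((long) r.width * r.height);
    }

    /** Guesses that a box is checked when more than 15% of its pixels are dark (assumed cutoff). */
    static boolean looksChecked(BufferedImage page, Rectangle region) {
        return darkRatio(page, region, 128) > 0.15;
    }
}
```

Because an empty box still has a dark border, the region would likely need to be shrunk to the box interior, or the cutoff calibrated against a box known to be empty.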

Thanks again for your help.

Josh

On 8/25/16, 1:21 AM, "Maruan Sahyoun" <sa...@fileaffairs.de> wrote:

>
>> On 24.08.2016 at 19:24, Tilman Hausherr <TH...@t-online.de> wrote:
>> 
>> On 24.08.2016 at 18:22, Conlin, Joshua [USA] wrote:
>>> Hello,
>>> 
>>> 
>>> I am trying to extract checkbox values from a document where the
>>> AcroForm is null.  I have seen several previous inquiries about this
>>> scenario but haven't found a definitive answer.  Is there a suggested
>>> approach?
>> 
>> Maybe XFA?
>
>AFAIU, if there is no AcroForm there will also be no XFA.
>
>Would it be possible to upload a sample PDF to a public location so we
>can take a look?
>
>BR
>
>Maruan
>
>> 
>> Tilman


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: Extracting non-form checkbox values

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
> On 24.08.2016 at 19:24, Tilman Hausherr <TH...@t-online.de> wrote:
> 
> On 24.08.2016 at 18:22, Conlin, Joshua [USA] wrote:
>> Hello,
>> 
>> 
>> I am trying to extract checkbox values from a document where the AcroForm is null.  I have seen several previous inquiries about this scenario but haven't found a definitive answer.  Is there a suggested approach?
> 
> Maybe XFA?

AFAIU, if there is no AcroForm there will also be no XFA.

Would it be possible to upload a sample PDF to a public location so we can take a look?

BR

Maruan

> 
> Tilman


Re: Extracting non-form checkbox values

Posted by Tilman Hausherr <TH...@t-online.de>.
On 24.08.2016 at 18:22, Conlin, Joshua [USA] wrote:
> Hello,
>
>
> I am trying to extract checkbox values from a document where the AcroForm is null.  I have seen several previous inquiries about this scenario but haven't found a definitive answer.  Is there a suggested approach?

Maybe XFA?

Tilman

