Posted to users@pdfbox.apache.org by Kevin Ternes <KT...@thegeneral.com> on 2016/03/29 19:54:02 UTC

How to manipulate a pdf object

I have successfully updated form widgets on pre-existing PDFs.
But what about ordinary non-form objects like a box of text?  I can add NEW objects to the PDPageContentStream.
But how do I even get a reference to an existing object?
Viewing the document in Acrobat does not give me a clue as to what the object might even be called.

PDFBox-2.0.0


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: How to manipulate a pdf object

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi,

> On 29.03.2016 at 20:46, Kevin Ternes <KT...@thegeneral.com> wrote:
> 
> Maruan and Tilman,
> I think you have answered my question--that I am basically out of luck.
> I already ran one through the usual PDF-Tools Debugger but it did not tell me anything that I thought was useful.  I also tried looking at the PDF under Acrobat's preflight.
> 
> But here is the use case:
> I have a large number of PDF "templates" that in our usual business process, we use PDFBox to load, set form field values, add images, merge, flatten, protect, . . .
> 
> However, it turns out that the specification for many of these templates has changed so that a piece of text needs to be moved slightly up, a cm to the left and have the font size changed.  Then there are some places where someone drew lines around hundreds of form checkboxes!!!  So while I'm at it I'd like to delete those lines and set the form field widgets to have a border.
> 
> I wanted to write a quick command line program to do this.
> I estimate that to do this one-pdf-at-a-time would take 10-20 hours.  That would not be a problem except that we don't have an intern.
> 
> Any suggestions appreciated.

Would you be able to share a PDF to take a closer look?

BR
Maruan 



RE: How to manipulate a pdf object

Posted by Gary Grosso <ga...@oberontech.com>.
In case it helps, here's my Windows "shortcut"

C:\Windows\System32\cmd.exe /k java -jar C:\Users\gary.grosso\Downloads\PDFBox\pdfbox-app-2.0.0-RC1.jar PDFDebugger && exit

It's possible you only need to add ".\ " in front of " PDFDebugger" (or would that be "./"?) but the shortcut is much handier IMO.


Gary





RE: How to manipulate a pdf object

Posted by Kevin Ternes <KT...@thegeneral.com>.
Got it.  Whoah!  That is just freaking slick.



Re: How to manipulate a pdf object

Posted by Tilman Hausherr <TH...@t-online.de>.
On 29.03.2016 at 21:21, Kevin Ternes wrote:
> Thanks guys.
> Also, I meant to add in my last email that I was not able to find the PDFDebugger.
> My best effort was:
>
>     C:\Users\ntiskt02\Downloads>java -jar pdfbox-2.0.0.jar PDFDebugger RenewalFaxCover_MN_MP.pdf
>     no main manifest attribute, in pdfbox-2.0.0.jar

It's a separate download from the download page.
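
The "no main manifest attribute" message means the jar's manifest has no Main-Class entry, so "java -jar" has nothing to launch. The library jar is in that situation, while the separately downloaded app jar (the one used in Gary's shortcut) can be started directly. As a rough sketch, a jar can be checked for this with the standard java.util.jar classes; the class and method names below are made up for illustration and are not part of PDFBox.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper, not part of PDFBox: reports whether "java -jar" can launch a jar.
class JarLaunchCheck {

    // True when the manifest names a Main-Class, which "java -jar" requires.
    static boolean hasMainClass(Manifest manifest) {
        if (manifest == null) {
            return false; // a jar without any manifest cannot be launched either
        }
        String mainClass = manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        return mainClass != null && !mainClass.trim().isEmpty();
    }

    // Opens the jar file and inspects its manifest.
    static boolean jarHasMainClass(File jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            return hasMainClass(jarFile.getManifest());
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass one or more jar paths on the command line.
        for (String path : args) {
            boolean runnable = jarHasMainClass(new File(path));
            System.out.println(path + ": " + (runnable ? "has Main-Class" : "no Main-Class"));
        }
    }
}
```

Run against both downloads, this should report the difference that the error message hints at.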

Tilman

>
> Am I missing something?


RE: How to manipulate a pdf object

Posted by Kevin Ternes <KT...@thegeneral.com>.
Thanks guys. 
Also, I meant to add in my last email that I was not able to find the PDFDebugger.
My best effort was:

   C:\Users\ntiskt02\Downloads>java -jar pdfbox-2.0.0.jar PDFDebugger RenewalFaxCover_MN_MP.pdf
   no main manifest attribute, in pdfbox-2.0.0.jar

Am I missing something?




Re: How to manipulate a pdf object

Posted by Tilman Hausherr <TH...@t-online.de>.
On 29.03.2016 at 20:46, Kevin Ternes wrote:
> Maruan and Tilman,
> I think you have answered my question--that I am basically out of luck.
> I already ran one through the usual PDF-Tools Debugger but it did not tell me anything that I thought was useful.  I also tried looking at the PDF under Acrobat's preflight.
>
> But here is the use case:
> I have a large number of PDF "templates" that in our usual business process, we use PDFBox to load, set form field values, add images, merge, flatten, protect, . . .
>
> However, it turns out that the specification for many of these templates has changed so that a piece of text needs to be moved slightly up, a cm to the left and have the font size changed.  Then there are some places where someone drew lines around hundreds of form checkboxes!!!  So while I'm at it I'd like to delete those lines and set the form field widgets to have a border.
>
> I wanted to write a quick command line program to do this.

Likely won't be possible. What I do is run the WriteDecodedDoc 
command line utility and then make the changes manually. However, you need 
to understand the PDF operators, and the sizes of the content streams 
should not change, i.e. all object positions must stay the same.
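
One way to keep a stream the same size during such a manual edit is to pad a shorter replacement with spaces, since white space between tokens is ignored. A minimal sketch of that idea in plain Java; the helper name and the sample operators are illustrative and not taken from the PDFs discussed here, and it only handles a replacement that is no longer than the original.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper, not a PDFBox API: swaps one token sequence for another
// while keeping the byte length of the content stream unchanged.
class SameLengthEdit {

    // Replaces the first occurrence of oldText; pads newText with spaces to the old length.
    static String replaceKeepingLength(String stream, String oldText, String newText) {
        int oldLen = oldText.getBytes(StandardCharsets.ISO_8859_1).length;
        int newLen = newText.getBytes(StandardCharsets.ISO_8859_1).length;
        if (newLen > oldLen) {
            throw new IllegalArgumentException("replacement is longer than the original");
        }
        int at = stream.indexOf(oldText);
        if (at < 0) {
            throw new IllegalArgumentException("text to replace not found");
        }
        StringBuilder padded = new StringBuilder(newText);
        for (int i = newLen; i < oldLen; i++) {
            padded.append(' '); // extra white space between tokens does not change the drawing
        }
        return stream.substring(0, at) + padded + stream.substring(at + oldText.length());
    }

    public static void main(String[] args) {
        // Made-up stream: shrink the font size from 12 to 9 without changing the length.
        String before = "BT /F1 12 Tf 100 700 Td (Name) Tj ET";
        String after = replaceKeepingLength(before, "/F1 12 Tf", "/F1 9 Tf");
        System.out.println(after);
        System.out.println(before.length() == after.length());
    }
}
```

The padding must land between tokens; spaces added inside a string such as (Name) would become part of the displayed text.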

Alternatively, get Acrobat Professional.

Tilman

> I estimate that to do this one-pdf-at-a-time would take 10-20 hours.  That would not be a problem except that we don't have an intern.
>
> Any suggestions appreciated.


RE: How to manipulate a pdf object

Posted by Kevin Ternes <KT...@thegeneral.com>.
Maruan and Tilman,
I think you have answered my question--that I am basically out of luck.
I already ran one through the usual PDF-Tools Debugger but it did not tell me anything that I thought was useful.  I also tried looking at the PDF under Acrobat's preflight.

But here is the use case:
I have a large number of PDF "templates" that in our usual business process, we use PDFBox to load, set form field values, add images, merge, flatten, protect, . . .

However, it turns out that the specification for many of these templates has changed so that a piece of text needs to be moved slightly up, a cm to the left and have the font size changed.  Then there are some places where someone drew lines around hundreds of form checkboxes!!!  So while I'm at it I'd like to delete those lines and set the form field widgets to have a border.

I wanted to write a quick command line program to do this.
I estimate that to do this one-pdf-at-a-time would take 10-20 hours.  That would not be a problem except that we don't have an intern.

Any suggestions appreciated.
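
For the "a cm to the left" part: PDF coordinates are by default in units of 1/72 inch, so the shift has to be converted before it can be applied to a coordinate in the content stream. A small sketch of the arithmetic, assuming the default user space with no scaling in effect; the class name and the sample x position are made up.

```java
import java.util.Locale;

// Illustrative unit conversion, assuming default PDF user space (1 unit = 1/72 inch).
class PdfUnits {
    static final double POINTS_PER_INCH = 72.0;
    static final double CM_PER_INCH = 2.54;

    // Converts a length in centimetres to PDF user space units (points).
    static double cmToPoints(double cm) {
        return cm * POINTS_PER_INCH / CM_PER_INCH;
    }

    public static void main(String[] args) {
        double oldX = 100.0;                  // made-up x position of a piece of text
        double newX = oldX - cmToPoints(1.0); // moving left means a smaller x
        System.out.println(String.format(Locale.ROOT, "1 cm = %.2f pt, new x = %.2f", cmToPoints(1.0), newX));
    }
}
```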



Re: How to manipulate a pdf object

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi,

> On 29.03.2016 at 19:54, Kevin Ternes <KT...@thegeneral.com> wrote:
> 
> I have successfully updated form widgets on pre-existing PDFs.
> But what about ordinary non-form objects like a box of text?  I can add NEW objects to the PDPageContentStream.
> But how do I even get a reference to an existing object?

What is it that you are trying to achieve? You can parse an existing content stream and look for individual tokens. But there is no guarantee that what you are calling a box of text is treated like that in the PDF, as there is no such concept. For example, individual lines, words, or the characters forming a word could each be placed individually by different operations. It might not even be text but a vector or bitmap image. Your best bet is to look into the content using the PDFDebugger and see if you can identify the parts you are looking for.
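
To give a feel for what "looking for individual tokens" means, here is a hand-rolled sketch in plain Java. It is a simplified stand-in for a real parser (PDFBox has its own PDFStreamParser for this job) and only understands white space, literal strings in parentheses, and square brackets. The class name and the sample stream are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified tokenizer for decoded content stream text; not a PDFBox class.
class ContentStreamTokens {

    static boolean isDelimiter(char c) {
        return Character.isWhitespace(c) || c == '(' || c == ')' || c == '[' || c == ']' || c == '/';
    }

    static List<String> tokenize(String content) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < content.length()) {
            char c = content.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '(') {
                // Literal string: runs to the matching ')', honouring nesting and '\' escapes.
                int start = i;
                int depth = 0;
                do {
                    char s = content.charAt(i);
                    if (s == '\\') {
                        i++; // skip the escaped character
                    } else if (s == '(') {
                        depth++;
                    } else if (s == ')') {
                        depth--;
                    }
                    i++;
                } while (depth > 0 && i < content.length());
                tokens.add(content.substring(start, Math.min(i, content.length())));
            } else if (c == '[' || c == ']') {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                // Name, number or operator: read up to the next delimiter.
                int start = i;
                i++;
                while (i < content.length() && !isDelimiter(content.charAt(i))) {
                    i++;
                }
                tokens.add(content.substring(start, i));
            }
        }
        return tokens;
    }

    // Collects the operand of every Tj (show text) operator, in stream order.
    static List<String> shownStrings(List<String> tokens) {
        List<String> shown = new ArrayList<>();
        for (int i = 1; i < tokens.size(); i++) {
            if (tokens.get(i).equals("Tj")) {
                shown.add(tokens.get(i - 1));
            }
        }
        return shown;
    }

    public static void main(String[] args) {
        // Made-up sample: what looks like one line of text is drawn by two operations.
        String content = "BT /F1 12 Tf 72 700 Td (Policy) Tj 40 0 Td (Number) Tj ET";
        System.out.println(tokenize(content));
        System.out.println(shownStrings(tokenize(content)));
    }
}
```

Even in this tiny sample the visible text arrives as two separate strings, which is why a search for a whole phrase in the stream can come up empty.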

Maybe you can elaborate a little more on your use case.

BR
Maruan

> Viewing the document in Acrobat does not give me a clue as to what the object might even be called.
> 
> PDFBox-2.0.0


Re: How to manipulate a pdf object

Posted by Tilman Hausherr <TH...@t-online.de>.
On 29.03.2016 at 19:54, Kevin Ternes wrote:
> I have successfully updated form widgets on pre-existing PDFs.
> But what about ordinary non-form objects like a box of text?  I can add NEW objects to the PDPageContentStream.
> But how do I even get a reference to an existing object?
> Viewing the document in Acrobat does not give me a clue as to what the object might even be called.
>
> PDFBox-2.0.0
>

Your question is somewhat broad. To get an idea of how a PDF is made, 
please view a PDF in PDFDebugger. A "box of text" is not something that 
you can "reference" if it is in the content stream. It is just some 
vector graphics somewhere.
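
As an illustration of why there is nothing to "reference": each piece of text is simply drawn at whatever position the preceding operators set up. The sketch below follows the Td (move text position) operators in a made-up stream and reports where each Tj string starts. It assumes a deliberately simple stream: tokens separated by white space, strings without spaces, and no Tm or cm transforms. The class name is invented.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: tracks Td offsets to show that text fragments are positioned one by one.
class TextFragments {

    static List<String> locate(String content) {
        List<String> found = new ArrayList<>();
        String[] t = content.trim().split("\\s+");
        double x = 0;
        double y = 0;
        for (int i = 0; i < t.length; i++) {
            if (t[i].equals("BT")) {
                // A new text object starts again at the origin.
                x = 0;
                y = 0;
            } else if (t[i].equals("Td") && i >= 2) {
                // Td moves relative to the start of the current line.
                x += Double.parseDouble(t[i - 2]);
                y += Double.parseDouble(t[i - 1]);
            } else if (t[i].equals("Tj") && i >= 1) {
                found.add(t[i - 1] + " at " + x + "," + y);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String content = "BT /F1 10 Tf 72 700 Td (Insured) Tj 40 0 Td (Name) Tj ET";
        for (String fragment : locate(content)) {
            System.out.println(fragment);
        }
    }
}
```

The two fragments come out at different x positions on the same baseline; nothing in the stream ties them together as one object.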

Tilman
