Posted to dev@pdfbox.apache.org by Maruan Sahyoun <sa...@fileaffairs.de> on 2015/11/10 12:30:35 UTC

[DISCUSS] Enhance AcroForms functionality

Hi,

As discussed on http://stackoverflow.com/questions/33383389/pdfbox-how-can-a-pdacroform-be-flattened/33489651#33489651, now that we have a flatten() method there is also a need to (re)generate the appearances on demand. The same applies if we'd like to flatten annotations. With the current package and class structure, that would go into PDAcroForm for interactive forms.

What I'm proposing is, instead of adding to the PD model, to put use-case-oriented functionality in a new package (services or so), so that we have COS (abstraction of low-level PDF elements), PD (abstraction of COS for PDF elements) and services (application of the PD model to 'do' something with the PDF). As we add higher-level functionality, this would help us keep the PD model clean.
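
To make the proposed layering a bit more concrete, here is a minimal Java sketch. It assumes nothing about the real PDFBox classes: Form, FormField and AppearanceRefresher are hypothetical stand-ins, and the "appearance" is just a string. The only point is that the model objects hold state while the use-case logic sits in a separate class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only: none of these classes exist in PDFBox.
class LayeringSketch {

    /** Model-level object: holds field state, offers no use-case logic. */
    static final class FormField {
        final String name;
        String value;
        String appearance; // stand-in for a generated appearance stream

        FormField(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Model-level object: a form is just a list of fields here. */
    static final class Form {
        final List<FormField> fields = new ArrayList<>();
    }

    /** "Services"-style class: applies the model to get something done. */
    static final class AppearanceRefresher {
        /** Regenerates every field's appearance and returns how many were touched. */
        static int refresh(Form form) {
            int count = 0;
            for (FormField field : form.fields) {
                field.appearance = "[" + field.value + "]";
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) {
        Form form = new Form();
        form.fields.add(new FormField("field1", "a"));
        form.fields.add(new FormField("field2", "b"));
        int refreshed = AppearanceRefresher.refresh(form);
        System.out.println("refreshed " + refreshed + " field appearances");
    }
}
```

In this arrangement a new use case means a new class next to AppearanceRefresher, and the model classes stay unchanged.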

A similar approach could also be taken e.g. for signing a PDF ...

WDYT?

Maruan
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: dev-help@pdfbox.apache.org


Re: [DISCUSS] Enhance AcroForms functionality

Posted by John Hewson <jo...@jahewson.com>.
> On 11 Nov 2015, at 09:03, Maruan Sahyoun <sa...@fileaffairs.de> wrote:
> 
>> 
>> Am 10.11.2015 um 22:16 schrieb John Hewson <jo...@jahewson.com>:
>> 
>> 
>>> On 10 Nov 2015, at 12:10, Maruan Sahyoun <sa...@fileaffairs.de> wrote:
>>> 
>>>> 
>>>> Am 10.11.2015 um 19:19 schrieb John Hewson <jo...@jahewson.com>:
>>>> 
>>>> Correction: That’s how *PDFBox* is designed.
>>>> 
>>>>> On 10 Nov 2015, at 10:15, John Hewson <jo...@jahewson.com> wrote:
>>>>> 
>>>>>> 
>>>>>> On 10 Nov 2015, at 03:30, Maruan Sahyoun <sa...@fileaffairs.de> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> as discussed on http://stackoverflow.com/questions/33383389/pdfbox-how-can-a-pdacroform-be-flattened/33489651#33489651 now we have a flatten() method there is also the need to (re-) generate the appearances on demand. The same applies if we'd like to flatten annotations. With the current package and class structure that would go into PDAcroForm for interactive forms. 
>>>>>> 
>>>>>> What I'm proposing is - instead of adding to the PD model - have user case oriented functionality in a new package (services or so) so we have COS (abstraction of low level PDF elements), PD (abstraction of COS for PDF elements) and services (application of PD model to 'do' something with the PDF). As we add higher level functionality this would help us keeping the PD model clean.
>>>>> 
>>>>> You’re under-selling PD here. PD *is* a high-level abstraction, it’s not just a wrapper around COS, look at PDFont for example. PDDocument lets you ‘do’ something with a document, PDPage lets you ‘do’ something with a Page, and PDAcroForm lets you ‘do’ something with an acro form. That’s what PD is all about.
>>>>> 
>>>>> The only caveat is that PD is tied to a single document, so we recently introduced the “multipdf” package. But any functionality which manipulates a single PDF should be in PD. That’s how PDF is designed.
>>> 
>>> that's how it's currently designed which may or may not be the case moving forward. And we have a number of tools which work on a single document but are not part of PD such as ExtractImages, ExtractText or PDFSplit.
>> 
>> ExtractText and ExtractText are command lines tools, so of course they’re in the tools jar - but the logic which powers them is in PD. Same for PDFSplit, for the most part, though that one’s a bit messy. If you’re proposing to add a new command line tool, then follow this pattern, with a wrapper in ‘tools’ and the logic in PD.
>> 
>>> Some of them are base on individual packages such as o.a.p.text. So we do already have cases where functionality is not part of PD (e.g. we could have had PDDocument.extractText(), PDDocument.split()).
>> 
>> Again, text extraction logic is in PD, it’s just a wrapper which is elsewhere. Split is arguably a mess and not something we want to re-create.
>> 
>>> As an example we can have PDDocument.flatten() to flatten AcroForms and Annotations - would be in line with your thoughts and how PDFBox is currently (mainly) designed. And of course we can add PDDocument.refreshAppearances() … - my proposal is to not add that there but keep that in a separate class in a separate package. 
>> 
>> Actually I was thinking PDAcroForm.flatten().
> 
> I've already added that - but that also prompted me to think about not keeping to add new functionality there.

Ok, good, well that’s the place for it.

> 
>> 
>>> With the package name being used for more such (future) additions  e.g. o.a.p.services.appearance, o.a.p.services.signature …
>> 
>> We’ve had this exact discussion in the past. Packages *are not services*. APIs *are not services*. Services are daemons, web servers, etc.. APIs do not expose services.
> 
> I'm open to different names

It’s not just the names, it’s also the idea: moving core PDF-manipulating functionality out of PD and into other packages would be a huge change in direction for PDFBox’s design, resulting in something which is no longer cohesive. For better or worse, PDFBox is structured around manipulating PDF files, and that’s not something which can be changed without introducing a messy inconsistency about where various APIs are to be found. There have to be some serious upsides to introducing a new and different way of doing things, especially when it doesn’t fit naturally into the existing way, and I’m not seeing that here at all.

— John

>> 
>> — John
>> 
>>> BR
>>> Maruan
>>> 
>>>>> 
>>>>> — John
>>>>> 
>>>>>> A similar approach could also be taken e.g. for signing a PDF ...
>>>>>> 
>>>>>> WDYT?
>>>>>> 
>>>>>> Maruan


Re: [DISCUSS] Enhance AcroForms functionality

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
> Am 10.11.2015 um 22:16 schrieb John Hewson <jo...@jahewson.com>:
> 
> 
>> On 10 Nov 2015, at 12:10, Maruan Sahyoun <sa...@fileaffairs.de> wrote:
>> 
>>> 
>>> Am 10.11.2015 um 19:19 schrieb John Hewson <jo...@jahewson.com>:
>>> 
>>> Correction: That’s how *PDFBox* is designed.
>>> 
>>>> On 10 Nov 2015, at 10:15, John Hewson <jo...@jahewson.com> wrote:
>>>> 
>>>>> 
>>>>> On 10 Nov 2015, at 03:30, Maruan Sahyoun <sa...@fileaffairs.de> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> as discussed on http://stackoverflow.com/questions/33383389/pdfbox-how-can-a-pdacroform-be-flattened/33489651#33489651 now we have a flatten() method there is also the need to (re-) generate the appearances on demand. The same applies if we'd like to flatten annotations. With the current package and class structure that would go into PDAcroForm for interactive forms. 
>>>>> 
>>>>> What I'm proposing is - instead of adding to the PD model - have user case oriented functionality in a new package (services or so) so we have COS (abstraction of low level PDF elements), PD (abstraction of COS for PDF elements) and services (application of PD model to 'do' something with the PDF). As we add higher level functionality this would help us keeping the PD model clean.
>>>> 
>>>> You’re under-selling PD here. PD *is* a high-level abstraction, it’s not just a wrapper around COS, look at PDFont for example. PDDocument lets you ‘do’ something with a document, PDPage lets you ‘do’ something with a Page, and PDAcroForm lets you ‘do’ something with an acro form. That’s what PD is all about.
>>>> 
>>>> The only caveat is that PD is tied to a single document, so we recently introduced the “multipdf” package. But any functionality which manipulates a single PDF should be in PD. That’s how PDF is designed.
>> 
>> that's how it's currently designed which may or may not be the case moving forward. And we have a number of tools which work on a single document but are not part of PD such as ExtractImages, ExtractText or PDFSplit.
> 
> ExtractText and ExtractText are command lines tools, so of course they’re in the tools jar - but the logic which powers them is in PD. Same for PDFSplit, for the most part, though that one’s a bit messy. If you’re proposing to add a new command line tool, then follow this pattern, with a wrapper in ‘tools’ and the logic in PD.
> 
>> Some of them are base on individual packages such as o.a.p.text. So we do already have cases where functionality is not part of PD (e.g. we could have had PDDocument.extractText(), PDDocument.split()).
> 
> Again, text extraction logic is in PD, it’s just a wrapper which is elsewhere. Split is arguably a mess and not something we want to re-create.
> 
>> As an example we can have PDDocument.flatten() to flatten AcroForms and Annotations - would be in line with your thoughts and how PDFBox is currently (mainly) designed. And of course we can add PDDocument.refreshAppearances() … - my proposal is to not add that there but keep that in a separate class in a separate package. 
> 
> Actually I was thinking PDAcroForm.flatten().

I've already added that, but it also prompted me to think about not continuing to add new functionality there.

> 
>> With the package name being used for more such (future) additions  e.g. o.a.p.services.appearance, o.a.p.services.signature …
> 
> We’ve had this exact discussion in the past. Packages *are not services*. APIs *are not services*. Services are daemons, web servers, etc.. APIs do not expose services.

I'm open to different names.

> 
> — John
> 
>> BR
>> Maruan
>> 
>>>> 
>>>> — John
>>>> 
>>>>> A similar approach could also be taken e.g. for signing a PDF ...
>>>>> 
>>>>> WDYT?
>>>>> 
>>>>> Maruan


Re: [DISCUSS] Enhance AcroForms functionality

Posted by John Hewson <jo...@jahewson.com>.
> On 10 Nov 2015, at 12:10, Maruan Sahyoun <sa...@fileaffairs.de> wrote:
> 
>> 
>> Am 10.11.2015 um 19:19 schrieb John Hewson <jo...@jahewson.com>:
>> 
>> Correction: That’s how *PDFBox* is designed.
>> 
>>> On 10 Nov 2015, at 10:15, John Hewson <jo...@jahewson.com> wrote:
>>> 
>>>> 
>>>> On 10 Nov 2015, at 03:30, Maruan Sahyoun <sa...@fileaffairs.de> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> as discussed on http://stackoverflow.com/questions/33383389/pdfbox-how-can-a-pdacroform-be-flattened/33489651#33489651 now we have a flatten() method there is also the need to (re-) generate the appearances on demand. The same applies if we'd like to flatten annotations. With the current package and class structure that would go into PDAcroForm for interactive forms. 
>>>> 
>>>> What I'm proposing is - instead of adding to the PD model - have user case oriented functionality in a new package (services or so) so we have COS (abstraction of low level PDF elements), PD (abstraction of COS for PDF elements) and services (application of PD model to 'do' something with the PDF). As we add higher level functionality this would help us keeping the PD model clean.
>>> 
>>> You’re under-selling PD here. PD *is* a high-level abstraction, it’s not just a wrapper around COS, look at PDFont for example. PDDocument lets you ‘do’ something with a document, PDPage lets you ‘do’ something with a Page, and PDAcroForm lets you ‘do’ something with an acro form. That’s what PD is all about.
>>> 
>>> The only caveat is that PD is tied to a single document, so we recently introduced the “multipdf” package. But any functionality which manipulates a single PDF should be in PD. That’s how PDF is designed.
> 
> that's how it's currently designed which may or may not be the case moving forward. And we have a number of tools which work on a single document but are not part of PD such as ExtractImages, ExtractText or PDFSplit.

ExtractImages and ExtractText are command-line tools, so of course they’re in the tools jar, but the logic which powers them is in PD. Same for PDFSplit, for the most part, though that one’s a bit messy. If you’re proposing to add a new command-line tool, then follow this pattern, with a wrapper in ‘tools’ and the logic in PD.
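
A rough Java sketch of that wrapper pattern, under loud assumptions: ExtractTool and Extractor are hypothetical names, not the PDFBox source, and the command-line arguments stand in for the pages of a loaded document. The wrapper only parses input and prints; everything worth reusing lives in the library-side class.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout; this is not the PDFBox source.
class ExtractTool {

    /** Library-side logic: the part that would live next to the model classes. */
    static final class Extractor {
        /** Joins the text of all pages, one page per line. */
        String extract(List<String> pageTexts) {
            return String.join("\n", pageTexts);
        }
    }

    /** Command-line wrapper: reads arguments, delegates, prints. */
    public static void main(String[] args) {
        // Stand-in for loading a document: each argument is treated as one page of text.
        List<String> pages = Arrays.asList(args);
        System.out.println(new Extractor().extract(pages));
    }
}
```

Callers who want the result programmatically use Extractor directly and never touch the wrapper.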

> Some of them are base on individual packages such as o.a.p.text. So we do already have cases where functionality is not part of PD (e.g. we could have had PDDocument.extractText(), PDDocument.split()).

Again, text extraction logic is in PD, it’s just a wrapper which is elsewhere. Split is arguably a mess and not something we want to re-create.

> As an example we can have PDDocument.flatten() to flatten AcroForms and Annotations - would be in line with your thoughts and how PDFBox is currently (mainly) designed. And of course we can add PDDocument.refreshAppearances() … - my proposal is to not add that there but keep that in a separate class in a separate package. 

Actually I was thinking PDAcroForm.flatten().

> With the package name being used for more such (future) additions  e.g. o.a.p.services.appearance, o.a.p.services.signature …

We’ve had this exact discussion in the past. Packages *are not services*. APIs *are not services*. Services are daemons, web servers, etc. APIs do not expose services.

— John

> BR
> Maruan
> 
>>> 
>>> — John
>>> 
>>>> A similar approach could also be taken e.g. for signing a PDF ...
>>>> 
>>>> WDYT?
>>>> 
>>>> Maruan

Re: [DISCUSS] Enhance AcroForms functionality

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
> Am 10.11.2015 um 19:19 schrieb John Hewson <jo...@jahewson.com>:
> 
> Correction: That’s how *PDFBox* is designed.
> 
>> On 10 Nov 2015, at 10:15, John Hewson <jo...@jahewson.com> wrote:
>> 
>>> 
>>> On 10 Nov 2015, at 03:30, Maruan Sahyoun <sa...@fileaffairs.de> wrote:
>>> 
>>> Hi,
>>> 
>>> as discussed on http://stackoverflow.com/questions/33383389/pdfbox-how-can-a-pdacroform-be-flattened/33489651#33489651 now we have a flatten() method there is also the need to (re-) generate the appearances on demand. The same applies if we'd like to flatten annotations. With the current package and class structure that would go into PDAcroForm for interactive forms. 
>>> 
>>> What I'm proposing is - instead of adding to the PD model - have user case oriented functionality in a new package (services or so) so we have COS (abstraction of low level PDF elements), PD (abstraction of COS for PDF elements) and services (application of PD model to 'do' something with the PDF). As we add higher level functionality this would help us keeping the PD model clean.
>> 
>> You’re under-selling PD here. PD *is* a high-level abstraction, it’s not just a wrapper around COS, look at PDFont for example. PDDocument lets you ‘do’ something with a document, PDPage lets you ‘do’ something with a Page, and PDAcroForm lets you ‘do’ something with an acro form. That’s what PD is all about.
>> 
>> The only caveat is that PD is tied to a single document, so we recently introduced the “multipdf” package. But any functionality which manipulates a single PDF should be in PD. That’s how PDF is designed.

That's how it's currently designed, which may or may not be the case moving forward. And we have a number of tools which work on a single document but are not part of PD, such as ExtractImages, ExtractText or PDFSplit. Some of them are based on individual packages such as o.a.p.text. So we do already have cases where functionality is not part of PD (e.g. we could have had PDDocument.extractText(), PDDocument.split()).

As an example, we could have PDDocument.flatten() to flatten AcroForms and annotations, which would be in line with your thoughts and with how PDFBox is currently (mainly) designed. And of course we could add PDDocument.refreshAppearances() … but my proposal is not to add that there, and instead to keep it in a separate class in a separate package, with the package name being used for more such (future) additions, e.g. o.a.p.services.appearance, o.a.p.services.signature …

BR
Maruan

>> 
>> — John
>> 
>>> A similar approach could also be taken e.g. for signing a PDF ...
>>> 
>>> WDYT?
>>> 
>>> Maruan


Re: [DISCUSS] Enhance AcroForms functionality

Posted by John Hewson <jo...@jahewson.com>.
Correction: That’s how *PDFBox* is designed.

> On 10 Nov 2015, at 10:15, John Hewson <jo...@jahewson.com> wrote:
> 
>> 
>> On 10 Nov 2015, at 03:30, Maruan Sahyoun <sa...@fileaffairs.de> wrote:
>> 
>> Hi,
>> 
>> as discussed on http://stackoverflow.com/questions/33383389/pdfbox-how-can-a-pdacroform-be-flattened/33489651#33489651 now we have a flatten() method there is also the need to (re-) generate the appearances on demand. The same applies if we'd like to flatten annotations. With the current package and class structure that would go into PDAcroForm for interactive forms. 
>> 
>> What I'm proposing is - instead of adding to the PD model - have user case oriented functionality in a new package (services or so) so we have COS (abstraction of low level PDF elements), PD (abstraction of COS for PDF elements) and services (application of PD model to 'do' something with the PDF). As we add higher level functionality this would help us keeping the PD model clean.
> 
> You’re under-selling PD here. PD *is* a high-level abstraction, it’s not just a wrapper around COS, look at PDFont for example. PDDocument lets you ‘do’ something with a document, PDPage lets you ‘do’ something with a Page, and PDAcroForm lets you ‘do’ something with an acro form. That’s what PD is all about.
> 
> The only caveat is that PD is tied to a single document, so we recently introduced the “multipdf” package. But any functionality which manipulates a single PDF should be in PD. That’s how PDF is designed.
> 
> — John
> 
>> A similar approach could also be taken e.g. for signing a PDF ...
>> 
>> WDYT?
>> 
>> Maruan

Re: [DISCUSS] Enhance AcroForms functionality

Posted by John Hewson <jo...@jahewson.com>.
> On 10 Nov 2015, at 03:30, Maruan Sahyoun <sa...@fileaffairs.de> wrote:
> 
> Hi,
> 
> as discussed on http://stackoverflow.com/questions/33383389/pdfbox-how-can-a-pdacroform-be-flattened/33489651#33489651 now we have a flatten() method there is also the need to (re-) generate the appearances on demand. The same applies if we'd like to flatten annotations. With the current package and class structure that would go into PDAcroForm for interactive forms. 
> 
> What I'm proposing is - instead of adding to the PD model - have user case oriented functionality in a new package (services or so) so we have COS (abstraction of low level PDF elements), PD (abstraction of COS for PDF elements) and services (application of PD model to 'do' something with the PDF). As we add higher level functionality this would help us keeping the PD model clean.

You’re under-selling PD here. PD *is* a high-level abstraction, not just a wrapper around COS; look at PDFont, for example. PDDocument lets you ‘do’ something with a document, PDPage lets you ‘do’ something with a page, and PDAcroForm lets you ‘do’ something with an AcroForm. That’s what PD is all about.

The only caveat is that PD is tied to a single document, so we recently introduced the “multipdf” package. But any functionality which manipulates a single PDF should be in PD. That’s how PDF is designed.
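
As a rough illustration of what I mean, here is a minimal Java sketch in which the operation lives on the model object itself. Page and Form are hypothetical stand-ins, not the real PDPage or PDAcroForm, and "flattening" is reduced to moving field values into a page's content list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in classes; these are not the real PDAcroForm or PDPage.
class PdStyleSketch {

    /** Stand-in for a page: just collects static content. */
    static final class Page {
        final List<String> content = new ArrayList<>();
    }

    /** Stand-in for a form that belongs to a single document's page. */
    static final class Form {
        final List<String> fieldValues = new ArrayList<>();
        private final Page page;

        Form(Page page) {
            this.page = page;
        }

        /** The operation sits on the model: move values into page content, then drop the fields. */
        void flatten() {
            page.content.addAll(fieldValues);
            fieldValues.clear();
        }
    }

    public static void main(String[] args) {
        Page page = new Page();
        Form form = new Form(page);
        form.fieldValues.add("value-1");
        form.flatten();
        System.out.println("page content: " + page.content
                + ", remaining fields: " + form.fieldValues.size());
    }
}
```

The caller needs to know only the object it already holds, which is the discoverability argument for keeping single-document operations in PD.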

— John

> A similar approach could also be taken e.g. for signing a PDF ...
> 
> WDYT?
> 
> Maruan