Posted to dev@pdfbox.apache.org by "sahyoun@fileaffairs.de" <sa...@fileaffairs.de> on 2021/03/28 13:00:21 UTC

[DISCUSS] XMPBox

Fellow colleagues,

there was some discussion about the ability of XMPBox to parse
arbitrary XMP, which led to PDFBOX-5128.

Now, after digging into the code and reading through the various specs
for XMP and PDF/A: as it stands, XMPBox in its current implementation
is too restricted from the start. By default (although there is a way
around it) it only supports parsing predefined XMP schemas, restricted
to the ones defined in PDF/A-1, and it also does some validation in
the parsing phase.

Now, in order to get to an implementation for arbitrary XMP, that needs
to change, with the validation for PDF/A-1 put on top. We could use the
existing implementation in a generalized way, use an existing Java XMP
parser such as Adobe's XMPCore, or approach it in a layered fashion
(XML -> RDF -> XMP) with supporting libs for that.
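
To make the layered idea a bit more concrete, here is a minimal sketch
using only the JDK's DOM parser. It is not XMPBox or XMPCore code: the
class name, the sample packet, the "{namespace}localName" key format and
the allow-list check are all assumptions made up for illustration.
Structured values (bags, alternatives, structs) are simply flattened to
their text content. The point is only that parsing accepts any
namespace, and that a PDF/A-1 style check could run afterwards as a
separate step.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch, not part of XMPBox or XMPCore.
public class LayeredXmpSketch {
    public static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    public static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    // Made-up sample: one well-known namespace (pdf) and one custom one (ex).
    public static final String SAMPLE =
          "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
        + "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\">"
        + "<rdf:Description rdf:about=\"\""
        + " xmlns:pdf=\"http://ns.adobe.com/pdf/1.3/\""
        + " xmlns:ex=\"http://example.com/ns/custom/\""
        + " pdf:Producer=\"Example Producer\">"
        + "<ex:project>demo</ex:project>"
        + "</rdf:Description></rdf:RDF></x:xmpmeta>";

    // XML -> RDF -> properties: collect every property of every
    // rdf:Description, whatever its namespace, with no schema check.
    public static Map<String, String> parse(String xmp) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // XMP needs no DTD, so refuse DOCTYPE declarations outright.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmp.getBytes(StandardCharsets.UTF_8)));

        Map<String, String> props = new LinkedHashMap<>();
        NodeList descriptions = doc.getElementsByTagNameNS(RDF_NS, "Description");
        for (int i = 0; i < descriptions.getLength(); i++) {
            Element description = (Element) descriptions.item(i);
            // Properties written as attributes (skip rdf:* and xmlns:*).
            NamedNodeMap attrs = description.getAttributes();
            for (int a = 0; a < attrs.getLength(); a++) {
                Attr attr = (Attr) attrs.item(a);
                String ns = attr.getNamespaceURI();
                if (ns != null && !RDF_NS.equals(ns) && !XMLNS_NS.equals(ns)) {
                    props.put("{" + ns + "}" + attr.getLocalName(), attr.getValue());
                }
            }
            // Properties written as child elements.
            for (Node child = description.getFirstChild(); child != null;
                    child = child.getNextSibling()) {
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && child.getNamespaceURI() != null) {
                    props.put("{" + child.getNamespaceURI() + "}" + child.getLocalName(),
                            child.getTextContent().trim());
                }
            }
        }
        return props;
    }

    // Validation "put on top": list the namespaces not in the allow-list.
    public static List<String> unknownNamespaces(Map<String, String> props,
            Set<String> allowed) {
        List<String> unknown = new ArrayList<>();
        for (String key : props.keySet()) {
            String ns = key.substring(1, key.indexOf('}'));
            if (!allowed.contains(ns) && !unknown.contains(ns)) {
                unknown.add(ns);
            }
        }
        return unknown;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> props = parse(SAMPLE);
        props.forEach((k, v) -> System.out.println(k + " = " + v));
        System.out.println("Outside allow-list: "
                + unknownNamespaces(props, Set.of("http://ns.adobe.com/pdf/1.3/")));
    }
}
```

With this split, a caller interested only in general XMP would stop
after parse(), while a PDF/A-1 check would call something like
unknownNamespaces() with the predefined schema namespaces.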

The other option would be to keep XMPBox as is and, for general-purpose
use, add a general parser to the project or simply refer to XMPCore.

That leads me to the question of the benefit of having a
general-purpose (ASL-licensed) XMP lib as part of PDFBox. Thoughts?

BR

-- 
Maruan Sahyoun

FileAffairs GmbH
Josef-Schappe-Straße 21
40882 Ratingen

Tel: +49 (2102) 89497 88
Fax: +49 (2102) 89497 91
sahyoun@fileaffairs.de
www.fileaffairs.de

Managing Director: Maruan Sahyoun
Commercial register: AG Düsseldorf, HRB 53837
VAT ID: DE248275827


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: dev-help@pdfbox.apache.org

Re: [DISCUSS] XMPBox

Posted by "sahyoun@fileaffairs.de" <sa...@fileaffairs.de>.

quick addition - I'm happy to put the work into that if we think it's
worth the effort.

Maruan


Re: [DISCUSS] XMPBox

Posted by Guillaume Bailleul <gb...@gmail.com>.

Hi all,

When we wrote xmpbox, we tried to keep compatibility with the previous
jempbox. It had some limitations.

So some years ago, I needed a more open XMP implementation and I wrote
xemph [1]. You can have a look; I can rework it if needed (I guess there
is an invalid dependency).

Regards,

[1] https://github.com/gbm-bailleul/xemph

Guillaume



Re: [DISCUSS] XMPBox

Posted by Andreas Lehmkuehler <an...@lehmi.de>.

On 28.03.21 at 19:27, sahyoun@fileaffairs.de wrote:
> On Sunday, 28.03.2021 at 18:47 +0200, Tilman Hausherr wrote:
>> On 28.03.2021 at 18:44, sahyoun@fileaffairs.de wrote:
>>> On Sunday, 28.03.2021 at 16:36 +0200, Tilman Hausherr wrote:
>>>> I don't have an opinion on XMP because I don't use it.
>>> As XMP is needed for getting/setting metadata, esp. since PDF 2.0,
>>> there needs to be support for it - not necessarily from us directly,
>>> i.e. we could integrate a different lib.
>>>
>>> I'll revert the work done in PDFBOX-5128 and we get back to it
>>> after 3.0 - WDYT?
>>
>>
>> No, why revert? As far as I understand it, it makes it possible to
>> still parse XMPs with non-standard schemas, so that people can
>> retrieve the standard stuff, so that is very useful.
> 
> it's still very limited - I can keep it, but as long as the XMP doesn't
> conform to the (strict) initial parsing rules it will still fail. The
> idea behind reverting was to gain time to work on it (if we decide to
> do so), or otherwise to keep it in the state it was in before, i.e.
> targeted at PDF/A-1 conforming XMPs.

I'm going to start a vote about the future of preflight after the release of
the first RC for 3.0.0. Depending on the outcome, we should think about a vote
about the future of xmpbox as well.

Let us see what happens and decide afterwards.

Andreas


Re: [DISCUSS] XMPBox

Posted by "sahyoun@fileaffairs.de" <sa...@fileaffairs.de>.

On Sunday, 28.03.2021 at 18:47 +0200, Tilman Hausherr wrote:
> On 28.03.2021 at 18:44, sahyoun@fileaffairs.de wrote:
> > On Sunday, 28.03.2021 at 16:36 +0200, Tilman Hausherr wrote:
> > > I don't have an opinion on XMP because I don't use it.
> > As XMP is needed for getting/setting metadata, esp. since PDF 2.0,
> > there needs to be support for it - not necessarily from us directly,
> > i.e. we could integrate a different lib.
> > 
> > I'll revert the work done in PDFBOX-5128 and we get back to it
> > after 3.0 - WDYT?
> 
> 
> No, why revert? As far as I understand it, it makes it possible to
> still parse XMPs with non-standard schemas, so that people can
> retrieve the standard stuff, so that is very useful.

it's still very limited - I can keep it, but as long as the XMP doesn't
conform to the (strict) initial parsing rules it will still fail. The
idea behind reverting was to gain time to work on it (if we decide to
do so), or otherwise to keep it in the state it was in before, i.e.
targeted at PDF/A-1 conforming XMPs.

BR
Maruan


Re: [DISCUSS] XMPBox

Posted by Tilman Hausherr <TH...@t-online.de>.

On 28.03.2021 at 18:44, sahyoun@fileaffairs.de wrote:
> On Sunday, 28.03.2021 at 16:36 +0200, Tilman Hausherr wrote:
>> I don't have an opinion on XMP because I don't use it.
> As XMP is needed for getting/setting metadata, esp. since PDF 2.0,
> there needs to be support for it - not necessarily from us directly,
> i.e. we could integrate a different lib.
>
> I'll revert the work done in PDFBOX-5128 and we get back to it after
> 3.0 - WDYT?


No, why revert? As far as I understand it, it makes it possible to
still parse XMPs with non-standard schemas, so that people can
retrieve the standard stuff, so that is very useful.

Tilman




Re: [DISCUSS] XMPBox

Posted by "sahyoun@fileaffairs.de" <sa...@fileaffairs.de>.

On Sunday, 28.03.2021 at 16:36 +0200, Tilman Hausherr wrote:
> I don't have an opinion on XMP because I don't use it.

As XMP is needed for getting/setting metadata, esp. since PDF 2.0,
there needs to be support for it - not necessarily from us directly,
i.e. we could integrate a different lib.

I'll revert the work done in PDFBOX-5128 and we get back to it after
3.0 - WDYT?

BR
Maruan


Re: [DISCUSS] XMPBox

Posted by Tilman Hausherr <TH...@t-online.de>.

I don't have an opinion on XMP because I don't use it.

Re preflight, I agree with you. It was great but it has hit a dead end, 
and VeraPDF is better because it is more flexible.

Tilman


Re: [DISCUSS] XMPBox

Posted by Andreas Lehmkuehler <an...@lehmi.de>.

On 28.03.21 at 15:00, sahyoun@fileaffairs.de wrote:
> Fellow colleagues,
> 
> there was some discussion about the ability of XMPBox to parse
> arbitrary XMP, which led to PDFBOX-5128.
> 
> Now, after digging into the code and reading through the various specs
> for XMP and PDF/A: as it stands, XMPBox in its current implementation
> is too restricted from the start. By default (although there is a way
> around it) it only supports parsing predefined XMP schemas, restricted
> to the ones defined in PDF/A-1, and it also does some validation in
> the parsing phase.
Exactly the point where I stopped some time ago, when trying to just expand the 
parser ;-)


> Now, in order to get to an implementation for arbitrary XMP, that needs
> to change, with the validation for PDF/A-1 put on top. We could use the
> existing implementation in a generalized way, use an existing Java XMP
> parser such as Adobe's XMPCore, or approach it in a layered fashion
> (XML -> RDF -> XMP) with supporting libs for that.
> 
> The other option would be to keep XMPBox as is and, for general-purpose
> use, add a general parser to the project or simply refer to XMPCore.
> 
> That leads me to the question of the benefit of having a
> general-purpose (ASL-licensed) XMP lib as part of PDFBox. Thoughts?
It replaced JempBox when preflight was added to PDFBox; that said, the reason
was more or less historical.

I myself never needed that XMP stuff. It is used by TIKA and preflight and maybe
others.

I have to admit that I have already thought about the future of preflight. I had
planned to bring up that topic after releasing 3.0.0, but why wait?

Preflight is part of PDFBox but is practically not maintained. Preflight support
is limited to PDF/A-1b and I don't see anybody who plans to extend it. VeraPDF
has a lot more to offer and is open source as well, so maybe a better
alternative ...

How about removing preflight with 4.0.0? This would remove the one and only hard
dependency on XMPBox, so that it would be easier to decide if we really need to
maintain our own XMP lib.


Andreas
