Posted to dev@pdfbox.apache.org by Andreas Lehmkuehler <an...@lehmi.de> on 2011/12/01 12:12:05 UTC

Re: Merging jempbox and xmpbox

Hi,

On 30.11.2011 at 20:49, Guillaume Bailleul wrote:
> Hi all,
>
> I need opinions and ideas.
>
> Today, there are two XMP parser implementations in PDFBox: Jempbox,
> the historical one, and Xmpbox, which joined the project bundled with
> preflight.
>
> When we developed preflight in 2010, we started using Jempbox. But
> after some patches, we decided to build another library because there
> were too many things to change to be able to do strict validation, and
> we had too little time to do it in JempBox. Xmpbox was really inspired
> by Jempbox, so merging the two into one should be possible.
>
> Today, there is no released version of "apache xmpbox" because it was
> added after version 1.6.0. It would be great if the merge were finished
> before 1.7.0.
>
> If some of you could read through the source code and give some ideas
> on the methodology, it would help. I really don't know what the
> best way to do it is.
If xmpbox provides the same functionality as jempbox, we can use it as a drop-in
replacement for jempbox.

> Having some information on how jempbox is used today could help.
I guess the class o.a.p.jempbox.xmp.XMPMetaData is a good place to start. It is
used in

- org.apache.pdfbox.examples.pdmodel.AddMetadataFromDocInfo
- org.apache.pdfbox.examples.pdmodel.ExtractMetadata
- org.apache.pdfbox.pdmodel.common.PDMetaData

> Thanks for your help
>
> Guillaume

BR
Andreas Lehmkühler

Re: Merging jempbox and xmpbox

Posted by Guillaume Bailleul <gb...@gmail.com>.
Hi,

What is meant by a "drop-in" replacement?

The functionality is the same in xmpbox and jempbox. I guess there is
more in xmpbox; for instance, it is easy to define "user namespaces"
with annotations in xmpbox. Method names are different, but it could
be a good idea to evolve xmpbox so that it "looks like" jempbox.

As said in [1], there are very few dependencies between pdfbox and
jempbox, and they could easily be cut.

I propose the following scenario:
* change the xmpbox package to org.apache.xmpbox, because this module is
independent of padaf/preflight
* do the work on xmpbox to make it "jempbox-like"
* cut the dependency link between pdfbox and jempbox
* mark jempbox as deprecated (so users will not be disappointed with
PDFBox 1.7.0)
* change the examples to use xmpbox.

So, in the next release, there will be both implementations of the XMP
parser, and the old one could be removed in a later release.
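
To make the "jempbox-like" step more concrete, here is a minimal sketch of
one possible approach: a deprecated facade that keeps old-style method names
and delegates to the new implementation. Every class and method name below
(NewXmpDocument, LegacyXmpFacade, putProperty, findProperty) is hypothetical
and is neither the real jempbox API nor the real xmpbox API; it only
illustrates the idea, it is not a proposal for the actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the new parser's metadata model.
// This is NOT the real xmpbox API.
class NewXmpDocument {
    // Properties keyed by their prefixed name, e.g. "dc:title".
    private final Map<String, String> properties = new LinkedHashMap<>();

    void putProperty(String qualifiedName, String value) {
        properties.put(qualifiedName, value);
    }

    // Returns null when the property has not been set.
    String findProperty(String qualifiedName) {
        return properties.get(qualifiedName);
    }
}

// Hypothetical stand-in for the old API surface.
// This is NOT the real jempbox API. It is marked deprecated so that
// callers get a compiler warning for one release before removal.
@Deprecated
class LegacyXmpFacade {
    private final NewXmpDocument delegate;

    LegacyXmpFacade(NewXmpDocument delegate) {
        this.delegate = delegate;
    }

    // Old-style accessor names, forwarded to the new model.
    String getTitle() {
        return delegate.findProperty("dc:title");
    }

    void setTitle(String title) {
        delegate.putProperty("dc:title", title);
    }
}
```

With something like this, existing callers would keep compiling against the
old names during the transition release, while new code would use the new
model directly.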

What do you think about that?

BR,

Guillaume


[1] : https://issues.apache.org/jira/browse/PDFBOX-1187




