Posted to dev@pdfbox.apache.org by Andreas Lehmkuehler <an...@lehmi.de> on 2017/03/13 18:18:33 UTC

[VOTE] Release Apache PDFBox 2.0.5

Hi,

a candidate for the PDFBox 2.0.5 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/2.0.5/

The release candidate is a zip archive of the sources in:

     http://svn.apache.org/repos/asf/pdfbox/tags/2.0.5/

The SHA1 checksum of the archive is 9521349be859498dfdd0e0f2a5d02b082f097ab1.
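Checking the downloaded archive against this SHA1 value is a typical first step before voting. Below is a minimal Java sketch of that check, using only `java.security.MessageDigest`. The class name `Sha1Check` and the archive file name in the usage line are hypothetical and are not part of the PDFBox release tooling.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha1Check {

    /** Streams the file through a SHA-1 digest and returns the lowercase hex string. */
    static String sha1Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return toHex(md.digest());
    }

    /** Hex-encodes bytes without java.util.HexFormat, so it also runs on Java 8. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Compares with the published value, ignoring case and surrounding whitespace. */
    static boolean matches(Path file, String expected) throws IOException, NoSuchAlgorithmException {
        return sha1Hex(file).equalsIgnoreCase(expected.trim());
    }

    public static void main(String[] args) throws Exception {
        // Usage: java Sha1Check <archive.zip> <expected-sha1>
        Path archive = Paths.get(args[0]);
        System.out.println(matches(archive, args[1]) ? "checksum OK" : "checksum MISMATCH");
    }
}
```

For example, `java Sha1Check pdfbox-2.0.5-src.zip 9521349be859498dfdd0e0f2a5d02b082f097ab1`. The zip file name is a guess, because the thread does not give it, so use whatever name the archive has in the dist directory. Reading through a fixed 8 KB buffer keeps memory use flat regardless of archive size.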

Please vote on releasing this package as Apache PDFBox 2.0.5.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox 2.0.5
     [ ] -1 Do not release this package because...


Here is my +1

BR
Andreas Lehmkühler

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: dev-help@pdfbox.apache.org


Re: [VOTE] Release Apache PDFBox 2.0.5

Posted by Tilman Hausherr <TH...@t-online.de>.
+1

Tilman





[RESULT][VOTE] Release Apache PDFBox 2.0.5

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
On 13.03.2017 at 19:18, Andreas Lehmkuehler wrote:
> Please vote on releasing this package as Apache PDFBox 2.0.5.
   +1 Tilman Hausherr
   +1 Maruan Sahyoun
   +1 Timo Boehme
   +1 Tim Allison
   +1 Andreas Lehmkühler

Thanks for your support and help!! I'm going to push the release out.

BR
Andreas
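The passing rule stated in the vote mail matches the usual ASF majority-approval rule: at least three binding +1 votes and more +1 than -1 votes. Below is a small Java sketch of that rule, applied to the tally listed above. Treating all five voters as binding PMC votes is an assumption made for illustration, and the 72-hour voting window is not modeled. The `ReleaseVote` class is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class ReleaseVote {

    /** One ballot: who voted, +1/0/-1, and whether the voter is on the PMC (binding). */
    static final class Ballot {
        final String voter;
        final int value;        // +1, 0 or -1
        final boolean binding;  // true for PMC members

        Ballot(String voter, int value, boolean binding) {
            this.voter = voter;
            this.value = value;
            this.binding = binding;
        }
    }

    /**
     * Majority approval: at least three binding +1 votes and more binding +1
     * than -1 votes. Non-binding votes are ignored by this sketch.
     */
    static boolean passes(List<Ballot> ballots) {
        int plus = 0;
        int minus = 0;
        for (Ballot b : ballots) {
            if (!b.binding) {
                continue;
            }
            if (b.value > 0) {
                plus++;
            } else if (b.value < 0) {
                minus++;
            }
        }
        return plus >= 3 && plus > minus;
    }

    public static void main(String[] args) {
        // Tally from the [RESULT] mail; marking every voter as binding is an assumption.
        List<Ballot> result = Arrays.asList(
                new Ballot("Tilman Hausherr", 1, true),
                new Ballot("Maruan Sahyoun", 1, true),
                new Ballot("Timo Boehme", 1, true),
                new Ballot("Tim Allison", 1, true),
                new Ballot("Andreas Lehmkuehler", 1, true));
        System.out.println(passes(result) ? "vote passes" : "vote fails");
    }
}
```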




Re: [VOTE] Release Apache PDFBox 2.0.5

Posted by Tilman Hausherr <TH...@t-online.de>.
On 14.03.2017 at 14:10, Timo Boehme wrote:
>
> Maybe we should add the
>   -Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=true
> setting (introduced with 2.0.4) to the Migration/Getting Started 
> web pages. I had to look through my emails to find it, and it 
> really makes a difference (at least on some systems) when there are a 
> lot of images on a page; so far we only have the
>   -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
> setting documented (which did not help in my case). Users could at least 
> try it if rendering gets slow on some pages; it may not be a 
> good general setting, as it may also slow rendering down a bit on pages 
> with only a few large images.
>


I'd prefer to also have this option in the FAQ, in the rendering 
section, as one of a couple of solutions in the answer to "Why am I 
getting a poor performance with some pages?", i.e. that CMYK option 
(mentioning that it seems to be related to virtual systems) and the 
KCMS option.

Tilman






Re: [VOTE] Release Apache PDFBox 2.0.5

Posted by Tilman Hausherr <TH...@t-online.de>.
On 16.03.2017 at 14:45, Allison, Timothy B. wrote:
> Tilman and Andreas, thank you for taking a look!
>
> I agree, no need to stop the release.  The improvements far outweigh the small regression.
>
>> I had a look at content_diffs_with_exceptions.xlsx; looking only
>> at govdocs there, all are similar or better.
> Y, agreed.  Do we care about these likely broken PDFs from which 2.0.4 appears to be able to extract more "common words" than 2.0.5?
>
> commoncrawl2_likely_broken/OV/OVWMJPQGCK2AQZYVWJWYUPTERPXOGIAD
> commoncrawl2_likely_broken/R4/R4P75EJNMNXZC2DQYUFB6BSXQ2CWGVG7.pdf
> commoncrawl2_likely_broken/BI/BIVJLJ4QULQQ4VHKKNMBUTKWXAMMN53N.pdf
> commoncrawl2_likely_broken/LB/LB6LEZ75Y6OL7SGW7SV6JNO4G6FS7HAS
> commoncrawl2_likely_broken/LQ/LQQFDYEI7XTOBMFPSL3IDVKRMUB6YIGU
> commoncrawl2_likely_broken/OB/OBQTIKQW3MIEYJPGE4NR5WGPDUZC3ULY
> commoncrawl2_likely_broken/BC/BCZSFNQAB62TUBURWG6B3ZOZCG5IH46P
> commoncrawl2_likely_broken/TV/TVMANAJVH2VQVABYX6LCVO5KTERLFS2I.pdf
>
> Out of 543,805 PDFs in our test set, and given that they're broken, I'm not overly concerned.

Neither am I. If the issue I opened is solved, I'd expect that some of 
them will work again.

Tilman






RE: [VOTE] Release Apache PDFBox 2.0.5

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Tilman and Andreas, thank you for taking a look!

I agree, no need to stop the release.  The improvements far outweigh the small regression.

> I had a look at content_diffs_with_exceptions.xlsx; looking only 
> at govdocs there, all are similar or better.

Y, agreed.  Do we care about these likely broken PDFs from which 2.0.4 appears to be able to extract more "common words" than 2.0.5?  

commoncrawl2_likely_broken/OV/OVWMJPQGCK2AQZYVWJWYUPTERPXOGIAD
commoncrawl2_likely_broken/R4/R4P75EJNMNXZC2DQYUFB6BSXQ2CWGVG7.pdf
commoncrawl2_likely_broken/BI/BIVJLJ4QULQQ4VHKKNMBUTKWXAMMN53N.pdf
commoncrawl2_likely_broken/LB/LB6LEZ75Y6OL7SGW7SV6JNO4G6FS7HAS
commoncrawl2_likely_broken/LQ/LQQFDYEI7XTOBMFPSL3IDVKRMUB6YIGU
commoncrawl2_likely_broken/OB/OBQTIKQW3MIEYJPGE4NR5WGPDUZC3ULY
commoncrawl2_likely_broken/BC/BCZSFNQAB62TUBURWG6B3ZOZCG5IH46P
commoncrawl2_likely_broken/TV/TVMANAJVH2VQVABYX6LCVO5KTERLFS2I.pdf

Out of 543,805 PDFs in our test set, and given that they're broken, I'm not overly concerned.
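One plausible reading of the "common words" metric is this: for each file and each version, count how many extracted tokens appear in a list of frequent words, then flag files where the older version scores higher. The Java sketch below illustrates only that idea. It is not the tool that produced the reports, and the word list, tokenization and `CommonWordsDiff` name are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class CommonWordsDiff {

    /** Counts tokens (lowercased, split on non-letters) that appear in the common-word set. */
    static int countCommonWords(String text, Set<String> commonWords) {
        int count = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && commonWords.contains(token)) {
                count++;
            }
        }
        return count;
    }

    /** Returns the file ids (sorted) where extractor A found more common words than B. */
    static List<String> worseInB(Map<String, String> textA, Map<String, String> textB,
                                 Set<String> commonWords) {
        List<String> worse = new ArrayList<>();
        for (Map.Entry<String, String> e : textA.entrySet()) {
            String b = textB.getOrDefault(e.getKey(), "");
            if (countCommonWords(e.getValue(), commonWords) > countCommonWords(b, commonWords)) {
                worse.add(e.getKey());
            }
        }
        Collections.sort(worse);
        return worse;
    }

    public static void main(String[] args) {
        // Hypothetical word list and extracted texts for two files.
        Set<String> common = new HashSet<>(Arrays.asList("the", "and", "of", "to"));
        Map<String, String> v204 = new HashMap<>();
        Map<String, String> v205 = new HashMap<>();
        v204.put("a.pdf", "the state of the art and more");
        v205.put("a.pdf", "th e st ate");
        v204.put("b.pdf", "to be");
        v205.put("b.pdf", "to be and");
        System.out.println(worseInB(v204, v205, common)); // prints [a.pdf]
    }
}
```

Comparing counts rather than raw text length is the point of such a metric: garbled extraction often produces plenty of characters but few real words.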





Re: [VOTE] Release Apache PDFBox 2.0.5

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
On 15.03.2017 at 19:07, Tilman Hausherr wrote:
> Thanks Tim!
>
> I looked at newExceptionsInBDetails.xlsx (247 entries). IMHO there's no need to stop the
> release: the number of entries in fixedExceptionsInBDetails.xlsx (506) is
> larger, and the files with exceptions are cut off.
I agree. However, I've checked one of the files, 015664.pdf, and it looks like a 
regression. I can open it using 2.0.4 but get the described exception with 2.0.5 :-(

BR
Andreas





Re: [VOTE] Release Apache PDFBox 2.0.5

Posted by Tilman Hausherr <TH...@t-online.de>.
Thanks Tim!

I looked at newExceptionsInBDetails.xlsx (247 entries). IMHO there's no need to 
stop the release: the number of entries in 
fixedExceptionsInBDetails.xlsx (506) is larger, and the files with 
exceptions are cut off.

I'll create an issue about these.

I had a look at content_diffs_with_exceptions.xlsx; looking only at 
govdocs there, all are similar or better.

Tilman
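The "new" and "fixed" exception reports can be thought of as set differences between the files that failed under each version. Below is a minimal Java sketch, assuming each version's failures are available as a set of file names. The file names and the `ExceptionDiff` class are hypothetical. The sketch does not reproduce how the actual reports were generated, and it ignores files that fail in both versions with different exceptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

public class ExceptionDiff {

    /** Files that threw with version B but not with version A (candidate regressions). */
    static SortedSet<String> newExceptions(Set<String> failedInA, Set<String> failedInB) {
        SortedSet<String> result = new TreeSet<>(failedInB);
        result.removeAll(failedInA);
        return result;
    }

    /** Files that threw with version A but no longer throw with version B. */
    static SortedSet<String> fixedExceptions(Set<String> failedInA, Set<String> failedInB) {
        SortedSet<String> result = new TreeSet<>(failedInA);
        result.removeAll(failedInB);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical file names; A stands for 2.0.4 and B for 2.0.5-rc1.
        Set<String> a = new HashSet<>(Arrays.asList("x.pdf", "y.pdf"));
        Set<String> b = new HashSet<>(Arrays.asList("y.pdf", "z.pdf"));
        System.out.println("new: " + newExceptions(a, b) + ", fixed: " + fixedExceptions(a, b));
    }
}
```

Running `main` prints `new: [z.pdf], fixed: [x.pdf]`. Comparing the sizes of the two sets is the "on balance" judgment made in this thread.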





RE: [VOTE] Release Apache PDFBox 2.0.5

Posted by "Allison, Timothy B." <ta...@mitre.org>.
+1

I ran a comparison with 2.0.5-rc1 and (I think) 2.0.4 against ~500k files from our regression corpus.

I haven't had a chance to do much digging, but I wanted to share what I had as soon as I had it.

Reports are here: https://github.com/tballison/share/blob/master/pdfbox_comparisons/reports_pdfbox_2.0.5-rc1.zip 

Lots more "common words".  Many fewer exceptions.  There may be a regression that is causing 244 new exceptions, but on balance, the improvements are impressive.


java.io.IOException: Missing root object specification in trailer.
	at org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2169)
	at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:222)
	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:271)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:984)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:922)
	at 
...
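To see whether the new exceptions share one cause, such as the "Missing root object specification in trailer." trace above, one option is to group stack traces by their first line and count them. Below is a minimal Java sketch with a hypothetical `ExceptionGrouper` class and hypothetical sample traces. Keying on the first line alone merges different call sites that throw the same message, so the sketch is only a first triage step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExceptionGrouper {

    /** Uses the first line of a stack trace (exception class and message) as the grouping key. */
    static String key(String stackTrace) {
        String trimmed = stackTrace.trim();
        int nl = trimmed.indexOf('\n');
        return (nl < 0 ? trimmed : trimmed.substring(0, nl)).trim();
    }

    /** Counts traces per key, most frequent first; ties are ordered alphabetically. */
    static List<Map.Entry<String, Integer>> rank(Collection<String> stackTraces) {
        Map<String, Integer> counts = new HashMap<>();
        for (String trace : stackTraces) {
            counts.merge(key(trace), 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return ranked;
    }

    public static void main(String[] args) {
        // Hypothetical traces; real input would come from the comparison reports.
        List<String> traces = Arrays.asList(
                "java.io.IOException: Missing root object specification in trailer.\n\tat ...",
                "java.io.IOException: Missing root object specification in trailer.\n\tat ...",
                "java.lang.NullPointerException\n\tat ...");
        for (Map.Entry<String, Integer> e : rank(traces)) {
            System.out.println(e.getValue() + "  " + e.getKey());
        }
    }
}
```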





Re: [VOTE] Release Apache PDFBox 2.0.5

Posted by Timo Boehme <ti...@ontochem.com>.
Hi,

+1

Maybe we should add the
   -Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=true
setting (introduced with 2.0.4) to the Migration/Getting Started 
web pages. I had to look through my emails to find it, and it 
really makes a difference (at least on some systems) when there are a lot 
of images on a page; so far we only have the
   -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
setting documented (which did not help in my case). Users could at least 
try it if rendering gets slow on some pages; it may not be a 
good general setting, as it may also slow rendering down a bit on pages 
with only a few large images.
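Besides the -D command-line flags, the same two settings can be applied from code, as long as this happens before the first page is rendered. Below is a minimal Java sketch using only `System.setProperty`. The `RenderingOptions` class and the choice not to override values already given on the command line are illustrative assumptions. When exactly PDFBox reads the CMYK property is not verified here, so setting it at the very start of `main` is the safe option. The KCMS provider only exists in older Java runtimes (as far as I know, Java 8 and earlier).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RenderingOptions {

    // Property names and values exactly as quoted in the thread.
    static final String PURE_JAVA_CMYK = "org.apache.pdfbox.rendering.UsePureJavaCMYKConversion";
    static final String JAVA2D_CMM = "sun.java2d.cmm";
    static final String KCMS_PROVIDER = "sun.java2d.cmm.kcms.KcmsServiceProvider";

    /**
     * Sets the requested options unless they were already given with -D,
     * and returns the ones this call actually set. Call it before rendering.
     */
    static Map<String, String> apply(boolean pureJavaCmyk, boolean kcms) {
        Map<String, String> applied = new LinkedHashMap<>();
        if (pureJavaCmyk) {
            setIfAbsent(PURE_JAVA_CMYK, "true", applied);
        }
        if (kcms) {
            setIfAbsent(JAVA2D_CMM, KCMS_PROVIDER, applied);
        }
        return applied;
    }

    private static void setIfAbsent(String key, String value, Map<String, String> applied) {
        if (System.getProperty(key) == null) {
            System.setProperty(key, value);
            applied.put(key, value);
        }
    }

    public static void main(String[] args) {
        // Try only the pure-Java CMYK conversion, e.g. when pages with many images render slowly.
        System.out.println("applied: " + apply(true, false));
        // ... load the document and render its pages with PDFBox after this point ...
    }
}
```

Because the effect depends on the system and on the image mix of the page, it is worth timing a few slow pages with and without each option before making it a default.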


Best,
Timo




-- 
Timo Boehme
OntoChem IT Solutions GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany

phone: +49 345 478 047 4        | fax: +49 345 478 047 1
email: timo.boehme@ontochem.com | web: www.ontochem.com
HRB 21962 Amtsgericht Stendal   | USt-IdNr.: DE815563824
managing director : Lutz Weber




Re: [VOTE] Release Apache PDFBox 2.0.5

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
+1 Maruan


