Posted to dev@pdfbox.apache.org by Andreas Lehmkuehler <an...@lehmi.de> on 2022/04/07 05:41:31 UTC

2.0.26 release

Hi,

Sorry for the delay. I'm planning to cut the 2.0.26 release this Saturday, the
day after tomorrow, if nobody objects.

Andreas

P.S.: I'm targeting a new 3.0.0 alpha release once the 2.0.26 release is out.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: dev-help@pdfbox.apache.org


Re: 2.0.26 release

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
On 17.04.22 at 20:25, Tilman Hausherr wrote:
> New regression test results are at
> 
> https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.25_vs_2.0.26.tar.xz
> 
> IMHO we're fine now!
Thanks for the fast re-test!

I'm going to cut the 2.0.26 release tomorrow.

Andreas

> 
> Tilman


Re: 2.0.26 release

Posted by Tilman Hausherr <TH...@t-online.de>.
New regression test results are at

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.25_vs_2.0.26.tar.xz

IMHO we're fine now!

Tilman



Re: 2.0.26 release

Posted by Tilman Hausherr <TH...@t-online.de>.
On 15.04.2022 at 10:42, Tilman Hausherr wrote:
> On 14.04.2022 at 08:59, Tilman Hausherr wrote:
>>
>> I will rerun the tests myself on the long weekend, if I have the time. 
>
>
> https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.25_vs_2.0.26.tar.xz 

This time I looked at content_diffs_with_exceptions.xlsx. I thought it
was about the differences in PDFs that fail with exceptions, but I
suspect it is something different.

TOP_10_MORE_IN_A has 7 entries that are relevant, but I believe they all
come from a similar problem because the language looks similar, so I
created only one issue.

Tilman




Re: 2.0.26 release

Posted by Tilman Hausherr <TH...@t-online.de>.
On 14.04.2022 at 08:59, Tilman Hausherr wrote:
>
> I will rerun the tests myself on the long weekend, if I have the time. 


https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.25_vs_2.0.26.tar.xz





Re: 2.0.26 release

Posted by Tilman Hausherr <TH...@t-online.de>.
On 14.04.2022 at 08:13, Andreas Lehmkuehler wrote:
> Cool, thanks for the feedback. I've set the ticket to resolved.
>
> Do we need to re-run the tests?
>
> BTW, what about PDFBOX-5394? Is there anything left to do? Do we have 
> to wait for the feedback of the user? 

I've set that one to resolved.

I will rerun the tests myself on the long weekend, if I have the time.

Tilman




Re: 2.0.26 release

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Cool, thanks for the feedback. I've set the ticket to resolved.

Do we need to re-run the tests?

BTW, what about PDFBOX-5394? Is there anything left to do? Do we have to wait
for feedback from the user?

Andreas

On 13.04.22 at 08:29, Tilman Hausherr wrote:
> Yeah, PDFBOX-5413 fixes that one as well. 👍
> 
> Tilman
> 
> On 12.04.2022 at 19:26, Tilman Hausherr wrote:
>> Only one left: 7LRS5U6CAFMN2P6JPTZVNBUW6XOFYH4M.pdf.
>>
>> There is some sort of problem with an incremental save: a part of the
>> multi-content stream is missing / has a new object number. Let's wait and
>> see whether it is related to PDFBOX-5413.
>>
>> (The other one, HOAZTST4E26NPA7HL72WCIVMNRQ3E4M5.pdf, is an improvement;
>> I'll add it to my own tests.)
>>
>> Tilman
>>
>> On 12.04.2022 at 18:25, Tilman Hausherr wrote:
>>> Only
>>> commoncrawl3/7L/7LRS5U6CAFMN2P6JPTZVNBUW6XOFYH4M
>>> commoncrawl3/HO/HOAZTST4E26NPA7HL72WCIVMNRQ3E4M5
>>> have a different text extraction.
>>>
>>> With the other two, the differences are in attachment file names or doc info.
>>>
>>> Tilman
>>>
>>> On 12.04.2022 at 08:16, Tilman Hausherr wrote:
>>>> After having looked at the content differences and trying to rule out the 
>>>> /Names differences, there are 4 files with content in TOP_10_MORE_IN_A that 
>>>> feel suspicious and IMHO need investigation.
>>>>
>>>> commoncrawl3/7L/7LRS5U6CAFMN2P6JPTZVNBUW6XOFYH4M
>>>> govdocs1/365/365260.pdf
>>>> commoncrawl3/HO/HOAZTST4E26NPA7HL72WCIVMNRQ3E4M5
>>>> govdocs1/150/150282.pdf
>>>>
>>>> Tilman
>>>>
>>>>
>>>>
>>>> On 12.04.2022 at 08:09, Andreas Lehmkuehler wrote:
>>>>> Thanks Tim!
>>>>>
>>>>> Looks like there are 5 new exceptions left.
>>>>>
>>>>> I'm going to check the first two:
>>>>>
>>>>> commoncrawl3/ZC/ZCY5MCL7KI6QXVMXUZ2AJKXICQIT4TL4
>>>>> commoncrawl3/WY/WYPJNTD5KQNODSXWK4GABURXRTTD5P4H
>>>>>
>>>>> The others are thrown within Jempbox.
>>>>>
>>>>>
>>>>> Andreas
>>>>>
>>>>> On 11.04.22 at 12:40, Tim Allison wrote:
>>>>>> https://corpora.tika.apache.org/base/reports/tika-2.4-20220410.tgz
>>>>>>
>>>>>> Haven't had a chance to review.  Hot off the vm.
>>>>>>
>>>>>> On Sun, Apr 10, 2022 at 9:58 AM Tim Allison <ta...@apache.org> wrote:
>>>>>>>
>>>>>>> Will try to kick off today…first thing Monday morning (EDT) at the latest.
>>>>>>>
>>>>>>> On Sun, Apr 10, 2022 at 9:05 AM Andreas Lehmkuehler <an...@lehmi.de> 
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On 09.04.22 at 19:00, Tilman Hausherr wrote:
>>>>>>>>> testFlattenPDFBOX2469Filled also fails in 2.0 (it is disabled by default).
>>>>>>>> I've fixed all new tickets. PDFBOX-5413 fixes the issue with the disabled
>>>>>>>> flatten test.
>>>>>>>>
>>>>>>>> @Tim Is there any chance to re-run the tests?
>>>>>>>>
>>>>>>>> Andreas
>>>>>>>>
>>>>>>>>>
>>>>>>>>> testFlattenPDFBOX2469Filled(org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest) 
>>>>>>>>>
>>>>>>>>> Time elapsed: 1.083 s  <<< ERROR!
>>>>>>>>> java.io.IOException: javax.crypto.BadPaddingException: Given final 
>>>>>>>>> block not
>>>>>>>>> properly padded. Such issues can arise if a bad key is used during 
>>>>>>>>> decryption.
>>>>>>>>>       at
>>>>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345) 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>       at
>>>>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309) 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>       at
>>>>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105) 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Caused by: javax.crypto.BadPaddingException: Given final block not 
>>>>>>>>> properly
>>>>>>>>> padded. Such issues can arise if a bad key is used during decryption.
>>>>>>>>>       at
>>>>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345) 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>       at
>>>>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309) 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>       at
>>>>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105) 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'm not creating an issue this time in case this is also related to 
>>>>>>>>> another
>>>>>>>>> known problem.
>>>>>>>>>
>>>>>>>>> Tilman


Re: 2.0.26 release

Posted by Tilman Hausherr <TH...@t-online.de>.
Yeah, PDFBOX-5413 fixes that one as well. 👍

Tilman

On 12.04.2022 at 19:26, Tilman Hausherr wrote:
> Only one left: 7LRS5U6CAFMN2P6JPTZVNBUW6XOFYH4M.pdf.
>
> There is some sort of problem with an incremental save: a part of the
> multi-content stream is missing / has a new object number. Let's wait
> and see whether it is related to PDFBOX-5413.
>
> (The other one, HOAZTST4E26NPA7HL72WCIVMNRQ3E4M5.pdf, is an
> improvement; I'll add it to my own tests.)
>
> Tilman
>
> On 12.04.2022 at 18:25, Tilman Hausherr wrote:
>> Only
>> commoncrawl3/7L/7LRS5U6CAFMN2P6JPTZVNBUW6XOFYH4M
>> commoncrawl3/HO/HOAZTST4E26NPA7HL72WCIVMNRQ3E4M5
>> have a different text extraction.
>>
>> With the other two, the differences are in attachment file names or doc info.
>>
>> Tilman
>>
>> On 12.04.2022 at 08:16, Tilman Hausherr wrote:
>>> After having looked at the content differences and trying to rule 
>>> out the /Names differences, there are 4 files with content in 
>>> TOP_10_MORE_IN_A that feel suspicious and IMHO need investigation.
>>>
>>> commoncrawl3/7L/7LRS5U6CAFMN2P6JPTZVNBUW6XOFYH4M
>>> govdocs1/365/365260.pdf
>>> commoncrawl3/HO/HOAZTST4E26NPA7HL72WCIVMNRQ3E4M5
>>> govdocs1/150/150282.pdf
>>>
>>> Tilman
>>>
>>>
>>>
>>> On 12.04.2022 at 08:09, Andreas Lehmkuehler wrote:
>>>> Thanks Tim!
>>>>
>>>> Looks like there are 5 new exceptions left.
>>>>
>>>> I'm going to check the first two:
>>>>
>>>> commoncrawl3/ZC/ZCY5MCL7KI6QXVMXUZ2AJKXICQIT4TL4
>>>> commoncrawl3/WY/WYPJNTD5KQNODSXWK4GABURXRTTD5P4H
>>>>
>>>> The others are thrown within Jempbox.
>>>>
>>>>
>>>> Andreas
>>>>
>>>> On 11.04.22 at 12:40, Tim Allison wrote:
>>>>> https://corpora.tika.apache.org/base/reports/tika-2.4-20220410.tgz
>>>>>
>>>>> Haven't had a chance to review.  Hot off the vm.
>>>>>
>>>>> On Sun, Apr 10, 2022 at 9:58 AM Tim Allison <ta...@apache.org> 
>>>>> wrote:
>>>>>>
>>>>>> Will try to kick off today…first thing Monday morning (EDT) at 
>>>>>> the latest.
>>>>>>
>>>>>> On Sun, Apr 10, 2022 at 9:05 AM Andreas Lehmkuehler 
>>>>>> <an...@lehmi.de> wrote:
>>>>>>>
>>>>>>> On 09.04.22 at 19:00, Tilman Hausherr wrote:
>>>>>>>> testFlattenPDFBOX2469Filled also fails in 2.0 (it is disabled 
>>>>>>>> by default).
>>>>>>> I've fixed all new tickets. PDFBOX-5413 fixes the issue with the 
>>>>>>> disabled
>>>>>>> flatten test.
>>>>>>>
>>>>>>> @Tim Is there any chance to re-run the tests?
>>>>>>>
>>>>>>> Andreas
>>>>>>>
>>>>>>>>
>>>>>>>> testFlattenPDFBOX2469Filled(org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest) 
>>>>>>>>
>>>>>>>> Time elapsed: 1.083 s  <<< ERROR!
>>>>>>>> java.io.IOException: javax.crypto.BadPaddingException: Given 
>>>>>>>> final block not
>>>>>>>> properly padded. Such issues can arise if a bad key is used 
>>>>>>>> during decryption.
>>>>>>>>       at
>>>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345) 
>>>>>>>>
>>>>>>>>
>>>>>>>>       at
>>>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309) 
>>>>>>>>
>>>>>>>>
>>>>>>>>       at
>>>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105) 
>>>>>>>>
>>>>>>>>
>>>>>>>> Caused by: javax.crypto.BadPaddingException: Given final block 
>>>>>>>> not properly
>>>>>>>> padded. Such issues can arise if a bad key is used during 
>>>>>>>> decryption.
>>>>>>>>       at
>>>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345) 
>>>>>>>>
>>>>>>>>
>>>>>>>>       at
>>>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309) 
>>>>>>>>
>>>>>>>>
>>>>>>>>       at
>>>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105) 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I'm not creating an issue this time in case this is also 
>>>>>>>> related to another
>>>>>>>> known problem.
>>>>>>>>
>>>>>>>> Tilman


Re: 2.0.26 release

Posted by Tilman Hausherr <TH...@t-online.de>.
Only one left: 7LRS5U6CAFMN2P6JPTZVNBUW6XOFYH4M.pdf.

There is some sort of problem with an incremental save: a part of the
multi-content stream is missing / has a new object number. Let's wait
and see whether it is related to PDFBOX-5413.

(The other one, HOAZTST4E26NPA7HL72WCIVMNRQ3E4M5.pdf, is an improvement;
I'll add it to my own tests.)

Tilman

On 12.04.2022 at 18:25, Tilman Hausherr wrote:
> Only
> commoncrawl3/7L/7LRS5U6CAFMN2P6JPTZVNBUW6XOFYH4M
> commoncrawl3/HO/HOAZTST4E26NPA7HL72WCIVMNRQ3E4M5
> have a different text extraction.
>
> With the other two, the differences are in attachment file names or doc info.
>
> Tilman
>
> On 12.04.2022 at 08:16, Tilman Hausherr wrote:
>> After having looked at the content differences and trying to rule out 
>> the /Names differences, there are 4 files with content in 
>> TOP_10_MORE_IN_A that feel suspicious and IMHO need investigation.
>>
>> commoncrawl3/7L/7LRS5U6CAFMN2P6JPTZVNBUW6XOFYH4M
>> govdocs1/365/365260.pdf
>> commoncrawl3/HO/HOAZTST4E26NPA7HL72WCIVMNRQ3E4M5
>> govdocs1/150/150282.pdf
>>
>> Tilman
>>
>>
>>
>> On 12.04.2022 at 08:09, Andreas Lehmkuehler wrote:
>>> Thanks Tim!
>>>
>>> Looks like there are 5 new exceptions left.
>>>
>>> I'm going to check the first two:
>>>
>>> commoncrawl3/ZC/ZCY5MCL7KI6QXVMXUZ2AJKXICQIT4TL4
>>> commoncrawl3/WY/WYPJNTD5KQNODSXWK4GABURXRTTD5P4H
>>>
>>> The others are thrown within Jempbox.
>>>
>>>
>>> Andreas
>>>
>>> On 11.04.22 at 12:40, Tim Allison wrote:
>>>> https://corpora.tika.apache.org/base/reports/tika-2.4-20220410.tgz
>>>>
>>>> Haven't had a chance to review.  Hot off the vm.
>>>>
>>>> On Sun, Apr 10, 2022 at 9:58 AM Tim Allison <ta...@apache.org> 
>>>> wrote:
>>>>>
>>>>> Will try to kick off today…first thing Monday morning (EDT) at the 
>>>>> latest.
>>>>>
>>>>> On Sun, Apr 10, 2022 at 9:05 AM Andreas Lehmkuehler 
>>>>> <an...@lehmi.de> wrote:
>>>>>>
>>>>>> On 09.04.22 at 19:00, Tilman Hausherr wrote:
>>>>>>> testFlattenPDFBOX2469Filled also fails in 2.0 (it is disabled by 
>>>>>>> default).
>>>>>> I've fixed all new tickets. PDFBOX-5413 fixes the issue with the 
>>>>>> disabled
>>>>>> flatten test.
>>>>>>
>>>>>> @Tim Is there any chance to re-run the tests?
>>>>>>
>>>>>> Andreas
>>>>>>
>>>>>>>
>>>>>>> testFlattenPDFBOX2469Filled(org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest) 
>>>>>>>
>>>>>>> Time elapsed: 1.083 s  <<< ERROR!
>>>>>>> java.io.IOException: javax.crypto.BadPaddingException: Given 
>>>>>>> final block not
>>>>>>> properly padded. Such issues can arise if a bad key is used 
>>>>>>> during decryption.
>>>>>>>       at
>>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345) 
>>>>>>>
>>>>>>>
>>>>>>>       at
>>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309) 
>>>>>>>
>>>>>>>
>>>>>>>       at
>>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105) 
>>>>>>>
>>>>>>>
>>>>>>> Caused by: javax.crypto.BadPaddingException: Given final block 
>>>>>>> not properly
>>>>>>> padded. Such issues can arise if a bad key is used during 
>>>>>>> decryption.
>>>>>>>       at
>>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345) 
>>>>>>>
>>>>>>>
>>>>>>>       at
>>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309) 
>>>>>>>
>>>>>>>
>>>>>>>       at
>>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105) 
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I'm not creating an issue this time in case this is also related 
>>>>>>> to another
>>>>>>> known problem.
>>>>>>>
>>>>>>> Tilman


Re: 2.0.26 release

Posted by Tilman Hausherr <TH...@t-online.de>.
Only
commoncrawl3/7L/7LRS5U6CAFMN2P6JPTZVNBUW6XOFYH4M
commoncrawl3/HO/HOAZTST4E26NPA7HL72WCIVMNRQ3E4M5
have a different text extraction.

With the other two, the differences are in attachment file names or doc info.

Tilman

On 12.04.2022 at 08:16, Tilman Hausherr wrote:
> After having looked at the content differences and trying to rule out 
> the /Names differences, there are 4 files with content in 
> TOP_10_MORE_IN_A that feel suspicious and IMHO need investigation.
>
> commoncrawl3/7L/7LRS5U6CAFMN2P6JPTZVNBUW6XOFYH4M
> govdocs1/365/365260.pdf
> commoncrawl3/HO/HOAZTST4E26NPA7HL72WCIVMNRQ3E4M5
> govdocs1/150/150282.pdf
>
> Tilman
>
>
>
> On 12.04.2022 at 08:09, Andreas Lehmkuehler wrote:
>> Thanks Tim!
>>
>> Looks like there are 5 new exceptions left.
>>
>> I'm going to check the first two:
>>
>> commoncrawl3/ZC/ZCY5MCL7KI6QXVMXUZ2AJKXICQIT4TL4
>> commoncrawl3/WY/WYPJNTD5KQNODSXWK4GABURXRTTD5P4H
>>
>> The others are thrown within Jempbox.
>>
>>
>> Andreas
>>
>> On 11.04.22 at 12:40, Tim Allison wrote:
>>> https://corpora.tika.apache.org/base/reports/tika-2.4-20220410.tgz
>>>
>>> Haven't had a chance to review.  Hot off the vm.
>>>
>>> On Sun, Apr 10, 2022 at 9:58 AM Tim Allison <ta...@apache.org> 
>>> wrote:
>>>>
>>>> Will try to kick off today…first thing Monday morning (EDT) at the 
>>>> latest.
>>>>
>>>> On Sun, Apr 10, 2022 at 9:05 AM Andreas Lehmkuehler 
>>>> <an...@lehmi.de> wrote:
>>>>>
>>>>> On 09.04.22 at 19:00, Tilman Hausherr wrote:
>>>>>> testFlattenPDFBOX2469Filled also fails in 2.0 (it is disabled by 
>>>>>> default).
>>>>> I've fixed all new tickets. PDFBOX-5413 fixes the issue with the 
>>>>> disabled
>>>>> flatten test.
>>>>>
>>>>> @Tim Is there any chance to re-run the tests?
>>>>>
>>>>> Andreas
>>>>>
>>>>>>
>>>>>> testFlattenPDFBOX2469Filled(org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest) 
>>>>>>
>>>>>> Time elapsed: 1.083 s  <<< ERROR!
>>>>>> java.io.IOException: javax.crypto.BadPaddingException: Given 
>>>>>> final block not
>>>>>> properly padded. Such issues can arise if a bad key is used 
>>>>>> during decryption.
>>>>>>       at
>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345) 
>>>>>>
>>>>>>
>>>>>>       at
>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309) 
>>>>>>
>>>>>>
>>>>>>       at
>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105) 
>>>>>>
>>>>>>
>>>>>> Caused by: javax.crypto.BadPaddingException: Given final block 
>>>>>> not properly
>>>>>> padded. Such issues can arise if a bad key is used during 
>>>>>> decryption.
>>>>>>       at
>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345) 
>>>>>>
>>>>>>
>>>>>>       at
>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309) 
>>>>>>
>>>>>>
>>>>>>       at
>>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105) 
>>>>>>
>>>>>>
>>>>>>
>>>>>> I'm not creating an issue this time in case this is also related 
>>>>>> to another
>>>>>> known problem.
>>>>>>
>>>>>> Tilman


Re: 2.0.26 release

Posted by Tilman Hausherr <TH...@t-online.de>.
After having looked at the content differences and trying to rule out 
the /Names differences, there are 4 files with content in 
TOP_10_MORE_IN_A that feel suspicious and IMHO need investigation.

commoncrawl3/7L/7LRS5U6CAFMN2P6JPTZVNBUW6XOFYH4M
govdocs1/365/365260.pdf
commoncrawl3/HO/HOAZTST4E26NPA7HL72WCIVMNRQ3E4M5
govdocs1/150/150282.pdf

Tilman



On 12.04.2022 at 08:09, Andreas Lehmkuehler wrote:
> Thanks Tim!
>
> Looks like there are 5 new exceptions left.
>
> I'm going to check the first two:
>
> commoncrawl3/ZC/ZCY5MCL7KI6QXVMXUZ2AJKXICQIT4TL4
> commoncrawl3/WY/WYPJNTD5KQNODSXWK4GABURXRTTD5P4H
>
> The others are thrown within Jempbox.
>
>
> Andreas
>
> On 11.04.22 at 12:40, Tim Allison wrote:
>> https://corpora.tika.apache.org/base/reports/tika-2.4-20220410.tgz
>>
>> Haven't had a chance to review.  Hot off the vm.
>>
>> On Sun, Apr 10, 2022 at 9:58 AM Tim Allison <ta...@apache.org> wrote:
>>>
>>> Will try to kick off today…first thing Monday morning (EDT) at the 
>>> latest.
>>>
>>> On Sun, Apr 10, 2022 at 9:05 AM Andreas Lehmkuehler 
>>> <an...@lehmi.de> wrote:
>>>>
>>>> On 09.04.22 at 19:00, Tilman Hausherr wrote:
>>>>> testFlattenPDFBOX2469Filled also fails in 2.0 (it is disabled by 
>>>>> default).
>>>> I've fixed all new tickets. PDFBOX-5413 fixes the issue with the 
>>>> disabled
>>>> flatten test.
>>>>
>>>> @Tim Is there any chance to re-run the tests?
>>>>
>>>> Andreas
>>>>
>>>>>
>>>>> testFlattenPDFBOX2469Filled(org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest) 
>>>>>
>>>>> Time elapsed: 1.083 s  <<< ERROR!
>>>>> java.io.IOException: javax.crypto.BadPaddingException: Given final 
>>>>> block not
>>>>> properly padded. Such issues can arise if a bad key is used during 
>>>>> decryption.
>>>>>       at
>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345) 
>>>>>
>>>>>
>>>>>       at
>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309) 
>>>>>
>>>>>
>>>>>       at
>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105) 
>>>>>
>>>>>
>>>>> Caused by: javax.crypto.BadPaddingException: Given final block not 
>>>>> properly
>>>>> padded. Such issues can arise if a bad key is used during decryption.
>>>>>       at
>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345) 
>>>>>
>>>>>
>>>>>       at
>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309) 
>>>>>
>>>>>
>>>>>       at
>>>>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105) 
>>>>>
>>>>>
>>>>>
>>>>> I'm not creating an issue this time in case this is also related 
>>>>> to another
>>>>> known problem.
>>>>>
>>>>> Tilman


Re: 2.0.26 release

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Thanks Tim!

Looks like there are 5 new exceptions left.

I'm going to check the first two:

commoncrawl3/ZC/ZCY5MCL7KI6QXVMXUZ2AJKXICQIT4TL4
commoncrawl3/WY/WYPJNTD5KQNODSXWK4GABURXRTTD5P4H

The others are thrown within Jempbox.


Andreas

On 11.04.22 at 12:40, Tim Allison wrote:
> https://corpora.tika.apache.org/base/reports/tika-2.4-20220410.tgz
> 
> Haven't had a chance to review.  Hot off the vm.
> 
> On Sun, Apr 10, 2022 at 9:58 AM Tim Allison <ta...@apache.org> wrote:
>>
>> Will try to kick off today…first thing Monday morning (EDT) at the latest.
>>
>> On Sun, Apr 10, 2022 at 9:05 AM Andreas Lehmkuehler <an...@lehmi.de> wrote:
>>>
>>> On 09.04.22 at 19:00, Tilman Hausherr wrote:
>>>> testFlattenPDFBOX2469Filled also fails in 2.0 (it is disabled by default).
>>> I've fixed all new tickets. PDFBOX-5413 fixes the issue with the disabled
>>> flatten test.
>>>
>>> @Tim Is there any chance to re-run the tests?
>>>
>>> Andreas
>>>
>>>>
>>>> testFlattenPDFBOX2469Filled(org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest)  Time elapsed: 1.083 s  <<< ERROR!
>>>> java.io.IOException: javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
>>>>       at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345)
>>>>       at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309)
>>>>       at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105)
>>>> Caused by: javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
>>>>       at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345)
>>>>       at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309)
>>>>       at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105)
>>>>
>>>>
>>>> I'm not creating an issue this time in case this is also related to another
>>>> known problem.
>>>>
>>>> Tilman
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
>>>> For additional commands, e-mail: dev-help@pdfbox.apache.org
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
>>> For additional commands, e-mail: dev-help@pdfbox.apache.org
>>>
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: dev-help@pdfbox.apache.org
> 




Re: 2.0.26 release

Posted by Tim Allison <ta...@apache.org>.
Yes. Sorry. That's my fault.  I did something stupid in 2.3.0 and then
fixed it in 2.4.0-SNAPSHOT (TIKA-3711).

On Mon, Apr 11, 2022 at 1:35 PM Tilman Hausherr <TH...@t-online.de>
wrote:

> On 11.04.2022 at 12:40, Tim Allison wrote:
>
> https://corpora.tika.apache.org/base/reports/tika-2.4-20220410.tgz
>
> Haven't had a chance to review.  Hot off the vm.
>
>
> Thanks Tim!
>
> I looked at govdocs1/198/198191.pdf; it has "high: 1 |
> quality.joboptions: 1" more in A. This is the name of a file attachment at
> Root/Names/EmbeddedFiles/Kids/[0]/Names/[0], but I can see it in both
> 2.0.25 and 2.0.26. Could it be that the test ran with different Tika
> versions?
>
> Tilman
>

Re: 2.0.26 release

Posted by Tilman Hausherr <TH...@t-online.de>.
On 11.04.2022 at 12:40, Tim Allison wrote:
> https://corpora.tika.apache.org/base/reports/tika-2.4-20220410.tgz
>
> Haven't had a chance to review.  Hot off the vm.


Thanks Tim!

I looked at govdocs1/198/198191.pdf; it has "high: 1 |
quality.joboptions: 1" more in A. This is the name of a file attachment
at Root/Names/EmbeddedFiles/Kids/[0]/Names/[0], but I can see it in both
2.0.25 and 2.0.26. Could it be that the test ran with different Tika
versions?

Tilman

Re: 2.0.26 release

Posted by Tim Allison <ta...@apache.org>.
https://corpora.tika.apache.org/base/reports/tika-2.4-20220410.tgz

Haven't had a chance to review.  Hot off the vm.

On Sun, Apr 10, 2022 at 9:58 AM Tim Allison <ta...@apache.org> wrote:
>
> Will try to kick off today…first thing Monday morning (EDT) at the latest.
>
> On Sun, Apr 10, 2022 at 9:05 AM Andreas Lehmkuehler <an...@lehmi.de> wrote:
>>
>> On 09.04.22 at 19:00, Tilman Hausherr wrote:
>> > testFlattenPDFBOX2469Filled also fails in 2.0 (it is disabled by default).
>> I've fixed all new tickets. PDFBOX-5413 fixes the issue with the disabled
>> flatten test.
>>
>> @Tim Is there any chance to re-run the tests?
>>
>> Andreas
>>
>> >
>> > testFlattenPDFBOX2469Filled(org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest)  Time elapsed: 1.083 s  <<< ERROR!
>> > java.io.IOException: javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
>> >      at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345)
>> >      at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309)
>> >      at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105)
>> > Caused by: javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
>> >      at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345)
>> >      at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309)
>> >      at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105)
>> >
>> >
>> > I'm not creating an issue this time in case this is also related to another
>> > known problem.
>> >
>> > Tilman
>> >
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
>> > For additional commands, e-mail: dev-help@pdfbox.apache.org
>> >
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
>> For additional commands, e-mail: dev-help@pdfbox.apache.org
>>



Re: 2.0.26 release

Posted by Tim Allison <ta...@apache.org>.
Will try to kick off today…first thing Monday morning (EDT) at the latest.

On Sun, Apr 10, 2022 at 9:05 AM Andreas Lehmkuehler <an...@lehmi.de>
wrote:

> On 09.04.22 at 19:00, Tilman Hausherr wrote:
> > testFlattenPDFBOX2469Filled also fails in 2.0 (it is disabled by default).
> I've fixed all new tickets. PDFBOX-5413 fixes the issue with the disabled
> flatten test.
>
> @Tim Is there any chance to re-run the tests?
>
> Andreas
>
> >
> > testFlattenPDFBOX2469Filled(org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest)  Time elapsed: 1.083 s  <<< ERROR!
> > java.io.IOException: javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
> >      at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345)
> >      at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309)
> >      at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105)
> > Caused by: javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
> >      at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345)
> >      at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309)
> >      at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105)
> >
> > I'm not creating an issue this time in case this is also related to another
> > known problem.
> >
> > Tilman
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
> > For additional commands, e-mail: dev-help@pdfbox.apache.org
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: dev-help@pdfbox.apache.org
>
>

Re: 2.0.26 release

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
On 09.04.22 at 19:00, Tilman Hausherr wrote:
> testFlattenPDFBOX2469Filled also fails in 2.0 (it is disabled by default).
I've fixed all new tickets. PDFBOX-5413 fixes the issue with the disabled 
flatten test.

@Tim Is there any chance to re-run the tests?

Andreas

> 
> testFlattenPDFBOX2469Filled(org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest)  Time elapsed: 1.083 s  <<< ERROR!
> java.io.IOException: javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
>      at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345)
>      at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309)
>      at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105)
> Caused by: javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
>      at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345)
>      at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309)
>      at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105)
> 
> 
> I'm not creating an issue this time in case this is also related to another 
> known problem.
> 
> Tilman
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: dev-help@pdfbox.apache.org
> 




Re: 2.0.26 release

Posted by Tilman Hausherr <TH...@t-online.de>.
testFlattenPDFBOX2469Filled also fails in 2.0 (it is disabled by default).

testFlattenPDFBOX2469Filled(org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest)  Time elapsed: 1.083 s  <<< ERROR!
java.io.IOException: javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
     at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345)
     at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309)
     at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105)
Caused by: javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
     at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.generateSamples(PDAcroFormFlattenTest.java:345)
     at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.flattenAndCompare(PDAcroFormFlattenTest.java:309)
     at org.apache.pdfbox.pdmodel.interactive.form.PDAcroFormFlattenTest.testFlattenPDFBOX2469Filled(PDAcroFormFlattenTest.java:105)

I'm not creating an issue this time in case this is also related to 
another known problem.
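
For illustration only: a minimal sketch of how the JDK comes to report
"Given final block not properly padded" when AES/CBC data is decrypted with
a key other than the one used for encryption. It uses only javax.crypto.
The class and method names (BadPaddingSketch, keyOf, countBadPadding, ...)
are hypothetical, and this is not the PDFBox code path; it makes no claim
about the actual cause of the test failure above.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.BadPaddingException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class for illustration; not part of PDFBox.
final class BadPaddingSketch {

    private static final byte[] PLAIN = "flatten me".getBytes(StandardCharsets.UTF_8);
    private static final byte[] IV = new byte[16]; // all-zero IV, acceptable only in a demo

    // Builds a 16-byte AES key in which every byte has the same value.
    static byte[] keyOf(int fill) {
        byte[] key = new byte[16];
        Arrays.fill(key, (byte) fill);
        return key;
    }

    // Runs AES/CBC/PKCS5Padding in the given Cipher mode (encrypt or decrypt).
    static byte[] aes(int mode, byte[] key, byte[] data) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(IV));
        return cipher.doFinal(data);
    }

    // True if decrypting with the encryption key returns the original bytes.
    static boolean roundTrips() {
        try {
            byte[] encrypted = aes(Cipher.ENCRYPT_MODE, keyOf(0), PLAIN);
            return Arrays.equals(PLAIN, aes(Cipher.DECRYPT_MODE, keyOf(0), encrypted));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decrypts one ciphertext with `attempts` wrong keys and counts how many
    // attempts end in a BadPaddingException.
    static int countBadPadding(int attempts) {
        try {
            byte[] encrypted = aes(Cipher.ENCRYPT_MODE, keyOf(0), PLAIN);
            int failures = 0;
            for (int i = 1; i <= attempts; i++) {
                try {
                    aes(Cipher.DECRYPT_MODE, keyOf(i), encrypted);
                } catch (BadPaddingException e) {
                    failures++; // wrong key: the decrypted block rarely ends in valid padding
                }
            }
            return failures;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("round trip ok: " + roundTrips());
        System.out.println("wrong keys rejected: " + countBadPadding(16) + " of 16");
    }
}
```

A wrong key is not guaranteed to fail this way: now and then the garbage
block happens to end in valid padding, which is why the sketch counts
failures over several keys instead of asserting on a single one.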

Tilman





Re: 2.0.26 release

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Thanks Tim!

I've checked the first files with new exceptions, and there seems to be at
least one new regression:

commoncrawl3/ZC/ZCY5MCL7KI6QXVMXUZ2AJKXICQIT4TL4
commoncrawl3/WY/WYPJNTD5KQNODSXWK4GABURXRTTD5P4H
commoncrawl3/YI/YIEMGIQYGXCQ5AZOE35ESXYCZHWR3V57
commoncrawl3_refetched/5C/5CWAUHFCZMK42IHSMSKNIR3MHXHR4IRN

All render fine using 2.0.25 but throw an exception using 2.0.26.

I'll take a closer look later.

On 07.04.22 at 20:27, Tim Allison wrote:
> https://corpora.tika.apache.org/base/reports/pdfbox-2.0.26-snapshot-reports.tgz
> 
> I haven't had a chance to look at them yet.
> 
> On Thu, Apr 7, 2022 at 9:07 AM Andreas Lehmkühler <an...@lehmi.de> wrote:
>>
>> Yes, please
>>
>> Thanks in advance
>> Andreas
>>
>> 07.04.2022 11:44:38 Tim Allison <ta...@apache.org>:
>>
>>> Sounds great! Should I rerun the regression tests today?
>>>
>>> On Thu, Apr 7, 2022 at 1:41 AM Andreas Lehmkuehler <an...@lehmi.de> wrote:
>>>
>>>> Hi,
>>>>
>>>> sorry for the delay.  I'm planning to cut the 2.0.26 release next Saturday, the
>>>> day after tomorrow, if nobody objects.
>>>>
>>>> Andreas
>>>>
>>>> P.S.: I'm targeting a new 3.0.0 alpha release once the 2.0.26 release is out
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
>>>> For additional commands, e-mail: dev-help@pdfbox.apache.org
>>>>
>>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
>> For additional commands, e-mail: dev-help@pdfbox.apache.org
>>
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: dev-help@pdfbox.apache.org
> 




Re: 2.0.26 release

Posted by Tim Allison <ta...@apache.org>.
https://corpora.tika.apache.org/base/reports/pdfbox-2.0.26-snapshot-reports.tgz

I haven't had a chance to look at them yet.

On Thu, Apr 7, 2022 at 9:07 AM Andreas Lehmkühler <an...@lehmi.de> wrote:
>
> Yes, please
>
> Thanks in advance
> Andreas
>
> 07.04.2022 11:44:38 Tim Allison <ta...@apache.org>:
>
> > Sounds great! Should I rerun the regression tests today?
> >
> > On Thu, Apr 7, 2022 at 1:41 AM Andreas Lehmkuehler <an...@lehmi.de> wrote:
> >
> >> Hi,
> >>
> >> sorry for the delay.  I'm planning to cut the 2.0.26 release next Saturday, the
> >> day after tomorrow, if nobody objects.
> >>
> >> Andreas
> >>
> >> P.S.: I'm targeting a new 3.0.0 alpha release once the 2.0.26 release is out
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
> >> For additional commands, e-mail: dev-help@pdfbox.apache.org
> >>
> >>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: dev-help@pdfbox.apache.org
>



Re: 2.0.26 release

Posted by Andreas Lehmkühler <an...@lehmi.de>.
Yes, please

Thanks in advance
Andreas

07.04.2022 11:44:38 Tim Allison <ta...@apache.org>:

> Sounds great! Should I rerun the regression tests today?
> 
> On Thu, Apr 7, 2022 at 1:41 AM Andreas Lehmkuehler <an...@lehmi.de> wrote:
> 
>> Hi,
>> 
>> sorry for the delay.  I'm planning to cut the 2.0.26 release next Saturday, the
>> day after tomorrow, if nobody objects.
>> 
>> Andreas
>> 
>> P.S.: I'm targeting a new 3.0.0 alpha release once the 2.0.26 release is out
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
>> For additional commands, e-mail: dev-help@pdfbox.apache.org
>> 
>> 



Re: 2.0.26 release

Posted by Tim Allison <ta...@apache.org>.
Sounds great! Should I rerun the regression tests today?

On Thu, Apr 7, 2022 at 1:41 AM Andreas Lehmkuehler <an...@lehmi.de> wrote:

> Hi,
>
> sorry for the delay.  I'm planning to cut the 2.0.26 release next Saturday, the
> day after tomorrow, if nobody objects.
>
> Andreas
>
> P.S.: I'm targeting a new 3.0.0 alpha release once the 2.0.26 release is out
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: dev-help@pdfbox.apache.org
>
>

Re: Next releases WAS: Re: 2.4.0 release?

Posted by Tim Allison <ta...@apache.org>.
https://repository.apache.org is having a bad day.  Requests are
timing out left and right.  I'll try to perform the release of
2.4.0-rc1 later today or tomorrow when the repo is happier.

On Thu, Apr 28, 2022 at 9:47 AM Tim Allison <ta...@apache.org> wrote:
>
> I've upgraded junrar in both branches, and the regression results look good.
>
> I'll start 1.28.2-rc2 shortly, and then follow up with 2.4.0-rc1 if
> there aren't any objections.
>
> On Tue, Apr 26, 2022 at 9:10 AM Tim Allison <ta...@apache.org> wrote:
> >
> > All,
> >
> > I'm prepping rc1 for 1.28.2 now.
> >
> > I'm running the regression tests for 2.4.0, and I hope to have results
> > today with possibly an rc later today or early tomorrow if there are
> > no surprises.
> >
> > Please let me know if there are any blockers.
> >
> > Best,
> >
> >         Tim
> >
> > On Thu, Apr 7, 2022 at 9:50 AM Tim Allison <ta...@apache.org> wrote:
> > >
> > > All,
> > >   Once the new PDFBox is out, we should probably kick off the 2.4.0
> > > release.  If I'm release manager, given my schedule, that'll probably
> > > be the week of April 18th.
> > >   I want to fix TIKA-3711 (embedded file names), but other than that,
> > > I don't think there are any blockers.
> > >
> > >   WDYT?
> > >
> > >          Best,
> > >
> > >                  Tim
> > >
> > > ---------- Forwarded message ---------
> > > From: Andreas Lehmkuehler <an...@lehmi.de>
> > > Date: Thu, Apr 7, 2022 at 1:41 AM
> > > Subject: 2.0.26 release
> > > To: <de...@pdfbox.apache.org>
> > >
> > >
> > > Hi,
> > >
> > > sorry for the delay.  I'm planning to cut the 2.0.26 release next Saturday, the
> > > day after tomorrow, if nobody objects.
> > >
> > > Andreas
> > >
> > > P.S.: I'm targeting a new 3.0.0 alpha release once the 2.0.26 release is out
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
> > > For additional commands, e-mail: dev-help@pdfbox.apache.org

Re: Next releases WAS: Re: 2.4.0 release?

Posted by Tim Allison <ta...@apache.org>.
I've upgraded junrar in both branches, and the regression results look good.

I'll start 1.28.2-rc2 shortly, and then follow up with 2.4.0-rc1 if
there aren't any objections.

On Tue, Apr 26, 2022 at 9:10 AM Tim Allison <ta...@apache.org> wrote:
>
> All,
>
> I'm prepping rc1 for 1.28.2 now.
>
> I'm running the regression tests for 2.4.0, and I hope to have results
> today with possibly an rc later today or early tomorrow if there are
> no surprises.
>
> Please let me know if there are any blockers.
>
> Best,
>
>         Tim
>
> On Thu, Apr 7, 2022 at 9:50 AM Tim Allison <ta...@apache.org> wrote:
> >
> > All,
> >   Once the new PDFBox is out, we should probably kick off the 2.4.0
> > release.  If I'm release manager, given my schedule, that'll probably
> > be the week of April 18th.
> >   I want to fix TIKA-3711 (embedded file names), but other than that,
> > I don't think there are any blockers.
> >
> >   WDYT?
> >
> >          Best,
> >
> >                  Tim
> >
> > ---------- Forwarded message ---------
> > From: Andreas Lehmkuehler <an...@lehmi.de>
> > Date: Thu, Apr 7, 2022 at 1:41 AM
> > Subject: 2.0.26 release
> > To: <de...@pdfbox.apache.org>
> >
> >
> > Hi,
> >
> > sorry for the delay.  I'm planning to cut the 2.0.26 release next Saturday, the
> > day after tomorrow, if nobody objects.
> >
> > Andreas
> >
> > P.S.: I'm targeting a new 3.0.0 alpha release once the 2.0.26 release is out
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
> > For additional commands, e-mail: dev-help@pdfbox.apache.org

Next releases WAS: Re: 2.4.0 release?

Posted by Tim Allison <ta...@apache.org>.
All,

I'm prepping rc1 for 1.28.2 now.

I'm running the regression tests for 2.4.0, and I hope to have results
today with possibly an rc later today or early tomorrow if there are
no surprises.

Please let me know if there are any blockers.

Best,

        Tim

On Thu, Apr 7, 2022 at 9:50 AM Tim Allison <ta...@apache.org> wrote:
>
> All,
>   Once the new PDFBox is out, we should probably kick off the 2.4.0
> release.  If I'm release manager, given my schedule, that'll probably
> be the week of April 18th.
>   I want to fix TIKA-3711 (embedded file names), but other than that,
> I don't think there are any blockers.
>
>   WDYT?
>
>          Best,
>
>                  Tim
>
> ---------- Forwarded message ---------
> From: Andreas Lehmkuehler <an...@lehmi.de>
> Date: Thu, Apr 7, 2022 at 1:41 AM
> Subject: 2.0.26 release
> To: <de...@pdfbox.apache.org>
>
>
> Hi,
>
> sorry for the delay.  I'm planning to cut the 2.0.26 release next Saturday, the
> day after tomorrow, if nobody objects.
>
> Andreas
>
> P.S.: I'm targeting a new 3.0.0 alpha release once the 2.0.26 release is out
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: dev-help@pdfbox.apache.org

2.4.0 release?

Posted by Tim Allison <ta...@apache.org>.
All,
  Once the new PDFBox is out, we should probably kick off the 2.4.0
release.  If I'm release manager, given my schedule, that'll probably
be the week of April 18th.
  I want to fix TIKA-3711 (embedded file names), but other than that,
I don't think there are any blockers.

  WDYT?

         Best,

                 Tim

---------- Forwarded message ---------
From: Andreas Lehmkuehler <an...@lehmi.de>
Date: Thu, Apr 7, 2022 at 1:41 AM
Subject: 2.0.26 release
To: <de...@pdfbox.apache.org>


Hi,

sorry for the delay.  I'm planning to cut the 2.0.26 release next Saturday, the
day after tomorrow, if nobody objects.

Andreas

P.S.: I'm targeting a new 3.0.0 alpha release once the 2.0.26 release is out
