Posted to dev@pdfbox.apache.org by Andreas Lehmkuehler <an...@lehmi.de> on 2017/10/01 10:30:48 UTC

Re: 2.0.8?

On 25.09.2017 at 18:39, Andreas Lehmkuehler wrote:
> On 25.09.2017 at 12:30, Maruan Sahyoun wrote:
>> Hi,
>>>> On 13 September 2017 at 20:33, Andreas Lehmkuehler <an...@lehmi.de> wrote:
>>>>
>>>>
>>>> Due to the responses I'm planning to cut the release on Monday the 25th
>>>
>>> I'm still working on a solution for PDFBOX-3934 to avoid the regression with 
>>> PDFBOX-3318. Should we postpone the release for a couple of days or a week 
>>> max? Or should I simply revert my changes?
>>
>> I'd go for postponing in order to fix that regression - what about
>> setting the date to next Monday?
> OK, let's postpone, I'm targeting next Monday. Thanks for your patience ;-)
Just a friendly reminder, I'm going to cut the release in about 30 hours from now.

Andreas

> 
> Andreas
>>
>> BR
>> Maruan
>>
>>>
>>> WDYT?
>>>
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: dev-help@pdfbox.apache.org
> 



RE: 2.0.8?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Sounds good.  

I kicked off the eval process yesterday, but because of a bug in our config-file reader and/or user error in modifying the config file, I wound up with 500k PDFs parsed by our EmptyParser... no results.

I restarted the eval process just now. I should have results in 6 hours.






Re: 2.0.8?

Posted by Tilman Hausherr <TH...@t-online.de>.
On 02.10.2017 at 23:48, Allison, Timothy B. wrote:
>> Re 308576.pdf: the text extraction has a huge loss, but a manual check shows it is identical. However that file has the NPE from PDActionURI.getURI(), could it be that this results in an abort of text extraction?
>> Same for 569017.pdf.
>
> Likely.  There are two "per file pair contents" files.  The one ending with "_ignore_exceptions.xlsx" means that results are not reported if there was an exception caught for one of the files (308576.pdf and 569017.pdf aren't in that file).  The other one "*_with_exceptions" includes both.  Based on your feedback, I should add 2 boolean cols to "*_with_exceptions.xlsx" for exceptionInA and exceptionInB?

Sorry, I had forgotten that. Yes, the two columns would be useful.

Tilman



RE: 2.0.8?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
> Re 308576.pdf: the text extraction has a huge loss, but a manual check shows it is identical. However that file has the NPE from PDActionURI.getURI(), could it be that this results in an abort of text extraction?
> Same for 569017.pdf.

Likely. There are two "per file pair contents" files. The one ending in "_ignore_exceptions.xlsx" omits the results for any pair where an exception was caught for one of the files (308576.pdf and 569017.pdf aren't in that file). The other one, "*_with_exceptions.xlsx", includes those pairs as well. Based on your feedback, should I add two boolean columns, exceptionInA and exceptionInB, to "*_with_exceptions.xlsx"?
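
To make the proposal concrete, here is a minimal Java sketch of how the two flags could relate the two report files. Only the column names exceptionInA and exceptionInB come from the message above; the PairRow class, its methods, and the file names are hypothetical and are not part of the actual eval tooling.

```java
import java.util.ArrayList;
import java.util.List;

/** One row of a hypothetical "per file pair contents" report. */
class PairRow {
    final String file;
    final boolean exceptionInA; // an exception was caught while parsing with version A
    final boolean exceptionInB; // an exception was caught while parsing with version B

    PairRow(String file, boolean exceptionInA, boolean exceptionInB) {
        this.file = file;
        this.exceptionInA = exceptionInA;
        this.exceptionInB = exceptionInB;
    }

    /** Keeps only the pairs where neither side threw, like the "_ignore_exceptions" file. */
    static List<PairRow> ignoreExceptions(List<PairRow> all) {
        List<PairRow> clean = new ArrayList<>();
        for (PairRow row : all) {
            if (!row.exceptionInA && !row.exceptionInB) {
                clean.add(row);
            }
        }
        return clean;
    }

    public static void main(String[] args) {
        // Made-up rows, not taken from the real reports.
        List<PairRow> withExceptions = new ArrayList<>();
        withExceptions.add(new PairRow("ok.pdf", false, false));
        withExceptions.add(new PairRow("npe-in-b.pdf", false, true));
        for (PairRow row : ignoreExceptions(withExceptions)) {
            System.out.println(row.file);
        }
    }
}
```

With the flags in place, a large text loss in a row where exceptionInB is true can be read as a likely aborted extraction rather than a real content difference.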

Re: 2.0.8?

Posted by Tilman Hausherr <TH...@t-online.de>.
On 02.10.2017 at 21:58, Allison, Timothy B. wrote:
> Reports are here:
> http://162.242.228.174/reports/pdfbox_2_0_7_Vs_2_0_8_take2.tar.gz
>
> Looks like some new NPEs.  I'll take a look at the metadata diffs.


Re 308576.pdf: the text extraction has a huge loss, but a manual check 
shows it is identical. However that file has the NPE from 
PDActionURI.getURI(), could it be that this results in an abort of text 
extraction?
Same for 569017.pdf.

Some of the metadata diffs are due to a bug fix. In 074031.pdf, some fields 
contained "þÿ"; this has been fixed and those fields are now empty.


Tilman
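
A plausible reading of the "þÿ" values mentioned for 074031.pdf (an assumption; the thread does not state the cause): the bytes FE FF are the UTF-16BE byte order mark that opens a Unicode PDF text string, and decoding them with a single-byte encoding yields exactly those two characters. The sketch below uses ISO-8859-1 as a stand-in for PDFDocEncoding. The class and method names are made up for illustration and are not PDFBox API.

```java
import java.nio.charset.StandardCharsets;

/** Illustrates how a BOM-only text string can surface as "þÿ". */
class BomDemo {
    /** Naive decoding: treats every byte as one ISO-8859-1 character. */
    static String decodeNaive(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    /** BOM-aware decoding: a leading FE FF marks the remainder as UTF-16BE. */
    static String decode(byte[] raw) {
        if (raw.length >= 2 && (raw[0] & 0xFF) == 0xFE && (raw[1] & 0xFF) == 0xFF) {
            return new String(raw, 2, raw.length - 2, StandardCharsets.UTF_16BE);
        }
        // Simplification: ISO-8859-1 stands in for PDFDocEncoding here.
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] bomOnly = {(byte) 0xFE, (byte) 0xFF};
        System.out.println("naive:     [" + decodeNaive(bomOnly) + "]");
        System.out.println("BOM-aware: [" + decode(bomOnly) + "]");
    }
}
```

Under this assumption, a field that holds nothing but the byte order mark decodes to an empty string once the marker is honoured, which matches the "now empty" result described above.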




RE: 2.0.8?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Sorry all for taking longer than expected!  File under "this information would have been useful..." ☹




RE: 2.0.8?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Reports are here:
http://162.242.228.174/reports/pdfbox_2_0_7_Vs_2_0_8_take2.tar.gz

Looks like some new NPEs.  I'll take a look at the metadata diffs.
