Posted to dev@pdfbox.apache.org by Tim Allison <ta...@apache.org> on 2021/04/15 23:08:57 UTC

PDFBox 3.0.0-SNAPSHOT reports

Latest here: https://corpora.tika.apache.org/base/reports/pdfbox-3.0.0-20210415_reports.tgz

I haven't had a chance to look yet.  Will dig in tomorrow.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: dev-help@pdfbox.apache.org


Re: PDFBox 3.0.0-SNAPSHOT reports

Posted by Tilman Hausherr <TH...@t-online.de>.
I'm done and have created two issues. I didn't report one difference
because the file fails in different ways (it had 0 pages in one run and
an exception in the other).

I haven't really understood the "new_catastrophic_exceptions_in_b" file:
I can extract text from the files I tried. But the first file,
bug_trackers/libvips/libvips-LINK-1721-0.pdf, has problems rendering when
memory is set to -Xmx8g. There are no problems when it is set to -Xmx4g.

Tilman

Am 16.04.2021 um 23:16 schrieb Tim Allison:
> Hi All,
>   I reran 2.0.23 with our added handling for flash files against the
> 3.0.0-SNAPSHOT that I ran yesterday.  The diffs look almost the same
> as the reports I created yesterday, so I think those are accurate:
> https://corpora.tika.apache.org/base/reports/pdfbox-2.0.23-richmedia.tgz
>
> There are a handful of files that "lose" attachments going into
> 3.0.0-SNAPSHOT because I haven't added the richmedia handling in our
> 3.0.0 branch.
>
>       Best,
>
>             Tim
>
> On Thu, Apr 15, 2021 at 7:15 PM Tim Allison <ta...@apache.org> wrote:
>> Diffs look suspiciously small...I may have to rerun the analyses.
>>
>> On Thu, Apr 15, 2021 at 7:08 PM Tim Allison <ta...@apache.org> wrote:
>>> Latest here: https://corpora.tika.apache.org/base/reports/pdfbox-3.0.0-20210415_reports.tgz
>>>
>>> I haven't had a chance to look yet.  Will dig in tomorrow.


Re: PDFBox 3.0.0-SNAPSHOT reports

Posted by Tim Allison <ta...@apache.org>.
Hi All,
 I reran 2.0.23 with our added handling for flash files against the
3.0.0-SNAPSHOT that I ran yesterday.  The diffs look almost the same
as the reports I created yesterday, so I think those are accurate:
https://corpora.tika.apache.org/base/reports/pdfbox-2.0.23-richmedia.tgz

There are a handful of files that "lose" attachments going into
3.0.0-SNAPSHOT because I haven't added the richmedia handling in our
3.0.0 branch.

     Best,

           Tim

On Thu, Apr 15, 2021 at 7:15 PM Tim Allison <ta...@apache.org> wrote:
>
> Diffs look suspiciously small...I may have to rerun the analyses.
>
> On Thu, Apr 15, 2021 at 7:08 PM Tim Allison <ta...@apache.org> wrote:
> >
> > Latest here: https://corpora.tika.apache.org/base/reports/pdfbox-3.0.0-20210415_reports.tgz
> >
> > I haven't had a chance to look yet.  Will dig in tomorrow.



Re: PDFBox 3.0.0-SNAPSHOT reports

Posted by Tim Allison <ta...@apache.org>.
Trust me... the doubt was on me, not you! :D

On Sat, Apr 17, 2021 at 5:15 AM Andreas Lehmkuehler <an...@lehmi.de> wrote:

> Am 16.04.21 um 01:15 schrieb Tim Allison:
> > Diffs look suspiciously small...I may have to rerun the analyses.
> We simply did a good job! ;-)
>
> Andreas
>
> >
> > On Thu, Apr 15, 2021 at 7:08 PM Tim Allison <ta...@apache.org> wrote:
> >>
> >> Latest here: https://corpora.tika.apache.org/base/reports/pdfbox-3.0.0-20210415_reports.tgz
> >>
> >> I haven't had a chance to look yet.  Will dig in tomorrow.

Re: PDFBox 3.0.0-SNAPSHOT reports

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Am 16.04.21 um 01:15 schrieb Tim Allison:
> Diffs look suspiciously small...I may have to rerun the analyses.
We simply did a good job! ;-)

Andreas

> 
> On Thu, Apr 15, 2021 at 7:08 PM Tim Allison <ta...@apache.org> wrote:
>>
>> Latest here: https://corpora.tika.apache.org/base/reports/pdfbox-3.0.0-20210415_reports.tgz
>>
>> I haven't had a chance to look yet.  Will dig in tomorrow.


Re: PDFBox 3.0.0-SNAPSHOT reports

Posted by Tim Allison <ta...@apache.org>.
Diffs look suspiciously small...I may have to rerun the analyses.

On Thu, Apr 15, 2021 at 7:08 PM Tim Allison <ta...@apache.org> wrote:
>
> Latest here: https://corpora.tika.apache.org/base/reports/pdfbox-3.0.0-20210415_reports.tgz
>
> I haven't had a chance to look yet.  Will dig in tomorrow.
