Posted to dev@pdfbox.apache.org by Tim Allison <ta...@apache.org> on 2021/04/06 15:22:03 UTC
3.0.0-RC1 regression tests?
Hi All,
Would it be useful for me to run regression tests comparing 2.x with
3.0.0-RC1 now or should I wait? Or, has someone already done this?
See https://issues.apache.org/jira/browse/TIKA-3347 for integration
with Tika. Many thanks!
Cheers,
Tim
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: dev-help@pdfbox.apache.org
Re: 3.0.0-RC1 regression tests?
Posted by Tilman Hausherr <TH...@t-online.de>.
Hello Tim,
Could you please start another "B" batch + eval? I think we've fixed
most, maybe all.
Thanks
Tilman
On 09.04.2021 at 20:11, Tim Allison wrote:
> Apologies for my delay...
>
> Reports are here:
> https://corpora.tika.apache.org/base/reports/pdfbox-3.x-snapshot-reports.tgz
>
> I added two new reports: new_catastrophic_exceptions_in_b and
> fixed_catastrophic_exceptions_in_b. The former shows which files had
> a missing or 0-byte extract in B but not A. The latter shows the
> opposite. We can get missing or 0-byte extracts when the app crashes
> (timeout, OOM, or another fatal crash). Because the run is
> multithreaded, every file being parsed at the moment of a
> catastrophic event will have a 0-byte or missing extract. So, there
> are likely some files in there that are OK.
>
> I ran the comparison before the fix for the infinite loop that Tilman
> made this morning. Note that the infinite loop surfaced as a regular
> IOException, because TikaInputStream caught it after too many EOFs,
> so it did not cause catastrophic problems.
>
> Let me know if you have questions. I haven't looked in great detail yet...
>
> There's every chance that I need to make some more changes on the Tika side. :D
>
> Cheers and happy 3.x!
>
> Best,
>
> Tim
Re: 3.0.0-RC1 regression tests?
Posted by Tim Allison <ta...@apache.org>.
Apologies for my delay...
Reports are here:
https://corpora.tika.apache.org/base/reports/pdfbox-3.x-snapshot-reports.tgz
I added two new reports: new_catastrophic_exceptions_in_b and
fixed_catastrophic_exceptions_in_b. The former shows which files had
a missing or 0-byte extract in B but not A. The latter shows the
opposite. We can get missing or 0-byte extracts when the app crashes
(timeout, OOM, or another fatal crash). Because the run is
multithreaded, every file being parsed at the moment of a
catastrophic event will have a 0-byte or missing extract. So, there
are likely some files in there that are OK.
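As a rough illustration only (this is not how the reports themselves are
generated), the two report definitions boil down to a set difference over
"failed" extracts. The sketch below assumes one extract file per source
file, named <source>.json, in separate directories for runs A and B; the
directory layout, the suffix, and the function names are all hypothetical.

```python
from pathlib import Path


def failed_extracts(source_names, extract_dir, suffix=".json"):
    """Return the source names whose extract is missing or 0 bytes.

    The "<name><suffix>" naming scheme is an assumption for this sketch.
    """
    failed = set()
    for name in source_names:
        extract = Path(extract_dir) / (name + suffix)
        if not extract.is_file() or extract.stat().st_size == 0:
            failed.add(name)
    return failed


def compare_runs(source_names, extracts_a, extracts_b):
    """Split failures into 'new in B' and 'fixed in B', relative to A."""
    failed_a = failed_extracts(source_names, extracts_a)
    failed_b = failed_extracts(source_names, extracts_b)
    return {
        # missing or 0-byte extract in B but not in A
        "new_catastrophic_exceptions_in_b": sorted(failed_b - failed_a),
        # missing or 0-byte extract in A but not in B
        "fixed_catastrophic_exceptions_in_b": sorted(failed_a - failed_b),
    }
```

A check like this cannot tell which file actually triggered a crash:
every file that was in flight when the process died shows up the same
way, which is why some entries in the "new" list may be fine.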
I ran the comparison before the fix for the infinite loop that Tilman
made this morning. Note that the infinite loop surfaced as a regular
IOException, because TikaInputStream caught it after too many EOFs,
so it did not cause catastrophic problems.
Let me know if you have questions. I haven't looked in great detail yet...
There's every chance that I need to make some more changes on the Tika side. :D
Cheers and happy 3.x!
Best,
Tim
On Wed, Apr 7, 2021 at 9:23 AM Tim Allison <ta...@apache.org> wrote:
>
> LOL... K. I'll build locally with the PDFBOX-5153 fix and kick it
> off today or tomorrow.
Re: 3.0.0-RC1 regression tests?
Posted by Tim Allison <ta...@apache.org>.
LOL... K. I'll build locally with the PDFBOX-5153 fix and kick it
off today or tomorrow.
On Wed, Apr 7, 2021 at 1:40 AM Tilman Hausherr <TH...@t-online.de> wrote:
>
> Yes it would be useful and no I haven't done it. I'm optimistic about
> the results despite PDFBOX-5153.
>
> Tilman
Re: 3.0.0-RC1 regression tests?
Posted by Tilman Hausherr <TH...@t-online.de>.
Yes it would be useful and no I haven't done it. I'm optimistic about
the results despite PDFBOX-5153.
Tilman
On 06.04.2021 at 17:22, Tim Allison wrote:
> Hi All,
>
> Would it be useful for me to run regression tests comparing 2.x with
> 3.0.0-RC1 now or should I wait? Or, has someone already done this?
>
> See https://issues.apache.org/jira/browse/TIKA-3347 for integration
> with Tika. Many thanks!
>
> Cheers,
>
> Tim
Re: 3.0.0-RC1 regression tests?
Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,
That would be nice; I guess the last comparison was quite some time ago.
Cheers
Andreas
On 06.04.21 at 17:22, Tim Allison wrote:
> Hi All,
>
> Would it be useful for me to run regression tests comparing 2.x with
> 3.0.0-RC1 now or should I wait? Or, has someone already done this?
>
> See https://issues.apache.org/jira/browse/TIKA-3347 for integration
> with Tika. Many thanks!
>
> Cheers,
>
> Tim