Posted to dev@tika.apache.org by Tim Allison <ta...@apache.org> on 2020/04/14 14:36:22 UTC

1.24.1?

All,
  We've made some important bug fixes since 1.24.  I recently ran the
regression tests locally.  The reports are here:

https://github.com/tballison/share/blob/master/tika_comparisons/tika_1_24_1_reports.tgz

  We're getting more exceptions with .tar on "read the rest of the block".
I'll look into this; my initial impression is that these files are not
truncated.

  We're also getting more exceptions on mp4 with 0-length records, which, I
think, is a side effect of truncation.

  Let me know what else you see.

       Cheers,

                  Tim

Re: 1.24.1?

Posted by Tim Allison <ta...@apache.org>.
W00t!  I'll roll the release candidate tomorrow (Thursday) unless there are
objections.

Best,

     Tim

On Wed, Apr 15, 2020 at 5:30 AM Oleg Tikhonov <ol...@apache.org> wrote:

> +1. Seems ok to me.
> Thanks,
> Oleg
>
> On Wed, Apr 15, 2020, 00:18 Tim Allison <ta...@apache.org> wrote:
>
>> I fixed the hwp5 multithreading problem.
>>
>> I looked into tar files, and the handful I reviewed had a "skip the rest of
>> the final block with x bytes", but there weren't actually x bytes.  This
>> didn't harm extraction because this happened on the last block.  Folks will
>> get more exceptions, but will get the same content.  I think this is ok on
>> balance given the improved safety we're getting with skip->skipFully in
>> TikaInputStream.
>>
>> We do have more exceptions in mp4, but I think that is mostly on truncated
>> files.
>>
>> In short, I _think_ we're ready to go for 1.24.1.  Please take a look at
>> the reports and let me know what you think.
>>
>> Best,
>>
>>          Tim
>>
>> On Tue, Apr 14, 2020 at 10:36 AM Tim Allison <ta...@apache.org> wrote:
>>
>> > All,
>> >   We've made some important bug fixes since 1.24.  I recently ran the
>> > regression tests locally.  The reports are here:
>> >
>> > https://github.com/tballison/share/blob/master/tika_comparisons/tika_1_24_1_reports.tgz
>> >
>> >   We're getting more exceptions with .tar on "read the rest of the
>> > block".  I'll look into this; my initial impression is that these files are
>> > not truncated.
>> >
>> >   We're also getting more exceptions on mp4 with 0-length records, which,
>> > I think, is a side effect of truncation.
>> >
>> >   Let me know what else you see.
>> >
>> >        Cheers,
>> >
>> >                   Tim
>> >
>>
>

Re: 1.24.1?

Posted by Oleg Tikhonov <ol...@apache.org>.
+1. Seems ok to me.
Thanks,
Oleg

On Wed, Apr 15, 2020, 00:18 Tim Allison <ta...@apache.org> wrote:

> I fixed the hwp5 multithreading problem.
>
> I looked into tar files, and the handful I reviewed had a "skip the rest of
> the final block with x bytes", but there weren't actually x bytes.  This
> didn't harm extraction because this happened on the last block.  Folks will
> get more exceptions, but will get the same content.  I think this is ok on
> balance given the improved safety we're getting with skip->skipFully in
> TikaInputStream.
>
> We do have more exceptions in mp4, but I think that is mostly on truncated
> files.
>
> In short, I _think_ we're ready to go for 1.24.1.  Please take a look at
> the reports and let me know what you think.
>
> Best,
>
>          Tim
>
> On Tue, Apr 14, 2020 at 10:36 AM Tim Allison <ta...@apache.org> wrote:
>
> > All,
> >   We've made some important bug fixes since 1.24.  I recently ran the
> > regression tests locally.  The reports are here:
> >
> > https://github.com/tballison/share/blob/master/tika_comparisons/tika_1_24_1_reports.tgz
> >
> >   We're getting more exceptions with .tar on "read the rest of the
> > block".  I'll look into this; my initial impression is that these files are
> > not truncated.
> >
> >   We're also getting more exceptions on mp4 with 0-length records, which,
> > I think, is a side effect of truncation.
> >
> >   Let me know what else you see.
> >
> >        Cheers,
> >
> >                   Tim
> >
>

Re: 1.24.1?

Posted by Tim Allison <ta...@apache.org>.
I fixed the hwp5 multithreading problem.

I looked into tar files, and the handful I reviewed had a "skip the rest of
the final block with x bytes", but there weren't actually x bytes.  This
didn't harm extraction because this happened on the last block.  Folks will
get more exceptions, but will get the same content.  I think this is ok on
balance given the improved safety we're getting with skip->skipFully in
TikaInputStream.
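
To illustrate the skip vs. skipFully trade-off described above, here is a
minimal sketch in plain Java.  It is not Tika's actual code: the class name
SkipFullySketch and the skipFully helper are hypothetical, and the 100-byte
stream simply stands in for a final tar block that is shorter than the
number of bytes the reader is told to skip.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class SkipFullySketch {

    // Hypothetical helper (an assumption, not Tika's implementation):
    // skip exactly n bytes, or throw EOFException if the stream ends first.
    public static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // skip() may return 0 without being at EOF, so confirm with read()
                if (in.read() == -1) {
                    throw new EOFException("wanted to skip " + n + " bytes, but only "
                            + (n - remaining) + " were available");
                }
                skipped = 1; // read() consumed one byte
            }
            remaining -= skipped;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a final block that has fewer bytes than requested.
        byte[] data = new byte[100];

        // Plain skip(): reports a short count and the caller may never notice.
        long skipped = new ByteArrayInputStream(data).skip(512);
        System.out.println("skip(512) skipped " + skipped + " bytes");

        // skipFully(): the same shortfall surfaces as an exception.
        try {
            skipFully(new ByteArrayInputStream(data), 512);
        } catch (EOFException e) {
            System.out.println("skipFully(512) threw: " + e.getMessage());
        }
    }
}
```

With a plain skip the shortfall passes silently; with a skipFully-style call
it becomes an exception even when, as in the tar case, everything before the
final block has already been read.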

We do have more exceptions in mp4, but I think that is mostly on truncated
files.

In short, I _think_ we're ready to go for 1.24.1.  Please take a look at
the reports and let me know what you think.

Best,

         Tim

On Tue, Apr 14, 2020 at 10:36 AM Tim Allison <ta...@apache.org> wrote:

> All,
>   We've made some important bug fixes since 1.24.  I recently ran the
> regression tests locally.  The reports are here:
>
>
> https://github.com/tballison/share/blob/master/tika_comparisons/tika_1_24_1_reports.tgz
>
>   We're getting more exceptions with .tar on "read the rest of the
> block".  I'll look into this; my initial impression is that these files are
> not truncated.
>
>   We're also getting more exceptions on mp4 with 0-length records, which,
> I think, is a side effect of truncation.
>
>   Let me know what else you see.
>
>        Cheers,
>
>                   Tim
>