Posted to dev@tika.apache.org by Tim Allison <ta...@apache.org> on 2020/04/14 14:36:22 UTC
1.24.1?
All,
We've made some important bug fixes since 1.24. I recently ran the
regression tests locally. The reports are here:
https://github.com/tballison/share/blob/master/tika_comparisons/tika_1_24_1_reports.tgz
We're getting more exceptions with .tar on "read the rest of the block".
I'll look into this; my initial impression is that these files are not
truncated.
We're also getting more exceptions on mp4 with 0-length records, which, I
think, is a side effect of truncation.
Let me know what else you see.
Cheers,
Tim
Re: 1.24.1?
Posted by Tim Allison <ta...@apache.org>.
W00t! I'll roll the release candidate tomorrow (Thursday) unless there are
objections.
Best,
Tim
On Wed, Apr 15, 2020 at 5:30 AM Oleg Tikhonov <ol...@apache.org> wrote:
> +1. Seems ok to me.
> Thanks,
> Oleg
Re: 1.24.1?
Posted by Oleg Tikhonov <ol...@apache.org>.
+1. Seems ok to me.
Thanks,
Oleg
On Wed, Apr 15, 2020, 00:18 Tim Allison <ta...@apache.org> wrote:
> I fixed the hwp5 multithreading problem.
>
> I looked into tar files, and the handful I reviewed had a "skip the rest of
> the final block with x bytes", but there weren't actually x bytes. This
> didn't harm extraction because this happened on the last block. Folks will
> get more exceptions, but will get the same content. I think this is ok on
> balance given the improved safety we're getting with skip->skipFully in
> TikaInputStream.
>
> We do have more exceptions in mp4, but I think that is mostly on truncated
> files.
>
> In short, I _think_ we're ready to go for 1.24.1. Please take a look at
> the reports and let me know what you think.
>
> Best,
>
> Tim
Re: 1.24.1?
Posted by Tim Allison <ta...@apache.org>.
I fixed the hwp5 multithreading problem.
I looked into tar files, and the handful I reviewed had a "skip the rest of
the final block with x bytes", but there weren't actually x bytes. This
didn't harm extraction because this happened on the last block. Folks will
get more exceptions, but will get the same content. I think this is ok on
balance given the improved safety we're getting with skip->skipFully in
TikaInputStream.
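To make the skip vs. skipFully trade-off concrete, here is a minimal sketch
using only the JDK. It is not Tika's actual code: the class name
SkipFullySketch and the skipFully helper are hypothetical, and the 10-byte
array is just a stand-in for a final tar block that is shorter than its
declared 512 bytes. It shows why the content comes out the same while the
exception count goes up: a plain skip() hides the shortfall, a strict
skipFully reports it.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class SkipFullySketch {

    // Hypothetical helper: skip exactly n bytes or throw EOFException.
    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // skip() may return 0 without being at end of stream, so
                // read one byte to tell "no progress" apart from EOF.
                if (in.read() == -1) {
                    throw new EOFException("wanted to skip " + n
                            + " bytes, but " + remaining + " were missing");
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10]; // stand-in for a short final block

        // Plain skip(): asks for 512 bytes, quietly skips only what exists.
        InputStream lenient = new ByteArrayInputStream(data);
        long skipped = lenient.skip(512);
        System.out.println("skip() skipped " + skipped + " of 512 bytes");

        // skipFully(): the same shortfall surfaces as an exception.
        InputStream strict = new ByteArrayInputStream(data);
        try {
            skipFully(strict, 512);
            System.out.println("skipFully() succeeded");
        } catch (EOFException e) {
            System.out.println("skipFully() threw: " + e.getMessage());
        }
    }
}
```

In this sketch the caller already has everything before the short block, so
catching the exception loses no content; it only makes the shortfall visible.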
We do have more exceptions in mp4, but I think that is mostly on truncated
files.
In short, I _think_ we're ready to go for 1.24.1. Please take a look at
the reports and let me know what you think.
Best,
Tim