Posted to corpora-dev@tika.apache.org by Tim Allison <ta...@apache.org> on 2020/11/06 15:05:42 UTC
updating bug corpus
All,
With many thanks to Apache's infra, I was unbanned after a few too many
requests to Apache's JIRA/Bugzilla.
I'm currently doing some post-processing cleanup on the refreshed
corpus. I'm planning to remove .diff files and zero-byte files. If there
are any objections, let me know soon.
Thank you, all.
Best,
Tim
Re: updating bug corpus
Posted by Tim Allison <ta...@apache.org>.
Will repackage PDFs as I did before for the PDF enthusiasts in the crowd. :D
Have a great weekend!
Cheers,
Tim
On Fri, Nov 6, 2020 at 4:39 PM Tim Allison <ta...@apache.org> wrote:
> Files are updated under
> https://corpora.tika.apache.org/base/docs/bug_trackers/
>
> I updated the README:
> https://corpora.tika.apache.org/base/docs/bug_trackers/README.txt
>
> Let me know if you find any surprises.
Re: updating bug corpus
Posted by Tim Allison <ta...@apache.org>.
Files are updated under
https://corpora.tika.apache.org/base/docs/bug_trackers/
I updated the README:
https://corpora.tika.apache.org/base/docs/bug_trackers/README.txt
Let me know if you find any surprises.