Posted to corpora-dev@tika.apache.org by Tim Allison <ta...@apache.org> on 2020/11/06 15:05:42 UTC

updating bug corpus

All,
  With many thanks to Apache's infra, I was unbanned after sending a few too
many requests to Apache's JIRA/Bugzilla.
  I'm currently doing some post-processing cleanup on the refreshed
corpus.  I'm planning to remove .diff files and zero-byte files.  If there
are any objections, let me know soon.
  Thank you, all.

       Best,

            Tim
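
For anyone who wants to try the same kind of cleanup on a local copy, here is a minimal sketch of the step described above: find .diff files and zero-byte files under a directory and optionally delete them. This is not the script used on the corpus; the function names (find_removable, clean_corpus), the case-insensitive match on the .diff extension, and the dry-run default are all assumptions made for illustration.

```python
from pathlib import Path


def find_removable(root):
    """Return files under root that end in .diff or are zero bytes long."""
    removable = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue  # skip directories and anything that is not a regular file
        # Assumption: match the extension case-insensitively (.diff, .DIFF, ...)
        if path.suffix.lower() == ".diff" or path.stat().st_size == 0:
            removable.append(path)
    return removable


def clean_corpus(root, dry_run=True):
    """Delete removable files; with dry_run=True only report what would go."""
    targets = find_removable(root)
    for path in targets:
        if not dry_run:
            path.unlink()
    return targets
```

Calling clean_corpus(some_dir) with the default dry_run=True only lists the candidates, which leaves room to review them (or raise objections) before running it again with dry_run=False.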

Re: updating bug corpus

Posted by Tim Allison <ta...@apache.org>.
Will repackage PDFs as I did before for the PDF enthusiasts in the crowd. :D

Have a great weekend!

Cheers,

       Tim

On Fri, Nov 6, 2020 at 4:39 PM Tim Allison <ta...@apache.org> wrote:

> Files are updated under
> https://corpora.tika.apache.org/base/docs/bug_trackers/
>
> I updated the README:
> https://corpora.tika.apache.org/base/docs/bug_trackers/README.txt
>
> Let me know if you find any surprises.

Re: updating bug corpus

Posted by Tim Allison <ta...@apache.org>.
Files are updated under
https://corpora.tika.apache.org/base/docs/bug_trackers/

I updated the README:
https://corpora.tika.apache.org/base/docs/bug_trackers/README.txt

Let me know if you find any surprises.
