Posted to dev@lucene.apache.org by "Allison, Timothy B." <ta...@mitre.org> on 2016/10/05 17:56:40 UTC
Apache Tika's public regression corpus
All,
I recently blogged about some of the work we're doing with a large scale regression corpus to make Tika, POI and PDFBox more robust and to identify regressions before release. If you'd like to chip in with recommendations, requests or Hadoop/Spark clusters (why not shoot for the stars), please do!
http://openpreservation.org/blog/2016/10/04/apache-tikas-regression-corpus-tika-1302/
Many thanks, again, to Rackspace for our VM and to Common Crawl and govdocs1 for most of our files!
Cheers,
Tim
RE: Apache Tika's public regression corpus
Posted by "Allison, Timothy B." <ta...@mitre.org>.
Thank _you_ Dominik for your tools and collaboration!
-----Original Message-----
From: Dominik Stadler [mailto:dominik.stadler@gmx.at]
Sent: Wednesday, October 5, 2016 3:33 PM
To: dev@tika.apache.org
Subject: Re: Apache Tika's public regression corpus
Great writeup, Tim, thanks for taking the time to tell people about things that we do!
Dominik.
Re: Apache Tika's public regression corpus
Posted by Dominik Stadler <do...@gmx.at>.
Great writeup, Tim, thanks for taking the time to tell people about things
that we do!
Dominik.