Posted to user@tika.apache.org by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov> on 2010/01/31 18:41:05 UTC

[ANNOUNCE] Apache Tika 0.6 released

(...apologies for the cross posting...)

The Apache Lucene project is pleased to announce the release of Apache Tika
0.6. The release contents have been pushed out to the main Apache release
site and the m2 ibiblio sync, so the releases should be available as soon as
the mirrors get the syncs.

Apache Tika, a subproject of Apache Lucene, is a toolkit for detecting and
extracting metadata and structured text content from various documents using
existing parser libraries.

Apache Tika 0.6 contains a number of improvements and bug fixes. Details can
be found in the changes file:

http://www.apache.org/dist/lucene/tika/CHANGES-0.6.txt

Apache Tika is available in source form from the following download page:
http://www.apache.org/dyn/closer.cgi/lucene/tika/apache-tika-0.6-src.zip

Apache Tika is also available in binary form, or for use with Maven 2, from
the Central Maven Repositories:
http://repo1.maven.org/maven2/org/apache/tika/
http://mirrors.ibiblio.org/pub/mirrors/maven2/org/apache/tika/

In the initial 48 hours, the release may not be available on all mirrors.
When downloading from a mirror site, please remember to verify the downloads
using signatures found on the Apache site:
http://www.apache.org/dist/lucene/tika/KEYS-0.6.txt

For more information on Apache Tika, visit the project home page:
http://lucene.apache.org/tika

-- Chris Mattmann (on behalf of the Apache Lucene community)


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++



Re: [ANNOUNCE] Apache Tika 0.6 released

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Thu, Feb 4, 2010 at 5:07 PM, Baldwin, David <Da...@bmc.com> wrote:
> Thanks, I checked dev@. What I could not seem to find is where I can look at
> (get for review) the RC for PDFBox 1.0.0.  Is that available to me?

For now you'll need to build it from PDFBox trunk. I'll probably cut
the 1.0.0 release candidate tonight and any help in testing it will be
much appreciated. I'll post more at dev@pdfbox later today.

BR,

Jukka Zitting

RE: [ANNOUNCE] Apache Tika 0.6 released

Posted by "Baldwin, David" <Da...@bmc.com>.
Jukka, 

Thanks, I checked dev@. What I could not seem to find is where I can look at (get for review) the RC for PDFBox 1.0.0.  Is that available to me?

-----Original Message-----
From: Jukka Zitting [mailto:jukka.zitting@gmail.com] 
Sent: Wednesday, February 03, 2010 12:26 AM
To: tika-user@lucene.apache.org
Subject: Re: [ANNOUNCE] Apache Tika 0.6 released

Hi,

On Tue, Feb 2, 2010 at 8:07 PM, Baldwin, David <Da...@bmc.com> wrote:
> We were hoping to see this issue
> https://issues.apache.org/jira/browse/PDFBOX-536 which is marked as fixed in
> PDFBox 1.0.0 be part of the Tika 0.6 release.

Tika 0.6 was already released, so obviously this change didn't make it
into that release. I'm quite confident that we'll get it into Tika 0.7.

> Is PDFBox 1.0.0 close to prime time? (i.e., being released)

Yes, I'm expecting a PDFBox 1.0.0 release within two weeks. See the
PDFBox dev@ list for more details.

BR,

Jukka Zitting


Re: [ANNOUNCE] Apache Tika 0.6 released

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Tue, Feb 2, 2010 at 8:07 PM, Baldwin, David <Da...@bmc.com> wrote:
> We were hoping to see this issue
> https://issues.apache.org/jira/browse/PDFBOX-536 which is marked as fixed in
> PDFBox 1.0.0 be part of the Tika 0.6 release.

Tika 0.6 was already released, so obviously this change didn't make it
into that release. I'm quite confident that we'll get it into Tika 0.7.

> Is PDFBox 1.0.0 close to prime time? (i.e., being released)

Yes, I'm expecting a PDFBox 1.0.0 release within two weeks. See the
PDFBox dev@ list for more details.

BR,

Jukka Zitting

RE: [ANNOUNCE] Apache Tika 0.6 released

Posted by "Baldwin, David" <Da...@bmc.com>.
Chris,

We were hoping to see this issue https://issues.apache.org/jira/browse/PDFBOX-536 which is marked as fixed in PDFBox 1.0.0 be part of the Tika 0.6 release.

Of course we can fix it locally, but if possible, on known issues, we prefer to use 'official' releases.

Is PDFBox 1.0.0 close to prime time? (i.e., being released)

From: Mattmann, Chris A (388J) [mailto:chris.a.mattmann@jpl.nasa.gov]
Sent: Monday, February 01, 2010 11:44 PM
To: tika-user@lucene.apache.org
Subject: Re: [ANNOUNCE] Apache Tika 0.6 released

Hi David,

If you create an issue via our issue tracking system (http://issues.apache.org/jira/browse/TIKA), we can track progress towards including PDFBox 1.0.0. What issue are you looking for a fix for?

If you can't wait for an official Tika release, you can always mod your local copy by modifying any of the pom.xml files for the tika-parser module and updating the dependency there.

Thanks,
Chris



On 2/1/10 11:24 AM, "Baldwin, David" <Da...@bmc.com> wrote:
Is there a target Tika release that will include the next version of PDFBox?  I notice that PDFBox 0.8.0-incubating is still being used, and we are looking for a PDFBox fix that is marked as fixed in PDFBox 1.0.0.

-David

-----Original Message-----
From: Mattmann, Chris A (388J) [mailto:chris.a.mattmann@jpl.nasa.gov]
Sent: Sunday, January 31, 2010 10:41 AM
To: announce@apache.org
Cc: Lucene mailing list; tika-user@lucene.apache.org; tika-dev@lucene.apache.org
Subject: [ANNOUNCE] Apache Tika 0.6 released


Re: [ANNOUNCE] Apache Tika 0.6 released

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi David,

If you create an issue via our issue tracking system (http://issues.apache.org/jira/browse/TIKA), we can track progress towards including PDFBox 1.0.0. What issue are you looking for a fix for?

If you can't wait for an official Tika release, you can always mod your local copy by modifying any of the pom.xml files for the tika-parser module and updating the dependency there.
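
As a rough sketch of the local change described above, the PDFBox dependency entry in the parser module's pom.xml might end up looking something like the fragment below. The groupId, artifactId, and version string here are assumptions for illustration only; check them against the entry already present in your checkout and against whichever PDFBox build you have installed locally (a build from trunk may carry a different version string, such as a -SNAPSHOT one).

```xml
<!-- Hypothetical fragment: the coordinates and version are assumptions,
     not copied from the Tika 0.6 pom.xml -->
<dependency>
  <groupId>org.apache.pdfbox</groupId>
  <artifactId>pdfbox</artifactId>
  <!-- 0.8.0-incubating per this thread; 1.0.0 assumes that release
       (or a matching local build) is available to Maven -->
  <version>1.0.0</version>
</dependency>
```

After a change like this, rebuilding the module and rerunning its tests is a reasonable way to see whether the newer PDFBox behaves as expected with your documents.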

Thanks,
Chris



On 2/1/10 11:24 AM, "Baldwin, David" <Da...@bmc.com> wrote:

Is there a target Tika release that will include the next version of PDFBox?  I notice that PDFBox 0.8.0-incubating is still being used, and we are looking for a PDFBox fix that is marked as fixed in PDFBox 1.0.0.

-David



RE: [ANNOUNCE] Apache Tika 0.6 released

Posted by "Baldwin, David" <Da...@bmc.com>.
Is there a target Tika release that will include the next version of PDFBox?  I notice that PDFBox 0.8.0-incubating is still being used, and we are looking for a PDFBox fix that is marked as fixed in PDFBox 1.0.0.

-David
